2. Simulation studies: OBP vs EBLUP
Jiming Jiang, Thuan Nguyen and J. Sunil Rao
2.1 A demonstration
We first use a simple simulated example to demonstrate the potential impact of model misspecification in terms of the design-based predictive performance of the OBP and the EBLUP. Consider a case where a single covariate, $x_{ij}$, is thought to be linearly associated with the response, $y_{ij}$, through the following NER model:
$$y_{ij} = \beta x_{ij} + v_i + e_{ij}, \quad i = 1, \dots, m, \; j = 1, \dots, n_i \tag{2.1}$$
(so we have $x_{ij}'\beta = \beta x_{ij}$ in this case), where $\beta$ is an unknown coefficient, and $v_i$ and $e_{ij}$ are the same as in (1.1). Thus, in particular, there is a belief that the mean response should be zero when the value of the covariate is zero.
We consider three different sample sizes, $m = 50$, 100 or 400, in conjunction with two different true values of $b$, $b = 0.5$ or 1.0, where $b$ is defined below. Thus, there are six cases, each being a combination of the sample size and the value of $b$. In each case,
an subpopulation is
generated from the normal distribution with mean equal to 1 and standard
deviation equal to The subpopulation is
then generated from the following super-population heteroscedastic NER model:
$$Y_{ik} = b + v_i + e_{ik}, \quad i = 1, \dots, m, \; k = 1, \dots, N_i \tag{2.2}$$
(so the subpopulation size is where is generated
from the normal distribution with mean 0 and standard deviation is generated
from the normal distribution with mean 0 and standard deviation $\sigma_i$, where the $\sigma_i^2$ are generated independently from the Uniform$[0.05, 0.15]$ distribution (so that the range for $\sigma_i$ is approximately from 0.22 to 0.39); and the $v_i$ and $e_{ik}$ are generated independently. It is seen that the assumed NER model is misspecified in terms of both the mean and the variance functions. Once the covariate and response subpopulations are generated, they are fixed throughout the simulations.
In each simulation, we draw a simple random sample of size 5 from each subpopulation, which determines the samples $x_{ij}$ and $y_{ij}$, $j = 1, \dots, 5$, for each $i$. This is repeated
for simulation runs.
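The data-generating steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the subpopulation size `N`, the standard deviations `sd_x` and `sd_v`, and the Uniform bounds for the error variances are assumed placeholder values (the bounds are chosen only so that the error standard deviations fall roughly between 0.22 and 0.39, as stated). The true model is read as an intercept plus an area effect plus an error, which is how Subsection 2.2 characterizes it.

```python
import numpy as np

def generate_subpopulations(m, b, N=1000, sd_x=0.5, sd_v=0.3, seed=0):
    """Generate fixed covariate and response subpopulations for m small areas.

    N, sd_x and sd_v are placeholder values chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Covariate subpopulation: normal with mean 1 (standard deviation assumed).
    X = rng.normal(loc=1.0, scale=sd_x, size=(m, N))
    # Area-specific random effects (standard deviation assumed).
    v = rng.normal(loc=0.0, scale=sd_v, size=m)
    # Heteroscedastic error variances, one per area (bounds assumed).
    sigma2 = rng.uniform(0.05, 0.15, size=m)
    e = rng.normal(size=(m, N)) * np.sqrt(sigma2)[:, None]
    # True model: intercept only, so the response does not depend on X.
    Y = b + v[:, None] + e
    return X, Y

def draw_sample(X, Y, n=5, rng=None):
    """Draw a simple random sample of size n, without replacement, in each area."""
    rng = np.random.default_rng() if rng is None else rng
    m, N = X.shape
    idx = np.array([rng.choice(N, size=n, replace=False) for _ in range(m)])
    rows = np.arange(m)[:, None]
    return X[rows, idx], Y[rows, idx]

X, Y = generate_subpopulations(m=50, b=0.5)
theta = Y.mean(axis=1)  # true small area means, fixed across simulation runs
x_s, y_s = draw_sample(X, Y, rng=np.random.default_rng(1))
```

Because the subpopulations are generated once and then held fixed, only `draw_sample` would be called inside the simulation loop, which is what makes the comparison design-based.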
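We make same-data comparisons of the OBP and EBLUP, with the ML estimator of the model parameters for the latter, in terms of both the overall and area-specific MSPEs. The overall MSPE is defined as
$$\mathrm{MSPE}(\hat{\theta}) = \mathrm{E}\left(|\hat{\theta} - \theta|^2\right) = \sum_{i=1}^{m} \mathrm{E}(\hat{\theta}_i - \theta_i)^2,$$
where $\theta = (\theta_i)_{1 \le i \le m}$ is the vector of true small area means, with $\theta_i$ being the mean of the response subpopulation for the $i$th area, and $\hat{\theta} = (\hat{\theta}_i)_{1 \le i \le m}$ is the vector of predicted values (either by OBP or by EBLUP). Note that the same measure has been used in Jiang et al. (2011). Table 2.1 reports the overall MSPE results, where the MSPE is evaluated empirically by
$$\frac{1}{K} \sum_{k=1}^{K} |\hat{\theta}^{(k)} - \theta^{(k)}|^2,$$
$K$ is the number of simulation runs, and $\hat{\theta}^{(k)}$ and $\theta^{(k)}$ are the $\hat{\theta}$ and $\theta$ in the $k$th simulation run, respectively. It is seen that the percentage increase in the overall MSPE of the EBLUP over the OBP ranges from around 20% to almost 1,000%, depending on the sample size and the value of $b$. The patterns shown here are consistent with those in Jiang et al. (2011) under the Fay-Herriot model, where model-based predictive performances are evaluated. However, the gain by the OBP is much more significant, for $m = 100$ and $m = 400$, than those reported in Jiang et al. (2011).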
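As a rough illustration of the empirical evaluation just described, the overall MSPE can be computed as the average, over simulation runs, of the squared Euclidean distance between the predicted and true vectors of small area means. The function name and the array layout (one row per run, one column per area) are assumptions made for this sketch.

```python
import numpy as np

def overall_empirical_mspe(theta_hat, theta):
    """Average over runs of the squared distance between predictions and truths.

    theta_hat, theta: arrays of shape (K, m), one row per simulation run.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Squared Euclidean distance within each run, summed over the m areas.
    sq_dist = ((theta_hat - theta) ** 2).sum(axis=1)
    return sq_dist.mean()

# Toy input with K = 2 runs and m = 2 areas.
theta = np.array([[1.0, 2.0], [1.0, 2.0]])
theta_hat = np.array([[1.5, 2.0], [1.0, 1.0]])
mspe = overall_empirical_mspe(theta_hat, theta)  # (0.25 + 1.0) / 2 = 0.625
```

Since the measure sums over areas rather than averaging, it naturally grows with the number of small areas, which matters when comparing rows of the tables across different sample sizes.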
Table 2.1
Overall empirical MSPE (% Increase is EBLUP over OBP)

| $m$ | $b$ | OBP | EBLUP | % Increase |
|---|---|---|---|---|
| 50 | 0.5 | 0.130 | 0.161 | 24 |
| 50 | 1.0 | 0.503 | 0.598 | 19 |
| 100 | 0.5 | 0.076 | 0.277 | 264 |
| 100 | 1.0 | 0.396 | 1.077 | 172 |
| 400 | 0.5 | 0.096 | 0.965 | 905 |
| 400 | 1.0 | 0.393 | 4.046 | 930 |
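The "% Increase" column of Table 2.1 can be reproduced from the OBP and EBLUP columns. The short sketch below uses only the values reported in the table; the helper name `pct_increase` is hypothetical.

```python
# (sample size, true value, OBP MSPE, EBLUP MSPE) as reported in Table 2.1
table_2_1 = [
    (50, 0.5, 0.130, 0.161),
    (50, 1.0, 0.503, 0.598),
    (100, 0.5, 0.076, 0.277),
    (100, 1.0, 0.396, 1.077),
    (400, 0.5, 0.096, 0.965),
    (400, 1.0, 0.393, 4.046),
]

def pct_increase(obp, eblup):
    """Percentage increase in MSPE of the EBLUP over the OBP."""
    return 100.0 * (eblup - obp) / obp

increases = [round(pct_increase(obp, eblup)) for _, _, obp, eblup in table_2_1]
print(increases)  # [24, 19, 264, 172, 905, 930]
```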
As for the area-specific MSPEs, following Jiang et al. (2011), we use boxplots to exhibit the distributions
of the area-specific MSPEs associated with both methods. See Figure 2.1. The
plots reveal details not shown by the overall MSPEs. For example, it might be
wondered whether the percentage increase by the EBLUP in the overall MSPE is
simply due to the increased number of areas adding together. A simple
calculation suggests that this may not be true, for example, is only A more explicit
explanation is given in Figure 2.1. For example, comparing the case of with that of it is seen that
while there is a considerable overlap between the boxplots of OBP and EBLUP in
the former case, the boxplots are completely separated in the latter case; in
other words, the largest area-specific MSPE of the OBP is smaller than the
smallest area-specific MSPE of the EBLUP. This pattern cannot be simply
credited to adding or duplicating the areas. In fact, in the latter case, the OBP is doing much better than the EBLUP not just overall, but also for every one of the 400 small areas. This is something that has not been reported before. For example, in the first simulated example of Jiang et al. (2011), the authors found that the OBP has smaller MSPE than the EBLUP for half of the small areas, while the EBLUP has smaller MSPE than the OBP for the other half; similar patterns were found in the second simulated example in Jiang et al. (2011).

The estimation of the area-specific MSPEs of the OBP is considered in Section 3.
Figure 2.1 Area-specific Empirical MSPEs (Boxplots).
2.2 Further considerations
The situation considered in Subsection 2.1 might be a little extreme (and this is why we call it a "theoretical demonstration"). In practice, the assumed model may not be completely wrong, or may be close to correct. In this subsection we first consider a case where the assumed model is "partially correct". Namely, the slope in (2.1) is nonzero (so the assumed model is correct in this regard); the intercept is nonzero, but its value is much smaller than those considered in Subsection 2.1 (so the assumed model is wrong, but not "terribly wrong"). More specifically, the true underlying model is
as opposed to (2.2), where the $v_i$ are generated independently from the normal distribution with mean 0 and standard deviation 0.1; and the $e_{ik}$ are generated from the heteroscedastic normal distribution as in Subsection 2.1. In addition
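to the overall MSPE, we also report the contributions to the MSPE due to "bias" and "variance". Let $\hat{\theta}_i^{(k)}$ and $\theta_i^{(k)}$ be based on the $k$th simulated data set, $k = 1, \dots, K$. We define the empirical bias and variance for the $i$th small area as
$$\mathrm{bias}_i = \frac{1}{K}\sum_{k=1}^{K}\{\hat{\theta}_i^{(k)} - \theta_i^{(k)}\} \quad \text{and} \quad \mathrm{var}_i = \frac{1}{K}\sum_{k=1}^{K}\{\hat{\theta}_i^{(k)} - \theta_i^{(k)} - \mathrm{bias}_i\}^2,$$
respectively. Let $\mathrm{MSPE}_i = K^{-1}\sum_{k=1}^{K}\{\hat{\theta}_i^{(k)} - \theta_i^{(k)}\}^2$ denote the empirical MSPE for the $i$th small area. It is easy to show that the overall empirical MSPE is
$$\sum_{i=1}^{m}\mathrm{MSPE}_i = \sum_{i=1}^{m}\mathrm{bias}_i^2 + \sum_{i=1}^{m}\mathrm{var}_i.$$
Thus, the bias and variance contributions to the overall MSPE are defined as $\sum_{i=1}^{m}\mathrm{bias}_i^2$ and $\sum_{i=1}^{m}\mathrm{var}_i$, respectively.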
Results based on simulation
runs are presented in Table 2.2. As we can see, for the smaller $m$ ($m = 50$), OBP performs (slightly) worse than the EBLUP, but for the larger $m$ ($m = 100$ and $m = 400$), OBP performs (slightly) better, and its advantage increases with $m$. As for the bias and variance contributions, OBP seems to have smaller bias, and smaller variance for larger $m$.
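The split of the overall empirical MSPE into bias and variance contributions can be checked with a short derivation. The notation here is introduced only for this sketch: $d_i^{(k)}$ denotes the prediction error (predicted minus true small area mean) for area $i$ in run $k$, $\bar{d}_i$ is its average over the $K$ runs (the empirical bias), and the empirical variance is assumed to use divisor $K$ around $\bar{d}_i$.

```latex
\begin{align*}
\frac{1}{K}\sum_{k=1}^{K}\{d_i^{(k)}\}^2
 &= \frac{1}{K}\sum_{k=1}^{K}\{(d_i^{(k)} - \bar{d}_i) + \bar{d}_i\}^2 \\
 &= \frac{1}{K}\sum_{k=1}^{K}(d_i^{(k)} - \bar{d}_i)^2
    + 2\bar{d}_i \cdot \frac{1}{K}\sum_{k=1}^{K}(d_i^{(k)} - \bar{d}_i)
    + \bar{d}_i^2 \\
 &= \frac{1}{K}\sum_{k=1}^{K}(d_i^{(k)} - \bar{d}_i)^2 + \bar{d}_i^2 .
\end{align*}
```

The cross term vanishes because the deviations from $\bar{d}_i$ sum to zero over the runs. Summing the last line over the areas gives the overall empirical MSPE as the sum of the squared empirical biases plus the sum of the empirical variances.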
Table 2.2
Overall Empirical MSPE (bias, variance contribution): Assumed model is partially correct; % Increase is MSPE of EBLUP over MSPE of OBP (negative number indicates decrease)

| $m$ | OBP | EBLUP | % Increase |
|---|---|---|---|
| 50 | 0.421 (0.224, 0.197) | 0.405 (0.238, 0.167) | -4.0 |
| 100 | 0.733 (0.448, 0.285) | 0.748 (0.457, 0.291) | 2.1 |
| 400 | 2.745 (1.847, 0.899) | 2.848 (1.878, 0.971) | 3.8 |
Next, we consider a case where the assumed model is actually correct. Namely, the true underlying model is (2.3) with a zero intercept, the errors are homoscedastic with variance equal to 0.1, and everything else is the same as in the case considered above. Results based on simulation runs are presented in Table 2.3. This time, we see that the EBLUP performs slightly better than OBP under different $m$, but the difference is diminishing as the sample size increases. As for the bias and variance contributions, EBLUP seems to have smaller variance, and smaller bias for larger $m$; but its advantages in both bias and variance shrink as $m$ increases.
Table 2.3
Overall Empirical MSPE (bias, variance contribution): Assumed model is correct; % Increase is MSPE of EBLUP over MSPE of OBP (negative number indicates decrease)

| $m$ | OBP | EBLUP | % Increase |
|---|---|---|---|
| 50 | 0.335 (0.204, 0.131) | 0.330 (0.205, 0.125) | -1.4 |
| 100 | 0.749 (0.457, 0.292) | 0.746 (0.456, 0.290) | -0.4 |
| 400 | 2.796 (1.800, 0.997) | 2.794 (1.799, 0.996) | -0.1 |
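As a quick consistency check on Tables 2.2 and 2.3, the reported bias and variance contributions should add up to the reported overall MSPE, up to rounding in the third decimal place. The sketch below uses only the published figures; the dictionary layout and the helper name are assumptions made for illustration.

```python
# sample size: ((OBP mspe, bias, variance), (EBLUP mspe, bias, variance))
table_2_2 = {
    50: ((0.421, 0.224, 0.197), (0.405, 0.238, 0.167)),
    100: ((0.733, 0.448, 0.285), (0.748, 0.457, 0.291)),
    400: ((2.745, 1.847, 0.899), (2.848, 1.878, 0.971)),
}
table_2_3 = {
    50: ((0.335, 0.204, 0.131), (0.330, 0.205, 0.125)),
    100: ((0.749, 0.457, 0.292), (0.746, 0.456, 0.290)),
    400: ((2.796, 1.800, 0.997), (2.794, 1.799, 0.996)),
}

def max_additivity_gap(table):
    """Largest |MSPE - (bias + variance)| over all entries of a table."""
    return max(
        abs(mspe - (bias + var))
        for row in table.values()
        for mspe, bias, var in row
    )

gap_2_2 = max_additivity_gap(table_2_2)
gap_2_3 = max_additivity_gap(table_2_3)
```

Both gaps come out at about 0.001, which is what rounding each entry to three decimals would produce.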
In summary, the simulation results suggest that, when the assumed model is slightly misspecified, OBP may not outperform EBLUP when the number of small areas, $m$, is relatively small; however, OBP is expected to outperform EBLUP when $m$ is relatively large, and the advantage of OBP over EBLUP increases with $m$ (recall the definition of the overall MSPE). On the other hand, when the assumed model is correct, EBLUP is expected to perform better than OBP, although the difference may be negligible; and the advantage of EBLUP over OBP disappears as $m$ increases. These findings, along with those in Subsection 2.1, are very much in line with those of Jiang et al. (2011; Section 4) under the Fay-Herriot model.