Bayesian benchmarking of the Fay-Herriot model using random deletion
Section 4. Empirical studies


The purposes of these empirical studies are twofold. First, we demonstrate that the BFH model can be fit as stated in Section 2 and that the two benchmarking methods, deleting the last one and random deletion, can be carried out. Second, the benchmarking methods are compared in a simulation study based on a dataset that is widely used in the small area literature.

In the data generation process, we use the data on corn and soybean acres in Battese, Harter and Fuller (1988), available for 12 counties (areas) in Iowa. The resulting county-level corn and soybean acreages are constructed using a number of segments sampled from the population (known number of segments). Landsat satellite data on the number of pixels of corn and soybean in the sampled segments (i.e., two covariates) are also available. The finite population means of the number of pixels classified as corn and soybean for each county are also reported. Starting with this dataset, we construct new datasets with any number of areas.

The data generation process has two steps. In the first step, the unit-level model $y_{i j} = x_{i j}^{'} β + e_{i j},$ $i = 1, \dots, l, j = 1, \dots, n_{i},$ where $e_{i j} \overset{iid}{\sim} (0, σ^{2}),$ is fit to the data available for the $l = 12$ counties in Iowa. The area sample sizes are $n_{1} = n_{2} = n_{3} = 1,$ $n_{4} = 2, n_{5} = n_{6} = n_{7} = n_{8} = 3,$ $n_{9} = 4, n_{10} = n_{11} = 5,$ and $n_{12} = 6.$ Using least squares, we estimate $β$ and $σ^{2}$ by $\hat{β}$ and ${\hat{σ}}^{2},$ respectively. For the areas with sample size greater than one, we set $s_{i}^{2}$ equal to the estimated variance of the sample mean ${\bar{y}}_{i} ({\bar{y}}_{i} = \sum_{j = 1}^{n_{i}} y_{i j} / n_{i})$ and we let $S^{2}$ be their geometric mean. For the areas with sample size equal to one, we set $s_{i}^{2}$ equal to $S^{2} .$ The vector of covariates ${\bar{X}}_{i}$ has three elements, the integer one (for the intercept), followed by the population means of pixels classified as corn and soybean.

In the second step, the data generation process for any desired number $l$ of small areas is illustrated. The covariates $x_{i}, i = 1, \dots, l,$ are sampled with replacement from ${\bar{X}}_{i}, i = 1, \dots, 12.$ Then, the area-level means are drawn using

$θ_{i} \overset{ind}{\sim} Normal (x_{i}^{'} \hat{β}, {\hat{σ}}^{2}), i = 1, \dots, l,$

where $\hat{β}$ and ${\hat{σ}}^{2}$ are the least squares estimates defined above. The sample variances $s_{i}^{2}$ are generated in two steps. First, the sample sizes are drawn from a uniform distribution, $n_{i} \overset{iid}{\sim} Uniform (5, 25),$ $i =1, \dots, l .$ Second, let $s_{i}^{2} = S^{2} V_{i} / (n_{i} - 1),$ where $V_{i} \overset{ind}{\sim} χ_{n_{i} - 1}^{2}$ and $S^{2}$ is as defined above. Finally, the small area survey estimates are drawn using ${\hat{θ}}_{i} \overset{ind}{\sim} Normal (θ_{i}, s_{i}^{2}), i =1, \dots, l .$ The benchmarking target is set equal to the sum of the direct estimates, $\sum_{i =1}^{l} {\hat{θ}}_{i},$ or to variants of this value scaled up or down by 50%. In NASS’s practice, for crop county estimates, this target is an already set state value. To evaluate the benchmarking methods in extreme cases, we consider additional simulation scenarios, where an area sample size is set to 2 or 50, or where the factor $S^{2}$ is multiplied by ten.
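The second step of the data generation process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the county covariate matrix `X_bar`, the coefficient vector `beta_hat` and the variance `sigma2_hat` below are hypothetical placeholders (the fitted values are not reported in this section), and $Uniform (5, 25)$ is read as a discrete uniform distribution on the integers 5 to 25.

```python
import numpy as np

def generate_areas(l, X_bar, beta_hat, sigma2_hat, S2, rng):
    """Second step of the data generation process for l small areas."""
    # Covariates: sample rows of the 12 county-level vectors with replacement.
    x = X_bar[rng.integers(0, X_bar.shape[0], size=l)]
    # Area-level means: theta_i ~ Normal(x_i' beta_hat, sigma2_hat).
    theta = rng.normal(x @ beta_hat, np.sqrt(sigma2_hat))
    # Sample sizes: integers 5..25 inclusive (assumed reading of Uniform(5, 25)).
    n = rng.integers(5, 26, size=l)
    # Sampling variances: s_i^2 = S2 * V_i / (n_i - 1), V_i ~ chi-square(n_i - 1).
    s2 = S2 * rng.chisquare(n - 1) / (n - 1)
    # Survey estimates: theta_hat_i ~ Normal(theta_i, s_i^2).
    theta_hat = rng.normal(theta, np.sqrt(s2))
    return x, theta, n, s2, theta_hat

rng = np.random.default_rng(2024)

# HYPOTHETICAL inputs, for illustration only (not the values fitted in the paper).
X_bar = np.column_stack([np.ones(12),
                         rng.uniform(150, 400, 12),
                         rng.uniform(150, 400, 12)])
beta_hat = np.array([50.0, 0.3, -0.1])
sigma2_hat = 100.0

x, theta, n, s2, theta_hat = generate_areas(
    99, X_bar, beta_hat, sigma2_hat, S2=163.0, rng=rng)

# Benchmarking targets: the sum of the direct estimates, scaled by 0.5, 1 and 1.5.
targets = {scale: scale * theta_hat.sum() for scale in (0.5, 1.0, 1.5)}
```

Only `S2 = 163` and the three target scalings come from the text; everything else in the usage lines is a stand-in.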

In what follows, we report empirical results mostly for a simulation scenario using 12 areas. Examples using a larger number of areas are briefly discussed. For example, Iowa has 99 counties, and one of NASS’s interests is in benchmarking county estimates for planted acres, harvested acres and production (bushels) to the predefined state-level total. For such small numbers of areas, no adjustment is needed to the benchmarking procedures (deleting the last one or random deletion) introduced in the previous sections. However, the computation may become prohibitive for an extremely large number of areas (say, one million), and some adjustments to the current procedures would be needed.

It is pertinent to discuss the computations for the simulation scenario with 12 areas. For posterior inference under the BFH model, we have used 1,000 random draws, and this runs in just a few seconds. On the other hand, it is more difficult to run a Gibbs sampler for deleting one at a time or random deletion benchmarking. However, we have provided an efficient Gibbs sampler as follows. We used a long run of 20,000 iterations, with a “burn in” of the first 10,000 iterations, choosing every tenth iterate thereafter. These settings were obtained by trial and error, guided by the autocorrelations, the Geweke test for stationarity and the effective sample sizes. For the 1,000 selected iterations, the autocorrelations are all negligible. For random deletion benchmarking, the p-values of the Geweke test for the three regression coefficients and $δ^{2}$ are, respectively, 0.651, 0.087, 0.828 and 0.699 (i.e., stationarity is not rejected), and the effective sample sizes are all 1,000. Also, the trace plots show no evidence of nonstationarity. Therefore, the Gibbs sampler is efficient, taking a few seconds despite the large number of runs.
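The burn-in and thinning scheme described above can be illustrated as follows. This is a minimal sketch: the chain is a hypothetical AR(1) series standing in for one parameter of the Gibbs sampler (the actual sampler is not reproduced here), and the AR coefficient 0.7 is an arbitrary choice made only to show how thinning reduces autocorrelation.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

n_iter, burn_in, thin = 20_000, 10_000, 10

# HYPOTHETICAL stand-in for a Gibbs chain: an AR(1) series with coefficient 0.7.
rng = np.random.default_rng(1)
chain = np.empty(n_iter)
chain[0] = 0.0
for t in range(1, n_iter):
    chain[t] = 0.7 * chain[t - 1] + rng.normal()

# Discard the first 10,000 iterations, then keep every tenth iterate.
kept = chain[burn_in::thin]

print(len(kept))                       # number of retained draws
print(lag1_autocorr(chain), lag1_autocorr(kept))
```

With 20,000 iterations, a burn-in of 10,000 and a thinning interval of 10, exactly 1,000 iterates are retained, which matches the 1,000 selected iterations reported in the text.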

The performance of benchmarking methods is assessed using a set of metrics that include posterior means (PM) and posterior standard deviations (PSD), and when it is convenient, posterior coefficients of variation (PCV), numerical standard errors (NSE) of the estimates and 95% highest posterior density intervals (95% HPD). Numerical results are presented in Tables 4.1-4.8.
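For concreteness, the summaries listed above can be computed from a vector of posterior draws along the following lines. This is a sketch under assumptions that are not stated in the text: the NSE is taken as the PSD divided by the square root of the number of draws (appropriate when the retained draws are effectively independent), and the 95% HPD interval is approximated by the shortest interval containing 95% of the sorted draws. The function name `posterior_summary` is hypothetical.

```python
import numpy as np

def posterior_summary(draws, prob=0.95):
    """PM, PSD, PCV, NSE and an empirical HPD interval from posterior draws."""
    draws = np.sort(np.asarray(draws, dtype=float))
    m = draws.size
    pm = draws.mean()
    psd = draws.std(ddof=1)
    pcv = psd / pm
    # Assumption: draws are effectively independent, so NSE = PSD / sqrt(m).
    nse = psd / np.sqrt(m)
    # Shortest interval covering ceil(prob * m) of the sorted draws.
    k = int(np.ceil(prob * m))
    widths = draws[k - 1:] - draws[:m - k + 1]
    j = int(np.argmin(widths))
    hpd = (draws[j], draws[j + k - 1])
    return {"PM": pm, "PSD": psd, "PCV": pcv, "NSE": nse, "HPD": hpd}

# Usage with simulated draws (illustrative values only).
rng = np.random.default_rng(0)
summary = posterior_summary(rng.normal(120.0, 6.0, size=1000))
```

For a unimodal posterior the shortest-interval rule is a common empirical approximation to the HPD interval; other estimators (for example, kernel-based ones) could be substituted.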

A summarized version of the basic results is presented in Table 4.1, which compares the direct estimates, standard errors and coefficients of variation of the observed data with the PMs, PSDs and PCVs from the BFH model, the deleting the last one (LO) benchmarking model and the random deletion (RD) benchmarking model. The results in Table 4.1 apply to two simulation scenarios: $S^{2} = 163,$ with small variation in the observed data, and $S^{2} = 1,630,$ with relatively larger variation in the observed data. When $S^{2} = 163,$ there is very little difference between the observed data and the posterior quantities from the BFH, LO and RD models. Given the small coefficients of variation for the survey estimates, it is difficult for any model to further reduce variability. Hence, the PCVs are comparable to the CVs of the survey estimates. On the other hand, three interesting points can be made for the scenario where $S^{2} = 1,630 .$ First, the PMs under the BFH model can be very different from those of the LO and RD models, and these latter two sets of PMs are very close. Second, the PSDs are much smaller than the standard errors of the observed data, so there are substantial gains in precision under the BFH model. Specifically, for most areas the PSDs under the BFH model are about four to five times smaller than the standard errors of the observed data, whereas the PSDs under the LO and RD models are about twice those of the BFH model. The PCVs follow the same pattern. Third, LO and RD are very close in all three measures (PMs, PSDs, PCVs), with the RD model typically having slightly smaller PSDs. As expected, there is little difference between the LO model and the RD model. But one must also observe that benchmarking the BFH model is important because we can get answers that differ from those of the BFH model, at least in terms of posterior standard deviations and coefficients of variation. Benchmarking is a jittering procedure, which helps to protect the model from misspecification, and therefore it must lead to increased variability in the small area estimates.

Table 4.1
Comparison of BFH model with no benchmarking, deleting the last one benchmarking and random benchmarking via posterior mean (PM), posterior standard deviation (PSD) and posterior coefficient of variation (PCV) for two values of $S^{2}$
Table summary
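This table displays the results of Comparison of BFH model with no benchmarking, deleting the last one benchmarking and random benchmarking. The information is grouped by Area (appearing as row headers), PM, PSD and PCV (appearing as column headers).
	Area	PM				PSD				PCV
	Area	OB	BFH	LO	RD	OB	BFH	LO	RD	OB	BFH	LO	RD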
a. $S^{2} = 163; a = 1,435$	1	135.6	134.0	133.8	133.5	6.03	5.62	5.47	5.41	0.044	0.042	0.041	0.041
	2	102.0	103.5	103.1	103.0	7.10	6.50	6.11	5.82	0.070	0.063	0.059	0.057
	3	117.7	121.0	120.7	120.5	7.31	6.72	6.55	6.25	0.062	0.056	0.054	0.052
	4	77.0	81.5	81.4	81.0	5.88	6.00	5.46	5.53	0.076	0.074	0.067	0.068
	5	126.9	127.8	127.5	127.5	5.63	5.25	5.25	5.06	0.044	0.041	0.041	0.040
	6	113.1	113.4	112.9	113.1	8.06	7.15	6.82	6.74	0.071	0.063	0.060	0.060
	7	137.2	133.7	133.5	133.9	6.74	6.38	5.93	6.02	0.049	0.048	0.044	0.045
	8	124.8	124.7	124.7	124.7	4.03	3.91	3.83	3.76	0.032	0.031	0.031	0.030
	9	118.3	116.5	115.8	116.6	7.54	6.79	6.29	6.65	0.064	0.058	0.054	0.057
	10	156.5	153.4	153.3	153.3	4.37	4.45	4.12	4.18	0.028	0.029	0.027	0.027
	11	109.5	110.3	110.3	110.2	4.88	4.64	4.70	4.70	0.045	0.042	0.043	0.043
	12	116.3	118.1	117.9	117.7	7.23	6.62	6.26	6.00	0.062	0.056	0.053	0.051
b. $S^{2} = 1,630; a = 1,482$	1	129.1	129.8	127.2	126.5	19.07	4.64	10.71	10.45	0.148	0.036	0.084	0.083
	2	117.3	126.3	122.1	122.1	22.46	5.08	12.73	12.51	0.191	0.040	0.104	0.102
	3	120.0	145.5	137.3	136.9	23.11	5.93	12.91	12.68	0.193	0.041	0.094	0.093
	4	68.8	107.3	94.0	93.6	18.60	7.47	12.04	11.86	0.270	0.070	0.128	0.127
	5	142.4	146.4	142.3	142.2	17.80	4.52	11.98	11.15	0.125	0.031	0.084	0.078
	6	108.8	120.2	115.2	115.4	25.49	5.43	11.75	11.66	0.234	0.045	0.102	0.101
	7	136.8	116.2	118.2	119.0	21.31	5.37	11.32	11.90	0.156	0.046	0.096	0.100
	8	124.5	132.5	127.3	127.3	12.76	4.39	9.00	8.91	0.102	0.033	0.071	0.070
	9	144.2	127.5	128.0	129.5	23.86	5.33	12.74	14.00	0.165	0.042	0.100	0.108
	10	172.9	129.2	145.5	145.3	13.81	9.23	10.28	10.37	0.080	0.071	0.071	0.071
	11	109.1	114.7	110.6	110.2	15.42	4.31	10.53	10.43	0.141	0.038	0.095	0.095
	12	108.4	120.3	114.6	114.2	22.87	5.10	12.42	12.01	0.211	0.042	0.108	0.105
Note: OB: observed data; BFH: Bayesian Fay-Herriot model; LO: benchmarking (deleting the last one) model; RD: random benchmarking model; a is the target. For OB, the direct estimate, standard error and coefficient of variation are presented under PM, PSD and PCV, respectively. Under the DGSM benchmarking procedure, at $S^{2} = 163,$ the benchmarking values are 133.7, 103.3, 120.8, 81.3, 127.6, 113.2, 133.4, 124.5, 116.3, 153.1, 110.1, 117.9, and at $S^{2} = 1,630,$ the benchmarking values are 126.9, 123.5, 142.3, 105.0, 143.2, 117.5, 113.6, 129.5, 124.7, 126.4, 112.1, 117.6.

Under the basic simulation scenario, we compare the deletion benchmarking methods to one of the methods in DGSM that provides benchmarked posterior estimates without deletion. To match the notation in DGSM, the benchmarking equation must be rewritten as

$\sum_{i =1}^{l} ω_{i} θ_{i} = \frac{a}{l} = t,$

where $ω_{i} = 1 / l, \sum_{i =1}^{l} ω_{i} =1.$ Let ${\hat{θ}}_{i}^{(B)}$ denote the posterior means from the BFH model. Now, define ${\bar{\hat{θ}}}_{B} = \sum_{i =1}^{l} ω_{i} {\hat{θ}}_{i}^{(B)},$

$ϕ_{i} = \frac{ω_{i}}{{\hat{θ}}_{i}^{(B)}}, r_{i} = \frac{ω_{i}}{ϕ_{i}}, i =1, \dots, l,$

and $S^{*} = \sum_{i =1}^{l} ω_{i}^{2} / ϕ_{i} .$ Note that among the several specifications in DGSM, we have selected $ϕ_{i}$ at random (no preference). Then, the benchmarked Bayes estimators of DGSM are

${\hat{θ}}_{i}^{(B M)} = {\hat{θ}}_{i}^{(B)} + (t - {\bar{\hat{θ}}}_{B}) r_{i} / S^{*} , i =1, \dots, l .$

Empirical results using the estimator ${\hat{θ}}_{i}^{(B M)}$ are presented in the note to Table 4.1. The largest difference between the benchmarked estimates under the different benchmarking methods is for area 10 when $S^{2} = 1,630$ (OB: 172.9; BFH: 129.2; LO: 145.5; RD: 145.3; DGSM: 126.4). In general, the PMs from LO and RD are closer to OB (observed data). Otherwise, the DGSM estimates compare reasonably well with those from LO and RD benchmarking, although there are some small differences. Note that DGSM does not provide posterior standard deviations or credible intervals.
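With the particular choice $ϕ_{i} = ω_{i} / {\hat{θ}}_{i}^{(B)},$ we have $r_{i} = {\hat{θ}}_{i}^{(B)}$ and $S^{*} = {\bar{\hat{θ}}}_{B},$ so the DGSM estimator reduces to the ratio adjustment ${\hat{θ}}_{i}^{(B M)} = {\hat{θ}}_{i}^{(B)} \, a / \sum_{k =1}^{l} {\hat{θ}}_{k}^{(B)}.$ The short Python check below illustrates this using the BFH posterior means from Table 4.2 and the target $a = 1,435$; the variable names are ours, and the code is a numerical illustration rather than the computation used to produce the note to Table 4.1.

```python
import numpy as np

# BFH posterior means (Table 4.2, S^2 = 163) and the benchmarking target a.
pm_bfh = np.array([133.985, 103.461, 121.006, 81.473, 127.832, 113.393,
                   133.661, 124.732, 116.479, 153.355, 110.348, 118.098])
a = 1435.0

l = pm_bfh.size
omega = np.full(l, 1.0 / l)          # omega_i = 1 / l
t = a / l                            # rewritten target t = a / l
phi = omega / pm_bfh                 # phi_i = omega_i / theta_hat_i^(B)
r = omega / phi                      # r_i = omega_i / phi_i
S_star = np.sum(omega**2 / phi)      # S* = sum of omega_i^2 / phi_i
theta_bar_B = np.sum(omega * pm_bfh)

# General DGSM form of the benchmarked Bayes estimator.
theta_bm = pm_bfh + (t - theta_bar_B) * r / S_star

# Equivalent ratio adjustment implied by this choice of phi_i.
ratio_adjusted = pm_bfh * a / pm_bfh.sum()

print(np.round(theta_bm, 1))
print(theta_bm.sum())
```

The benchmarked values sum to the target by construction, and, rounded to one decimal place, they agree closely with the DGSM values listed in the note to Table 4.1 for $S^{2} = 163.$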

More detailed results for $S^{2} =163$ are presented in Tables 4.2-4.8 and in Figures 4.1-4.4. Our interest is mainly in comparing deletion of a single area (e.g., LO) with RD.

Using the results in Table 4.2, we conclude that the PMs from the BFH model (without benchmarking) are slightly different from the direct estimates and, as expected, larger than the smaller direct estimates and smaller than the larger ones. Except for two areas (areas 4 and 10), the PSDs are, as expected, smaller than the direct standard errors. For example, the smallest direct estimate (76.997) has the largest shrinkage, and its PSD is larger than its direct standard error (5.995 vs. 5.881); the results are consistent with the standard shrinkage that occurs in small area estimation. We note that the PCVs are all small and the NSEs are reasonably small, too.

Table 4.2
Comparison of the direct estimator with posterior inference from the Bayesian Fay-Herriot model for the area parameters
Table summary
This table displays the results of Comparison of the direct estimator with posterior inference from the Bayesian Fay-Herriot model for the area parameters. The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , PM, PSD, PCV, NSE and 95% HPD (appearing as column headers).
Area	$n$	$\hat{θ}$	$s$	PM	PSD	PCV	NSE	95% HPD
1	5	135.575	6.031	133.985	5.617	0.042	0.057	(123.422, 145.402)
2	7	101.980	7.101	103.461	6.498	0.063	0.065	(90.598, 116.134)
3	24	117.655	7.309	121.006	6.716	0.056	0.066	(107.730, 134.124)
4	23	76.997	5.881	81.473	5.995	0.074	0.058	(69.046, 92.578)
5	21	126.917	5.629	127.832	5.248	0.041	0.052	(117.850, 138.406)
6	9	113.132	8.061	113.393	7.147	0.063	0.068	(99.441, 127.451)
7	5	137.236	6.739	133.661	6.378	0.048	0.064	(121.771, 146.662)
8	20	124.839	4.034	124.732	3.906	0.031	0.039	(117.233, 132.309)
9	16	118.306	7.544	116.479	6.785	0.058	0.071	(103.225, 130.003)
10	9	156.503	4.368	153.355	4.449	0.029	0.045	(144.785, 162.031)
11	23	109.546	4.877	110.348	4.637	0.042	0.047	(101.179, 119.294)
12	9	116.314	7.232	118.098	6.623	0.056	0.068	(105.135, 131.186)
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean, PSD is the posterior standard deviation and HPD is the highest posterior density interval. NSE is the numerical standard error of the posterior mean. The benchmarking value is 1,435 and the sum of the posterior means is 1,437.823 (not benchmarked).

The estimates from the BFH model with deleting the last area and with random deletion under a uniform prior (equal weights) are presented in Tables 4.4 and 4.3, respectively. The posterior deletion probabilities differ only slightly from the prior value of $1/12 \approx 0.083,$ with the largest (0.098) for the $9^{th}$ area and the smallest (0.056) for the $8^{th}$ area. Both random deletion and deleting the last one improve precision: the PSDs of the benchmarked estimates are smaller than the observed standard errors for both benchmarking methods, the only exception being area 4 under deleting the last one (5.906 vs. 5.881). The NSEs are larger than for no benchmarking, but this barely matters because these are errors of the PMs, which are on the order of 100, whereas the NSEs are all below 0.25.

Table 4.3
Comparison of the direct estimator with posterior inference from the Bayesian Fay-Herriot model for the area parameters under random deletion benchmarking
Table summary
This table displays the results of Comparison of the direct estimator with posterior inference from the Bayesian Fay-Herriot model for the area parameters under random deletion benchmarking. The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , PM, PSD, PCV, NSE and 95% HPD (appearing as column headers).
Area	$n$	$\hat{θ}$	$s$	PM	PSD	PCV	NSE	95% HPD
1	5	135.575	6.031	133.516	5.431	0.041	0.171	(123.414, 143.541)
2	7	101.980	7.101	102.903	5.793	0.056	0.199	(92.378, 114.250)
3	24	117.655	7.309	120.671	6.237	0.052	0.194	(107.744, 132.190)
4	23	76.997	5.881	81.170	5.597	0.069	0.202	(69.781, 91.177)
5	21	126.917	5.629	127.652	5.036	0.039	0.170	(118.293, 137.228)
6	9	113.132	8.061	112.805	6.707	0.059	0.223	(100.926, 126.074)
7	5	137.236	6.739	133.908	6.007	0.045	0.177	(122.135, 145.344)
8	20	124.839	4.034	124.703	3.757	0.030	0.120	(117.962, 132.304)
9	16	118.306	7.544	116.451	6.650	0.057	0.249	(103.400, 129.316)
10	9	156.503	4.368	153.222	4.216	0.028	0.134	(144.392, 160.854)
11	23	109.546	4.877	110.221	4.694	0.043	0.150	(101.038, 119.570)
12	9	116.314	7.232	117.780	5.997	0.051	0.208	(104.619, 128.158)
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean, PSD is the posterior standard deviation and HPD is the highest posterior density interval. NSE is the numerical standard error of the posterior mean. The benchmarking value is 1,435. Under a uniform prior (equal weights), the posterior probabilities that the areas 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are deleted are, respectively, 0.090, 0.084, 0.095, 0.077, 0.066, 0.093, 0.097, 0.056, 0.098, 0.068, 0.079, 0.097.

Table 4.4
Comparison of the direct estimator with posterior inference from the Bayesian Fay-Herriot model for the area parameters under deleting the last area
Table summary
This table displays the results of Comparison of the direct estimator with posterior inference from the Bayesian Fay-Herriot model for the area parameters under deleting the last area. The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , PM, PSD, PCV, NSE and 95% HPD (appearing as column headers).
Area	$n$	$\hat{θ}$	$s$	PM	PSD	PCV	NSE	95% HPD
1	5	135.575	6.031	133.772	5.519	0.041	0.151	(122.213, 143.991)
2	7	101.980	7.101	103.026	6.319	0.061	0.171	(89.424, 113.857)
3	24	117.655	7.309	120.470	6.458	0.054	0.209	(108.783, 134.261)
4	23	76.997	5.881	81.391	5.906	0.073	0.171	(69.636, 92.634)
5	21	126.917	5.629	127.883	5.158	0.040	0.142	(117.282, 137.305)
6	9	113.132	8.061	112.895	6.270	0.056	0.216	(100.664, 124.320)
7	5	137.236	6.739	133.298	5.948	0.045	0.178	(121.831, 144.727)
8	20	124.839	4.034	124.664	3.810	0.031	0.124	(117.321, 131.941)
9	16	118.306	7.544	116.542	6.531	0.056	0.203	(104.238, 129.622)
10	9	156.503	4.368	153.229	4.353	0.028	0.132	(144.443, 161.593)
11	23	109.546	4.877	109.997	4.563	0.041	0.168	(101.428, 118.953)
12	9	116.314	7.232	117.835	6.344	0.054	0.215	(106.421, 131.483)
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean, PSD is the posterior standard deviation and HPD is the highest posterior density interval. NSE is the numerical standard error of the posterior mean. The benchmarking value is 1,435.

The three methods (BFH, RD, LO) are compared using the results in Table 4.5. The PMs are comparable, so benchmarking (RD, LO) does not distort (shrink) the estimates much beyond the shrinkage under the BFH model. Also, the PSDs under LO and RD are almost always smaller than those under the BFH model. For eight of the twelve areas, RD has smaller PSDs than LO; in these areas, RD shows a roughly 1% decrease in PSD relative to LO and a roughly 4% decrease relative to the PSDs from BFH.
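The area-by-area comparison of PSDs can be tabulated directly from the values in Table 4.5. The sketch below simply counts how often one method has the smaller PSD and computes per-area percentage changes; the variable names are ours.

```python
import numpy as np

# Posterior standard deviations from Table 4.5 (areas 1 to 12).
psd_bfh = np.array([5.617, 6.498, 6.716, 5.995, 5.248, 7.147,
                    6.378, 3.906, 6.785, 4.449, 4.637, 6.623])
psd_rd = np.array([5.431, 5.793, 6.237, 5.597, 5.036, 6.707,
                   6.007, 3.757, 6.650, 4.216, 4.694, 5.997])
psd_lo = np.array([5.519, 6.319, 6.458, 5.906, 5.158, 6.270,
                   5.948, 3.810, 6.531, 4.353, 4.563, 6.344])

# Number of areas in which one method has the smaller PSD.
rd_below_lo = int(np.sum(psd_rd < psd_lo))
rd_below_bfh = int(np.sum(psd_rd < psd_bfh))
lo_below_bfh = int(np.sum(psd_lo < psd_bfh))

# Per-area percentage decrease in PSD of RD relative to LO and to BFH
# (negative values mean RD has the larger PSD in that area).
pct_rd_vs_lo = 100.0 * (psd_lo - psd_rd) / psd_lo
pct_rd_vs_bfh = 100.0 * (psd_bfh - psd_rd) / psd_bfh

print(rd_below_lo, rd_below_bfh, lo_below_bfh)
print(np.round(pct_rd_vs_lo, 1))
print(np.round(pct_rd_vs_bfh, 1))
```

The count for RD versus LO reproduces the "eight of the twelve areas" statement; the per-area percentages show how the size of the gain varies from area to area.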

To investigate how sensitive the PSDs are to different benchmarking targets, we present results using three choices of targets in Table 4.6. The PSDs change only slightly over different targets and are still better than the standard errors of the direct estimates.

As part of designing a complex set of simulations, we consider using unequal probabilities (weights) in the random deletion benchmarking, and present results in Table 4.7. Uniform weights (EW) are compared to weights inversely proportional (IW) to the sample sizes and to weights directly proportional (DW) to the sample sizes. Again, only small differences are present among the three PMs and among the three PSDs. The PSDs are still smaller than the standard errors of the direct estimates.
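The three prior weighting schemes can be written down from the area sample sizes in Table 4.7. In the sketch below we assume that each set of weights is normalized to sum to one; the normalization and the helper name `normalize` are our assumptions.

```python
import numpy as np

# Area sample sizes n_1, ..., n_12 (Tables 4.2 to 4.7).
n = np.array([5, 7, 24, 23, 21, 9, 5, 20, 16, 9, 23, 9])

def normalize(w):
    """Scale nonnegative weights so that they sum to one (assumed convention)."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

ew = normalize(np.ones(n.size))   # equal weights (uniform prior)
iw = normalize(1.0 / n)           # inversely proportional to sample size
dw = normalize(n)                 # directly proportional to sample size

print(np.round(ew, 3))
print(np.round(iw, 3))
print(np.round(dw, 3))
```

Under this convention, the prior weights can be set against the posterior deletion probabilities reported in the note to Table 4.7 to see how much the data move each probability.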

Using the results in Table 4.8, we study how extreme sample sizes in the last county (to be deleted) affect posterior inference. For this, we set the sample size of the last county outside the simulation range (5-25), at 2 and at 50. First, consider the case in which the sample size of the last county is 2. Consistent with previous findings, there are minor differences among the PMs under no benchmarking, deleting the last one and random deletion for all counties. The PSDs for LO and RD are generally smaller than those of BFH, with nine of these PSDs for RD smaller than LO. However, for the last county, we observe relatively large posterior standard deviations (10.00, 8.771 and 8.525 under BFH, LO and RD, respectively), with a roughly 15% decrease in PSD for RD relative to no benchmarking. Next, consider the case in which the sample size of the last county is 50. The patterns are similar, except that the PSDs for the last county are comparable to the others under BFH, LO and RD, and again there is a decrease of approximately 10% (6.282, 5.958, 5.702) in PSD for RD relative to no benchmarking. It appears that deliberately putting the county with the most extreme sample size (small or large) as the last county can affect the benchmarking procedure. In contrast, minor changes are observed when the areas with extreme sample size are not systematically deleted. When the sample size is 2, the new PMs and PSDs are the following: BFH: 124.307, 9.993; LO: 123.371, 9.000; RD: 123.540, 8.887. When the sample size is 50, the new PMs and PSDs are the following: BFH: 118.167, 6.284; LO: 117.802, 6.094; RD: 117.716, 5.948.

Table 4.5
A summary of the comparison of inference from the direct estimator, the Bayesian Fay-Herriot (BFH) model, random deletion (RD) benchmarking and deleting the last one (LO)
Table summary
This table displays the results of A summary of the comparison of inference from the direct estimator, the Bayesian Fay-Herriot (BFH) model, random deletion (RD) benchmarking and deleting the last one (LO). The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , BFH, RD and LO (appearing as column headers).
Area	$n$	$\hat{θ}$	$s$	BFH		RD		LO
Area	$n$	$\hat{θ}$	$s$	PM	PSD	PM	PSD	PM	PSD
1	5	135.575	6.031	133.985	5.617	133.516	5.431	133.772	5.519
2	7	101.980	7.101	103.461	6.498	102.903	5.793	103.026	6.319
3	24	117.655	7.309	121.006	6.716	120.671	6.237	120.470	6.458
4	23	76.997	5.881	81.473	5.995	81.170	5.597	81.391	5.906
5	21	126.917	5.629	127.832	5.248	127.652	5.036	127.883	5.158
6	9	113.132	8.061	113.393	7.147	112.805	6.707	112.895	6.270
7	5	137.236	6.739	133.661	6.378	133.908	6.007	133.298	5.948
8	20	124.839	4.034	124.732	3.906	124.703	3.757	124.664	3.810
9	16	118.306	7.544	116.479	6.785	116.451	6.650	116.542	6.531
10	9	156.503	4.368	153.355	4.449	153.222	4.216	153.229	4.353
11	23	109.546	4.877	110.348	4.637	110.221	4.694	109.997	4.563
12	9	116.314	7.232	118.098	6.623	117.780	5.997	117.835	6.344
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean and PSD is the posterior standard deviation. The benchmarking value is 1,435. Under a uniform prior, the posterior probabilities that the areas 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are deleted are respectively 0.090, 0.084, 0.095, 0.077, 0.066, 0.093, 0.097, 0.056, 0.098, 0.068, 0.079, 0.097.

Table 4.6
Comparison of posterior inference of the area parameters under random deletion benchmarking with different targets (a = 1,435)
Table summary
This table displays the results of Comparison of posterior inference of the area parameters under random deletion benchmarking with different targets (a = 1,435). The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , a, 1.5a and 0.5a (appearing as column headers).
Area	$n$	$\hat{θ}$	$s$	a		1.5a		0.5a
Area	$n$	$\hat{θ}$	$s$	PM	PSD	PM	PSD	PM	PSD
1	5	135.575	6.031	133.516	5.431	189.249	5.385	77.769	5.561
2	7	101.980	7.101	102.903	5.793	175.963	5.794	29.847	5.899
3	24	117.655	7.309	120.671	6.237	197.219	6.099	44.145	6.461
4	23	76.997	5.881	81.170	5.597	134.628	5.871	27.771	5.460
5	21	126.917	5.629	127.652	5.036	177.209	5.165	78.125	5.053
6	9	113.132	8.061	112.805	6.707	201.949	7.145	23.614	6.995
7	5	137.236	6.739	133.908	6.007	200.989	6.018	66.781	6.024
8	20	124.839	4.034	124.703	3.757	151.951	3.952	97.484	3.924
9	16	118.306	7.544	116.451	6.650	196.849	6.990	35.990	6.607
10	9	156.503	4.368	153.222	4.216	184.720	4.019	121.708	4.706
11	23	109.546	4.877	110.221	4.694	148.724	4.966	71.752	4.760
12	9	116.314	7.232	117.780	5.997	193.050	5.954	42.514	6.081
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean and PSD is the posterior standard deviation. The benchmarking value is 1,435. Under a uniform prior, the posterior probabilities that the areas 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are deleted are respectively 0.090, 0.084, 0.095, 0.077, 0.066, 0.093, 0.097, 0.056, 0.098, 0.068, 0.079, 0.097. When the benchmarking value is increased by 50%, these probabilities are 0.090, 0.084, 0.095, 0.079, 0.064, 0.093, 0.097, 0.056, 0.098, 0.068, 0.079, 0.097. When the benchmarking value is decreased by 50%, these probabilities are 0.090, 0.084, 0.095, 0.077, 0.066, 0.093, 0.097, 0.057, 0.097, 0.068, 0.079, 0.097.

Table 4.7
Comparison of posterior inference of the area parameters under random deletion benchmarking with equal weights (EW), weights inversely proportional to sample sizes (IW) and weights directly proportional to sample sizes (DW)
Table summary
This table displays the results of Comparison of posterior inference of the area parameters under random deletion benchmarking with equal weights (EW), weights inversely proportional to sample sizes (IW) and weights directly proportional to sample sizes (DW). The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , EW, IW and DW (appearing as column headers).
Area	$n$	$\hat{θ}$	$s$	EW		IW		DW
Area	$n$	$\hat{θ}$	$s$	PM	PSD	PM	PSD	PM	PSD
1	5	135.575	6.031	133.516	5.431	133.508	5.518	133.436	5.404
2	7	101.980	7.101	102.903	5.793	103.042	5.737	103.049	5.809
3	24	117.655	7.309	120.671	6.237	120.529	6.176	120.634	6.247
4	23	76.997	5.881	81.170	5.597	81.167	5.571	81.111	5.567
5	21	126.917	5.629	127.652	5.036	127.669	5.079	127.541	5.055
6	9	113.132	8.061	112.805	6.707	112.762	6.704	113.074	6.716
7	5	137.236	6.739	133.908	6.007	133.965	5.968	133.798	6.027
8	20	124.839	4.034	124.703	3.757	124.829	3.734	124.719	3.757
9	16	118.306	7.544	116.451	6.650	116.300	6.707	116.502	6.640
10	9	156.503	4.368	153.222	4.216	153.238	4.198	153.204	4.220
11	23	109.546	4.877	110.221	4.694	110.190	4.697	110.208	4.690
12	9	116.314	7.232	117.780	5.997	117.802	6.010	117.726	5.989
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean and PSD is the posterior standard deviation. The benchmarking value is 1,435. Under a uniform prior, the posterior probabilities that the areas 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are deleted are respectively 0.090, 0.084, 0.095, 0.077, 0.066, 0.093, 0.097, 0.056, 0.098, 0.068, 0.079, 0.097. When the benchmarking is done using weights inversely proportional to sample sizes, these probabilities are 0.167, 0.127, 0.039, 0.037, 0.026, 0.105, 0.184, 0.030, 0.061, 0.078, 0.039, 0.107. When the benchmarking is done using weights directly proportional to sample sizes, these probabilities are 0.032, 0.048, 0.168, 0.124, 0.103, 0.061, 0.036, 0.083, 0.112, 0.044, 0.123, 0.066.

For comparison, different posterior densities are presented in Figures 4.1-4.4. In Figures 4.1 and 4.2, we present posterior densities of all twelve area parameters when each area, in turn, is deleted. We observe that the posterior densities are slightly different around the modes, but nothing remarkable. In Figures 4.3 and 4.4, we present posterior densities of all twelve area parameters under the FH model (unconstrained), random deletion benchmarking and deleting the last one. There are some differences among the three densities, but again these are not alarmingly different.

Finally, empirical results are presented for a simulation scenario with 99 areas, reflecting the 99 counties in Iowa. The data are generated as previously described, and the BFH model without benchmarking, with random deletion benchmarking, and with deleting the last one benchmarking is fit using 20,000 iterations of the Gibbs sampler. For each model fit, the first 10,000 iterations are used as a burn-in and every tenth iteration is kept thereafter. Fitting the BFH model takes 15 seconds, while the deletion benchmarking models take slightly less than three minutes each. For the parameters of the random deletion benchmarking model, namely the three regression coefficients $β$ and the variance $σ^{2},$ the p-values of the Geweke test are, respectively, 0.822, 0.128, 0.752 and 0.219, and the effective sample sizes are all 1,000 for the 1,000 selected iterations (i.e., an efficient Gibbs sampler). Note that the target is 12,162.93 and the sum of the PMs from the BFH model is 12,168.49, a difference of 5.56. In Figure 4.5, we present a plot of the coefficients of variation under random deletion benchmarking, deleting the last one benchmarking and the BFH model versus those of the direct estimates, by area. The differences among these models are not remarkable. Most of the points with direct CVs larger than about 0.04 fall below the $45^{\circ}$ straight line. However, some points (diamonds) under the BFH model are above the $45^{\circ}$ line; four of them are noticeable, possibly indicating too much shrinkage. We conclude that it is sensible to perform the random deletion benchmarking.

Table 4.8
A summary of the comparison of inference from the direct estimator, the Bayesian Fay-Herriot (BFH) model, deleting the last one (LO) and random deletion (RD) benchmarking when the last county is extreme
Table summary
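This table displays the results of A summary of the comparison of inference from the direct estimator, the Bayesian Fay-Herriot (BFH) model, deleting the last one (LO) and random deletion (RD) benchmarking when the last county is extreme. The information is grouped by Area (appearing as row headers), $n$ , $\hat{θ}$ , $s$ , BFH, LO and RD (appearing as column headers).
	Area	$n$	$\hat{θ}$	$s$	BFH		LO		RD
	Area	$n$	$\hat{θ}$	$s$	PM	PSD	PM	PSD	PM	PSD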
a. The last county size is 2.	1	5	135.575	6.031	134.116	5.607	133.772	5.473	133.510	5.409
	2	7	101.980	7.101	103.205	6.482	102.818	6.118	102.745	5.837
	3	24	117.655	7.309	121.110	6.730	120.911	6.577	120.666	6.260
	4	23	76.997	5.881	81.586	6.021	81.741	5.544	81.196	5.631
	5	21	126.917	5.629	127.901	5.252	127.552	5.264	127.619	5.041
	6	9	113.132	8.061	113.454	7.147	112.889	6.818	113.074	6.815
	7	5	137.236	6.739	133.938	6.339	133.479	5.968	133.947	5.994
	8	20	124.839	4.034	124.753	3.906	124.699	3.824	124.738	3.735
	9	16	118.306	7.544	116.199	6.806	115.329	6.327	116.065	6.785
	10	9	156.503	4.368	153.419	4.434	153.148	4.174	153.240	4.213
	11	23	109.546	4.877	110.512	4.645	110.473	4.696	110.324	4.686
	12	2	121.881	12.75	124.243	10.00	123.755	8.771	123.444	8.525
b. The last county size is 50.	1	5	135.575	6.031	133.984	5.618	133.745	5.461	133.452	5.385
	2	7	101.980	7.101	103.462	6.499	103.136	6.086	103.044	5.780
	3	24	117.655	7.309	121.006	6.716	120.832	6.536	120.698	6.232
	4	23	76.997	5.881	81.473	5.995	81.596	5.512	81.162	5.728
	5	21	126.917	5.629	127.832	5.248	127.519	5.238	127.661	5.001
	6	9	113.132	8.061	113.393	7.146	112.929	6.777	112.899	6.675
	7	5	137.236	6.739	133.659	6.380	133.351	5.947	133.851	5.941
	8	20	124.839	4.034	124.732	3.906	124.713	3.821	124.726	3.825
	9	16	118.306	7.544	116.480	6.785	115.766	6.269	116.319	6.601
	10	9	156.503	4.368	153.355	4.449	153.225	4.173	153.306	4.230
	11	23	109.546	4.877	110.347	4.637	110.378	4.692	110.155	4.689
	12	50	116.538	6.791	118.117	6.282	118.035	5.958	117.952	5.702
Note: $n$ is the area sample size, $\hat{θ}$ is the direct estimator and $s$ its standard error. PM is the posterior mean and PSD is the posterior standard deviation. When the sample size of the last county is 50 (2), the benchmarking value is 1,435 (1,441). The uniform prior is used in the random benchmarking.

Figure 4.1 Comparison of the posterior densities for $θ_{1}$ to $θ_{6}$ when each area is deleted one at a time (e.g., the first area is deleted in the first panel etc.)

Description for Figure 4.1

Figure presenting the posterior densities for $θ_{1}$ to $θ_{6}$ when each area is deleted at a time (e.g., the first area is deleted in the first panel etc.). There are six graphs, one for each theta, overlapping the density curves of the twelve areas. The posterior density is on the y-axis, ranging from 0.0 to 0.12. Theta is on the x-axis, ranging from 60 to 180. The posterior densities are similar in width, but the modes differ. It’s around theta = 130 for theta_1 and theta_5; around theta = 105 for theta_2; around theta = 120 for theta_3; around theta = 80 for theta_4 and around theta = 110 for theta_6.

Figure 4.2 Comparison of the posterior densities for $θ_{7}$ to $θ_{12}$ when the areas are deleted one at a time

Description for Figure 4.2

Figure presenting the posterior densities for $θ_{7}$ to $θ_{12}$ when the areas are deleted one at a time (e.g., the first area is deleted in the first panel etc.). There are six graphs, one for each $θ_{i}$, overlaying the density curves of the twelve areas. The posterior density is on the y-axis, ranging from 0.0 to 0.12. $θ$ is on the x-axis, ranging from 60 to 180. The posterior densities are similar, but there are slight differences. The mode is around 130 for $θ_{7}$; around 120 for $θ_{8}$, $θ_{9}$ and $θ_{12}$; around 150 for $θ_{10}$; and around 110 for $θ_{11}$. The densities are narrower and higher for $θ_{8}$, $θ_{10}$ and $θ_{11}$.

Figure 4.3 Comparison of the posterior densities for $θ_{1}$ to $θ_{6}$ under the Fay-Herriot model (-1), random deletion benchmarking (0) and area-12 deletion

Description for Figure 4.3

Figure presenting the posterior densities for $θ_{1}$ to $θ_{6}$ under the Fay-Herriot model, under random deletion benchmarking and under area-12 deletion. There are six graphs, one for each $θ_{i}$, overlaying the density curves of the three methods. The posterior density is on the y-axis, ranging from 0.0 to 0.10. $θ$ is on the x-axis, ranging from 60 to 180. The posterior densities are similar in width, but the modes differ. The mode is around 130 for $θ_{1}$ and $θ_{5}$; around 105 for $θ_{2}$; around 120 for $θ_{3}$; around 80 for $θ_{4}$; and around 110 for $θ_{6}$.

Figure 4.4 Comparison of the posterior densities for $θ_{7}$ to $θ_{12}$ under the Fay-Herriot model (-1), random deletion benchmarking (0) and area-12 deletion

Description for Figure 4.4

Figure presenting the posterior densities for $θ_{7}$ to $θ_{12}$ under the Fay-Herriot model, under random deletion benchmarking and under area-12 deletion. There are six graphs, one for each $θ_{i}$, overlaying the density curves of the three methods. The posterior density is on the y-axis, ranging from 0.0 to 0.10. $θ$ is on the x-axis, ranging from 60 to 180. The posterior densities are similar, but there are slight differences. The mode is around 130 for $θ_{7}$; around 120 for $θ_{8}$, $θ_{9}$ and $θ_{12}$; around 150 for $θ_{10}$; and around 110 for $θ_{11}$. The densities are narrower and higher for $θ_{8}$, $θ_{10}$ and $θ_{11}$.

Figure 4.5 Plot of the coefficients of variation under the random deletion benchmarking, deleting the last one and the Bayesian Fay-Herriot model for 99 areas

Description for Figure 4.5

Figure presenting a scatter plot of the coefficients of variation (CVs) under random deletion benchmarking, deleting the last one and the Bayesian Fay-Herriot (BFH) model for 99 areas. The posterior CV is on the y-axis, ranging from 0.0 to 0.10. The direct CV is on the x-axis, ranging from 0.0 to 0.10. A 45° straight line is added to the graph. The differences among the three methods are not remarkable. Most of the points with direct CVs larger than about 0.04 fall below the 45° line. However, some points under the BFH model lie above the 45° line; four of them are noticeable, possibly indicating too much shrinkage.
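
As a rough illustration of how a point is placed relative to the 45° line, the sketch below computes a direct CV ($s / \hat{θ}$) and a posterior CV (PSD / PM) for three counties. It does not reproduce Figure 4.5, which is based on 99 areas: the values are taken from panel b of the 12-county table, using the first PM and PSD columns, and the function and variable names are hypothetical.

```python
def cv(estimate, standard_deviation):
    """Coefficient of variation: standard deviation divided by the estimate."""
    return standard_deviation / estimate

# county -> (direct estimate, its standard error, PM, PSD),
# transcribed from panel b of the table (first PM/PSD pair).
rows = {
    1: (135.575, 6.031, 133.984, 5.618),
    4: (76.997, 5.881, 81.473, 5.995),
    10: (156.503, 4.368, 153.355, 4.449),
}

# A point is above the 45-degree line when its posterior CV exceeds its direct CV.
positions = {}
for county, (theta_hat, s, pm, psd) in rows.items():
    direct_cv = cv(theta_hat, s)
    posterior_cv = cv(pm, psd)
    positions[county] = "above" if posterior_cv > direct_cv else "below"
    print(f"county {county}: direct CV = {direct_cv:.4f}, "
          f"posterior CV = {posterior_cv:.4f}, {positions[county]} the line")
```

In these three rows, the two counties with direct CVs above 0.04 fall below the line, while county 10, whose direct CV is below 0.03, falls slightly above it.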

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2019-07-04
