
6. A simulation study

Takis Merkouris

We conducted a simulation to study the relative performance of the various composite estimators for the nested version of the basic design (c). Values of correlated scalar variables $x$ and $y$ were generated from a bivariate log-normal distribution with means $(μ_{x}, μ_{y})$ and variances $(σ_{x}^{2}, σ_{y}^{2}).$ With fixed $μ_{x} = 3$ and $μ_{y} = 5,$ we considered the four combinations of variances $(σ_{x}^{2}, σ_{y}^{2})$ in which each variance equals 5 or 10, and three values of the correlation $ρ (x, y)$ (0.5, 0.7, 0.9). The variances $σ_{x}^{2} = 5$ and $σ_{x}^{2} = 10$ imply skewness 2.65 and 4.33, respectively, while the variances $σ_{y}^{2} = 5$ and $σ_{y}^{2} = 10$ imply skewness 1.43 and 2.15.

For each of these twelve settings, a population of size $N = 1,000,000$ was created. From each of the twelve populations a simple random sample $S$ of size $n = 5,000$ was drawn without replacement and split into three simple random subsamples $(S_{1}, S_{2}, S_{3})$ under two different allocations, namely $(n_{1} = 2,000, n_{2} = 2,000, n_{3} = 1,000)$ and $(n_{1} = 1,500, n_{2} = 1,500, n_{3} = 2,000),$ the second allocation giving larger combined samples $S_{1} \cup S_{3}$ and $S_{2} \cup S_{3}.$ Thus, a total of 24 simulation settings were created.

For each setting, we computed the HT estimators of the totals $t_{x}$ and $t_{y}$ using the full sample $S,$ as well as the HT estimator of $t_{x}$ using $S_{1}$ and $S_{3}$ and the HT estimator of $t_{y}$ using $S_{2}$ and $S_{3}.$ For the HT estimators based on two subsamples, we employed the simple method for combining two subsamples (Gonzales and Eltinge 2008), a weighting adjustment based on the probability that a population unit is selected in $S_{1}$ or in $S_{3}$ (for $t_{x}$), and in $S_{2}$ or in $S_{3}$ (for $t_{y}$). In addition, for both $t_{x}$ and $t_{y}$ we computed the CGR and COR estimators. The sampling in each setting was repeated 10,000 times.
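The skewness values quoted for the four marginal distributions can be checked from the standard moment formulas of the log-normal distribution. The short Python sketch below is an illustration only; the function names are hypothetical and are not taken from the study. It converts a mean and variance on the original scale into the parameters of the underlying normal distribution, and computes the skewness from the squared coefficient of variation.

```python
import math

def lognormal_params(mean, var):
    """Return (m, s2): mean and variance of log(X) for a log-normal X
    with the given mean and variance on the original scale."""
    s2 = math.log(1.0 + var / mean**2)
    m = math.log(mean) - s2 / 2.0
    return m, s2

def lognormal_skewness(mean, var):
    """Skewness of a log-normal variable, (exp(s2) + 2) * sqrt(exp(s2) - 1),
    written in terms of cv2 = var / mean**2 = exp(s2) - 1."""
    cv2 = var / mean**2
    return (cv2 + 3.0) * math.sqrt(cv2)

# The four marginal settings of the study: (mean, variance)
for mean, var in [(3, 5), (3, 10), (5, 5), (5, 10)]:
    print(mean, var, round(lognormal_skewness(mean, var), 2))
```

Rounded to two decimals, the four settings give 2.65, 4.33, 1.43 and 2.15, in agreement with the values stated above.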

The simulated bias (in percent) of all estimators was smaller than 0.05%, except in two settings involving $σ_{x}^{2} = 10,$ with associated population skewness of 4.33. There, the largest observed values, 0.14% and 0.17%, correspond to CGR and COR for $t_{x},$ respectively, under the sample allocation (2,000, 2,000, 1,000), dropping to 0.10% and 0.13% under the more favorable allocation (1,500, 1,500, 2,000). Since the bias is negligible, the relative efficiencies of the estimators are evaluated using their simulated design variances.

Table 6.1 shows the efficiency of the composite estimators CGR and COR relative to the HT estimators that use $S_{1} \cup S_{3}$ and $S_{2} \cup S_{3}.$ The measure of relative efficiency is the relative difference of variances, expressed in percent: [V(CGR) - V(HT)]/V(HT) and [V(COR) - V(HT)]/V(HT). A negative value of this measure indicates an efficiency gain of the composite estimator over the HT estimator. Although not shown in Table 6.1, the simulated loss of efficiency of the HT estimators of both $t_{x}$ and $t_{y}$ due to not using the full sample $S$ is very close to the nominal loss for SRS, that is, 66.8% for the allocation (2,000, 2,000, 1,000), and 43.1% for the allocation (1,500, 1,500, 2,000).
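The comparison between an HT estimator based on two subsamples and the full-sample HT estimator can be mimicked on a small scale. The Python sketch below is illustrative only: the population size, sample sizes, log-normal parameters, number of replicates and function names are all assumptions chosen to keep the run short, and are not the settings of the study. It also assumes that $S_{1} \cup S_{3}$ can be treated as a simple random sample of size $n_{1} + n_{3}$ with weight $N/(n_{1} + n_{3}),$ which is one reading of the combining method described earlier.

```python
import random
import statistics

def nominal_loss(N, n_sub, n_full):
    """Percent increase in SRS variance when n_sub instead of n_full
    units are used, with finite population correction."""
    v = lambda m: 1.0 / m - 1.0 / N
    return 100.0 * (v(n_sub) - v(n_full)) / v(n_full)

def simulated_loss(N=20_000, alloc=(200, 200, 100), reps=4000, seed=1):
    """Percent relative difference of simulated variances between the
    HT estimator on S1 u S3 and the HT estimator on the full sample S.
    All default values are hypothetical, scaled-down settings."""
    rng = random.Random(seed)
    # Hypothetical skewed population (log-scale mean 1.0, sd 0.5)
    pop = [rng.lognormvariate(1.0, 0.5) for _ in range(N)]
    n1, n2, n3 = alloc
    n = n1 + n2 + n3
    ht_full, ht_sub = [], []
    for _ in range(reps):
        # random.sample returns units in selection order, so slices of
        # the result are themselves simple random subsamples.
        s = rng.sample(pop, n)
        s1, s3 = s[:n1], s[n1 + n2:]
        ht_full.append(N / n * sum(s))
        ht_sub.append(N / (n1 + n3) * (sum(s1) + sum(s3)))
    v_full = statistics.variance(ht_full)
    v_sub = statistics.variance(ht_sub)
    return 100.0 * (v_sub - v_full) / v_full
```

Calling `simulated_loss()` and `nominal_loss(20_000, 300, 500)` allows the simulated and nominal losses to be compared for these scaled-down settings. With only a few thousand replicates the simulated value is subject to noticeable Monte Carlo error.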

Table 6.1
Relative differences (in percent) of variances of CGR and COR to HT for x and y, based on 10,000 simulated samples with two different sample allocations
(n1, n2, n3)	(2,000; 2,000; 1,000)				(1,500; 1,500; 2,000)
	$x$		$y$		$x$		$y$
	CGR	COR	CGR	COR	CGR	COR	CGR	COR
$σ_{x}^{2} = 5,$ $σ_{y}^{2} = 5$
$ρ = 0.5$	-2.24	-6.86	26.39	-6.23	-5.19	-6.29	12.59	-6.52
$ρ = 0.7$	-11.90	-14.75	10.21	-13.96	-12.78	-13.24	0.25	-13.13
$ρ = 0.9$	-24.89	-28.57	-12.49	-28.10	-21.55	-23.37	-14.55	-23.03
$σ_{x}^{2} = 5,$ $σ_{y}^{2} = 10$
$ρ = 0.5$	-0.27	-6.75	6.50	-6.26	-3.94	-6.60	0.50	-6.44
$ρ = 0.7$	-11.47	-14.56	-6.29	-14.04	-12.87	-13.51	-9.51	-13.10
$ρ = 0.9$	-28.14	-28.42	-25.74	-28.23	-23.70	-23.54	-22.07	-23.09
$σ_{x}^{2} = 10,$ $σ_{y}^{2} = 5$
$ρ = 0.5$	-4.57	-6.51	28.64	-6.17	-5.90	-5.98	17.57	-6.44
$ρ = 0.7$	-11.29	-14.37	16.08	-13.92	-11.66	-12.90	6.69	-13.00
$ρ = 0.9$	-20.32	-28.09	-2.46	-28.19	-18.46	-22.97	-6.97	-22.91
$σ_{x}^{2} = 10,$ $σ_{y}^{2} = 10$
$ρ = 0.5$	-4.79	-6.49	8.54	-6.13	-6.06	-6.22	3.41	-6.34
$ρ = 0.7$	-13.27	-14.28	-2.57	-13.95	-13.27	-13.15	-6.00	-12.93
$ρ = 0.9$	-26.01	-28.06	-20.37	-28.21	-22.18	-23.17	-18.48	-22.89

For the variable $x,$ using the CGR estimator at the low correlation $ρ = 0.5$ with allocation (2,000, 2,000, 1,000) leads to an efficiency gain that ranges from 0.27% to 4.79% over the four variance settings; this gain reflects the amount of lost information recovered by the CGR estimator. Substantial gain is achieved at $ρ = 0.7,$ ranging from 11.29% to 13.27%, and more so at $ρ = 0.9,$ ranging from 20.32% to 28.14%. With sample allocation (1,500, 1,500, 2,000) the CGR estimator performs better at $ρ = 0.5$ and $ρ = 0.7,$ but not at $ρ = 0.9.$ Additional gain is achieved by the COR estimator, which is more efficient than the CGR estimator in all but two settings (where the estimators are equally efficient; see column 7). The efficiency of the COR estimator relative to the HT estimator is close to the nominal efficiency for SRS, which is 6.25%, 13.92% and 28.12% at $ρ = 0.5,$ $ρ = 0.7$ and $ρ = 0.9,$ respectively, for allocation (2,000, 2,000, 1,000), and 6.417%, 13.186% and 23.30% for allocation (1,500, 1,500, 2,000); see quantity E in Section 2, third last paragraph. As expected, the CGR estimator competes better with the COR estimator as the correlation and the sample size increase.
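The nominal SRS efficiencies quoted above can be reproduced with a short calculation. The Python sketch below does not use the expression for quantity E from Section 2, which is not restated here. Instead it assumes that the nominal gain is the variance reduction of the best linear unbiased estimator of the mean of $x,$ built from the four subsample means (of $x$ in $S_{1}$ and $S_{3},$ and of $y$ in $S_{2}$ and $S_{3}$), relative to the mean of $x$ over $S_{1} \cup S_{3},$ under SRS with the finite population correction ignored and the three subsamples treated as independent. The function name is hypothetical.

```python
def nominal_gain(n1, n2, n3, rho):
    """Percent variance reduction of the best linear unbiased estimator
    of the x-mean relative to the x-mean over S1 u S3 (SRS, no fpc).
    Variances of x and y cancel, so unit variances are used."""
    a = n3 / (1.0 - rho**2)
    # Information matrix for (mu_x, mu_y): S1 informs mu_x only,
    # S2 informs mu_y only, S3 informs both through the correlation.
    i_xx = n1 + a
    i_yy = n2 + a
    i_xy = -rho * a
    var_blue = i_yy / (i_xx * i_yy - i_xy**2)
    var_ht = 1.0 / (n1 + n3)
    return 100.0 * (var_ht - var_blue) / var_ht

for alloc in [(2000, 2000, 1000), (1500, 1500, 2000)]:
    for rho in (0.5, 0.7, 0.9):
        print(alloc, rho, round(nominal_gain(*alloc, rho), 3))
```

Under these assumptions the computed gains agree, to within rounding in the last digit, with the six nominal values given in the text. Because the gain depends on the variances only through $ρ,$ it is the same for $x$ and $y$ whenever $n_{1} = n_{2}.$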

For the variable $y,$ the CGR estimator is inferior to the HT estimator at correlation level $ρ = 0.5$ and in half of the simulated settings at $ρ = 0.7;$ see the positive values in columns 4 and 8. This inefficiency of the CGR estimator ranges from 6.50% to 28.64% (both at $ρ = 0.5$) in the sample allocation (2,000, 2,000, 1,000), and reduces to a range from 0.25% (at $ρ = 0.7$) to 17.57% (at $ρ = 0.5$) in the sample allocation (1,500, 1,500, 2,000). This is explained by the larger skewness of $x$ (the $x$ variable being used as an auxiliary to $y$ in the regression procedure); the lower levels of inefficiency are observed at $σ_{y}^{2} = 10,$ when the differential in skewness between $x$ and $y$ is the smallest. On the other hand, at correlation $ρ = 0.9$ and with allocation (2,000, 2,000, 1,000), the efficiency gain of the CGR estimator relative to the HT estimator ranges from 2.46% (when the skewness differential is the largest) to 25.74% (when the skewness differential is the smallest), with similar efficiency levels displayed for allocation (1,500, 1,500, 2,000). The COR estimator is more efficient than the CGR estimator in all settings, the relative efficiency being close to the nominal one for SRS (the same efficiency as with $x$). For $y$ too, the CGR estimator competes better with the COR estimator as the correlation and the sample size increase.

This limited empirical study, which essentially simulates the SRS version of Theorem 1 $(a'),$ confirms the theory on the efficiency of the optimal estimator COR, even for modest sample sizes, and shows the usefulness of the two composite estimators CGR and COR in partially recovering the information lost by splitting the full questionnaire. It also shows that the practical CGR estimator is not always a good substitute for the COR estimator for small samples and low correlation between $x$ and $y.$


Date modified: 2015-11-27
