Criteria for choosing between calibration weighting and survey weighting
Section 4. Simulation study
To evaluate criterion (3.4), and thus determine whether to use calibration weights or sampling weights, we conducted a series of simulations using data observed for a population of 5,800 cottage-industry units. We considered six calibration variables, from which several variables of interest were generated using linear regression models. The strength of the link between each variable of interest and the calibration variables was controlled through the choice of residual variance in the regression model. Furthermore, to study the impact of heteroskedastic model residuals on the results obtained for the criterion, we also considered the case where the variables of interest are generated using models with heteroskedastic residuals.
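The way a residual variance fixes the strength of the link can be sketched as follows. This is a minimal illustration, not the authors' generating models: the population is synthetic, the coefficients are arbitrary, the function name `generate_variable` is hypothetical, and the form of heteroskedasticity (residual variance proportional to the first calibration variable) is an assumption made only for the example.

```python
import numpy as np

def generate_variable(X, beta, r2, rng, hetero=False):
    """Generate y = X @ beta + e, with residual variance chosen to hit a target R^2."""
    signal = X @ beta
    # R^2 = var(signal) / (var(signal) + sigma2)  =>  sigma2 = var(signal) * (1 - r2) / r2
    sigma2 = signal.var() * (1.0 - r2) / r2
    if hetero:
        # Assumed form: variance proportional to the first calibration variable,
        # rescaled so that the average residual variance is still sigma2.
        scale = X[:, 0] / X[:, 0].mean()
    else:
        scale = np.ones(len(X))
    e = rng.normal(0.0, np.sqrt(sigma2 * scale))
    return signal + e

# Synthetic stand-in for the population: 5,800 units, six positive calibration variables.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=0.5, size=(5800, 6))
beta = np.ones(6)

for target in (0.01, 0.10, 0.20, 0.50, 0.75, 0.98):
    y = generate_variable(X, beta, target, rng, hetero=True)
    realized = np.corrcoef(X @ beta, y)[0, 1] ** 2
    print(f"target R2 = {target:.2f}, realized R2 = {realized:.3f}")
```

The realized value differs slightly from the target because the residuals are random draws; with 5,800 units the gap is small.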
For the purposes of these simulations, we selected 10,000 samples using a simple random sampling design (SRSD), with three sample sizes: 100, 200 and 400 cottage-industry units, to study the impact of the sample size on the results obtained. Across the 10,000 samples selected, we calculated the following indicators:
- the AMSE for the calibration estimator, given by expression (2.5), in which the two weight-dependent terms are determined, respectively, by the mean and the variance of the weights over all of the selected samples containing the unit concerned
- the AMSE (2.7) for the HT estimator and its approximation (2.10); the AMSE (2.7) could be calculated in these simulations because the samples were selected using SRSD
- Weff: the theoretical value of the Weff criterion, calculated using (3.1) and defined as the ratio of the AMSE for the calibration estimator to the AMSE for the HT estimator
- the simulation mean of the estimator of the AMSE for the calibration estimator
- the simulation mean of the estimator of the AMSE for the HT estimator
- the simulation mean of the estimator (3.4) of Weff
- the simulation MSE of the estimator (3.4) of Weff
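The overall simulation loop can be sketched as below. This is a rough Monte Carlo analogue under stated assumptions, not the paper's computation: the population is synthetic, the calibration uses the standard linear (GREG-type) distance, and the ratio returned is an empirical ratio of simulation MSEs rather than the Weff of (3.1) or the AMSE expressions (2.5), (2.7) and (2.10). The names `calibration_weights`, `simulate` and `make_y` are hypothetical, and the number of replications is reduced from 10,000 to keep the example fast.

```python
import numpy as np

def calibration_weights(Xs, d, tx):
    """Linear calibration: w_k = d_k * (1 + x_k' lam), with sum_s w_k x_k = tx."""
    T = (Xs * d[:, None]).T @ Xs              # sum over the sample of d_k x_k x_k'
    lam = np.linalg.solve(T, tx - d @ Xs)     # solves the calibration equations
    return d * (1.0 + Xs @ lam)

def simulate(X, y, n, n_rep, rng):
    """Empirical MSE of the calibration and HT total estimators under SRS."""
    N = len(y)
    tx, ty = X.sum(axis=0), y.sum()
    d = np.full(n, N / n)                     # SRS design weights
    ht = np.empty(n_rep)
    cal = np.empty(n_rep)
    for r in range(n_rep):
        s = rng.choice(N, size=n, replace=False)
        ht[r] = d @ y[s]
        cal[r] = calibration_weights(X[s], d, tx) @ y[s]
    mse_cal = np.mean((cal - ty) ** 2)
    mse_ht = np.mean((ht - ty) ** 2)
    return mse_cal, mse_ht, mse_cal / mse_ht

# Synthetic population (assumption): intercept plus two auxiliary variables.
rng = np.random.default_rng(2024)
N = 5800
aux = rng.gamma(shape=4.0, scale=1.0, size=(N, 2))
X = np.column_stack([np.ones(N), aux])
signal = X @ np.array([10.0, 2.0, 3.0])

def make_y(r2):
    # Residual variance chosen so that the model R^2 is approximately r2.
    sigma2 = signal.var() * (1.0 - r2) / r2
    return signal + rng.normal(0.0, np.sqrt(sigma2), size=N)

y_weak, y_strong = make_y(0.01), make_y(0.75)
ratio_weak = simulate(X, y_weak, n=100, n_rep=2000, rng=rng)[2]
ratio_strong = simulate(X, y_strong, n=100, n_rep=2000, rng=rng)[2]
print(f"MSE ratio (calibration / HT), weak link:   {ratio_weak:.2f}")
print(f"MSE ratio (calibration / HT), strong link: {ratio_strong:.2f}")
```

A ratio well below 1 indicates that calibration reduces the error of the estimated total, while a ratio near or above 1 indicates that the calibration weights bring no gain over the sampling weights.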
The simulation results for heteroskedastic regression models are presented in Table 4.1 below, while the results for homoskedastic models are given in Table A.1 in the appendix.
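n | Indicator | Y1 (R² = 0.01) | Y2 (R² = 0.10) | Y3 (R² = 0.20) | Y4 (R² = 0.50) | Y5 (R² = 0.75) | Y6 (R² = 0.98)
---|---|---|---|---|---|---|---
100 | AMSE, calibration estimator (10⁷) | 12,301.13 | 9,334.81 | 1,860.23 | 173.61 | 59.47 | 3.07
100 | AMSE, HT estimator (10⁷) | 11,285.46 | 8,643.37 | 1,841.84 | 323.46 | 212.69 | 160.35
100 | AMSE, HT estimator (10⁷) | 11,285.44 | 8,643.34 | 1,841.81 | 323.43 | 212.66 | 160.32
100 | Weff | 1.09 | 1.08 | 1.01 | 0.54 | 0.28 | 0.02
100 | Simulation mean, estimated AMSE, calibration estimator (10⁷) | 12,463.22 | 9,484.87 | 1,984.51 | 180.37 | 62.07 | 3.21
100 | Simulation mean, estimated AMSE, HT estimator (10⁷) | 11,856.45 | 9,068.99 | 1,929.87 | 330.59 | 215.13 | 160.07
100 | Simulation mean, estimated Weff | 1.08 | 1.07 | 1.00 | 0.55 | 0.30 | 0.02
100 | Simulation MSE | 0.030 | 0.034 | 0.030 | 0.02 | 0.008 | 0.00005
200 | AMSE, calibration estimator (10⁷) | 5,931.78 | 4,500.60 | 905.42 | 81.86 | 27.99 | 1.41
200 | AMSE, HT estimator (10⁷) | 5,543.74 | 4,245.87 | 904.76 | 158.89 | 104.48 | 78.77
200 | AMSE, HT estimator (10⁷) | 5,543.72 | 4,245.85 | 904.75 | 158.88 | 104.46 | 78.75
200 | Weff | 1.07 | 1.06 | 1.00 | 0.52 | 0.27 | 0.02
200 | Simulation mean, estimated AMSE, calibration estimator (10⁷) | 5,770.29 | 4,382.31 | 969.57 | 83.81 | 28.68 | 1.48
200 | Simulation mean, estimated AMSE, HT estimator (10⁷) | 5,673.08 | 4,341.19 | 924.64 | 160.71 | 105.06 | 78.71
200 | Simulation mean, estimated Weff | 1.05 | 1.05 | 1.01 | 0.53 | 0.28 | 0.02
200 | Simulation MSE | 0.008 | 0.008 | 0.007 | 0.006 | 0.002 | 0.00005
400 | AMSE, calibration estimator (10⁷) | 3,847.61 | 2,919.12 | 589.97 | 53.05 | 18.13 | 0.94
400 | AMSE, HT estimator (10⁷) | 3,629.83 | 2,780.03 | 592.40 | 104.04 | 68.41 | 51.57
400 | AMSE, HT estimator (10⁷) | 3,629.82 | 2,780.02 | 592.39 | 104.03 | 68.40 | 51.56
400 | Weff | 1.06 | 1.05 | 0.99 | 0.51 | 0.27 | 0.02
400 | Simulation mean, estimated AMSE, calibration estimator (10⁷) | 3,718.79 | 2,889.81 | 594.01 | 53.89 | 18.44 | 0.95
400 | Simulation mean, estimated AMSE, HT estimator (10⁷) | 3,687.44 | 2,821.34 | 602.39 | 104.83 | 68.68 | 51.60
400 | Simulation mean, estimated Weff | 1.04 | 1.04 | 0.98 | 0.52 | 0.27 | 0.02
400 | Simulation MSE | 0.004 | 0.005 | 0.004 | 0.003 | 0.001 | 0.00001

Note: Y1 to Y6 are the variables of interest. For each sample size, the two rows labelled "AMSE, HT estimator" give the AMSE (2.7) for the HT estimator and its approximation (2.10).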
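As a quick consistency check on the table, the reported Weff values can be recomputed from the AMSE rows. The sketch below uses only the n = 100 figures shown above: dividing the first row of the block (calibration estimator) by either of the next two rows (HT estimator) reproduces the fourth row to two decimals. The variable names are illustrative only.

```python
# n = 100 block of the table, values in units of 10^7.
amse_cal = [12301.13, 9334.81, 1860.23, 173.61, 59.47, 3.07]
amse_ht_rows = [
    [11285.46, 8643.37, 1841.84, 323.46, 212.69, 160.35],
    [11285.44, 8643.34, 1841.81, 323.43, 212.66, 160.32],
]
weff_reported = [1.09, 1.08, 1.01, 0.54, 0.28, 0.02]

for amse_ht in amse_ht_rows:
    weff = [round(c / h, 2) for c, h in zip(amse_cal, amse_ht)]
    print(weff)

# Variables for which the calibration estimator has the larger AMSE (Weff > 1).
names = ["Y1", "Y2", "Y3", "Y4", "Y5", "Y6"]
calibration_not_advised = [v for v, w in zip(names, weff_reported) if w > 1]
print(calibration_not_advised)
```

Values of Weff above 1 occur only for the variables generated with the lowest R², which is the pattern the criterion is meant to flag.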
The simulation results show that the proposed Weff criterion, which measures the impact of using calibration weights, identifies the situations in which calibration weighting should not be used, i.e., when the variable of interest is weakly correlated with the calibration variables. Furthermore, the estimator (3.4) of the Weff criterion proved effective, performing equally well regardless of the strength of the link between the variable of interest and the calibration variables. Heteroskedastic residuals in the regression models linking the variable of interest to the calibration variables had little impact on the performance of either the Weff criterion or its estimator. Using approximation (2.8) for the design variance also had no noticeable effect: the deviation between the AMSE for the HT estimator and its approximation (2.10) was negligible in the results for the Weff criterion. This was to be expected, since the design considered was an SRSD.