Small area benchmarked estimation under the basic unit level model when the sampling rates are non‑negligible
Section 4. Simulation study


We report in this section the results of a design-based simulation study, since this type of study is in line with the measures computed by National Statistical Offices. In a design-based study, a fixed finite population is first generated from an assumed model, and a sample is then drawn from that fixed finite population in each simulation run. The aim of the simulation study is to evaluate the properties of the benchmarked estimators described in Section 3 in terms of design bias and design mean squared error. We considered two scenarios: Scenario 1 corresponds to the case of correct modeling, whereas Scenario 2 corresponds to the case of incorrect modeling. Model diagnostics, such as those given in Rao and Molina (2015, pages 114-118), can be used to test whether the models are correct or not. Such model diagnostics include residual analysis to detect departures from the assumed model, selection of auxiliary variables for the model, and case-deletion diagnostics to detect influential observations.

4.1 Simulation set-up for generating the finite populations

For each scenario, we considered five populations. Each population had $m =$ 30 small areas, with $N_{i} =$ 100 population units within each small area. The populations corresponding to Scenario 1 were created using the following model

$y_{i j} = β_{0} + x_{i j} β_{1} + v_{i} + e_{i j}, \quad i = 1, \dots, m; \; j = 1, \dots, N_{i}, \qquad (4.1)$

where $β_{0} = 10$ and $β_{1} = 5.$ For generating the populations in Scenario 2, we split the 30 small areas into three equal groups of small areas, denoted as $G_{l},$ for $l = 1, 2, 3.$ The first group $G_{1}$ contains areas $i = 1, \dots, 10,$ the second group $G_{2}$ contains areas $i = 11, \dots, 20,$ and the third group $G_{3}$ contains areas $i = 21, \dots, 30.$ The model within a given group is given by

$y_{i j} = β_{0, l} + x_{i j} β_{1, l} + v_{i} + e_{i j}, \quad i \in G_{l}; \; j = 1, \dots, N_{i}, \qquad (4.2)$

where $(β_{0, 1} = 10, β_{1, 1} = 1)$ for areas $i \in G_{1},$ $(β_{0, 2} = 20, β_{1, 2} = 5)$ for areas $i \in G_{2},$ and $(β_{0, 3} = 30, β_{1, 3} = 10)$ for areas $i \in G_{3}.$ Both (4.1) and (4.2) use the auxiliary vector $x_{i j} = {(1, x_{i j})}^{T},$ that is, an intercept and a single scalar covariate. The covariate values $x_{i j}, j = 1, \dots, N_{i},$ were generated from an exponential distribution with mean equal to 5 and variance equal to 25.

The random components in (4.1) and (4.2) were generated from the normal distributions $v_{i} \sim N (0, σ_{v}^{2})$ and $e_{i j} \sim N (0, σ_{e}^{2}).$ The five populations corresponding to Scenario 1, denoted as A1, B1, C1, D1 and E1, were generated based on (4.1) and the following pairs of variance parameters: i. $(σ_{A v}^{2} = 0.2, σ_{e}^{2} = 20)$ for population A1; ii. $(σ_{B v}^{2} = 1, σ_{e}^{2} = 20)$ for population B1; iii. $(σ_{C v}^{2} = 2, σ_{e}^{2} = 20)$ for population C1; iv. $(σ_{D v}^{2} = 4, σ_{e}^{2} = 20)$ for population D1; and v. $(σ_{E v}^{2} = 20, σ_{e}^{2} = 20)$ for population E1. Note that, for populations A1 through E1, the value of $σ_{e}^{2}$ is kept fixed, whereas the value of $σ_{v}^{2}$ varies. The values of $σ_{v}^{2}$ were chosen so that the variance ratio $δ = σ_{v}^{2} / σ_{e}^{2}$ equals 0.01, 0.05, 0.1, 0.2 and 1, respectively. The five populations in Scenario 2, denoted as A2, B2, C2, D2 and E2, were generated based on (4.2) with the same pairs of variance parameters as for Scenario 1.
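The population set-up above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `betas`, `generate_population`, `SIGMA2_V` and the `seed` argument are hypothetical, areas are indexed from 0, and the random number generator is Python's standard one rather than whatever was used in the study.

```python
import math
import random

M_AREAS = 30       # m, number of small areas
N_UNITS = 100      # N_i, population units per area
SIGMA2_E = 20.0    # sigma_e^2, fixed across populations
SIGMA2_V = {"A": 0.2, "B": 1.0, "C": 2.0, "D": 4.0, "E": 20.0}  # sigma_v^2


def betas(scenario, area):
    """Return (beta_0, beta_1) for a 0-based area index."""
    if scenario == 1:
        return (10.0, 5.0)                      # model (4.1)
    groups = [(10.0, 1.0), (20.0, 5.0), (30.0, 10.0)]
    return groups[area // 10]                   # model (4.2): G_1, G_2, G_3


def generate_population(label, scenario, seed=0):
    """Generate one finite population as a list of areas of (x, y) pairs."""
    rng = random.Random(seed)
    sd_v = math.sqrt(SIGMA2_V[label])
    sd_e = math.sqrt(SIGMA2_E)
    population = []
    for i in range(M_AREAS):
        b0, b1 = betas(scenario, i)
        v_i = rng.gauss(0.0, sd_v)              # area effect, shared within area i
        area = []
        for _ in range(N_UNITS):
            x = rng.expovariate(1.0 / 5.0)      # exponential: mean 5, variance 25
            y = b0 + b1 * x + v_i + rng.gauss(0.0, sd_e)
            area.append((x, y))
        population.append(area)
    return population
```

For instance, `generate_population("C", 2)` would play the role of population C2, and `SIGMA2_V[k] / SIGMA2_E` reproduces the variance ratios 0.01, 0.05, 0.1, 0.2 and 1.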

A stratified sampling design was used, drawing independent probability proportional to size (pps) samples of size $n_{i}$ within the $i^{th}$ small area. The small area sample sizes were set to $n_{i} = 3$ for $i = 1, \dots, m.$ The selection probabilities were computed as $p_{i j} = b_{i j} / \sum_{j = 1}^{N_{i}} b_{i j},$ where the size measures are $b_{i j} = x_{i j}.$ We used Conditional Poisson Sampling (CPS) to select the pps samples within each small area (see Tillé (2006), Chapter 5). The basic design weights are given by $d_{i j} = 1 / (n_{i} p_{i j}).$
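As a small illustration of the selection probabilities and basic design weights within one area, the following sketch computes $p_{i j}$ and $d_{i j}$ from a list of size measures. The function names are hypothetical, and the sketch deliberately does not implement the Conditional Poisson Sampling step itself.

```python
def selection_probabilities(sizes):
    """p_ij = b_ij / sum_j b_ij for the size measures of one area."""
    total = sum(sizes)
    return [b / total for b in sizes]


def design_weights(sizes, n_i=3):
    """Basic design weights d_ij = 1 / (n_i * p_ij) for one area."""
    return [1.0 / (n_i * p) for p in selection_probabilities(sizes)]
```

With size measures `[1, 2, 3, 4]` and `n_i = 3`, the first unit has $p_{i j} = 0.1$ and therefore $d_{i j} = 1 / 0.3$: units with small size measures receive large weights.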

In Scenario 1, we fitted the nested regression model (4.1) and its augmented version to pps sampling data selected from one of the five populations generated with model (4.1). This scenario represents correct modeling as the model fitted and the model used to generate the finite population coincide. In Scenario 2, we fitted the nested regression model (4.1) and its augmented version to pps sampling data selected from one of the five populations generated with model (4.2). This scenario represents incorrect modeling as the model fitted and the model used to generate the finite population do not coincide.

We selected $G =$ 30,000 stratified pps samples from each of the ten finite populations: populations A1 to E1 corresponding to Scenario 1, and populations A2 to E2 corresponding to Scenario 2. For $g = 1, \dots, G,$ let $({\hat{σ}}_{v}^{2 RE(g)}, {\hat{σ}}_{e}^{2 RE(g)})$ and $({\hat{σ}}_{v}^{2 reRE(g)}, {\hat{σ}}_{e}^{2 reRE(g)})$ denote, respectively, the estimates of $(σ_{v}^{2}, σ_{e}^{2})$ given by the truncated REML method and by its re-parameterized version for the $g^{th}$ sample. The starting values in equation (B.2) were $α_{1}^{(0)} = \log (0.1 + {\hat{σ}}_{v}^{2 RE(g)})$ and $α_{2}^{(0)} = \log ({\hat{σ}}_{e}^{2 RE(g)}).$ Equation (B.2) reached convergence in fewer than 15 iterations for all the populations and both scenarios. Based on the $G$ simulated samples selected in each of the five populations corresponding to Scenario 1, we computed the Monte Carlo value of the probability of obtaining a zero truncated REML estimate for $σ_{v}^{2}$ as

$P_{MC} ({\hat{σ}}_{v}^{2 RE} = 0) = \frac{1}{G} \sum_{g = 1}^{G} I ({\hat{σ}}_{v}^{2 RE(g)} = 0),$

where $I ( A )$ is an indicator function with value 1 if condition $A$ holds, and 0 otherwise.
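The Monte Carlo probability above is simply the share of simulated samples whose truncated REML estimate of $σ_{v}^{2}$ is exactly zero. A minimal sketch, assuming the $G$ estimates are already available as a list (the function name is hypothetical):

```python
def prob_zero_estimate(sigma2_v_estimates):
    """Monte Carlo share of samples with a zero truncated REML estimate."""
    G = len(sigma2_v_estimates)
    # Truncation sets negative solutions to exactly 0, so equality is the right test.
    zero_count = sum(1 for s in sigma2_v_estimates if s == 0.0)
    return zero_count / G
```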

Table 4.1 displays the Monte Carlo values of the probability to get a zero estimate for ${\hat{σ}}_{v}^{2 RE} .$ It can be seen that the simulated probability $P_{MC} ({\hat{σ}}_{v}^{2 RE} = 0)$ can be as high as 0.47 for $δ =$ 0.01. As $δ$ increases, this empirical probability decreases. Table 4.1 clearly shows that estimates $({\hat{σ}}_{v}^{2 RE}, {\hat{σ}}_{e}^{2 RE})$ cannot be used to compute the restricted EBLUP and YR estimators for samples selected in populations A1, B1, C1 and D1.

Table 4.1
Values of $P ({\hat{σ}}_{v}^{2 RE} = 0) :$ Scenario 1
	Pop A1 $δ = 0.01$	Pop B1 $δ = 0.05$	Pop C1 $δ = 0.1$	Pop D1 $δ = 0.2$	Pop E1 $δ = 1$
$P_{MC} ({\hat{σ}}_{v}^{2 RE} = 0)$	0.47	0.40	0.21	0.06	0.00

Figure 4.1 displays the number of iterations to convergence of the Fisher-scoring algorithm for the estimate ${\hat{σ}}_{v}^{2 reRE}$ of $σ_{v}^{2}.$ The algorithm stops when the value of $| {\hat{σ}}_{v}^{2 reRE(r + 1)} - {\hat{σ}}_{v}^{2 reRE(r)} |$ is less than $10^{- 5},$ where ${\hat{σ}}_{v}^{2 reRE(r)}$ represents the value at the $r^{th}$ iteration computed with equation (B.2) in Appendix B. The percentages in Figure 4.1 are based only on samples with a zero truncated REML estimate of $σ_{v}^{2},$ that is, ${\hat{σ}}_{v}^{2 RE} = 0.$ We only considered populations A1, B1, C1 and D1, as these four populations have non‑negligible probabilities for ${\hat{σ}}_{v}^{2 RE}$ to be zero. Figure 4.1 shows that convergence is attained in at most 11 iterations.
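The stopping rule can be illustrated with a generic fixed-point loop that counts iterations until two successive values differ by less than $10^{- 5}.$ Equation (B.2) is not reproduced in this section, so the `update` argument below is a stand-in for one Fisher-scoring step; the function name and the `max_iter` safeguard are assumptions made for this sketch.

```python
def iterate_to_convergence(update, start, tol=1e-5, max_iter=100):
    """Apply `update` repeatedly; stop when successive values differ by < tol.

    Returns the final value and the number of iterations used.
    """
    current = start
    for r in range(1, max_iter + 1):
        nxt = update(current)
        if abs(nxt - current) < tol:
            return nxt, r
        current = nxt
    raise RuntimeError("no convergence within max_iter iterations")
```

As a toy usage example unrelated to REML, `iterate_to_convergence(lambda s: 0.5 * (s + 2.0 / s), 1.0)` runs Newton's iteration for the square root of 2 and returns the value together with the iteration count.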

Figure 4.1 Percentage of iterations to convergence in samples with ${\hat{σ}}_{v}^{2 RE} = 0$

Description for Figure 4.1

Figure showing the percentage of iterations to convergence in samples with ${\hat{σ}}_{v}^{2 RE} = 0.$ Four bar charts are presented for Pop A1, Pop B1, Pop C1 and Pop D1. The percentage from 0 to 100 is on the y-axis. The number of iterations to convergence is on the x-axis, ranging from 4 to 10 for Pop A1 and Pop B1, from 4 to 11 for Pop C1 and from 4 to 9 for Pop D1. The convergence is attained in a maximum of 11 iterations. The convergence appears to be faster for Pop A1 and Pop B1 and slightly slower for Pop D1.

4.2 Comparison between the benchmarked estimators

The aim of the simulation study is to compare the benchmarked estimators described in Section 3 in terms of design bias and design mean squared error. We used both scenarios because we wanted to check how benchmarking protects against incorrect modeling. Furthermore, we considered benchmarking to two GREG estimators: ${\hat{Y}}_{1}^{GREG}$ and ${\hat{Y}}_{2}^{GREG}.$ Estimator ${\hat{Y}}_{1}^{GREG}$ has weights given by (3.2) calibrated on the auxiliary vector $x_{i j} = (1, x_{i j})$ associated with the small area model. It follows that estimator ${\hat{Y}}_{1}^{GREG}$ corresponds to the case $x_{i j} \subseteq x_{i j}^{*}.$ The second GREG estimator ${\hat{Y}}_{2}^{GREG}$ has weights given by (3.2) based on the auxiliary vector $x_{i j}^{*} = (1, x_{i j}^{*}),$ where the values $x_{i j}^{*}, j = 1, \dots, N_{i},$ were generated from an exponential distribution with mean equal to 5 and variance equal to 25, independently of the values $x_{i j}, j = 1, \dots, N_{i}.$ It follows that estimator ${\hat{Y}}_{2}^{GREG}$ corresponds to the case $x_{i j} ⊄ x_{i j}^{*},$ since the auxiliary variable $x_{i j}$ associated with the unit-level model (4.1) does not belong to the vector $x_{i j}^{*}$ used to obtain the weights associated with ${\hat{Y}}_{2}^{GREG}.$

For a fixed finite population, let ${\bar{Y}}_{i}$ be the mean of the small area $i$ and ${\hat{\bar{Y}}}_{i}$ a generic estimator of ${\bar{Y}}_{i} .$ We denote by ${\hat{\bar{Y}}}_{i}^{(g)}$ the value of ${\hat{\bar{Y}}}_{i}$ based on the $g^{th}$ simulated sample, for $g = 1, \dots, G .$ The estimators described in Section 3 respect the benchmark property regardless of the method used to estimate the variance components. Since the restricted benchmarked estimators are based on estimates $({\hat{σ}}_{v}^{2 reRE (g)}, {\hat{σ}}_{e}^{2 reRE (g)}),$ we decided to use reREML for computing ${\hat{\bar{Y}}}_{i}^{(g)}$ for each estimator ${\hat{\bar{Y}}}_{i}$ evaluated in this simulation study.

We considered the following performance measures:

Average Absolute Relative Bias

$\bar{ARB} = \frac{1}{m} \sum_{i = 1}^{m} {ARB}_{i} \quad \text{with} \quad {ARB}_{i} = \left| \frac{1}{G} \sum_{g = 1}^{G} \frac{{\hat{\bar{Y}}}_{i}^{(g)}}{{\bar{Y}}_{i}} - 1 \right|$

Average Relative Root Mean Squared Error

$\bar{RRMSE} = \frac{1}{m} \sum_{i = 1}^{m} {RRMSE}_{i} \quad \text{with} \quad {RRMSE}_{i} = \sqrt{\frac{1}{G} \sum_{g = 1}^{G} {\left( \frac{{\hat{\bar{Y}}}_{i}^{(g)}}{{\bar{Y}}_{i}} - 1 \right)}^{2}} .$
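The two performance measures can be computed directly from the simulated estimates. The sketch below assumes the estimates are stored as a list of $G$ rows, one per simulated sample, each holding the $m$ area estimates; the function name and this data layout are assumptions for illustration only.

```python
import math


def arb_rrmse(estimates, true_means):
    """Average absolute relative bias and average relative root MSE.

    estimates[g][i] is the estimate for area i from simulated sample g;
    true_means[i] is the finite population mean of area i.
    """
    G = len(estimates)
    m = len(true_means)
    arb_sum = 0.0
    rrmse_sum = 0.0
    for i in range(m):
        rel_err = [estimates[g][i] / true_means[i] - 1.0 for g in range(G)]
        arb_sum += abs(sum(rel_err) / G)                          # ARB_i
        rrmse_sum += math.sqrt(sum(r * r for r in rel_err) / G)   # RRMSE_i
    return arb_sum / m, rrmse_sum / m
```

Both returned values are proportions; multiplying them by 100 gives percentages on the scale used in Tables 4.2 to 4.5.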

This portion of the simulation is summarized in four tables. We provide the results separately for Scenarios 1 and 2. The results for the case when the benchmarking is to ${\hat{Y}}_{1}^{GREG}$ (the case $x_{i j} \subseteq x_{i j}^{*}$) are summarized in Tables 4.2 (Scenario 1) and 4.3 (Scenario 2). Those for the case when the benchmarking is to ${\hat{Y}}_{2}^{GREG}$ (the case $x_{i j} ⊄ x_{i j}^{*}$) are summarized in Tables 4.4 (Scenario 1) and 4.5 (Scenario 2).

Benchmarking to ${\hat{Y}}_{1}^{GREG}$ (the case $x_{i j} \subseteq x_{i j}^{*}$ )

We computed the $\bar{ARB}$ and $\bar{RRMSE}$ for two non-benchmarked estimators, ${\hat{\bar{Y}}}_{i}^{EBLUP}$ and ${\hat{\bar{Y}}}_{i}^{YR},$ as well as their corresponding estimators benchmarked to ${\hat{Y}}_{1}^{GREG}.$ For ${\hat{\bar{Y}}}_{i}^{EBLUP},$ we have three benchmarked estimators, ${\hat{\bar{Y}}}_{i b}^{EBRat},$ ${\hat{\bar{Y}}}_{i a b}^{EBLUP}$ and ${\hat{\bar{Y}}}_{i b}^{REBLUP},$ given respectively by equations (3.5), (3.8) and (3.13). For ${\hat{\bar{Y}}}_{i}^{YR},$ the corresponding benchmarked estimators are ${\hat{\bar{Y}}}_{i b}^{YRat},$ ${\hat{\bar{Y}}}_{i b}^{YR}$ and ${\hat{\bar{Y}}}_{i b}^{RYR},$ given respectively by equations (3.5), (3.9) and (3.15).

We first discuss their properties when the model is correct (Scenario 1). Comparing the $\bar{ARB}$ values across all the estimators in Table 4.2, we observe that there is not much difference between the estimators. The EBLUP estimators have somewhat smaller $\bar{ARB}$ values than the estimators based on the YR procedure. The benchmarked estimator ${\hat{\bar{Y}}}_{i a b}^{EBLUP}$ has the smallest $\bar{ARB}$ values, whereas the $\bar{ARB}$ values of the benchmarked estimators ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{REBLUP}$ are identical to those of ${\hat{\bar{Y}}}_{i}^{EBLUP}.$ The $\bar{ARB}$ values associated with estimators ${\hat{\bar{Y}}}_{i}^{YR},$ ${\hat{\bar{Y}}}_{i b}^{YRat}$ and ${\hat{\bar{Y}}}_{i b}^{RYR}$ are close, whereas estimator ${\hat{\bar{Y}}}_{i b}^{YR}$ has a somewhat larger relative bias, especially for larger values of $δ = σ_{v}^{2} / σ_{e}^{2}.$ For all the estimators, the $\bar{ARB}$ values increase as $δ$ increases; slight exceptions occur when $δ = 1.$

Next, we report on the $\bar{RRMSE}$ values. As expected, the smallest $\bar{RRMSE}$ values are associated with ${\hat{\bar{Y}}}_{i}^{EBLUP},$ whereas estimator ${\hat{\bar{Y}}}_{i}^{YR}$ has somewhat larger $\bar{RRMSE}$ values due to the use of survey weights under correct modeling. Benchmarking results in an increase of the $\bar{RRMSE}.$ Note that the $\bar{RRMSE}$ values of the benchmarked estimators ${\hat{\bar{Y}}}_{i a b}^{EBLUP}$ and ${\hat{\bar{Y}}}_{i b}^{YR},$ given in Sections 3.1 and 3.2 respectively, are higher than those associated with the restricted methods ${\hat{\bar{Y}}}_{i b}^{REBLUP}$ and ${\hat{\bar{Y}}}_{i b}^{RYR},$ given in Sections 3.3 and 3.4 respectively. The naïve ratio procedures ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{YRat}$ have $\bar{RRMSE}$ values that are quite comparable to those of the benchmarked estimators that use the restricted methods. The $\bar{RRMSE}$ values increase as $δ$ increases.

We conclude the following in the case $x_{i j} \subseteq x_{i j}^{*}$ when the small area model is correctly specified. The restricted benchmarked and ratio type estimators perform better than those that use an augmented model for EBLUP or a modified YR method. When the restricted or the ratio benchmarking techniques are used, the resulting estimators have bias values that are similar to those associated with their non-benchmarked versions, whereas their mean squared error values are slightly larger than those of the non-benchmarked versions. The small area estimators and the GREG estimator ${\hat{Y}}_{1}^{GREG}$ are based on the same auxiliary variables, and the model is correct. Consequently, ${\hat{\bar{Y}}}_{i}^{EBLUP}$ and ${\hat{\bar{Y}}}_{i}^{YR}$ do not have to be severely modified to achieve benchmarking to ${\hat{Y}}_{1}^{GREG}.$

Table 4.2
$\bar{ARB}$ (%) and $\bar{RRMSE}$ (%) for Scenario 1: the benchmark to ${\hat{Y}}_{1}^{GREG}$ $(x_{i j} \subseteq x_{i j}^{*})$
Estimator	Measure	Pop A1 $δ = 0.01$	Pop B1 $δ = 0.05$	Pop C1 $δ = 0.1$	Pop D1 $δ = 0.2$	Pop E1 $δ = 1$
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{ARB}$	1.1	1.9	2.3	2.7	2.6
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{RRMSE}$	2.7	3.4	3.9	4.9	6.5
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{ARB}$	1.2	2.0	2.4	2.9	3.1
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{RRMSE}$	3.1	3.7	4.2	5.3	7.2
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{ARB}$	1.1	1.9	2.3	2.7	2.6
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{RRMSE}$	3.2	3.8	4.3	5.2	6.9
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{ARB}$	1.2	2.0	2.4	2.9	3.1
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{RRMSE}$	3.1	3.7	4.3	5.3	7.4
${\hat{\bar{Y}}}_{i a b}^{EBLUP}$	$\bar{ARB}$	1.0	1.6	2.1	2.4	2.3
${\hat{\bar{Y}}}_{i a b}^{EBLUP}$	$\bar{RRMSE}$	9.6	9.8	10.1	11.1	13.9
${\hat{\bar{Y}}}_{i b}^{YR}$	$\bar{ARB}$	1.2	2.0	2.5	3.0	3.7
${\hat{\bar{Y}}}_{i b}^{YR}$	$\bar{RRMSE}$	3.5	4.8	5.4	11.7	14.5
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{ARB}$	1.1	1.9	2.3	2.7	2.6
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{RRMSE}$	3.2	3.8	4.3	5.3	7.0
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{ARB}$	1.2	2.0	2.4	2.9	3.2
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{RRMSE}$	3.1	3.7	4.3	5.3	7.5

The results under the incorrect model (Scenario 2) are given in Table 4.3. The value of $δ$ does not have much impact on the $\bar{ARB}$ and $\bar{RRMSE}$ values across all estimators. The $\bar{ARB}$ and $\bar{RRMSE}$ values of the EBLUP estimators, whether they are benchmarked or not, are higher than those associated with the YR estimators. It follows that if we have incorrect modeling, the use of the YR estimators is recommended. Since ${\hat{Y}}_{1}^{GREG}$ and the estimators based on the YR procedure use the same vector of auxiliary information, there is not much difference in terms of $\bar{ARB}$ and $\bar{RRMSE}$ between the non-benchmarked estimator ${\hat{\bar{Y}}}_{i}^{YR}$ and its benchmarked versions, ${\hat{\bar{Y}}}_{i b}^{YRat},$ ${\hat{\bar{Y}}}_{i b}^{YR}$ and ${\hat{\bar{Y}}}_{i b}^{RYR}.$ However, it can be noticed that the benchmarked estimator ${\hat{\bar{Y}}}_{i b}^{YR}$ has the smallest $\bar{ARB}$ values, whereas the restricted benchmarked estimator ${\hat{\bar{Y}}}_{i b}^{RYR}$ has the smallest $\bar{RRMSE}$ values.

Table 4.3
$\bar{ARB}$ (%) and $\bar{RRMSE}$ (%) for Scenario 2: the benchmark to ${\hat{Y}}_{1}^{GREG}$ $(x_{i j} \subseteq x_{i j}^{*})$
Estimator	Measure	Pop A2 $δ = 0.01$	Pop B2 $δ = 0.05$	Pop C2 $δ = 0.1$	Pop D2 $δ = 0.2$	Pop E2 $δ = 1$
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{ARB}$	42.3	42.7	43.2	43.0	41.5
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{RRMSE}$	59.8	60.5	61.1	60.6	59.0
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{ARB}$	13.5	13.8	13.8	13.6	13.5
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{RRMSE}$	42.8	43.2	43.5	43.2	42.4
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{ARB}$	42.9	43.4	43.9	43.6	42.1
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{RRMSE}$	61.2	61.9	62.7	62.1	60.3
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{ARB}$	13.8	14.1	14.1	13.9	13.8
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{RRMSE}$	43.9	44.4	44.7	44.4	43.5
${\hat{\bar{Y}}}_{i a b}^{EBLUP}$	$\bar{ARB}$	19.8	20.2	20.2	20.2	19.6
${\hat{\bar{Y}}}_{i a b}^{EBLUP}$	$\bar{RRMSE}$	66.2	66.7	67.6	67.3	66.6
${\hat{\bar{Y}}}_{i b}^{YR}$	$\bar{ARB}$	10.9	10.6	11.5	12.5	10.7
${\hat{\bar{Y}}}_{i b}^{YR}$	$\bar{RRMSE}$	47.3	47.6	48.1	47.9	47.8
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{ARB}$	41.2	41.8	41.8	41.7	40.6
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{RRMSE}$	58.2	59.0	59.1	58.9	57.4
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{ARB}$	12.5	12.7	12.6	12.5	12.5
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{RRMSE}$	42.4	42.9	43.1	42.9	42.1

Benchmarking to ${\hat{Y}}_{2}^{GREG}$ (the case $x_{i j} ⊄ x_{i j}^{*}$ )

The results of this case are given in Tables 4.4 and 4.5 for Scenarios 1 and 2, respectively. The weighting is with respect to $w_{i j}^{GREG}$ given by (3.2). We investigated the following four estimators (${\hat{\bar{Y}}}_{i b}^{EBRat},$ ${\hat{\bar{Y}}}_{i b}^{REBLUP},$ ${\hat{\bar{Y}}}_{i b}^{YRat},$ and ${\hat{\bar{Y}}}_{i b}^{RYR}$) that are benchmarked to ${\hat{Y}}_{2}^{GREG}.$ The first two estimators, ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{REBLUP},$ are given by equations (3.5) and (3.13) respectively, while the last two, ${\hat{\bar{Y}}}_{i b}^{YRat}$ and ${\hat{\bar{Y}}}_{i b}^{RYR},$ are given by equations (3.5) and (3.15).

In Table 4.4, we summarize the $\bar{ARB}$ and $\bar{RRMSE}$ values when the model is correct. That is, both the sample and the population data respect model (4.1). We first discuss their properties in terms of the $\bar{ARB}$ values. Comparing the $\bar{ARB}$ values across all the estimators in Table 4.4, we observe once more that, under correct modeling, the original EBLUP estimator, ${\hat{\bar{Y}}}_{i}^{EBLUP},$ has the smallest $\bar{ARB}$ values. The $\bar{ARB}$ values increase when benchmarking is required, and this is different from what we noticed from Table 4.2. There is not much difference in terms of $\bar{ARB}$ between the benchmarked estimators obtained using ratio adjustment methods, ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{YRat},$ and those obtained by restricted methods, ${\hat{\bar{Y}}}_{i b}^{REBLUP}$ and ${\hat{\bar{Y}}}_{i b}^{RYR}.$ The $\bar{ARB}$ values increase as $δ$ increases; slight exceptions occur when $δ = 1.$

Next, we report on the $\bar{RRMSE}$ values. As expected, the smallest $\bar{RRMSE}$ values are associated with ${\hat{\bar{Y}}}_{i}^{EBLUP},$ which is optimal under correct modeling. Benchmarking results in an increase of $\bar{RRMSE}.$ Note that the $\bar{RRMSE}$ values associated with all four benchmarking procedures in Table 4.4 are quite high compared to the $\bar{RRMSE}$ values associated with the non-benchmarked estimators ${\hat{\bar{Y}}}_{i}^{EBLUP}$ and ${\hat{\bar{Y}}}_{i}^{YR}.$ The estimators ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{YRat}$ have similar efficiency, whereas ${\hat{\bar{Y}}}_{i b}^{REBLUP}$ and ${\hat{\bar{Y}}}_{i b}^{RYR}$ have $\bar{RRMSE}$ values that are somewhat larger than those of ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{YRat}.$ The $\bar{RRMSE}$ values increase as $δ$ increases.

When $x_{i j} ⊄ x_{i j}^{*},$ there are larger differences between the small area estimators based on model (2.2), which uses the vector $x_{i j},$ and the GREG estimator, which uses $x_{i j}^{*}.$ Notice that we considered a somewhat extreme situation in which $x_{i j}$ and $x_{i j}^{*}$ have no covariate in common. It follows that the modifications needed to obtain benchmarked estimators are more pronounced in this case than in the case $x_{i j} \subseteq x_{i j}^{*}.$ This explains why in Table 4.4 the benchmarked estimators have substantially larger $\bar{ARB}$ and $\bar{RRMSE}$ values than the estimators that are not benchmarked to ${\hat{Y}}_{2}^{GREG}.$

Table 4.4
$\bar{ARB}$ (%) and $\bar{RRMSE}$ (%) for Scenario 1: the benchmark to ${\hat{Y}}_{2}^{GREG}$ $(x_{i j} ⊄ x_{i j}^{*})$
Estimator	Measure	Pop A1 $δ = 0.01$	Pop B1 $δ = 0.05$	Pop C1 $δ = 0.1$	Pop D1 $δ = 0.2$	Pop E1 $δ = 1$
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{ARB}$	1.1	1.9	2.3	2.7	2.6
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{RRMSE}$	2.7	3.4	3.9	4.9	6.5
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{ARB}$	1.2	2.0	2.4	2.9	3.1
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{RRMSE}$	3.1	3.7	4.2	5.3	7.2
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{ARB}$	4.2	4.3	4.5	4.9	4.6
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{RRMSE}$	13.0	13.2	13.5	14.0	14.6
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{ARB}$	4.2	4.3	4.5	5.0	4.8
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{RRMSE}$	13.0	13.2	13.5	14.0	14.0
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{ARB}$	4.2	4.3	4.5	5.0	4.8
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{RRMSE}$	13.1	13.3	13.5	14.1	15.0
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{ARB}$	4.2	4.3	4.6	5.1	5.0
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{RRMSE}$	13.5	13.7	13.8	14.5	16.2

The impact of using an incorrect model is given in Table 4.5. We see that ${\hat{\bar{Y}}}_{i}^{EBLUP}$ suffers the most in terms of both $\bar{ARB}$ and $\bar{RRMSE}$ because the EBLUP procedure assumes that the model is correct. The benchmarked versions of EBLUP, ${\hat{\bar{Y}}}_{i b}^{EBRat}$ and ${\hat{\bar{Y}}}_{i b}^{REBLUP},$ also have high $\bar{ARB}$ and $\bar{RRMSE}$ values. Although the original You and Rao (2002) estimator, ${\hat{\bar{Y}}}_{i}^{YR},$ has much smaller $\bar{ARB}$ than the EBLUP estimator, its $\bar{RRMSE}$ is fairly high. The $\bar{ARB}$ and $\bar{RRMSE}$ associated with the ratio benchmarked version of ${\hat{\bar{Y}}}_{i}^{YR},$ ${\hat{\bar{Y}}}_{i b}^{YRat},$ are a bit higher than those associated with ${\hat{\bar{Y}}}_{i}^{YR}.$ The benchmarked YR estimator, ${\hat{\bar{Y}}}_{i b}^{RYR},$ which is based on the restricted procedure given in Section 3.4, has an $\bar{ARB}$ that is the smallest amongst the estimators in Table 4.5. Due to benchmarking, its $\bar{RRMSE}$ is slightly larger than the one associated with ${\hat{\bar{Y}}}_{i}^{YR}.$

Table 4.5
$\bar{ARB}$ (%) and $\bar{RRMSE}$ (%) for Scenario 2: the benchmark to ${\hat{Y}}_{2}^{GREG}$ $(x_{i j} ⊄ x_{i j}^{*})$
Estimator	Measure	Pop A2 $δ = 0.01$	Pop B2 $δ = 0.05$	Pop C2 $δ = 0.1$	Pop D2 $δ = 0.2$	Pop E2 $δ = 1$
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{ARB}$	42.3	42.6	43.2	43.0	41.6
${\hat{\bar{Y}}}_{i}^{EBLUP}$	$\bar{RRMSE}$	59.8	60.4	61.1	60.7	59.1
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{ARB}$	13.6	13.6	13.9	13.7	13.5
${\hat{\bar{Y}}}_{i}^{YR}$	$\bar{RRMSE}$	42.8	43.1	43.5	43.3	42.4
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{ARB}$	43.8	44.4	44.9	44.6	43.3
${\hat{\bar{Y}}}_{i b}^{EBRat}$	$\bar{RRMSE}$	65.4	66.1	67.0	66.4	64.5
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{ARB}$	15.0	15.2	15.6	15.2	14.9
${\hat{\bar{Y}}}_{i b}^{YRat}$	$\bar{RRMSE}$	47.9	48.2	48.7	48.3	47.3
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{ARB}$	37.3	38.0	38.1	37.8	37.1
${\hat{\bar{Y}}}_{i b}^{REBLUP}$	$\bar{RRMSE}$	57.4	58.2	58.5	58.2	56.7
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{ARB}$	9.9	10.1	10.4	10.0	10.1
${\hat{\bar{Y}}}_{i b}^{RYR}$	$\bar{RRMSE}$	43.4	43.8	44.2	43.9	43.1

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2021-06-24
