Two local diagnostics to evaluate the efficiency of the empirical best predictor under the Fay-Herriot model
Section 4. Two diagnostics to evaluate the local performance of the B estimator

4.1 An approach conditional on ${\hat{θ}}_{i}$

From expression (2.5) in Section 2, and noting that $γ_{i} ({\hat{θ}}_{i} - β^{T} z_{i}) = b_{i} σ_{v} \sqrt{γ_{i}}\, ε_{i}$, we obtain the conditional distribution of $v_{i}$:

$v_{i} \mid Z, {\hat{θ}}_{i} \sim N \left(σ_{v} \sqrt{γ_{i}}\, ε_{i},\ (1 - γ_{i}) σ_{v}^{2}\right).$

Conditioning on ${\hat{θ}}_{i}$ narrows down the values that $v_{i}$ is likely to take. In particular, when $γ_{i} > 0$, the conditional distribution of $v_{i}$ can differ markedly from its unconditional distribution, $v_{i} \mid Z \sim N (0, σ_{v}^{2})$: its mean is shifted to $σ_{v} \sqrt{γ_{i}}\, ε_{i}$ and its variance is reduced by the factor $1 - γ_{i}$.
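As a minimal sketch, the conditional mean and standard deviation of $v_{i}$ can be computed directly from $γ_{i}$, $σ_{v}$ and $ε_{i}$ and compared with the unconditional standard deviation $σ_{v}$. The function name and the illustrative inputs ($γ_{i} = 0.8$, $ε_{i} = 2$, $σ_{v} = 1$) are hypothetical, not taken from the paper.

```python
import math


def conditional_v_params(gamma: float, eps: float, sigma_v: float) -> tuple[float, float]:
    """Mean and standard deviation of v_i | Z, theta_hat_i.

    Implements N(sigma_v * sqrt(gamma) * eps, (1 - gamma) * sigma_v**2).
    """
    mean = sigma_v * math.sqrt(gamma) * eps
    sd = sigma_v * math.sqrt(1.0 - gamma)
    return mean, sd


# Hypothetical domain: gamma_i = 0.8, eps_i = 2, sigma_v = 1.
mean, sd = conditional_v_params(gamma=0.8, eps=2.0, sigma_v=1.0)
print(f"conditional mean={mean:.3f}, conditional sd={sd:.3f}, unconditional sd=1.000")
```

In this hypothetical domain the conditional standard deviation is less than half of $σ_{v}$, and the conditional mean sits well away from 0.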

The first diagnostic is defined as the conditional probability:

$D_{1 i} = \operatorname{Prob} \left(\text{MSE}_{p} ({\hat{θ}}_{i}^{B}) \leq \text{MSE}_{p} ({\hat{θ}}_{i}) \mid Z, {\hat{θ}}_{i}\right) = \operatorname{Prob} \left(- v_{L, i} \leq v_{i} \leq v_{L, i} \mid Z, {\hat{θ}}_{i}\right). \qquad (4.1)$

This diagnostic can be written as a function of $γ_{i}$ and the standardized error (2.4):

$D_{1 i} = D_{1 i} (γ_{i}, | ε_{i} |) = Φ \left(\sqrt{\frac{γ_{i}}{1 - γ_{i}}} \left(| ε_{i} | + \frac{\sqrt{1 + γ_{i}}}{γ_{i}}\right)\right) - Φ \left(\sqrt{\frac{γ_{i}}{1 - γ_{i}}} \left(| ε_{i} | - \frac{\sqrt{1 + γ_{i}}}{γ_{i}}\right)\right), \qquad (4.2)$

where $Φ (\cdot)$ is the distribution function of the standard normal distribution. The proof of result (4.2) is given in Appendix A.
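Formula (4.2) can be evaluated with the Python standard library alone, with `statistics.NormalDist` supplying $Φ$. The sketch below cross-checks it by simulating $v_{i}$ from the conditional distribution given earlier and counting how often $| v_{i} | \leq v_{L, i}$. The bound $v_{L, i}$ is defined in Section 2; here we assume $v_{L, i} = σ_{v} \sqrt{(1 + γ_{i}) / γ_{i}}$, the value under which (4.1) and that conditional distribution give exactly (4.2). Function names, the seed and the sample size are illustrative choices.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def d1(gamma: float, abs_eps: float) -> float:
    """Diagnostic 1, formula (4.2), for 0 < gamma < 1."""
    a = math.sqrt(gamma / (1.0 - gamma))
    c = math.sqrt(1.0 + gamma) / gamma
    return PHI(a * (abs_eps + c)) - PHI(a * (abs_eps - c))


def d1_monte_carlo(gamma: float, eps: float, sigma_v: float = 1.0,
                   n: int = 100_000, seed: int = 1) -> float:
    """Estimate (4.1) by drawing v_i from N(sigma_v*sqrt(gamma)*eps, (1-gamma)*sigma_v^2)."""
    rng = random.Random(seed)
    v_l = sigma_v * math.sqrt((1.0 + gamma) / gamma)  # assumed form of v_{L,i}
    mean = sigma_v * math.sqrt(gamma) * eps
    sd = sigma_v * math.sqrt(1.0 - gamma)
    hits = sum(abs(rng.gauss(mean, sd)) <= v_l for _ in range(n))
    return hits / n


# Illustrative (gamma_i, eps_i) pairs; a negative eps_i checks that only |eps_i| matters.
for gamma, eps in [(0.2, 1.0), (0.5, 2.0), (0.8, -2.5)]:
    print(gamma, eps, round(d1(gamma, abs(eps)), 4), round(d1_monte_carlo(gamma, eps), 4))
```

The simulated and closed-form values should agree up to Monte Carlo error, and the result does not depend on $σ_{v}$.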

When this diagnostic takes values close to 0, we may conclude that $| v_{i} |$ is most likely larger than $v_{L, i}$ and that the direct estimator is preferable to the B estimator. A decision rule based on this diagnostic requires a threshold: below it, the direct estimator is chosen; above it, the B estimator is chosen. A 50% threshold seems natural, since the B estimator is then chosen whenever ${MSE}_{p} ({\hat{θ}}_{i}^{B}) \leq {MSE}_{p} ({\hat{θ}}_{i})$ is more likely than not. Another option is an empirical approach that identifies a break in the distribution of the values of $D_{1 i}$ over the $m$ domains.
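A minimal sketch of the decision rule, assuming that a domain whose diagnostic equals the threshold exactly is assigned the B estimator (the text does not settle ties). The break-detection helper is one simple, hypothetical way to "identify a break": it places the threshold at the midpoint of the widest gap between consecutive sorted diagnostic values. It is not a procedure prescribed by the paper, and the diagnostic values below are made up.

```python
def choose_estimator(diagnostic: float, threshold: float = 0.5) -> str:
    """Return 'direct' below the threshold and 'B' otherwise."""
    return "direct" if diagnostic < threshold else "B"


def largest_gap_threshold(values: list[float]) -> float:
    """Hypothetical break detector: midpoint of the widest gap in the sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two diagnostic values")
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    _, lo, hi = max(gaps)
    return (lo + hi) / 2.0


d1_values = [0.97, 0.02, 0.91, 0.05, 0.99, 0.88]  # illustrative D_{1i} for m = 6 domains
cut = largest_gap_threshold(d1_values)
print(round(cut, 3), [choose_estimator(d, cut) for d in d1_values])
```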

This diagnostic is not entirely design-based because it involves the conditional distribution $v_{i} \mid Z, {\hat{θ}}_{i}$. The Fay-Herriot model must therefore be validated carefully before the diagnostic is used. Unfortunately, the assumptions on $v_{i}$ and $e_{i}$ cannot be validated separately because the parameters $θ_{i}, i = 1, \dots, m$ are not observed. However, the combined Fay-Herriot model (2.3) can be validated using model residuals (see, for example, Hidiroglou, Beaumont and Yung, 2019). These residuals are obtained by replacing the unknown quantities in the standardized error (2.4) with their estimates (see Section 5). A plot of the residuals against the model predicted values is often suggested to validate the linearity assumption of the model. The normality assumption on the error $b_{i} v_{i} + e_{i}$ can be checked with a Q-Q plot of the residuals or with normality tests such as the Shapiro-Wilk test. If the model is not completely satisfactory, a conservative threshold of 75% may be appropriate.
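The residual check can be sketched with the Python standard library. The sketch assumes the usual Fay-Herriot definition $γ_{i} = b_{i}^{2} σ_{v}^{2} / (b_{i}^{2} σ_{v}^{2} + ψ_{i})$, where $ψ_{i}$ is the sampling variance of ${\hat{θ}}_{i}$ (neither is defined in this section); under it, the identity at the start of Section 4.1 gives $ε_{i} = ({\hat{θ}}_{i} - β^{T} z_{i}) / \sqrt{b_{i}^{2} σ_{v}^{2} + ψ_{i}}$. Plugging in estimates of $β$ and $σ_{v}^{2}$ yields residuals; the hypothetical helpers pair them with normal quantiles for a Q-Q plot and report their correlation as a rough numeric summary. This does not replace the Shapiro-Wilk test, and the estimation method of Section 5 is not reproduced.

```python
import math
from statistics import NormalDist, fmean


def residuals(theta_hat, fitted, b, psi, sigma_v2_hat):
    """Estimated standardized errors (theta_hat_i - fitted_i) / sqrt(b_i^2 * sigma_v2_hat + psi_i).

    `fitted` holds beta_hat^T z_i and `psi` the sampling variances; all inputs
    are assumed to come from an earlier model fit (not reproduced here).
    """
    return [
        (t - f) / math.sqrt(bi * bi * sigma_v2_hat + p)
        for t, f, bi, p in zip(theta_hat, fitted, b, psi)
    ]


def qq_points(res):
    """Pairs (standard normal quantile, sorted residual) with plotting positions (k - 0.5) / m."""
    m = len(res)
    quantiles = [NormalDist().inv_cdf((k - 0.5) / m) for k in range(1, m + 1)]
    return list(zip(quantiles, sorted(res)))


def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs; values near 1 are consistent with normality."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative inputs for m = 5 domains (made-up numbers, not real data).
res = residuals(
    theta_hat=[10.2, 8.7, 12.5, 9.9, 11.1],
    fitted=[10.0, 9.1, 11.8, 10.3, 10.9],
    b=[1.0] * 5,
    psi=[0.30, 0.45, 0.25, 0.50, 0.35],
    sigma_v2_hat=0.20,
)
print([round(r, 3) for r in res], round(qq_correlation(qq_points(res)), 3))
```

Plotting the pairs returned by `qq_points` gives the Q-Q plot; with real data, a correlation clearly below 1 would call for a closer look at the model.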

The diagnostic proposed in the next section is entirely design-based and therefore does not depend on the validity of the linking model. In this sense, it can be considered more robust than diagnostic (4.2). However, it relies on the assumptions about the sampling errors $e_{i}$ discussed in Section 2, including their normality.

4.2 Use of a design-based hypothesis test on the parameter $v_{i}$

In the design-based approach to inference, $v_{i}$ is fixed and the standardized error (2.4) follows the distribution:

$ε_{i} \mid Ω \sim N \left(v_{i} \frac{\sqrt{γ_{i}}}{σ_{v}},\ 1 - γ_{i}\right). \qquad (4.3)$

We have a single observation of this random variable, which we use to test whether $| v_{i} |$ is larger than $v_{L, i}$. We consider the test:

$H_{0}: | v_{i} | = v_{L, i} \quad \text{versus} \quad H_{1}: | v_{i} | > v_{L, i}.$

We use $| ε_{i} |$ as the test statistic. Since, by (4.3), the mean of $ε_{i}$ is proportional to $v_{i}$, we expect $| ε_{i} |$ to take smaller values under $H_{0}$ than under $H_{1}$. Let $ε_{obs, i}$ be the observed value of $ε_{i}$ and let $P_{i} (v_{i}) = \operatorname{Prob} \left(| ε_{i} | > | ε_{obs, i} | \mid Ω; v_{i}\right)$. The $p$-value of the test is the probability that $| ε_{i} |$ exceeds the observed value $| ε_{obs, i} |$ under the null hypothesis. Appendix B shows that the $p$-value is:

$P_{i} (v_{L, i}) = P_{i} (- v_{L, i}) = Φ (- τ_{i}) + Φ (- τ_{i} - 2 \frac{\sqrt{1 + γ_{i}}}{\sqrt{1 - γ_{i}}}),$

where

$τ_{i} = \frac{| ε_{obs, i} | - \sqrt{1 + γ_{i}}}{\sqrt{1 - γ_{i}}} .$
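The $p$-value formula can be checked by simulation. The closed form is consistent with (4.3) having, under $H_{0}$ with $v_{i} = \pm v_{L, i}$, mean $\pm \sqrt{1 + γ_{i}}$ and variance $1 - γ_{i}$; the sketch simulates that distribution and compares the share of draws with $| ε_{i} | > | ε_{obs, i} |$ against the closed form. The function names, seed, sample size and the illustrative domain ($γ_{i} = 0.3$, $| ε_{obs, i} | = 1.5$) are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def p_value(gamma: float, abs_eps_obs: float) -> float:
    """Exact p-value P_i(v_{L,i}) = Phi(-tau_i) + Phi(-tau_i - 2*sqrt(1+gamma)/sqrt(1-gamma))."""
    tau = (abs_eps_obs - math.sqrt(1.0 + gamma)) / math.sqrt(1.0 - gamma)
    shift = 2.0 * math.sqrt(1.0 + gamma) / math.sqrt(1.0 - gamma)
    return PHI(-tau) + PHI(-tau - shift)


def p_value_monte_carlo(gamma: float, abs_eps_obs: float, sign: float = 1.0,
                        n: int = 100_000, seed: int = 7) -> float:
    """Draw eps_i ~ N(sign*sqrt(1+gamma), 1-gamma), the null distribution for
    v_i = sign * v_{L,i}, and return the share of draws with |eps_i| > |eps_obs,i|."""
    rng = random.Random(seed)
    mean = sign * math.sqrt(1.0 + gamma)
    sd = math.sqrt(1.0 - gamma)
    return sum(abs(rng.gauss(mean, sd)) > abs_eps_obs for _ in range(n)) / n


# Illustrative domain: gamma_i = 0.3, |eps_obs,i| = 1.5.
print(p_value(0.3, 1.5),
      p_value_monte_carlo(0.3, 1.5),
      p_value_monte_carlo(0.3, 1.5, sign=-1.0))  # v_i = -v_{L,i} gives the same p-value
```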

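Since the argument of the second term of $P_{i} (v_{L, i})$ is smaller than that of the first term by $2 \sqrt{(1 + γ_{i}) / (1 - γ_{i})} \geq 2$, the second term is often negligible compared with the first, especially when $τ_{i} > 0$ or $γ_{i}$ is large. Our second diagnostic is therefore: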

$D_{2 i} = D_{2 i} (γ_{i}, | ε_{obs, i} |) = Φ \left(\frac{\sqrt{1 + γ_{i}} - | ε_{obs, i} |}{\sqrt{1 - γ_{i}}}\right). \qquad (4.4)$

This second diagnostic is interpreted as follows: when $D_{2 i}$ is small, $| v_{i} |$ is likely to be larger than $v_{L, i}$, and the direct estimator is then preferred to the B estimator. To choose a decision threshold, the significance levels commonly used in hypothesis testing (e.g., 5% or 10%) can serve as a guide. Such small thresholds favour the B estimator, because the direct estimator is chosen only when the evidence that $| v_{i} | > v_{L, i}$ is strong. As with the first diagnostic, the threshold can also be determined by locating a break in the distribution of the values of $D_{2 i}$ over the $m$ domains.
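A minimal sketch of diagnostic (4.4) and its decision rule. It also reports the second term of the exact $p$-value, which (4.4) drops. Treating a value exactly at the threshold as favouring the B estimator is an assumption, and the function names and test domains are illustrative.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def d2(gamma: float, abs_eps_obs: float) -> float:
    """Diagnostic 2, formula (4.4): Phi((sqrt(1+gamma) - |eps_obs|) / sqrt(1-gamma))."""
    return PHI((math.sqrt(1.0 + gamma) - abs_eps_obs) / math.sqrt(1.0 - gamma))


def dropped_term(gamma: float, abs_eps_obs: float) -> float:
    """Second term of the exact p-value, which (4.4) neglects."""
    tau = (abs_eps_obs - math.sqrt(1.0 + gamma)) / math.sqrt(1.0 - gamma)
    return PHI(-tau - 2.0 * math.sqrt((1.0 + gamma) / (1.0 - gamma)))


def choose_by_d2(gamma: float, abs_eps_obs: float, level: float = 0.05) -> str:
    """Direct estimator when D_2i < level (evidence that |v_i| > v_{L,i}); B otherwise."""
    return "direct" if d2(gamma, abs_eps_obs) < level else "B"


for gamma, eps in [(0.1, 1.2), (0.5, 2.0), (0.5, 3.5), (0.9, 2.0)]:
    print(gamma, eps, round(d2(gamma, eps), 4),
          f"{dropped_term(gamma, eps):.1e}", choose_by_d2(gamma, eps))
```

Near a 5% threshold ($τ_{i} \approx 1.645$), the dropped term is at most $Φ (- 3.645) \approx 1.3 \times 10^{-4}$ because the shift is at least 2, so it can only affect decisions for domains lying extremely close to the threshold.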

4.3 Some properties of diagnostics 1 and 2

In this section, we study the behaviour of the functions $D_{1 i} (γ_{i}, | ε_{i} |)$ and $D_{2 i} (γ_{i}, | ε_{i} |)$ for limiting cases of $γ_{i}$ and $| ε_{i} |$ and note their similarities and differences.

Case 1: $0 < γ_{i} < 1$ is fixed and $| ε_{i} | \to \infty .$

From equations (4.2) and (4.4), it can be shown that, for $| ε_{i} | > 0$, both $D_{1 i} (γ_{i}, | ε_{i} |)$ and $D_{2 i} (γ_{i}, | ε_{i} |)$ decrease as $| ε_{i} |$ increases; that is, their derivatives with respect to $| ε_{i} |$ are negative. In addition, both functions tend to 0 as $| ε_{i} | \to \infty$. For a sufficiently large value of $| ε_{i} |$, the two diagnostics therefore favour the direct estimator.
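For Case 1, differentiating (4.2) and (4.4) with respect to $x = | ε_{i} |$ makes the sign of the derivatives explicit. The shorthand $a_{i} = \sqrt{γ_{i} / (1 - γ_{i})}$, $c_{i} = \sqrt{1 + γ_{i}} / γ_{i}$ and $φ$ for the standard normal density is introduced here only for brevity. Since $φ$ decreases in the absolute value of its argument and $| x + c_{i} | > | x - c_{i} |$ for $x > 0$, the bracket in the first line is negative; as $x \to \infty$, both arguments in (4.2) tend to $+\infty$.

```latex
\begin{aligned}
\frac{\partial D_{1i}}{\partial x}
  &= a_i \left[ \varphi\big(a_i (x + c_i)\big) - \varphi\big(a_i (x - c_i)\big) \right] < 0
  \quad \text{for } x > 0, \\
\frac{\partial D_{2i}}{\partial x}
  &= -\frac{1}{\sqrt{1 - \gamma_i}}\,
     \varphi\!\left( \frac{\sqrt{1 + \gamma_i} - x}{\sqrt{1 - \gamma_i}} \right) < 0, \\
\lim_{x \to \infty} D_{1i} &= 1 - 1 = 0,
  \qquad \lim_{x \to \infty} D_{2i} = 0.
\end{aligned}
```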

Case 2: $0 < γ_{i} < 1$ is fixed and $| ε_{i} | = 0.$

From equation (4.2), we observe that

$D_{1 i} (γ_{i}, 0) = Φ (\sqrt{\frac{1 + γ_{i}}{γ_{i} (1 - γ_{i})}}) - Φ (- \sqrt{\frac{1 + γ_{i}}{γ_{i} (1 - γ_{i})}}) .$

Setting the derivative of $(1 + γ_{i}) / \{γ_{i} (1 - γ_{i})\}$ with respect to $γ_{i}$ to zero gives $γ_{i}^{2} + 2 γ_{i} - 1 = 0$, so $D_{1 i} (γ_{i}, 0)$ is minimized when $γ_{i} = - 1 + \sqrt{2}$. Therefore, $D_{1 i} (γ_{i}, 0) \geq D_{1 i} (- 1 + \sqrt{2}, 0) \approx 0.98$. Since this value is close to 1, diagnostic 1 leads to choosing the B estimator in this case with a threshold of 0.50 or even 0.75.
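A grid search confirms this minimum numerically. At $γ_{i} = - 1 + \sqrt{2}$ the argument of $Φ$ equals $1 + \sqrt{2}$, so the minimum is exactly $2 Φ (1 + \sqrt{2}) - 1$; the sketch prints both. The grid step is an arbitrary choice.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def d1_at_zero(gamma: float) -> float:
    """D_1i(gamma, 0) = Phi(s) - Phi(-s) with s = sqrt((1 + gamma) / (gamma * (1 - gamma)))."""
    s = math.sqrt((1.0 + gamma) / (gamma * (1.0 - gamma)))
    return PHI(s) - PHI(-s)


grid = [k / 10_000 for k in range(1, 10_000)]  # gamma in (0, 1), step 0.0001
gamma_min = min(grid, key=d1_at_zero)
print(gamma_min, math.sqrt(2.0) - 1.0,
      d1_at_zero(gamma_min), 2.0 * PHI(1.0 + math.sqrt(2.0)) - 1.0)
```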

From equation (4.4) we obtain:

$D_{2 i} (γ_{i}, 0) = Φ (\sqrt{\frac{1 + γ_{i}}{1 - γ_{i}}}) .$

Since $(1 + γ_{i}) / (1 - γ_{i})$ is increasing in $γ_{i}$, the function $D_{2 i} (γ_{i}, 0)$ is minimized over $0 \leq γ_{i} < 1$ at $γ_{i} = 0$. Hence, $D_{2 i} (γ_{i}, 0) \geq D_{2 i} (0, 0) = Φ (1) \approx 0.84$. With any threshold below 0.84, and in particular with a threshold smaller than 0.50, diagnostic 2 leads to the same decision as diagnostic 1 in this case, i.e., to choosing the B estimator.
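A quick numerical confirmation that $D_{2 i} (γ_{i}, 0)$ increases with $γ_{i}$ and starts at $Φ (1) \approx 0.8413$; the grid is an illustrative choice.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def d2_at_zero(gamma: float) -> float:
    """D_2i(gamma, 0) = Phi(sqrt((1 + gamma) / (1 - gamma)))."""
    return PHI(math.sqrt((1.0 + gamma) / (1.0 - gamma)))


# gamma = 0.00, 0.01, ..., 0.90; the grid stops at 0.90 because Phi rounds to 1.0
# in floating point as gamma approaches 1.
values = [d2_at_zero(k / 100) for k in range(91)]
print(all(a < b for a, b in zip(values, values[1:])), values[0], PHI(1.0))
```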

Case 3: $| ε_{i} | < \sqrt{2}$ is fixed and $γ_{i} \to 1.$

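Because $\sqrt{1 + γ_{i}}$ and $\sqrt{1 + γ_{i}} / γ_{i}$ both tend to $\sqrt{2} > | ε_{i} |$ while $1 - γ_{i} \to 0$, the two functions $D_{1 i} (γ_{i}, | ε_{i} |)$ and $D_{2 i} (γ_{i}, | ε_{i} |)$ tend toward 1 in this case. Therefore, diagnostics 1 and 2 both lead to choosing the B estimator.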

Case 4: $| ε_{i} | > \sqrt{2}$ is fixed and $γ_{i} \to 1.$

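Because $\sqrt{1 + γ_{i}}$ and $\sqrt{1 + γ_{i}} / γ_{i}$ both tend to $\sqrt{2} < | ε_{i} |$ while $1 - γ_{i} \to 0$, the two functions $D_{1 i} (γ_{i}, | ε_{i} |)$ and $D_{2 i} (γ_{i}, | ε_{i} |)$ tend toward 0 in this case. Here, diagnostics 1 and 2 both lead to choosing the direct estimator.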
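Cases 3 and 4 can be illustrated numerically by letting $γ_{i}$ approach 1 ($γ_{i} = 0.9, 0.99, 0.9999$) with $| ε_{i} | = 1 < \sqrt{2}$ and $| ε_{i} | = 2 > \sqrt{2}$; these values are illustrative. Convergence is slow when $| ε_{i} |$ is close to $\sqrt{2}$.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def d1(gamma, abs_eps):
    """Diagnostic 1, formula (4.2)."""
    a = math.sqrt(gamma / (1.0 - gamma))
    c = math.sqrt(1.0 + gamma) / gamma
    return PHI(a * (abs_eps + c)) - PHI(a * (abs_eps - c))


def d2(gamma, abs_eps):
    """Diagnostic 2, formula (4.4)."""
    return PHI((math.sqrt(1.0 + gamma) - abs_eps) / math.sqrt(1.0 - gamma))


for abs_eps in (1.0, 2.0):  # below and above sqrt(2)
    for gamma in (0.9, 0.99, 0.9999):
        print(abs_eps, gamma, round(d1(gamma, abs_eps), 4), round(d2(gamma, abs_eps), 4))
```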

Case 5: $| ε_{i} |$ is fixed and $γ_{i} \to 0.$

The function $D_{1 i} (γ_{i}, | ε_{i} |)$ tends toward 1 for any fixed value of $| ε_{i} | .$ Therefore, diagnostic 1 favours the B estimator for small values of $γ_{i} .$

We note that $D_{2 i} (0, | ε_{i} |) = Φ (1 - | ε_{i} |)$. Therefore, contrary to Diagnostic 1, Diagnostic 2 leads to choosing the direct estimator if $| ε_{i} |$ is sufficiently large, even when $γ_{i}$ is arbitrarily close to 0. For example, with a decision threshold of 0.05 and $γ_{i} = 0$, Diagnostic 2 favours the direct estimator when $| ε_{i} | > 1 - Φ^{- 1} (0.05) \approx 2.64$.

In the first four cases above, both diagnostics lead to the same decision. They differ only in Case 5, where $γ_{i} \to 0$. We therefore expect Diagnostic 2 to choose the direct estimator more often than Diagnostic 1 for small values of $γ_{i}$. Consider, for example, a threshold of 0.5 for Diagnostic 1 and of 0.05 for Diagnostic 2. With a threshold of 0.5, it can be shown that Diagnostic 1 chooses the direct estimator as soon as $| ε_{i} |$ exceeds a value approximately equal to $\frac{\sqrt{1 + γ_{i}}}{γ_{i}}$, i.e., as soon as $| ε_{i} | \gtrsim \frac{\sqrt{1 + γ_{i}}}{γ_{i}}$ (at that value, the second term of (4.2) equals $Φ (0) = 0.5$ and the first term is close to 1). With a threshold of 0.05, Diagnostic 2 chooses the direct estimator as soon as $| ε_{i} | > \sqrt{1 + γ_{i}} - \sqrt{1 - γ_{i}}\, Φ^{- 1} (0.05)$. For $γ_{i} = 0.01$, Diagnostic 1 thus chooses the direct estimator when $| ε_{i} | \gtrsim 100.5$, while Diagnostic 2 does so when $| ε_{i} | > 2.64$. The gap narrows as $γ_{i}$ increases: for $γ_{i} = 0.2$, Diagnostic 1 chooses the direct estimator when $| ε_{i} | \gtrsim 5.48$ and Diagnostic 2 when $| ε_{i} | > 2.57$.

The above discussion suggests that Diagnostic 2 chooses the direct estimator more often than Diagnostic 1. However, there are cases where Diagnostic 1 chooses the direct estimator while Diagnostic 2 does not. These cases generally occur for fairly large values of $γ_{i}$. For example, for $γ_{i} = 0.8$, Diagnostic 1 chooses the direct estimator when $| ε_{i} | \gtrsim 1.68$, while Diagnostic 2 does so only when $| ε_{i} | > 2.08$.
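The cutoffs quoted above can be reproduced with a short sketch. For Diagnostic 1, it solves $D_{1 i} (γ_{i}, x) = 0.5$ by bisection, relying on the monotonicity established in Case 1, and compares the root with the approximation $\sqrt{1 + γ_{i}} / γ_{i}$; for Diagnostic 2, it uses the closed form $\sqrt{1 + γ_{i}} - \sqrt{1 - γ_{i}}\, Φ^{- 1} (0.05)$. The bisection bracket, iteration count and function names are illustrative choices.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.inv_cdf is Phi^{-1}


def d1(gamma, x):
    """Diagnostic 1, formula (4.2), with x = |eps_i|."""
    a = math.sqrt(gamma / (1.0 - gamma))
    c = math.sqrt(1.0 + gamma) / gamma
    return N.cdf(a * (x + c)) - N.cdf(a * (x - c))


def d1_cutoff(gamma, threshold=0.5):
    """|eps_i| at which D_1i falls to the threshold (D_1i decreases in |eps_i|)."""
    c = math.sqrt(1.0 + gamma) / gamma
    lo, hi = 0.0, 2.0 * c + 1.0  # D_1i(lo) > threshold > D_1i(hi) for the cases below
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if d1(gamma, mid) > threshold:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def d2_cutoff(gamma, level=0.05):
    """|eps_i| above which D_2i < level: sqrt(1+gamma) - sqrt(1-gamma) * Phi^{-1}(level)."""
    return math.sqrt(1.0 + gamma) - math.sqrt(1.0 - gamma) * N.inv_cdf(level)


for gamma in (0.01, 0.2, 0.8):
    approx = math.sqrt(1.0 + gamma) / gamma
    print(f"gamma={gamma}: D1 cutoff {d1_cutoff(gamma):.2f} (approx {approx:.2f}), "
          f"D2 cutoff {d2_cutoff(gamma):.2f}")
print(f"gamma=0: D2 cutoff {d2_cutoff(0.0):.2f}")
```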

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2022-01-06
