Survey Methodology

2 Consistency and asymptotic normality

Jun Shao, Eric Slud, Yang Cheng, Sheng Wang, and Carma Hogue

To consider asymptotics, we view the population $U$ as one of a sequence of populations $\{U^{(m)}, m = 1, 2, \dots\},$ where the number of units in $U^{(m)}$ increases to infinity as $m \to \infty .$ This paper treats only the case of strata in which a large sample $n_{h}$ is drawn; that is, we assume that for each stratum $h,$ the sample size $n_{h}$ depends on $m$ and increases to infinity as $m \to \infty,$ but we omit the index $m$ for simplicity. All limiting processes are considered as $m \to \infty .$ Following authors such as Isaki and Fuller (1982) and Deville and Särndal (1992), we term this a superpopulation asymptotic framework. Under the design-based framework considered in Section 2.1, the attribute vectors in the underlying populations need not be viewed as random vectors. However, under the model-assisted framework considered in Section 2.2, regression models are assumed for attribute vectors.

Since each estimator is a sum of independent estimators constructed within each stratum, for simplicity we present asymptotic results for the case of $H = 1.$ The results and conclusions immediately apply to the case of a fixed $H$ and can also be extended to the situation where $H$ increases to infinity. (It is typical for large-scale surveys to have many strata, although the number of ASPEP government-by-type strata that were split into substrata was somewhat less than 100.) Since we only consider $H = 1,$ we omit the index $h$ for stratum in this section, e.g., $n_{h j} = n_{j},$ $n_{h} = n,$ $N_{h j} = N_{j},$ and $N_{h} = N .$ Also, for $j = 1, 2,$ the estimators ${\hat{β}}_{j}$ and $\hat{β}$ are defined by the displayed formulas following equations (1.2) and (1.3), with subscript $h$ suppressed, together with

$\begin{array}{rcl} {\hat{μ}}_{x j} & = & {\hat{X}}_{j} / {\hat{N}}_{j}, \quad {\hat{α}}_{j} = {\hat{Y}}_{j} / {\hat{N}}_{j} - {\hat{β}}_{j} {\hat{μ}}_{x j}, \quad {\hat{σ}}_{x j}^{2} = {\hat{N}}_{j}^{- 1} \sum_{i \in S_{j}} π_{i}^{- 1} {(x_{i} - {\hat{μ}}_{x j})}^{2}, \\ {\hat{σ}}_{x e, j}^{2} & = & n_{j} \sum_{i \in S_{j}} {(x_{i} - {\hat{μ}}_{x j})}^{2} {(y_{i} - {\hat{α}}_{j} - {\hat{β}}_{j} x_{i})}^{2} / (π_{i}^{2} {\hat{N}}_{j}^{2}) . \end{array}$

Furthermore, for simplicity we consider asymptotics only under with-replacement sampling. The results can be applied to the case of without replacement sampling if the sampling fraction $n / N$ is negligible.
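As a rough illustration of the within-stratum quantities displayed above, the following Python sketch computes them from a weighted sample. It assumes that ${\hat{N}}_{j},$ ${\hat{X}}_{j}$ and ${\hat{Y}}_{j}$ are the weighted totals $\sum_{i \in S_{j}} π_{i}^{- 1},$ $\sum_{i \in S_{j}} π_{i}^{- 1} x_{i}$ and $\sum_{i \in S_{j}} π_{i}^{- 1} y_{i},$ and that ${\hat{β}}_{j}$ is the survey-weighted least-squares slope. Those definitions belong to Section 1 and are not reproduced here, so both are assumptions of the sketch; the function name `stratum_estimates` and its arguments are hypothetical.

```python
def stratum_estimates(x, y, w):
    """Within-stratum estimates from sampled values x, y and weights w = 1/pi_i."""
    n = len(x)
    # ASSUMPTION: weighted totals are used for N_hat_j, X_hat_j, Y_hat_j.
    N_hat = sum(w)
    X_hat = sum(wi * xi for wi, xi in zip(w, x))
    Y_hat = sum(wi * yi for wi, yi in zip(w, y))
    mu_x = X_hat / N_hat
    mu_y = Y_hat / N_hat
    # sigma_hat^2_{xj} = N_hat^{-1} * sum of w_i * (x_i - mu_x)^2
    sigma2_x = sum(wi * (xi - mu_x) ** 2 for wi, xi in zip(w, x)) / N_hat
    # ASSUMPTION: beta_hat_j is the survey-weighted least-squares slope.
    s_xy = sum(wi * (xi - mu_x) * (yi - mu_y)
               for wi, xi, yi in zip(w, x, y)) / N_hat
    beta = s_xy / sigma2_x
    alpha = mu_y - beta * mu_x
    # sigma_hat^2_{xe,j} = n_j * sum (x_i - mu_x)^2 * resid_i^2 / (pi_i^2 * N_hat^2)
    sigma2_xe = n * sum(
        (wi * (xi - mu_x) * (yi - alpha - beta * xi)) ** 2
        for wi, xi, yi in zip(w, x, y)
    ) / N_hat ** 2
    return {"N_hat": N_hat, "mu_x": mu_x, "mu_y": mu_y, "alpha": alpha,
            "beta": beta, "sigma2_x": sigma2_x, "sigma2_xe": sigma2_xe}


# Toy sample lying exactly on the line y = 2 + 3x, with equal weights.
est = stratum_estimates([1.0, 2.0, 3.0, 4.0], [5.0, 8.0, 11.0, 14.0], [2.0] * 4)
```

With data lying exactly on a line, the residuals vanish, so the sketch returns a zero value for ${\hat{σ}}_{x e, j}^{2}.$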

2.1 Design-based asymptotic framework

First, we establish the asymptotic normality of ${\hat{Y}}_{reg,1}$ and ${\hat{Y}}_{reg,2}$ under repeated sampling, that is, when $y_{i}$ and $x_{i}$ are fixed for $i \in U,$ and $S_{j}$ is a random PPS sample.

Theorem 1 Suppose that $S_{1}$ and $S_{2}$ are independent PPS samples with replacement from $U_{1}$ and $U_{2},$ respectively, where unit $i \in U_{j}$ has probability $p_{i j} = z_{i} / \sum_{i \in U_{j}} z_{i} > 0$ of being selected, and sampling weight $π_{i}^{- 1} = 1 / (n_{j} p_{i j})$ for $j = 1, 2,$ and that the following four conditions hold, as the population sequence index $m$ goes to $\infty .$

(C1) There exist constants $φ_{j}$ and $ω_{j}$ such that $\sqrt{n / n_{j}} \to φ_{j}$ and $N_{j} / N \to ω_{j} .$

(C2) For $j = 1, 2,$ there exist constants $μ_{y j}, μ_{x j}$ and $β_{j}$ such that

${\bar{Y}}_{j} = Y_{j} / N_{j} = \sum_{i \in U_{j}} y_{i} / N_{j} \to μ_{y j}, \quad {\bar{X}}_{j} = X_{j} / N_{j} = \sum_{i \in U_{j}} x_{i} / N_{j} \to μ_{x j},$

the limits $N_{j}^{- 1} \sum_{i \in U_{j}} {(x_{i} - μ_{x j})}^{2} \to σ_{x j}^{2} > 0$ exist, and in addition,

$(\sqrt{n_{j}} / N_{j}) \sum_{i \in U_{j}} x_{i} (y_{i} - Y_{j} / N_{j} - β_{j} (x_{i} - X_{j} / N_{j})) \to 0 \quad \text{as } n, N \to \infty .$

(C3) The limits $D_{N_{j}} = \sum_{i \in U_{j}} p_{i j} b_{i j} b_{i j}^{T} / N_{j}^{2} \to D_{j}$ exist, where for $i \in U_{j},$

$b_{i j} = {[1 / p_{i j} - N_{j}, x_{i} / p_{i j} - X_{j}, y_{i} / p_{i j} - Y_{j}]}^{T},$

$v^{T}$ denotes the transpose of a vector $v,$ and $D_{j}$ is positive definite. The limit $σ_{x e, j}^{2} = \lim N_{j}^{- 2} \sum_{i \in U_{j}} {(x_{i} - μ_{x j})}^{2} {(y_{i} - α_{j} - β_{j} x_{i})}^{2} / p_{i j}$ also exists, for $α_{j} = μ_{y j} - β_{j} μ_{x j} .$

(C4) The elements of $Λ_{j} = \sum_{i \in U_{j}} p_{i j} c_{i j} c_{i j}^{T} / N_{j}^{4}$ form a bounded sequence, where for $i \in U_{j},$

$c_{i j} = {[{(1 / p_{i j} - N_{j})}^{2}, {(x_{i} / p_{i j} - X_{j})}^{2}, {(y_{i} / p_{i j} - Y_{j})}^{2}]}^{T} .$

Then, as $m \to \infty,$ the following conclusions hold.

(a) For $j = 1, 2,$ ${\hat{μ}}_{x j} \to_{_{P}} μ_{x j}, {\hat{μ}}_{y j} \to_{_{P}} μ_{y j}, {\hat{β}}_{j} \to_{_{P}} β_{j}, {\hat{α}}_{j} \to_{_{P}} α_{j},$ and ${\hat{σ}}_{x j}^{2} \to_{_{P}} σ_{x j}^{2},$ where $\to_{_{P}}$ denotes convergence in probability.

(b) The combined-stratum estimator $\hat{β}$ has the exact expression

$\hat{β} = \frac{\sum_{j = 1}^{2} {\hat{β}}_{j} {\hat{σ}}_{x j}^{2} {\hat{N}}_{j} + ({\hat{X}}_{2} - {\hat{X}}_{1}) ({\hat{Y}}_{2} - {\hat{Y}}_{1}) {\hat{N}}_{1} {\hat{N}}_{2} / ({\hat{N}}_{1} + {\hat{N}}_{2})}{\sum_{j = 1}^{2} {\hat{σ}}_{x j}^{2} {\hat{N}}_{j} + {({\hat{X}}_{2} - {\hat{X}}_{1})}^{2} {\hat{N}}_{1} {\hat{N}}_{2} / ({\hat{N}}_{1} + {\hat{N}}_{2})} \qquad (2.1)$

and the in-probability limit

$β = \frac{\sum_{j = 1}^{2} β_{j} σ_{x j}^{2} ω_{j} + (μ_{x 2} - μ_{x 1}) (μ_{y 2} - μ_{y 1}) ω_{1} ω_{2}}{\sum_{j = 1}^{2} σ_{x j}^{2} ω_{j} + {(μ_{x 2} - μ_{x 1})}^{2} ω_{1} ω_{2}} .$

(c) $\sqrt{n_{j}} ({\hat{β}}_{j} - β_{j}) \to_{d} N (0, σ_{x e, j}^{2} / σ_{x j}^{4}),$ where $\to_{d}$ denotes convergence in distribution, and ${\hat{σ}}_{x e, j}^{2} \to_{_{P}} σ_{x e, j}^{2} .$

(d) For $k = 1, 2,$

$\sqrt{n} ({\hat{Y}}_{reg, k} - Y) / N \to_{d} N (0, σ_{k}^{2}) \qquad (2.2)$

where $σ_{k}^{2} = \sum_{j = 1}^{2} a_{k j}^{T} D_{j} a_{k j}$ and

$a_{1 j} = ω_{j} φ_{j} {[- (μ_{y} - β μ_{x}), - β, 1]}^{T}, \quad a_{2 j} = ω_{j} φ_{j} {[- (μ_{y j} - β_{j} μ_{x j}), - β_{j}, 1]}^{T},$

$μ_{x} = ω_{1} μ_{x 1} + ω_{2} μ_{x 2},$ $μ_{y} = ω_{1} μ_{y 1} + ω_{2} μ_{y 2},$ and $D_{j}$ is given in condition (C3).
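The variance formula in conclusion (d) can be sketched numerically by replacing the limit $D_{j}$ with its finite-population counterpart $D_{N_{j}}$ from condition (C3). The Python sketch below builds $D_{N_{j}}$ from population values and evaluates one term $a_{k j}^{T} D_{j} a_{k j}$ of $σ_{k}^{2}.$ The helper names (`design_matrix_D`, `a_vector`, `quad_form`) and the toy population are hypothetical, and using $D_{N_{j}}$ in place of $D_{j}$ is an approximation made for illustration only.

```python
def design_matrix_D(x, y, p):
    """Finite-population matrix D_{N_j} = sum_i p_i b_i b_i^T / N_j^2 from (C3)."""
    N = len(x)
    X = sum(x)
    Y = sum(y)
    D = [[0.0] * 3 for _ in range(3)]
    for xi, yi, pi in zip(x, y, p):
        # b_ij = [1/p_ij - N_j, x_i/p_ij - X_j, y_i/p_ij - Y_j]^T
        b = (1.0 / pi - N, xi / pi - X, yi / pi - Y)
        for r in range(3):
            for c in range(3):
                D[r][c] += pi * b[r] * b[c] / N ** 2
    return D


def a_vector(omega, phi, intercept, slope):
    """Vector omega * phi * [-intercept, -slope, 1]^T, as in a_{1j} and a_{2j}."""
    return (-omega * phi * intercept, -omega * phi * slope, omega * phi)


def quad_form(a, D):
    """Quadratic form a^T D a, one stratum's contribution to sigma_k^2."""
    return sum(a[r] * D[r][c] * a[c] for r in range(3) for c in range(3))


# Toy population (hypothetical): sizes z proportional to x, y exactly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [1.0, 2.0, 3.0, 4.0, 5.0]
p = [zi / sum(z) for zi in z]
y_linear = [2.0 + 3.0 * xi for xi in x]
term_linear = quad_form(a_vector(0.5, 1.4, 2.0, 3.0),
                        design_matrix_D(x, y_linear, p))
y_curved = [xi ** 2 for xi in x]
term_curved = quad_form(a_vector(0.5, 1.4, 2.0, 3.0),
                        design_matrix_D(x, y_curved, p))
```

In this toy case the term for the exactly linear population is zero up to rounding, since $a_{2 j}^{T} b_{i j}$ reduces to a function of the regression residuals, while the curved population gives a strictly positive term.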

The conditions (C1)-(C4) of Theorem 1 provide a general formulation of the superpopulation framework for large-sample design-based statistical inference, within which the survey regression coefficients estimate well-defined frame-population descriptive parameters. The results in parts (a)-(b) show that the in-probability limits $β_{j}, α_{j}$ of ${\hat{β}}_{j}, {\hat{α}}_{j}$ have the standard interpretation as superpopulation least-squares slopes and intercepts. (These slope and intercept parameters also keep their usual model-based interpretations under the model (2.7) introduced in Section 2.2.) The asymptotic distribution theory for ${\hat{β}}_{j}$ in conclusion (c) allows us to deduce the large-sample behavior of ${\hat{Y}}_{dec}$ from that provided in (d) for ${\hat{Y}}_{reg, k} .$

Under the further conditions

$β_{1} = β_{2}, \quad α_{1} = α_{2}, \qquad (2.3)$

it is clear from Theorem 1(b) that $β_{j} = β$ and $σ_{1}^{2} = σ_{2}^{2}$ in (2.2), so that ${\hat{Y}}_{reg,1},$ ${\hat{Y}}_{reg,2}$ and ${\hat{Y}}_{dec}$ are all asymptotically the same up to remainders of smaller order than $N / \sqrt{n},$ as we now show. Also, if $β_{1} \neq β_{2},$ then the test of equality of slopes rejects with probability tending to 1, i.e., $P ({\hat{Y}}_{dec} = {\hat{Y}}_{reg,2}) \to 1,$ so that ${\hat{Y}}_{reg,2} - {\hat{Y}}_{dec}$ continues to be $o_{P} (N / \sqrt{n}) .$ Therefore ${\hat{Y}}_{dec}$ has the same asymptotic distribution as ${\hat{Y}}_{reg,2},$ which is more efficient than ${\hat{Y}}_{reg,1}$ according to the result in Section 2.2.
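The claim that condition (2.3) forces $β_{j} = β$ can be checked numerically from the limit formula in Theorem 1(b). Under a common line, $μ_{y 2} - μ_{y 1} = β_{1} (μ_{x 2} - μ_{x 1}),$ so the numerator is $β_{1}$ times the denominator. The Python sketch below evaluates that limit; the function name `combined_slope_limit` and the numerical inputs are hypothetical.

```python
def combined_slope_limit(beta, sigma2_x, omega, mu_x, mu_y):
    """In-probability limit of the combined slope from Theorem 1(b).

    Each argument is a pair indexed by substratum j = 1, 2.
    """
    between = omega[0] * omega[1]
    dx = mu_x[1] - mu_x[0]
    dy = mu_y[1] - mu_y[0]
    num = sum(b * s * w for b, s, w in zip(beta, sigma2_x, omega)) + dx * dy * between
    den = sum(s * w for s, w in zip(sigma2_x, omega)) + dx * dx * between
    return num / den


# Common line y = 1 + 2x in both substrata, i.e. condition (2.3) holds.
mu_x = (1.0, 4.0)
mu_y = (1.0 + 2.0 * mu_x[0], 1.0 + 2.0 * mu_x[1])
beta_common = combined_slope_limit((2.0, 2.0), (0.5, 2.0), (0.3, 0.7), mu_x, mu_y)

# Equal x-means: the limit is a variance-weighted average of the two slopes.
beta_avg = combined_slope_limit((1.0, 3.0), (1.0, 1.0), (0.5, 0.5),
                                (2.0, 2.0), (0.0, 5.0))
```

When the substratum means of $x$ coincide, the between-substratum terms drop out and the limit is a weighted average of $β_{1}$ and $β_{2}$ with weights $σ_{x j}^{2} ω_{j},$ which is what the second call illustrates.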

Theorem 2 Assume the same hypotheses (C1)-(C4) as in Theorem 1.

(a) When (2.3) holds, then as $m \to \infty$

$\sqrt{n} ({\hat{β}}_{2} - {\hat{β}}_{1}) \to_{d} N (0, σ_{d}^{2}), \quad σ_{d}^{2} = \sum_{j = 1}^{2} \frac{σ_{x e, j}^{2}}{φ_{j}^{2} σ_{x j}^{4}}, \qquad (2.4)$

and the estimators ${\hat{Y}}_{reg,1}, {\hat{Y}}_{reg,2}$ and ${\hat{Y}}_{dec}$ are all asymptotically normally distributed and equivalent in the sense that

$\frac{n}{N^{2}} [{({\hat{Y}}_{reg,1} - {\hat{Y}}_{reg,2})}^{2} + {({\hat{Y}}_{reg,2} - {\hat{Y}}_{dec})}^{2}] \to_{_{P}} 0 \qquad (2.5)$

(b) When $β_{1} \neq β_{2},$ $P ({\hat{Y}}_{dec} = {\hat{Y}}_{reg,2}) \to 1$ and $\sqrt{n} ({\hat{Y}}_{dec} - Y) / N \to_{_{d}} N (0, σ_{2}^{2}) .$

A more refined study of the asymptotic behavior of the estimator ${\hat{Y}}_{dec}$ can be undertaken in the spirit of Saleh (2006), as with contiguous or Pitman alternatives for non-survey statistical models, by assuming that $\sqrt{n} (β_{1} - β_{2}) \to r$ for a constant $r .$ Under this assumption, it can be shown that ${\hat{Y}}_{reg,1} - {\hat{Y}}_{reg,2} = o_{P} (N / \sqrt{n})$ and, therefore, the three centered and scaled estimators $\sqrt{n} ({\hat{Y}}_{dec} - Y),$ $\sqrt{n} ({\hat{Y}}_{reg,2} - Y),$ and $\sqrt{n} ({\hat{Y}}_{reg,1} - Y)$ all have the same asymptotic normal distribution with mean 0. Furthermore,

$P ({\hat{Y}}_{dec} = {\hat{Y}}_{reg,2}) \to Φ (- z_{τ / 2} + r / σ_{d}) + Φ (- z_{τ / 2} - r / σ_{d}), \qquad (2.6)$

where $σ_{d}^{2}$ is given in (2.4), and $z_{τ / 2}$ and $Φ$ are respectively the standard normal percentage point and distribution function. Thus, $P ({\hat{Y}}_{dec} = {\hat{Y}}_{reg,2})$ has a limit different from 1. In particular, the limit in (2.6) equals $τ$ when $β_{1} = β_{2}$ (i.e., when $r = 0$).
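To make (2.4) and (2.6) concrete, the following Python sketch evaluates $σ_{d}^{2}$ and the limiting probability that the decision-based estimator selects ${\hat{Y}}_{reg,2}.$ It takes $z_{τ / 2}$ to be the upper $τ / 2$ point of the standard normal distribution, which is an assumption about the convention used; the function names and inputs are hypothetical.

```python
from statistics import NormalDist


def sigma_d_squared(sigma2_xe, sigma2_x, phi):
    """sigma_d^2 = sum_j sigma_{xe,j}^2 / (phi_j^2 * sigma_{xj}^4), as in (2.4)."""
    return sum(s_xe / (f ** 2 * s_x ** 2)
               for s_xe, s_x, f in zip(sigma2_xe, sigma2_x, phi))


def limit_prob_select_reg2(r, sigma_d, tau):
    """Right-hand side of (2.6) under the local alternative sqrt(n)(beta_1 - beta_2) -> r."""
    std = NormalDist()
    # ASSUMPTION: z_{tau/2} is the upper tau/2 quantile of N(0, 1).
    z = std.inv_cdf(1.0 - tau / 2.0)
    shift = r / sigma_d
    return std.cdf(-z + shift) + std.cdf(-z - shift)


sd2 = sigma_d_squared((1.0, 4.0), (1.0, 2.0), (1.0, 2.0))   # 1/1 + 4/16
p_null = limit_prob_select_reg2(0.0, sd2 ** 0.5, 0.05)       # r = 0
p_far = limit_prob_select_reg2(50.0, sd2 ** 0.5, 0.05)       # large |r|
```

Setting $r = 0$ recovers the level $τ$ of the test, and the limit approaches 1 as $| r |$ grows, in line with Theorem 2(b).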

2.2 Model-assisted asymptotic setting

In this section we examine the behavior of the estimators ${\hat{Y}}_{reg, k}, {\hat{Y}}_{dec}$ under the assumed probabilistic model that the triples $(x_{i}, y_{i}, z_{i})$ in the finite population, $i \in U_{j},$ are independent and identically distributed (iid), where the size-variables $z_{i} > 0$ are used in defining PPS with-replacement draw probabilities $p_{i j} = z_{i} / \sum_{i^{'} \in U_{j}} z_{i^{'}},$ and where $x_{i}$ and $y_{i}$ follow the model

$y_{i} = α_{j} + β_{j} x_{i} + ε_{i}, \quad i \in U_{j}, \qquad (2.7)$

with $α_{j}$ and $β_{j}$ as unknown intercept and slope parameters for the regression within stratum $U_{j} .$ The errors $ε_{i}, i \in U_{j},$ are assumed to be iid with mean 0 and finite variance $σ_{ε}^{2}$ and to be independent of $(x_{i}, z_{i}),$ and the variables $x_{i}$ for $i \in U_{j}$ are assumed to have finite variance. Also, to enable PPS sampling, we assume that $\max_{i \in U_{j}} n_{j} p_{i j} < 1$ with probability approaching 1 for large $m,$ i.e., for large $n_{j}, N_{j} .$
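The setup just described can be sketched in Python as follows: generate a substratum under model (2.7), draw a PPS with-replacement sample with probabilities $p_{i j},$ and attach the weights $1 / (n_{j} p_{i j}).$ The uniform and normal distributions chosen for $x_{i},$ $z_{i}$ and $ε_{i}$ are illustrative assumptions only, as are all function names.

```python
import random


def simulate_stratum(N, alpha, beta, sigma_eps, rng):
    """Generate (x, y, z) for one substratum under model (2.7).

    ASSUMPTION: the uniform/normal choices are illustrative, not from the paper.
    """
    x = [rng.uniform(0.0, 10.0) for _ in range(N)]
    z = [rng.uniform(1.0, 2.0) for _ in range(N)]        # size variables z_i > 0
    y = [alpha + beta * xi + rng.gauss(0.0, sigma_eps) for xi in x]
    return x, y, z


def pps_wr_sample(z, n, rng):
    """Draw n units with replacement with probabilities p_i = z_i / sum(z)."""
    total = sum(z)
    p = [zi / total for zi in z]
    idx = rng.choices(range(len(z)), weights=z, k=n)
    w = [1.0 / (n * p[i]) for i in idx]                  # sampling weights 1/(n p_i)
    return idx, w, p


def pps_feasible(p, n):
    """Check the condition max_i n * p_i < 1 stated in the text."""
    return max(n * pi for pi in p) < 1.0


rng = random.Random(2024)
x, y, z = simulate_stratum(200, alpha=1.0, beta=2.0, sigma_eps=0.5, rng=rng)
idx, w, p = pps_wr_sample(z, n=20, rng=rng)
```

When all sizes $z_{i}$ are equal, every weight reduces to $N_{j} / n_{j},$ so the weights sum to $N_{j}$ exactly.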

In this section, asymptotic properties of estimators ${\hat{Y}}_{reg, k}, {\hat{Y}}_{dec}$ are considered with respect to the regression model and repeated sampling. By Theorem 1, the model-assisted estimators ${\hat{Y}}_{reg,1}$ and ${\hat{Y}}_{reg,2}$ are still consistent and asymptotically normal for triples $(x_{i}, y_{i}, z_{i})$ iid within strata, since the conditions (C1)-(C4) are satisfied under moment assumptions on $z_{i}, 1 / z_{i}$ even if model (2.7) is incorrect. However, the estimators ${\hat{Y}}_{reg, k}$ are efficient when model (2.7) is correct.

Theorem 3 Assume model (2.7) along with (C1), with $E (x_{i}^{4}) < \infty, E (ε_{i}^{4}) < \infty,$ $E (z_{i}) < \infty,$ and $E ((1 + x_{i}^{4}) / z_{i}^{3}) < \infty .$ Then all conclusions in Theorem 1 and Theorem 2 still hold. In particular, when $β_{1} \neq β_{2},$ the asymptotic variance $σ_{1}^{2}$ of $\sqrt{n} ({\hat{Y}}_{reg,1} - Y) / N$ is larger than the asymptotic variance $σ_{2}^{2}$ of $\sqrt{n} ({\hat{Y}}_{reg,2} - Y) / N .$ Furthermore,

$\sqrt{n} ({\hat{Y}}_{dec} - Y) / N \to_{d} N (0, (1 - π) σ_{1}^{2} + π σ_{2}^{2}), \qquad (2.8)$

where $π$ is the limit of $P ({\hat{Y}}_{dec} = {\hat{Y}}_{reg,2}) .$

Note that $π$ in (2.8) is equal to 1 when $β_{1} \neq β_{2}$ and equal to $τ$ when $β_{1} = β_{2} .$
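The two cases for $π$ can be written out as a small Python sketch of the variance in (2.8). The function name and the numerical inputs are hypothetical, and $τ$ denotes the level of the test of equal slopes as in (2.6).

```python
def dec_asymptotic_variance(sigma1_sq, sigma2_sq, slopes_equal, tau):
    """Variance (1 - pi) * sigma_1^2 + pi * sigma_2^2 from (2.8).

    pi is the limit of P(Y_dec = Y_reg,2): tau if beta_1 = beta_2, else 1.
    """
    pi_lim = tau if slopes_equal else 1.0
    return (1.0 - pi_lim) * sigma1_sq + pi_lim * sigma2_sq


# Unequal slopes: the variance coincides with sigma_2^2.
v_unequal = dec_asymptotic_variance(4.0, 2.5, slopes_equal=False, tau=0.05)
# Equal slopes: weight 1 - tau on sigma_1^2 and tau on sigma_2^2.
v_equal = dec_asymptotic_variance(1.0, 3.0, slopes_equal=True, tau=0.1)
```

If condition (2.3) holds, then $σ_{1}^{2} = σ_{2}^{2}$ and the expression does not depend on $π.$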

According to Theorem 3, under model (2.7), all three estimators defined in (1.2)-(1.4) have the same asymptotic efficiency when $α_{1} = α_{2}$ and $β_{1} = β_{2}$ (condition (2.3)). Furthermore, ${\hat{Y}}_{reg,1}$ is asymptotically worse than ${\hat{Y}}_{reg,2}$ when $β_{1} \neq β_{2} .$ Thus, why would we not always use ${\hat{Y}}_{reg,2}$?

The assertions in Theorem 3 are first-order asymptotic results. A more refined, second-order asymptotic result under the conditions in Theorem 3 and condition (2.3) when the sizes $z_{i}$ are all equal is that, up to a term of order $n_{1}^{- 2} + n_{2}^{- 2},$

$\operatorname{mse} \left( \frac{{\hat{Y}}_{reg,1}}{N} \right) - \frac{σ_{ε}^{2}}{n} \leq \left[ \operatorname{mse} \left( \frac{{\hat{Y}}_{reg,2}}{N} \right) - \frac{σ_{ε}^{2}}{n} \right] \left[ 1 - \frac{n_{1} n_{2} {({\bar{X}}_{1} - {\bar{X}}_{2})}^{2}}{n D_{n}} \right], \qquad (2.9)$

where mse is the mean squared error conditional on the $x_{i}$'s, ${\bar{X}}_{j} = N_{j}^{- 1} \sum_{i \in U_{j}} x_{i},$ and

$D_{n} = \sum_{j = 1}^{2} \sum_{i \in U_{j}} {(x_{i} - {\bar{X}}_{j})}^{2} + \frac{n_{1} n_{2} {({\bar{X}}_{1} - {\bar{X}}_{2})}^{2}}{n} .$

Result (2.9) indicates that, when weights are equal and $β_{1} = β_{2}$ and $α_{1} = α_{2},$ the finite sample performance of ${\hat{Y}}_{reg,1}$ may be better than that of ${\hat{Y}}_{reg,2}$ for moderate $n_{1}$ and $n_{2}$ . See the simulation results in Section 4. The proof of (2.9) is a special case of a more general result in Slud (2012) and, thus, is omitted.
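The size of the multiplicative factor in (2.9) can be explored with a short Python sketch that computes $D_{n}$ and $1 - n_{1} n_{2} {({\bar{X}}_{1} - {\bar{X}}_{2})}^{2} / (n D_{n})$ from the $x$-values of the two substrata. The sketch assumes $n = n_{1} + n_{2},$ which matches the single-stratum setting of this section but is not stated next to (2.9); the function name and data are hypothetical.

```python
def second_order_factor(x1, x2, n1, n2):
    """Factor 1 - n1*n2*(Xbar1 - Xbar2)^2 / (n * D_n) appearing in (2.9)."""
    n = n1 + n2                      # ASSUMPTION: total sample size in the stratum
    xbar1 = sum(x1) / len(x1)
    xbar2 = sum(x2) / len(x2)
    # Within-substratum sums of squares over U_1 and U_2.
    within = (sum((xi - xbar1) ** 2 for xi in x1)
              + sum((xi - xbar2) ** 2 for xi in x2))
    between = n1 * n2 * (xbar1 - xbar2) ** 2 / n
    D_n = within + between
    # n1*n2*(Xbar1 - Xbar2)^2 / (n * D_n) equals between / D_n.
    return 1.0 - between / D_n


factor_apart = second_order_factor([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], 2, 3)
factor_same = second_order_factor([1.0, 2.0, 3.0], [0.0, 2.0, 4.0], 2, 3)
```

The factor equals 1 when the two substratum means of $x$ coincide and falls below 1 as they separate, which is where (2.9) suggests a possible finite-sample advantage for ${\hat{Y}}_{reg,1}.$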

In applications, we do not know whether $β_{1} = β_{2} .$ Hence, the decision-based estimator ${\hat{Y}}_{dec}$ is an adaptive procedure to select a good estimator. In view of (2.8), the performance of ${\hat{Y}}_{dec}$ is close to (slightly worse than) that of ${\hat{Y}}_{reg,2}$ when $β_{1} \neq β_{2},$ and is close to (slightly worse than) that of ${\hat{Y}}_{reg,1}$ when $α_{1} = α_{2}$ and $β_{1} = β_{2} .$ This is also supported by the simulation results in Section 4.

Date modified: 2017-09-20