
5. The proposed estimator and its variance estimation

Alina Matei and M. Giovanna Ranalli

Recall that we have a variable of particular interest $y_{j}$ and that item nonresponse is present for it. If we wish to estimate the population total $Y_{j}$ of $y_{j},$ then a naive estimator that corrects neither for unit nor for item nonresponse is given by

${\hat{Y}}_{j, \text{naive}} = N \, \frac{\sum_{k \in r_{j}} y_{k j} / π_{k}}{\sum_{k \in r_{j}} 1 / π_{k}} . \qquad (5.1)$
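As a minimal sketch of Estimator (5.1), the following Python function computes $N$ times the design-weighted mean of $y_{j}$ over the item responders $r_{j}.$ The function name, argument names and toy numbers are illustrative assumptions, not part of the original paper.

```python
def naive_total(y, incl_prob, N):
    """Estimator (5.1): N times the weighted mean of y over item responders.

    y         -- observed values y_kj for k in r_j
    incl_prob -- inclusion probabilities pi_k for the same units
    N         -- known population size
    """
    if len(y) != len(incl_prob):
        raise ValueError("y and incl_prob must have the same length")
    numerator = sum(yk / pk for yk, pk in zip(y, incl_prob))
    denominator = sum(1.0 / pk for pk in incl_prob)
    return N * numerator / denominator


# Toy data (hypothetical): three item responders.
y_resp = [2.0, 4.0, 6.0]
pi_resp = [0.5, 0.5, 0.25]
estimate = naive_total(y_resp, pi_resp, N=100)
```

Because the weights are normalized by their own sum, no correction for unit or item nonresponse enters the computation; the estimator simply treats the item responders as if they were the full sample.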

Reweighting item responders is also an approach to handle item nonresponse. Moustaki and Knott (2000) propose to weight item responders by the inverse of the fitted probability of item response ${\hat{q}}_{k ℓ},$ assuming ${\hat{q}}_{k ℓ} > 0.$ Therefore, a possible adjustment weight for item and unit nonresponse associated with unit $k \in r_{j}$ is given by $1 / ({\hat{p}}_{k} {\hat{q}}_{k j}) .$ We propose using the three-phase estimator adjusted for item and unit nonresponse via reweighting given by

${\hat{Y}}_{j, p q} = \sum_{k \in r_{j}} \frac{y_{k j}}{π_{k} {\hat{p}}_{k} {\hat{q}}_{k j}}, \qquad (5.2)$

where ${\hat{p}}_{k}$ is provided by Model (4.4), and ${\hat{q}}_{k j}$ by Model (4.2). Proposals that use imputation of $y_{k j}$ values for $k \in r \setminus r_{j}$ to deal with item nonresponse were also considered but are not reported for reasons of space. They are available from the authors upon request.
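Estimator (5.2) can be sketched in the same way. The fitted probabilities ${\hat{p}}_{k}$ and ${\hat{q}}_{k j}$ are taken as given inputs here; fitting Models (4.4) and (4.2) is outside the scope of this illustration, and all names and numbers below are assumptions made for the example.

```python
def reweighted_total(y, incl_prob, p_hat, q_hat):
    """Estimator (5.2): sum over r_j of y_kj / (pi_k * p_hat_k * q_hat_kj)."""
    if not (len(y) == len(incl_prob) == len(p_hat) == len(q_hat)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for yk, pik, pk, qk in zip(y, incl_prob, p_hat, q_hat):
        # The text assumes strictly positive fitted probabilities.
        if pik <= 0 or pk <= 0 or qk <= 0:
            raise ValueError("probabilities must be strictly positive")
        total += yk / (pik * pk * qk)
    return total


# Toy data (hypothetical): two item responders.
y_resp = [2.0, 4.0]
pi_resp = [0.5, 0.25]
p_fit = [0.8, 0.5]   # fitted unit response probabilities
q_fit = [0.5, 1.0]   # fitted item response probabilities
y_pq = reweighted_total(y_resp, pi_resp, p_fit, q_fit)
```

Each responder's weight is the product of three inverse probabilities, one per phase, which is why small fitted values of ${\hat{p}}_{k}$ or ${\hat{q}}_{k j}$ can inflate individual weights considerably.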

The properties of the proposed estimator (5.2) depend on the assumptions made about the unit and the item nonresponse mechanisms. In particular, Estimator (5.2) assumes a second phase of sampling with unknown response probabilities. If we ignore estimation of $θ_{k}$ in Model (4.4), the results in Kim and Kim (2007) on design consistency of the two-phase estimator that uses estimated response probabilities hold here as well when considering maximum likelihood estimates for the parameters $α_{0}$ and $α_{1} .$ Again, ignoring estimation of the latent variable $θ_{k}$ and using marginal maximum likelihood estimates for the parameters $β_{ℓ 0}$ and $β_{ℓ 1}$ in Model (4.2), estimator ${\hat{Y}}_{j, p q}$ will be consistent if the models for unit and item nonresponse probabilities are correctly specified.

We can consider replication methods for variance estimation of the proposed estimator and combine proposals for two-phase sampling (Kim, Navarro and Fuller 2006) and for generalized calibration in the presence of nonresponse (Kott 2006). In particular, the replicate variance estimator can be written as

${\hat{V}}_{r} = \sum_{l = 1}^{L} c_{l} {({\hat{Y}}_{j, p q}^{(l)} - {\hat{Y}}_{j, p q})}^{2},$

where ${\hat{Y}}_{j, p q}^{(l)}$ is the $l^{th}$ version of ${\hat{Y}}_{j, p q}$ based on the observations included in the $l^{th}$ replicate, $L$ is the number of replications, and $c_{l}$ is a factor associated with replicate $l$ that is determined by the replication method. The $l^{th}$ replicate of ${\hat{Y}}_{j, p q}$ can be written as ${\hat{Y}}_{j, p q}^{(l)} = \sum_{k \in r_{j}} w_{3 k}^{(l)} y_{k j},$ where $w_{3 k}^{(l)}$ denotes the replicate weight for the $k^{th}$ unit in the $l^{th}$ replication. These replicate weights are computed using a two-step procedure.
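The replicate variance formula can be sketched as below. The replicate weights in the toy example are plain delete-one jackknife weights with $c_{l} = (n - 1) / n,$ used only as placeholders: they are an assumption of this illustration and do not include the recalibration described in the two-step procedure.

```python
def replicate_variance(y, w_full, w_reps, c):
    """V_r = sum_l c_l * (Y_hat^(l) - Y_hat)^2.

    y      -- values y_kj for k in r_j
    w_full -- full-sample weights w_3k
    w_reps -- list of replicate weight vectors w_3k^(l), one per replicate
    c      -- list of replication factors c_l
    """
    y_hat = sum(w * yk for w, yk in zip(w_full, y))
    variance = 0.0
    for c_l, w_l in zip(c, w_reps):
        y_hat_l = sum(w * yk for w, yk in zip(w_l, y))
        variance += c_l * (y_hat_l - y_hat) ** 2
    return variance


# Toy data (hypothetical): n = 3 units, equal full-sample weights.
y_resp = [1.0, 2.0, 3.0]
w3 = [2.0, 2.0, 2.0]
n = len(y_resp)

# Placeholder delete-one jackknife weights: drop unit l, rescale the rest.
w3_reps = [
    [0.0 if k == l else w3[k] * n / (n - 1) for k in range(n)]
    for l in range(n)
]
c_factors = [(n - 1) / n] * n
v_r = replicate_variance(y_resp, w3, w3_reps, c_factors)
```

In this toy case the result coincides with the textbook variance estimate $N^{2} s^{2} / n$ for an expanded total without finite population correction, which gives a quick sanity check on the formula.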

First, note that, if we ignore for the moment the presence of item nonresponse, the two-phase estimator ${\hat{Y}}_{j, p} = \sum_{k \in r} w_{2 k} y_{k j}$ has weights

$w_{2 k} = 1 / (π_{k} p_{k}) = w_{1 k} F ({\hat{θ}}_{k}; α_{0}, α_{1}),$

with $w_{1 k} = 1 / π_{k}$ and $F ({\hat{θ}}_{k}; α_{0}, α_{1}) = 1 + \exp (- (α_{0} + α_{1} {\hat{θ}}_{k}))$ (see Equation (4.4)). Let ${\hat{z}}_{1} = \sum_{k \in s} w_{1 k} z_{1 k}$ be the first-phase estimate of the total of the variable $z_{1},$ defined as $z_{1 k} = π_{k} p_{k} {(1, {\hat{θ}}_{k})}^{T} .$ Then, the parameters $α_{0}$ and $α_{1}$ are such that

$\sum_{k \in r} w_{1 k} F ({\hat{θ}}_{k}; α_{0}, α_{1}) z_{1 k} = {\hat{z}}_{1} . \qquad (5.3)$
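Equation (5.3) can be checked numerically. If $p_{k}$ in $z_{1 k}$ is evaluated at the same $(α_{0}, α_{1})$ on both sides (an assumption of this sketch), the design weights cancel and (5.3) reduces to $\sum_{k \in r} (1, {\hat{θ}}_{k}) = \sum_{k \in s} p_{k} (1, {\hat{θ}}_{k}).$ The code below solves these two equations by Newton-Raphson and then evaluates both sides of (5.3). The function names and toy data are hypothetical.

```python
import math


def response_prob(theta_k, a0, a1):
    """Logistic response probability p_k; F in the text equals 1 / p_k."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * theta_k)))


def fit_alpha(theta, responded, max_iter=50, tol=1e-10):
    """Newton-Raphson for sum_s (R_k - p_k) * (1, theta_k) = 0."""
    a0, a1 = 0.0, 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for t, r in zip(theta, responded):
            p = response_prob(t, a0, a1)
            u0 += r - p                 # equation for the intercept
            u1 += (r - p) * t           # equation for the slope
            v = p * (1.0 - p)
            i00 += v
            i01 += v * t
            i11 += v * t * t
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        a0, a1 = a0 + d0, a1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return a0, a1


def calibration_gap(theta, responded, incl_prob, a0, a1):
    """Left side minus right side of (5.3), one entry per component of z_1k."""
    gap = [0.0, 0.0]
    for t, r, pik in zip(theta, responded, incl_prob):
        p = response_prob(t, a0, a1)
        w1 = 1.0 / pik
        z1 = (pik * p, pik * p * t)     # z_1k = pi_k * p_k * (1, theta_k)
        F = 1.0 / p                     # = 1 + exp(-(a0 + a1 * theta_k))
        for i in range(2):
            if r:                       # left side sums over respondents only
                gap[i] += w1 * F * z1[i]
            gap[i] -= w1 * z1[i]        # right side sums over the full sample
    return gap


# Toy sample (hypothetical): latent scores, unit response indicators, pi_k.
theta = [-2.0, -1.0, 0.0, 1.0, 2.0, -1.5, 0.5, 1.5]
responded = [0, 1, 0, 1, 1, 0, 1, 0]
incl_prob = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1, 0.2]

a0_hat, a1_hat = fit_alpha(theta, responded)
gap = calibration_gap(theta, responded, incl_prob, a0_hat, a1_hat)
```

At the fitted values both components of `gap` are zero up to numerical error, and the inclusion probabilities play no role in the solution.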

This procedure is equivalent to obtaining unweighted maximum likelihood estimates, but it is convenient to cast it as a non-linear generalized calibration problem. In this way, it is possible to use the approach in Kott (2006), combined with that in Kim et al. (2006), to obtain replicate weights using the following steps.

Step 1: Compute the first-phase estimate of the total of $z_{1 k}$ with the $l^{th}$ observation deleted, i.e., ${\hat{z}}_{1}^{(l)} = \sum_{k \in s} w_{1 k}^{(l)} z_{1 k},$ where $w_{1 k}^{(l)}$ is the classical jackknife replication weight for unit $k$ in replication $l .$ Compute the jackknife weights for the second-phase sampling using ${\hat{z}}_{1}^{(l)}$ as a benchmark. In particular, $w_{2 k}^{(l)}$ are chosen to be $w_{2 k}^{(l)} = w_{2 k} w_{1 k}^{(l)} F ({\hat{θ}}_{k}; α_{0}, α_{1}) / w_{1 k}$ with $α_{0}$ and $α_{1}$ such that

$\sum_{k \in r} w_{2 k}^{(l)} z_{1 k} = {\hat{z}}_{1}^{(l)} .$

This procedure provides weights that are very similar to those considered in Kott (2006) and can be computed using existing software that handles generalized calibration.

Item nonresponse is handled similarly by considering $w_{3 k} = 1 / (π_{k} p_{k} q_{k j}) = w_{2 k} F ({\hat{θ}}_{k}; β_{j 0}, β_{j 1})$ (compare Equation (4.3)). A major approximation here is to assume that, given ${\hat{θ}}_{k},$ parameters $β_{j 0}$ and $β_{j 1}$ are estimated using a classical logistic model (instead of a 2PL model) and are such that

$\sum_{k \in r_{j}} w_{2 k} F ({\hat{θ}}_{k}; β_{j 0}, β_{j 1}) z_{2 k} = {\hat{z}}_{2},$

where ${\hat{z}}_{2} = \sum_{k \in r} w_{2 k} z_{2 k}$ and $z_{2 k} = π_{k} p_{k} q_{k j} {(1, {\hat{θ}}_{k})}^{T} .$ Another drawback is that auxiliary variables $z_{2 k}$ depend on $j$ and, therefore, different sets of weights have to be produced for the different variables of interest.

Step 2: Third phase jackknife weights are obtained by first computing the second phase estimate of the total of $z_{2 k}$ with unit $l$ removed by using weights coming from Step 1, i.e., ${\hat{z}}_{2}^{(l)} = \sum_{k \in r} w_{2 k}^{(l)} z_{2 k} .$ Then, using ${\hat{z}}_{2}^{(l)}$ as a benchmark, $w_{3 k}^{(l)}$ are chosen to be $w_{3 k}^{(l)} = w_{3 k} w_{2 k}^{(l)} F ({\hat{θ}}_{k}; β_{j 0}, β_{j 1}) / w_{2 k}$ with $β_{j 0}$ and $β_{j 1}$ computed via

$\sum_{k \in r_{j}} w_{3 k}^{(l)} z_{2 k} = {\hat{z}}_{2}^{(l)} .$


Date modified: 2015-11-27


Survey Methodology
