Multiple imputation of missing values in household data with structural zeros
Section 3. Handling missing data using the NDPMPM
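We modify the Gibbs sampler for the truncated NDPMPM to incorporate missing data. For $i = 1, \ldots, n$, let $\mathbf{a}_i$ be a vector with $a_{ik} = 1$ when household-level variable $k$ in $\mathbf{X}_i$ is missing, and $a_{ik} = 0$ otherwise. For $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$, let $\mathbf{b}_{ij}$ be a vector with $b_{ijk} = 1$ when individual-level variable $k$ for individual $j$ in $\mathbf{X}_i$ is missing, and $b_{ijk} = 0$ otherwise. For each household $i$, let $\mathbf{X}_i = (\mathbf{X}_i^{\text{obs}}, \mathbf{X}_i^{\text{mis}})$, where $\mathbf{X}_i^{\text{obs}}$ comprises all data values corresponding to $a_{ik} = 0$ and $b_{ijk} = 0$, and $\mathbf{X}_i^{\text{mis}}$ comprises all data values corresponding to $a_{ik} = 1$ and $b_{ijk} = 1$. We assume that the data are missing at random (Rubin, 1976).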
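As a minimal sketch of this bookkeeping, the following Python builds the household-level and individual-level missingness indicators for one household and splits its values into observed values and positions of missing ones. The `MISSING` marker, the list-based layout, and the function names are assumptions made for illustration; they are not part of the NDPMPM specification.

```python
MISSING = None  # hypothetical marker for a missing entry


def missingness_indicators(household_values, individual_values):
    """Return indicators that are 1 where a value is missing and 0 otherwise.

    household_values: list of household-level values for one household.
    individual_values: list of lists, one inner list per individual.
    """
    a = [1 if v is MISSING else 0 for v in household_values]
    b = [[1 if v is MISSING else 0 for v in person] for person in individual_values]
    return a, b


def split_household(household_values, individual_values):
    """Split one household into observed values and positions of missing ones."""
    a, b = missingness_indicators(household_values, individual_values)
    observed = [v for v, flag in zip(household_values, a) if flag == 0]
    missing_positions = [("hh", k) for k, flag in enumerate(a) if flag == 1]
    for j, (person, flags) in enumerate(zip(individual_values, b)):
        observed += [v for v, flag in zip(person, flags) if flag == 0]
        missing_positions += [("ind", j, k) for k, flag in enumerate(flags) if flag == 1]
    return observed, missing_positions


# Toy household: two household-level variables, two individuals with three variables each.
a_i, b_i = missingness_indicators([1, MISSING], [[2, MISSING, 1], [MISSING, 3, 1]])
observed_i, missing_i = split_household([1, MISSING], [[2, MISSING, 1], [MISSING, 3, 1]])
```

Here `a_i` and each row of `b_i` play the role of the two indicator vectors, and `missing_i` lists the entries that the sampler would need to impute.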
To incorporate missing values in the Gibbs sampler, we need to sample from the full conditional of each variable in $\mathbf{X}_i^{\text{mis}}$, conditioned on the observed variables (those for which $a_{ik} = 0$ and $b_{ijk} = 0$), at every iteration. Thus, we add the ninth step,
- S9. For $i = 1, \ldots, n$, sample $\mathbf{X}_i^{\text{mis}}$ from its full conditional distribution.
Sampling from this conditional distribution is nontrivial because of the dependence among variables induced by the structural zero rules in each $\mathcal{S}_h$, the set of impossible combinations for households of size $h$. Because of the dependence, we cannot simply sample each variable independently using the likelihoods in (2.3) and (2.4). If we could generate the set of all possible completions for all households with missing entries, conditional on the observed values, then calculating the probability of each one and sampling from the set would be straightforward. Unfortunately, this approach is not practical when the set of possible completions is large. Even when it is modest, each household could have a different set of completions, necessitating significant computing, storage, and memory requirements.
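To see how quickly brute-force enumeration grows, the sketch below counts the candidate completions of one household before any structural zero rule is applied, and enumerates the valid ones for a tiny case. The level counts and the validity rule in the usage lines are hypothetical, chosen only to illustrate the combinatorics.

```python
from itertools import product
from math import prod


def n_candidate_completions(levels):
    """Number of ways to fill the missing entries, ignoring structural zeros.

    levels: number of categories of each missing entry.
    """
    return prod(levels)


def enumerate_valid_completions(levels, is_valid):
    """Brute-force list of completions that pass the validity check."""
    grids = [range(1, d + 1) for d in levels]
    return [c for c in product(*grids) if is_valid(c)]


# Hypothetical household: four individuals, each missing a 100-level and a 13-level variable.
n_big = n_candidate_completions([100, 13] * 4)  # 1300 ** 4 candidates

# Tiny case with a made-up rule: the two missing values may not be equal.
valid_small = enumerate_valid_completions([2, 3], lambda c: c[0] != c[1])
```

Enumeration is feasible for `valid_small` but not for a case like `n_big`, and the enumerated set would differ from household to household because it depends on which entries are observed.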
However, the full conditional in S9 takes a form similar to the kernel of the truncated NDPMPM in (2.1), so that we can generate the desired samples through a second rejection sampling scheme. Essentially, we sample from an untruncated version of the full conditional until we obtain a valid sample, that is, one that satisfies $\mathbf{X}_i \notin \mathcal{S}_h$; see the Appendix for a proof that this rejection sampling scheme results in a valid Gibbs sampler. Notice that since this proposal distribution is itself untruncated, we can generate samples from it by sampling each variable independently using (2.3) and (2.4). We therefore replace step S9 with S9'.
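- S9'. For $i = 1, \ldots, n$, sample $\mathbf{X}_i^{\text{mis}}$ as follows.
  - (a) For each missing household-level variable, that is, each household-level variable $k$ with $a_{ik} = 1$, sample $X_{ik}$ using (2.3).
  - (b) For each missing individual-level variable, that is, each variable $k$ of individual $j$ with $b_{ijk} = 1$, sample $X_{ijk}$ using (2.4).
  - (c) Set the sampled household-level and individual-level values to $\mathbf{X}_i^{\text{mis}\star}$.
  - (d) Combine $\mathbf{X}_i^{\text{mis}\star}$ with the observed $\mathbf{X}_i^{\text{obs}}$, that is, set $\mathbf{X}_i^{\star} = (\mathbf{X}_i^{\text{obs}}, \mathbf{X}_i^{\text{mis}\star})$. If $\mathbf{X}_i^{\star} \notin \mathcal{S}_h$, set $\mathbf{X}_i^{\text{mis}} = \mathbf{X}_i^{\text{mis}\star}$; otherwise, return to step (9'a).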
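The rejection step for a single household can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: `probs_for` is a hypothetical callable standing in for the probabilities in (2.3) and (2.4) evaluated at the current latent classes and parameters, `is_impossible` is a hypothetical check of the structural zero rules, and the `max_tries` guard is an addition of this sketch.

```python
import random


def sample_categorical(probs, rng):
    """Draw a level in 1..len(probs) with the given probabilities."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


def impute_household(x_obs, missing_keys, probs_for, is_impossible, rng, max_tries=100_000):
    """Propose the missing entries independently until the household is valid.

    x_obs: dict mapping variable key -> observed value.
    missing_keys: keys of the entries to impute.
    probs_for: key -> probability vector (stand-in for (2.3) and (2.4)).
    is_impossible: completed household dict -> True if a structural zero rule is violated.
    """
    for _ in range(max_tries):
        # Draw every missing entry independently of the others.
        x_mis_star = {key: sample_categorical(probs_for(key), rng) for key in missing_keys}
        # Combine the proposal with the observed values.
        x_star = {**x_obs, **x_mis_star}
        # Accept only if the completed household is possible; otherwise propose again.
        if not is_impossible(x_star):
            return x_mis_star
    raise RuntimeError("no valid completion found within max_tries")


# Toy usage with a made-up rule: the head's age category cannot be 1.
draw = impute_household(
    x_obs={"spouse_age": 3},
    missing_keys=["head_age"],
    probs_for=lambda key: [0.5, 0.3, 0.2],
    is_impossible=lambda hh: hh["head_age"] == 1,
    rng=random.Random(0),
)
```

Because each proposal is drawn variable by variable, no set of completions is ever enumerated or stored; the cost is instead the number of rejected proposals, which grows when valid completions have low probability under the untruncated distribution.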
To initialize each $\mathbf{X}_i^{\text{mis}}$, we suggest sampling from the empirical marginal distribution of each variable using the available cases for each variable, and requiring that the household satisfies $\mathbf{X}_i \notin \mathcal{S}_h$.
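One way to carry out this initialization is sketched below. The dictionary layout, the use of `None` for missing entries, the function names, and the `max_tries` guard are all assumptions of this sketch; the validity rule in the usage lines is made up.

```python
import random
from collections import Counter


def empirical_marginal(column):
    """Available-case relative frequencies of one variable (None marks missing)."""
    counts = Counter(v for v in column if v is not None)
    total = sum(counts.values())
    return {level: count / total for level, count in counts.items()}


def initialize_household(record, marginals, is_impossible, rng, max_tries=10_000):
    """Fill the None entries of `record` from the marginals until the household is valid."""
    for _ in range(max_tries):
        filled = dict(record)
        for key, value in record.items():
            if value is None:
                levels = list(marginals[key])
                weights = [marginals[key][level] for level in levels]
                filled[key] = rng.choices(levels, weights=weights, k=1)[0]
        if not is_impossible(filled):
            return filled
    raise RuntimeError("no valid initial completion found within max_tries")


# Toy data: one variable observed in three of four households.
marginals = {"head_age": empirical_marginal([2, 3, 3, None])}
start = initialize_household(
    {"head_age": None, "spouse_age": 3},
    marginals,
    is_impossible=lambda hh: hh["head_age"] < 2,  # made-up rule
    rng=random.Random(0),
)
```

Starting every household at a valid completion means the first Gibbs iteration already conditions on data that respect the structural zero rules.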