Multiple imputation of missing values in household data with structural zeros
Section 5. Empirical study

To evaluate the performance of the NDPMPM as an imputation method, as well as the speed-up strategies, we use data from the public use microdata files from the 2012 ACS, available for download from the United States Census Bureau (http://www2.census.gov/acs2012_1yr/pums/). We construct a population of 764,580 households of sizes $\mathcal{H} = \{2, 3, 4\}$, from which we sample $n = 5{,}000$ households comprising $N = 13{,}181$ individuals. We work with the variables described in Table 5.1, which mimic those in the U.S. decennial census. The structural zeros involve ages and relationships of individuals in the same household; see the Appendix for a full list of rules that we used. We move the household head to the household level as in Section 4.1 to take advantage of the computational gains.

We introduce missing values using the following scenario. We let household size and age of household heads be fully observed. We randomly and independently blank 30% of each variable for the remaining household-level variables. For individuals other than the household head, we randomly and independently blank 30% of the values for gender, race and Hispanic origin. We make age missing with rates 50%, 20%, 40% and 30% for values of the relationship variable in the sets {2}, {3, 4, 5, 10}, {7, 9} and {6, 8, 11, 12, 13}, respectively. We make the relationship variable missing with rates 40%, 25%, 10% and 55% for values of age in the sets $\{x : x \le 20\}$, $\{x : 20 < x \le 50\}$, $\{x : 50 < x \le 70\}$ and $\{x : x > 70\}$, respectively. This results in approximately 30% missing values for both variables. About 8% of the individuals in the sample are missing both the age and relationship variable, and 2% are missing gender, age, and relationship jointly. This mechanism results in data that technically are not missing at random, but we use the NDPMPM approach regardless to examine its potential in a complicated missingness mechanism. Actual rates of item nonresponse in census data tend to be smaller than what we use here, but we use high rates to put the NDPMPM through a challenging stress test. We also introduce missing values using a missing completely at random scenario with rates in the 10% range across all the variables. In short, the results are similar to those here, though more accurate due to the lower rates of missingness. See the Appendix for the results.
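The value-dependent part of this mechanism can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names, the record layout (a list of `(age, relationship)` tuples) and the seed are hypothetical, and the sketch assumes that the two missingness indicators are drawn independently given the true values and that the age thresholds apply to the age values as recorded.

```python
import random

# Rates at which age is blanked, by relationship code (sets quoted in the text).
AGE_RATE_GROUPS = [
    ({2}, 0.50),
    ({3, 4, 5, 10}, 0.20),
    ({7, 9}, 0.40),
    ({6, 8, 11, 12, 13}, 0.30),
]


def age_missing_rate(relationship):
    """Return the probability of blanking age for a given relationship code."""
    for codes, rate in AGE_RATE_GROUPS:
        if relationship in codes:
            return rate
    raise ValueError(f"unexpected relationship code: {relationship}")


def relationship_missing_rate(age):
    """Return the probability of blanking relationship for a given age value."""
    if age <= 20:
        return 0.40
    if age <= 50:
        return 0.25
    if age <= 70:
        return 0.10
    return 0.55


def blank_age_and_relationship(records, seed=0):
    """Blank age and relationship for a list of (age, relationship) tuples.

    Assumption: each indicator is drawn independently, with a rate that
    depends on the true value of the other variable. Blanked entries
    are returned as None.
    """
    rng = random.Random(seed)
    blanked = []
    for age, relationship in records:
        drop_age = rng.random() < age_missing_rate(relationship)
        drop_rel = rng.random() < relationship_missing_rate(age)
        blanked.append((None if drop_age else age,
                        None if drop_rel else relationship))
    return blanked
```

Because each rate depends on the true value of a variable that may itself be blanked, a mechanism of this form is not missing at random, which is the point made in the paragraph above.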


Table 5.1
Description of variables used in the study. “HH” means household head
| Description of variable | Categories |
| --- | --- |
| Household-level variables | |
| Ownership of dwelling | 1 = owned or being bought, 2 = rented |
| Household size | 2 = 2 people, 3 = 3 people, 4 = 4 people |
| Gender of HH | 1 = male, 2 = female |
| Race of HH | 1 = white, 2 = black, 3 = American Indian or Alaska native, 4 = Chinese, 5 = Japanese, 6 = other Asian/Pacific islander, 7 = other race, 8 = two major races, 9 = three or more major races |
| Hispanic origin of HH | 1 = not Hispanic, 2 = Mexican, 3 = Puerto Rican, 4 = Cuban, 5 = other |
| Age of HH | 1 = less than one year old, 2 = 1 year old, 3 = 2 years old, ..., 96 = 95 years old |
| Individual-level variables | |
| Gender | same as “Gender of HH” |
| Race | same as “Race of HH” |
| Hispanic origin | same as “Hispanic origin of HH” |
| Age | same as “Age of HH” |
| Relationship to head of household | 1 = spouse, 2 = biological child, 3 = adopted child, 4 = stepchild, 5 = sibling, 6 = parent, 7 = grandchild, 8 = parent-in-law, 9 = child-in-law, 10 = other relative, 11 = boarder, roommate or partner, 12 = other non-relative or foster child |

We estimate the NDPMPM using two approaches, both using the rejection step S9' in Section 3. The first approach considers $\psi_2 = \psi_3 = \psi_4 = 1$, i.e., without using the cap-and-weight approach, while the second approach considers $\psi_2 = \psi_3 = 1/2$ and $\psi_4 = 1/3$. For each approach, we run the MCMC sampler for 10,000 iterations, discarding the first 5,000 as burn-in and thinning the remaining samples every five iterations, resulting in 1,000 MCMC post burn-in iterates. We set $F = 30$ and $S = 15$ for each approach based on initial tuning runs. Across the approaches, the effective number of occupied household-level clusters usually ranges from 13 to 16 with a maximum of 25, while the effective number of occupied individual-level clusters across all household-level clusters ranges from 3 to 5 with a maximum of 10. For convergence, we examined trace plots of $\alpha$, $\beta$, and weighted averages of a random sample of the multinomial probabilities in (2.3) and (2.4) (since the multinomial probabilities themselves are prone to label switching).
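As a quick check of the sampling schedule, the following Python sketch lists which iterations survive burn-in and thinning. The function name is hypothetical, and the choice to keep the last iteration of each block of five is an assumption; the text states only the burn-in length and the thinning interval.

```python
def retained_iterations(total=10_000, burn_in=5_000, thin=5):
    """Return the 1-based iteration indices kept after burn-in and thinning.

    Assumption: within each block of `thin` post burn-in iterations,
    the last one is kept.
    """
    return list(range(burn_in + thin, total + 1, thin))


kept = retained_iterations()
print(len(kept))  # number of post burn-in iterates retained
```

With the defaults, the retained indices run from 5,005 to 10,000 in steps of five, which gives the 1,000 iterates quoted above.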

For both methods, we generate $L = 50$ completed datasets, $\mathbf{Z} = (\mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(50)})$, using the posterior predictive distribution of the NDPMPM, from which we estimate all marginal distributions, bivariate distributions of all possible pairs of variables, and trivariate distributions of all possible triplets of variables. We also estimate several probabilities that depend on within-household relationships and the household head to investigate the performance of the NDPMPM in estimating complex relationships. We obtain confidence intervals using multiple imputation inferences (Rubin, 1987). As a brief review, let $q$ be the completed-data point estimator of some estimand $Q$, and let $u$ be the estimator of variance associated with $q$. For $l = 1, \ldots, L$, let $q^{(l)}$ and $u^{(l)}$ be the values of $q$ and $u$ in completed dataset $\mathbf{Z}^{(l)}$. We use $\bar{q}_L = \sum_{l=1}^{L} q^{(l)} / L$ as the point estimate of $Q$. We use $T_L = (1 + 1/L)\, b_L + \bar{u}_L$ as the estimated variance of $\bar{q}_L$, where $b_L = \sum_{l=1}^{L} (q^{(l)} - \bar{q}_L)^2 / (L - 1)$ and $\bar{u}_L = \sum_{l=1}^{L} u^{(l)} / L$. We make inference about $Q$ using $(\bar{q}_L - Q) \sim t_v(0, T_L)$, where $t_v$ is a $t$-distribution with $v = (L - 1)\left(1 + \bar{u}_L / [(1 + 1/L)\, b_L]\right)^2$ degrees of freedom.
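The combining rules reviewed above translate directly into code. The Python sketch below is illustrative only: the function name and the list-based inputs are hypothetical, and the handling of the case $b_L = 0$ (infinite degrees of freedom) is a convention added here, not something stated in the text.

```python
import math


def combine_mi(q_values, u_values):
    """Combine completed-data estimates with the multiple imputation rules.

    q_values: point estimates q^(l), one per completed dataset.
    u_values: variance estimates u^(l), one per completed dataset.
    Returns (q_bar, T, v): point estimate, total variance, degrees of freedom.
    """
    L = len(q_values)
    if L < 2 or len(u_values) != L:
        raise ValueError("need at least two datasets and matching lengths")
    q_bar = sum(q_values) / L                               # \bar{q}_L
    u_bar = sum(u_values) / L                               # \bar{u}_L
    b = sum((q - q_bar) ** 2 for q in q_values) / (L - 1)   # b_L
    T = (1 + 1 / L) * b + u_bar                             # T_L
    if b == 0:
        # Added convention: no between-imputation variance, so v is unbounded.
        v = math.inf
    else:
        v = (L - 1) * (1 + u_bar / ((1 + 1 / L) * b)) ** 2
    return q_bar, T, v
```

A 95% interval then takes the form $\bar{q}_L \pm t_{v, 0.975} \sqrt{T_L}$; the required quantile is not in the Python standard library, but it can be obtained from, for example, `scipy.stats.t.ppf(0.975, v)`.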

Figures 5.1 and 5.2 display the value of $\bar{q}_{50}$ for each estimated marginal, bivariate and trivariate probability plotted against its corresponding estimate from the original data, without missing values. Figure 5.1 shows the results for the NDPMPM with the rejection sampler, and Figure 5.2 shows the results for the NDPMPM using the cap-and-weight approach. For both approaches, the point estimates are close to those from the data before introducing missing values, suggesting that the NDPMPM does a good job of capturing important features of the joint distribution of the variables. Figure 5.2 in particular also shows that the cap-and-weight approach did not degrade the estimates.

Table 5.2 displays 95% confidence intervals for several probabilities involving within-household relationships, as well as the value in the full population of 764,580 households. The intervals include the two based on the NDPMPM imputation engines and the interval from the data before introducing missingness. For the latter, we use the usual Wald interval, $\hat{p} \pm 1.96 \sqrt{\hat{p}(1 - \hat{p}) / n}$, where $\hat{p}$ is the corresponding sample proportion. For the most part, the intervals from the NDPMPM with the full rejection sampling are close to those based on the data without any missingness, and they tend to include the true population quantity. The NDPMPM imputation engine results in noticeable downward bias for the percentages of households where everyone is the same race, with bias increasing as the household size gets bigger. This is a challenging estimand to estimate accurately via imputation, particularly for larger households. Hu et al. (2018) identified biases in the same direction when using the NDPMPM (with household head data treated as individual-level variables) to generate fully synthetic data, noting that the bias gets smaller as the sample size increases. The NDPMPM fits the joint distribution of the data increasingly well as the sample size grows. Hence, we expect the NDPMPM imputation engine to be more accurate with larger sample sizes, as well as with smaller fractions of missing values.
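For reference, the Wald interval used for the data without missing values can be computed as in the short Python sketch below. The function name and the example inputs are hypothetical and are not taken from the study.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Return the Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    if not 0.0 <= p_hat <= 1.0 or n <= 0:
        raise ValueError("p_hat must lie in [0, 1] and n must be positive")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical example: a sample proportion of 0.5 from 100 units.
lower, upper = wald_interval(0.5, 100)
```

The interval is symmetric around the sample proportion and narrows at rate $1/\sqrt{n}$, which is one reason the intervals for rare household types in Table 5.2 are wide relative to their point values.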

The interval estimates from the cap-and-weight method are generally similar to those for the full rejection sampler, with some degradation particularly for the percentages of same race households by household size. This degradation comes with a benefit, however. Based on MCMC runs on a standard laptop, the NDPMPM using the cap-and-weight approach and moving household heads’ data values to the household level is about 42% faster than the NDPMPM with household heads’ data values moved to the household level.

Figure 5.1 Marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets from the truncated NDPMPM with the rejection sampler. Household heads’ data values moved to the household level

Description for Figure 5.1 

Figure presenting the marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets from the truncated NDPMPM with the rejection sampler (household heads’ data values moved to the household level). There are three scatter plots with a 45° straight line. The three graphs illustrate the marginal, bivariate and trivariate probabilities respectively. The average from 50 imputed datasets is on the y-axis, ranging from 0.0 to 1.0. The sample estimate is on the x-axis, ranging from 0.0 to 0.6. For all three graphs, estimations from imputed data are close to those from the sample, almost on the line.

Figure 5.2 Marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets from the truncated NDPMPM using the cap-and-weight approach. Household heads’ data values moved to the household level

Description for Figure 5.2 

Figure presenting the marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets from the truncated NDPMPM using the cap-and-weight approach (household heads’ data values moved to the household level). There are three scatter plots with a 45° straight line. The three graphs illustrate the marginal, bivariate and trivariate probabilities respectively. The average from 50 imputed datasets is on the y-axis, ranging from 0.0 to 1.0. The sample estimate is on the x-axis, ranging from 0.0 to 0.6. For all three graphs, estimations from imputed data are close to those from the sample, almost on the line. The cap-and-weight approach did not degrade the estimates.


Table 5.2
Confidence intervals for selected probabilities that depend on within-household relationships in the original and imputed datasets. “No missing” is based on the sampled data before introducing missing values, “NDPMPM” uses the truncated NDPMPM, moving household heads’ data values to the household level, and “NDPMPM Capped” uses the truncated NDPMPM with the cap-and-weight approach and moving household heads’ data values to the household level. “HH” means household head, “SP” means spouse, “CH” means child, and “CP” means couple. Q is the value in the full population of 764,580 households
| | Q | No Missing | NDPMPM | NDPMPM Capped |
| --- | --- | --- | --- | --- |
| All same race household: $n_i = 2$ | 0.942 | (0.932, 0.949) | (0.891, 0.917) | (0.884, 0.911) |
| All same race household: $n_i = 3$ | 0.908 | (0.907, 0.937) | (0.843, 0.890) | (0.821, 0.870) |
| All same race household: $n_i = 4$ | 0.901 | (0.879, 0.917) | (0.793, 0.851) | (0.766, 0.828) |
| SP present | 0.696 | (0.682, 0.707) | (0.695, 0.722) | (0.695, 0.722) |
| Same race CP | 0.656 | (0.641, 0.668) | (0.640, 0.669) | (0.634, 0.664) |
| SP present, HH is White | 0.600 | (0.589, 0.616) | (0.603, 0.632) | (0.604, 0.634) |
| White CP | 0.580 | (0.569, 0.596) | (0.577, 0.606) | (0.574, 0.604) |
| CP with age difference less than five | 0.488 | (0.465, 0.492) | (0.341, 0.371) | (0.324, 0.355) |
| Male HH, home owner | 0.476 | (0.456, 0.484) | (0.450, 0.479) | (0.451, 0.480) |
| HH over 35, no CH present | 0.462 | (0.441, 0.468) | (0.442, 0.470) | (0.443, 0.471) |
| At least one biological CH present | 0.437 | (0.431, 0.458) | (0.430, 0.459) | (0.428, 0.456) |
| HH older than SP, White HH | 0.322 | (0.309, 0.335) | (0.307, 0.339) | (0.311, 0.343) |
| Adult female w/ at least one CH under 5 | 0.078 | (0.070, 0.085) | (0.062, 0.078) | (0.061, 0.077) |
| White HH with Hisp origin | 0.066 | (0.064, 0.078) | (0.062, 0.079) | (0.062, 0.078) |
| Non-White CP, home owner | 0.058 | (0.050, 0.063) | (0.038, 0.052) | (0.037, 0.051) |
| Two generations present, Black HH | 0.057 | (0.053, 0.066) | (0.052, 0.066) | (0.052, 0.067) |
| Black HH, home owner | 0.052 | (0.046, 0.058) | (0.044, 0.058) | (0.044, 0.059) |
| SP present, HH is Black | 0.039 | (0.032, 0.042) | (0.032, 0.044) | (0.031, 0.043) |
| White-nonwhite CP | 0.034 | (0.029, 0.039) | (0.038, 0.053) | (0.043, 0.059) |
| Hisp HH over 50, home owner | 0.029 | (0.025, 0.034) | (0.023, 0.034) | (0.024, 0.034) |
| One grandchild present | 0.028 | (0.023, 0.033) | (0.024, 0.035) | (0.023, 0.035) |
| Adult Black female w/ at least one CH under 18 | 0.027 | (0.028, 0.038) | (0.025, 0.036) | (0.025, 0.036) |
| At least two generations present, Hisp CP | 0.027 | (0.022, 0.031) | (0.022, 0.032) | (0.023, 0.033) |
| Hisp CP with at least one biological CH | 0.025 | (0.020, 0.028) | (0.019, 0.029) | (0.020, 0.030) |
| At least three generations present | 0.023 | (0.020, 0.028) | (0.017, 0.026) | (0.017, 0.026) |
| Only one parent | 0.020 | (0.016, 0.024) | (0.013, 0.021) | (0.013, 0.021) |
| At least one stepchild | 0.019 | (0.018, 0.026) | (0.019, 0.030) | (0.019, 0.030) |
| Adult Hisp male w/ at least one CH under 10 | 0.018 | (0.017, 0.025) | (0.014, 0.022) | (0.014, 0.022) |
| At least one adopted CH, White CP | 0.008 | (0.005, 0.010) | (0.004, 0.010) | (0.004, 0.011) |
| Black CP with at least two biological children | 0.006 | (0.003, 0.007) | (0.003, 0.007) | (0.003, 0.007) |
| Black HH under 40, home owner | 0.005 | (0.005, 0.009) | (0.006, 0.013) | (0.007, 0.013) |
| Three generations present, White CP | 0.005 | (0.004, 0.008) | (0.004, 0.010) | (0.004, 0.009) |
| White HH under 25, home owner | 0.003 | (0.002, 0.005) | (0.003, 0.007) | (0.003, 0.007) |
