Multiple imputation of missing values in household data with structural zeros
Section 6. Discussion

The empirical study suggests that the NDPMPM can provide high-quality imputations for categorical data nested within households. To our knowledge, this is the first parametric imputation engine for nested multivariate categorical data. The study also illustrates that, with modest sample sizes, agencies should not expect the NDPMPM to preserve all features of the joint distribution. Of course, this is the case with any imputation engine. For the NDPMPM, agencies may be able to improve accuracy for targeted quantities by recoding the data used to fit the model. For example, one can create a new household-level variable that equals one when everyone in the household has the same race and equals zero otherwise, and replace the individual race variable with a new variable that has levels “1 = race is the same as race of household head”, “2 = race is white and differs from race of household head”, “3 = race is black and differs from race of household head”, and so on. The NDPMPM would be estimated with the household-level same-race variable and the new individual-level race variable. This would encourage the NDPMPM to estimate the percentage of households in which everyone has the same race very accurately, as it would be just another household-level variable like home ownership. It also would add structural zeros involving race to the computation. Evaluating the trade-offs in accuracy and computational costs of such recodings is a topic for future research.
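As a rough illustration of the recoding described above, the following Python sketch recodes race within a single household. The function name `recode_household_race`, the convention that the head is listed first, and the choice to keep the head's own race as a separate value are assumptions made for this example; they are not part of the study.

```python
def recode_household_race(races):
    """Recode race within one household (hypothetical helper).

    races: original race codes (1 = white, 2 = black, ...), with the
           household head assumed to be listed first.
    Returns (same_race, head_race, recoded_others), where
      same_race      = 1 if everyone shares the head's race, else 0
      head_race      = the head's original race code, kept unchanged
      recoded_others = for every non-head member, 1 if the race equals
                       the head's race, otherwise original code + 1
                       (2 = white and differs, 3 = black and differs, ...)
    """
    head_race = races[0]
    same_race = 1 if all(r == head_race for r in races) else 0
    recoded_others = [1 if r == head_race else r + 1 for r in races[1:]]
    return same_race, head_race, recoded_others


# Toy household: head is black (2), second member is white (1), third is black (2).
example = recode_household_race([2, 1, 2])
```

Under this coding, a household with `same_race` equal to 1 and any recoded member level other than 1 cannot occur, which is one example of the additional structural zeros mentioned above.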

The NDPMPM can be computationally expensive, even with the speed-ups presented in this article. The expensive parts of the algorithm are the rejection sampling steps. Fortunately, these can be done easily by parallel processing. For example, we can require each processor to generate a fraction of the impossible cases in Section 2.2. We also can spread the rejection steps for the imputations over many processors. These steps should cut run time by a factor roughly equal to the number of processors available.
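The idea of spreading the imputation rejection steps over workers can be sketched as follows. This is a minimal illustration, not the implementation used in the study: `impute_one`, `impute_all`, the toy proposal and the toy feasibility check are all hypothetical, and a thread pool is used only to keep the example self-contained. For CPU-bound work in Python, `ProcessPoolExecutor` (which offers the same `map` interface) would be the more natural choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def impute_one(task):
    """Rejection step for one household: propose until the draw is feasible."""
    seed, propose, is_feasible = task
    rng = random.Random(seed)  # separate random stream per household
    while True:
        candidate = propose(rng)
        if is_feasible(candidate):
            return candidate


def impute_all(n_households, propose, is_feasible, n_workers=4, base_seed=0):
    """Distribute the per-household rejection steps over n_workers workers."""
    tasks = [(base_seed + i, propose, is_feasible) for i in range(n_households)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(impute_one, tasks))


# Toy stand-ins (assumptions): a household is (head_age, child_age) and is
# feasible if the head is at least 16 and at least 7 years older than the child.
def toy_propose(rng):
    return (rng.randint(0, 95), rng.randint(0, 95))


def toy_is_feasible(household):
    head_age, child_age = household
    return head_age >= 16 and head_age - child_age >= 7


imputed = impute_all(20, toy_propose, toy_is_feasible, n_workers=4)
```

Because each household has its own seed, the result does not depend on the number of workers, which makes runs reproducible.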

The empirical study used households up to size four. We have run the model on data with households up to size seven in reasonable time (a few hours on a standard laptop). Accuracy results are similar qualitatively. As the household sizes get large, the model can generate hundreds or even thousands of times as many impossible households as there are feasible ones, slowing the algorithm. In such cases, the cap-and-weight approach is essential for practical applications.

Acknowledgements

This research was supported by grants from the National Science Foundation (NSF SES 1131897) and the Alfred P. Sloan Foundation (G-2-15-20166003).

Appendix

This is an Appendix to the paper. It contains a proof that the rejection sampling step S9' in Section 3 generates samples from the correct posterior distribution. It also contains the modified Gibbs sampler for the cap-and-weight approach and a list of the structural zero rules used in fitting the NDPMPM model. Finally, we include empirical results for the speedup approaches mentioned in the paper, using synthetic data, and additional results for handling missing data using the NDPMPM under a missing completely at random scenario.

A.1 Proof that the rejection sampling step S9' in Section 3 generates samples from the correct posterior distribution

The $X_{ik}^{1}$ and $X_{ijk}^{1}$ values generated using the rejection sampler in Step S9' are generated from the full conditionals, resulting in a valid Gibbs sampler. The proof follows from the properties of rejection sampling (or simple accept reject). The target distribution is the full conditional for $\mathbf{X}_i^{\text{mis}}$. It can be re-expressed as

$$p(\mathbf{X}_i^{\text{mis}}) = \frac{\mathbb{1}\{\mathbf{X}_i^{1} \notin \mathcal{S}_h\}}{\Pr(\mathbf{X}_i \notin \mathcal{S}_h \mid \theta)} \, g(\mathbf{X}_i^{\text{mis}})$$

where

$$g(\mathbf{X}_i^{\text{mis}}) = \pi_{G_i^{1}} \prod_{k \mid a_{ik} = 1}^{p+q} \lambda^{(k)}_{G_i^{1} X_{ik}^{1}} \left( \prod_{j=1}^{n_i} \omega_{G_i^{1} M_{ij}^{1}} \prod_{k \mid b_{ijk} = 1}^{p} \phi^{(k)}_{G_i^{1} M_{ij}^{1} X_{ijk}^{1}} \right).$$

Our rejection scheme uses $g(\mathbf{X}_i^{\text{mis}})$ as a proposal for $p(\mathbf{X}_i^{\text{mis}})$. To show that the draws are indeed from $p(\mathbf{X}_i^{\text{mis}})$, we need to verify that $w(\mathbf{X}_i^{\text{mis}}) = p(\mathbf{X}_i^{\text{mis}}) / g(\mathbf{X}_i^{\text{mis}}) < M$, where $1 < M < \infty$, and that we are accepting each sample with probability $w(\mathbf{X}_i^{\text{mis}}) / M$. In our case,

  1. $w(\mathbf{X}_i^{\text{mis}}) = p(\mathbf{X}_i^{\text{mis}}) / g(\mathbf{X}_i^{\text{mis}}) = \mathbb{1}\{\mathbf{X}_i^{1} \notin \mathcal{S}_h\} / \Pr(\mathbf{X}_i \notin \mathcal{S}_h \mid \theta) \leq 1 / \Pr(\mathbf{X}_i \notin \mathcal{S}_h \mid \theta)$, and $0 < \Pr(\mathbf{X}_i \notin \mathcal{S}_h \mid \theta) < 1 \Rightarrow 1 < 1 / \Pr(\mathbf{X}_i \notin \mathcal{S}_h \mid \theta) < \infty$ necessarily.
  2. By sampling until we obtain a valid sample that satisfies $\mathbf{X}_i^{1} \notin \mathcal{S}_h$, we are indeed sampling with probability $w(\mathbf{X}_i^{\text{mis}}) / M = \mathbb{1}\{\mathbf{X}_i^{1} \notin \mathcal{S}_h\}$.
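The argument above can be checked numerically on a toy example. The sketch below assumes the bound is taken as $M = 1/\Pr(\mathbf{X}_i \notin \mathcal{S}_h \mid \theta)$, so that a proposal is accepted exactly when it falls outside the impossible set. The four-point proposal `g_probs` and the set `impossible` are made-up stand-ins for $g$ and $\mathcal{S}_h$, not quantities from the paper.

```python
import random
from collections import Counter


def rejection_draw(rng, values, g_probs, impossible):
    """Draw from the proposal g until the value lies outside the impossible set."""
    while True:
        x = rng.choices(values, weights=g_probs, k=1)[0]
        if x not in impossible:  # accept with probability 1{x not in S}
            return x


values = ["a", "b", "c", "d"]
g_probs = [0.4, 0.3, 0.2, 0.1]  # toy proposal g (assumption)
impossible = {"b"}              # toy stand-in for S_h (assumption)

# Target: p(x) = 1{x not in S} * g(x) / Pr(X not in S)
pr_valid = sum(g for v, g in zip(values, g_probs) if v not in impossible)
target = {
    v: (0.0 if v in impossible else g / pr_valid)
    for v, g in zip(values, g_probs)
}

rng = random.Random(1)
n_draws = 20000
counts = Counter(
    rejection_draw(rng, values, g_probs, impossible) for _ in range(n_draws)
)
empirical = {v: counts[v] / n_draws for v in values}
```

In this sketch the empirical frequencies of the accepted draws should be close to `target`, that is, to the proposal renormalized over the feasible values.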

A.2 Modified Gibbs sampler for the cap-and-weight approach

The modified Gibbs sampler for the cap-and-weight approach replaces steps S1, S3, S4, S5 and S6 of the Gibbs sampler in the main text as follows.

$$u_g \mid - \sim \text{Beta}\left( 1 + U_g, \; \alpha + \sum_{f=g+1}^{F} U_f \right), \quad \pi_g = u_g \prod_{f<g} (1 - u_f)$$

where

$$U_g = \sum_{i=1}^{n} \mathbb{1}(G_i^{1} = g) + \sum_{h \in \mathcal{H}} \frac{1}{\psi_h} \sum_{i \mid n_i^{0} = h} \mathbb{1}(G_i^{0} = g)$$

for $g = 1, \ldots, F - 1$.
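To make the role of the weights $1/\psi_h$ concrete, here is a small Python sketch of this update. The helper names `weighted_counts` and `draw_pi`, the 0-based class labels, and the input layout are illustrative assumptions. Setting the last stick to one is the usual truncated stick-breaking convention and is assumed here rather than taken from this appendix.

```python
import random


def weighted_counts(g_feasible, g_impossible_by_size, psi, n_classes):
    """Compute U_g: each feasible household counts once, each sampled
    impossible household of size h counts 1 / psi[h] times."""
    U = [0.0] * n_classes
    for g in g_feasible:
        U[g] += 1.0
    for h, labels in g_impossible_by_size.items():
        for g in labels:
            U[g] += 1.0 / psi[h]
    return U


def draw_pi(U, alpha, rng):
    """Draw pi by truncated stick-breaking given the weighted counts U."""
    F = len(U)
    pi, remaining = [], 1.0
    for g in range(F):
        if g < F - 1:
            u = rng.betavariate(1.0 + U[g], alpha + sum(U[g + 1:]))
        else:
            u = 1.0  # assumption: the last class takes the remaining mass
        pi.append(u * remaining)
        remaining *= 1.0 - u
    return pi


# Toy inputs (assumptions): three classes, labels are 0-based.
U = weighted_counts(
    g_feasible=[0, 0, 1],
    g_impossible_by_size={2: [0, 1], 4: [2, 2]},
    psi={2: 0.5, 4: 0.25},
    n_classes=3,
)
pi = draw_pi(U, alpha=1.0, rng=random.Random(0))
```

With $\psi_h = 1$ for every $h$, the weights disappear and the counts reduce to plain counts over the feasible and impossible households.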

$$v_{gm} \mid - \sim \text{Beta}\left( 1 + V_{gm}, \; \beta + \sum_{s=m+1}^{S} V_{gs} \right), \quad \omega_{gm} = v_{gm} \prod_{s<m} (1 - v_{gs})$$

where

$$V_{gm} = \sum_{i=1}^{n} \mathbb{1}(M_{ij}^{1} = m, \; G_i^{1} = g) + \sum_{h \in \mathcal{H}} \frac{1}{\psi_h} \sum_{i \mid n_i^{0} = h} \mathbb{1}(M_{ij}^{0} = m, \; G_i^{0} = g)$$

for $m = 1, \ldots, S - 1$ and $g = 1, \ldots, F$.

$$\lambda_g^{(k)} \mid - \sim \text{Dirichlet}\left( 1 + \eta_{g1}^{(k)}, \ldots, 1 + \eta_{g d_k}^{(k)} \right)$$

where

$$\eta_{gc}^{(k)} = \sum_{i \mid G_i^{1} = g}^{n} \mathbb{1}(X_{ik}^{1} = c) + \sum_{h \in \mathcal{H}} \frac{1}{\psi_h} \sum_{i \mid n_i^{0} = h, \; G_i^{0} = g} \mathbb{1}(X_{ik}^{0} = c)$$

for $g = 1, \ldots, F$ and $k = p + 1, \ldots, q$.
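The Dirichlet update follows the same weighting pattern. The sketch below handles a single class $g$ and a single household-level variable $k$. The helper names, the 0-based level codes and the input layout are assumptions for illustration, and the Dirichlet draw is built from normalized gamma variates because the Python standard library has no direct Dirichlet sampler.

```python
import random


def draw_dirichlet(concentration, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(gammas)
    return [x / total for x in gammas]


def update_lambda(x_feasible, x_impossible_by_size, psi, n_levels, rng):
    """One update for a fixed class g and variable k.

    x_feasible: 0-based levels of the variable for feasible households in class g.
    x_impossible_by_size: {h: levels for sampled impossible households
                           of size h in class g}.
    Returns (eta, lambda_draw).
    """
    eta = [0.0] * n_levels
    for c in x_feasible:
        eta[c] += 1.0
    for h, levels in x_impossible_by_size.items():
        for c in levels:
            eta[c] += 1.0 / psi[h]  # cap-and-weight: weight 1 / psi_h
    return eta, draw_dirichlet([1.0 + e for e in eta], rng)


# Toy inputs (assumptions): a variable with three levels.
eta, lam = update_lambda(
    x_feasible=[0, 0, 1],
    x_impossible_by_size={2: [1], 4: [2, 2]},
    psi={2: 0.5, 4: 0.25},
    n_levels=3,
    rng=random.Random(0),
)
```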

$$\phi_{gm}^{(k)} \mid - \sim \text{Dirichlet}\left( 1 + \nu_{gm1}^{(k)}, \ldots, 1 + \nu_{gm d_k}^{(k)} \right)$$

where

$$\nu_{gmc}^{(k)} = \sum_{i \mid G_i^{1} = g, \; M_{ij}^{1} = m}^{n} \mathbb{1}(X_{ijk}^{1} = c) + \sum_{h \in \mathcal{H}} \frac{1}{\psi_h} \sum_{i \mid n_i^{0} = h, \; G_i^{0} = g, \; M_{ij}^{0} = m} \mathbb{1}(X_{ijk}^{0} = c)$$

for $g = 1, \ldots, F$, $m = 1, \ldots, S$ and $k = 1, \ldots, p$.

A.3 List of structural zeros

We fit the NDPMPM model using structural zeros which involve ages and relationships of individuals in the same household. The full list of the rules used is presented in Table A.1. These rules were derived from the 2012 ACS by identifying combinations involving the relationship variable that do not appear in the constructed population. This list should not be interpreted as a “true” list of impossible combinations in census data.


Table A.1
List of structural zeros
Rules common to generating both the synthetic and imputed datasets
  1. Each household must contain exactly one head and he/she must be at least 16 years old.
  2. Each household cannot contain more than one spouse and he/she must be at least 16 years old.
  3. Married couples are of opposite sex, and age difference between individuals in the couples cannot exceed 49.
  4. The youngest parent must be older than the household head by at least 4.
  5. The youngest parent-in-law must be older than the household head by at least 4.
  6. The age difference between the household head and siblings cannot exceed 37.
  7. The household head must be at least 31 years old to be a grandparent and his/her spouse must be at least 17. Also, he/she must be older than the oldest grandchild by at least 26.

Rules specific to generating the synthetic datasets
  8. The household head must be older than the oldest child by at least 7.

Rules specific to generating the imputed datasets
  9. The household head must be older than the oldest biological child by at least 7.
  10. The household head must be older than the oldest adopted child by at least 11.
  11. The household head must be older than the oldest stepchild by at least 9.
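Rules of this kind translate directly into a feasibility check. The following sketch covers only rules 1 to 3 of Table A.1. The function name, the dictionary layout of a person, and the use of ages in years (rather than the shifted age codes of Table A.2) are assumptions made for this example. The relationship codes for head and spouse follow Table A.2.

```python
HEAD, SPOUSE = 1, 2  # relationship codes as listed in Table A.2


def violates_rules_1_to_3(household):
    """Return True if the household breaks rule 1, 2 or 3 of Table A.1.

    household: list of dicts with keys 'rel' (relationship code),
               'age' (in years, an assumption) and 'sex'.
    """
    heads = [p for p in household if p["rel"] == HEAD]
    spouses = [p for p in household if p["rel"] == SPOUSE]

    # Rule 1: exactly one head, at least 16 years old.
    if len(heads) != 1 or heads[0]["age"] < 16:
        return True

    # Rule 2: at most one spouse, at least 16 years old.
    if len(spouses) > 1 or any(s["age"] < 16 for s in spouses):
        return True

    # Rule 3: the couple is of opposite sex and the age gap is at most 49.
    if spouses:
        head, spouse = heads[0], spouses[0]
        if head["sex"] == spouse["sex"]:
            return True
        if abs(head["age"] - spouse["age"]) > 49:
            return True

    return False


# Toy household (assumption): a head aged 40 and a spouse aged 38.
ok_household = [
    {"rel": HEAD, "age": 40, "sex": 1},
    {"rel": SPOUSE, "age": 38, "sex": 2},
]
is_impossible = violates_rules_1_to_3(ok_household)
```

A full check would add the remaining rules in the same style, each as one more condition on ages and relationship codes.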

A.4 Empirical study of the speedup approaches

We evaluate the performance of the two speedup approaches mentioned in the main text using synthetic data. We use data from the public use microdata files from the 2012 ACS, available for download from the United States Census Bureau (http://www2.census.gov/acs2012_1yr/pums/) to construct a population of 857,018 households of sizes $\mathcal{H} = \{2, 3, 4, 5, 6\}$, from which we sample $n = 10{,}000$ households comprising $N = 29{,}117$ individuals. We work with the variables described in Table A.2. We evaluate the approaches using probabilities that depend on within household relationships and the household head.


Table A.2
Description of variables used in the synthetic data illustration
Description of variable: Categories

Household-level variables
  Ownership of dwelling: 1 = owned or being bought, 2 = rented
  Household size: 2 = 2 people, 3 = 3 people, 4 = 4 people, 5 = 5 people, 6 = 6 people

Individual-level variables
  Gender: 1 = male, 2 = female
  Race: 1 = white, 2 = black, 3 = American Indian or Alaska native, 4 = Chinese, 5 = Japanese, 6 = other Asian/Pacific islander, 7 = other race, 8 = two major races, 9 = three or more major races
  Hispanic origin: 1 = not Hispanic, 2 = Mexican, 3 = Puerto Rican, 4 = Cuban, 5 = other
  Age: 1 = less than one year old, 2 = 1 year old, 3 = 2 years old, ..., 96 = 95 years old
  Relationship to head of household: 1 = household head, 2 = spouse, 3 = child, 4 = child-in-law, 5 = parent, 6 = parent-in-law, 7 = sibling, 8 = sibling-in-law, 9 = grandchild, 10 = other relative, 11 = partner/friend/visitor, 12 = other non-relative

We consider the NDPMPM using two approaches, both moving the values of the household head to the household level as in Section 4.1 of the main text and also using the cap-and-weight approach in Section 4.2 of the main text. The first approach considers $\psi_2 = \psi_3 = \psi_4 = \psi_5 = \psi_6 = 1$ while the second approach considers $\psi_2 = \psi_3 = 1/2$ and $\psi_4 = \psi_5 = \psi_6 = 1/3$. We compare these approaches to the NDPMPM as presented in Hu et al. (2018). For each approach, we create $L = 50$ synthetic datasets, $\mathbf{Z} = (\mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(50)})$. We generate the synthetic datasets so that the number of households of size $h \in \mathcal{H}$ in each $\mathbf{Z}^{(l)}$ exactly matches $n_h$ from the observed data. Thus, $\mathbf{Z}$ comprises partially synthetic data (Little, 1993; Reiter, 2003), even though every released $Z_{ijk}$ is a simulated value. We combine the estimates using the approach in Reiter (2003). As a brief review, let $q$ be the point estimator of some estimand $Q$, and let $u$ be the estimator of variance associated with $q$. For $l = 1, \ldots, L$, let $q_l$ and $u_l$ be the values of $q$ and $u$ in synthetic dataset $\mathbf{Z}^{(l)}$.
MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWHAbWaaWbaaSqabeaadaqadaqaai aadYgaaiaawIcacaGLPaaaaaGccaaMb8UaaiOlaaaa@373A@ We use q ¯ = l = 1 L q l / L MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaaceWGXbGbaebacaaI9aWaaSGbaeaada aeWaqabSqaaiaadYgacaaI9aGaaGymaaqaaiaadYeaa0GaeyyeIuoa kiaaykW7caWGXbWaaSbaaSqaaiaadYgaaeqaaaGcbaGaamitaaaaaa a@3D1F@ as the point estimate of Q MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWGrbaaaa@3240@ and T = u ¯ + b / L MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWGubGaaGypamaalyaabaGabmyDay aaraGaey4kaSIaamOyaaqaaiaadYeaaaaaaa@36CC@ as the estimated variance of q ¯ , MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaaceWGXbGbaebacaGGSaaaaa@3328@ where b = l = 1 L ( q l q ¯ ) 2 / ( L 1 ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l 
bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWGIbGaaGypamaalyaabaWaaabmae qaleaacaWGSbGaaGypaiaaigdaaeaacaWGmbaaniabggHiLdGcdaqa daqaaiaadghadaWgaaWcbaGaamiBaaqabaGccqGHsislceWGXbGbae baaiaawIcacaGLPaaadaahaaWcbeqaaiaaikdaaaaakeaadaqadaqa aiaadYeacqGHsislcaaIXaaacaGLOaGaayzkaaaaaaaa@4315@ and u ¯ = l = 1 L u l / L . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaaceWG1bGbaebacaaI9aWaaSGbaeaada aeWaqabSqaaiaadYgacaaI9aGaaGymaaqaaiaadYeaa0GaeyyeIuoa kiaaykW7caWG1bWaaSbaaSqaaiaadYgaaeqaaaGcbaGaamitaaaaca GGUaaaaa@3DD9@ We make inference about Q MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWGrbaaaa@3240@ using ( q ¯ Q ) t v ( 0, T ) , MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaadaqadaqaaiqadghagaqeaiabgkHiTi aadgfaaiaawIcacaGLPaaarqqr1ngBPrgifHhDYfgaiqaacqWF8iIo caWG0bWaaSbaaSqaaiaadAhaaeqaaOWaaeWaaeaacaaIWaGaaGilai aaysW7caWGubaacaGLOaGaayzkaaGaaiilaaaa@43B5@ where t v MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWG0bWaaSbaaSqaaiaadAhaaeqaaa aa@338A@ is a t MathType@MTEF@5@5@+= 
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWG0bGaaGjcVlabgkHiTaaa@34E1@ distribution with v = ( L 1 ) ( 1 + L u ¯ / b ) 2 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8WrFj0xb9qqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaqaaeaadaaakeaacaWG2bGaaGypamaabmaabaGaamitai abgkHiTiaaigdaaiaawIcacaGLPaaadaqadaqaaiaaigdacqGHRaWk daWcgaqaaiaadYeaceWG1bGbaebaaeaacaWGIbaaaaGaayjkaiaawM caamaaCaaaleqabaGaaGOmaaaaaaa@3E1D@ degrees of freedom.
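As an illustration only, the combining rules reviewed above can be sketched in a few lines of Python. The function name `combine_synthetic` and its argument names are hypothetical and not part of any released software; the sketch assumes the $L$ point estimates and their variance estimates have already been computed from each synthetic dataset.

```python
from statistics import mean, variance


def combine_synthetic(q, u):
    """Combine estimates from L partially synthetic datasets.

    q[l] is the point estimate and u[l] its estimated variance in dataset l.
    Returns (q_bar, T, v) following the rules reviewed in the text.
    Hypothetical helper written for illustration only.
    """
    L = len(q)
    if L < 2 or len(u) != L:
        raise ValueError("need L >= 2 estimates and one variance per estimate")
    q_bar = mean(q)        # point estimate of Q
    u_bar = mean(u)        # average within-dataset variance
    b = variance(q)        # between-dataset variance, divisor L - 1
    T = u_bar + b / L      # estimated variance of q_bar
    # Degrees of freedom; b == 0 is treated as the infinite (normal) limit.
    v = (L - 1) * (1 + L * u_bar / b) ** 2 if b > 0 else float("inf")
    return q_bar, T, v


q_bar, T, v = combine_synthetic(
    [0.60, 0.62, 0.61, 0.59, 0.63], [0.0004] * 5
)
```

A 95% interval would then be $\bar{q} \pm t_{v, 0.975}\sqrt{T}$; the $t$ quantile is not computed here and could be obtained from a statistics library, for example `scipy.stats.t.ppf(0.975, v)` if SciPy is available.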

For each approach, we run the MCMC sampler for 20,000 iterations, discarding the first 10,000 as burn-in and thinning the remaining samples every five iterations, resulting in 2,000 MCMC post burn-in iterates. We create the $L = 50$ synthetic datasets by randomly sampling from the 2,000 iterates. We set $F = 40$ and $S = 15$ for each approach based on initial tuning runs. For convergence, we examined trace plots of $\alpha$, $\beta$ and weighted averages of a random sample of the multinomial probabilities in the NDPMPM likelihood. 
Across the approaches, the effective number of occupied household-level clusters usually ranges from 20 to 33 with a maximum of 38, while the effective number of occupied individual-level clusters across all household-level clusters ranges from 5 to 9 with a maximum of 12.
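The bookkeeping behind the burn-in, thinning and selection of the $L = 50$ iterates can be made concrete with a small sketch. The helper names below are hypothetical, and sampling the iterates without replacement is an assumption; the text only states that they are sampled randomly.

```python
import random


def thinned_iterations(n_iter, burn_in, thin):
    """1-based iteration numbers kept after burn-in and thinning."""
    return list(range(burn_in + thin, n_iter + 1, thin))


def pick_synthetic_draws(kept, L, seed=None):
    """Choose L of the kept iterates at random.

    Assumption: sampling is without replacement.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(kept, L))


kept = thinned_iterations(n_iter=20_000, burn_in=10_000, thin=5)
draws = pick_synthetic_draws(kept, L=50, seed=1)
print(len(kept), len(draws))
```

With 20,000 iterations, a burn-in of 10,000 and thinning every five iterations, `kept` holds the 2,000 post burn-in iterates described above, and each entry of `draws` indexes one iterate used to generate one synthetic dataset.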

Based on MCMC runs on a standard laptop, moving household heads’ data values to the household level alone results in a speedup of about 63% on the default rejection sampler while the cap-and-weight approach alone results in a speedup of about 40%.

Table A.3 shows the 95% confidence intervals for each approach. Essentially, all three approaches result in similar confidence intervals, suggesting not much loss in accuracy from the speedups. Most intervals also are reasonably similar to confidence intervals based on the original data, except for the percentage of same age couples. The last row is a rigorous test of how well each method can estimate a probability that can be fairly difficult to estimate accurately. In this case, the probability that a household head and spouse are the same age can be difficult to estimate since each individual’s age can take 96 different values. All three approaches are thus off from the estimate from the original data in this case. These results suggest that we can speed up the sampler substantially with minimal loss in accuracy of estimates and confidence intervals of population estimands.


Table A.3
Confidence intervals for selected probabilities that depend on within-household relationships in the original and synthetic datasets. “Original” is based on the sampled data, “NDPMPM” is the default MCMC sampler described in Section 2.2 of the main text, “NDPMPM w/ HH moved” is the default sampler with household heads’ data values moved to the household level, and “NDPMPM capped w/ HH moved” uses the cap-and-weight approach with household heads’ data values moved to the household level. “HH” means household head and “SP” means spouse.
| | Original | NDPMPM | NDPMPM w/ HH moved | NDPMPM capped w/ HH moved |
| --- | --- | --- | --- | --- |
| All same race, $n_i = 2$ | (0.939, 0.951) | (0.918, 0.932) | (0.912, 0.928) | (0.910, 0.925) |
| All same race, $n_i = 3$ | (0.896, 0.920) | (0.859, 0.888) | (0.845, 0.875) | (0.844, 0.874) |
| All same race, $n_i = 4$ | (0.885, 0.912) | (0.826, 0.860) | (0.813, 0.848) | (0.817, 0.852) |
| All same race, $n_i = 5$ | (0.879, 0.922) | (0.786, 0.841) | (0.786, 0.841) | (0.777, 0.834) |
| All same race, $n_i = 6$ | (0.831, 0.910) | (0.701, 0.803) | (0.718, 0.819) | (0.660, 0.768) |
| SP present | (0.693, 0.711) | (0.678, 0.697) | (0.676, 0.695) | (0.677, 0.695) |
| SP with white HH | (0.589, 0.608) | (0.577, 0.597) | (0.576, 0.595) | (0.575, 0.595) |
| SP with black HH | (0.036, 0.043) | (0.035, 0.043) | (0.034, 0.042) | (0.034, 0.042) |
| White couple | (0.570, 0.589) | (0.560, 0.580) | (0.553, 0.573) | (0.552, 0.572) |
| White couple, own | (0.495, 0.514) | (0.468, 0.488) | (0.461, 0.481) | (0.463, 0.483) |
| Same race couple | (0.655, 0.673) | (0.636, 0.655) | (0.626, 0.645) | (0.625, 0.644) |
| White-nonwhite couple | (0.028, 0.035) | (0.028, 0.035) | (0.034, 0.041) | (0.036, 0.044) |
| Nonwhite couple, own | (0.057, 0.067) | (0.047, 0.056) | (0.045, 0.053) | (0.045, 0.054) |
| Only mother present | (0.017, 0.022) | (0.014, 0.019) | (0.014, 0.019) | (0.013, 0.018) |
| Only one parent present | (0.021, 0.026) | (0.026, 0.032) | (0.026, 0.033) | (0.027, 0.033) |
| Children present | (0.507, 0.527) | (0.493, 0.512) | (0.517, 0.537) | (0.511, 0.531) |
| Siblings present | (0.022, 0.028) | (0.027, 0.034) | (0.027, 0.033) | (0.027, 0.033) |
| Grandchild present | (0.041, 0.049) | (0.051, 0.060) | (0.049, 0.058) | (0.050, 0.059) |
| Three generations present | (0.036, 0.044) | (0.037, 0.045) | (0.042, 0.050) | (0.040, 0.048) |
| White HH, older than SP | (0.309, 0.327) | (0.283, 0.301) | (0.294, 0.313) | (0.302, 0.321) |
| Nonhisp HH | (0.882, 0.894) | (0.875, 0.888) | (0.879, 0.891) | (0.876, 0.889) |
| White, Hisp HH | (0.071, 0.082) | (0.074, 0.085) | (0.072, 0.082) | (0.073, 0.084) |
| Same age couple | (0.087, 0.098) | (0.027, 0.034) | (0.023, 0.029) | (0.024, 0.031) |
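One simple way to read Table A.3 is to check whether each synthetic-data interval overlaps the interval from the original data. The sketch below does this for three rows copied from the table. Interval overlap is used here only as a crude, illustrative check and is not a measure reported in the study; the names `overlaps` and `overlap_flags` are hypothetical.

```python
def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Each row: [Original, NDPMPM, NDPMPM w/ HH moved, NDPMPM capped w/ HH moved],
# with values copied from Table A.3.
table_a3 = {
    "SP present": [(0.693, 0.711), (0.678, 0.697), (0.676, 0.695), (0.677, 0.695)],
    "SP with black HH": [(0.036, 0.043), (0.035, 0.043), (0.034, 0.042), (0.034, 0.042)],
    "Same age couple": [(0.087, 0.098), (0.027, 0.034), (0.023, 0.029), (0.024, 0.031)],
}


def overlap_flags(rows):
    """For each row, flag whether each synthetic interval overlaps the original."""
    return {
        name: [overlaps(ivs[0], synth) for synth in ivs[1:]]
        for name, ivs in rows.items()
    }


flags = overlap_flags(table_a3)
for name, row in flags.items():
    print(name, row)
```

For these rows, every synthetic interval overlaps the original interval except in the "Same age couple" row, which matches the discussion of that row in the text.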

A.5 Empirical study of missing data imputation under MCAR

We also evaluate the performance of the NDPMPM as an imputation method under a missing completely at random (MCAR) scenario. We use the same data as in Section 5 of the main text. As a reminder, the data contain $n = 5{,}000$ households of sizes $\mathcal{H} = \{2, 3, 4\}$, comprising $N = 13{,}181$ individuals. We introduce missing values using an MCAR scenario. We randomly select 80% of the households to be complete cases for all variables. For the remaining 20%, we let the variable “household size” be fully observed and randomly, and independently, blank 50% of each variable for the remaining household-level and individual-level variables, so that roughly 10% of the values of each of these variables are missing overall. We use these low rates to mimic the actual rates of item nonresponse in census data.
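The missingness mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the data layout (a list of dictionaries with `"household"` and `"members"` entries), the function name `make_mcar`, and the choice to blank exactly half of the eligible values of each variable are all hypothetical.

```python
import copy
import random


def make_mcar(households, complete_frac=0.8, blank_frac=0.5, seed=0):
    """Blank values completely at random in a copy of `households`.

    Assumed layout: each household is
    {"household": {"size": ..., <other vars>}, "members": [{<vars>}, ...]}.
    "size" is never blanked. Returns (data, indices of incomplete households).
    """
    rng = random.Random(seed)
    data = copy.deepcopy(households)
    order = list(range(len(data)))
    rng.shuffle(order)
    n_complete = round(complete_frac * len(data))
    incomplete = order[n_complete:]  # households eligible for blanking

    # Household-level variables: blank a random half, separately per variable.
    hh_vars = [v for v in data[0]["household"] if v != "size"]
    for var in hh_vars:
        for i in rng.sample(incomplete, round(blank_frac * len(incomplete))):
            data[i]["household"][var] = None

    # Individual-level variables: same idea over individuals in those households.
    people = [(i, j) for i in incomplete for j in range(len(data[i]["members"]))]
    for var in list(data[0]["members"][0]):
        for i, j in rng.sample(people, round(blank_frac * len(people))):
            data[i]["members"][j][var] = None

    return data, sorted(incomplete)
```

Because each variable is blanked with its own independent draw, a given household or individual can end up with any combination of observed and missing variables, while household size stays observed throughout.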

Similar to the main text, we estimate the NDPMPM using two approaches, both combining the rejection step in Section 4.1 of the main text with the cap-and-weight approach in Section 4.2 of the main text. The first approach considers $\psi_2 = \psi_3 = \psi_4 = 1$, while the second approach considers $\psi_2 = \psi_3 = 1/2$ and $\psi_4 = 1/3$. For each approach, we run the MCMC sampler for 10,000 iterations, discarding the first 5,000 as burn-in and thinning the remaining samples every five iterations, resulting in 1,000 MCMC post burn-in iterates. We set $F = 30$ and $S = 15$ for each approach based on initial tuning runs. We monitor convergence as in the main text. For both methods, we generate $L = 50$ completed datasets, $\mathbf{Z} = (\mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(50)})$, using the posterior predictive distribution of the NDPMPM, from which we estimate the same probabilities as in the main text.

Figures A.1 and A.2 display each estimated marginal, bivariate and trivariate probability $\bar{q}_{50}$ plotted against its corresponding estimate from the original data, without missing values. Figure A.1 shows the results for the NDPMPM with the rejection sampler, and Figure A.2 shows the results for the NDPMPM using the cap-and-weight approach. For both approaches, the NDPMPM does a good job of capturing important features of the joint distribution of the variables as the point estimates are very close to those from the data before introducing missing values. In short, the results are very similar to those in the main text, though more accurate.
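The quantities plotted in such figures can be computed along the following lines. This sketch assumes each completed dataset is a list of flat records (one dictionary per unit, with the same variables in every record); the function names are hypothetical, and the handling of the nested household structure is deliberately left out.

```python
from collections import Counter
from itertools import combinations


def cell_probabilities(records, order):
    """Relative frequency of every observed cell, for every combination of
    `order` variables (order=1: marginal, 2: bivariate, 3: trivariate)."""
    names = sorted(records[0])
    counts = Counter()
    for rec in records:
        for combo in combinations(names, order):
            counts[tuple((v, rec[v]) for v in combo)] += 1
    n = len(records)
    return {cell: c / n for cell, c in counts.items()}


def averaged_probabilities(datasets, order):
    """Average each cell probability over the L completed datasets.

    Cells that do not occur in a dataset contribute probability zero.
    """
    per_dataset = [cell_probabilities(d, order) for d in datasets]
    cells = set().union(*per_dataset)
    L = len(datasets)
    return {cell: sum(p.get(cell, 0.0) for p in per_dataset) / L for cell in cells}
```

Plotting `averaged_probabilities(completed, k)` against `cell_probabilities(original, k)` for `k` equal to 1, 2 and 3 gives one scatter plot per panel, with points near the 45° line indicating close agreement.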

Table A.4 displays 95% confidence intervals for selected probabilities involving within-household relationships, as well as the value in the full population of 764,580 households. The intervals include the two based on the NDPMPM imputation engines and the interval from the data before introducing missingness. The intervals are generally more accurate than those presented in the main text. This is expected since we use lower rates of missingness in the MCAR scenario. For the most part, the intervals from the NDPMPM with the two approaches tend to include the true population quantity. Again, the NDPMPM imputation engine results in downward bias for the percentages of households where everyone is the same race. As mentioned in the main text, this is a challenging estimand to estimate accurately via imputation, particularly for larger households.

Figure A.1 Marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets under MCAR from the truncated NDPMPM with the rejection sampler. Household heads’ data values moved to the household level

Description for Figure A.1 

Figure presenting the marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets under MCAR from the truncated NDPMPM with the rejection sampler (household heads’ data values moved to the household level). There are three scatter plots with a 45° straight line. The three graphs illustrate the marginal, bivariate and trivariate probabilities respectively. The average from 50 imputed datasets is on the y-axis, ranging from 0.0 to 1.0. The sample estimate is on the x-axis, ranging from 0.0 to 0.6. For all three graphs, estimations from imputed data are close to those from the sample, almost on the line.

Figure A.2 Marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets under MCAR from the truncated NDPMPM using the cap-and-weight approach. Household heads’ data values moved to the household level

Description for Figure A.2 

Figure presenting the marginal, bivariate and trivariate probabilities computed in the sample and imputed datasets under MCAR from the truncated NDPMPM using the cap-and-weight approach (household heads’ data values moved to the household level). There are three scatter plots with a 45° straight line. The three graphs illustrate the marginal, bivariate and trivariate probabilities respectively. The average from 50 imputed datasets is on the y-axis, ranging from 0.0 to 1.0. The sample estimate is on the x-axis, ranging from 0.0 to 0.6. For all three graphs, estimations from imputed data are close to those from the sample, almost on the line.


Table A.4
Confidence intervals for selected probabilities that depend on within-household relationships in the original and imputed datasets under MCAR. “No missing” is based on the sampled data before introducing missing values, “NDPMPM” uses the truncated NDPMPM with household heads’ data values moved to the household level, and “NDPMPM Capped” uses the truncated NDPMPM with the cap-and-weight approach and household heads’ data values moved to the household level. “HH” means household head, “SP” means spouse, “CH” means child, and “CP” means couple. $Q$ is the value in the full population of 764,580 households.
| | $Q$ | No Missing | NDPMPM | NDPMPM Capped |
| --- | --- | --- | --- | --- |
| All same race household: $n_i = 2$ | 0.942 | (0.932, 0.949) | (0.924, 0.944) | (0.925, 0.946) |
| All same race household: $n_i = 3$ | 0.908 | (0.907, 0.937) | (0.887, 0.924) | (0.890, 0.925) |
| All same race household: $n_i = 4$ | 0.901 | (0.879, 0.917) | (0.854, 0.900) | (0.855, 0.900) |
| SP present | 0.696 | (0.682, 0.707) | (0.683, 0.709) | (0.683, 0.709) |
| Same race CP | 0.656 | (0.641, 0.668) | (0.637, 0.664) | (0.638, 0.665) |
| SP present, HH is White | 0.600 | (0.589, 0.616) | (0.590, 0.618) | (0.590, 0.618) |
| White CP | 0.580 | (0.569, 0.596) | (0.568, 0.596) | (0.568, 0.597) |
| CP with age difference less than five | 0.488 | (0.465, 0.492) | (0.422, 0.451) | (0.422, 0.450) |
| Male HH, home owner | 0.476 | (0.456, 0.484) | (0.455, 0.483) | (0.456, 0.485) |
| HH over 35, no CH present | 0.462 | (0.441, 0.468) | (0.438, 0.466) | (0.438, 0.466) |
| At least one biological CH present | 0.437 | (0.431, 0.458) | (0.432, 0.460) | (0.432, 0.460) |
| HH older than SP, White HH | 0.322 | (0.309, 0.335) | (0.308, 0.335) | (0.306, 0.333) |
| Adult female w/ at least one CH under 5 | 0.078 | (0.070, 0.085) | (0.068, 0.084) | (0.067, 0.083) |
| White HH with Hisp origin | 0.066 | (0.064, 0.078) | (0.064, 0.079) | (0.064, 0.079) |
| Non-White CP, home owner | 0.058 | (0.050, 0.063) | (0.048, 0.061) | (0.048, 0.061) |
| Two generations present, Black HH | 0.057 | (0.053, 0.066) | (0.053, 0.066) | (0.053, 0.067) |
| Black HH, home owner | 0.052 | (0.046, 0.058) | (0.046, 0.059) | (0.046, 0.059) |
| SP present, HH is Black | 0.039 | (0.032, 0.042) | (0.032, 0.043) | (0.032, 0.042) |
| White-nonwhite CP | 0.034 | (0.029, 0.039) | (0.032, 0.044) | (0.032, 0.044) |
| Hisp HH over 50, home owner | 0.029 | (0.025, 0.034) | (0.025, 0.035) | (0.025, 0.035) |
| One grandchild present | 0.028 | (0.023, 0.033) | (0.024, 0.034) | (0.024, 0.034) |
| Adult Black female w/ at least one CH under 18 | 0.027 | (0.028, 0.038) | (0.027, 0.037) | (0.027, 0.037) |
| At least two generations present, Hisp CP | 0.027 | (0.022, 0.031) | (0.022, 0.031) | (0.022, 0.031) |
| Hisp CP with at least one biological CH | 0.025 | (0.020, 0.028) | (0.019, 0.028) | (0.019, 0.028) |
| At least three generations present | 0.023 | (0.020, 0.028) | (0.019, 0.028) | (0.019, 0.028) |
| Only one parent | 0.020 | (0.016, 0.024) | (0.016, 0.024) | (0.016, 0.024) |
| At least one stepchild | 0.019 | (0.018, 0.026) | (0.018, 0.027) | (0.018, 0.027) |
| Adult Hisp male w/ at least one CH under 10 | 0.018 | (0.017, 0.025) | (0.016, 0.025) | (0.016, 0.025) |
| At least one adopted CH, White CP | 0.008 | (0.005, 0.010) | (0.005, 0.010) | (0.005, 0.010) |
| Black CP with at least two biological children | 0.006 | (0.003, 0.007) | (0.003, 0.007) | (0.003, 0.007) |
| Black HH under 40, home owner | 0.005 | (0.005, 0.009) | (0.005, 0.010) | (0.005, 0.011) |
| Three generations present, White CP | 0.005 | (0.004, 0.008) | (0.004, 0.010) | (0.004, 0.009) |
| White HH under 25, home owner | 0.003 | (0.002, 0.005) | (0.004, 0.009) | (0.004, 0.009) |
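To make the comparison with the population values concrete, the following sketch checks whether each interval in a few rows of Table A.4 contains $Q$. The rows are copied from the table; the helper name `contains` is hypothetical. Because the table reports values rounded to three decimals, borderline cases (such as 0.901 against an upper limit of 0.900) should be read with caution.

```python
def contains(interval, value):
    """True if `value` lies in the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= value <= hi


# Each row: (Q, [No Missing, NDPMPM, NDPMPM Capped]), copied from Table A.4.
table_a4 = {
    "All same race household: n_i = 4": (
        0.901, [(0.879, 0.917), (0.854, 0.900), (0.855, 0.900)]),
    "SP present": (
        0.696, [(0.682, 0.707), (0.683, 0.709), (0.683, 0.709)]),
    "CP with age difference less than five": (
        0.488, [(0.465, 0.492), (0.422, 0.451), (0.422, 0.450)]),
}

coverage = {
    name: [contains(iv, q) for iv in intervals]
    for name, (q, intervals) in table_a4.items()
}
for name, row in coverage.items():
    print(name, row)
```

In these rows, the intervals from the data without missing values contain $Q$ in all three cases, while both NDPMPM intervals miss $Q$ for the same-race households of size four and for couples with an age difference of less than five.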

References

Andridge, R.R., and Little, R.J.A. (2010). A review of hot deck imputation for survey non-response. International Statistical Review, 78(1), 40-64.

Bennink, M., Croon, M.A., Kroon, B. and Vermunt, J.K. (2016). Micro-macro multilevel latent class models with multiple discrete individual-level variables. Advances in Data Analysis and Classification.

Chambers, R., and Skinner, C. (2003). Analysis of Survey Data, Wiley Series in Survey Methodology, Wiley.

Dunson, D.B., and Xing, C. (2009). Nonparametric Bayes modeling of multivariate categorical data. Journal of the American Statistical Association, 104, 1042-1051.

Hu, J., Reiter, J.P. and Wang, Q. (2018). Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data. Bayesian Analysis, 13, 183-200.

Ishwaran, H., and James, L.F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96, 161-173.

Kalton, G., and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, 12, 1, 1-16. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1986001/article/14404-eng.pdf.

Little, R.J.A. (1993). Statistical analysis of masked data. Journal of Official Statistics, 9, 407-426.

Manrique-Vallier, D., and Reiter, J.P. (2014). Bayesian estimation of discrete multivariate latent structure models with structural zeros. Journal of Computational and Graphical Statistics, 23, 1061-1079.

Murray, J.S., and Reiter, J.P. (2016). Multiple imputation of missing categorical and continuous values via Bayesian mixture models with local dependence (forthcoming). Journal of the American Statistical Association.

Raghunathan, T.E., and Rubin, D.B. (2001). Multiple imputation for statistical disclosure limitation. Technical Report.

Reiter, J.P. (2003). Inference for partially synthetic, public use microdata sets. Survey Methodology, 29, 2, 181-188. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2003002/article/6785-eng.pdf.

Reiter, J.P., and Raghunathan, T.E. (2007). The multiple adaptations of multiple imputation. Journal of the American Statistical Association, 102, 1462-1471.

Rubin, D.B. (1976). Inference and missing data (with discussion). Biometrika, 63, 581-592.

Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys, New York: John Wiley & Sons, Inc.

Rubin, D.B. (1993). Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9, 462-468.

Savitsky, T.D., and Toth, D. (2016). Bayesian estimation under informative sampling. Electronic Journal of Statistics, 10.1, 1677-1708.

Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639-650.

Si, Y., and Reiter, J.P. (2013). Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of Educational and Behavioral Statistics, 38.5, 499-521.

Vermunt, J.K. (2003). Multilevel latent class models. Sociological Methodology, 33, 213-239.

Vermunt, J.K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33-51.

Walker, S.G. (2007). Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation, 36, 1, 45-54.

Wang, Q., Akande, O., Hu, J., Reiter, J. and Barrientos, A. (2016). NestedCategBayesImpute: Modeling and Generating Synthetic Versions of Nested Categorical Data in the Presence of Impossible Combinations. The Comprehensive R Archive Network.

