5. Inference from multiple nonparametric synthetic populations

Qi Dong, Michael R. Elliott and Trivellore E. Raghunathan


Assume we generate $L$ synthetic populations, $S_l$, $l = 1, \ldots, L$, using the nonparametric method described in Section 4, and that our inferential target is $Q \equiv Q(Y)$, a function of the population data (e.g., population mean, correlation, population maximum likelihood estimator of a regression parameter, etc.).
We can compute $Q_l$ as the estimate of $Q$ obtained from pooling the $F$ synthetic populations that impute the unobserved units of $S_l$; since these are direct draws from the posterior predictive distribution of the population, we can compute posterior means, quantiles, and credible intervals from the corresponding empirical estimates from the draws, if $L$ is sufficiently large.
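
As a minimal sketch of the empirical approach, assume the $L$ estimates $Q_l$ have already been computed and are held in a plain list. The function name `empirical_summary`, the linear-interpolation rule for quantiles, and the example values are illustrative assumptions, not part of the method described above.

```python
import math
from statistics import fmean


def empirical_summary(q, level=0.95):
    """Posterior mean, median and equal-tailed credible interval
    computed directly from the draws Q_1, ..., Q_L."""
    ordered = sorted(q)
    L = len(ordered)
    alpha = (1 - level) / 2

    def pct(p):
        # Assumed convention: linear interpolation between order statistics.
        pos = p * (L - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, L - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return {
        "mean": fmean(ordered),
        "median": pct(0.5),
        "interval": (pct(alpha), pct(1 - alpha)),
    }


# Hypothetical draws standing in for Q_1, ..., Q_L (here L = 101).
q_draws = [float(v) for v in range(1, 102)]
summary = empirical_summary(q_draws)
print(summary)
```

The tail quantiles are only as stable as the number of draws allows, which is why this route needs $L$ to be sufficiently large.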

However, in many settings, the computational effort required to impute the population may be very large, even if the full population is not required to be synthesized. Hence an alternative approach for inference is to approximate the posterior predictive distribution of a scalar population statistic $Q$ via a $t$-distribution:

$$Q \mid S_1, \ldots, S_L \;\overset{\cdot}{\sim}\; t_{L-1}\left( \bar{Q}_L, \left( 1 + L^{-1} \right) V_L \right)$$

where

$$\bar{Q}_L = \frac{\sum_{l=1}^{L} Q_l}{L} = \frac{\sum_{l=1}^{L} \sum_{f=1}^{F} Q_{lf}}{LF} \quad \text{and} \quad V_L = \frac{1}{L} \sum_{l=1}^{L} \left( Q_l - \bar{Q}_L \right)^2.$$
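
The $t$ approximation can be sketched in a few lines once the $L$ estimates $Q_l$ are available. The code below is an illustration under stated assumptions: the helper names (`t_pdf`, `t_cdf`, `t_quantile`, `t_interval`) are hypothetical, the $t$ quantile is obtained by numerically integrating the density and bisecting so that only the standard library is needed, and $V_L$ uses the $1/L$ divisor as it is printed above.

```python
import math
from statistics import fmean


def t_pdf(x, df):
    """Density of the standard Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus composite Simpson integration of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for p > 0.5, found by bracketing and then bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(q, level=0.95):
    """Center, scale and interval from t_{L-1}(Q_bar_L, (1 + 1/L) V_L)."""
    L = len(q)
    q_bar = fmean(q)
    v_L = sum((x - q_bar) ** 2 for x in q) / L  # 1/L divisor, as printed in the text
    scale = math.sqrt((1 + 1 / L) * v_L)
    crit = t_quantile(1 - (1 - level) / 2, L - 1)
    return q_bar, scale, (q_bar - crit * scale, q_bar + crit * scale)


# Hypothetical estimates Q_1, ..., Q_10 from ten synthetic populations.
q_draws = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0, 10.3, 10.1]
q_bar, scale, (lower, upper) = t_interval(q_draws)
print(q_bar, scale, lower, upper)
```

Unlike the empirical quantiles, this interval needs only the mean and variance of the $Q_l$, so it remains usable when $L$ is small.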

The result follows immediately from Section 4.1 of Raghunathan et al. (2003), and is based on the standard Rubin (1987) multiple imputation combining rules, treating the unobserved units of $S_l$ as missing data and the sampled units as observed data. The average "within" imputation variance is zero, since the entire population is being synthesized; hence the posterior variance of $Q$ is entirely a function of the between-imputation variance, and the degrees of freedom are simply given by the number of FPBB samples. (When the population is extremely large, we need only synthesize a draw sufficiently large for the average "within" imputation variance to be trivial relative to the between-imputation variance $V_L$.) The result assumes that $E(Q_{lf}) = Q$, a result guaranteed by our weighted FPBB estimator, as well as a sufficiently large sample size for Bayesian asymptotics to apply.
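
To make the reduction explicit, the standard combining rules can be written out as follows. This is a sketch: the symbol $\bar{U}_L$ for the average within-imputation variance, and $T_L$ and $\nu$ for the total variance and degrees of freedom, are notation introduced here for illustration, with the between-imputation variance played by $V_L$.

$$T_L = \bar{U}_L + \left( 1 + L^{-1} \right) V_L, \qquad \nu = (L - 1) \left[ 1 + \frac{\bar{U}_L}{\left( 1 + L^{-1} \right) V_L} \right]^2 .$$

Setting $\bar{U}_L = 0$, as is appropriate when the entire population is synthesized, gives

$$T_L = \left( 1 + L^{-1} \right) V_L, \qquad \nu = L - 1,$$

which are the scale and degrees of freedom of the $t$ approximation for $Q$.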
