Using Multiple Imputation of Latent Classes to construct population census tables with data from multiple sources
Section 2. Methodology

When applying the MILC method, the starting point is a unit-linked combined dataset, which can consist of combinations of administrative population registries and survey samples. To account for uncertainty regarding the parameters of the LC model estimated at a later step in MILC, a non-parametric bootstrap procedure is first applied to this dataset (step 1). This involves creating $M$ bootstrap samples by drawing observations from the observed dataset with replacement. Subsequently, for each bootstrap sample, the LC model of interest is estimated (step 2) using Latent GOLD software (Vermunt and Magidson, 2013a). Here, model parameters are estimated by Maximum Likelihood using a combination of the Expectation-Maximization and Newton-Raphson algorithms. Note that constrained estimation is used here, by explicitly stating which cells should be restricted.
Next, $M$ imputations are created using the $M$ sets of parameter values obtained from the $M$ latent class models (step 3). If imputations were instead created from the maximum-likelihood estimates obtained directly from the original observed data, sampling uncertainty regarding the estimated parameters of the latent class model would be ignored.

In the following subsections, we explain each of the steps of MILC in more detail and present the extension for the estimation of multiple latent variables for a finite population from register and sample survey data.

2.1  Step 1: Creating bootstrap samples

We propose to use the “classical” bootstrap procedure here, which consists of repeatedly drawing samples with replacement from the original dataset, of the same size as the original dataset. A motivation for using this classical with-replacement bootstrap here, as opposed to an adapted bootstrap procedure for a finite population, is provided in Section 2.5 below.
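The classical bootstrap described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in the paper; the function name `bootstrap_samples`, the toy records and the seed are assumptions made for the example.

```python
import random

def bootstrap_samples(data, n_boot, seed=0):
    """Return n_boot resamples of `data`, each drawn with replacement
    and each of the same size as `data` (classical bootstrap)."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_boot)]

# Hypothetical unit-linked records: (register indicator, survey indicator, covariate).
records = [(1, 1, 0), (1, 2, 0), (2, 2, 1), (2, 2, 1), (1, 1, 1), (2, 1, 0)]

# M = 5 bootstrap samples, each with as many rows as the original dataset.
samples = bootstrap_samples(records, n_boot=5, seed=42)
```

Because rows are drawn with replacement, a given unit can appear several times in one bootstrap sample and not at all in another.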

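The bootstrap should be applied to the dataset that is used to estimate the LC models. When register data and survey data are combined, the indicator variables from the survey will typically be missing for a large part (e.g., 90% or more) of the population. The LC models could then be estimated by two different approaches: (1) using only the complete cases, i.e., the units for which the survey indicators are observed; or (2) using the entire target population, including the units with missing survey indicators.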

Under the second approach, full information maximum likelihood can be used to handle missing values when estimating the LC models. This has the advantage of using all available information. Since this amounts to estimating the LC model on $M$ datasets with the size of the target population, a practical drawback of this approach is that it may be computationally demanding in terms of time and memory. Therefore, the first approach may be more attractive, in particular when the associations among the covariates and target variables are relatively weak. In that case, the cases with missing survey data will contain relatively little information about the parameters of the LC model. Note that under both approaches, the estimated LC models are used to impute predictions of the latent classes throughout the population. Depending on which approach is chosen to estimate the LC models, bootstrapping is applied either to the subset of complete cases or to the target population. In the simulation study in this paper, the complete-case approach will be used.
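The difference between the two bootstrap targets can be illustrated as follows. This is a sketch under stated assumptions: missing survey indicators are coded as `None`, and the helper names `complete_cases` and `resample` as well as the toy data are hypothetical.

```python
import random

# Hypothetical combined dataset: (register indicator, survey indicator, covariate).
# None marks a unit that was not in the survey sample.
population = [
    (1, 1, 0), (1, None, 0), (2, None, 1), (2, 2, 1),
    (1, None, 1), (2, None, 0), (1, None, 0), (2, 1, 1),
]

def complete_cases(rows):
    """Keep only rows in which every variable is observed."""
    return [row for row in rows if all(value is not None for value in row)]

def resample(rows, rng):
    """One classical bootstrap sample: same size as `rows`, drawn with replacement."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(1)
cc = complete_cases(population)
cc_boot = resample(cc, rng)             # complete-case approach
full_boot = resample(population, rng)   # full-population approach
```

Under the complete-case approach each bootstrap sample has the size of the linked survey subset, whereas under the full-population approach it has the size of the target population, which is what makes the latter more demanding computationally.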

2.2  Step 2: Estimating the latent class model

The second step is the estimation of the LC model; below, we explain how this is done for multiple latent variables. As described in the previous section, the LC model is typically estimated $M$ times, once on each of the $M$ bootstrapped datasets. In the situation under evaluation in this paper, the LC model is estimated $M$ times on $M$ subsets of complete observations coming from the $M$ bootstrap samples. An extensive discussion of the model and the assumptions made when using the model to correct for measurement error can be found in Boeschoten et al. (2017).
Multiple latent variables can be estimated simultaneously in one model, which yields the following model structure for the joint probability of the response variables given covariate values, denoted by $P(\mathbf{Y}=\mathbf{y} \mid \mathbf{Q}=\mathbf{q})$. The number of latent variables is denoted by $v$, and $K_h$ is the number of classes of the (scalar) latent variable $X_h$, where $h = 1, \ldots, v$. Furthermore, $\mathbf{Y}$ are the observed target variables, i.e. the indicator variables, $L_h$ is the number of indicator variables for $X_h$, and $\mathbf{Q}$ are the (also observed) covariate variables:

$$
\begin{aligned}
P(\mathbf{Y}=\mathbf{y} \mid \mathbf{Q}=\mathbf{q}) = {} & \sum_{x_1=1}^{K_1} \cdots \sum_{x_v=1}^{K_v} P(X_1=x_1, \ldots, X_v=x_v \mid \mathbf{Q}=\mathbf{q}) \\
& \times \prod_{l_1=1}^{L_1} P(Y_{l_1,1}=y_{l_1,1} \mid X_1=x_1) \cdots \prod_{l_v=1}^{L_v} P(Y_{l_v,v}=y_{l_v,v} \mid X_v=x_v).
\end{aligned}
\tag{2.1}
$$

Here, local independence is assumed: conditional on the latent variables, the indicators are mutually independent and independent of the covariates.
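To make the structure of equation (2.1) concrete, the following sketch evaluates the marginal response probability for a toy model with $v = 2$ latent variables, two classes each and two binary indicators each. All probability values are invented for illustration, classes and scores are labelled from 0 rather than 1, and the function name `marginal_prob` is an assumption of this example.

```python
from itertools import product

def marginal_prob(y, joint_x_given_q, cond):
    """P(Y = y | Q = q) following the structure of equation (2.1).

    y[h][l]               : observed score on indicator l of latent variable h
    joint_x_given_q[x]    : P(X_1 = x_1, ..., X_v = x_v | Q = q), keyed by tuple x
    cond[h][l][x_h][y_hl] : P(Y_{l,h} = y_hl | X_h = x_h)
    """
    total = 0.0
    for x, p_x in joint_x_given_q.items():      # sums over x_1, ..., x_v
        term = p_x
        for h, x_h in enumerate(x):             # one product per latent variable
            for l, y_hl in enumerate(y[h]):     # product over its indicators
                term *= cond[h][l][x_h][y_hl]
        total += term
    return total

# Invented structural part for one covariate pattern q.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
# Invented measurement part: cond[h][l][class][score].
cond = [
    [[[0.90, 0.10], [0.20, 0.80]], [[0.85, 0.15], [0.10, 0.90]]],  # indicators of X_1
    [[[0.95, 0.05], [0.30, 0.70]], [[0.80, 0.20], [0.25, 0.75]]],  # indicators of X_2
]

# Summing over all 16 response patterns should give 1.
all_patterns = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
total_mass = sum(marginal_prob(y, joint, cond) for y in all_patterns)
```

The check on `total_mass` is a useful sanity test: because every conditional distribution sums to one, the marginal probabilities of all response patterns must also sum to one.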

Constrained parameter estimation is used when certain cells within $P(X_1=x_1, \ldots, X_v=x_v \mid \mathbf{Q}=\mathbf{q})$ are restricted. Such restrictions can be used to specify that certain combinations of scores on covariates and latent variables are logically impossible, or to accommodate a “quasi-latent” variable that is used to create imputations for missing values in a variable (Vermunt and Magidson, 2013b).
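The effect of such a restriction on the table $P(X_1=x_1, \ldots, X_v=x_v \mid \mathbf{Q}=\mathbf{q})$ can be illustrated with a small sketch. Note the assumptions: this only shows what a table with structural zeros looks like for one covariate pattern; it is not how Latent GOLD carries out constrained maximum likelihood estimation, where the restricted cells are held fixed during estimation. The function name `apply_restrictions` and the numbers are hypothetical.

```python
def apply_restrictions(joint_x_given_q, impossible):
    """Set restricted cells of P(X | Q = q) to zero and renormalise the rest."""
    kept = {x: (0.0 if x in impossible else p) for x, p in joint_x_given_q.items()}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("all cells are restricted")
    return {x: p / total for x, p in kept.items()}

# Invented table for one covariate pattern q; the combination (0, 1) is
# declared logically impossible for this pattern.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
restricted = apply_restrictions(joint, impossible={(0, 1)})
```

A unit with this covariate pattern can then never be imputed into the restricted class combination, because its posterior probability for that cell is zero regardless of the observed indicators.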

2.3  Step 3: Multiple imputation

To be able to create multiple imputations, joint posterior membership probabilities are calculated for every person in the original dataset. They represent the probability that a unit is part of a combination of latent classes from the different latent variables, given its combination of scores on the indicators and covariates used in the LC model. These probabilities can be used to create multiple imputations of the latent variables which contain their “true scores”.

The joint posterior membership probabilities can be calculated by applying Bayes’ rule to the conditional response probabilities obtained from the $M$ LC models:

$$
P(X_1=x_1, \ldots, X_v=x_v \mid \mathbf{Y}=\mathbf{y}, \mathbf{Q}=\mathbf{q}) = \frac{P(X_1=x_1, \ldots, X_v=x_v, \mathbf{Y}=\mathbf{y} \mid \mathbf{Q}=\mathbf{q})}{P(\mathbf{Y}=\mathbf{y} \mid \mathbf{Q}=\mathbf{q})},
\tag{2.2}
$$

where

$$
\begin{aligned}
P(X_1=x_1, \ldots, X_v=x_v, \mathbf{Y}=\mathbf{y} \mid \mathbf{Q}=\mathbf{q}) = {} & P(X_1=x_1, \ldots, X_v=x_v \mid \mathbf{Q}=\mathbf{q}) \\
& \times \prod_{l_1=1}^{L_1} P(Y_{l_1,1}=y_{l_1,1} \mid X_1=x_1) \cdots \prod_{l_v=1}^{L_v} P(Y_{l_v,v}=y_{l_v,v} \mid X_v=x_v)
\end{aligned}
\tag{2.3}
$$

and $P(\mathbf{Y}=\mathbf{y} \mid \mathbf{Q}=\mathbf{q})$ is defined in equation (2.1). For one profile (i.e., one set of scores on all indicator and covariate variables), the joint posterior membership probabilities sum to one.
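Equations (2.2) and (2.3) can be sketched for a single profile as follows. The toy model ($v = 2$ latent variables with two classes and two binary indicators each), the 0-based labels, the invented probabilities and the function names `joint_prob` and `posterior` are all assumptions of this illustration.

```python
def joint_prob(x, y, joint_x_given_q, cond):
    """Numerator of (2.2): P(X = x, Y = y | Q = q), built as in equation (2.3)."""
    p = joint_x_given_q[x]
    for h, x_h in enumerate(x):
        for l, y_hl in enumerate(y[h]):
            p *= cond[h][l][x_h][y_hl]
    return p

def posterior(y, joint_x_given_q, cond):
    """Joint posterior membership probabilities of equation (2.2) for one profile."""
    numerators = {x: joint_prob(x, y, joint_x_given_q, cond) for x in joint_x_given_q}
    denominator = sum(numerators.values())   # P(Y = y | Q = q), equation (2.1)
    return {x: p / denominator for x, p in numerators.items()}

# Invented parameters for one covariate pattern q.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
cond = [
    [[[0.90, 0.10], [0.20, 0.80]], [[0.85, 0.15], [0.10, 0.90]]],  # indicators of X_1
    [[[0.95, 0.05], [0.30, 0.70]], [[0.80, 0.20], [0.25, 0.75]]],  # indicators of X_2
]

# Profile: both indicators of X_1 score 0, both indicators of X_2 score 1.
post = posterior(((0, 0), (1, 1)), joint, cond)
most_likely = max(post, key=post.get)
```

In this toy example the indicators outweigh the prior: the combination (0, 1) has the smallest prior probability but the largest posterior probability, because it agrees with all four observed scores.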

To be able to include parameter uncertainty in our variance estimates, we perform the model estimation on $M$ bootstrap samples of the dataset, resulting in $M$ different LC models. We generate imputations in the original dataset accounting for the parameter uncertainty by using the resulting $M$ sets of bootstrap parameter estimates. More specifically, with each of these $M$ parameter sets we compute the posterior class membership probabilities for the original sample, and use these to generate the imputations.
In other words, the $M$ imputations are based on $M$ different sets of posterior probabilities.
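The imputation step can be sketched as a random draw from each unit's posterior distribution, repeated once per bootstrap model. In this illustration the posteriors are supplied directly as invented numbers rather than computed from fitted models, and the function name `impute`, the profile labels and the seed are hypothetical.

```python
import random

def impute(profiles, posteriors_per_model, seed=0):
    """Create one imputed dataset per bootstrap model.

    profiles                : observed profile (key) of every unit in the original data
    posteriors_per_model[m] : maps each profile to {class combination: posterior
                              probability} under the m-th bootstrap parameter set
    """
    rng = random.Random(seed)
    imputations = []
    for post in posteriors_per_model:
        drawn = []
        for profile in profiles:
            classes = list(post[profile])
            weights = [post[profile][c] for c in classes]
            # Draw one class combination with probability equal to its posterior.
            drawn.append(rng.choices(classes, weights=weights, k=1)[0])
        imputations.append(drawn)
    return imputations

# Four units with three distinct profiles; M = 3 invented posterior sets.
profiles = ["a", "b", "a", "c"]
posteriors_per_model = [
    {"a": {(0, 0): 0.8, (1, 1): 0.2}, "b": {(0, 0): 0.3, (1, 1): 0.7}, "c": {(0, 0): 0.0, (1, 1): 1.0}},
    {"a": {(0, 0): 0.7, (1, 1): 0.3}, "b": {(0, 0): 0.4, (1, 1): 0.6}, "c": {(0, 0): 0.0, (1, 1): 1.0}},
    {"a": {(0, 0): 0.9, (1, 1): 0.1}, "b": {(0, 0): 0.2, (1, 1): 0.8}, "c": {(0, 0): 0.0, (1, 1): 1.0}},
]
imputations = impute(profiles, posteriors_per_model, seed=7)
```

Units sharing a profile share a posterior distribution under a given model but are drawn independently, so they can receive different imputed values within the same imputed dataset.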

2.4  Step 4: Pooling

The next step is to obtain estimates of interest for every imputation, and to pool them using Rubin’s Rules (Rubin, 1987, page 76). For this research, the main interest is in producing a frequency table. Therefore, the frequency table of interest is obtained for each of the $M$ imputations, and these tables are pooled by taking, for every cell in the frequency table, the average over the imputations:

$$
\hat{\theta}_j = \frac{1}{M} \sum_{i=1}^{M} \hat{\theta}_{ij},
\tag{2.4}
$$

where $j$ refers to a specific cell in the frequency table.
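The pooling of equation (2.4) can be sketched as follows, here for cell proportions. The toy imputations and the helper names `frequency_table` and `pool` are assumptions made for this example.

```python
from collections import Counter

def frequency_table(imputed, cells):
    """Proportion of units in each cell for one imputed dataset."""
    counts = Counter(imputed)
    n = len(imputed)
    return {cell: counts[cell] / n for cell in cells}

def pool(tables):
    """Equation (2.4): average each cell over the M imputed tables."""
    m = len(tables)
    return {cell: sum(t[cell] for t in tables) / m for cell in tables[0]}

# M = 3 invented imputations of a two-variable latent classification for 4 units.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
imputations = [
    [(0, 0), (0, 1), (1, 1), (0, 0)],
    [(0, 0), (0, 0), (1, 1), (0, 0)],
    [(0, 1), (0, 1), (1, 1), (0, 0)],
]
tables = [frequency_table(imp, cells) for imp in imputations]
pooled = pool(tables)
```

Here the cell (0, 0) has proportions 0.50, 0.75 and 0.25 in the three imputed tables, so its pooled estimate is 0.50.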

Next, an estimate of the uncertainty around these frequencies is of interest. In general, the variance of the pooled estimate $\hat{\theta}_j$ for cell $j$ can be estimated by Rubin’s total variance formula for multiple imputation (Rubin, 1987, page 76):

$$
\mathrm{VAR}_{\mathrm{total}_j} = \overline{\mathrm{VAR}}_{\mathrm{within}_j} + \mathrm{VAR}_{\mathrm{between}_j} + \frac{\mathrm{VAR}_{\mathrm{between}_j}}{M}.
\tag{2.5}
$$

Here, $\mathrm{VAR}_{\mathrm{between}_j}$ can be estimated as

$$
\mathrm{VAR}_{\mathrm{between}_j} = \frac{1}{M-1} \sum_{i=1}^{M} \left( \hat{\theta}_{ij} - \hat{\theta}_j \right) \left( \hat{\theta}_{ij} - \hat{\theta}_j \right)^{\prime}.
\tag{2.6}
$$

The within variance $\overline{\mathrm{VAR}}_{\mathrm{within}_j}$ reflects the average sampling variance of $\hat{\theta}_{ij}$ when the imputed values are treated as observed. In our application, as the population is finite and imputations are generated for the complete population, this within variance component is zero and can be omitted (Vink and van Buuren, 2014). Note that this is a property of multiple imputation and is due to the fact that the complete population is imputed. This should not be confused with the decision to only use a sample for LC model estimation. Hence, formula (2.5) is reduced in this case to:

$$
\mathrm{VAR}_{\mathrm{total}_j} = \mathrm{VAR}_{\mathrm{between}_j} + \frac{\mathrm{VAR}_{\mathrm{between}_j}}{M}.
\tag{2.7}
$$
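Formulas (2.5) to (2.7) can be sketched for a single cell as follows. The function names and the three invented cell estimates are assumptions of this example; the `within` argument stands for the average within-imputation variance and defaults to zero, as in the mass-imputation setting described above.

```python
def between_variance(estimates):
    """Equation (2.6) for a single cell: sample variance of the M estimates."""
    m = len(estimates)
    mean = sum(estimates) / m            # pooled estimate, equation (2.4)
    return sum((e - mean) ** 2 for e in estimates) / (m - 1)

def total_variance(estimates, within=0.0):
    """Equation (2.5); with within=0.0 it reduces to equation (2.7)."""
    m = len(estimates)
    b = between_variance(estimates)
    return within + b + b / m

# Invented estimates of one cell proportion from M = 3 imputations.
cell_estimates = [0.50, 0.75, 0.25]
var_between = between_variance(cell_estimates)
var_total = total_variance(cell_estimates)
```

With these numbers the between variance is 0.0625 and the total variance is $0.0625 \times (1 + 1/3) \approx 0.0833$; the factor $1 + 1/M$ reflects the use of a finite number of imputations.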

2.5  A note on bootstrapping for multiple imputation in finite populations

The aim of a census is to estimate certain target parameters of a finite population (e.g., all persons currently living in the Netherlands). Hence, a natural idea might be to apply a finite-population bootstrap procedure in this context; see Mashreghi, Haziza and Léger (2016) for an overview of bootstrap methods for finite populations. However, when determining the appropriate bootstrap approach, it should be noted that the bootstrap in MILC is specifically implemented to account for the between imputation variance component of formula (2.5) in Section 2.4. In general, variability in the target parameters due to the fact that a sample was drawn from a finite population is incorporated in the within variance component of formula (2.5). As we use mass imputation here, the within variance component in fact reduces to zero; cf. formula (2.7). More generally, this component would be estimated separately from the bootstrap method at hand; see Boeschoten et al. (2017) for an example.

Furthermore, the reason for incorporating the bootstrap in the MILC approach is to account for uncertainty in the estimated parameters of the latent class model. Note that these parameters are not associated with a finite population, but with a model. Even if we had observed the entire finite population, there would still be uncertainty about the true parameter values of the latent class model. This uncertainty can be considered as drawing from an infinite distribution. Therefore, we select the classical with-replacement bootstrap. We argue that bootstrap methods for finite populations should not be used in this context. For large samples, such methods would result in a substantial underestimation of the variance when combined with the usual approach to multiple imputation. We also checked this empirically in the simulation study to be discussed in Section 3. As an example, when a pseudo-population bootstrap method for finite populations was used, the resulting se/sd ratios in Table 4.7 for the condition MAR, $M = 5$ were 0.7217, 0.7887, 0.7536 and 0.8607, respectively, all pointing to a non-negligible underestimation of the true variance.
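To give a sense of the size of this underestimation, the quoted ratios can be converted to the variance scale. This sketch assumes that the se/sd ratio is the average estimated standard error divided by the standard deviation of the estimates over simulation replications; that definition is not stated in this section, so it should be read as an assumption.

```python
# se/sd ratios quoted in the text for the pseudo-population bootstrap (MAR, M = 5).
se_sd_ratios = [0.7217, 0.7887, 0.7536, 0.8607]

# Assumption: se/sd = (average estimated standard error) / (standard deviation of
# the estimates over replications), so squaring gives a ratio of variances.
variance_ratios = [round(r ** 2, 4) for r in se_sd_ratios]

# Share of the true variance that is missed by the estimated variance.
shortfall = [round(1 - v, 4) for v in variance_ratios]
```

Under that assumption, the estimated variances would capture only roughly 52% to 74% of the true variance in these four cases.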

In the simulation study in this paper, we will restrict attention to surveys based on simple random sampling and stratified simple random sampling. For more complex survey designs, e.g. involving cluster sampling or sampling with unequal probabilities, it is unclear whether the proposed bootstrap approach is always appropriate. It is possible that in some cases such complex design features could indirectly affect the uncertainty of estimated parameters of the latent class model and therefore become relevant for variance estimation. We will return to this point in the discussion section.

