Using Multiple Imputation of Latent Classes to construct population census tables with data from multiple sources
Section 4. Simulation results
The results are evaluated in two ways. First, cell proportions of univariate and multivariate cross-tables are evaluated in terms of bias and root mean squared error (RMSE) over the 500 simulation replications. Second, the variance of these cell proportions is evaluated by dividing the average estimated standard error by the standard deviation of the 500 estimates obtained in the 500 simulation replications (se/sd). Because of the log-transformations applied in equations 3.2, 3.3 and 3.4 to account for small cell frequencies, the RMSE and se/sd are reported on a log scale.
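The three evaluation criteria can be made concrete with a short sketch. The function below is a minimal illustration, assuming that for one cell we have the point estimate and the estimated standard error from each replication, plus the true population value used to generate the data. The function name `evaluate_cell` is hypothetical, and the log-transformations of equations 3.2 to 3.4 are not reproduced, so all quantities are on the original scale.

```python
import math
from statistics import mean, stdev


def evaluate_cell(estimates, std_errors, true_value):
    """Summarise one cell of a cross-table over simulation replications.

    estimates:  point estimate of the cell in each replication
    std_errors: estimated standard error of the cell in each replication
    true_value: population value of the cell used to generate the data
    """
    # Bias: average deviation of the estimates from the true value.
    bias = mean(estimates) - true_value
    # RMSE: square root of the average squared deviation from the true value.
    rmse = math.sqrt(mean((e - true_value) ** 2 for e in estimates))
    # se/sd: average estimated SE relative to the empirical SD of the estimates;
    # a value close to one means the estimated uncertainty is about right.
    se_sd = mean(std_errors) / stdev(estimates)
    return bias, rmse, se_sd
```

In the study, `estimates` and `std_errors` would each hold 500 values, one per replication.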
4.1 Results in terms of bias
4.1.1 Univariate marginal frequencies of imputed variables
Table 4.1 shows the bias and RMSE of the univariate marginal frequencies of the imputed latent variable “Gender” under all simulation conditions. For this variable, less bias is obtained under all conditions if only the register indicator is used than if MILC is applied. The RMSE is likewise smaller for the register indicator than for the MILC method. Furthermore, both the bias and the RMSE of MILC decrease slightly as the number of imputations m increases, and the quality of the results appears to be unrelated to the missingness mechanism.
Table 4.1
Results in terms of bias and root mean squared error for the two categories of the imputed latent variable “Gender”
| | Gender | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|
| Bias | F. | 1,367,167 | -2,126 | 3,386 | 3,308 | 3,325 | 3,231 | 3,153 | 3,109 |
| | M. | 1,324,310 | 2,126 | -3,386 | -3,308 | -3,325 | -3,231 | -3,153 | -3,109 |
| RMSE | F. | 1,367,167 | 2,154 | 6,008 | 5,888 | 5,760 | 5,914 | 5,637 | 5,512 |
| | M. | 1,324,310 | 2,154 | 6,008 | 5,888 | 5,760 | 5,914 | 5,637 | 5,512 |
Table 4.2 shows the bias and RMSE of the univariate marginal frequencies of the imputed latent variable “Type of family nucleus”. These results differ markedly from those for “Gender”: the bias obtained when only the register indicator is used is much higher than the bias obtained using MILC under all conditions, and the same holds for the RMSE. In addition, whether the MILC results depend on the missingness mechanism differs per category; in terms of both bias and RMSE, this is the case for the categories “N.A.” and “Partners”.
Table 4.2
Results in terms of bias and root mean squared error for the four observed categories of the imputed latent variable “Type of family nucleus”
| | Type of family nucleus | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|
| Bias | Lone parents | 97,360 | 2,670 | 185 | 182 | 176 | 224 | 226 | 220 |
| | N.A. | 604,032 | 8,985 | -957 | -975 | -989 | -1,601 | -1,612 | -1,611 |
| | Partners | 1,272,339 | -19,686 | 401 | 411 | 427 | 932 | 935 | 932 |
| | Sons/daughters | 717,746 | 8,030 | 371 | 381 | 386 | 446 | 451 | 459 |
| RMSE | Lone parents | 97,360 | 2,672 | 425 | 408 | 395 | 426 | 421 | 414 |
| | N.A. | 604,032 | 8,989 | 1,337 | 1,318 | 1,312 | 1,837 | 1,833 | 1,818 |
| | Partners | 1,272,339 | 19,688 | 954 | 914 | 904 | 1,256 | 1,235 | 1,218 |
| | Sons/daughters | 717,746 | 8,034 | 630 | 624 | 617 | 715 | 692 | 688 |
Table 4.3 shows the bias and RMSE of the univariate marginal frequencies of the imputed latent variable “Citizen”. These results are comparable to those for “Type of family nucleus”: the bias obtained when only the register indicator is used is again much higher than the bias obtained using MILC, and the same holds for the RMSE. As for “Type of family nucleus”, whether the MILC results depend on the missingness mechanism differs per category, although here this applies mainly to the bias and much less to the RMSE.
Table 4.3
Results in terms of bias and root mean squared error for the four observed categories of the imputed latent variable “Citizen”
| | Citizen | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|
| Bias | EU | 79,212 | 51,365 | -5 | -7 | -12 | -199 | -211 | -216 |
| | NL | 2,511,214 | -116,899 | -555 | -546 | -545 | 117 | 124 | 107 |
| | not EU | 89,592 | 58,085 | 512 | 502 | 507 | 62 | 69 | 89 |
| | Not stated | 11,459 | 7,448 | 49 | 51 | 49 | 21 | 18 | 20 |
| RMSE | EU | 79,212 | 51,365 | 410 | 398 | 388 | 488 | 486 | 475 |
| | NL | 2,511,214 | 116,899 | 925 | 894 | 883 | 767 | 756 | 720 |
| | not EU | 89,592 | 58,086 | 800 | 770 | 767 | 618 | 611 | 590 |
| | Not stated | 11,459 | 7,449 | 201 | 197 | 190 | 204 | 205 | 198 |
Boeschoten et al. (2017) concluded that the quality of the output when MILC is applied is related to how well the latent class model is able to classify units based on the observed data, which is summarized in the entropy. The entropy values for “Gender”, “Type of family nucleus” and “Citizen” are approximately 0.7352, 0.9191 and 0.8571, respectively, under MCAR. This ordering corresponds to the quality of the results for these latent variables in terms of bias and RMSE. An additional explanation for “Gender” is that its two categories are of comparable size, and that in our simulation study the amount of misclassification is approximately equal and symmetric in both categories. As a result, the marginal distribution of the register indicator is very similar to the marginal distribution of the latent variable and is hardly affected by misclassification.
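The entropy referred to above summarizes how certain the posterior classifications of the latent class model are. A minimal sketch, assuming the usual entropy-based R-squared for latent class models, which equals 1 for perfectly certain classification and 0 when every unit has uniform posterior probabilities; the function name and input layout are illustrative and not taken from the authors' software.

```python
import math


def entropy_r2(posteriors):
    """Entropy R-squared of a latent class model.

    posteriors: list of rows; row i holds the posterior class-membership
    probabilities P(X = s | observed data of unit i), which sum to 1.
    """
    n_units = len(posteriors)
    n_classes = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        # Terms with p == 0 contribute 0 (the limit of p * log p).
        total -= sum(p * math.log(p) for p in row if p > 0)
    # Scale by the maximum possible entropy, reached with uniform posteriors.
    return 1.0 - total / (n_units * math.log(n_classes))
```

Values such as 0.7352 for “Gender” indicate noticeably less certain classification than 0.9191 for “Type of family nucleus”.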
4.1.2 Joint frequencies of imputed variables
Table 4.4 shows the bias and RMSE of the joint frequencies of the three imputed latent variables. Again, severe bias is present in all cells of the joint frequency table if only the register indicators are used. The results obtained when the MILC method is applied show much lower bias and RMSE, and the differences between numbers of imputations m or between missingness mechanisms are much smaller than the differences between MILC and the register indicators. Furthermore, differences in bias across cells after applying MILC seem to be related to imbalances in cell frequencies within particular variables. More specifically, the cell frequencies of the variable “Citizen” differ substantially, and Table 4.4 shows that the category “not EU” in particular is affected by this imbalance in terms of bias.
Table 4.4
Results in terms of bias and root mean squared error for the 32 observed categories of the joint distribution of the three imputed latent variables “Gender”, “Type of family nucleus” and “Citizen”
| | Gender | Type of family nucleus | Citizen | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bias | F. | Lone parents | EU | 2,091 | 1,434 | 8 | 7 | 7 | 1 | 0 | 0 |
| | F. | Lone parents | NL | 76,131 | -6,620 | 652 | 650 | 646 | 240 | 241 | 234 |
| | F. | Lone parents | not EU | 3,120 | 1,513 | 33 | 32 | 32 | 39 | 39 | 38 |
| | F. | Lone parents | N.S. | 646 | 154 | -5 | -5 | -6 | -13 | -13 | -13 |
| | F. | N.A. | EU | 12,436 | 5,971 | 433 | 432 | 432 | 431 | 427 | 427 |
| | F. | N.A. | NL | 293,960 | -11,998 | -595 | -618 | -623 | 905 | 891 | 880 |
| | F. | N.A. | not EU | 9,509 | 7,317 | 1,032 | 1,031 | 1,032 | 1,069 | 1,069 | 1,071 |
| | F. | N.A. | N.S. | 1,221 | 982 | 182 | 182 | 182 | 198 | 197 | 197 |
| | F. | Partners | EU | 20,443 | 11,185 | 237 | 236 | 235 | 24 | 19 | 21 |
| | F. | Partners | NL | 584,547 | -34,001 | 294 | 262 | 279 | -564 | -599 | -624 |
| | F. | Partners | not EU | 26,877 | 12,022 | 404 | 402 | 401 | 254 | 255 | 258 |
| | F. | Partners | N.S. | 1,292 | 1,837 | -19 | -18 | -18 | -23 | -24 | -24 |
| | F. | Sons/daughters | EU | 4,368 | 7,541 | -778 | -779 | -780 | -851 | -853 | -854 |
| | F. | Sons/daughters | NL | 321,364 | -8,738 | 2,483 | 2,471 | 2,479 | 2,620 | 2,601 | 2,588 |
| | F. | Sons/daughters | not EU | 7,680 | 8,303 | -764 | -768 | -766 | -876 | -874 | -869 |
| | F. | Sons/daughters | N.S. | 1,482 | 971 | -209 | -208 | -208 | -223 | -223 | -222 |
| | M. | Lone parents | EU | 389 | 591 | -10 | -11 | -11 | 9 | 9 | 9 |
| | M. | Lone parents | NL | 14,536 | 4,791 | -553 | -552 | -554 | -134 | -131 | -130 |
| | M. | Lone parents | not EU | 372 | 707 | 35 | 35 | 35 | 53 | 53 | 53 |
| | M. | Lone parents | N.S. | 75 | 100 | 27 | 27 | 27 | 28 | 29 | 29 |
| | M. | N.A. | EU | 16,308 | 4,444 | -306 | -304 | -305 | -349 | -349 | -350 |
| | M. | N.A. | NL | 253,493 | -3,733 | -714 | -708 | -717 | -2,730 | -2,722 | -2,713 |
| | M. | N.A. | not EU | 13,636 | 5,548 | -904 | -903 | -902 | -1,023 | -1,023 | -1,020 |
| | M. | N.A. | N.S. | 3,469 | 455 | -85 | -86 | -87 | -102 | -103 | -104 |
| | M. | Partners | EU | 18,444 | 11,881 | 793 | 796 | 794 | 905 | 906 | 906 |
| | M. | Partners | NL | 599,278 | -38,164 | -3,170 | -3,128 | -3,127 | -1,528 | -1,490 | -1,474 |
| | M. | Partners | not EU | 19,776 | 13,709 | 1,794 | 1,793 | 1,793 | 1,785 | 1,790 | 1,791 |
| | M. | Partners | N.S. | 1,682 | 1,846 | 69 | 69 | 69 | 78 | 78 | 79 |
| | M. | Sons/daughters | EU | 4,733 | 8,319 | -382 | -382 | -384 | -370 | -371 | -374 |
| | M. | Sons/daughters | NL | 367,905 | -18,435 | 1,049 | 1,076 | 1,072 | 1,308 | 1,333 | 1,346 |
| | M. | Sons/daughters | not EU | 8,622 | 8,966 | -1,118 | -1,120 | -1,117 | -1,240 | -1,239 | -1,233 |
| | M. | Sons/daughters | N.S. | 1,592 | 1,103 | 90 | 90 | 91 | 77 | 77 | 78 |
| RMSE | F. | Lone parents | EU | 2,091 | 1,434 | 45 | 42 | 41 | 45 | 42 | 40 |
| | F. | Lone parents | NL | 76,131 | 6,621 | 742 | 734 | 724 | 418 | 408 | 394 |
| | F. | Lone parents | not EU | 3,120 | 1,514 | 67 | 64 | 64 | 71 | 68 | 66 |
| | F. | Lone parents | N.S. | 646 | 155 | 22 | 21 | 20 | 26 | 25 | 24 |
| | F. | N.A. | EU | 12,436 | 5,972 | 449 | 446 | 445 | 447 | 442 | 440 |
| | F. | N.A. | NL | 293,960 | 12,001 | 1,260 | 1,245 | 1,222 | 1,433 | 1,374 | 1,348 |
| | F. | N.A. | not EU | 9,509 | 7,317 | 1,038 | 1,037 | 1,037 | 1,075 | 1,075 | 1,076 |
| | F. | N.A. | N.S. | 1,221 | 983 | 185 | 185 | 185 | 202 | 201 | 201 |
| | F. | Partners | EU | 20,443 | 11,186 | 291 | 285 | 282 | 173 | 163 | 157 |
| | F. | Partners | NL | 584,547 | 34,003 | 2,332 | 2,285 | 2,204 | 2,364 | 2,248 | 2,197 |
| | F. | Partners | not EU | 26,877 | 12,023 | 456 | 450 | 447 | 330 | 327 | 327 |
| | F. | Partners | N.S. | 1,292 | 1,838 | 46 | 44 | 43 | 48 | 48 | 47 |
| | F. | Sons/daughters | EU | 4,368 | 7,541 | 787 | 787 | 787 | 860 | 862 | 863 |
| | F. | Sons/daughters | NL | 321,364 | 8,742 | 2,820 | 2,796 | 2,781 | 2,959 | 2,903 | 2,879 |
| | F. | Sons/daughters | not EU | 7,680 | 8,304 | 779 | 782 | 780 | 892 | 889 | 883 |
| | F. | Sons/daughters | N.S. | 1,482 | 972 | 216 | 214 | 214 | 230 | 230 | 229 |
| | M. | Lone parents | EU | 389 | 592 | 18 | 17 | 17 | 17 | 17 | 16 |
| | M. | Lone parents | NL | 14,536 | 4,792 | 605 | 600 | 600 | 271 | 260 | 257 |
| | M. | Lone parents | not EU | 372 | 707 | 38 | 38 | 37 | 55 | 55 | 55 |
| | M. | Lone parents | N.S. | 75 | 101 | 27 | 27 | 27 | 29 | 29 | 29 |
| | M. | N.A. | EU | 16,308 | 4,445 | 331 | 328 | 327 | 373 | 371 | 370 |
| | M. | N.A. | NL | 253,493 | 3,742 | 1,390 | 1,349 | 1,314 | 2,959 | 2,931 | 2,911 |
| | M. | N.A. | not EU | 13,636 | 5,549 | 913 | 912 | 911 | 1,033 | 1,031 | 1,028 |
| | M. | N.A. | N.S. | 3,469 | 456 | 107 | 105 | 104 | 121 | 121 | 120 |
| | M. | Partners | EU | 18,444 | 11,881 | 808 | 810 | 807 | 919 | 919 | 917 |
| | M. | Partners | NL | 599,278 | 38,165 | 3,898 | 3,837 | 3,794 | 2,755 | 2,617 | 2,568 |
| | M. | Partners | not EU | 19,776 | 13,709 | 1,804 | 1,803 | 1,803 | 1,797 | 1,800 | 1,800 |
| | M. | Partners | N.S. | 1,682 | 1,846 | 88 | 87 | 85 | 98 | 95 | 95 |
| | M. | Sons/daughters | EU | 4,733 | 8,319 | 403 | 403 | 403 | 401 | 401 | 402 |
| | M. | Sons/daughters | NL | 367,905 | 18,437 | 1,728 | 1,723 | 1,687 | 1,905 | 1,872 | 1,854 |
| | M. | Sons/daughters | not EU | 8,622 | 8,967 | 1,129 | 1,130 | 1,127 | 1,252 | 1,250 | 1,244 |
| | M. | Sons/daughters | N.S. | 1,592 | 1,104 | 109 | 108 | 107 | 103 | 102 | 101 |
4.1.3 Restricted cells
Table 4.5 shows the simulation results for the six restricted cells in the marginal cross-table between “Age” and “Type of family nucleus”. The “Frequency” column shows that these six cells should all contain zero observations, because these combinations of scores are logically impossible. However, due to misclassification in the register, observations with these combinations of scores are present when the register indicator is used to estimate this cross-table directly. If the MILC method is applied, such impossible combinations of scores never occur, regardless of the missingness mechanism or the number of imputations. Furthermore, because these cells of the marginal table contain zero observations, all cells of more detailed tables covering these logically impossible combinations automatically contain zero observations as well.
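Why restricted cells stay empty can be illustrated with a simplified sketch. Assuming that imputations are drawn from posterior class-membership probabilities, and that a logically impossible combination of latent class and covariate value is given probability zero, no imputed record can ever fall in a restricted cell. The category labels mirror Table 4.5, but the posterior values and function names are hypothetical, and the post-hoc masking shown here only illustrates the principle; it is not the authors' implementation.

```python
import random

NUCLEUS = ["Lone parents", "N.A.", "Partners", "Sons/daughters"]
# Logically impossible (category, age group) combinations, as in Table 4.5.
RESTRICTED = {
    ("Lone parents", "under 5 years"), ("Lone parents", "5 to 9 years"),
    ("Lone parents", "10 to 14 years"), ("Partners", "under 5 years"),
    ("Partners", "5 to 9 years"), ("Partners", "10 to 14 years"),
}


def restricted_posterior(posterior, age):
    """Set probabilities of impossible classes to zero and renormalise."""
    masked = [0.0 if (c, age) in RESTRICTED else p
              for c, p in zip(NUCLEUS, posterior)]
    total = sum(masked)
    if total == 0:
        raise ValueError("no admissible class for this age group")
    return [p / total for p in masked]


def impute(posterior, age, rng):
    """Draw one imputed category from the restricted posterior."""
    return rng.choices(NUCLEUS, weights=restricted_posterior(posterior, age))[0]
```

For a child aged 7, only “N.A.” or “Sons/daughters” can ever be drawn, whatever the unrestricted posterior looks like.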
Table 4.5
Results in terms of bias and root mean squared error for the six restricted categories from the cross-table between “Type of family nucleus” and the covariate “Age”
| | Type of family nucleus | Age | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Bias | Lone parents | under 5 years | 0 | 377 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Lone parents | 5 to 9 years | 0 | 386 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Lone parents | 10 to 14 years | 0 | 376 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Partners | under 5 years | 0 | 4,934 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Partners | 5 to 9 years | 0 | 5,041 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Partners | 10 to 14 years | 0 | 4,937 | 0 | 0 | 0 | 0 | 0 | 0 |
| RMSE | Lone parents | under 5 years | 0 | 377 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Lone parents | 5 to 9 years | 0 | 386 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Lone parents | 10 to 14 years | 0 | 377 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Partners | under 5 years | 0 | 4,934 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Partners | 5 to 9 years | 0 | 5,041 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Partners | 10 to 14 years | 0 | 4,937 | 0 | 0 | 0 | 0 | 0 | 0 |
4.1.4 The complete population frequency table
Figures 4.1 and 4.2 show the bias and root mean squared error (RMSE) when the complete census table, that is, the cross-table between all six variables, is estimated. Because this table contains 42,000 cells, it is not feasible to evaluate them individually; instead, Figures 4.1 and 4.2 show how the size of the cell frequency is related to the quality of the results. If only the register indicators are used, bias and RMSE are directly related to cell frequency. More specifically, the relationship between cell frequency and absolute bias is approximately linear, with the absolute bias amounting to approximately 10% of the cell frequency.
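An approximately proportional relation like this can be summarized by a least-squares line through the origin, whose slope estimates the absolute bias as a fraction of the cell frequency; a slope near 0.10 would correspond to the 10% described above. A minimal sketch, with a hypothetical function name and illustrative inputs:

```python
def slope_through_origin(frequencies, biases):
    """Least-squares slope b of |bias| = b * frequency, without intercept."""
    sxy = sum(x * abs(y) for x, y in zip(frequencies, biases))
    sxx = sum(x * x for x in frequencies)
    return sxy / sxx
```

Fitting through the origin reflects the expectation that an empty cell cannot carry misclassification bias in the direction of a negative count, so no intercept is estimated.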

Description of Figure 4.1
Figure presenting the relationship between the size of the cell frequency and the quality of the results in terms of bias when the complete cross-table between the latent variables “Gender”, “Type of family nucleus” and “Citizen” and the three covariates “Age”, “Marital status” and “Place of birth” is estimated. The X-axis represents cell frequency and the Y-axis represents the bias. Results are shown for the register indicators (graph 1), MILC-MCAR-20 (graph 2) and MILC-MAR-20 (graph 3).

Description of Figure 4.2
Figure presenting the relationship between the size of the cell frequency and the quality of the results in terms of root mean squared error (RMSE) when the complete cross-table between the latent variables “Gender”, “Type of family nucleus” and “Citizen” and the three covariates “Age”, “Marital status” and “Place of birth” is estimated. The X-axis represents cell frequency and the Y-axis represents the RMSE. Results are shown for the register indicators (graph 1), MILC-MCAR-20 (graph 2) and MILC-MAR-20 (graph 3).
4.2 Results in terms of variance
4.2.1 Univariate marginal frequencies of imputed variables
Table 4.6 shows the se/sd results for the univariate marginal frequencies of “Gender”. Because this ratio measures whether the average standard error estimated in each replication correctly describes the uncertainty (standard deviation) observed over the estimates, it should be close to one. As a completely observed and finite population is assumed, no variance is estimated when the register indicator is used. The ratios obtained using MILC are generally close to one and, as with the bias, differ only slightly between numbers of imputations m and between missingness mechanisms.
Table 4.6
Results in terms of average standard error of the estimates divided by standard deviation over the estimates (se/sd) for the two categories of the imputed latent variable “Gender”
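| | Gender | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|
| se/sd | F. | 1,367,167 | - | 1.0540 | 1.0317 | 1.0363 | 1.0030 | 1.0235 | 1.0237 |
| | M. | 1,324,310 | - | 1.0546 | 1.0317 | 1.0363 | 1.0034 | 1.0236 | 1.0236 |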
Tables 4.7 and 4.8 show the se/sd results for the univariate marginal frequencies of “Type of family nucleus” and “Citizen”, respectively. Their structure is very similar to that of the results for “Gender”.
Table 4.7
Results in terms of average standard error of the estimates divided by standard deviation over the estimates (se/sd) for the four observed categories of the imputed latent variable “Type of family nucleus”
| | Type of family nucleus | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|
| se/sd | Lone parents | 97,360 | - | 1.0457 | 1.0510 | 1.0529 | 1.0561 | 1.0337 | 1.0336 |
| | N.A. | 604,032 | - | 0.9706 | 0.9874 | 0.9922 | 0.9751 | 0.9829 | 0.9863 |
| | Partners | 1,272,339 | - | 1.0332 | 1.0418 | 1.0456 | 1.0052 | 1.0269 | 1.0298 |
| | Sons/daughters | 717,746 | - | 0.9594 | 0.9615 | 0.9606 | 0.9696 | 0.9880 | 0.9938 |
Table 4.8
Results in terms of average standard error of the estimates divided by standard deviation over the estimates (se/sd) for the four observed categories of the imputed latent variable “Citizen”
| | Citizen | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|
| se/sd | EU | 79,212 | - | 1.0417 | 1.0172 | 1.0362 | 1.0768 | 1.0539 | 1.0571 |
| | NL | 2,511,214 | - | 1.0136 | 1.0113 | 1.0235 | 1.0925 | 1.0645 | 1.0927 |
| | not EU | 89,592 | - | 0.9478 | 0.9632 | 0.9709 | 1.0282 | 0.9916 | 1.0125 |
| | Not stated | 11,459 | - | 1.0063 | 1.0208 | 1.0238 | 1.1057 | 1.0861 | 1.1143 |
4.2.2 Joint frequencies of imputed variables
Table 4.9 shows the se/sd results for the joint frequencies of the imputed latent variables “Gender”, “Type of family nucleus” and “Citizen”. These results are very similar to those for the marginal frequencies. For cells with a relatively low frequency, the ratio is generally larger than one, indicating that the variance estimated for these frequencies (and thereby the differences between the imputations) reflects more uncertainty than is actually observed over the replications. In summary, the uncertainty for cells containing low frequencies is overestimated.
Results in terms of variance are not shown for the restricted cells, as no variance can be estimated for them.
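The standard errors behind these se/sd ratios come from combining the m imputed data sets. A minimal sketch, assuming Rubin's rules are used for pooling and, because the whole population is enumerated, that the within-imputation variance of a cell count is zero; the total variance then reduces to (1 + 1/m) times the between-imputation variance, so the estimated uncertainty is driven entirely by the differences between imputations. The function name is hypothetical.

```python
from statistics import mean, variance


def pool_cell_count(counts_per_imputation, within_variance=0.0):
    """Pool one cell count over m >= 2 imputed data sets (Rubin's rules).

    counts_per_imputation: the cell count in each of the m imputed data sets
    within_variance: average within-imputation variance; assumed 0 here
        because the complete population is observed
    """
    m = len(counts_per_imputation)
    estimate = mean(counts_per_imputation)       # pooled point estimate
    between = variance(counts_per_imputation)    # B: variance across imputations
    total = within_variance + (1 + 1 / m) * between
    return estimate, total ** 0.5                # estimate and standard error
```

For a cell that is almost always imputed as empty, a single imputation with a few records already inflates B, which is one way the overestimation for low-frequency cells could arise.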
Table 4.9
Results in terms of average standard error of the estimates divided by standard deviation over the estimates (se/sd) for the 32 observed categories of the joint distribution of the three imputed latent variables “Gender”, “Type of family nucleus” and “Citizen”
| Gender | Type of family nucleus | Citizen | Frequency | Register | MCAR, m = 5 | MCAR, m = 10 | MCAR, m = 20 | MAR, m = 5 | MAR, m = 10 | MAR, m = 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| F. | Lone parents | EU | 2,091 | - | 1.1813 | 1.2097 | 1.2032 | 1.1495 | 1.1654 | 1.1997 |
| F. | Lone parents | NL | 76,131 | - | 1.0371 | 1.0471 | 1.0504 | 1.0270 | 1.0252 | 1.0349 |
| F. | Lone parents | not EU | 3,120 | - | 1.1659 | 1.1590 | 1.1519 | 1.1607 | 1.1634 | 1.1870 |
| F. | Lone parents | N.S. | 646 | - | 1.0963 | 1.1004 | 1.1272 | 1.1110 | 1.1000 | 1.1054 |
| F. | N.A. | EU | 12,436 | - | 1.0850 | 1.0838 | 1.1172 | 1.0888 | 1.1065 | 1.1456 |
| F. | N.A. | NL | 293,960 | - | 1.0840 | 1.0652 | 1.0575 | 1.0158 | 1.0406 | 1.0461 |
| F. | N.A. | not EU | 9,509 | - | 1.1636 | 1.1822 | 1.1892 | 1.1574 | 1.1383 | 1.1562 |
| F. | N.A. | N.S. | 1,221 | - | 1.1789 | 1.1964 | 1.2097 | 1.1959 | 1.1826 | 1.2133 |
| F. | Partners | EU | 20,443 | - | 1.0508 | 1.0537 | 1.0653 | 1.0689 | 1.0684 | 1.0925 |
| F. | Partners | NL | 584,547 | - | 1.0313 | 1.0099 | 1.0189 | 1.0035 | 1.0253 | 1.0197 |
| F. | Partners | not EU | 26,877 | - | 1.0532 | 1.0766 | 1.0720 | 1.0765 | 1.0725 | 1.0733 |
| F. | Partners | N.S. | 1,292 | - | 1.1471 | 1.1566 | 1.1504 | 1.2157 | 1.1855 | 1.1940 |
| F. | Sons/daughters | EU | 4,368 | - | 1.0135 | 1.0147 | 1.0338 | 1.0430 | 1.0518 | 1.0479 |
| F. | Sons/daughters | NL | 321,364 | - | 1.0548 | 1.0379 | 1.0527 | 1.0017 | 1.0222 | 1.0221 |
| F. | Sons/daughters | not EU | 7,680 | - | 0.9977 | 0.9966 | 0.9909 | 1.0249 | 1.0132 | 1.0416 |
| F. | Sons/daughters | N.S. | 1,482 | - | 1.0344 | 1.0325 | 1.0357 | 1.0836 | 1.0688 | 1.0890 |
| M. | Lone parents | EU | 389 | - | 1.3198 | 1.4136 | 1.4316 | 1.2941 | 1.3575 | 1.4470 |
| M. | Lone parents | NL | 14,536 | - | 1.0784 | 1.0762 | 1.0736 | 1.0755 | 1.0690 | 1.0650 |
| M. | Lone parents | not EU | 372 | - | 1.4159 | 1.3857 | 1.4511 | 1.4814 | 1.4481 | 1.4619 |
| M. | Lone parents | N.S. | 75 | - | 1.4330 | 1.5192 | 1.5659 | 1.4598 | 1.5035 | 1.5373 |
| M. | N.A. | EU | 16,308 | - | 1.0990 | 1.0908 | 1.1165 | 1.0894 | 1.1022 | 1.1366 |
| M. | N.A. | NL | 253,493 | - | 1.0035 | 1.0100 | 1.0193 | 0.9920 | 1.0175 | 1.0238 |
| M. | N.A. | not EU | 13,636 | - | 1.1168 | 1.1100 | 1.1141 | 1.0826 | 1.1054 | 1.0952 |
| M. | N.A. | N.S. | 3,469 | - | 1.0241 | 1.0818 | 1.1052 | 1.1592 | 1.1478 | 1.1780 |
| M. | Partners | EU | 18,444 | - | 1.1618 | 1.1593 | 1.1579 | 1.1473 | 1.1335 | 1.1476 |
| M. | Partners | NL | 599,278 | - | 1.0668 | 1.0444 | 1.0487 | 1.0081 | 1.0329 | 1.0231 |
| M. | Partners | not EU | 19,776 | - | 1.0932 | 1.0788 | 1.0816 | 1.0674 | 1.0612 | 1.0911 |
| M. | Partners | N.S. | 1,682 | - | 1.1068 | 1.1411 | 1.1418 | 1.1335 | 1.1719 | 1.1770 |
| M. | Sons/daughters | EU | 4,733 | - | 1.0598 | 1.0396 | 1.0548 | 1.0528 | 1.0497 | 1.0414 |
| M. | Sons/daughters | NL | 367,905 | - | 1.0549 | 1.0347 | 1.0365 | 1.0098 | 1.0298 | 1.0340 |
| M. | Sons/daughters | not EU | 8,622 | - | 1.0077 | 1.0093 | 1.0100 | 1.0413 | 1.0449 | 1.0471 |
| M. | Sons/daughters | N.S. | 1,592 | - | 1.0472 | 1.0617 | 1.0699 | 1.0458 | 1.0362 | 1.0627 |
4.2.3 The complete population frequency table
Figure 4.3 shows the average standard error of the cell frequencies divided by the standard deviation of the frequencies estimated over the 500 replications of the simulation study (se/sd). The estimated standard error per cell frequency is too large especially when cell frequencies are close to zero, and the ratio approaches its nominal value of one as the cell frequencies become larger. Apparently, MILC overestimates the variability due to missing and conflicting values for cells with a frequency close to zero. This becomes more pronounced as the number of imputations increases, and it is not influenced by the missingness mechanism.

Description of Figure 4.3
Figure illustrating the results in terms of average standard error of the cell frequencies divided by the standard deviation over the frequencies estimated in the 500 replications in the simulation study (se/sd), when the complete cross-table between the latent variables “Gender”, “Type of family nucleus” and “Citizen” and the three covariates “Age”, “Marital status” and “Place of birth” is estimated. The X-axis represents cell frequency and the Y-axis represents the se/sd ratio. Results are shown for MILC-MCAR-5, MILC-MCAR-10, MILC-MCAR-20, MILC-MAR-5, MILC-MAR-10 and MILC-MAR-20. The estimated standard error per cell frequency is too large especially when cell frequencies are close to zero, and the ratio approaches the nominal value of one as the cell frequencies become larger.
4.3 Sensitivity to violations of assumptions
The simulation study presented in this paper investigates the performance of the MILC method in a finite population setting with misclassification. Applying MILC in practice involves a number of assumptions, all of which were met in this simulation study. To investigate the sensitivity of the method to violations of these assumptions, we performed additional simulation studies.
An important assumption of the MILC method is that the missingness mechanism is either MCAR or MAR. The first sensitivity analysis therefore uses a Missing Not At Random (MNAR) mechanism. More specifically, we generated this mechanism such that the probability of a missing value on the survey indicator for “Type of family nucleus” depends on the latent variable “Type of family nucleus”, being smallest for the first category and largest for the last category. Table 4.10 shows that bias and RMSE increase under MNAR compared to MAR, while the se/sd is not affected. More specifically, the size of the bias relates to how strongly the respective category is affected by the mechanism.
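The MNAR mechanism can be sketched as follows: the probability that the survey indicator is missing depends on each unit's true latent category. The function name and the category probabilities in the usage line are hypothetical; the values actually used in the simulation are not reported here.

```python
import random


def mnar_missing(latent_categories, missing_prob, rng):
    """Return one missingness flag (True = missing) per unit.

    Because the flag depends on the unit's true latent category, the
    mechanism is Missing Not At Random for the survey indicator.
    """
    return [rng.random() < missing_prob[c] for c in latent_categories]


# Hypothetical probabilities, increasing from the first to the last category.
PROBS = {"Lone parents": 0.05, "N.A.": 0.10, "Partners": 0.15, "Sons/daughters": 0.20}
flags = mnar_missing(["Partners", "Sons/daughters"], PROBS, random.Random(42))
```

Since the missingness cannot be explained by observed variables alone, the MAR-based imputation model cannot fully correct for it, which is consistent with the increased bias in Table 4.10.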
A second assumption is that the measurement error present in the indicators is random. To investigate sensitivity to violation of this assumption, we generated a selective measurement error mechanism in which the probability of measurement error in the register indicator for “Type of family nucleus” differs per category. Again, the first category is least affected and the last category most. Table 4.10 shows that the effect of this selective mechanism is limited: the bias increases along with the percentage of measurement error in the respective category, but remains relatively low. The se/sd is not affected by the mechanism.
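A selective measurement error mechanism of this kind can be sketched by giving each true category its own error rate. In this minimal sketch, which is an assumption about the general form rather than the exact procedure used in the study, a misclassified unit is recorded as one of the other categories chosen uniformly at random; the function name and rates are hypothetical.

```python
import random


def add_selective_error(true_categories, categories, error_prob, rng):
    """Misclassify register values with a category-specific error rate.

    A unit whose true category is c is recorded as a different, uniformly
    chosen category with probability error_prob[c]; otherwise its true
    value is kept.
    """
    observed = []
    for c in true_categories:
        if rng.random() < error_prob[c]:
            observed.append(rng.choice([k for k in categories if k != c]))
        else:
            observed.append(c)
    return observed
```

Setting the same rate for every category recovers the random (non-selective) measurement error assumed in the main simulation.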
Table 4.10
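Results in terms of bias, root mean squared error and se/sd for the four observed categories of the imputed latent variable “Type of family nucleus” under MAR, MNAR, selective measurement error (Selective) and measurement error in a covariate (ME covar)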
| | Type of family nucleus | Frequency | Register | MAR | MNAR | Selective | ME covar |
|---|---|---|---|---|---|---|---|
| Bias | Lone parents | 97,360 | 2,670 | 224 | 6,256 | 105 | 1,172,993 |
| | N.A. | 604,032 | 8,985 | -1,601 | 27,002 | -1,824 | 534 |
| | Partners | 1,272,339 | -19,686 | 932 | -11,341 | 1,116 | -1,174,697 |
| | Sons/daughters | 717,746 | 8,030 | 446 | -21,917 | 603 | 1,170 |
| RMSE | Lone parents | 97,360 | 2,672 | 426 | 6,268 | 332 | 1,172,994 |
| | N.A. | 604,032 | 8,989 | 1,837 | 27,017 | 2,060 | 1,094 |
| | Partners | 1,272,339 | 19,688 | 1,256 | 11,377 | 1,466 | 1,174,697 |
| | Sons/daughters | 717,746 | 8,034 | 715 | 21,924 | 819 | 1,291 |
| se/sd | Lone parents | 97,360 | - | 1.0561 | 1.01936 | 1.0634 | 1.0518 |
| | N.A. | 604,032 | - | 0.9751 | 1.02491 | 0.9722 | 1.0471 |
| | Partners | 1,272,339 | - | 1.0052 | 0.97456 | 0.9291 | 0.9649 |
| | Sons/daughters | 717,746 | - | 0.9696 | 1.02547 | 1.0962 | 1.0181 |
A third assumption is that the covariates do not contain measurement error. This assumption is the most questionable, as in practice covariates are rarely free of measurement error. Rather, such variables are typically treated as error-free because no additional information about their measurement error is available. If such information were available, for example from an additional survey, the variable would have been incorporated by means of a latent variable. However, because in practice there is always a chance that such information is unavailable for some variables, we investigated the sensitivity of the method to violation of this assumption. More specifically, we generated 5% misclassification in the covariate “Marital status”, which is relatively strongly associated with the latent variable “Type of family nucleus”. As the “ME covar” column of Table 4.10 shows, the bias in some categories is highly affected by this misclassification.
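Why a strongly associated covariate matters can be previewed with expected counts. A minimal sketch, assuming (hypothetically) that each unit keeps its covariate value with probability 0.95 and otherwise moves to one of the other covariate categories with equal probability; the counts and the function name are illustrative.

```python
def observed_joint(true_joint, error_rate):
    """Expected joint table of latent X and observed covariate Z*.

    true_joint[s][j]: population count with X = s and true covariate Z = j.
    The covariate is misclassified at `error_rate`, spread evenly over the
    other covariate categories.
    """
    n_cov = len(true_joint[0])
    off = error_rate / (n_cov - 1)
    # misclass[j][k] = P(Z* = k | Z = j); every row sums to 1.
    misclass = [[1 - error_rate if j == k else off for k in range(n_cov)]
                for j in range(n_cov)]
    return [[sum(row[j] * misclass[j][k] for j in range(n_cov))
             for k in range(n_cov)] for row in true_joint]
```

With a perfectly associated table such as `[[100, 0], [0, 100]]`, 5% misclassification produces `[[95, 5], [5, 95]]` in expectation: counts appear in cells that should be empty, which a model treating the covariate as error-free can only explain by shifting the latent classification.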
ISSN : 1492-0921
Editorial policy
Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.
Submission of Manuscripts
Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).
Note of appreciation
Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.
Standards of service to the public
Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa