A design effect measure for calibration weighting in single-stage samples

4. Empirical evaluation

We conducted two simulation studies using data that mimic single-stage sampling. The first uses publicly available tax return data with continuous variables of interest, while the second examines the performance of the alternative measures for a binary outcome in a single-stage survey.

4.1 Establishment data simulation study

Here a sample dataset of tax return data is used to mimic an establishment survey setup. The data come from the Tax Year 2007 Statistics of Income (SOI) Form 990 Exempt Organization (EO) sample. This is a stratified Bernoulli sample of 22,430 EO tax returns selected from 428,719 filed with and processed by the IRS between December 2007 and November 2009. This sample dataset, along with the population frame data, is free and electronically available online (Statistics of Income 2011). These data make a candidate “establishment-type” dataset for estimating design effects, in which Kish’s design effect may not apply.

The SOI EO sample dataset is used here as a pseudopopulation for illustration. Four variables of interest are used: Total Assets, Total Liabilities, Total Revenue, and Total Expenses. Returns that were sampled with certainty or that had “very small” assets (defined as Total Assets less than $1,000,000, including zero) were removed, leaving 8,914 units. We then randomly replicated and perturbed the data to create a pseudopopulation of 50,000 units. Additional observations were selected by simple random sampling with replacement, and their data values were then perturbed using the “jitter” function in R (Chambers, Cleveland, Kleiner and Tukey 1983).
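The replicate-and-perturb step can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function name `make_pseudopopulation`, the `jitter_frac` parameter, and the rule of scaling uniform noise by each variable's range are assumptions of this sketch and do not reproduce the default rule of R's “jitter” function.

```python
import random

def make_pseudopopulation(units, target_size, jitter_frac=0.01, seed=12345):
    """Grow `units` (a list of numeric tuples) to `target_size` records.

    Extra records are drawn by simple random sampling with replacement and
    perturbed with uniform noise. The noise scale (jitter_frac times the
    range of each variable) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_vars = len(units[0])
    # Range of each variable, used only to scale the perturbation.
    spans = [max(u[j] for u in units) - min(u[j] for u in units)
             for j in range(n_vars)]
    population = list(units)  # original records are kept unchanged
    for _ in range(target_size - len(units)):
        base = rng.choice(units)  # with-replacement draw
        noisy = tuple(
            base[j] + rng.uniform(-jitter_frac * spans[j], jitter_frac * spans[j])
            for j in range(n_vars)
        )
        population.append(noisy)
    return population

# Toy records: (Total Assets, Total Liabilities), hypothetical values.
toy_units = [(1.0e6, 2.0e5), (5.0e6, 1.0e6), (2.0e7, 3.0e6)]
pseudo_pop = make_pseudopopulation(toy_units, target_size=10)
```

In the study itself the starting file has 8,914 units and the target size is 50,000.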

Figure 4.1 shows a pairwise plot of the pseudopopulation, including plots of the variable values against each other in the lower left panels, histograms on the diagonal panels, and the correlations among the variables in the upper right panels. This plot mimics establishment-type data patterns. From the diagonal panels, we see that the variables of interest are all highly skewed. The lower left panels show a range of different relationships among them. Total Assets is only moderately related to Total Revenue and Total Expenses (correlations of 0.44 and 0.41), whereas Total Revenue and Total Expenses are highly correlated (0.99).

Figure 4.1 Pairwise plot of the pseudopopulation variables of interest

Description for Figure 4.1

Figure 4.1 shows a pairwise plot of the pseudopopulation, including plots of the variable values against each other in the lower left panels, histograms on the diagonal panels, and the correlations among the variables in the upper right panels. The variables are Total Assets, Total Liabilities, Total Revenue and Total Expenses. From the diagonal panels, we see that the variables of interest are all highly skewed, the least skewed being Total Assets. The lower left panels show a range of different relationships among them. Correlations between the variables are given in the following table.

Data table for Figure 4.1
Variables	Total Assets	Total Liabilities	Total Revenue	Total Expenses
Total Assets	1	0.56	0.44	0.41
Total Liabilities	0.56	1	0.42	0.44
Total Revenue	0.44	0.42	1	0.99
Total Expenses	0.41	0.44	0.99	1

Samples of three sizes $(n = 100; 500; 1,000)$ were selected without replacement from the pseudopopulation using the square root of Total Assets as a measure of size. This type of sampling is referred to as $π ps$ sampling subsequently. The HT weights were then calibrated using the “linear” method in the “calibrate” function in the “survey” package for R (Lumley 2012), which corresponds to a GREG estimator, to match the totals of an intercept, Total Assets and Total Revenue. The analysis variables are Total Liabilities and Total Expenses. (Note that we follow the common practice of developing procedures in the previous sections using formulas for with-replacement sampling but empirically evaluating them in without-replacement samples, which are the type used in applications.)
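Linear calibration adjusts each base weight $d_{i}$ to $w_{i} = d_{i} (1 + x_{i}^{T} λ),$ where $λ$ solves $(\sum d_{i} x_{i} x_{i}^{T}) λ = T_{x} - \sum d_{i} x_{i}.$ The following is a minimal pure-Python sketch of that calculation, not the “survey” package implementation. The function names and the toy numbers are hypothetical, and only an intercept and one auxiliary variable are used to keep the example small.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular calibration system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def linear_calibrate(base_weights, x_rows, x_totals):
    """Return weights d_i * (1 + x_i' lambda) that reproduce x_totals."""
    p = len(x_totals)
    # Weighted cross-product matrix: sum_i d_i x_i x_i'
    xtx = [[sum(d * x[j] * x[k] for d, x in zip(base_weights, x_rows))
            for k in range(p)] for j in range(p)]
    # Gap between the known totals and the base-weighted estimates.
    gap = [x_totals[j] - sum(d * x[j] for d, x in zip(base_weights, x_rows))
           for j in range(p)]
    lam = solve_linear_system(xtx, gap)
    return [d * (1.0 + sum(l * xj for l, xj in zip(lam, x)))
            for d, x in zip(base_weights, x_rows)]

# Toy sample: each x row is (intercept, auxiliary value); all values hypothetical.
d = [10.0, 12.0, 8.0, 10.0]
x_rows = [(1.0, 2.0), (1.0, 4.0), (1.0, 7.0), (1.0, 9.0)]
x_totals = [42.0, 230.0]  # assumed known population size and auxiliary total
w_cal = linear_calibrate(d, x_rows, x_totals)
```

Nothing in the linear method forces the calibrated weights to be positive, and the cross-product matrix can be singular in a particular sample. Both situations occurred in a small share of the simulated samples, as noted under Table 4.1.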

Eight design effect estimates are considered:

Estimates of the design effect measures (2.2) and (2.3). Expression (2.2) reflects the efficiency of $π ps$ sampling and use of the HT estimator. Expression (2.3) reflects gains (if any) of $π ps$ sampling combined with GREG estimation;
The Kish measure (2.4) computed using the GREG weights;
Three Spencer measures computed using the GREG weights: (i) the exact measure that estimates (2.5), (ii) the approximation (2.7) assuming zero correlation terms, and (iii) the large-population approximation (2.9). The Spencer measures are designed to reflect gains due to PPSWR sampling and use of the $pwr$ estimator. They do not account for any gains due to calibration;
Two proposed measures: (i) the exact proposed single-stage design effect (3.4) and (ii) the zero-correlation approximation (3.5). Both of these are meant to show the precision gains (if any) of PPSWR sampling combined with GREG estimation.

Note that neither the Spencer nor the proposed measures account for any reduction in variances due to sampling a large fraction of the population.
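Of the measures listed above, the Kish measure is the simplest to compute because it depends only on the weights. Expression (2.4) is not reproduced in this section; the sketch below assumes the familiar form $n \sum w_{i}^{2} / (\sum w_{i})^{2},$ which equals one plus the relative variance of the weights. The function name `kish_deff` is hypothetical.

```python
def kish_deff(weights):
    """Kish design effect due to unequal weighting: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

equal_weights_deff = kish_deff([2.0, 2.0, 2.0, 2.0])   # constant weights
unequal_weights_deff = kish_deff([1.0, 3.0])           # varying weights
```

Because this quantity can never fall below one and ignores the variable of interest, it cannot register the precision gains that $π ps$ sampling or calibration may deliver.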

We selected ten thousand samples to further understand the empirical behavior of the alternative design effect estimators. The empirical relbiases and ratio of the mean square errors (MSE’s) of the totals are

$\begin{array}{rl} relbias (\hat{T}) & = 100 \times S^{- 1} \sum_{s = 1}^{S} ({\hat{T}}_{s} - T) / T \\ MSE\ ratio & = MSE ({\hat{T}}_{HT}) / MSE ({\hat{T}}_{GREG}) \\ & = \sum_{s = 1}^{S} {({\hat{T}}_{HT, s} - T)}^{2} / \sum_{s = 1}^{S} {({\hat{T}}_{GREG, s} - T)}^{2} \end{array}$

where ${\hat{T}}_{s}$ is an estimated total from sample $s$ (either HT or GREG), $S = 10,000$ is the number of samples selected, and ${\hat{T}}_{HT, s}$ and ${\hat{T}}_{GREG, s}$ are the estimated HT and GREG totals from sample $s$. The empirical $deff$ of an estimated total is computed as $empdeff (\hat{T}) = S^{- 1} \sum_{s = 1}^{S} {({\hat{T}}_{s} - \bar{\hat{T}})}^{2} / {Var}_{srswr} ({\hat{T}}_{srswr}),$ where $\bar{\hat{T}} = S^{- 1} \sum_{s = 1}^{S} {\hat{T}}_{s}$ and ${Var}_{srswr} ({\hat{T}}_{srswr}) ≐ N^{2} σ_{y}^{2} / n$.
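These simulation summaries are straightforward to compute once the estimated totals from the $S$ samples are stored. The sketch below is a hedged illustration in Python with hypothetical function names and toy numbers; the relbias is taken as the average deviation over the $S$ samples, expressed as a percentage of the true total $T.$

```python
def relbias(estimates, true_total):
    """Percent relative bias: 100 * mean(T_hat_s - T) / T."""
    s = len(estimates)
    return 100.0 * sum(t - true_total for t in estimates) / (s * true_total)

def mse(estimates, true_total):
    """Empirical mean square error around the true total."""
    return sum((t - true_total) ** 2 for t in estimates) / len(estimates)

def mse_ratio(ht_estimates, greg_estimates, true_total):
    """MSE(HT) / MSE(GREG); values above one favour the GREG."""
    return mse(ht_estimates, true_total) / mse(greg_estimates, true_total)

def empirical_deff(estimates, pop_size, pop_variance, n):
    """Simulation variance of the estimates over the srswr variance N^2 sigma^2 / n."""
    s = len(estimates)
    mean_est = sum(estimates) / s
    sim_var = sum((t - mean_est) ** 2 for t in estimates) / s
    srswr_var = pop_size ** 2 * pop_variance / n
    return sim_var / srswr_var

# Toy estimates from four hypothetical samples; the true total is 100.
ht = [90.0, 110.0, 105.0, 95.0]
greg = [98.0, 102.0, 101.0, 99.0]
toy_relbias = relbias(ht, 100.0)
toy_ratio = mse_ratio(ht, greg, 100.0)
toy_deff = empirical_deff(greg, pop_size=10, pop_variance=5.0, n=2)
```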

The results for relbiases and MSEs are shown in Table 4.1. Both estimators of totals are approximately unbiased. The GREG is also more precise than the HT estimator, especially for Total Expenses, as evidenced by the MSE ratios larger than one.

Table 4.1
Simulation results of HT and GREG totals, 10,000 $π ps$ samples drawn from the SOI 2007 pseudopopulation EO data
Variable of interest	Total Liabilities (weakly correlated with X)			Total Expenses (strongly correlated with X)
Estimates	$n = 100$	$n = 500$	$n = 1,000$	$n = 100$	$n = 500$	$n = 1,000$
Percent relbias(HT)	-0.13	0.07	0.03	-0.64	0.05	0.07
Percent relbias(GREG)	0.37	0.27	0.14	-0.12	-0.01	0.00
MSE ratio	1.17	1.20	1.19	34.89	50.11	48.26
Note: A small number of samples were dropped in which either the matrix to be inverted for the GREG was singular or the GREG produced negative weights. The percentages of samples dropped were 3.6% for $n = 100$, 1.2% for $n = 500$, and 0.5% for $n = 1,000$.

We also computed the biases of the various estimated design effects across the 10,000 samples. The relbiases of the Kish, Spencer, and proposed design effect estimates are computed as

$relbias ({deff}_{K}) = 100 \times ({\bar{deff}}_{K} - edeff ({\hat{T}}_{HT y})) / edeff ({\hat{T}}_{HT y}),$

$relbias ({deff}_{S}) = 100 \times ({\bar{deff}}_{S} - edeff ({\hat{T}}_{HT y})) / edeff ({\hat{T}}_{HT y}),$

and

$relbias ({deff}_{H}) = 100 \times ({\bar{deff}}_{H} - edeff ({\hat{T}}_{GREG})) / edeff ({\hat{T}}_{GREG})$

where ${\bar{deff}}_{K},$ ${\bar{deff}}_{S},$ and ${\bar{deff}}_{H}$ are the average Kish, Spencer, and proposed $deff$’s over all samples. The terms $edeff ({\hat{T}}_{HT y})$ and $edeff ({\hat{T}}_{GREG})$ are computed in two ways: (1) as the simulation $empdeff$ of ${\hat{T}}_{HT y}$ (or ${\hat{T}}_{GREG}$), and (2) as the average over all samples of the $deff$’s of ${\hat{T}}_{HT y}$ computed from the “survey” package. The “survey” package’s default method of estimating the $deff$ from a particular sample uses a with-replacement variance estimate in the numerator. This corresponds to the sample design used to derive ${deff}_{H}$. Results are displayed in Table 4.2.

For both variables of interest, we see large positive biases for the Kish design effect, and the design effects involving approximations. Thus, ignoring correlation components accounted for in the ‘exact’ Spencer and proposed design effects would lead to over-estimating the design effects.

The proposed estimator is closer to the “survey” package design effects than to the empirical simulation $deff$’s of the GREG. Although the relbiases of ${deff}_{H}$ are fairly large for Total Expenses when computed with respect to $edeff,$ the empirical $deff$’s themselves are small. We highlight the small magnitude of the Total Expenses $(y_{2})$ variable $deff$ of 0.02 to put the relbiases into context. For example, the relbias of 12.9% for the exact version of our proposed estimator for $n = 500$ for $y_{2}$ corresponds to a difference in the third decimal place. Specifically, in this scenario, on average we over-estimate the $deff$ by 0.003.
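The arithmetic behind that statement can be checked directly from the Table 4.2 entries, keeping in mind that the empirical $deff$ of 0.02 is itself a rounded value:

$\begin{array}{rl} {\bar{deff}}_{H} - edeff ({\hat{T}}_{GREG}) & = (relbias / 100) \times edeff ({\hat{T}}_{GREG}) \\ & \approx 0.129 \times 0.02 = 0.00258 \approx 0.003 \end{array}$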

We can understand why calibration is more efficient for Expenses than for Liabilities by examining the distributions of $y_{i}$ and $u_{i}$ in one particular sample. Figures 4.2 and 4.3 show boxplots of $u_{i}$ and $y_{i}$ for each variable and sample size.

Table 4.2
Relative bias of design effect estimates, 10,000 $π ps$ samples drawn from the SOI 2007 pseudopopulation EO data
Variable of interest	Total Liabilities (weakly correlated with X)			Total Expenses (strongly correlated with X)
	$n = 100$	$n = 500$	$n = 1,000$	$n = 100$	$n = 500$	$n = 1,000$
Empirical deff’s*
HT	0.51	0.50	0.50	0.56	0.65	0.64
GREG	0.43	0.42	0.42	0.02	0.02	0.02
	Relative biases w.r.t. empirical deff’s
Kish**	158.7	158.3	158.3	132.8	101.7	104.7
Spencer**
Exact	2.6	2.0	1.8	9.9	-4.5	-2.2
Zero-corr. approx.	96.1	98.0	98.4	91.2	70.1	73.7
Large-$N$ approx.	96.7	98.9	99.3	101.7	78.1	81.7
Proposed***
Exact	-6.3	-1.6	0.2	25.3	12.9	8.1
Zero-corr. approx.	83.4	94.0	98.2	129.9	116.6	108.7
	Relative biases w.r.t. average of “survey” package deff’s
Kish**	219.7	211.3	209.4	6400.5	7786.2	8287.2
Spencer**
Exact	3.1	0.8	0.5	3.5	-1.0	-1.5
Zero-corr. approx.	97.1	95.8	95.8	80.1	76.2	74.8
Large-$N$ approx.	97.7	96.7	96.7	90.0	84.5	82.8
Proposed***
Exact	-0.9	-0.2	-0.1	11.3	-0.4	-0.1
Zero-corr. approx.	94.0	96.8	97.6	104.2	91.0	93.0
Note: * Averages across the simulated samples; ** relative to the average of empirical HT deff’s; *** relative to the average of empirical GREG deff’s.

Figure 4.2 Boxplots of $y_{i}$- and $u_{i}$-values for the Total Liabilities variable

Description for Figure 4.2

Boxplots of $y_{i}$- and $u_{i}$-values for sample sizes $n = 100$, 500 and 1,000, for the Total Liabilities variable. The $u_{i}$-values in all of these samples have shorter ranges, fewer extreme values and less variation than the $y_{i}$-values.

Figure 4.3 Boxplots of $y_{i}$- and $u_{i}$-values for the Total Expenses variable

Description for Figure 4.3

Boxplots of $y_{i}$- and $u_{i}$-values for sample sizes $n = 100$, 500 and 1,000, for the Total Expenses variable. The $u_{i}$-values in all of these samples have shorter ranges, fewer extreme values and less variation than the $y_{i}$-values.

The $u_{i}$-values in all of these samples have shorter ranges and less variation than the $y_{i}$-values, particularly for the Total Expenses variable. This occurs because Total Expenses is highly correlated with the calibration variable Total Revenue (see Figure 4.1), and it explains why the direct and proposed design effect measures are so much smaller for Total Expenses.
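A rough back-of-the-envelope check uses the correlations reported with Figure 4.1. It is only a heuristic under strong assumptions: it treats the reduction as that of an unweighted simple linear regression on the single best calibration variable, for which the residual variance is a fraction $1 - r^{2}$ of the variance of $y,$ whereas the actual calibration uses two auxiliary variables, $π ps$ weights, and the $u_{i}$ defined in Section 3.

$\begin{array}{rl} \text{Total Expenses on Total Revenue } (r = 0.99): & 1 - 0.99^{2} = 0.0199 \\ \text{Total Liabilities on Total Assets } (r = 0.56): & 1 - 0.56^{2} = 0.6864 \end{array}$

Under this heuristic almost all of the variation in Total Expenses is removed, while most of the variation in Total Liabilities remains, which is in line with the direction of the results in Tables 4.1 and 4.2.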

4.2 Simulation study with a binary variable

The second simulation study illustrates the performance of the proposed estimator when estimating the total of a binary variable in a single-stage survey that uses poststratification.

We use the “nhis.large” population, which has $N = 21,588$ units, from the “PracTools” R package (Valliant, Dever and Kreuter 2015) to gauge the impact of poststratification weighting adjustments. The binary variable is whether a person received Medicaid. Receipt of Medicaid, which is a social welfare program in the US, is an example of a variable that is collected in some telephone surveys. Missing values of Medicaid recipiency were recoded to be “no” responses. There is a fairly strong relationship between race-ethnicity, age, and whether Medicaid is received, as shown in Table 4.3 (see also Table 14.1 in Valliant, Dever and Kreuter 2013). The 15 age × race-ethnicity cells in the table are used as poststrata, which is a typical procedure in telephone surveys.

Table 4.3
Population percentages of persons receiving Medicaid, by age group and Hispanic status
	Hispanic status
Age group	Hispanic	Non-Hispanic White	Non-Hispanic Black or Other
< 18 years	31.8	12.9	30.9
18-24	10.5	6.5	12.2
25-44	7.5	3.8	8.6
45-64	2.4	3.0	6.2
65+	26.8	3.7	16.2

In our simulation, we selected 10,000 simple random samples without replacement from the NHIS population. The HT estimator for the total number of persons receiving Medicaid is $N {\bar{y}}_{s},$ where ${\bar{y}}_{s}$ is the proportion in sample $s$ that receives Medicaid. Because of the relatively large number of poststrata and the varying number of persons receiving Medicaid by poststratum, we include results only for samples of size $n = 500$ and $n = 1,000$, since no collapsing of poststrata within any given sample was needed for these sample sizes.

The base weights for the HT estimator are simply $w_{i} = N / n$. The variance of the poststratified estimator is 91% of that of $N {\bar{y}}_{s}$ in samples of $n = 500$ and 88% in samples of $n = 1,000$. Since the base weights are constant, Spencer’s design effects are not computable in this example. Therefore, only results for the Kish and proposed design effects are shown in Table 4.4.
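With constant base weights $N / n,$ poststratification gives every sampled unit in poststratum $g$ the weight $N_{g} / n_{g},$ where $N_{g}$ is the population count and $n_{g}$ the sample count in that cell. The sketch below illustrates this in Python; the function names, cell labels, and counts are hypothetical and are not taken from the “nhis.large” data.

```python
from collections import Counter

def poststratified_weights(sample_cells, pop_counts):
    """Weight N_g / n_g for each sampled unit, assuming constant base weights."""
    n_g = Counter(sample_cells)
    empty = [g for g in pop_counts if n_g[g] == 0]
    if empty:
        # An empty poststratum would have to be collapsed with a neighbour.
        raise ValueError(f"no sample units in poststrata: {empty}")
    return [pop_counts[g] / n_g[g] for g in sample_cells]

def weighted_total(weights, y):
    """Estimated total of y under the given weights."""
    return sum(w * yi for w, yi in zip(weights, y))

# Toy sample of five persons in two hypothetical poststrata; y = 1 if the
# person receives Medicaid.
cells = ["a", "a", "b", "b", "b"]
y = [1, 0, 0, 1, 1]
pop_counts = {"a": 30, "b": 60}  # assumed known population counts, N = 90

w_ps = poststratified_weights(cells, pop_counts)
ps_total = weighted_total(w_ps, y)
ht_total = weighted_total([90 / 5] * 5, y)  # HT estimate, weights N / n
```

The check for empty cells mirrors the reason given above for reporting only $n = 500$ and $n = 1,000$: smaller samples could leave some of the 15 poststrata empty, which would require collapsing.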

Table 4.4
Relative bias of design effect estimates, 10,000 simple random samples drawn without replacement from the NHIS population data
	Number of Persons Receiving Medicaid
	$n = 500$		$n = 1,000$
Empirical deff’s*
HT	0.97		0.95
GREG	0.91		0.88
Relative biases (percent)	w.r.t. empirical deff	w.r.t. “survey” deff	w.r.t. empirical deff	w.r.t. “survey” deff
Kish**	6.0	17.5	7.0	17.6
Proposed***
Exact	-1.4	3.2	-0.9	5.0
Zero-corr. approx.	-1.5	2.9	-1.2	4.7
Note: * Averages across the simulated samples; ** relative to the average of empirical HT deff’s; *** relative to the average of empirical GREG deff’s.

The Kish design effect has positive biases of 6.0% and 7.0% when computed with respect to the empirical $deff$’s, and of 17.5% and 17.6% when computed with respect to the “survey” $deff$’s. The exact proposed design effects are positively biased with respect to the “survey” $deff$ (3.2% and 5.0%), but much less so than the Kish estimator. In this example, the zero-correlation approximation is very similar to the exact version of the proposed estimator: to three decimal places, the correlation components were negligible for these weighting adjustments.

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.


Copyright

Published by authority of the Minister responsible for Statistics Canada.

Catalogue no. 12-001-X

Frequency: semi-annual

Ottawa

Date modified: 2017-09-20
