A design effect measure for calibration weighting in single-stage samples
4. Empirical evaluation
We conducted two simulation studies using data that mimic single-stage sampling. The first
uses publicly available tax return data with continuous variables of
interest, while the second examines the performance of the alternative measures
for a binary outcome in a single-stage survey.
4.1 Establishment data simulation study
Here a sample
dataset of tax return data is used to mimic an establishment survey setup. The
data come from the Tax Year 2007 Statistics of Income (SOI) Form 990 Exempt Organization (EO) sample. This is a stratified Bernoulli sample of 22,430 EO
tax returns selected from 428,719 filed with and processed by the IRS between
December 2007 and November 2009. This sample dataset, along with the population
frame data, is free and electronically available online (Statistics of Income
2011). These data make a candidate “establishment-type” dataset for estimating
design effects, in which Kish’s design effect may not apply.
The SOI EO sample
dataset is used here as a pseudopopulation for illustration. Four variables of
interest are used: Total Assets, Total Liabilities, Total Revenue, and Total
Expenses. Returns that were sampled with certainty or that had “very small”
assets (defined by having Total Assets less than $1,000,000, including zero)
were removed, leaving 8,914 units. We then randomly replicated and perturbed
the data to create a pseudopopulation of 50,000 units. We used simple random
sampling with replacement to select more observations, then the additional data
values were perturbed using the “jitter” (Chambers, Cleveland, Kleiner and Tukey 1983)
function in R.
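The replicate-and-perturb step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_pseudopopulation`, the relative noise level `rel_amount`, and the three toy records are all hypothetical, and the uniform noise used here is a simple stand-in that does not reproduce the default scaling of R's “jitter” function.

```python
import random

def build_pseudopopulation(base_rows, target_size, rel_amount=0.01, seed=2007):
    """Expand base_rows to target_size rows by resampling and perturbing.

    rel_amount is a hypothetical relative noise level; it is NOT the
    default used by R's jitter(), which scales its noise differently.
    """
    rng = random.Random(seed)
    n_extra = target_size - len(base_rows)
    if n_extra < 0:
        raise ValueError("target_size must be at least len(base_rows)")
    population = [list(row) for row in base_rows]  # original records kept as-is
    for _ in range(n_extra):
        donor = rng.choice(base_rows)  # simple random sampling with replacement
        # Perturb each value with uniform noise proportional to its magnitude.
        jittered = [v + rng.uniform(-rel_amount, rel_amount) * abs(v) for v in donor]
        population.append(jittered)
    return population

# Toy records (assets, liabilities, revenue, expenses); values are invented.
base = [
    (1.5e6, 4.0e5, 9.0e5, 8.5e5),
    (3.2e6, 1.1e6, 2.0e6, 1.9e6),
    (7.8e6, 2.5e6, 4.1e6, 4.0e6),
]
pseudo = build_pseudopopulation(base, target_size=10)
```

In the study itself the same idea would take the 8,914 retained returns up to 50,000 units.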
Figure 4.1 shows a pairwise plot of the pseudopopulation, with scatterplots of the variables
against each other in the lower left panels, histograms on the diagonal panels,
and the correlations among the variables in the upper right panels. The plot
shows patterns typical of establishment-type data. From the diagonal panels, we see that
the variables of interest are all highly skewed. The lower left panels show
a range of different relationships among them. Total Assets
is only moderately related to Total Revenue and Total Expenses (correlations
of 0.44 and 0.41, respectively), while Total Revenue and Total Expenses
are highly correlated (0.99).
Description for Figure 4.1
Figure 4.1 shows a pairwise plot of the pseudopopulation, including plots of the variable values against each other in the lower left panels, histograms on the diagonal panels, and the correlations among the variables in the upper right panels. The variables are Total Assets, Total Liabilities, Total Revenue and Total Expenses. From the diagonal panels, we see that the variables of interest are all highly left-skewed, the least skewed being Total Assets. From the lower left panels, there exists a range of different relationships among them. Correlations between the variables are given in the following table.
Data table for Figure 4.1: correlations among the variables of interest

| Variables | Total Assets | Total Liabilities | Total Revenue | Total Expenses |
| --- | --- | --- | --- | --- |
| Total Assets | 1 | 0.56 | 0.44 | 0.41 |
| Total Liabilities | 0.56 | 1 | 0.42 | 0.44 |
| Total Revenue | 0.44 | 0.42 | 1 | 0.99 |
| Total Expenses | 0.41 | 0.44 | 0.99 | 1 |
Samples of three sizes were selected without replacement from the
pseudopopulation with probabilities proportional to size, using the square root of Total Assets as the measure of size. This
type of sampling is referred to as pps sampling subsequently. The HT
weights were then calibrated using the “linear”
method in the “calibrate” function in the “survey” package for R (Lumley 2012), which corresponds to a GREG
estimator, to match the totals of an intercept, Total Assets and
Total Revenue. The analysis variables are Total Liabilities and Total Expenses.
(Note that we follow the common practice of developing procedures in the
previous sections using formulas for with-replacement sampling but empirically
evaluating them in without-replacement samples, which are the type used in
applications.)
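To make the calibration step concrete, the following is a minimal sketch of linear calibration in plain Python. It assumes the textbook linear (GREG-type) form, in which each base weight d_i is multiplied by 1 + x_i'λ and λ is chosen so that the calibrated weights reproduce the control totals. It is not the “survey” package's implementation; the helper names and the toy data (an intercept plus one auxiliary variable, rather than the three calibration variables used in the study) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular calibration matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def linear_calibrate(base_weights, x_rows, x_totals):
    """Return w_i = d_i * (1 + x_i' lam) so that sum_i w_i x_i equals x_totals."""
    p = len(x_totals)
    # Weighted cross-product matrix: sum_i d_i x_i x_i'.
    m = [[sum(d * x[j] * x[k] for d, x in zip(base_weights, x_rows))
          for k in range(p)] for j in range(p)]
    # Gap between the control totals and their base-weighted estimates.
    gap = [x_totals[j] - sum(d * x[j] for d, x in zip(base_weights, x_rows))
           for j in range(p)]
    lam = solve_linear_system(m, gap)
    return [d * (1 + sum(l * xj for l, xj in zip(lam, x)))
            for d, x in zip(base_weights, x_rows)]

# Toy sample: each row is (intercept, auxiliary value); all numbers are invented.
d = [10.0, 20.0, 15.0, 5.0]
x = [(1.0, 2.0), (1.0, 5.0), (1.0, 3.0), (1.0, 8.0)]
controls = [52.0, 200.0]  # hypothetical population count and auxiliary total
w = linear_calibrate(d, x, controls)
```

Because the linear method places no bounds on the adjustment factors, it can return negative weights, and the cross-product matrix can be singular in small samples. Both situations led to samples being dropped in the simulation (see the note to Table 4.1).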
Eight design effect estimates are considered:
- Estimates of the design effect measures (2.2) and (2.3). Expression (2.2) reflects the efficiency of probability-proportional-to-size (pps) sampling and use of the HT estimator. Expression (2.3) reflects gains (if any) of pps sampling combined with GREG estimation;
- The Kish measure (2.4) computed using the GREG weights;
- Three Spencer measures computed using the GREG weights: (i) the exact measure that estimates (2.5), (ii) the approximation (2.7) assuming zero correlation terms, and (iii) the large-population approximation (2.9). The Spencer measures are designed to reflect gains due to pps sampling and use of the HT estimator. They do not account for any gains due to calibration;
- Two proposed measures: (i) the exact proposed single-stage design effect (3.4) and (ii) the zero-correlation approximation (3.5). Both of these are meant to show the precision gains (if any) of pps sampling combined with GREG estimation.
Note that neither
the Spencer nor the proposed measures account for any reduction in variances
due to sampling a large fraction of the population.
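As a point of reference for the Kish measure in the list above, the sketch below computes the widely used form of Kish's unequal-weighting design effect, n Σw² / (Σw)², which equals one plus the relvariance of the weights. Expression (2.4) is not reproduced in this section, so the assumption here is that it takes this standard form; the function name and the toy weights are illustrative only.

```python
def kish_deff(weights):
    """Kish's design effect due to unequal weighting: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

equal = kish_deff([2.0, 2.0, 2.0, 2.0])    # constant weights
unequal = kish_deff([1.0, 1.0, 1.0, 5.0])  # one dominant weight
```

In this form the measure can never fall below one, so it cannot register a variance reduction from calibration. That is consistent with the large positive relbiases reported for the Kish measure in this section.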
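We selected ten thousand samples to further understand the empirical behavior of the
alternative design effect estimators. The empirical relbiases and ratio of the
mean square errors (MSEs) of the totals are

$$\text{relbias}(\hat{T}) = 100 \times \frac{1}{S} \sum_{s=1}^{S} \frac{\hat{T}_s - T}{T}, \qquad \text{MSE ratio} = \frac{\sum_{s=1}^{S} (\hat{T}_{HT,s} - T)^2}{\sum_{s=1}^{S} (\hat{T}_{GREG,s} - T)^2},$$

where $\hat{T}_s$ is an estimated total from sample $s$ (either HT or GREG), $T$ is the pseudopopulation total,
$S$ is the number of samples selected, and $\hat{T}_{HT,s}$ and $\hat{T}_{GREG,s}$ are the estimated HT and GREG
totals from sample $s$. The empirical deff of an estimated total is computed as

$$\text{deff}(\hat{T}) = \frac{v(\hat{T})}{V_{srs}},$$

where $v(\hat{T}) = S^{-1} \sum_{s=1}^{S} (\hat{T}_s - \bar{\hat{T}})^2$ and $\bar{\hat{T}} = S^{-1} \sum_{s=1}^{S} \hat{T}_s$, and $V_{srs}$ is the variance of the estimated total under simple random sampling with the same sample size.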
The results for
relbiases and MSEs are shown in Table 4.1. Both estimators of totals are
approximately unbiased. The GREG is also more precise than the HT estimator,
especially for Total Expenses, as evidenced by the MSE ratios larger than one.
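The two summary measures reported in Table 4.1 can be computed along the following lines. This is a minimal sketch assuming the percent relbias is the average relative error across samples and the MSE ratio is MSE(HT) divided by MSE(GREG), which is what “ratios larger than one” favouring the GREG implies. The function names and the four toy estimates per estimator are hypothetical.

```python
def percent_relbias(estimates, true_total):
    """Average relative error of the estimated totals, in percent."""
    return 100.0 * sum((t - true_total) / true_total for t in estimates) / len(estimates)

def mse(estimates, true_total):
    """Mean square error of the estimated totals around the true total."""
    return sum((t - true_total) ** 2 for t in estimates) / len(estimates)

def mse_ratio(ht_estimates, greg_estimates, true_total):
    """MSE(HT) / MSE(GREG); values above one favour the GREG."""
    return mse(ht_estimates, true_total) / mse(greg_estimates, true_total)

# Toy estimates from four hypothetical samples; the true total is 1,000.
true_total = 1000.0
ht = [900.0, 1100.0, 950.0, 1050.0]
greg = [990.0, 1010.0, 995.0, 1005.0]
relbias_ht = percent_relbias(ht, true_total)
ratio = mse_ratio(ht, greg, true_total)
```

Both toy estimators are unbiased here, but the GREG errors are ten times smaller, so the MSE ratio is far above one, the same qualitative pattern as for Total Expenses in Table 4.1.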
Table 4.1
Simulation results of HT and GREG totals, 10,000 samples drawn from the SOI 2007 pseudopopulation EO data

Total Liabilities is weakly correlated with X and Total Expenses is strongly correlated with X. Columns (1) to (3) under each variable correspond to the three sample sizes.

| Estimates | Total Liabilities (1) | Total Liabilities (2) | Total Liabilities (3) | Total Expenses (1) | Total Expenses (2) | Total Expenses (3) |
| --- | --- | --- | --- | --- | --- | --- |
| Percent relbias(HT) | -0.13 | 0.07 | 0.03 | -0.64 | 0.05 | 0.07 |
| Percent relbias(GREG) | 0.37 | 0.27 | 0.14 | -0.12 | -0.01 | 0.00 |
| MSE ratio | 1.17 | 1.20 | 1.19 | 34.89 | 50.11 | 48.26 |

Note: A small number of samples were dropped in which either the matrix to be inverted for the GREG was singular or the GREG produced negative weights. The percentages of samples dropped were 3.6%, 1.2% and 0.5% for the three sample sizes, respectively.
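We also computed the biases of the various estimated design effects across the 10,000 samples.
The relbiases of the Kish, Spencer, and proposed design effect estimates are
computed as

$$100 \times \frac{\overline{\text{deff}}_{Kish} - \text{deff}(\hat{T}_{HT})}{\text{deff}(\hat{T}_{HT})}, \qquad 100 \times \frac{\overline{\text{deff}}_{Spencer} - \text{deff}(\hat{T}_{HT})}{\text{deff}(\hat{T}_{HT})}, \qquad \text{and} \qquad 100 \times \frac{\overline{\text{deff}}_{prop} - \text{deff}(\hat{T}_{GREG})}{\text{deff}(\hat{T}_{GREG})},$$

where $\overline{\text{deff}}_{Kish}$, $\overline{\text{deff}}_{Spencer}$ and $\overline{\text{deff}}_{prop}$ are the average Kish, Spencer,
and proposed deff's over all samples. The terms $\text{deff}(\hat{T}_{HT})$ and $\text{deff}(\hat{T}_{GREG})$
are computed in two ways: (1) as the simulation deff of $\hat{T}_{HT}$ or $\hat{T}_{GREG}$
and (2) as the average over all samples of the deff of $\hat{T}_{HT}$ or $\hat{T}_{GREG}$
computed from the “survey” package. The “survey” package’s default method of estimating the
deff from a particular sample uses a
with-replacement variance estimate in the numerator. This corresponds to the
with-replacement sample design used to derive the Spencer and proposed measures.
Results are displayed in Table 4.2.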
For both variables
of interest, we see large positive biases for the Kish design effect, and the
design effects involving approximations. Thus, ignoring correlation components
accounted for in the ‘exact’ Spencer and proposed design effects would lead to
over-estimating the design effects.
The proposed estimator is closer to the “survey”
package design effects than to the empirical simulation deff of the GREG. Although the
relbiases of the proposed deff are fairly large for Total
Expenses when computed with respect to the empirical deff's,
the empirical deff's themselves are small. We
highlight the small magnitude of the Total Expenses deff
of 0.02 to put the relbiases into
context. For example, the relbias of 12.9% for the exact version of our
proposed estimator for Total Expenses (second sample-size column of Table 4.2)
corresponds to a difference in
the third decimal place. Specifically, in this scenario, on average we
over-estimate the deff by 0.003.
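The arithmetic behind this remark is simply the relative bias multiplied by the reference design effect. The short sketch below uses the rounded values quoted in the text (12.9% and 0.02), so the product only approximates the figure the authors computed from unrounded quantities; the function name is illustrative.

```python
def absolute_bias(relbias_percent, reference_deff):
    """Convert a percent relative bias into an absolute difference in deff units."""
    return relbias_percent / 100.0 * reference_deff

diff = absolute_bias(12.9, 0.02)  # about 0.0026, i.e. 0.003 to three decimals
```

A relative bias that looks large therefore translates into a negligible absolute error when the design effect being estimated is itself close to zero.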
We can understand why calibration is more efficient for Expenses than for Liabilities by examining
the distributions of the analysis variable values and of the corresponding GREG residuals
in one particular sample. Figures
4.2 and 4.3 show boxplots of both sets of values
for each variable and sample size.
Table 4.2
Relative bias of design effect estimates, 10,000 samples drawn from the SOI 2007 pseudopopulation EO data

Total Liabilities is weakly correlated with X and Total Expenses is strongly correlated with X. Columns (1) to (3) under each variable correspond to the three sample sizes.

| Estimate | Total Liabilities (1) | Total Liabilities (2) | Total Liabilities (3) | Total Expenses (1) | Total Expenses (2) | Total Expenses (3) |
| --- | --- | --- | --- | --- | --- | --- |
| Empirical deff’s* | | | | | | |
| HT | 0.51 | 0.50 | 0.50 | 0.56 | 0.65 | 0.64 |
| GREG | 0.43 | 0.42 | 0.42 | 0.02 | 0.02 | 0.02 |
| Relative biases w.r.t. empirical deff’s | | | | | | |
| Kish** | 158.7 | 158.3 | 158.3 | 132.8 | 101.7 | 104.7 |
| Spencer**: Exact | 2.6 | 2.0 | 1.8 | 9.9 | -4.5 | -2.2 |
| Spencer**: Zero-corr. approx. | 96.1 | 98.0 | 98.4 | 91.2 | 70.1 | 73.7 |
| Spencer**: Large-population approx. | 96.7 | 98.9 | 99.3 | 101.7 | 78.1 | 81.7 |
| Proposed***: Exact | -6.3 | -1.6 | 0.2 | 25.3 | 12.9 | 8.1 |
| Proposed***: Zero-corr. approx. | 83.4 | 94.0 | 98.2 | 129.9 | 116.6 | 108.7 |
| Relative biases w.r.t. average of “survey” package deff’s | | | | | | |
| Kish** | 219.7 | 211.3 | 209.4 | 6400.5 | 7786.2 | 8287.2 |
| Spencer**: Exact | 3.1 | 0.8 | 0.5 | 3.5 | -1.0 | -1.5 |
| Spencer**: Zero-corr. approx. | 97.1 | 95.8 | 95.8 | 80.1 | 76.2 | 74.8 |
| Spencer**: Large-population approx. | 97.7 | 96.7 | 96.7 | 90.0 | 84.5 | 82.8 |
| Proposed***: Exact | -0.9 | -0.2 | -0.1 | 11.3 | -0.4 | -0.1 |
| Proposed***: Zero-corr. approx. | 94.0 | 96.8 | 97.6 | 104.2 | 91.0 | 93.0 |

Note: * Averages across the simulated samples;
Note: ** relative to the average of empirical HT deff’s;
Note: *** relative to the average of empirical GREG deff’s.
Description for Figure 4.2
Boxplots of the analysis variable values and the GREG residuals, by sample size, for the Total Liabilities variable. In all of these samples the residuals have shorter ranges of values, fewer extreme values and less variation than the analysis variable values.
Description for Figure 4.3
Boxplots of the analysis variable values and the GREG residuals, by sample size, for the Total Expenses variable. In all of these samples the residuals have shorter ranges of values, fewer extreme values and less variation than the analysis variable values.
The residuals in all of these samples
have shorter ranges of values and less variation than the analysis variable values,
particularly for the Total
Expenses variable. This occurs because the Total Expenses variable is highly
correlated with the calibration variable Total Revenue (see Figure 4.1), and it
explains why the direct and proposed design effect measures are so much smaller
for Total Expenses.
4.2 Simulation study with a binary variable
The second simulation study illustrates the performance of the proposed estimator when
estimating the total of a binary variable in a single-stage survey that uses
poststratification.
We use the “nhis.large” population, which has $N = 21{,}588$
units, from the “PracTools” R package (Valliant, Dever and Kreuter 2015) to gauge the impact
of poststratification weighting adjustments. The binary variable used is
whether a person received Medicaid. Receipt of Medicaid, which is
a social welfare program in the US, is an example of a variable that is
collected in some telephone surveys. Missing values of Medicaid recipiency were
recoded as “no” responses. There is a fairly strong relationship between
race-ethnicity, age, and whether Medicaid is received, as shown in Table 4.3 (see also
Table 14.1 in Valliant, Dever and Kreuter
2013). The 15 age × race-ethnicity cells in the table are
used as poststrata, which is a typical procedure in telephone surveys.
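The poststratification adjustment described above can be sketched as follows, assuming the usual form in which base weights are scaled within each cell so that they sum to the known population count for that cell. The function name, the two cell labels and the counts are hypothetical stand-ins for the 15 age × race-ethnicity cells.

```python
from collections import defaultdict

def poststratify(base_weights, cells, population_counts):
    """Scale base weights within each poststratum to hit its population count."""
    weighted_counts = defaultdict(float)
    for w, c in zip(base_weights, cells):
        weighted_counts[c] += w
    # A poststratum with no sample units cannot be adjusted without collapsing.
    missing = set(population_counts) - set(weighted_counts)
    if missing:
        raise ValueError(f"empty poststrata need collapsing: {sorted(missing)}")
    return [w * population_counts[c] / weighted_counts[c]
            for w, c in zip(base_weights, cells)]

# Toy example: N = 100, n = 5, constant base weight N / n = 20.
base = [20.0] * 5
cells = ["A", "A", "A", "B", "B"]
counts = {"A": 70, "B": 30}  # hypothetical known population counts
ps_weights = poststratify(base, cells, counts)
```

The explicit check for empty cells reflects the practical constraint that poststrata with no sample units must be collapsed before the adjustment can be applied.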
Table 4.3
Population percentages of persons receiving Medicaid, by age group and Hispanic status

| Age Group | Hispanic | Non-Hispanic White | Non-Hispanic Black or Other |
| --- | --- | --- | --- |
| < 18 years | 31.8 | 12.9 | 30.9 |
| 18-24 | 10.5 | 6.5 | 12.2 |
| 25-44 | 7.5 | 3.8 | 8.6 |
| 45-64 | 2.4 | 3.0 | 6.2 |
| 65+ | 26.8 | 3.7 | 16.2 |
In our simulation,
we selected 10,000 simple random samples without replacement from the NHIS
population. The HT estimator for the total number of persons receiving Medicaid
is $\hat{T}_{HT} = N p_s$,
where $p_s$ is the proportion in sample $s$
that receives Medicaid. Due to
the relatively large number of poststrata and the varying number of persons
receiving Medicaid by poststratum, we include results only for two sample sizes
at which no collapsing of poststrata
was needed within any particular sample.
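A small Monte Carlo sketch of this setup is given below. It draws simple random samples without replacement from a toy binary population, computes the HT total as N times the sample proportion, and forms an empirical design effect. The reference variance used in the denominator, the with-replacement SRS variance N²P(1−P)/n, is an assumption made for illustration, since this section does not spell out the denominator; the function name, seed and toy population are likewise hypothetical.

```python
import random

def simulate_ht_deff(population, n, n_samples=4000, seed=12345):
    """Empirical deff of the HT total N * p_s under SRS without replacement.

    Assumption: the reference variance is the with-replacement SRS variance
    N^2 * P * (1 - P) / n; the section does not state its denominator.
    """
    rng = random.Random(seed)
    big_n = len(population)
    p_true = sum(population) / big_n
    estimates = []
    for _ in range(n_samples):
        sample = rng.sample(population, n)  # SRS without replacement
        p_s = sum(sample) / n               # sample proportion
        estimates.append(big_n * p_s)       # HT estimate of the total
    mean_est = sum(estimates) / n_samples
    sim_var = sum((t - mean_est) ** 2 for t in estimates) / (n_samples - 1)
    srs_wr_var = big_n ** 2 * p_true * (1 - p_true) / n
    return sim_var / srs_wr_var

# Toy binary population: 1,000 units, 10% with the characteristic.
toy_population = [1] * 100 + [0] * 900
deff_ht = simulate_ht_deff(toy_population, n=100)
```

Under this assumption the HT design effect sits below one by roughly the sampling fraction, because without-replacement sampling benefits from the finite population correction that a with-replacement reference variance ignores.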
The base weights for the HT estimator are simply $N/n$. The variance of the
poststratified estimator is 91% of the reference simple random sampling variance
at the first sample size
and 88% at the second (the GREG deff’s in Table 4.4).
Since the base weights are
constant, Spencer’s design effects are not computable in this example. Therefore,
only results for the Kish and proposed design effects are shown in Table 4.4.
Table 4.4
Relative bias of design effect estimates for the number of persons receiving Medicaid, 10,000 samples drawn from the NHIS pseudopopulation data

Columns (1) and (2) correspond to the two sample sizes.

| Empirical deff’s* | (1) | (2) |
| --- | --- | --- |
| HT | 0.97 | 0.95 |
| GREG | 0.91 | 0.88 |

| Relative biases (percent) | (1) w.r.t. empirical deff | (1) w.r.t. “survey” deff | (2) w.r.t. empirical deff | (2) w.r.t. “survey” deff |
| --- | --- | --- | --- | --- |
| Kish** | 6.0 | 17.5 | 7.0 | 17.6 |
| Proposed***: Exact | -1.4 | 3.2 | -0.9 | 5.0 |
| Proposed***: Zero-corr. approx. | -1.5 | 2.9 | -1.2 | 4.7 |

* Averages across the simulated samples;
** relative to the average of empirical HT deff’s;
*** relative to the average of empirical GREG deff’s.
The Kish design
effect has positive biases of 6.0% and 7.0% with respect to the
empirical deff’s and of 17.5% and 17.6% with respect to the “survey” deff’s.
The exact proposed design effects
are also positively biased with respect to the “survey” deff’s
(3.2% and 5.0%), but much less so
than the Kish estimator. In this example, the zero-correlation approximation is
very similar to the exact version of the proposed estimator: the correlation
components were negligible, to within three decimal
places, for these weighting adjustments.
Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.
Copyright
Published by authority of the Minister responsible for Statistics Canada.