Survey Methodology
Release date: June 21, 2018
The June 2018 issue of the journal Survey Methodology (Volume 44, Number 1) contains seven regular papers.
Regular papers
Model based inference using ranked set samples
by Omer Ozturk and Konul Bayramoglu Kavlak
This paper develops statistical inference based on a superpopulation model in a finite population setting using ranked set samples (RSS). The samples are constructed without replacement. It is shown that the sample mean of the RSS is model unbiased and has smaller mean square prediction error (MSPE) than the MSPE of a simple random sample (SRS) mean. Using an unbiased estimator of the MSPE, the paper also constructs a prediction confidence interval for the population mean. A small-scale simulation study shows that the RSS estimator is as good as the SRS estimator when ranking information is poor. On the other hand, it has higher efficiency than the SRS estimator when the quality of ranking information is good and the cost ratio of obtaining a single unit in RSS versus SRS is not very high. The simulation study also indicates that coverage probabilities of the prediction intervals are very close to the nominal coverage probabilities. The proposed inferential procedure is applied to a real data set.
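As a rough illustration of the RSS idea the abstract describes, the sketch below draws a ranked set sample from a synthetic population and compares its mean with an SRS mean of the same size. The population, set size, number of cycles, and the rank_noise parameter controlling ranking quality are all illustrative choices, not the authors' design; in particular, the sketch ranks sets by a noisy copy of the true value rather than by a concomitant variable, and it does not reproduce the paper's without-replacement construction across sets.

```python
import numpy as np

rng = np.random.default_rng(42)

def rss_mean(population, set_size=3, cycles=10, rank_noise=0.0):
    """Draw a ranked set sample and return its sample mean.

    rank_noise controls ranking quality: 0 gives perfect ranking,
    larger values make the judgment ranking closer to random.
    """
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a set of units and rank them by a noisy judgment value.
            sample_set = rng.choice(population, size=set_size, replace=False)
            judgment = sample_set + rng.normal(0, rank_noise, set_size)
            # Measure only the unit holding the target rank in this set.
            measured.append(sample_set[np.argsort(judgment)[rank]])
    return np.mean(measured)

# Compare RSS and SRS means of the same size (set_size * cycles) on synthetic data.
pop = rng.gamma(shape=2.0, scale=5.0, size=10_000)
srs_mean = np.mean(rng.choice(pop, size=30, replace=False))
print(f"RSS mean: {rss_mean(pop):.3f}  SRS mean: {srs_mean:.3f}  true mean: {pop.mean():.3f}")
```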
Linearization versus bootstrap for variance estimation of the change between Gini indexes
by Guillaume Chauvet and Camelia Goga
This paper investigates linearization and bootstrap variance estimation for the Gini coefficient and for the change between Gini indexes at two periods of time. For the one-sample case, we use the influence function linearization approach suggested by Deville (1999), the without-replacement bootstrap suggested by Gross (1980) for simple random sampling without replacement, and the with-replacement bootstrap of primary sampling units described in Rao and Wu (1988) for multistage sampling. To obtain a two-sample variance estimator, we use the linearization technique by means of partial influence functions (Goga, Deville and Ruiz-Gazen, 2009). We also develop an extension of the studied bootstrap procedures to two-dimensional sampling. The two approaches are compared on simulated data.
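To make the quantities concrete, here is a minimal sketch of the Gini coefficient and a naive with-replacement bootstrap of the variance of the change between two periods. It ignores sampling weights and the survey design, so it is only a stand-in for the Gross (1980) and Rao-Wu (1988) procedures the abstract refers to; the data and replicate count are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def gini(y):
    """Unweighted Gini coefficient: mean absolute difference / (2 * mean)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    # Equivalent to sum_i sum_j |y_i - y_j| / (2 * n^2 * mean(y)).
    cum = np.cumsum(y)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def bootstrap_var_change(y1, y2, reps=1000):
    """Naive with-replacement bootstrap variance of the change in Gini."""
    changes = []
    for _ in range(reps):
        b1 = rng.choice(y1, size=y1.size, replace=True)
        b2 = rng.choice(y2, size=y2.size, replace=True)
        changes.append(gini(b2) - gini(b1))
    return np.var(changes, ddof=1)

income_t1 = rng.lognormal(mean=10.0, sigma=0.6, size=500)
income_t2 = rng.lognormal(mean=10.0, sigma=0.7, size=500)
print("Gini change:", gini(income_t2) - gini(income_t1))
print("Bootstrap variance:", bootstrap_var_change(income_t1, income_t2))
```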
Growth Rates Preservation (GRP) temporal benchmarking: Drawbacks and alternative solutions
by Jacco Daalmans, Tommaso Di Fonzo, Nino Mushkudiani and Reinier Bikker
Benchmarking monthly or quarterly series to annual data is a common practice in many National Statistical Institutes. The benchmarking problem arises when time series data for the same target variable are measured at different frequencies and there is a need to remove discrepancies between the sums of the sub-annual values and their annual benchmarks. Several benchmarking methods are available in the literature. The Growth Rates Preservation (GRP) benchmarking procedure is often considered the best method, and it is often claimed that this procedure is grounded on an ideal movement preservation principle. However, we show that GRP has important drawbacks, relevant for practical applications, that have not been noted in the literature. We consider alternative benchmarking models that do not suffer from some of GRP’s side effects.
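The GRP criterion the abstract refers to is usually written as minimizing the sum of squared differences between the growth rates of the benchmarked series and those of the preliminary series, subject to the annual aggregation constraints. The small numerical sketch below uses invented quarterly values and annual benchmarks and a generic constrained solver; it illustrates the criterion only, not the authors' alternative methods.

```python
import numpy as np
from scipy.optimize import minimize

# Preliminary quarterly series (two years) and annual benchmarks (illustrative values).
p = np.array([100.0, 102.0, 105.0, 103.0, 104.0, 107.0, 110.0, 108.0])
benchmarks = np.array([420.0, 440.0])   # annual totals the quarters must sum to

def grp_objective(x):
    # Sum of squared differences between benchmarked and preliminary growth rates.
    return np.sum((x[1:] / x[:-1] - p[1:] / p[:-1]) ** 2)

constraints = [
    {"type": "eq", "fun": lambda x, k=k: x[4 * k:4 * (k + 1)].sum() - benchmarks[k]}
    for k in range(len(benchmarks))
]

result = minimize(grp_objective, x0=p.copy(), constraints=constraints, method="SLSQP")
x_benchmarked = result.x
print(np.round(x_benchmarked, 2), x_benchmarked[:4].sum(), x_benchmarked[4:].sum())
```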
Investigating alternative estimators for the prevalence of serious mental illness based on a two-phase sample
by Phillip S. Kott, Dan Liao, Jeremy Aldworth, Sarra L. Hedden, Joseph C. Gfroerer, Jonaki Bose and Lisa Colpe
A two-phase process was used by the Substance Abuse and Mental Health Services Administration to estimate the proportion of US adults with serious mental illness (SMI). The first phase was the annual National Survey on Drug Use and Health (NSDUH), while the second phase was a random subsample of adult respondents to the NSDUH. Respondents to the second phase of sampling were clinically evaluated for serious mental illness. A logistic prediction model was fit to this subsample with the SMI status (yes or no) determined by the second-phase instrument treated as the dependent variable and related variables collected on the NSDUH from all adults as the model’s explanatory variables. Estimates were then computed for SMI prevalence among all adults and within adult subpopulations by assigning an SMI status to each NSDUH respondent based on comparing his or her estimated probability of having SMI to a chosen cut point on the distribution of the predicted probabilities. We investigate alternatives to this standard cut point estimator, such as the probability estimator, which assigns an estimated probability of having SMI to each NSDUH respondent; the estimated prevalence of SMI is then the weighted mean of those estimated probabilities. Using data from the NSDUH and its subsample, we show that, although the probability estimator has a smaller mean squared error when estimating SMI prevalence among all adults, it has a greater tendency to be biased at the subpopulation level than the standard cut point estimator.
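A toy contrast between the two estimators described above might look like the following sketch. The predicted probabilities, the survey weights, and the rule used to pick the cut point are all invented for illustration and do not reproduce the SAMHSA procedure or the NSDUH data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical predicted probabilities of SMI and survey weights for respondents.
p_hat = rng.beta(a=1.0, b=15.0, size=5_000)      # skewed: SMI is relatively rare
weights = rng.uniform(500, 5_000, size=5_000)    # illustrative survey weights

# Probability estimator: weighted mean of the estimated probabilities.
prob_estimate = np.average(p_hat, weights=weights)

# Cut point estimator: here the cut point is (arbitrarily) chosen so that the share
# classified as having SMI matches the probability-estimator prevalence; prevalence
# is then the weighted proportion of respondents above the cut point.
cut = np.quantile(p_hat, 1 - prob_estimate)
cut_estimate = np.average(p_hat >= cut, weights=weights)

print(f"probability estimator: {prob_estimate:.4f}  cut point estimator: {cut_estimate:.4f}")
```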
Strategies for subsampling nonrespondents for economic programs
by Katherine Jenny Thompson, Stephen Kaputa and Laura Bechtel
The U.S. Census Bureau is investigating nonrespondent subsampling strategies for use in the 2017 Economic Census. Design constraints include a mandated lower bound on the unit response rate, along with targeted industry-specific response rates. This paper presents research on allocation procedures for subsampling nonrespondents, conditional on the subsampling being systematic. We consider two approaches: (1) equal-probability sampling, and (2) optimized allocation with constraints on unit response rates and sample size, with the objective of selecting larger samples in industries that have lower initial response rates. We present a simulation study that examines the relative bias and mean squared error of the proposed allocations, assessing each procedure’s sensitivity to the size of the subsample, the response propensities, and the estimation procedure.
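The sketch below illustrates systematic equal-probability subsampling of nonrespondents within industries, with subsample sizes chosen naively to reach a target response rate under the optimistic assumption that every subsampled nonrespondent responds. The industry counts, initial response rates, and target are invented, and this is not the optimized allocation studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def systematic_subsample(frame_size, n_subsample):
    """Systematic equal-probability subsample of nonrespondent indices."""
    interval = frame_size / n_subsample
    start = rng.uniform(0, interval)
    return np.floor(start + interval * np.arange(n_subsample)).astype(int)

# Illustrative: subsample nonrespondents within each industry so that follow-up of
# the subsample could push the industry response rate to a target (here 0.60).
industries = {"A": (1000, 0.45), "B": (800, 0.55), "C": (1200, 0.30)}  # (units, initial rate)
target = 0.60
for name, (n_units, rate) in industries.items():
    nonrespondents = int(n_units * (1 - rate))
    needed = max(int(np.ceil((target - rate) * n_units)), 0)   # extra responses required
    take = min(max(needed, 1), nonrespondents)
    idx = systematic_subsample(nonrespondents, take)
    print(f"industry {name}: subsample {take} of {nonrespondents} nonrespondents, first indices {idx[:3]}")
```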
Robust Bayesian small area estimation
by Malay Ghosh, Jiyoun Myung and Fernando A.S. Moura
Small area models handling area level data typically assume normality of the random effects. This assumption does not always hold in practice. The present paper introduces a new small area model with t random effects, and it also considers joint modeling of the small area means and variances. The proposed approach is shown to perform better than competing methods.
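The abstract does not spell out the model. One plausible area-level formulation with heavy-tailed random effects, written here purely for illustration of what "t random effects" means in this setting (and not necessarily the authors' specification), is:

```latex
% Illustrative area-level model with t-distributed random effects:
%   y_i      direct survey estimate for area i, with sampling variance D_i
%   theta_i  true small area mean;  x_i  area-level covariates
\[
\begin{aligned}
  y_i \mid \theta_i &\sim N(\theta_i,\, D_i),\\
  \theta_i &= \boldsymbol{x}_i^{\top}\boldsymbol{\beta} + u_i,
  \qquad u_i \sim t_{\nu}\!\left(0,\ \sigma_u^2\right).
\end{aligned}
\]
```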
Model-assisted calibration of non-probability sample survey data using adaptive LASSO
by Jack Kuang Tsung Chen, Richard L. Valliant and Michael R. Elliott
The probability-sampling-based framework has dominated survey research because it provides precise mathematical tools to assess sampling variability. However, increasing costs and declining response rates are expanding the use of non-probability samples, particularly in general population settings, where samples of individuals recruited through web surveys are becoming increasingly cheap and easy to access. But non-probability samples are at risk of selection bias due to differential access, degrees of interest, and other factors. Calibration to known population totals provides a means of potentially reducing the effect of selection bias in non-probability samples. Here we show that model calibration using adaptive LASSO can yield a consistent estimator of a population total as long as a subset of the true predictors is included in the prediction model, thus allowing large numbers of possible covariates to be included without risk of overfitting. We show that model calibration using adaptive LASSO provides improved estimation, with respect to mean squared error, relative to standard competitors such as generalized regression (GREG) estimators when a large number of covariates are required to determine the true model, with effectively no loss in efficiency over GREG when smaller models suffice. We also derive closed-form variance estimators of population totals, and compare their behavior with bootstrap estimators. We conclude with a real-world example using data from the National Health Interview Survey.
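The sketch below illustrates the general idea of model calibration with an adaptive LASSO working model on synthetic data: fit the model on a non-probability sample, predict for the whole population, and add a weighted sum of sample residuals. The data-generating process, the penalty level, the crude pseudo-weight, and the use of scikit-learn are all illustrative assumptions, not the authors' estimator or variance methodology.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(11)

# Synthetic population: many candidate covariates, only a few truly predictive.
N, p = 20_000, 50
X_pop = rng.normal(size=(N, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y_pop = X_pop @ beta_true + rng.normal(size=N)

# Non-probability sample: selection depends on a covariate, so the naive estimate is biased.
sel_prob = 1 / (1 + np.exp(-(0.8 * X_pop[:, 0] - 1.5)))
in_sample = rng.uniform(size=N) < sel_prob
X_s, y_s = X_pop[in_sample], y_pop[in_sample]

# Adaptive LASSO via covariate rescaling: penalty weights 1 / |initial OLS coefficient|.
beta_init = LinearRegression().fit(X_s, y_s).coef_
scale = np.abs(beta_init) + 1e-8
lasso = Lasso(alpha=0.05).fit(X_s * scale, y_s)
predict = lambda X: lasso.predict(X * scale)

# Model calibration estimator of the population total: sum of model predictions over
# the population plus a weighted sample sum of residuals (a single crude pseudo-weight
# here; a design- or pseudo-weighted sum would be used in practice).
w = N / in_sample.sum()
total_hat = predict(X_pop).sum() + w * (y_s - predict(X_s)).sum()
print(f"estimated total: {total_hat:,.0f}   true total: {y_pop.sum():,.0f}   naive: {w * y_s.sum():,.0f}")
```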