Analysis
Author(s)
- Selected: Opsomer, Jean D. (12)
- Breidt, F. Jay (2)
- Da Silva, Damião Nóbrega (2)
- Engmark, Joseph D. (1)
- Erciulescu, Andreea L. (1)
- Han, Daifeng (1)
- Johnson, Alicia A. (1)
- Liu, Teng (1)
- Meyer, Mary C. (1)
- Oliva-Aviles, Cristian (1)
- Ranalli, M. Giovanna (1)
- Riddles, Minsun K. (1)
- Uppala, Medha (1)
- Wang, Haonan (1)
- Wang, Jianqiang C. (1)
Results
All (12) (0 to 10 of 12 results)
- Articles and reports: 12-001-X202500200008
  Description: Classical design-based survey estimation relies on a properly specified sampling design for valid inference. We consider the properties of regression estimation under a misspecified sample design, in which the nominal and true inclusion probabilities do not necessarily match. This general misspecified sample design setting encompasses many challenges in the modern survey environment. Under this setting, an asymptotic analysis of the regression estimator, an expression of the bias, and an expression of the variance are presented. Further, a consistent variance estimator is derived and an expression which estimates the bias in part or in whole is discussed. This latter expression may be used by a practitioner as an indicator of the presence of bias due to misspecification. A simulation study is conducted to support the presented theory.
  Release date: 2025-12-23
- Articles and reports: 12-001-X202500100006
  Description: Survey practitioners have increasingly embraced the benefits of modern machine learning techniques, including classification and regression tree algorithms, in the development of nonresponse adjustments. These methods, which do not require a predefined functional relationship between outcomes and predictors, offer a practical means of conducting variable selection and deriving interpretable structures that link response propensity with explanatory variables. However, when applying these algorithms to survey data, it is common to overlook crucial factors like sampling weights, as well as sample design features such as stratification and clustering. To address this shortcoming, we propose an extension of the Chi-square Automatic Interaction Detector (CHAID) approach, and we describe the design-based asymptotic properties of the resulting “survey CHAID” (sCHAID) method. To facilitate the practical use of sCHAID, we incorporate a Rao-Scott correction into the splitting criterion, accounting for the survey design. Using data from the U.S. American Community Survey, we illustrate the use of the method and evaluate its performance through comparisons with existing weighted and unweighted algorithms.
  Release date: 2025-06-30
- Articles and reports: 12-001-X202500100012
  Description: In this discussion, we complement the excellent overview by Profs. Lohr and Rao with some additional topics. The first topic is a call for more recognition of the central role of modeling in survey estimation. The second is a brief discussion of the use of partial frame information in survey design. Finally, we draw attention to the recent increase in synthetic methods, in particular multilevel regression and poststratification (MRP), in small area estimation applications.
  Release date: 2025-06-30
- Articles and reports: 12-001-X202400200009
  Description: Many studies face the problem of comparing estimates obtained with different survey methodologies, including differences in frames, measurement instruments, and modes of delivery. The problem arises in multimode surveys and in surveys that are redesigned. Major redesign of survey processes could affect survey estimates systematically, and it is important to quantify and adjust for such discontinuities between the designs to ensure comparability of estimates over time. We propose a small area estimation approach to reconcile two sets of survey estimates, and apply it to two surveys in the Marine Recreational Information Program (MRIP), which monitors recreational fishing along the Atlantic and Gulf coasts of the United States. We develop a log-normal model for the estimates from the two surveys, accounting for temporal dynamics through regression on population size and state-by-wave seasonal factors, and accounting in part for changing coverage properties through regression on wireless telephone penetration. Using the estimated design variances, we develop a regression model that is analytically consistent with the log-normal mean model. We use the modeled design variances in a Fay-Herriot small area estimation procedure to obtain empirical best linear unbiased predictors of the reconciled estimates of fishing effort (requiring predictions at new sets of covariates), and provide an asymptotically valid mean square error approximation.
  Release date: 2024-12-20
- Articles and reports: 12-001-X202100200006
  Description: Sample-based calibration occurs when the weights of a survey are calibrated to control totals that are random, instead of representing fixed population-level totals. Control totals may be estimated from different phases of the same survey or from another survey. Under sample-based calibration, valid variance estimation requires that the error contribution due to estimating the control totals be accounted for. We propose a new variance estimation method that directly uses the replicate weights from two surveys, one survey being used to provide control totals for calibration of the other survey weights. No restrictions are set on the nature of the two replication methods and no variance-covariance estimates need to be computed, making the proposed method straightforward to implement in practice. A general description of the method for surveys with two arbitrary replication methods with different numbers of replicates is provided. It is shown that the resulting variance estimator is consistent for the asymptotic variance of the calibrated estimator, when calibration is done using regression estimation or raking. The method is illustrated in a real-world application, in which the demographic composition of two surveys needs to be harmonized to improve the comparability of the survey estimates.
  Release date: 2022-01-06
- Articles and reports: 12-001-X202000200002
  Description: In many large-scale surveys, estimates are produced for numerous small domains defined by cross-classifications of demographic, geographic and other variables. Even though the overall sample size of such surveys might be very large, sample sizes for domains are sometimes too small for reliable estimation. We propose an improved estimation approach that is applicable when “natural” or qualitative relationships (such as orderings or other inequality constraints) can be formulated for the domain means at the population level. We stay within a design-based inferential framework but impose constraints representing these relationships on the sample-based estimates. The resulting constrained domain estimator is shown to be design consistent and asymptotically normally distributed as long as the constraints are asymptotically satisfied at the population level. The estimator and its associated variance estimator are readily implemented in practice. The applicability of the method is illustrated on data from the 2015 U.S. National Survey of College Graduates.
  Release date: 2020-12-15
- Articles and reports: 12-001-X201400214118
  Description: Bagging is a powerful computational method used to improve the performance of inefficient estimators. This article is a first exploration of the use of bagging in survey estimation, and we investigate the effects of bagging on non-differentiable survey estimators including sample distribution functions and quantiles, among others. The theoretical properties of bagged survey estimators are investigated under both design-based and model-based regimes. In particular, we show the design consistency of the bagged estimators, and obtain the asymptotic normality of the estimators in the model-based context. The article describes how implementation of bagging for survey estimators can take advantage of replicates developed for survey variance estimation, providing an easy way for practitioners to apply bagging in existing surveys. A major remaining challenge in implementing bagging in the survey context is variance estimation for the bagged estimators themselves, and we explore two possible variance estimation approaches. Simulation experiments reveal the improvement of the proposed bagging estimator relative to the original estimator and compare the two variance estimation approaches.
  Release date: 2014-12-19
- Innovations in survey sampling design: Discussion of three contributions presented at the U.S. Census Bureau (Archived)
  Articles and reports: 12-001-X201100211610
  Description: In this paper, a discussion of the three papers from the US Census Bureau special compilation is presented.
  Release date: 2011-12-21
- Nonparametric propensity weighting for survey nonresponse through local polynomial regression (Archived)
  Articles and reports: 12-001-X200900211039
  Description: Propensity weighting is a procedure to adjust for unit nonresponse in surveys. One way to implement this procedure is to divide the sampling weights by estimates of the probabilities that the sampled units respond to the survey. Typically, these estimates are obtained by fitting parametric models, such as logistic regression. The resulting adjusted estimators may become biased when the specified parametric models are incorrect. To avoid misspecifying such a model, we consider nonparametric estimation of the response probabilities by local polynomial regression. We study the asymptotic properties of the resulting estimator under quasi-randomization. The practical behavior of the proposed nonresponse adjustment approach is evaluated on NHANES data.
  Release date: 2009-12-23
- Endogenous post-stratification in surveys (Archived)
  Articles and reports: 11-536-X200900110810
  Description: Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from remote sensing data, classified into categories and displayed as pixel-based maps. These maps may be constructed based on classification models fitted to the sample data. Post-stratification of the sample data based on categories derived from the sample data ("endogenous post-stratification") violates several standard post-stratification assumptions, and has been generally considered invalid as a design-based estimation method. In this presentation, properties of the endogenous post-stratification estimator are derived for the case of a sample-fitted generalized linear model. Design consistency of the endogenous post-stratification estimator is established under mild conditions. Under a superpopulation model, consistency and asymptotic normality of the endogenous post-stratification estimator are established. Simulation experiments demonstrate that the practical effect of first fitting a model to the survey data before post-stratifying is small, even for relatively small sample sizes.
  Release date: 2009-08-11