Results

All (4)

  • Articles and reports: 12-001-X202100100002
    Description:

    We consider the problem of deciding on a sampling strategy, in particular the sampling design. We propose a risk measure whose minimizing value guides the choice. The method makes use of a superpopulation model and takes into account uncertainty about its parameters through a prior distribution. The method is illustrated with a real dataset, yielding satisfactory results. As a baseline, we use the strategy that couples probability-proportional-to-size sampling with the difference estimator, as it is known to be optimal when the superpopulation model is fully known. We show that, even under moderate misspecifications of the model, this strategy is not robust and can be outperformed by some alternatives.

    Release date: 2021-06-24
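
Illustrative sketch for 12-001-X202100100002. The baseline strategy named in the abstract, probability-proportional-to-size sampling paired with the difference estimator, can be written in a few lines. This is a minimal sketch under assumed synthetic data (a population with size measure x and study variable y) and a simple ratio working model; it is not the authors' implementation and does not include their proposed risk-measure criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: size measure x and study variable y (assumed data).
N = 1000
x = rng.gamma(shape=2.0, scale=5.0, size=N)
y = 3.0 * x + rng.normal(scale=4.0, size=N)

# Poisson PPS sample: inclusion probabilities proportional to the size measure.
n_expected = 100
pi = np.minimum(1.0, n_expected * x / x.sum())
sample = rng.random(N) < pi

# Working-model predictions (simple ratio model fitted on the weighted sample).
beta_hat = np.sum(y[sample] / pi[sample]) / np.sum(x[sample] / pi[sample])
y_pred = beta_hat * x

# Difference estimator of the total: model total plus weighted sample residuals.
t_diff = y_pred.sum() + np.sum((y[sample] - y_pred[sample]) / pi[sample])
print(f"difference estimate: {t_diff:.1f}, true total: {y.sum():.1f}")
```

The risk-measure criterion described in the abstract would sit on top of such a strategy, scoring candidate designs while averaging over a prior on the superpopulation-model parameters.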

  • Articles and reports: 12-001-X202100100004
    Description:

    Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case where the study variable is observed only in the big data, while the auxiliary variables are observed in both sources. Unlike the usual imputation for missing data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for data integration of survey data and big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.

    Release date: 2021-06-24
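
Illustrative sketch for 12-001-X202100100004. The mass-imputation idea described above (fit an outcome model on the big data, impute the study variable for every unit of the probability sample, then apply the design weights) can be sketched as follows. The synthetic data, the linear working model, and the simple-random-sampling weights are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Big non-probability data: auxiliary x and study variable y both observed.
x_big = rng.normal(size=(5000, 2))
y_big = 1.0 + x_big @ np.array([2.0, -1.0]) + rng.normal(scale=1.0, size=5000)

# Probability sample: x and design weights observed, y unobserved.
x_srs = rng.normal(size=(300, 2))
w_srs = np.full(300, 10000 / 300)          # design weights (SRS assumed here)

# Fit the outcome regression on the big data.
X = np.column_stack([np.ones(len(x_big)), x_big])
beta_hat, *_ = np.linalg.lstsq(X, y_big, rcond=None)

# Mass imputation: predicted y for every unit of the probability sample.
y_imp = np.column_stack([np.ones(len(x_srs)), x_srs]) @ beta_hat

# Weighted (Hajek-type) estimator of the population mean from imputed values.
mean_hat = np.sum(w_srs * y_imp) / np.sum(w_srs)
print(f"mass-imputation estimate of the mean: {mean_hat:.3f}")
```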

  • Articles and reports: 12-001-X202100100007
    Description:

    We consider the estimation of a small area mean under the basic unit-level model. The sum of the resulting model-dependent estimators may not add up to estimates obtained with a direct survey estimator that is deemed to be accurate for the union of these small areas. Benchmarking forces the model-based estimators to agree with the direct estimator at the aggregated area level. The generalized regression estimator is the direct estimator that we benchmark to. In this paper we compare small area benchmarked estimators based on four procedures. The first procedure produces benchmarked estimators by ratio adjustment. The second procedure is based on the empirical best linear unbiased estimator obtained under the unit-level model augmented with a suitable variable that ensures benchmarking. The third procedure uses pseudo-empirical estimators constructed with suitably chosen sampling weights so that, when aggregated, they agree with the reliable direct estimator for the larger area. The fourth procedure produces benchmarked estimators that are the result of a minimization problem subject to the constraint given by the benchmark condition. These benchmarking procedures are applied to the small area estimators when the sampling rates are non-negligible. The resulting benchmarked estimators are compared in terms of relative bias and mean squared error using both a design-based simulation study and an example with real survey data.

    Release date: 2021-06-24
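
Illustrative sketch for 12-001-X202100100007. The simplest of the four procedures, benchmarking by ratio adjustment, rescales the model-based area estimates by a common factor so that their population-share-weighted sum equals the direct estimate for the union of the areas. The estimates, area shares, and aggregate value below are made-up numbers for illustration only.

```python
import numpy as np

# Hypothetical model-based small area means and area population shares N_i / N.
theta_model = np.array([12.4, 15.1, 9.8, 11.0])
shares = np.array([0.30, 0.25, 0.20, 0.25])

# Direct (e.g. generalized regression) estimate for the union of the areas.
theta_direct_agg = 12.9

# Ratio adjustment: one common factor so that sum_i shares_i * theta_bench_i
# equals the direct aggregate estimate (the benchmark condition).
factor = theta_direct_agg / np.sum(shares * theta_model)
theta_bench = factor * theta_model

print(theta_bench)
print(np.sum(shares * theta_bench))   # equals theta_direct_agg
```

The other three procedures in the abstract enforce the same benchmark condition through the model (an augmenting variable), through the weights (pseudo-empirical estimators), or through constrained minimization, rather than by a post-hoc rescaling.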

  • Articles and reports: 12-001-X202100100009
    Description:

    Predictive mean matching is a commonly used imputation procedure for addressing the problem of item nonresponse in surveys. The customary approach relies upon the specification of a single outcome regression model. In this note, we propose a novel predictive mean matching procedure that allows the user to specify multiple outcome regression models. The resulting estimator is multiply robust in the sense that it remains consistent if at least one of the specified outcome regression models is correctly specified. The results from a simulation study suggest that the proposed method performs well in terms of bias and efficiency.

    Release date: 2021-06-24
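
Illustrative sketch for 12-001-X202100100009. One simple way to let predictive mean matching use several outcome regression models is to fit each candidate model on the respondents, match every nonrespondent to the respondent with the closest vector of predicted means, and donate that respondent's observed value. The synthetic data, the two candidate models, and the nearest-neighbour matching rule below are simplifying assumptions; this is a rough illustration of the idea, not necessarily the exact multiply robust procedure proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed data: two covariates, a nonlinear outcome, and item nonresponse.
n = 400
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] + x[:, 1] ** 2 + rng.normal(scale=0.5, size=n)
respond = rng.random(n) < 0.7

def fit_predict(design_resp, y_resp, design_all):
    """Least-squares fit on respondents, prediction for all units."""
    beta, *_ = np.linalg.lstsq(design_resp, y_resp, rcond=None)
    return design_all @ beta

# Two candidate outcome regression models (different design matrices).
designs = [
    np.column_stack([np.ones(n), x]),                       # linear in x1, x2
    np.column_stack([np.ones(n), x[:, 0], x[:, 1] ** 2]),   # quadratic in x2
]
preds = np.column_stack(
    [fit_predict(d[respond], y[respond], d) for d in designs]
)

# Match each nonrespondent to the nearest respondent in predicted-mean space
# and donate the matched respondent's observed y.
y_imp = y.copy()
resp_idx = np.where(respond)[0]
for i in np.where(~respond)[0]:
    dist = np.sum((preds[resp_idx] - preds[i]) ** 2, axis=1)
    y_imp[i] = y[resp_idx[np.argmin(dist)]]

print(f"imputed-data mean: {y_imp.mean():.3f}, full-data mean: {y.mean():.3f}")
```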