Analysis
Author(s)
- Selected: Haziza, David (16)
- Beaumont, Jean-François (4)
- Rao, J.N.K. (3)
- Dagdoug, Mehdi (2)
- Tillé, Yves (2)
- Yung, Wesley (2)
- Ardilly, Pascal (1)
- Charbonnier, C. (1)
- Chen, Sixia (1)
- Chow, O.S.Y. (1)
- Favre-Martinoz, Cyril (1)
- Hidiroglou, Mike (1)
- Larbi, Khaled (1)
- Lavallée, Pierre (1)
- Neusy, Elisabeth (1)
- Picard, F. (1)
- Thompson, Katherine Jenny (1)
- Tsang, John (1)
Results
All (16) (0 to 10 of 16 results)
- Articles and reports: 11-522-X202500100025 Description: National statistical offices have increasingly adopted machine learning (ML) for its potential to improve survey estimates. ML techniques offer significant advantages, notably the ability to manage high-dimensional data and to capture complex, nonlinear relationships, thereby enhancing the overall quality of survey statistics. In this article, following the approach of Chernozhukov et al. (2018), we describe a double debiased machine learning framework that enables valid statistical inference when imputed estimators are derived from ML procedures. Simulation results suggest that the proposed framework performs well in a wide range of scenarios. Release date: 2025-09-08
- Articles and reports: 12-001-X202500100003 Description: In recent years, there has been a significant interest in machine learning in national statistical offices. Thanks to their flexibility, these methods may prove useful at the nonresponse treatment stage. In this article, we conduct an empirical investigation in order to compare several machine learning procedures in terms of bias and efficiency. In addition to the classical machine learning procedures, we assess the performance of ensemble approaches that make use of different machine learning procedures to produce a set of weights adjusted for nonresponse. Release date: 2025-06-30
- Articles and reports: 12-001-X202500100013 Description: This discussion of the paper by Rao and Lohr focuses on the use of machine learning procedures for estimating finite population parameters. While there is growing interest in these methods within national statistical offices, several areas remain largely unexplored and warrant significant attention in the coming years. In this discussion, I highlight potential topics for future research and development in this rapidly evolving field. Release date: 2025-06-30
- Articles and reports: 12-001-X202300200017 Description: Jean-Claude Deville, who passed away in October 2021, was one of the most influential researchers in the field of survey statistics over the past 40 years. This article traces some of his contributions that have had a profound impact on both survey theory and practice. It covers balanced sampling using the cube method, calibration, the weight-sharing method, the development of variance expressions for complex estimators using influence functions, and quota sampling. Release date: 2024-01-03
- Articles and reports: 12-001-X202200100006 Description:
In the last two decades, survey response rates have been steadily falling. In that context, it has become increasingly important for statistical agencies to develop and use methods that reduce the adverse effects of non-response on the accuracy of survey estimates. Follow-up of non-respondents may be an effective, albeit time- and resource-intensive, remedy for non-response bias. We conducted a simulation study using real business survey data to shed some light on several questions about non-response follow-up. For instance, assuming a fixed non-response follow-up budget, what is the best way to select non-responding units to be followed up? How much effort should be dedicated to repeatedly following up non-respondents until a response is received? Should they all be followed up or a sample of them? If a sample is followed up, how should it be selected? We compared Monte Carlo relative biases and relative root mean square errors under different follow-up sampling designs, sample sizes and non-response scenarios. We also determined an expression for the minimum follow-up sample size required to expend the budget, on average, and showed that it maximizes the expected response rate. A main conclusion of our simulation experiment is that this sample size also appears to approximately minimize the bias and mean square error of the estimates.
Release date: 2022-06-21
- Articles and reports: 12-001-X202100100009 Description:
Predictive mean matching is a commonly used imputation procedure for addressing the problem of item nonresponse in surveys. The customary approach relies upon the specification of a single outcome regression model. In this note, we propose a novel predictive mean matching procedure that allows the user to specify multiple outcome regression models. The resulting estimator is multiply robust in the sense that it remains consistent if one of the specified outcome regression models is correctly specified. The results from a simulation study suggest that the proposed method performs well in terms of bias and efficiency.
Release date: 2021-06-24
- Articles and reports: 12-001-X201600214662 Description:
Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.
Release date: 2016-12-20
- A method of determining the winsorization threshold, with an application to domain estimation (Archived). Articles and reports: 12-001-X201500114199 Description:
In business surveys, it is not unusual to collect economic variables for which the distribution is highly skewed. In this context, winsorization is often used to treat the problem of influential values. This technique requires the determination of a constant that corresponds to the threshold above which large values are reduced. In this paper, we consider a method of determining the constant which involves minimizing the largest estimated conditional bias in the sample. In the context of domain estimation, we also propose a method of ensuring consistency between the domain-level winsorized estimates and the population-level winsorized estimate. The results of two simulation studies suggest that the proposed methods lead to winsorized estimators that have good bias and relative efficiency properties.
Release date: 2015-06-29
- Articles and reports: 12-001-X201000211385 Description:
In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration.
Release date: 2010-12-21
- Articles and reports: 12-001-X201000111246 Description:
Many surveys employ weight adjustment procedures to reduce nonresponse bias. These adjustments make use of available auxiliary data. This paper addresses the issue of jackknife variance estimation for estimators that have been adjusted for nonresponse. Using the reverse approach for variance estimation proposed by Fay (1991) and Shao and Steel (1999), we study the effect of not re-calculating the nonresponse weight adjustment within each jackknife replicate. We show that the resulting 'shortcut' jackknife variance estimator tends to overestimate the true variance of point estimators in the case of several weight adjustment procedures used in practice. These theoretical results are confirmed through a simulation study where we compare the shortcut jackknife variance estimator with the full jackknife variance estimator obtained by re-calculating the nonresponse weight adjustment within each jackknife replicate.
Release date: 2010-06-29
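The entropy comparison described in the short note listed above (12-001-X201000211385) can be illustrated numerically. The sketch below is not taken from the note. It assumes the usual definition of the entropy of a sampling design, H = -sum over samples s of p(s) log p(s), which gives log C(N, n) for simple random sampling without replacement of fixed size n, and -N[pi log(pi) + (1 - pi) log(1 - pi)] for Bernoulli sampling with inclusion probability pi = n/N. Reading "approximately the same" as "the ratio of the two entropies approaches 1" is also an assumption. The function names and the sampling fraction of 0.1 are illustrative only.

```python
import math


def entropy_srswor(N, n):
    """Entropy of SRSWOR of size n from N units: log C(N, n).

    Computed with lgamma so that large N does not create huge integers.
    """
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)


def entropy_bernoulli(N, pi):
    """Entropy of Bernoulli sampling: N independent draws with probability pi."""
    return -N * (pi * math.log(pi) + (1 - pi) * math.log(1 - pi))


def entropy_ratio(N, f):
    """Ratio of SRSWOR entropy to Bernoulli entropy at sampling fraction f."""
    n = round(N * f)
    return entropy_srswor(N, n) / entropy_bernoulli(N, n / N)


# Illustrative population sizes; the sampling fraction 0.1 is an assumption.
for N in (100, 10_000, 1_000_000):
    print(N, entropy_ratio(N, 0.1))
```

Under these assumptions the ratio stays below 1, because fixing the sample size restricts the set of possible samples, and it moves toward 1 as N grows with the sampling fraction held fixed.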