### Filter results by

#### Author(s)

- Selected: Beaumont, Jean-François (18)
- Haziza, David (4)
- Bocci, Cynthia (3)
- Alavi, Asma (2)
- Bissonnette, Joël (2)
- Yung, Wesley (2)
- Charbonnier, C. (1)
- Chow, O.S.Y. (1)
- Dasylva, Abel (1)
- Favre-Martinoz, Cyril (1)
- Fonberg, Jonathan David (1)
- Hidiroglou, Michael (1)
- Hidiroglou, Mike (1)
- Lesage, Éric (1)
- Mitchell, Charles (1)
- Neusy, Elisabeth (1)
- Schellenberg, Grant (1)
- You, Yong (1)

### Results

## All (18) (showing 1 to 10 of 18 results)

- Articles and reports: 12-001-X202200100006
Description:
In the last two decades, survey response rates have been steadily falling. In that context, it has become increasingly important for statistical agencies to develop and use methods that reduce the adverse effects of non-response on the accuracy of survey estimates. Follow-up of non-respondents may be an effective, albeit time and resource-intensive, remedy for non-response bias. We conducted a simulation study using real business survey data to shed some light on several questions about non-response follow-up. For instance, assuming a fixed non-response follow-up budget, what is the best way to select non-responding units to be followed up? How much effort should be dedicated to repeatedly following up non-respondents until a response is received? Should they all be followed up or a sample of them? If a sample is followed up, how should it be selected? We compared Monte Carlo relative biases and relative root mean square errors under different follow-up sampling designs, sample sizes and non-response scenarios. We also determined an expression for the minimum follow-up sample size required to expend the budget, on average, and showed that it maximizes the expected response rate. A main conclusion of our simulation experiment is that this sample size also appears to approximately minimize the bias and mean square error of the estimates.
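The follow-up subsampling idea in this abstract can be illustrated with the classical two-phase adjustment: when a simple random subsample of nonrespondents is followed up, their design weights are inflated by the inverse of the follow-up sampling fraction. A minimal sketch, assuming a Hansen-Hurwitz-style reweighting (the function and its arguments are illustrative, not the paper's design comparison):

```python
import numpy as np

def follow_up_total(y_resp, w_resp, y_fu, w_fu, n_nonresp):
    """Two-phase estimate of a population total.

    y_resp, w_resp : values and design weights of first-phase respondents
    y_fu, w_fu     : values and design weights of the followed-up
                     nonrespondents (a simple random subsample)
    n_nonresp      : total number of first-phase nonrespondents
    """
    # inflate follow-up weights by the inverse follow-up sampling fraction
    inflate = n_nonresp / len(y_fu)
    return float(np.sum(w_resp * y_resp) + inflate * np.sum(w_fu * y_fu))
```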

Release date: 2022-06-21

- Articles and reports: 12-001-X202100200001
Description:
The Fay-Herriot model is often used to produce small area estimates. These estimates are generally more efficient than standard direct estimates. In order to evaluate the efficiency gains obtained by small area estimation methods, model mean square error estimates are usually produced. However, these estimates do not reflect all the peculiarities of a given domain (or area) because model mean square errors integrate out the local effects. An alternative is to estimate the design mean square error of small area estimators, which is often more attractive from a user point of view. However, it is known that design mean square error estimates can be very unstable, especially for domains with few sampled units. In this paper, we propose two local diagnostics that aim to choose between the empirical best predictor and the direct estimator for a particular domain. We first find an interval for the local effect such that the best predictor is more efficient under the design than the direct estimator. Then, we consider two different approaches to assess whether it is plausible that the local effect falls in this interval. We evaluate our diagnostics using a simulation study. Our preliminary results indicate that our diagnostics are effective for choosing between the empirical best predictor and the direct estimator.
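The shrinkage behind the empirical best predictor discussed above can be sketched for the Fay-Herriot area-level model: each area estimate is a weighted compromise between the direct estimate and a regression-synthetic estimate, with weight gamma_i = sigma_v^2 / (sigma_v^2 + psi_i). A minimal numpy sketch with a simple fixed-point moment update for the model variance (illustrative only, not Statistics Canada's implementation or this paper's diagnostics):

```python
import numpy as np

def fay_herriot_eblup(y, X, psi, n_iter=50):
    """EBLUP under the Fay-Herriot area-level model.

    y   : direct estimates, one per area
    X   : area-level covariates (n x p)
    psi : known sampling variances of the direct estimates
    """
    sigma2_v = np.var(y)  # crude starting value for the model variance
    for _ in range(n_iter):
        w = 1.0 / (sigma2_v + psi)  # GLS weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        resid2 = (y - X @ beta) ** 2
        # method-of-moments style update: E[resid^2] ~ sigma2_v + psi
        sigma2_v = max(0.0, np.sum(w * (resid2 - psi)) / np.sum(w))
    gamma = sigma2_v / (sigma2_v + psi)  # shrinkage factors
    return gamma * y + (1 - gamma) * (X @ beta)
```

With negligible sampling variance the EBLUP reproduces the direct estimates; with very large sampling variance it collapses to the synthetic regression fit.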

Release date: 2022-01-06

- Articles and reports: 11-633-X2021007
Description:
Statistics Canada continues to use a variety of data sources to provide neighbourhood-level variables across an expanding set of domains, such as sociodemographic characteristics, income, services and amenities, crime, and the environment. Yet, despite these advances, information on the social aspects of neighbourhoods is still unavailable. In this paper, answers to the Canadian Community Health Survey on respondents’ sense of belonging to their local community were pooled over the four survey years from 2016 to 2019. Individual responses were aggregated up to the census tract (CT) level.

Release date: 2021-11-16

- Articles and reports: 11-522-X202100100008
Description:
Non-probability samples are being increasingly explored by National Statistical Offices as a complement to probability samples. We consider the scenario where the variable of interest and auxiliary variables are observed in both a probability and non-probability sample. Our objective is to use data from the non-probability sample to improve the efficiency of survey-weighted estimates obtained from the probability sample. Recently, Sakshaug, Wisniowski, Ruiz and Blom (2019) and Wisniowski, Sakshaug, Ruiz and Blom (2020) proposed a Bayesian approach to integrating data from both samples for the estimation of model parameters. In their approach, non-probability sample data are used to determine the prior distribution of model parameters, and the posterior distribution is obtained under the assumption that the probability sampling design is ignorable (or not informative). We extend this Bayesian approach to the prediction of finite population parameters under non-ignorable (or informative) sampling by conditioning on appropriate survey-weighted statistics. We illustrate the properties of our predictor through a simulation study.

Key Words: Bayesian prediction; Gibbs sampling; Non-ignorable sampling; Statistical data integration.
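The general idea of using one sample to build a prior for another can be illustrated with a toy normal-normal conjugate update: centre the prior on the non-probability estimate and combine it with a survey-weighted mean from the probability sample. This is only a caricature of the approach described above (the authors work with Gibbs sampling and survey-weighted statistics); the function name and the approximate variance formula are assumptions for illustration:

```python
import numpy as np

def posterior_normal(prior_mean, prior_var, y, w):
    """Normal-normal update: prior (e.g. from a non-probability sample)
    combined with a survey-weighted mean from a probability sample."""
    ybar = np.sum(w * y) / np.sum(w)  # Hajek-type weighted mean
    # rough with-replacement-style variance of the weighted mean
    data_var = np.sum(w**2 * (y - ybar) ** 2) / np.sum(w) ** 2
    post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
    post_mean = post_var * (prior_mean / prior_var + ybar / data_var)
    return post_mean, post_var
```

The posterior mean always lies between the prior mean and the weighted sample mean, and the posterior variance is smaller than either source's variance alone.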

Release date: 2021-10-29

- Articles and reports: 12-001-X202000100001
Description:
For several decades, national statistical agencies around the world have been using probability surveys as their preferred tool to meet information needs about a population of interest. In the last few years, there has been a wind of change and other data sources are being increasingly explored. Five key factors are behind this trend: the decline in response rates in probability surveys, the high cost of data collection, the increased burden on respondents, the desire for access to “real-time” statistics, and the proliferation of non-probability data sources. Some people have even come to believe that probability surveys could gradually disappear. In this article, we review some approaches that can reduce, or even eliminate, the use of probability surveys, all the while preserving a valid statistical inference framework. All the approaches we consider use data from a non-probability source; data from a probability survey are also used in most cases. Some of these approaches rely on the validity of model assumptions, which contrasts with approaches based on the probability sampling design. These design-based approaches are generally not as efficient; yet, they are not subject to the risk of bias due to model misspecification.

Release date: 2020-06-30

- Articles and reports: 12-001-X201900100009
Description:
The demand for small area estimates by users of Statistics Canada’s data has been steadily increasing over recent years. In this paper, we provide a summary of procedures that have been incorporated into a SAS-based production system for producing official small area estimates at Statistics Canada. This system includes: procedures based on unit- or area-level models; the incorporation of the sampling design; the ability to smooth the design variance for each small area if an area-level model is used; the ability to ensure that the small area estimates add up to reliable higher-level estimates; and the development of diagnostic tools to test the adequacy of the model. The production system has been used to produce small area estimates on an experimental basis for several surveys at Statistics Canada, including: the estimation of health characteristics, the estimation of under-coverage in the census, the estimation of manufacturing sales, and the estimation of unemployment rates and employment counts for the Labour Force Survey. Some of the diagnostics implemented in the system are illustrated using Labour Force Survey data along with administrative auxiliary data.

Release date: 2019-05-07

- Articles and reports: 12-001-X201600214662
Description:
Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

Release date: 2016-12-20

- A method of determining the winsorization threshold, with an application to domain estimation (Archived)
Articles and reports: 12-001-X201500114199
Description:
In business surveys, it is not unusual to collect economic variables for which the distribution is highly skewed. In this context, winsorization is often used to treat the problem of influential values. This technique requires the determination of a constant that corresponds to the threshold above which large values are reduced. In this paper, we consider a method of determining the constant which involves minimizing the largest estimated conditional bias in the sample. In the context of domain estimation, we also propose a method of ensuring consistency between the domain-level winsorized estimates and the population-level winsorized estimate. The results of two simulation studies suggest that the proposed methods lead to winsorized estimators that have good bias and relative efficiency properties.
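Winsorization itself, and the idea of forcing domain-level estimates to add up to the population-level estimate, can be sketched in a few lines. This uses simple top-coding and a ratio adjustment; the paper's actual threshold choice (minimizing the largest estimated conditional bias) and its consistency method are not reproduced here:

```python
import numpy as np

def winsorized_total(y, w, K):
    """Design-weighted total after reducing values above the threshold K."""
    return float(np.sum(w * np.minimum(y, K)))

def benchmark_domains(domain_totals, population_total):
    """Ratio-adjust domain-level winsorized totals so that they add up
    to the population-level winsorized total."""
    return domain_totals * (population_total / domain_totals.sum())
```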

Release date: 2015-06-29

- Articles and reports: 12-001-X201100211605
Description:
Composite imputation is often used in business surveys. The term "composite" means that more than a single imputation method is used to impute missing values for a variable of interest. The literature on variance estimation in the presence of composite imputation is rather limited. To deal with this problem, we consider an extension of the methodology developed by Särndal (1992). Our extension is quite general and easy to implement provided that linear imputation methods are used to fill in the missing values. This class of imputation methods contains linear regression imputation, donor imputation and auxiliary value imputation, sometimes called cold-deck or substitution imputation. It thus covers the most common methods used by national statistical agencies for the imputation of missing values. Our methodology has been implemented in the System for the Estimation of Variance due to Nonresponse and Imputation (SEVANI) developed at Statistics Canada. Its performance is evaluated in a simulation study.

Release date: 2011-12-21

- Articles and reports: 11-536-X200900110811
Description:
Composite imputation is often used in business surveys. It occurs when several imputation methods are used to impute a single variable of interest. The choice of one method over another depends on whether certain auxiliary variables are available. For instance, ratio imputation could be used to impute a missing value when an auxiliary variable is available; otherwise, mean imputation could be used.
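The ratio-then-mean fallback just described can be sketched as follows (a minimal illustration; the function name and the NaN convention for missingness are assumptions, not SEVANI code):

```python
import numpy as np

def composite_impute(y, x):
    """Ratio imputation when the auxiliary x is observed,
    mean imputation otherwise; NaN marks a missing value."""
    y = y.copy()
    obs = ~np.isnan(y)
    both = obs & ~np.isnan(x)  # respondents that also have an auxiliary value
    ratio = y[both].sum() / x[both].sum()  # R = sum(y) / sum(x) on respondents
    mean = y[obs].mean()
    for i in np.where(~obs)[0]:
        y[i] = ratio * x[i] if not np.isnan(x[i]) else mean
    return y
```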

Although composite imputation is frequent in practice, the literature on variance estimation when composite imputation is used is limited. We consider the general methodology proposed by Särndal et al. (1992), which requires the validity of an imputation model, i.e., a model for the variable being imputed. At first glance, the extension of this methodology to composite imputation seems quite tedious, until we notice that most imputation methods used in practice lead to imputed estimators that are linear in the observed values of the variable of interest. This considerably simplifies the derivation of a variance estimator even when there is a single imputation method. Regarding the estimation of the sampling portion of the total variance, we use a methodology slightly different from the one proposed by Särndal et al. (1992). Our methodology is similar to the sampling variance estimator under multiple imputation with an infinite number of imputations.
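The linearity property mentioned above can be made concrete for unweighted mean imputation: the imputed total equals the sum over respondents of y_i times a fixed coefficient, namely the respondent's own weight plus an equal share of the missing units' weights. A small sketch of those coefficients (illustrative only, not SEVANI code):

```python
import numpy as np

def mean_imputation_coefficients(w, obs):
    """Coefficients a_i such that the mean-imputed total equals
    sum(a_i * y_i) over the observed values: each of the r respondents
    carries its own weight plus 1/r of the missing units' total weight."""
    r = obs.sum()
    return np.where(obs, w + w[~obs].sum() / r, 0.0)
```

Because the coefficients do not depend on y, the imputed estimator is linear in the observed values, which is exactly what simplifies the variance derivation.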

This methodology is the central part of version 2.0 of the System for Estimation of Variance due to Nonresponse and Imputation (SEVANI), which is being developed at Statistics Canada. Using SEVANI, we will illustrate our method through an example based on real data.

Release date: 2009-08-11



