Weighting and estimation
Survey or statistical program
- Survey of Labour and Income Dynamics (5)
- Census of Population (5)
- Survey of Household Spending (2)
- Longitudinal and International Study of Adults (2)
- Survey of Employment, Payrolls and Hours (1)
- Canadian Cancer Registry (1)
- Canadian Community Health Survey - Annual Component (1)
- Uniform Crime Reporting Survey (1)
- Quarterly Demographic Estimates (1)
- Annual Demographic Estimates: Canada, Provinces and Territories (1)
- Estimates of the number of census families for July 1st, Canada, provinces and territories (1)
- Annual Demographic Estimates: Subprovincial Areas (1)
- Labour Force Survey (1)
- Longitudinal Administrative Databank (1)
- General Social Survey - Social Identity (1)
- Canadian Community Health Survey - Nutrition (1)
- Canadian Income Survey (1)
- Residential Property Values (1)
- Canadian Survey on Business Conditions (1)
Results
All (624) (results 1 to 10 of 624)
- Articles and reports: 11-522-X202500100006
  Description: Small area estimation is frequently used to produce estimates at a disaggregated level where direct survey estimation does not have a sufficient sample to produce precise estimates. This is often done with the area-level Fay-Herriot model, by assuming the direct estimates are independent under the design with known variances, and applying a smoothing process to the variance estimates of the direct estimates to better meet that last assumption. Small area estimates are often benchmarked (raked) to aggregate-level direct estimates. This article shows that wrongly assuming independence can have a substantial impact on the MSE of the raked estimates; values of the covariances between direct estimates are therefore required for good point and MSE estimates. Obtaining good estimates of those covariances is difficult given the small sample sizes in some areas. An original way of deriving values for those covariances, by reverse-engineering a hypothetical raking process, is presented.
  Release date: 2025-09-08
- Articles and reports: 11-522-X202500100007
  Description: This paper applies the Pseudo Maximum Likelihood (PML) estimator to non-probability two-phase sampling when relevant auxiliary information is available from both a probability survey sample and a non-probability survey sample. To accommodate various weight adjustments and to estimate variances for statistics beyond totals and means, such as medians and quantiles, a simplified pseudo-population bootstrap procedure is proposed to approximately estimate the second-phase variance. The simplification treats the second-phase sample as fixed, ignoring its sampling variability (it is in fact random), when the first-phase sampling fraction of the non-probability sample is negligible. Using the Bank of Canada 2020 Cash Alternative Survey Wave 2, the performance of the proposed method is compared with alternative methods that either do not explicitly model the selection probability (raking) or ignore the valuable information from Phase 1 (Phase-2-Only). The results show that the PML-based approach outperforms the raking and Phase-2-Only estimates in reducing selection bias for both phases' payment-related variables, especially for the low-response youth group. Estimated variances of the PML-based estimates are stable.
  Release date: 2025-09-08
- Articles and reports: 11-522-X202500100009
  Description: Three series of web panels were implemented at Statistics Canada from 2020 to 2024. Participants were recruited from respondents to large probabilistic social surveys (recruitment surveys) and were subsequently invited to complete a series of short online surveys. Estimates of recruitment survey variables were calculated using both recruitment survey weights and web panel weights, and the two were compared; differences signal possible residual bias not corrected by the web panel weighting process. The investigation found more significant differences than would be expected if the web panel estimator fully corrected for the bias resulting from the web panel response process. Questions on certain topics, such as politics and voting, sense of belonging, and media consumption, showed the most significant differences between web panel estimates and recruitment survey estimates.
  Release date: 2025-09-08
- Data-driven Imputation Strategies and their Associated Quality Indicators in Economic Surveys (Archived). Articles and reports: 11-522-X202500100011
  Description: The use of modern data-driven imputation methods to treat non-response in surveys processed in the Integrated Business Statistics Program at Statistics Canada has previously been explored. These methods can lead to high-quality imputation and have the potential to yield broad efficiencies when setting up a survey's edit and imputation strategy. However, estimating the associated total variance, specifically the component due to imputation, remains a challenge. This article proposes two methods for estimating the total variance and presents preliminary results that have motivated further research in this area.
  Release date: 2025-09-08
- Articles and reports: 11-522-X202500100028
  Description: The United Nations Sustainable Development Goals require detailed, disaggregated data, typically obtained through household surveys. However, surveys alone cannot meet these needs for granular statistics, so National Statistical Institutes adopt small area methods. These methods face challenges because auxiliary variables, often derived from surveys, introduce measurement errors into the models. The aim is to apply measurement error correction in the classic Fay-Herriot area-level model. The results demonstrate the robustness of the standard approach, which ignores measurement error, but show that there are specific scenarios where correcting for measurement errors is beneficial. The approach is applied to a case study using Indonesian household survey data.
  Release date: 2025-09-08
- Articles and reports: 12-001-X202500100003
  Description: In recent years, there has been significant interest in machine learning in national statistical offices. Thanks to their flexibility, these methods may prove useful at the nonresponse treatment stage. In this article, we conduct an empirical investigation to compare several machine learning procedures in terms of bias and efficiency. In addition to the classical machine learning procedures, we assess the performance of ensemble approaches that combine different machine learning procedures to produce a set of weights adjusted for nonresponse.
  Release date: 2025-06-30
- Articles and reports: 12-001-X202500100005
  Description: In this paper, we derive a second-order unbiased (or nearly unbiased) mean squared prediction error (MSPE) estimator of the empirical best linear unbiased predictor (EBLUP) of a small area mean for a semi-parametric extension of the well-known Fay-Herriot model. Specifically, we derive our MSPE estimator assuming only certain moment conditions on the sampling error and random effects distributions. The normality-based Prasad-Rao MSPE estimator has a surprising robustness property: it remains second-order unbiased under non-normality of the random effects when a simple Prasad-Rao method-of-moments estimator is used for the variance component and the sampling error distribution is normal. We show that the normality-based MSPE estimator is no longer second-order unbiased when the sampling error distribution has non-zero kurtosis, or when the Fay-Herriot moment method is used to estimate the variance component, even when the sampling error distribution is normal. Interestingly, when the simple method-of-moments estimator is used for the variance component, our proposed MSPE estimator does not require estimating the kurtosis of the random effects. Results of a simulation study on the accuracy of the proposed MSPE estimator, under non-normality of both the sampling and random effects distributions, are also presented.
  Release date: 2025-06-30
- Articles and reports: 12-001-X202500100006
  Description: Survey practitioners have increasingly embraced modern machine learning techniques, including classification and regression tree algorithms, in the development of nonresponse adjustments. These methods, which do not require a predefined functional relationship between outcomes and predictors, offer a practical means of conducting variable selection and deriving interpretable structures that link response propensity with explanatory variables. However, when applying these algorithms to survey data, it is common to overlook crucial factors such as sampling weights, as well as sample design features such as stratification and clustering. To bridge this shortcoming, we propose an extension of the Chi-square Automatic Interaction Detector (CHAID) approach and describe the design-based asymptotic properties of the resulting “survey CHAID” (sCHAID) method. To facilitate practical use of sCHAID, we incorporate a Rao-Scott correction into the splitting criterion, accounting for the survey design. Using data from the U.S. American Community Survey, we illustrate the method and evaluate its performance through comparisons with existing weighted and unweighted algorithms.
  Release date: 2025-06-30
- Articles and reports: 12-001-X202500100007
  Description: We introduce a novel approach to model-assisted calibration estimation in survey sampling using generalized entropy. The method builds upon recent work by Kwon, Kim and Qiu (2024) and extends it to a model-assisted framework. Unlike traditional calibration techniques, this approach employs a generalized entropy function as the objective for optimization and incorporates a debiasing calibration constraint to ensure design consistency. The proposed estimator is shown to be asymptotically equivalent to an augmented generalized regression (GREG) estimator. It allows for unequal model variance, potentially improving efficiency when the sampling design is informative. The paper presents both design-based and model-based justifications for the method, along with asymptotic properties and variance estimation techniques. Computational aspects are discussed, including an unconstrained optimization approach that facilitates implementation, especially for high-dimensional auxiliary variables. The method’s performance is evaluated through a simulation study, demonstrating its effectiveness in improving estimation efficiency, particularly when the sampling design is informative.
  Release date: 2025-06-30
- Articles and reports: 12-001-X202500100008
  Description: Tightened budgets, continually decreasing response rates in traditional probability surveys, and increasing user pressure for more timely data have stimulated research on the use of nonprobability sample data, such as administrative records, web scraping, mobile phone data and voluntary internet surveys, for inference on finite population parameters such as means and totals. These data are often easier, faster and cheaper to collect than traditional probability samples. However, a major concern with using this kind of data for official statistics is its nonrepresentativeness due to possible selection bias, which, if not properly accounted for, could bias the inference. In this article, we review and discuss methods considered in the literature to deal with this problem and propose new methods, distinguishing between methods based on integrating the nonprobability sample with an appropriate probability sample and methods that base the inference solely on the nonprobability sample. Empirical illustrations based on simulated data are provided.
  Release date: 2025-06-30
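Several of the abstracts above (11-522-X202500100006, 11-522-X202500100028, 12-001-X202500100005) build on the area-level Fay-Herriot model. As a minimal sketch of the basic model only, not of any of the authors' methods, the EBLUP with a Fay-Herriot method-of-moments variance estimate can be computed as follows (all function and variable names, and the toy numbers, are illustrative):

```python
import numpy as np

def fay_herriot_eblup(y, X, D):
    """EBLUP for the area-level Fay-Herriot model
    y_i = x_i' beta + v_i + e_i,  v_i ~ (0, A),  e_i ~ (0, D_i),
    with the sampling variances D_i treated as known (smoothed)."""
    m, p = X.shape

    def gls(A):
        W = 1.0 / (A + D)                  # GLS weights
        XtW = X.T * W                      # X' diag(W)
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        return beta, W

    # Fay-Herriot moment estimate of the model variance A: solve
    # sum_i (y_i - x_i' beta)^2 / (A + D_i) = m - p by bisection
    # (the left side is decreasing in A; A is truncated at 0).
    lo, hi = 0.0, 10.0 * np.var(y) + D.max()
    for _ in range(200):
        A = 0.5 * (lo + hi)
        beta, W = gls(A)
        stat = np.sum(W * (y - X @ beta) ** 2)
        lo, hi = (A, hi) if stat > m - p else (lo, A)
    A = 0.5 * (lo + hi)
    beta, _ = gls(A)
    gamma = A / (A + D)                    # shrinkage toward the synthetic part
    return gamma * y + (1.0 - gamma) * (X @ beta), A, beta

# toy example with synthetic areas (numbers are made up)
rng = np.random.default_rng(0)
m = 40
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 2.0, size=m)          # "known" sampling variances
y = X @ np.array([1.0, 2.0]) + rng.normal(size=m) + rng.normal(size=m) * np.sqrt(D)
eblup, A_hat, beta_hat = fay_herriot_eblup(y, X, D)
```

The shrinkage factor gamma pulls precise direct estimates (small D_i) toward the direct estimate and imprecise ones toward the model-based synthetic estimate, which is the basic mechanism the abstracts take as their starting point.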
Data (0)
No content available at this time.
Analysis (597)
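The nonresponse-weighting abstracts above (12-001-X202500100003, 12-001-X202500100006) compare machine-learning propensity models against classical parametric baselines. A minimal NumPy-only sketch of one such classical baseline, inverse response-propensity adjustment with a logistic model, is shown below; it is an illustration of the general technique, not code from any of the papers, and all names are hypothetical:

```python
import numpy as np

def nonresponse_adjusted_weights(X, responded, d):
    """Classical inverse response-propensity adjustment: fit a logistic
    model for response, then divide respondents' design weights by their
    estimated response probabilities.
    X         : (n, p) covariates observed for the full sample
    responded : (n,) boolean response indicators
    d         : (n,) design weights"""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])        # add an intercept
    r = responded.astype(float)
    beta = np.zeros(p + 1)
    for _ in range(50):                         # Newton-Raphson for the logistic MLE
        phat = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (r - phat)                 # score vector
        H = (Z.T * (phat * (1.0 - phat))) @ Z   # Fisher information
        step = np.linalg.solve(H + 1e-8 * np.eye(p + 1), grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    phat = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    return d[responded] / phat[responded]

# toy example: response probability depends on the first covariate
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0])))
responded = rng.uniform(size=n) < p_true
d = np.full(n, 2.0)
w = nonresponse_adjusted_weights(X, responded, d)
```

The machine-learning procedures discussed in the abstracts replace the logistic model here with trees, ensembles, or design-aware variants such as sCHAID, while keeping the same divide-by-propensity adjustment structure.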
Reference (27) (results 1 to 10 of 27)
- Surveys and statistical programs – Documentation: 98-306-X
  Description: This report describes the sampling, weighting and estimation procedures used in the Census of Population. It provides operational and theoretical justifications for them and presents the results of evaluations of these procedures.
  Release date: 2023-10-04
- Notices and consultations: 75F0002M2019006
  Description: In 2018, Statistics Canada released two new data tables with estimates of effective tax and transfer rates for individual tax filers and census families. These estimates are derived from the Longitudinal Administrative Databank. This publication provides a detailed description of the methods used to derive the estimates of effective tax and transfer rates.
  Release date: 2019-04-16
- Revisions to 2006 to 2011 income data (Archived). Surveys and statistical programs – Documentation: 75F0002M2015003
  Description: This note discusses revised income estimates from the Survey of Labour and Income Dynamics (SLID). These revisions make it possible to compare results from the Canadian Income Survey (CIS) with earlier years and address methodological differences between SLID and CIS.
  Release date: 2015-12-17
- Surveys and statistical programs – Documentation: 91-528-X
  Description: This manual provides detailed descriptions of the data sources and methods used by Statistics Canada to estimate population: postcensal and intercensal population estimates; base population; births and deaths; immigration; emigration; non-permanent residents; interprovincial migration; subprovincial estimates of population; population estimates by age, sex and marital status; and census family estimates. A glossary of principal terms appears at the end of the manual, followed by the standard notation used. Until now, literature on methodological changes to the estimation calculations has been spread across various Statistics Canada publications and background papers; this manual gives users of demographic statistics a comprehensive compilation of the current procedures used by Statistics Canada to prepare population and family estimates.
  Release date: 2015-11-17
- Surveys and statistical programs – Documentation: 13-605-X201500414166
  Description: Estimates of the underground economy by province and territory for the period 2007 to 2012 are now available for the first time. This technical note explains how the methodology used to derive upper-bound estimates of the underground economy for the provinces and territories differs from that used to derive national estimates.
  Release date: 2015-04-29
- Surveys and statistical programs – Documentation: 99-002-X2011001
  Description: This report describes the sampling and weighting procedures used in the 2011 National Household Survey. It provides operational and theoretical justifications for them and presents the results of evaluation studies of these procedures.
  Release date: 2015-01-28
- Surveys and statistical programs – Documentation: 99-002-X
  Description: This report describes the sampling and weighting procedures used in the 2011 National Household Survey. It provides operational and theoretical justifications for them and presents the results of evaluation studies of these procedures.
  Release date: 2015-01-28
- Surveys and statistical programs – Documentation: 92-568-X
  Description: This report describes the sampling and weighting procedures used in the 2006 Census. It reviews the history of these procedures in Canadian censuses, provides operational and theoretical justifications for them, and presents the results of evaluation studies of these procedures.
  Release date: 2009-08-11
- Surveys and statistical programs – Documentation: 71F0031X2006003
  Description: This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2006, including changes to the population estimates, improvements to the public and private sector estimates, and historical updates to several small census agglomerations (CAs).
  Release date: 2006-01-25
- The Effects of the Revised Estimation Methodology on Estimates from Household Expenditure Surveys (Archived). Surveys and statistical programs – Documentation: 62F0026M2005002
  Description: This document provides an overview of the differences between the old and new weighting methodologies and the effect of the new weighting system on estimates.
  Release date: 2005-06-30
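Many of the reference documents above describe weighting procedures that calibrate survey weights to known population totals. A toy sketch of the simplest such technique, two-margin raking (iterative proportional fitting), is given below; it illustrates the general idea only and is not any agency's production code, and the category codes and totals are invented:

```python
import numpy as np

def rake_weights(d, row, col, row_totals, col_totals, iters=100):
    """Two-margin raking (iterative proportional fitting): repeatedly
    rescale the weights so that weighted counts in each category match
    known population totals for both margins.
    d, row, col            : (n,) design weights and integer category codes
    row_totals, col_totals : known population totals per category"""
    w = d.astype(float).copy()
    for _ in range(iters):
        cur = np.bincount(row, weights=w, minlength=len(row_totals))
        w *= (row_totals / cur)[row]       # hit margin 1 exactly
        cur = np.bincount(col, weights=w, minlength=len(col_totals))
        w *= (col_totals / cur)[col]       # hit margin 2 exactly
    return w

# toy example: 3 age groups x 2 regions, consistent margins (total = 1000)
rng = np.random.default_rng(2)
n = 200
row = rng.integers(0, 3, size=n)
col = rng.integers(0, 2, size=n)
d = np.full(n, 5.0)
row_totals = np.array([300.0, 300.0, 400.0])
col_totals = np.array([450.0, 550.0])
w = rake_weights(d, row, col, row_totals, col_totals)
```

Each pass fixes one margin while slightly disturbing the other; alternating the two converges to weights satisfying both margins when the totals are mutually consistent, which is why raking appears throughout the weighting documentation listed above.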