Weighting and estimation

Results

All (561) (0 to 10 of 561 results)

  • Surveys and statistical programs – Documentation: 98-306-X
    Description:

    This report describes sampling, weighting and estimation procedures used in the Census of Population. It provides operational and theoretical justifications for them, and presents the results of the evaluations of these procedures.

    Release date: 2023-10-04

  • Articles and reports: 12-001-X202300100003
    Description: To improve the precision of inferences and reduce costs there is considerable interest in combining data from several sources such as sample surveys and administrative data. Appropriate methodology is required to ensure satisfactory inferences since the target populations and methods for acquiring data may be quite different. To provide improved inferences we use methodology that has a more general structure than the ones in current practice. We start with the case where the analyst has only summary statistics from each of the sources. In our primary method, uncertain pooling, it is assumed that the analyst can regard one source, survey r, as the single best choice for inference. This method starts with the data from survey r and adds data from those other sources that are shown to form clusters that include survey r. We also consider Dirichlet process mixtures, one of the most popular nonparametric Bayesian methods. We use analytical expressions and the results from numerical studies to show properties of the methodology.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100004
    Description: The Dutch Health Survey (DHS), conducted by Statistics Netherlands, is designed to produce reliable direct estimates at an annual frequency. Data collection is based on a combination of web interviewing and face-to-face interviewing. Due to lockdown measures during the Covid-19 pandemic, little or no face-to-face interviewing was possible, which resulted in a sudden change in measurement and selection effects in the survey outcomes. Furthermore, the production of annual data about the effect of Covid-19 on health-related themes with a delay of about one year compromises the relevance of the survey. The sample size of the DHS does not allow the production of figures for shorter reference periods. Both issues are solved by developing a bivariate structural time series model (STM) to estimate quarterly figures for eight key health indicators. This model combines two series of direct estimates, a series based on complete response and a series based on web response only, and provides model-based predictions for the indicators that are corrected for the loss of face-to-face interviews during the lockdown periods. The model is also used as a form of small area estimation and borrows sample information observed in previous reference periods. In this way timely and relevant statistics describing the effects of the corona crisis on the development of Dutch health are published. In this paper the method based on the bivariate STM is compared with two alternative methods. The first one uses a univariate STM where no correction for the lack of face-to-face observation is applied to the estimates. The second one uses a univariate STM that also contains an intervention variable that models the effect of the loss of face-to-face response during the lockdown.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100005
    Description: Weight smoothing is a useful technique in improving the efficiency of design-based estimators at the risk of bias due to model misspecification. As an extension of the work of Kim and Skinner (2013), we propose using weight smoothing to construct the conditional likelihood for efficient analytic inference under informative sampling. The Beta prime distribution can be used to build a parameter model for weights in the sample. A score test is developed to test for model misspecification in the weight model. A pretest estimator using the score test can be developed naturally. The pretest estimator is nearly unbiased and can be more efficient than the design-based estimator when the weight model is correctly specified, or the original weights are highly variable. A limited simulation study is presented to investigate the performance of the proposed methods.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100011
    Description: The definition of statistical units is a recurring issue in the domain of sample surveys. Indeed, not all the populations surveyed have a readily available sampling frame. For some populations, the sampled units are distinct from the observation units and producing estimates on the population of interest raises complex questions, which can be addressed by using the weight share method (Deville and Lavallée, 2006). However, the two populations considered in this approach are discrete. In some fields of study, the sampled population is continuous: this is for example the case of forest inventories for which, frequently, the trees surveyed are those located on plots of which the centers are points randomly drawn in a given area. The production of statistical estimates from the sample of trees surveyed poses methodological difficulties, as do the associated variance calculations. The purpose of this paper is to generalize the weight share method to the continuous (sampled population) – discrete (surveyed population) case, from the extension proposed by Cordy (1993) of the Horvitz-Thompson estimator for drawing points carried out in a continuous universe.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202200200010
    Description:

    Multilevel time series (MTS) models are applied to estimate trends in time series of antenatal care coverage at several administrative levels in Bangladesh, based on repeated editions of the Bangladesh Demographic and Health Survey (BDHS) within the period 1994-2014. MTS models are expressed in a hierarchical Bayesian framework and fitted using Markov Chain Monte Carlo simulations. The models account for varying time lags of three or four years between the editions of the BDHS and provide predictions for the intervening years as well. It is proposed to apply cross-sectional Fay-Herriot models to the survey years separately at district level, which is the most detailed regional level. Time series of these small domain predictions at the district level and their variance-covariance matrices are used as input series for the MTS models. Spatial correlations among districts, random intercept and slope at the district level, and different trend models at district level and higher regional levels are examined in the MTS models to borrow strength over time and space. Trend estimates at district level are obtained directly from the model outputs, while trend estimates at higher regional and national levels are obtained by aggregation of the district level predictions, resulting in a numerically consistent set of trend estimates.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200011
    Description:

    Two-phase sampling is a cost-effective sampling design employed extensively in surveys. In this paper a method of most efficient linear estimation of totals in two-phase sampling is proposed, which optimally exploits auxiliary survey information. First, a best linear unbiased estimator (BLUE) of any total is formally derived in analytic form, and shown to be also a calibration estimator. Then, a proper reformulation of such a BLUE and estimation of its unknown coefficients leads to the construction of an “optimal” regression estimator, which can also be obtained through a suitable calibration procedure. A distinctive feature of such calibration is the alignment of estimates from the two phases in a one-step procedure involving the combined first-and-second phase samples. Optimal estimation is feasible for certain two-phase designs that are used often in large-scale surveys. For general two-phase designs, an alternative calibration procedure gives a generalized regression estimator as an approximate optimal estimator. The proposed general approach to optimal estimation leads to the most effective use of the available auxiliary information in any two-phase survey. The advantages of this approach over existing methods of estimation in two-phase sampling are shown both theoretically and through a simulation study.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200012
    Description:

    In many applications, the population means of geographically adjacent small areas exhibit a spatial variation. If available auxiliary variables do not adequately account for the spatial pattern, the residual variation will be included in the random effects. As a result, the independent and identical distribution assumption on random effects of the Fay-Herriot model will fail. Furthermore, limited resources often prevent numerous sub-populations from being included in the sample, resulting in non-sampled small areas. The problem can be exacerbated for predicting means of non-sampled small areas using the above Fay-Herriot model as the predictions will be made based solely on the auxiliary variables. To address such inadequacy, we consider Bayesian spatial random-effect models that can accommodate multiple non-sampled areas. Under mild conditions, we establish the propriety of the posterior distributions for various spatial models for a useful class of improper prior densities on model parameters. The effectiveness of these spatial models is assessed based on simulated and real data. Specifically, we examine predictions of statewide four-person family median incomes based on the 1990 Current Population Survey and the 1980 Census for the United States of America.

    Release date: 2022-12-15

  • Articles and reports: 75F0002M2022006
    Description:

    This technical paper describes how the cost for "other necessities" is estimated in the 2018-base MBM. It provides a brief overview of the theory and application of techniques for estimating costs of "other necessities" in poverty lines and deconstructs the 2018-base MBM other necessities component to provide insights on how it is constructed. The aim of this paper is to provide a more detailed understanding of how the other necessities component of the MBM is estimated.

    Release date: 2022-12-08

  • Articles and reports: 89-648-X2022001
    Description:

    This report explores the size and nature of the attrition challenges faced by the Longitudinal and International Study of Adults (LISA) survey, as well as the use of a non-response weight adjustment and calibration strategy to mitigate the effects of attrition on the LISA estimates. The study focuses on data from waves 1 (2012) to 4 (2018) and uses practical examples based on selected demographic variables to illustrate how attrition can be assessed and treated.

    Release date: 2022-11-14

Data (0) (0 results)

No content available at this time.

Analysis (534) (0 to 10 of 534 results)

  • Stats in brief: 11-001-X202231822683
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2022-11-14

Reference (27) (0 to 10 of 27 results)

  • Notices and consultations: 75F0002M2019006
    Description:

    In 2018, Statistics Canada released two new data tables with estimates of effective tax and transfer rates for individual tax filers and census families. These estimates are derived from the Longitudinal Administrative Databank. This publication provides a detailed description of the methods used to derive the estimates of effective tax and transfer rates.

    Release date: 2019-04-16

  • Surveys and statistical programs – Documentation: 75F0002M2015003
    Description:

    This note discusses revised income estimates from the Survey of Labour and Income Dynamics (SLID). These revisions to the SLID estimates make it possible to compare results from the Canadian Income Survey (CIS) to earlier years. The revisions address the issue of methodology differences between SLID and CIS.

    Release date: 2015-12-17

  • Surveys and statistical programs – Documentation: 91-528-X
    Description:

    This manual provides detailed descriptions of the data sources and methods used by Statistics Canada to estimate population. They comprise postcensal and intercensal population estimates; base population; births and deaths; immigration; emigration; non-permanent residents; interprovincial migration; subprovincial estimates of population; population estimates by age, sex and marital status; and census family estimates. A glossary of principal terms is contained at the end of the manual, followed by the standard notation used.

    Until now, literature on the methodological changes for estimates calculations has always been spread throughout various Statistics Canada publications and background papers. This manual provides users of demographic statistics with a comprehensive compilation of the current procedures used by Statistics Canada to prepare population and family estimates.

    Release date: 2015-11-17

  • Surveys and statistical programs – Documentation: 13-605-X201500414166
    Description:

    Estimates of the underground economy by province and territory for the period 2007 to 2012 are now available for the first time. The objective of this technical note is to explain how the methodology employed to derive upper-bound estimates of the underground economy for the provinces and territories differs from that used to derive national estimates.

    Release date: 2015-04-29

  • Surveys and statistical programs – Documentation: 99-002-X2011001
    Description:

    This report describes sampling and weighting procedures used in the 2011 National Household Survey. It provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.

    Release date: 2015-01-28

  • Surveys and statistical programs – Documentation: 99-002-X
    Description: This report describes sampling and weighting procedures used in the 2011 National Household Survey. It provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.
    Release date: 2015-01-28

  • Surveys and statistical programs – Documentation: 92-568-X
    Description:

    This report describes sampling and weighting procedures used in the 2006 Census. It reviews the history of these procedures in Canadian censuses, provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.

    Release date: 2009-08-11

  • Surveys and statistical programs – Documentation: 71F0031X2006003
    Description:

    This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2006. Some of these modifications include changes to the population estimates, improvements to the public and private sector estimates and historical updates to several small Census Agglomerations (CA).

    Release date: 2006-01-25

  • Surveys and statistical programs – Documentation: 62F0026M2005002
    Description:

    This document will provide an overview of the differences between the old and the new weighting methodologies and the effect of the new weighting system on estimations.

    Release date: 2005-06-30