Weighting and estimation

Results

All (588) (0 to 10 of 588 results)

  • Articles and reports: 12-001-X202400100001
    Description: Inspired by the two excellent discussions of our paper, we offer some new insights and developments into the problem of estimating participation probabilities for non-probability samples. First, we propose an improvement of the method of Chen, Li and Wu (2020), based on best linear unbiased estimation theory, that more efficiently leverages the available probability and non-probability sample data. We also develop a sample likelihood approach, similar in spirit to the method of Elliott (2009), that properly accounts for the overlap between both samples when it can be identified in at least one of the samples. We use best linear unbiased prediction theory to handle the scenario where the overlap is unknown. Interestingly, our two proposed approaches coincide in the case of unknown overlap. Then, we show that many existing methods can be obtained as a special case of a general unbiased estimating function. Finally, we conclude with some comments on nonparametric estimation of participation probabilities.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100002
    Description: We provide comparisons among three parametric methods for the estimation of participation probabilities and some brief comments on homogeneous groups and post-stratification.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100003
    Description: Beaumont, Bosa, Brennan, Charlebois and Chu (2024) propose innovative model selection approaches for estimation of participation probabilities for non-probability sample units. We focus our discussion on the choice of a likelihood and parameterization of the model, which are key for the effectiveness of the techniques developed in the paper. We consider alternative likelihood-based and pseudo-likelihood-based methods for estimation of participation probabilities and present simulations implementing and comparing the AIC-based variable selection. We demonstrate that, under important practical scenarios, the approach based on a likelihood formulated over the observed pooled non-probability and probability samples performed better than the pseudo-likelihood-based alternatives. The contrast in sensitivity of the AIC criteria is especially large for small probability sample sizes and low overlap in covariate domains.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100004
    Description: Non-probability samples are being increasingly explored in National Statistical Offices as an alternative to probability samples. However, it is well known that the use of a non-probability sample alone may produce estimates with significant bias due to the unknown nature of the underlying selection mechanism. Bias reduction can be achieved by integrating data from the non-probability sample with data from a probability sample provided that both samples contain auxiliary variables in common. We focus on inverse probability weighting methods, which involve modelling the probability of participation in the non-probability sample. First, we consider the logistic model along with pseudo maximum likelihood estimation. We propose a variable selection procedure based on a modified Akaike Information Criterion (AIC) that properly accounts for the data structure and the probability sampling design. We also propose a simple rank-based method of forming homogeneous post-strata. Then, we extend the Classification and Regression Trees (CART) algorithm to this data integration scenario, while again properly accounting for the probability sampling design. A bootstrap variance estimator is proposed that reflects two sources of variability: the probability sampling design and the participation model. Our methods are illustrated using Statistics Canada’s crowdsourcing and survey data.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100005
    Description: In this rejoinder, I address the comments from the discussants, Dr. Takumi Saegusa, Dr. Jae-Kwang Kim and Ms. Yonghyun Kwon. Dr. Saegusa’s comments about the differences between the conditional exchangeability (CE) assumption for causal inferences versus the CE assumption for finite population inferences using nonprobability samples, and the distinction between design-based versus model-based approaches for finite population inference using nonprobability samples, are elaborated and clarified in the context of my paper. Subsequently, I respond to Dr. Kim and Ms. Kwon’s comprehensive framework for categorizing existing approaches for estimating propensity scores (PS) into conditional and unconditional approaches. I expand their simulation studies to vary the sampling weights, allow for misspecified PS models, and include an additional estimator, namely the scaled adjusted logistic propensity estimator of Wang, Valliant and Li (2021), denoted by sWBS. In my simulations, the sWBS estimator consistently outperforms or is comparable to the other estimators under the misspecified PS model. The sWBS estimator, as well as the WBS and ABS estimators described in my paper, does not assume that the overlapped units in both the nonprobability and probability reference samples are negligible, nor does it require the identification of overlap units as needed by the estimators proposed by Dr. Kim and Ms. Kwon.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100006
    Description: In some of the non-probability sample literature, the conditional exchangeability assumption is considered to be necessary for valid statistical inference. This assumption is rooted in causal inference, though its potential outcome framework differs greatly from that of non-probability samples. We describe similarities and differences between the two frameworks and discuss issues to consider when adopting the conditional exchangeability assumption in non-probability sample setups. We also discuss the role of finite population inference in different approaches to propensity score and outcome regression modeling for non-probability samples.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100007
    Description: Pseudo weight construction for data integration can be understood in the two-phase sampling framework. Using the two-phase sampling framework, we discuss two approaches to the estimation of propensity scores and develop a new way to construct the propensity score function for data integration using the conditional maximum likelihood method. Results from a limited simulation study are also presented.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100008
    Description: Nonprobability samples are emerging rapidly to address time-sensitive priority topics in different areas. These data are timely but subject to selection bias. To reduce selection bias, a wide literature in survey research has investigated the use of propensity-score (PS) adjustment methods to improve the population representativeness of nonprobability samples, using probability-based survey samples as external references. The conditional exchangeability (CE) assumption is one of the key assumptions required by PS-based adjustment methods. In this paper, I first explore the validity of the CE assumption conditional on various balancing score estimates that are used in existing PS-based adjustment methods. An adaptive balancing score is proposed for unbiased estimation of population means. The population mean estimators under the three CE assumptions are evaluated via Monte Carlo simulation studies and illustrated using the NIH SARS-CoV-2 seroprevalence study to estimate the proportion of U.S. adults with COVID-19 antibodies from April 1 to August 4, 2020.
    Release date: 2024-06-25

  • Articles and reports: 18-001-X2024001
    Description: This study applies small area estimation (SAE) and a new geographic concept called Self-contained Labor Area (SLA) to the Canadian Survey on Business Conditions (CSBC) with a focus on remote work opportunities in rural labor markets. Through SAE modelling, we estimate the proportions of businesses, classified by general industrial sector (service providers and goods producers), that would primarily offer remote work opportunities to their workforce.
    Release date: 2024-04-22

  • Stats in brief: 11-001-X202411338008
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2024-04-22

Data (0) (0 results)

No content available at this time.

Analysis (561) (0 to 10 of 561 results)


Reference (27) (0 to 10 of 27 results)

  • Surveys and statistical programs – Documentation: 98-306-X
    Description: This report describes sampling, weighting and estimation procedures used in the Census of Population. It provides operational and theoretical justifications for them, and presents the results of the evaluations of these procedures.
    Release date: 2023-10-04

  • Notices and consultations: 75F0002M2019006
    Description: In 2018, Statistics Canada released two new data tables with estimates of effective tax and transfer rates for individual tax filers and census families. These estimates are derived from the Longitudinal Administrative Databank. This publication provides a detailed description of the methods used to derive the estimates of effective tax and transfer rates.
    Release date: 2019-04-16

  • Surveys and statistical programs – Documentation: 75F0002M2015003
    Description: This note discusses revised income estimates from the Survey of Labour and Income Dynamics (SLID). These revisions to the SLID estimates make it possible to compare results from the Canadian Income Survey (CIS) to earlier years. The revisions address the issue of methodology differences between SLID and CIS.
    Release date: 2015-12-17

  • Surveys and statistical programs – Documentation: 91-528-X
    Description: This manual provides detailed descriptions of the data sources and methods used by Statistics Canada to estimate population. They comprise postcensal and intercensal population estimates; base population; births and deaths; immigration; emigration; non-permanent residents; interprovincial migration; subprovincial estimates of population; population estimates by age, sex and marital status; and census family estimates. A glossary of principal terms is contained at the end of the manual, followed by the standard notation used.

    Until now, literature on methodological changes to the calculation of estimates has been spread throughout various Statistics Canada publications and background papers. This manual provides users of demographic statistics with a comprehensive compilation of the current procedures used by Statistics Canada to prepare population and family estimates.
    Release date: 2015-11-17

  • Surveys and statistical programs – Documentation: 13-605-X201500414166
    Description: Estimates of the underground economy by province and territory for the period 2007 to 2012 are now available for the first time. The objective of this technical note is to explain how the methodology employed to derive upper-bound estimates of the underground economy for the provinces and territories differs from that used to derive national estimates.
    Release date: 2015-04-29

  • Surveys and statistical programs – Documentation: 99-002-X2011001
    Description: This report describes sampling and weighting procedures used in the 2011 National Household Survey. It provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.
    Release date: 2015-01-28

  • Surveys and statistical programs – Documentation: 99-002-X
    Description: This report describes sampling and weighting procedures used in the 2011 National Household Survey. It provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.
    Release date: 2015-01-28

  • Surveys and statistical programs – Documentation: 92-568-X
    Description: This report describes sampling and weighting procedures used in the 2006 Census. It reviews the history of these procedures in Canadian censuses, provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.
    Release date: 2009-08-11

  • Surveys and statistical programs – Documentation: 71F0031X2006003
    Description: This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2006. Some of these modifications include changes to the population estimates, improvements to the public and private sector estimates and historical updates to several small Census Agglomerations (CA).
    Release date: 2006-01-25

  • Surveys and statistical programs – Documentation: 62F0026M2005002
    Description: This document provides an overview of the differences between the old and the new weighting methodologies and the effect of the new weighting system on estimates.
    Release date: 2005-06-30