Inference and foundations
Results
All (119) (0 to 10 of 119 results)
- Articles and reports: 12-001-X202500200009
Description: We present and apply methodology to improve inference for small area parameters by using data from several sources. This work extends Cahoy and Sedransk (2023), who showed how to integrate summary statistics from several sources. Our methodology uses hierarchical global-local prior distributions to make inferences for the proportion of individuals in Florida’s counties who do not have health insurance. Results from an extensive simulation study show that this methodology will provide improved inference by using several data sources. Among the five model variants evaluated, the ones using horseshoe priors for all variances have better performance than the ones using lasso priors for the local variances.
Release date: 2025-12-23
- Articles and reports: 12-001-X202500200011
Description: We propose an approximate hierarchical Bayes approach that uses the Natural Exponential Family with Quadratic Variance Function (NEF-QVF) in combining information from multiple sources to improve traditional survey estimates of finite population means for small areas. Unlike other Bayesian approaches in finite population sampling, we do not assume a model for all units of the finite population and do not require linking sampled units to the finite population frame. We assume a model only for the finite population units in which the outcome variable is observed because, for these units, the assumed model can be checked using existing statistical tools. We do not posit an elaborate model on the true means for unobserved units. Instead, we assume that population means of cells with the same combination of factor levels are identical across small areas, and that the population mean for a cell is identical to the mean of the observed units in that cell. We apply our proposed methodology to a real-life survey, linking information from multiple disparate data sources. We also provide practical ways of model selection that can be applied to a wider class of models under similar settings but for a diverse range of scientific problems.
Release date: 2025-12-23
- Articles and reports: 11-522-X202500100031
Description: Several recent quasi-randomization methods for inferences from non-probability samples will be compared. The considered techniques are developed under the assumption that the sample selection is governed by an underlying latent random mechanism and that it can be uncovered by combining non-probability survey data with a "reference" probability-based sample, obtained from the same target population. Challenges prompting the development of alternative procedures include (i) non-probability sample participation indicators are available only on the observed sample units and (ii) it is not generally known which units from the underlying population belong to both the non-probability and reference samples. The ways different procedures address these challenges are considered, theoretical properties of the methods are discussed and their comparison is made using simulations.
Release date: 2025-09-08
- Articles and reports: 11-522-X202500100032
Description: Although non-probability data sources are not new to official statistics, a revived interest in the topic has emerged from pressures due to falling survey response rates, increasing data collection costs and a desire to take advantage of new data source opportunities from the ongoing societal digitalisation. Due to the exclusion of certain segments of the target population, inference derived solely from a non-probability data source is likely to result in bias. This work approaches the challenge of addressing the bias by integrating non-probability data with reference probability samples. The focus will be on methods to model the propensity of inclusion in the non-probability dataset with the help of the accompanying reference sample, with the modelled propensities then applied in an inverse probability weighting approach to produce population estimates. The reference sample is sometimes assumed as given. In this presentation, however, an objective of finding an optimal strategy will be pursued, that is, the combination of a data integration-based estimator and sample design for the reference probability sample. Recent work is discussed in which advantage is taken of the good unit identification possibilities in business surveys to study an estimator based on propensities and derive optimal (unequal) selection probabilities for the reference sample.
Release date: 2025-09-08
- Articles and reports: 12-001-X202500100005
Description: In this paper, we derive a second-order unbiased (or nearly unbiased) mean squared prediction error (MSPE) estimator of the empirical best linear unbiased predictor (EBLUP) of a small area mean for a semi-parametric extension to the well-known Fay-Herriot model. Specifically, we derive our MSPE estimator essentially assuming certain moment conditions on both the sampling errors and random effects distributions. The normality-based Prasad-Rao MSPE estimator has a surprising robustness property in that it remains second-order unbiased under the non-normality of random effects when a simple Prasad-Rao method-of-moments estimator is used for the variance component and the sampling error distribution is normal. We show that the normality-based MSPE estimator is no longer second-order unbiased when the sampling error distribution has non-zero kurtosis or when the Fay-Herriot moment method is used to estimate the variance component, even when the sampling error distribution is normal. Interestingly, when the simple method-of-moments estimator is used for the variance component, our proposed MSPE estimator does not require the estimation of kurtosis of the random effects. Results of a simulation study on the accuracy of the proposed MSPE estimator, under non-normality of both sampling and random effects distributions, are also presented.
Release date: 2025-06-30
- Articles and reports: 12-001-X202500100009
Description: BigData users and the BigData research community are expanding rapidly, while statisticians at large are seemingly becoming divided between those who are enthusiastic and those who are concerned, if not downright hostile. Is BigData also a big step ahead, truly advancing our ability to extract meaningful information and actual knowledge from data? Is BigData underplaying traditional statistical inference as we know it, supplanting survey methodology as a low-cost futuristic option? In this paper I will attempt to unravel the multifaceted relationship bridging BigData to sampling methodology. Starting by reasoning why it should be interesting to look at BigData from a sampling statistician’s perspective, I will delve deeper into the somewhat ambiguous definition of BigData and share some very personal considerations and views on the matter. In the process, several open questions will arise while discussing a personal selection of insights that are traceable through the vast body of statistical literature around BigData and sampling methodology. The discussion will take various angles explored across nine key points, and it will conclude with a forward-looking perspective on a main challenge for future research: addressing the strong assumptions needed to manage deviations from purely randomized data collection.
Release date: 2025-06-30
- Articles and reports: 12-001-X202500100014
Description: Rao (1999) summarized trends in sample survey theory and methods at the turn of the millennium. We provide an updated discussion of some current trends in survey design and estimation methods for the 50th anniversary of Survey Methodology. Recent innovations in survey design include research on anticipating nonsampling errors at the design stage and development of balanced and adaptive sampling designs to take advantage of detailed sampling frame information or data gathered during the survey process. Nonparametric and machine learning methods are increasingly used for data editing as well as for model-assisted estimation and nonresponse adjustments. Small area models have been expanded to incorporate spatial and time series information, increase the flexibility and robustness of the linking and variance models, benchmark to large-area direct estimators, and (for unit level models) account for informative sampling designs. The increasing availability of large administrative datasets, sensor and satellite data, and convenience samples has spurred research on how to use these sources, both on their own and when integrated with probability samples. We conclude by discussing some frontiers for survey research.
Release date: 2025-06-30
- 8. Comments by Mary E. Thompson on “Progress in survey science and practice: Yesterday-today-tomorrow”
Articles and reports: 12-001-X202500100016
Description: These comments on C.-E. Särndal’s paper, “Progress in survey science and practice: yesterday-today-tomorrow”, will touch on probability sampling fundamentals, progress through competing approaches to inference, connections with other parts of statistics, and data in the twenty-first century.
Release date: 2025-06-30
- Articles and reports: 12-001-X202400200008
Description: When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the general strategy proposed by Rubin (1993). Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as public use files. To facilitate variance estimation, we use the framework of multiple imputation with two data generation strategies. In the first, we generate multiple data sets from each simple random sample. In the second, we generate a single synthetic data set from each simple random sample. We present multiple imputation combining rules for each setting. We illustrate the repeated sampling properties of the combining rules via simulation studies, including comparisons with synthetic data generation based on pseudo-likelihood methods. We apply the proposed methods to a subset of data from the American Community Survey.
Release date: 2024-12-20
- Articles and reports: 12-001-X202400200014
Description: Adaptive cluster sampling designs were proposed as a method that could be used when sampling rare populations whose units tend to appear in clusters. The resulting estimator is not based on any model assumptions and is design unbiased. It can have smaller variance than the standard estimator, which does not incorporate the fact that one is dealing with a rare population. Here we will demonstrate that, when adaptive cluster sampling is appropriate, its estimator does not take into account all the available information in the design. We present a quasi-Bayesian approach which incorporates the information which is now ignored. We will see that the resulting estimator is a significant improvement over the current methods.
Release date: 2024-12-20
Reference (8) (8 results)
- 1. The Potential Use of Remote Sensing to Produce Field Crop Statistics at Statistics Canada (Archived)
Surveys and statistical programs – Documentation: 11-522-X201300014259
Description:
In an effort to reduce response burden on farm operators, Statistics Canada is studying alternative approaches to telephone surveys for producing field crop estimates. One option is to publish harvested area and yield estimates in September as is currently done, but to calculate them using models based on satellite and weather data, and data from the July telephone survey. However before adopting such an approach, a method must be found which produces estimates with a sufficient level of accuracy. Research is taking place to investigate different possibilities. Initial research results and issues to consider are discussed in this paper.
Release date: 2014-10-31
- Surveys and statistical programs – Documentation: 12-002-X20040027035
Description:
As part of the processing of the National Longitudinal Survey of Children and Youth (NLSCY) cycle 4 data, historical revisions have been made to the data of the first 3 cycles, either to correct errors or to update the data. During processing, particular attention was given to the PERSRUK (Person Identifier) and the FIELDRUK (Household Identifier). The same level of attention has not been given to the other identifiers that are included in the data base, the CHILDID (Person identifier) and the _IDHD01 (Household identifier). These identifiers have been created for the public files and can also be found in the master files by default. The PERSRUK should be used to link records between files and the FIELDRUK to determine the household when using the master files.
Release date: 2004-10-05
- 3. Survey of Financial Security - Methodology for Estimating the Value of Employer Pension Plan Benefits (Archived)
Surveys and statistical programs – Documentation: 13F0026M2001003
Description:
Initial results from the Survey of Financial Security (SFS), which provides information on the net worth of Canadians, were released on March 15, 2001, in The Daily. The survey collected information on the value of the financial and non-financial assets owned by each family unit and on the amount of their debt.
Statistics Canada is currently refining this initial estimate of net worth by adding to it an estimate of the value of benefits accrued in employer pension plans. This is an important addition to any asset and debt survey as, for many family units, it is likely to be one of the largest assets. With the aging of the population, information on pension accumulations is greatly needed to better understand the financial situation of those nearing retirement. These updated estimates of the Survey of Financial Security will be released in late fall 2001.
The process for estimating the value of employer pension plan benefits is a complex one. This document describes the methodology for estimating that value, for the following groups: a) persons who belonged to an RPP at the time of the survey (referred to as current plan members); b) persons who had previously belonged to an RPP and either left the money in the plan or transferred it to a new plan; c) persons who are receiving RPP benefits.
This methodology was proposed by Hubert Frenken and Michael Cohen. The former has many years of experience with Statistics Canada working with data on employer pension plans; the latter is a principal with the actuarial consulting firm William M. Mercer. Earlier this year, Statistics Canada carried out a public consultation on the proposed methodology. This report includes updates made as a result of feedback received from data users.
Release date: 2001-09-05
- 4. Survey of Financial Security - Estimating the Value of Employer Pension Plan Benefits - A Discussion Paper (Archived)
Surveys and statistical programs – Documentation: 13F0026M2001002
Description:
The Survey of Financial Security (SFS) will provide information on the net worth of Canadians. In order to do this, information was collected - in May and June 1999 - on the value of the assets and debts of each of the families or unattached individuals in the sample. The value of one particular asset is not easy to determine, or to estimate. That is the present value of the amount people have accrued in their employer pension plan. These plans are often called registered pension plans (RPP), as they must be registered with Canada Customs and Revenue Agency. Although some RPP members receive estimates of the value of their accrued benefit, in most cases plan members would not know this amount. However, it is likely to be one of the largest assets for many family units. And, as the baby boomers approach retirement, information on their pension accumulations is much needed to better understand their financial readiness for this transition.
The intent of this paper is to present, for discussion, a methodology for estimating the present value of employer pension plan benefits for the Survey of Financial Security, and to seek feedback on the proposed methodology. This document proposes a methodology for estimating the value of employer pension plan benefits for the following groups: a) persons who belonged to an RPP at the time of the survey (referred to as current plan members); b) persons who had previously belonged to an RPP and either left the money in the plan or transferred it to a new plan; c) persons who are receiving RPP benefits.
Release date: 2001-02-07
- Surveys and statistical programs – Documentation: 11-522-X19990015642
Description:
The Longitudinal Immigration Database (IMDB) links immigration and taxation administrative records into a comprehensive source of data on the labour market behaviour of the landed immigrant population in Canada. It covers the period 1980 to 1995 and will be updated annually starting with the 1996 tax year in 1999. Statistics Canada manages the database on behalf of a federal-provincial consortium led by Citizenship and Immigration Canada. The IMDB was created specifically to respond to the need for detailed and reliable data on the performance and impact of immigration policies and programs. It is the only source of data at Statistics Canada that provides a direct link between immigration policy levers and the economic performance of immigrants. The paper will examine the issues related to the development of a longitudinal database combining administrative records to support policy-relevant research and analysis. Discussion will focus specifically on the methodological, conceptual, analytical and privacy issues involved in the creation and ongoing development of this database. The paper will also touch briefly on research findings, which illustrate the policy outcome links the IMDB allows policy-makers to investigate.
Release date: 2000-03-02
- Surveys and statistical programs – Documentation: 11-522-X19990015650
Description:
The U.S. Manufacturing Plant Ownership Change Database (OCD) was constructed using plant-level data taken from the Census Bureau's Longitudinal Research Database (LRD). It contains data on all manufacturing plants that have experienced ownership change at least once during the period 1963-92. This paper reports the status of the OCD and discusses its research possibilities. For an empirical demonstration, data taken from the database are used to study the effects of ownership changes on plant closure.
Release date: 2000-03-02
- Surveys and statistical programs – Documentation: 11-522-X19990015658
Description:
Radon, a naturally occurring gas found at some level in most homes, is an established risk factor for human lung cancer. The U.S. National Research Council (1999) has recently completed a comprehensive evaluation of the health risks of residential exposure to radon, and developed models for projecting radon lung cancer risks in the general population. This analysis suggests that radon may play a role in the etiology of 10-15% of all lung cancer cases in the United States, although these estimates are subject to considerable uncertainty. In this article, we present a partial analysis of uncertainty and variability in estimates of lung cancer risk due to residential exposure to radon in the United States using a general framework for the analysis of uncertainty and variability that we have developed previously. Specifically, we focus on estimates of the age-specific excess relative risk (ERR) and lifetime relative risk (LRR), both of which vary substantially among individuals.
Release date: 2000-03-02
- Geographic files and documentation: 92F0138M1993001
Geography: Canada
Description:
The Geography Divisions of Statistics Canada and the U.S. Bureau of the Census have commenced a cooperative research program in order to foster an improved and expanded perspective on geographic areas and their relevance. One of the major objectives is to determine a common geographic area to form a geostatistical basis for cross-border research, analysis and mapping.
This report, which represents the first stage of the research, provides a list of comparable pairs of Canadian and U.S. standard geographic areas based on current definitions. Statistics Canada and the U.S. Bureau of the Census have two basic types of standard geographic entities: legislative/administrative areas (called "legal" entities in the U.S.) and statistical areas.
The preliminary pairings of geographic areas are based on face-value definitions only. The definitions are based on the June 4, 1991 Census of Population and Housing for Canada and the April 1, 1990 Census of Population and Housing for the U.S.A. The important aspect is the overall conceptual comparability, not the precise numerical thresholds used for delineating the areas.
Data users should use this report as a general guide to compare the census geographic areas of Canada and the United States, and should be aware that differences in settlement patterns and population levels preclude a precise one-to-one relationship between conceptually similar areas. The geographic areas compared in this report provide a framework for further empirical research and analysis.
Release date: 1999-03-05