Results

All (26)

  • Articles and reports: 12-001-X202100100004
    Description:

    Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case when the study variable is observed in the big data only, while the auxiliary variables are observed in both data sources. Unlike the usual imputation for missing data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for data integration of survey data and big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.

    Release date: 2021-06-24

  • Articles and reports: 12-001-X202000200005
    Description:

    In surveys, text answers from open-ended questions are important because they allow respondents to provide more information without constraints. When open-ended questions are classified automatically using supervised learning, the accuracy is often not high enough. Alternatively, a semi-automated classification strategy can be considered: answers in the easy-to-classify group are classified automatically, while answers in the hard-to-classify group are classified manually. This paper presents a semi-automated classification method for multi-label open-ended questions where text answers may be associated with multiple classes simultaneously. The proposed method effectively combines multiple probabilistic classifier chains while avoiding prohibitive computational costs. The performance evaluation on three different data sets demonstrates the effectiveness of the proposed method.

    Release date: 2020-12-15

  • Articles and reports: 12-001-X202000100003
    Description:

    Probability sampling designs are sometimes used in conjunction with model-based predictors of finite population quantities. These designs should minimize the anticipated variance (AV), which is the variance over both the superpopulation and sampling processes, of the predictor of interest. The AV-optimal design is well known for model-assisted estimators which attain the Godambe-Joshi lower bound for the AV of design-unbiased estimators. However, no optimal probability designs have been found for model-based prediction, except under conditions such that the model-based and model-assisted estimators coincide; these cases can be limiting. This paper shows that the Godambe-Joshi lower bound is an upper bound for the AV of the best linear unbiased estimator of a population total, where the upper bound is over the space of all covariate sets. Therefore model-assisted optimal designs are a sensible choice for model-based prediction when there is uncertainty about the form of the final model, as there often would be prior to conducting the survey. Simulations confirm the result over a range of scenarios, including when the relationship between the target and auxiliary variables is nonlinear and modeled using splines. The AV is lowest relative to the bound when an important design variable is not associated with the target variable.

    Release date: 2020-06-30

  • Articles and reports: 11-633-X2020001
    Description:

    This paper reviews alternative measures of income mixing within geographic units and applies them using geographically detailed income data derived from tax records. It highlights the characteristics of these measures, particularly their ease of interpretation and their suitability to decomposition across different levels of analysis, from neighbourhoods to individual apartment buildings. The discussion focuses on three measures: the dissimilarity index, the information theory index and the divergence index (D-index). Particular emphasis is placed on the D-index because it most effectively describes how income distributions at the sub-metropolitan level (e.g., neighbourhoods) differ from distributions at the metropolitan level (i.e., how much income sorting occurs across neighbourhoods). Furthermore, the D-index can consistently measure the contributions of income sorting within neighbourhoods (e.g., across individual apartment buildings) to the degree of income mixing at the neighbourhood and metropolitan scales.

    Release date: 2020-01-21

  • Articles and reports: 82-003-X201901200003
    Description:

    This article provides a description of the Canadian Census Health and Environment Cohorts (CanCHECs), a set of population-based linked datasets of the household population at the time of census collection. The CanCHEC datasets are rich national data resources that can be used to measure and examine health inequalities across socioeconomic and ethnocultural dimensions for different periods and locations. These datasets can also be used to examine the effects of exposure to environmental factors on human health.

    Release date: 2019-12-18

  • Articles and reports: 11-633-X2019004
    Description:

    This paper shows how to estimate the effect of the Canada-United States border on non-energy goods trade at a sub-provincial/state level using Statistics Canada’s Surface Transportation File (STF), augmented with United States domestic trade data. It uses a gravity model framework to compare cross-border to domestic trade flows among 201 Canadian and United States regions in 2012. It shows that some 25 years after the Canada-United States Free Trade Agreement (the North American Free Trade Agreement’s predecessor) was ratified, the cost of trading goods across the border still amounts to a 30% tariff on bilateral trade between Canadian and United States regions. The paper also demonstrates how these estimates can be used along with general equilibrium Poisson pseudo maximum likelihood (GEPPML) methods to describe the effect of changing border costs on North American trade patterns and regional welfare.

    Release date: 2019-09-24
Stats in brief (2)

  • Stats in brief: 89-20-00062022004
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. In this video, we will discuss the importance of considering data ethics throughout the process of producing statistical information.

    As a pre-requisite to this video, make sure to watch the video titled “Data Ethics: An introduction” also available in Statistics Canada’s data literacy training catalogue.

    Release date: 2022-10-17

  • Stats in brief: 89-20-00062022005
    Description:

    In this video, you will learn the answers to the following questions: What are the different types of error? What are the types of error that lead to statistical bias? Where during the data journey can statistical bias occur?

    Release date: 2022-10-17
Articles and reports (23)

  • Articles and reports: 12-001-X202300200005
    Description: Population undercoverage is one of the main hurdles faced by statistical analysis with non-probability survey samples. We discuss two typical scenarios of undercoverage, namely, stochastic undercoverage and deterministic undercoverage. We argue that existing estimation methods under the positivity assumption on the propensity scores (i.e., the participation probabilities) can be directly applied to handle the scenario of stochastic undercoverage. We explore strategies for mitigating biases in estimating the mean of the target population under deterministic undercoverage. In particular, we examine a split population approach based on a convex hull formulation, and construct estimators with reduced biases. A doubly robust estimator can be constructed if a followup subsample of the reference probability survey with measurements on the study variable becomes feasible. Performances of six competing estimators are investigated through a simulation study and issues which require further investigation are briefly discussed.
    Release date: 2024-01-03

  • Articles and reports: 11-633-X2023003
    Description: This paper spans the academic work and estimation strategies used in national statistics offices. It addresses the issue of producing fine, grid-level geography estimates for Canada by exploring the measurement of subprovincial and subterritorial gross domestic product using Yukon as a test case.
    Release date: 2023-12-15

  • Articles and reports: 12-001-X202300100001
    Description: Recent work in survey domain estimation allows for estimation of population domain means under a priori assumptions expressed in terms of linear inequality constraints. For example, it might be known that the population means are non-decreasing along ordered domains. Imposing the constraints has been shown to provide estimators with smaller variance and tighter confidence intervals. In this paper we consider a formal test of the null hypothesis that all the constraints are binding, versus the alternative that at least one constraint is non-binding. The test of constant versus increasing domain means is a special case. The power of the test is substantially better than the test with the same null hypothesis and an unconstrained alternative. The new test is used with data from the National Survey of College Graduates, to show that salaries are positively related to the subject’s father’s educational level, across fields of study and over several years of cohorts.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100002
    Description: We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the information projection and model calibration weighting. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from the Korean National Health and Nutrition Examination Survey and big data from the National Health Insurance Sharing Service in Korea.
    Release date: 2023-06-30

  • Articles and reports: 11-637-X202200100007
    Description:

    As the seventh goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to ensure access to affordable, reliable, sustainable and modern energy for all by 2030. This 2022 infographic provides an overview of indicators underlying the seventh Sustainable Development Goal in support of affordable and clean energy, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-12-13

  • Articles and reports: 11-637-X202200100008
    Description:

    As the eighth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all by 2030. This 2022 infographic provides an overview of indicators underlying the eighth Sustainable Development Goal in support of decent work and economic growth, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-12-13

  • Articles and reports: 11-637-X202200100009
    Description:

    As the ninth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation by 2030. This 2022 infographic provides an overview of indicators underlying the ninth Sustainable Development Goal in support of industry, innovation and infrastructure, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-12-13

  • Articles and reports: 11-637-X202200100010
    Description:

    As the tenth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to reduce inequalities within and among countries by 2030. This 2022 infographic provides an overview of indicators underlying the tenth Sustainable Development Goal in support of reduced inequalities, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-12-13

  • Articles and reports: 11-637-X202200100011
    Description:

    As the eleventh goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to make cities and human settlements inclusive, safe, resilient and sustainable by 2030. This 2022 infographic provides an overview of indicators underlying the eleventh Sustainable Development Goal in support of sustainable cities and communities, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-12-13

  • Articles and reports: 11-637-X202200100004
    Description:

    As the fourth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to ensure inclusive and equitable quality education and promote lifelong learning opportunities for all by 2030. This 2022 infographic provides an overview of indicators underlying the fourth Sustainable Development Goal in support of Quality Education, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-09-28
Journals and periodicals (1)

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2024-09-11