Results

All (17) (0 to 10 of 17 results)

  • Articles and reports: 11-633-X2021008
    Description:

    The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in understanding the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (interprovincial) mobility over a time span of more than 35 years. The IMDB includes Immigration, Refugees and Citizenship Canada (IRCC) administrative records, which contain exhaustive information about immigrants admitted to Canada since 1952. It also includes data about non-permanent residents who have been issued temporary resident permits since 1980. This report discusses the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.

    Release date: 2021-12-06

  • Articles and reports: 11-633-X2021007
    Description:

    Statistics Canada continues to use a variety of data sources to provide neighbourhood-level variables across an expanding set of domains, such as sociodemographic characteristics, income, services and amenities, crime, and the environment. Yet, despite these advances, information on the social aspects of neighbourhoods is still unavailable. In this paper, responses to the Canadian Community Health Survey question on respondents’ sense of belonging to their local community were pooled over the four survey years from 2016 to 2019, and individual responses were aggregated up to the census tract (CT) level. (A code sketch of this pooling and aggregation follows this results list.)

    Release date: 2021-11-16

  • Articles and reports: 75F0002M2021007
    Description:

    This discussion paper describes the proposed methodology for a Northern Market Basket Measure (MBM-N) for Yukon and the Northwest Territories and identifies research that could be conducted in preparation for the 2023 review. The paper presents initial MBM-N thresholds and provides preliminary poverty estimates for reference years 2018 and 2019. (A sketch of threshold-based poverty estimation follows this results list.) A review period will follow the release of this paper, during which Statistics Canada and Employment and Social Development Canada will welcome feedback from interested parties and work with experts, stakeholders, Indigenous organizations, and federal, provincial and territorial officials to validate the results.

    Release date: 2021-11-12

  • Articles and reports: 13-604-M2021001
    Description:

    This documentation outlines the methodology used to develop the Distributions of household economic accounts published in September 2021 for the reference years 2010 to 2020. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.

    Release date: 2021-09-07

  • Surveys and statistical programs – Documentation: 12-004-X
    Description:

    Statistics: Power from Data! is a web resource created in 2001 to assist secondary students and teachers of Mathematics and Information Studies in getting the most from statistics. Over the past 20 years, this product has become one of Statistics Canada's most popular references for students, teachers and many other members of the general public. The product was last updated in 2021.

    Release date: 2021-09-02

  • Articles and reports: 12-001-X202100100001
    Description:

    In a previous paper, we developed a model to make inferences about small-area proportions under selection bias, in which the binary responses and the selection probabilities are correlated; this is the homogeneous nonignorable selection model. It was shown to perform better than a baseline ignorable selection model. However, one limitation of the homogeneous model is that the distributions of the selection probabilities are assumed to be identical across areas. We therefore introduce a more general model, the heterogeneous nonignorable selection model, in which the selection probabilities are not identically distributed over areas. We use Markov chain Monte Carlo methods to fit the three models, and we illustrate and compare them using an example on severe activity limitation from the U.S. National Health Interview Survey. A simulation study demonstrates that the heterogeneous nonignorable selection model is needed when there is moderate to strong selection bias. (A small simulation of such selection bias follows this results list.)

    Release date: 2021-06-24

  • Articles and reports: 12-001-X202100100002
    Description:

    We consider the problem of deciding on a sampling strategy, in particular a sampling design. We propose a risk measure whose minimizing value guides the choice. The method makes use of a superpopulation model and accounts for uncertainty about its parameters through a prior distribution. The method is illustrated with a real dataset, yielding satisfactory results. As a baseline, we use the strategy that couples probability-proportional-to-size sampling with the difference estimator, as it is known to be optimal when the superpopulation model is fully known. We show that, even under moderate misspecification of the model, this strategy is not robust and can be outperformed by alternatives. (A sketch of this baseline strategy follows this results list.)

    Release date: 2021-06-24

  • Articles and reports: 12-001-X202100100003
    Description:

    One effective way to conduct statistical disclosure control is to use scrambled responses, which can be generated by a controlled random device. In this paper, we propose using the sample empirical likelihood approach to conduct statistical inference under a complex survey design with scrambled responses. Specifically, we propose a Wilks-type confidence interval for statistical inference. The proposed method can be used as a general tool for inference with confidential public-use survey data files. Asymptotic properties are derived, and a limited simulation study verifies the validity of the theory. We further apply the proposed method to some real applications. (A sketch of response scrambling follows this results list.)

    Release date: 2021-06-24

  • Articles and reports: 12-001-X202100100004
    Description:

    Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case where the study variable is observed only in the big data, while the auxiliary variables are observed in both data sources. Unlike the usual imputation for missing-data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for integrating survey data and big non-survey data, present the mass imputation methods and their statistical properties, and cover the matching estimator of Rivers (2007) as a special case. Variance estimation with mass-imputed data is also discussed. Simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency. (A sketch of mass imputation follows this results list.)

    Release date: 2021-06-24

  • Articles and reports: 12-001-X202100100005
    Description:

    Bayesian pooling strategies are used to address precision problems in statistical analyses of data from small areas, where the subpopulation samples are usually small even though the population might not be. Similar data can be pooled to reduce the number of parameters in the model. Many surveys consist of categorical data on each area, collected into a contingency table. We consider hierarchical Bayesian pooling models with a Dirichlet process prior for analyzing categorical data from small areas. However, the prior used to pool such data frequently results in an overshrinkage problem; to mitigate this problem, the parameters are separated into global and local effects. We compare the pooling models using bone mineral density (BMD) data from the Third National Health and Nutrition Examination Survey for the period 1988 to 1994 in the United States. Our analyses of the BMD data are performed using a Gibbs sampler and slice sampling to carry out the posterior computations. (A simplified pooling sketch follows this results list.)

    Release date: 2021-06-24
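
The sketches below illustrate methods named in the entries above; each is a minimal, hedged example, not the authors' actual implementation. First, for 11-633-X2021007: pooling several survey years and aggregating individual responses up to the census tract (CT) level. All column names, response codes and the weighting scheme are assumptions for illustration, not the actual CCHS variables or Statistics Canada processing.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # Hypothetical pooled microdata for survey years 2016 to 2019.
    df = pd.DataFrame({
        "year": rng.choice([2016, 2017, 2018, 2019], size=1000),
        "ct_id": rng.choice(["CT001", "CT002", "CT003"], size=1000),
        "belonging": rng.choice([1, 2, 3, 4], size=1000),  # 1 = very strong .. 4 = very weak (assumed coding)
        "weight": rng.uniform(50, 500, size=1000),         # assumed survey weight
    })

    # Pool all four years, then compute a weighted share reporting a strong or
    # somewhat strong sense of belonging (assumed codes 1-2) within each CT.
    df["strong"] = (df["belonging"] <= 2).astype(float)
    df["w_strong"] = df["strong"] * df["weight"]
    ct = df.groupby("ct_id")[["w_strong", "weight"]].sum()
    ct["share_strong_belonging"] = ct["w_strong"] / ct["weight"]
    print(ct["share_strong_belonging"])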
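
Next, for 75F0002M2021007: how threshold-based poverty estimation works in a market-basket framework. A household counts as in poverty when its disposable income falls below the threshold for its region; the poverty rate is the weighted share of such households. The thresholds, incomes and weights below are invented, not the proposed MBM-N values, and real thresholds also vary by community and family size.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)

    # Hypothetical thresholds by region (values are made up).
    thresholds = {"Yukon": 48_000.0, "Northwest Territories": 52_000.0}

    hh = pd.DataFrame({
        "region": rng.choice(list(thresholds), size=1_000),
        "disposable_income": rng.lognormal(mean=11.0, sigma=0.5, size=1_000),
        "weight": rng.uniform(1.0, 50.0, size=1_000),
    })

    # Flag households below their region's threshold, then take the
    # weighted share within each region.
    hh["poor"] = (hh["disposable_income"] < hh["region"].map(thresholds)).astype(float)
    hh["w_poor"] = hh["poor"] * hh["weight"]
    g = hh.groupby("region")[["w_poor", "weight"]].sum()
    print(g["w_poor"] / g["weight"])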
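
For 12-001-X202100100001, a small simulation of the nonignorable selection problem the paper addresses, not its model: when a binary response and the selection probabilities share a latent trait, the naive sample proportion is biased, while weighting by the (here known) selection probabilities removes the bias.

    import numpy as np

    rng = np.random.default_rng(2)
    n_pop = 100_000
    u = rng.normal(size=n_pop)                        # shared latent trait
    y = (u + rng.normal(size=n_pop) > 0).astype(int)  # binary response, P(y=1) = 0.5
    pi = 1.0 / (1.0 + np.exp(-(u - 2.0)))             # selection probability rises with u
    selected = rng.random(n_pop) < pi                 # nonignorable selection

    naive = y[selected].mean()                        # biased upward
    hajek = np.sum(y[selected] / pi[selected]) / np.sum(1.0 / pi[selected])
    print(f"population: {y.mean():.3f}  naive: {naive:.3f}  pi-weighted: {hajek:.3f}")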
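
For 12-001-X202100100002, a sketch of the baseline strategy the paper uses: Poisson probability-proportional-to-size (PPS) sampling combined with a difference estimator of the population total. The working model and population are invented for the demo; the paper's risk measure is not implemented.

    import numpy as np

    rng = np.random.default_rng(3)
    N, n_expected = 10_000, 500
    x = rng.lognormal(mean=0.0, sigma=0.5, size=N)      # size measure
    y = 3.0 * x + rng.normal(scale=0.5 * np.sqrt(x))    # invented superpopulation model

    pi = np.minimum(1.0, n_expected * x / x.sum())      # PPS inclusion probabilities
    s = rng.random(N) < pi                              # Poisson PPS sample

    # Difference estimator: model predictions for every unit, plus a
    # pi-weighted correction from the sampled residuals (design-unbiased
    # for any fixed prediction, and accurate when the model fits well).
    pred = 3.0 * x
    t_hat = pred.sum() + np.sum((y[s] - pred[s]) / pi[s])
    print(f"true total {y.sum():,.0f}  difference estimate {t_hat:,.0f}")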
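
For 12-001-X202100100003, a sketch of the scrambled-response idea: the respondent releases only a value perturbed by a controlled random device with a known noise distribution, and the analyst corrects for the known noise mean. Additive scrambling is one common scheme, assumed here; the paper's empirical-likelihood intervals are not reproduced.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 2_000
    y = rng.gamma(shape=2.0, scale=10.0, size=n)   # sensitive variable (mean 20)
    s = rng.normal(loc=5.0, scale=2.0, size=n)     # device noise with KNOWN mean 5
    z = y + s                                      # only the scrambled z is released

    # E[z] = E[y] + 5, so subtracting the known noise mean unscrambles the mean.
    print(f"true mean {y.mean():.2f}  estimate from scrambled data {z.mean() - 5.0:.2f}")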
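
For 12-001-X202100100004, a sketch of mass imputation for data integration: the study variable y is observed only in a big non-probability dataset, the auxiliary x is observed in both sources, and y is imputed for every unit of the probability sample before applying the design weights. The linear imputation model and the simple random probability sample are assumptions for the demo.

    import numpy as np

    rng = np.random.default_rng(5)
    N = 50_000
    x = rng.normal(size=N)                        # auxiliary, observed everywhere
    y = 2.0 + 1.5 * x + rng.normal(size=N)        # study variable

    # Big non-probability data: inclusion depends on x, so it is biased for y.
    big = rng.random(N) < 0.3 / (1.0 + np.exp(-x))
    # Probability sample: simple random sample with known design weights.
    prob = rng.choice(N, size=500, replace=False)
    w = np.full(500, N / 500)

    # Fit the imputation model on the big data, where x and y are both
    # observed, then mass-impute y for ALL probability-sample units.
    beta = np.polyfit(x[big], y[big], deg=1)
    y_imp = np.polyval(beta, x[prob])
    est = np.sum(w * y_imp) / np.sum(w)
    print(f"true mean {y.mean():.3f}  mass-imputation estimate {est:.3f}")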
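
Finally, for 12-001-X202100100005, a deliberately simplified stand-in for the pooling idea: instead of a hierarchical Dirichlet-process model fit by Gibbs and slice sampling, a beta-binomial shrinkage of small-area proportions toward a common rate. It shows the basic pooling effect, including how the smallest areas shrink the most, which is where overshrinkage bites.

    import numpy as np

    rng = np.random.default_rng(6)
    n_k = np.array([8, 15, 40, 200])            # area sample sizes
    p_k = np.array([0.20, 0.50, 0.30, 0.35])    # true area proportions
    y_k = rng.binomial(n_k, p_k)                # observed counts per area

    # Shared Beta(a, b) prior; a + b sets the pooling strength. Posterior
    # means shrink each direct estimate toward the common rate a / (a + b).
    a, b = 3.0, 6.0
    direct = y_k / n_k
    pooled = (y_k + a) / (n_k + a + b)
    for k, (n, d, p) in enumerate(zip(n_k, direct, pooled)):
        print(f"area {k}: n={n:>3}  direct={d:.2f}  pooled={p:.2f}")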

Data (0) (0 results)

No content available at this time.

Analysis (16) (0 to 10 of 16 results)


  • Articles and reports: 12-001-X202100100006
    Description:

    It is now possible to manage surveys using statistical models and other tools that can be applied in real time. This paper focuses on three developments that reflect the attempt to take a more scientific approach to the management of survey field work: 1) the use of responsive and adaptive designs to reduce nonresponse bias, other sources of error, or costs; 2) optimal routing of interviewer travel to reduce costs; and 3) rapid feedback to interviewers to reduce measurement error. The article begins by reviewing experiments and simulation studies examining the effectiveness of responsive and adaptive designs. These studies suggest that such designs can produce modest gains in the representativeness of survey samples or modest cost savings, but can also backfire. The next section examines efforts to provide interviewers with a recommended route for their next trip to the field. The aim is to bring interviewers’ field work into closer alignment with research priorities while reducing travel time; however, a study testing this strategy found that interviewers often ignore such instructions. The paper then describes attempts to give rapid feedback to interviewers, based on automated recordings of their interviews. Interviewers often read questions in ways that affect respondents’ answers; correcting these problems quickly yielded marked improvements in data quality. All of these methods are efforts to replace the judgment of interviewers, field supervisors, and survey managers with statistical models and scientific findings. (A toy routing sketch follows this list.)

    Release date: 2021-06-24
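
As a toy illustration of the travel-routing idea in 12-001-X202100100006 above, the sketch below orders an interviewer's pending cases by a greedy nearest-neighbour rule. Real systems also weigh response propensities and research priorities; the coordinates and the greedy rule are assumptions for the demo.

    import numpy as np

    rng = np.random.default_rng(7)
    cases = rng.uniform(0.0, 10.0, size=(8, 2))  # (x, y) locations of pending cases
    pos = np.array([0.0, 0.0])                   # interviewer's current location

    route, remaining = [], list(range(len(cases)))
    while remaining:
        # Greedy rule: always visit the closest not-yet-visited case next.
        nxt = min(remaining, key=lambda i: float(np.linalg.norm(cases[i] - pos)))
        route.append(nxt)
        pos = cases[nxt]
        remaining.remove(nxt)
    print("suggested visiting order:", route)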

Reference (1) (1 result)

  • Surveys and statistical programs – Documentation: 12-004-X
    Description:

    Statistics: Power from Data! is a web resource created in 2001 to assist secondary students and teachers of Mathematics and Information Studies in getting the most from statistics. Over the past 20 years, this product has become one of Statistics Canada's most popular references for students, teachers and many other members of the general public. The product was last updated in 2021.

    Release date: 2021-09-02