Analysis

Results

All (1,558) (0 to 10 of 1,558 results)

  • Articles and reports: 11-633-X2020001
    Description:

    This paper reviews alternative measures of income mixing within geographic units and applies them using geographically detailed income data derived from tax records. It highlights the characteristics of these measures, particularly their ease of interpretation and their suitability to decomposition across different levels of analysis, from neighbourhoods to individual apartment buildings. The discussion focuses on three measures: the dissimilarity index, the information theory index and the divergence index (D-index). Particular emphasis is placed on the D-index because it most effectively describes how income distributions at the sub-metropolitan level (e.g., neighbourhoods) differ from distributions at the metropolitan level (i.e., how much income sorting occurs across neighbourhoods). Furthermore, the D-index can consistently measure the contributions of income sorting within neighbourhoods (e.g., across individual apartment buildings) to the degree of income mixing at the neighbourhood and metropolitan scales.

    Release date: 2020-01-21

  • Journals and periodicals: 11-633-X
    Description:

    Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.

    Release date: 2020-01-21

  • Stats in brief: 11-631-X2020001
    Description:

    This booklet provides a snapshot of data offered by Statistics Canada.

    Release date: 2020-01-16

  • Journals and periodicals: 75F0002M
    Description:

    This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.

    Release date: 2020-01-15

  • Articles and reports: 82-003-X201901200003
    Description:

    This article provides a description of the Canadian Census Health and Environment Cohorts (CanCHECs), a set of population-based linked datasets of the household population at the time of census collection. The CanCHEC datasets are rich national data resources that can be used to measure and examine health inequalities across socioeconomic and ethnocultural dimensions for different periods and locations. These datasets can also be used to examine the effects of exposure to environmental factors on human health.

    Release date: 2019-12-18

  • Stats in brief: 11-629-X2019006
    Description:

    This video describes a new health surveillance program at Statistics Canada: The Canadian Census Health and Environment Cohorts (CanCHECs). The video describes the attributes of and the datasets included in the CanCHECs, how the CanCHECs can be used, and their strengths and limitations. Recent examples of research projects based on the CanCHECs are presented along with information about how to apply for access to these data.

    Release date: 2019-12-18

  • Articles and reports: 12-001-X201900300001
    Description:

    Standard linearization estimators of the variance of the general regression estimator are often too small, leading to confidence intervals that do not cover at the desired rate. In two-stage sampling, hat matrix adjustments can help remedy this problem. We present theory for several new variance estimators and compare them to standard estimators in a series of simulations. The proposed estimators correct negative biases and improve confidence interval coverage rates in a variety of situations that mirror those met in practice.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300002
    Description:

    Paradata is often collected during the survey process to monitor the quality of the survey response. One type of paradata is respondent behavior, which can be used to construct response models. A propensity score weight that uses the respondent behavior information can be applied to the final analysis to reduce nonresponse bias. However, including the surrogate variable in the propensity score weighting does not always guarantee an efficiency gain. We show that the surrogate variable is useful only when it is correlated with the study variable. Results from a limited simulation study confirm the finding. A real data application using the Korean Workplace Panel Survey data is also presented.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300003
    Description:

    The widely used formulas for the variance of the ratio estimator may lead to serious underestimates when the sample size is small; see Sukhatme (1954), Koop (1968), Rao (1969), and Cochran (1977, pages 163-164). In order to solve this classical problem, we propose in this paper new estimators for the variance and the mean square error of the ratio estimator that do not suffer from such a large negative bias. Similar estimation formulas can be derived for alternative ratio estimators as discussed in Tin (1965). We compare three mean square error estimators for the ratio estimator in a simulation study.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17
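
A note on the income-mixing entry (11-633-X2020001) in the list above: its description names the dissimilarity index and the divergence index (D-index) without giving formulas. The following is a minimal Python sketch, assuming the commonly used textbook definitions: a two-group dissimilarity index, and a population-weighted Kullback-Leibler divergence of each unit's income-group shares from the overall (metropolitan) shares. The function names and input layout are hypothetical, and the paper's exact formulation may differ.

```python
import math


def dissimilarity_index(low, high):
    """Two-group dissimilarity index across geographic units.

    low[i] and high[i] are the counts of two income groups in unit i
    (hypothetical input layout). Returns a value between 0 and 1.
    """
    total_low = sum(low)
    total_high = sum(high)
    return 0.5 * sum(
        abs(l / total_low - h / total_high) for l, h in zip(low, high)
    )


def divergence_index(unit_counts):
    """Population-weighted mean KL divergence of local from overall shares.

    unit_counts[i][g] is the count of income group g in unit i.
    Assumed definition: sum over units of (unit population share) times
    KL(unit income-group shares || overall income-group shares).
    """
    n_groups = len(unit_counts[0])
    grand_total = sum(sum(unit) for unit in unit_counts)
    overall = [
        sum(unit[g] for unit in unit_counts) / grand_total
        for g in range(n_groups)
    ]
    d = 0.0
    for unit in unit_counts:
        n = sum(unit)
        if n == 0:
            continue  # empty units carry no weight
        # Groups with zero local count contribute nothing to the KL sum.
        kl = sum(
            (c / n) * math.log((c / n) / overall[g])
            for g, c in enumerate(unit)
            if c > 0
        )
        d += (n / grand_total) * kl
    return d
```

Under these assumed definitions, both indexes are zero when every unit has the same income mix as the whole area. For two equal-sized, fully sorted units, the dissimilarity index is 1 and the divergence index is ln 2.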

Stats in brief (35) (0 to 10 of 35 results)

Articles and reports (1,497) (0 to 10 of 1,497 results)

  • Articles and reports: 12-001-X201900300005
    Description:

    Monthly estimates of provincial unemployment based on the Dutch Labour Force Survey (LFS) are obtained using time series models. The models account for rotation group bias and serial correlation due to the rotating panel design of the LFS. This paper compares two approaches to estimating structural time series models (STMs). In the first approach, STMs are expressed as state space models, fitted using a Kalman filter and smoother in a frequentist framework. As an alternative, these STMs are expressed as time series multilevel models in a hierarchical Bayesian framework, and estimated using a Gibbs sampler. Monthly unemployment estimates and standard errors based on these models are compared for the twelve provinces of the Netherlands. Pros and cons of the multilevel approach and state space approach are discussed. Multivariate STMs are appropriate to borrow strength over time and space. Modeling the full correlation matrix between time series components rapidly increases the number of hyperparameters to be estimated. Modeling common factors is one possibility to obtain more parsimonious models that still account for cross-sectional correlation. In this paper an even more parsimonious approach is proposed, where domains share one overall trend, and have their own independent trends for the domain-specific deviations from this overall trend. The time series modeling approach is particularly appropriate to estimate month-to-month change of unemployment.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300006
    Description:

    High nonresponse is a very common problem in sample surveys today. In statistical terms we are worried about increased bias and variance of estimators for population quantities such as totals or means. Different methods have been suggested in order to compensate for this phenomenon. We can roughly divide them into imputation and calibration and it is the latter approach we will focus on here. A wide spectrum of possibilities is included in the class of calibration estimators. We explore linear calibration, where we suggest using a nonresponse version of the design-based optimal regression estimator. Comparisons are made between this estimator and a GREG type estimator. Distance measures play a very important part in the construction of calibration estimators. We show that an estimator of the average response propensity (probability) can be included in the “optimal” distance measure under nonresponse, which will help to reduce the bias of the resulting estimator. To illustrate empirically the theoretically derived results for the suggested estimators, a simulation study has been carried out. The population is called KYBOK and consists of clerical municipalities in Sweden, where the variables include financial as well as size measurements. The results are encouraging for the “optimal” estimator in combination with the estimated average response propensity, where the bias was reduced for most of the Poisson sampling cases in the study.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300007
    Description:

    Finding the optimal stratification and sample size in univariate and multivariate sample design is hard when the population frame is large. There are alternative ways of modelling and solving this problem, and one of the most natural uses genetic algorithms (GA) combined with the Bethel-Chromy evaluation algorithm. The GA iteratively searches for the minimum sample size necessary to meet precision constraints in partitionings of atomic strata created by the Cartesian product of auxiliary variables. We point out a drawback with classical GAs when applied to the grouping problem, and propose a new GA approach using “grouping” genetic operators instead of traditional operators. Experiments show a significant improvement in solution quality for similar computational effort.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300008
    Description:

    Dual frame surveys are useful when no single frame with adequate coverage exists. However, estimators from dual frame designs require knowledge of the frame memberships of each sampled unit. When this information is not available from the frame itself, it is often collected from the respondent. When respondents provide incorrect membership information, the resulting estimators of means or totals can be biased. A method for reducing this bias, using accurate membership information obtained about a subsample of respondents, is proposed. The properties of the new estimator are examined and compared to alternative estimators. The proposed estimator is applied to the data from the motivating example, which was a recreational angler survey, using an address frame and an incomplete fishing license frame.

    Release date: 2019-12-17
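
A note on the time series entry (12-001-X201900300005) in the list above: its description mentions structural time series models expressed in state space form and fitted with a Kalman filter. The following is a minimal Python sketch of the Kalman filter for the simplest such model, a univariate local level model (observation = level + noise, level follows a random walk). It is an illustration only, not the multivariate model with rotation group bias described in the paper; the function name, parameter names and prior values are hypothetical.

```python
def kalman_filter_local_level(ys, obs_var, level_var, prior_mean, prior_var):
    """Kalman filter for a local level model (illustrative sketch).

    Assumed model:
        y_t      = mu_t + eps_t,   eps_t ~ N(0, obs_var)
        mu_{t+1} = mu_t + eta_t,   eta_t ~ N(0, level_var)
    prior_mean and prior_var describe the level before the first observation.
    Returns the filtered means and variances of mu_t given y_1..y_t.
    """
    a, p = prior_mean, prior_var
    means, variances = [], []
    for y in ys:
        f = p + obs_var            # variance of the one-step prediction error
        k = p / f                  # Kalman gain
        a = a + k * (y - a)        # update the level with the new observation
        p = p * obs_var / f        # updated (filtered) variance
        means.append(a)
        variances.append(p)
        p = p + level_var          # predict: the level takes a random-walk step
    return means, variances
```

With `level_var` set to 0 and a very large `prior_var`, the filtered mean approaches the running sample mean, which is a convenient sanity check. A larger `level_var` lets the estimated level follow the data more closely.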

Journals and periodicals (26) (0 to 10 of 26 results)

  • Journals and periodicals: 12-001-X
    Geography: Canada
    Description:

    The journal publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves.

    Release date: 2019-12-17

  • Journals and periodicals: 12-206-X
    Description:

    This report summarizes the achievements of the research and development program sponsored by the three methodology divisions of Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the Agency's survey programs, which would not otherwise have been carried out during the provision of methodology services to those survey programs. The program also includes tasks that provided client support in applying past successful developments, in order to promote the use of the results of research and development work.

    Release date: 2019-11-15

  • Journals and periodicals: 92F0138M
    Description:

    The Geography working paper series is intended to stimulate discussion on a variety of topics covering conceptual, methodological or technical work to support the development and dissemination of the division's data, products and services. Readers of the series are encouraged to contact the Geography Division with comments and suggestions.

    Release date: 2019-11-13

  • Journals and periodicals: 11-633-X2019003
    Description:

    This report provides an overview of the definitions and competency frameworks of data literacy, as well as the assessment tools used to measure it. These are based on the existing literature and current practices around the world. Data literacy, or the ability to derive meaningful information from data, is a relatively new concept. However, it is gaining increasing recognition as a vital skillset in the information age. Existing approaches to measuring data literacy—from self-assessment tools to objective measures, and from individual to organizational assessments—are discussed in this report to inform the development of an assessment tool for data literacy in the Canadian public service.

    Release date: 2019-08-14

  • Journals and periodicals: 89-20-0001
    Description:

    Historical works allow readers to peer into the past, not only to satisfy our curiosity about “the way things were,” but also to see how far we’ve come, and to learn from the past. For Statistics Canada, such works are also opportunities to commemorate the agency’s contributions to Canada and its people, and serve as a reminder that an institution such as this continues to evolve each and every day.

    On the occasion of Statistics Canada’s 100th anniversary in 2018, Standing on the shoulders of giants: History of Statistics Canada: 1970 to 2008, builds on the work of two significant publications on the history of the agency, picking up the story in 1970 and carrying it through the next 38 years, until 2008. To that end, when enough time has passed to allow for sufficient objectivity, it will again be time to document the agency’s next chapter as it continues to tell Canada’s story in numbers.

    Release date: 2018-12-03

  • Journals and periodicals: 12-605-X
    Description:

    The Record Linkage Project Process Model (RLPPM) was developed by Statistics Canada to identify the processes and activities involved in record linkage. The RLPPM applies to linkage projects conducted at the individual and enterprise level using diverse data sources to create new data sources to meet analytical and operational needs.

    Release date: 2017-06-05

  • Journals and periodicals: 91-621-X
    Description:

    This document briefly describes Demosim, the microsimulation population projection model, how it works as well as its methods and data sources. It is a methodological complement to the analytical products produced using Demosim.

    Release date: 2017-01-25

  • Journals and periodicals: 11-634-X
    Description:

    This publication is a catalogue of strategies and mechanisms that a statistical organization should consider adopting, according to its particular context. This compendium is based on lessons learned and best practices of leadership and management of statistical agencies within the scope of Statistics Canada’s International Statistical Fellowship Program (ISFP). It contains four broad sections: characteristics of an effective national statistical system; core management practices; improving, modernizing and finding efficiencies; and strategies to better inform and engage key stakeholders.

    Release date: 2016-07-06