Results

All (80) (0 to 10 of 80 results)

  • Articles and reports: 11-633-X2021001
    Description:

    Using data from the Canadian Housing Survey, this project aimed to construct a measure of social inclusion, using indicators identified by the Canada Mortgage and Housing Corporation (CMHC), to report a social inclusion score for each geographic stratum separately for dwellings that are and are not in social and affordable housing. This project also sought to examine associations between social inclusion and a set of economic, social and health variables.

    Release date: 2021-01-05

  • Surveys and statistical programs – Documentation: 98-301-X
    Description:

    The Census Dictionary is a reference document which contains detailed definitions of Census of Population concepts, variables and geographic terms, as well as historical information.

    By referring to the Census Dictionary, both beginner and intermediate data users will gain a better understanding of the data and how to compare variables between census years.

    The Census Dictionary will be released iteratively, starting with geography definitions, with additional definitions made available based on subsequent topic releases.

    Release date: 2017-11-29

  • Surveys and statistical programs – Documentation: 91F0015M2016012
    Description:

    This article provides information on using family-related variables from the microdata files of Canada’s Census of Population. These files exist internally at Statistics Canada, in the Research Data Centres (RDCs), and as public-use microdata files (PUMFs). This article explains certain technical aspects of all three versions, including the creation of multi-level variables for analytical purposes.

    Release date: 2016-12-22

  • Articles and reports: 11F0019M2016376
    Description:

    The degree to which workers move across geographic areas in response to emerging employment opportunities or negative labour demand shocks is a key element in the adjustment process of an economy, and its ability to reach a desired allocation of resources.

    This study estimates the causal impact of real after-tax annual wages and salaries on the propensity of young men to migrate to Alberta or to accept jobs in that province while maintaining residence in their home province. To do so, it exploits the cross-provincial variation in earnings growth plausibly induced by increases in world oil prices that occurred during the 2000s.

    Release date: 2016-04-11

  • Articles and reports: 12-001-X201500214236
    Description:

    We propose a model-assisted extension of weighting design-effect measures. We develop a summary-level statistic for different variables of interest, in single-stage sampling and under calibration weight adjustments. Our proposed design effect measure captures the joint effects of a non-epsem sampling design, unequal weights produced using calibration adjustments, and the strength of the association between an analysis variable and the auxiliaries used in calibration. We compare our proposed measure to existing design effect measures in simulations using variables like those collected in establishment surveys and telephone surveys of households.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114199
    Description:

    In business surveys, it is not unusual to collect economic variables for which the distribution is highly skewed. In this context, winsorization is often used to treat the problem of influential values. This technique requires the determination of a constant that corresponds to the threshold above which large values are reduced. In this paper, we consider a method of determining the constant which involves minimizing the largest estimated conditional bias in the sample. In the context of domain estimation, we also propose a method of ensuring consistency between the domain-level winsorized estimates and the population-level winsorized estimate. The results of two simulation studies suggest that the proposed methods lead to winsorized estimators that have good bias and relative efficiency properties.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201400114002
    Description:

    We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of multinomial distributions as imputation engines, accounting for structural zeros by conceiving of the observed data as a truncated sample from a hypothetical population without structural zeros. This approach has several appealing features: imputations are generated from coherent, Bayesian joint models that automatically capture complex dependencies and readily scale to large numbers of variables. We outline a Gibbs sampling algorithm for implementing the approach, and we illustrate its potential with a repeated sampling study using public use census microdata from the state of New York, U.S.A.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201300211871
    Description:

    Regression models are routinely used in the analysis of survey data, where one common issue of interest is to identify influential factors that are associated with certain behavioral, social, or economic indices within a target population. When data are collected through complex surveys, the properties of classical variable selection approaches developed in i.i.d. non-survey settings need to be re-examined. In this paper, we derive a pseudo-likelihood-based BIC criterion for variable selection in the analysis of survey data and suggest a sample-based penalized likelihood approach for its implementation. The sampling weights are appropriately assigned to correct the biased selection result caused by the distortion between the sample and the target population. Under a joint randomization framework, we establish the consistency of the proposed selection procedure. The finite-sample performance of the approach is assessed through analysis and computer simulations based on data from the hypertension component of the 2009 Survey on Living with Chronic Diseases in Canada.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211884
    Description:

    This paper offers a solution to the problem of finding the optimal stratification of the available population frame, so as to ensure the minimization of the cost of the sample required to satisfy precision constraints on a set of different target estimates. The solution is sought by exploring the universe of all possible stratifications obtainable by cross-classifying the categorical auxiliary variables available in the frame (continuous auxiliary variables can be transformed into categorical ones by means of suitable methods). Therefore, the followed approach is multivariate with respect to both target and auxiliary variables. The proposed algorithm is based on a non-deterministic evolutionary approach, making use of the genetic algorithm paradigm. The key feature of the algorithm is in considering each possible stratification as an individual subject to evolution, whose fitness is given by the cost of the associated sample required to satisfy a set of precision constraints, the cost being calculated by applying the Bethel algorithm for multivariate allocation. This optimal stratification algorithm, implemented in an R package (SamplingStrata), has been so far applied to a number of current surveys in the Italian National Institute of Statistics: the obtained results always show significant improvements in the efficiency of the samples obtained, with respect to previously adopted stratifications.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211888
    Description:

    When the study variables are functional and storage capacities are limited or transmission costs are high, using survey techniques to select a portion of the observations of the population is an interesting alternative to using signal compression techniques. In this context of functional data, our focus in this study is on estimating the mean electricity consumption curve over a one-week period. We compare different estimation strategies that take account of a piece of auxiliary information such as the mean consumption for the previous period. The first strategy consists in using a simple random sampling design without replacement, then incorporating the auxiliary information into the estimator by introducing a functional linear model. The second approach consists in incorporating the auxiliary information into the sampling designs by considering unequal probability designs, such as stratified and pi designs. We then address the issue of constructing confidence bands for these estimators of the mean. When effective estimators of the covariance function are available and the mean estimator satisfies a functional central limit theorem, it is possible to use a fast technique for constructing confidence bands, based on the simulation of Gaussian processes. This approach is compared with bootstrap techniques that have been adapted to take account of the functional nature of the data.

    Release date: 2014-01-15

Analysis (50) (0 to 10 of 50 results)

  • Articles and reports: 11-633-X2021001
    Description:

    Using data from the Canadian Housing Survey, this project aimed to construct a measure of social inclusion, using indicators identified by the Canada Mortgage and Housing Corporation (CMHC), to report a social inclusion score for each geographic stratum separately for dwellings that are and are not in social and affordable housing. This project also sought to examine associations between social inclusion and a set of economic, social and health variables.

    Release date: 2021-01-05

  • Articles and reports: 11F0019M2016376
    Description:

    The degree to which workers move across geographic areas in response to emerging employment opportunities or negative labour demand shocks is a key element in the adjustment process of an economy, and its ability to reach a desired allocation of resources.

    This study estimates the causal impact of real after-tax annual wages and salaries on the propensity of young men to migrate to Alberta or to accept jobs in that province while maintaining residence in their home province. To do so, it exploits the cross-provincial variation in earnings growth plausibly induced by increases in world oil prices that occurred during the 2000s.

    Release date: 2016-04-11

  • Articles and reports: 12-001-X201500214236
    Description:

    We propose a model-assisted extension of weighting design-effect measures. We develop a summary-level statistic for different variables of interest, in single-stage sampling and under calibration weight adjustments. Our proposed design effect measure captures the joint effects of a non-epsem sampling design, unequal weights produced using calibration adjustments, and the strength of the association between an analysis variable and the auxiliaries used in calibration. We compare our proposed measure to existing design effect measures in simulations using variables like those collected in establishment surveys and telephone surveys of households.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114199
    Description:

    In business surveys, it is not unusual to collect economic variables for which the distribution is highly skewed. In this context, winsorization is often used to treat the problem of influential values. This technique requires the determination of a constant that corresponds to the threshold above which large values are reduced. In this paper, we consider a method of determining the constant which involves minimizing the largest estimated conditional bias in the sample. In the context of domain estimation, we also propose a method of ensuring consistency between the domain-level winsorized estimates and the population-level winsorized estimate. The results of two simulation studies suggest that the proposed methods lead to winsorized estimators that have good bias and relative efficiency properties.

    Release date: 2015-06-29

  • Articles and reports: 12-001-X201400114002
    Description:

    We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of multinomial distributions as imputation engines, accounting for structural zeros by conceiving of the observed data as a truncated sample from a hypothetical population without structural zeros. This approach has several appealing features: imputations are generated from coherent, Bayesian joint models that automatically capture complex dependencies and readily scale to large numbers of variables. We outline a Gibbs sampling algorithm for implementing the approach, and we illustrate its potential with a repeated sampling study using public use census microdata from the state of New York, U.S.A.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201300211871
    Description:

    Regression models are routinely used in the analysis of survey data, where one common issue of interest is to identify influential factors that are associated with certain behavioral, social, or economic indices within a target population. When data are collected through complex surveys, the properties of classical variable selection approaches developed in i.i.d. non-survey settings need to be re-examined. In this paper, we derive a pseudo-likelihood-based BIC criterion for variable selection in the analysis of survey data and suggest a sample-based penalized likelihood approach for its implementation. The sampling weights are appropriately assigned to correct the biased selection result caused by the distortion between the sample and the target population. Under a joint randomization framework, we establish the consistency of the proposed selection procedure. The finite-sample performance of the approach is assessed through analysis and computer simulations based on data from the hypertension component of the 2009 Survey on Living with Chronic Diseases in Canada.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211884
    Description:

    This paper offers a solution to the problem of finding the optimal stratification of the available population frame, so as to ensure the minimization of the cost of the sample required to satisfy precision constraints on a set of different target estimates. The solution is sought by exploring the universe of all possible stratifications obtainable by cross-classifying the categorical auxiliary variables available in the frame (continuous auxiliary variables can be transformed into categorical ones by means of suitable methods). Therefore, the followed approach is multivariate with respect to both target and auxiliary variables. The proposed algorithm is based on a non-deterministic evolutionary approach, making use of the genetic algorithm paradigm. The key feature of the algorithm is in considering each possible stratification as an individual subject to evolution, whose fitness is given by the cost of the associated sample required to satisfy a set of precision constraints, the cost being calculated by applying the Bethel algorithm for multivariate allocation. This optimal stratification algorithm, implemented in an R package (SamplingStrata), has been so far applied to a number of current surveys in the Italian National Institute of Statistics: the obtained results always show significant improvements in the efficiency of the samples obtained, with respect to previously adopted stratifications.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211888
    Description:

    When the study variables are functional and storage capacities are limited or transmission costs are high, using survey techniques to select a portion of the observations of the population is an interesting alternative to using signal compression techniques. In this context of functional data, our focus in this study is on estimating the mean electricity consumption curve over a one-week period. We compare different estimation strategies that take account of a piece of auxiliary information such as the mean consumption for the previous period. The first strategy consists in using a simple random sampling design without replacement, then incorporating the auxiliary information into the estimator by introducing a functional linear model. The second approach consists in incorporating the auxiliary information into the sampling designs by considering unequal probability designs, such as stratified and pi designs. We then address the issue of constructing confidence bands for these estimators of the mean. When effective estimators of the covariance function are available and the mean estimator satisfies a functional central limit theorem, it is possible to use a fast technique for constructing confidence bands, based on the simulation of Gaussian processes. This approach is compared with bootstrap techniques that have been adapted to take account of the functional nature of the data.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300111823
    Description:

    Although weights are widely used in survey sampling, their ultimate justification from the design perspective is often problematical. Here we will argue for a stepwise Bayes justification for weights that does not depend explicitly on the sampling design. This approach will make use of the standard kind of information present in auxiliary variables; however, it will not assume a model relating the auxiliary variables to the characteristic of interest. The resulting weight for a unit in the sample can be given the usual interpretation as the number of units in the population which it represents.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111828
    Description:

    A question that commonly arises in longitudinal surveys is the issue of how to combine differing cohorts of the survey. In this paper we present a novel method for combining different cohorts, and using all available data, in a longitudinal survey to estimate parameters of a semiparametric model, which relates the response variable to a set of covariates. The procedure builds upon the Weighted Generalized Estimation Equation method for handling missing waves in longitudinal studies. Our method is set up under a joint-randomization framework for estimation of model parameters, which takes into account the superpopulation model as well as the survey design randomization. We also propose a design-based, and a joint-randomization, variance estimation method. To illustrate the methodology we apply it to the Survey of Doctorate Recipients, conducted by the U.S. National Science Foundation.

    Release date: 2013-06-28

Reference (30) (0 to 10 of 30 results)

  • Surveys and statistical programs – Documentation: 98-301-X
    Description:

    The Census Dictionary is a reference document which contains detailed definitions of Census of Population concepts, variables and geographic terms, as well as historical information.

    By referring to the Census Dictionary, both beginner and intermediate data users will gain a better understanding of the data and how to compare variables between census years.

    The Census Dictionary will be released iteratively, starting with geography definitions, with additional definitions made available based on subsequent topic releases.

    Release date: 2017-11-29

  • Surveys and statistical programs – Documentation: 91F0015M2016012
    Description:

    This article provides information on using family-related variables from the microdata files of Canada’s Census of Population. These files exist internally at Statistics Canada, in the Research Data Centres (RDCs), and as public-use microdata files (PUMFs). This article explains certain technical aspects of all three versions, including the creation of multi-level variables for analytical purposes.

    Release date: 2016-12-22

  • Surveys and statistical programs – Documentation: 99-012-X2011006
    Geography: Canada
    Description:

    This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.

    Release date: 2013-06-26

  • Surveys and statistical programs – Documentation: 99-012-X2011007
    Description:

    This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.

    Release date: 2013-06-26

  • Surveys and statistical programs – Documentation: 99-012-X2011008
    Description:

    This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.

    Release date: 2013-06-26

  • Surveys and statistical programs – Documentation: 71-221-X
    Description:

    This electronic product provides information on all Workplace and Employee Survey (WES) variables, descriptions and response categories, as well as range of values. Starting with content themes, information is accessible through a hierarchical fashion, quickly guiding data users to variables of interest.

    Release date: 2007-05-17

  • Surveys and statistical programs – Documentation: 12-002-X20040027035
    Description:

    As part of the processing of the National Longitudinal Survey of Children and Youth (NLSCY) cycle 4 data, historical revisions have been made to the data of the first 3 cycles, either to correct errors or to update the data. During processing, particular attention was given to the PERSRUK (Person Identifier) and the FIELDRUK (Household Identifier). The same level of attention has not been given to the other identifiers that are included in the data base, the CHILDID (Person identifier) and the _IDHD01 (Household identifier). These identifiers have been created for the public files and can also be found in the master files by default. The PERSRUK should be used to link records between files and the FIELDRUK to determine the household when using the master files.

    Release date: 2004-10-05

  • Surveys and statistical programs – Documentation: 92-379-X
    Description:

    The 2001 Census Handbook is a reference tool covering every aspect of the 2001 Census of Population and Census of Agriculture. It provides an overview of every phase of the census, from content determination to data dissemination. It traces the history of the census from the early days of New France to the present. It also contains information about the protection of confidential information in census questions and variables, along with information about data quality and the possible uses of census data. Also covered are census geography and the range of products and services available from the 2001 Census database.

    This series includes six general reference products: Preview of Products and Services, Census Dictionary, Catalogue, Standard Products Stubsets, Census Handbook and Technical Reports.

    Release date: 2002-08-06

  • Surveys and statistical programs – Documentation: 85-602-X
    Description:

    The purpose of this report is to provide an overview of existing methods and techniques making use of personal identifiers to support record linkage. Record linkage can be loosely defined as a methodology for manipulating and / or transforming personal identifiers from individual data records from one or more operational databases and subsequently attempting to match these personal identifiers to create a composite record about an individual. Record linkage is not intended to uniquely identify individuals for operational purposes; however, it does provide probabilistic matches of varying degrees of reliability for use in statistical reporting. Techniques employed in record linkage may also be of use for investigative purposes to help narrow the field of search against existing databases when some form of personal identification information exists.

    Release date: 2000-12-05

  • Surveys and statistical programs – Documentation: 75F0002M2000010
    Description:

    This report explains the concept of income and provides definitions of the various sources of income and derived income variables. It also documents the various aspects of the census that can have an impact on census income estimates.

    Release date: 2000-07-26