Results
All (84) (0 to 10 of 84 results)
- Journals and periodicals: 62F0026M
Description:
This series provides detailed documentation on the issues, concepts, methodology, data quality and other relevant research related to household expenditures from the Survey of Household Spending, the Homeowner Repair and Renovation Survey and the Food Expenditure Survey.
Release date: 2023-10-18
- Journals and periodicals: 12-206-X
Description:
This report summarizes the annual achievements of the Methodology Research and Development Program (MRDP) sponsored by the Modern Statistical Methods and Data Science Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the agency’s statistical programs; these activities would otherwise be less likely to be carried out during the provision of regular methodology services to those programs. The MRDP also includes activities that provide support in the application of past successful developments in order to promote the use of the results of research and development work. Selected prospective research activities are also presented.
Release date: 2023-10-11
- Articles and reports: 12-001-X202200100008
Description:
The Multiple Imputation of Latent Classes (MILC) method combines multiple imputation and latent class analysis to correct for misclassification in combined datasets. Furthermore, MILC generates a multiply imputed dataset which can be used to estimate different statistics in a straightforward manner, ensuring that uncertainty due to misclassification is incorporated when estimating the total variance. In this paper, it is investigated how the MILC method can be adjusted to be applied for census purposes. More specifically, it is investigated how the MILC method deals with a finite and complete population register, how the MILC method can simultaneously correct misclassification in multiple latent variables and how multiple edit restrictions can be incorporated. A simulation study shows that the MILC method is in general able to reproduce cell frequencies in both low- and high-dimensional tables with low amounts of bias. In addition, variance can also be estimated appropriately, although variance is overestimated when cell frequencies are small.
Release date: 2022-06-21
- Articles and reports: 12-001-X202000100006
Description:
In surveys, logical boundaries among variables or among waves of surveys make imputation of missing values complicated. We propose a new regression-based multiple imputation method to deal with survey nonresponses with two-sided logical boundaries. This imputation method automatically satisfies the boundary conditions without an additional acceptance/rejection procedure and utilizes the boundary information to derive an imputed value and to determine the suitability of the imputed value. Simulation results show that our new imputation method outperforms the existing imputation methods for both mean and quantile estimations regardless of missing rates, error distributions, and missing-mechanisms. We apply our method to impute the self-reported variable “years of smoking” in successive health screenings of Koreans.
Release date: 2020-06-30
- Surveys and statistical programs – Documentation: 12-539-X
Description:
This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.
Release date: 2019-12-04
- Articles and reports: 12-001-X201800254957
Description:
When a linear imputation method is used to correct non-response based on certain assumptions, total variance can be assigned to non-responding units. Linear imputation is not as limited as it seems, given that the most common methods – ratio, donor, mean and auxiliary value imputation – are all linear imputation methods. We will discuss the inference framework and the unit-level decomposition of variance due to non-response. Simulation results will also be presented. This decomposition can be used to prioritize non-response follow-up or manual corrections, or simply to guide data analysis.
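One way to see why ratio and mean imputation count as linear methods is to write each imputed value as a weighted sum of respondent values. The sketch below is illustrative only: the function names and the toy data are assumptions, not taken from the paper, and it does not attempt the paper's variance decomposition.

```python
def ratio_impute(x_resp, y_resp, x_missing):
    """Ratio imputation: y* = x_missing * (sum of respondent y / sum of respondent x)."""
    ratio = sum(y_resp) / sum(x_resp)
    return x_missing * ratio


def mean_impute(y_resp):
    """Mean imputation: every respondent y gets the same weight 1/n."""
    return sum(y_resp) / len(y_resp)


def ratio_weights(x_resp, x_missing):
    """Weights w_j such that the ratio-imputed value equals sum(w_j * y_j)."""
    total_x = sum(x_resp)
    return [x_missing / total_x for _ in x_resp]


# Toy data (hypothetical): auxiliary variable x and study variable y for respondents.
x_resp = [1.0, 2.0, 3.0]
y_resp = [2.0, 4.0, 6.0]

imputed = ratio_impute(x_resp, y_resp, x_missing=4.0)
weights = ratio_weights(x_resp, x_missing=4.0)
# The same imputed value, written explicitly as a linear combination of respondent y values.
as_linear_combination = sum(w * y for w, y in zip(weights, y_resp))
```

Because the weights depend only on the auxiliary values and not on the respondent y values, the imputed value is linear in y, which is the property the abstract relies on.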
Release date: 2018-12-20
- Articles and reports: 11-633-X2017006
Description:
This paper describes a method of imputing missing postal codes in a longitudinal database. The 1991 Canadian Census Health and Environment Cohort (CanCHEC), which contains information on individuals from the 1991 Census long-form questionnaire linked with T1 tax return files for the 1984-to-2011 period, is used to illustrate and validate the method. The cohort contains up to 28 consecutive fields for postal code of residence, but because of frequent gaps in postal code history, missing postal codes must be imputed. To validate the imputation method, two experiments were devised where 5% and 10% of all postal codes from a subset with full history were randomly removed and imputed.
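The validation design described here (remove a known fraction of values, impute them, compare with the truth) can be sketched as follows. This is a simplified illustration under stated assumptions: the imputation rule is a placeholder (carry the nearest known value forward, then back-fill), not the paper's method, and the postal codes and function names are invented for the example.

```python
import random


def mask_fraction(history, fraction, rng):
    """Blank out a random fraction of entries; return the masked copy and the removed positions."""
    k = round(len(history) * fraction)
    removed = sorted(rng.sample(range(len(history)), k))
    masked = [None if i in removed else v for i, v in enumerate(history)]
    return masked, removed


def impute_nearest_known(masked):
    """Placeholder rule (an assumption): carry the last known value forward, then back-fill leading gaps."""
    out = list(masked)
    last = None
    for i, v in enumerate(out):
        if v is None:
            out[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out


def agreement_rate(original, imputed, removed):
    """Share of removed positions where the imputed value matches the original."""
    if not removed:
        return None
    hits = sum(original[i] == imputed[i] for i in removed)
    return hits / len(removed)


rng = random.Random(0)
# Hypothetical complete history: one field per year, with a single move.
history = ["K1A0B1"] * 10 + ["M5V2T6"] * 10
for fraction in (0.05, 0.10):
    masked, removed = mask_fraction(history, fraction, rng)
    rate = agreement_rate(history, impute_nearest_known(masked), removed)
    print(f"removed {fraction:.0%}: agreement rate {rate:.2f}")
```

Restricting the experiment to records with a full history, as the abstract describes, is what makes the true values available for comparison.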
Release date: 2017-03-13
- Statistical matching using fractional imputation (Archived). Articles and reports: 12-001-X201600114539
Description:
Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider a general approach to statistical matching using parametric fractional imputation of Kim (2011) to create imputed data under the assumption that the specified model is fully identified. The proposed method does not have a convergent EM sequence if the model is not identified. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models.
Release date: 2016-06-22
- Articles and reports: 12-001-X201500114193
Description:
Imputed micro data often contain conflicting information. The situation may arise, for example, from partial imputation, where one part of the imputed record consists of the observed values of the original record and the other of the imputed values. Edit rules that involve variables from both parts of the record will often be violated. Alternatively, inconsistency may be caused by adjustment for errors in the observed data, also referred to as imputation in editing. Under the assumption that the remaining inconsistency is not due to systematic errors, we propose to make adjustments to the micro data such that all constraints are simultaneously satisfied and the adjustments are minimal according to a chosen distance metric. Different approaches to the distance metric are considered, as well as several extensions of the basic situation, including the treatment of categorical data, unit imputation and macro-level benchmarking. The properties and interpretations of the proposed methods are illustrated using business-economic data.
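For intuition, the simplest special case of such a minimal adjustment is a single linear equality edit under Euclidean distance, where the closest consistent record is an orthogonal projection. The sketch below covers only that case; the paper considers several constraints and other distance metrics, and the function name and example record are assumptions made for illustration.

```python
def adjust_to_constraint(x, a, b):
    """Smallest Euclidean change to x such that sum(a_i * x_i) == b (one linear edit)."""
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0:
        raise ValueError("constraint vector must not be all zeros")
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    step = residual / norm_sq
    # Move x along the constraint normal just far enough to remove the violation.
    return [xi - step * ai for xi, ai in zip(x, a)]


# Hypothetical record: two components and a reported total that do not add up.
record = [3.0, 4.0, 8.0]
# Edit rule: component1 + component2 - total = 0.
adjusted = adjust_to_constraint(record, a=[1.0, 1.0, -1.0], b=0.0)
```

With equal weights the violation is spread evenly across the three variables; a weighted distance would instead move the less reliable variables more.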
Release date: 2015-06-29
- Combining information from multiple complex surveys (Archived). Articles and reports: 12-001-X201400214089
Description:
This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations nonparametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).
Release date: 2014-12-19
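As background for the combining step, the sketch below implements the standard multiple-imputation combining rules for missing data (Rubin's rules). It is a baseline illustration only: the abstract refers to extensions of the combining rules for synthetic data, which use a different variance formula and are not reproduced here. The function name and the input numbers are hypothetical.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m completed-data point estimates and their variances (standard Rubin's rules)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates and one variance per estimate")
    q_bar = mean(estimates)          # combined point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from three completed data sets.
point, total_var = combine_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what carries the extra uncertainty due to imputation into the final variance estimate.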
Data (2)
- Public use microdata: 82M0010X
Description:
The National Population Health Survey (NPHS) program is designed to collect information related to the health of the Canadian population. The first cycle of data collection began in 1994. The institutional component includes long-term residents (expected to stay longer than six months) in health care facilities with four or more beds in Canada with the principal exclusion of the Yukon and the Northwest Territories. The document has been produced to facilitate the manipulation of the 1996-1997 microdata file containing survey results. The main variables include: demography, health status, chronic conditions, restriction of activity, socio-demographic, and others.
Release date: 2000-08-02
- Public use microdata: 12M0010X
Description:
Cycle 10 collected data from persons 15 years and older and concentrated on the respondent's family. Topics covered include marital history, common-law unions, biological, adopted and step children, family origins, child leaving and fertility intentions.
The target population of the GSS (General Social Survey) consisted of all individuals aged 15 and over living in a private household in one of the ten provinces.
Release date: 1997-02-28
Analysis (58) (0 to 10 of 58 results)
- Fractional hot deck imputation for robust inference under item nonresponse in survey sampling (Archived). Articles and reports: 12-001-X201400214091
Description:
Parametric fractional imputation (PFI), proposed by Kim (2011), is a tool for general purpose parameter estimation under missing data. We propose a fractional hot deck imputation (FHDI) which is more robust than PFI or multiple imputation. In the proposed method, the imputed values are chosen from the set of respondents and assigned proper fractional weights. The weights are then adjusted to meet certain calibration conditions, which makes the resulting FHDI estimator efficient. Two simulation studies are presented to compare the proposed method with existing methods.
Release date: 2014-12-19
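The core idea of fractional hot deck imputation, replacing each missing value with several respondent values that carry fractional weights, can be sketched as below. This is a deliberately simplified version under loud assumptions: every respondent in the same imputation cell serves as a donor with an equal fractional weight, design weights are taken as equal, and the calibration adjustment of the weights mentioned in the abstract is omitted. The record layout and function names are invented for the example.

```python
from collections import defaultdict


def fractional_hot_deck(records):
    """records: list of (cell, value), with value None when missing.

    Returns rows of (unit_index, value, fractional_weight); the weights of each unit sum to 1.
    """
    donors = defaultdict(list)
    for cell, value in records:
        if value is not None:
            donors[cell].append(value)

    rows = []
    for i, (cell, value) in enumerate(records):
        if value is not None:
            rows.append((i, value, 1.0))
            continue
        pool = donors[cell]
        if not pool:
            raise ValueError(f"no donors available in cell {cell!r}")
        # Assumption: all donors in the cell are used, each with weight 1 / (number of donors).
        for donor_value in pool:
            rows.append((i, donor_value, 1.0 / len(pool)))
    return rows


def weighted_mean(rows, n_units):
    """Estimate the mean from the fractionally imputed rows (equal design weights assumed)."""
    return sum(value * weight for _, value, weight in rows) / n_units


# Hypothetical data: unit 2 in cell "A" is missing and borrows from units 0 and 1.
records = [("A", 1.0), ("A", 3.0), ("A", None), ("B", 5.0)]
rows = fractional_hot_deck(records)
estimate = weighted_mean(rows, n_units=len(records))
```

Because imputed values are always drawn from observed respondents, no parametric model for the study variable is needed, which is the source of the robustness the abstract claims over parametric fractional imputation.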
Reference (24) (0 to 10 of 24 results)
- Surveys and statistical programs – Documentation: 99-012-X2011006
Geography: Canada
Description:
This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.
Release date: 2013-06-26
- Surveys and statistical programs – Documentation: 99-012-X2011007
Description:
This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.
Release date: 2013-06-26
- Surveys and statistical programs – Documentation: 99-012-X2011008
Description:
This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.
Release date: 2013-06-26
- Surveys and statistical programs – Documentation: 99-013-X2011006
Description:
This reference guide provides information that enables users to effectively use, apply and interpret data from the 2011 National Household Survey (NHS). This guide contains definitions and explanations of concepts, classifications, data quality and comparability to other sources. Additional information is included for specific variables to help general users better understand the concepts and questions used in the NHS.
Release date: 2013-06-26
- Surveys and statistical programs – Documentation: 62F0026M2010004
Description:
This report describes the quality indicators produced for the 2007 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.
Release date: 2010-12-13
- Surveys and statistical programs – Documentation: 62F0026M2010005
Description:
This report describes the quality indicators produced for the 2008 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.
Release date: 2010-12-13
- Surveys and statistical programs – Documentation: 75F0002M2008005
Description:
The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes. Sample surveys are subject to sampling errors. To account for these errors, each estimate presented in the "Income Trends in Canada" series comes with a quality indicator based on the coefficient of variation. However, other factors must also be considered to make sure data are properly used. Statistics Canada puts considerable time and effort into controlling errors at every stage of the survey and maximising fitness for use. Nevertheless, the survey design and the data processing could restrict the fitness for use. It is the policy at Statistics Canada to furnish users with measures of data quality so that the user is able to interpret the data properly. This report summarizes the set of quality measures of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.
Release date: 2008-08-20
- Surveys and statistical programs – Documentation: 92-393-X
Description:
This report is a brief guide to users of census income data. It provides a general description of the various 2001 Census phases, from data collection, through processing for non-response, to dissemination. Descriptions of, and summary data on, the changes to income data that occurred during the processing stages are given. Comparative data from national accounts and tax data sources at a highly aggregated level are also presented to put the quality of the 2001 Census income data into perspective. For users wishing to compare census income data over time, changes in income content and universe coverage over the years are explained. Finally, a complete description of all census products containing income data is also supplied.
Release date: 2004-09-16
- Surveys and statistical programs – Documentation: 92-390-X
Description:
This report includes a definition of the 2001 place of work concept and the place of work geography, standard text on data collection and coverage (including data collection methods, special coverage studies, sampling and weighting, edit and follow-up, coverage and content considerations). Both standard and subject-matter specific text pieces are also included for data assimilation (automated as well as interactive coding), edit and imputation and data evaluation. Finally, this technical report includes a section on historical comparability.
Release date: 2004-08-26