Keyword search

Results

All (13) (0 to 10 of 13 results)

1. Methods for Constructing Life Tables for Canada, Provinces and Territories
Surveys and statistical programs – Documentation: 84-538-X
Geography: Canada
Description: This electronic publication presents the methodology underlying the production of the life tables for Canada, provinces and territories.
Release date: 2023-08-28
2. One-sided testing of population domain means in surveys
Articles and reports: 12-001-X202300100001
Description: Recent work in survey domain estimation allows for estimation of population domain means under a priori assumptions expressed in terms of linear inequality constraints. For example, it might be known that the population means are non-decreasing along ordered domains. Imposing the constraints has been shown to provide estimators with smaller variance and tighter confidence intervals. In this paper we consider a formal test of the null hypothesis that all the constraints are binding, versus the alternative that at least one constraint is non-binding. The test of constant versus increasing domain means is a special case. The power of the test is substantially better than the test with the same null hypothesis and an unconstrained alternative. The new test is used with data from the National Survey of College Graduates, to show that salaries are positively related to the subject’s father’s educational level, across fields of study and over several years of cohorts.
Release date: 2023-06-30
3. Linking the Canadian Immigrant Landing File to Hospital Data: A New Data Source for Immigrant Health Research (Archived)
Articles and reports: 11-633-X2016002
Description:
Immigrants comprise an ever-increasing percentage of the Canadian population—at more than 20%, which is the highest percentage among the G8 countries (Statistics Canada 2013a). This figure is expected to rise to 25% to 28% by 2031, when at least one in four people living in Canada will be foreign-born (Statistics Canada 2010).
This report summarizes the linkage of the Immigrant Landing File (ILF) for all provinces and territories, excluding Quebec, to hospital data from the Discharge Abstract Database (DAD), a national database containing information about hospital inpatient and day-surgery events. A deterministic exact-matching approach was used to link data from the 1980-to-2006 ILF and from the DAD (2006/2007, 2007/2008 and 2008/2009) with the 2006 Census, which served as a “bridge” file. This was a secondary linkage in that it used linkage keys created in two previous projects (primary linkages) that separately linked the ILF and the DAD to the 2006 Census. The ILF–DAD linked data were validated by means of a representative sample of 2006 Census records containing immigrant information previously linked to the DAD.
Release date: 2016-08-17
4. The feasibility of adding treatment data to the Canadian Cancer Registry using record linkage (Archived)
Articles and reports: 82-622-X2015009
Description:
The Canadian Cancer Registry (CCR) represents a collaborative effort between Statistics Canada and the thirteen provincial and territorial cancer registries to create a single database to report annually on cancer incidence and survival at the national and jurisdictional level. While gains have been made to ensure high quality, standardized, and comparable data, the CCR currently lacks information on cancer treatment. The Canadian Council of Cancer Registries (CCCR) identified the need to capture treatment data at the national level as a key strategic priority for 2013/2014. Record linkage was identified as one possible approach to fill this information gap.
The purpose of this study is to examine the feasibility of using record linkage to add cancer treatment information for selected cancers: breast, colorectal and prostate. The objectives are twofold: to assess the quality of the linkage processes and the validity of using linked data to estimate cancer treatment rates at the provincial level. The study is based on the Canadian Cancer Registry (2005 to 2008) linked to the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS) for four provinces (Ontario, Manitoba, Nova Scotia and Prince Edward Island). The linkage was proposed by Statistics Canada, the CCCR and the Canadian Institute for Health Information (CIHI). The linkage was approved and conducted at Statistics Canada.
Release date: 2015-11-23
5. Constructing Provincial Time Series: A Discussion of Data Sources and Methods (Archived)
Articles and reports: 13-604-M2015077
Description:
This new dataset increases the information available for comparing the performance of provinces and territories across a range of measures. It combines often fragmented provincial time series data that, as such, are of limited utility for examining the evolution of provincial economies over extended periods. More advanced statistical methods, and models with greater breadth and depth, are difficult to apply to existing fragmented Canadian data. The longitudinal nature of the new provincial dataset remedies this shortcoming. This report explains the construction of the latest vintage of the dataset. The dataset contains the most up-to-date information available.
Release date: 2015-02-12
6. Testing for Provincial Industrial Structural Change through the 2000s (Archived)
Articles and reports: 11F0027M2014092
Geography: Province or territory
Description:
Using data from the Provincial KLEMS database, this paper asks whether provincial economies have undergone structural change in their business sectors since 2000. It does so by applying a measure of industrial change (the dissimilarity index) using measures of output (real GDP) and hours worked. The paper also develops a statistical methodology to test whether the shifts in the industrial composition of output and hours worked over the period are due to random year-over-year changes in industrial structure or long-term systematic change in the structure of provincial economies. The paper is designed to inform discussion and analysis of recent changes in industrial composition at the national level, notably, the decline in manufacturing output and the concomitant rise of resource industries, and the implications of this change for provincial economies.
Release date: 2014-05-07
7. Historical Data Linkage of Tax Records on Labour and Income: The Case of the Living in Canada Survey Pilot (Archived)
Articles and reports: 89-648-X2013002
Geography: Canada
Description:
Data matching is a common practice used to reduce the response burden of respondents and to improve the quality of the information collected from respondents when the linkage method does not introduce bias. However, historical linkage, which consists of linking external records from previous years to the year of the initial wave of a survey, is relatively rare and, until now, had not been used at Statistics Canada. The present paper describes the method used to link the records from the Living in Canada Survey pilot to historical tax data on income and labour (T1 and T4 files). It presents the evolution of the linkage rate going back over time and compares earnings data collected from personal income tax returns with those collected from employers' files. To illustrate the new possibilities of analysis offered by this type of linkage, the study concludes with an earnings profile by age and sex for different cohorts based on year of birth.
Release date: 2013-01-24
8. Variance estimation under composite imputation: The methodology behind SEVANI (Archived)
Articles and reports: 12-001-X201100211605
Description:
Composite imputation is often used in business surveys. The term "composite" means that more than a single imputation method is used to impute missing values for a variable of interest. The literature on variance estimation in the presence of composite imputation is rather limited. To deal with this problem, we consider an extension of the methodology developed by Särndal (1992). Our extension is quite general and easy to implement provided that linear imputation methods are used to fill in the missing values. This class of imputation methods contains linear regression imputation, donor imputation and auxiliary value imputation, sometimes called cold-deck or substitution imputation. It thus covers the most common methods used by national statistical agencies for the imputation of missing values. Our methodology has been implemented in the System for the Estimation of Variance due to Nonresponse and Imputation (SEVANI) developed at Statistics Canada. Its performance is evaluated in a simulation study.
Release date: 2011-12-21
9. Maximum likelihood estimation for contingency tables and logistic regression with incorrectly linked data (Archived)
Articles and reports: 12-001-X201100111444
Description:
Data linkage is the act of bringing together records that are believed to belong to the same unit (e.g., person or business) from two or more files. It is a very common way to enhance dimensions such as time and breadth or depth of detail. Data linkage is often not an error-free process and can lead to linking a pair of records that do not belong to the same unit. There is an explosion of record linkage applications, yet there has been little work on assuring the quality of analyses using such linked files. Naively treating such a linked file as if it were linked without errors will, in general, lead to biased estimates. This paper develops a maximum likelihood estimator for contingency tables and logistic regression with incorrectly linked records. The estimation technique is simple and is implemented using the well-known EM algorithm. A well-known method of linking records in the present context is probabilistic data linking. The paper demonstrates the effectiveness of the proposed estimators in an empirical study which uses probabilistic data linkage.
Release date: 2011-06-29
10. The construction of stratified designs in R with the package stratification (Archived)
Articles and reports: 12-001-X201100111447
Description:
This paper introduces an R package for the stratification of a survey population using a univariate stratification variable X and for the calculation of stratum sample sizes. Non-iterative methods such as the cumulative root frequency method and the geometric stratum boundaries are implemented. Optimal designs, with stratum boundaries that minimize either the CV of the simple expansion estimator for a fixed sample size n or the n value for a fixed CV, can be constructed. Two iterative algorithms are available to find the optimal stratum boundaries. The design can feature a user-defined certainty stratum where all the units are sampled. Take-all and take-none strata can be included in the stratified design as they might lead to smaller sample sizes. The sample size calculations are based on the anticipated moments of the survey variable Y, given the stratification variable X. The package handles conditional distributions of Y given X that are either a heteroscedastic linear model, or a log-linear model. Stratum-specific non-response can be accounted for in the design construction and in the sample size calculations.
Release date: 2011-06-29
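
Result 6 in the list above applies a dissimilarity index to the industrial composition of output and hours worked. The abstract does not give the formula, so the following is only a minimal sketch assuming one common form of the index: half the sum of absolute changes in industry shares between two periods. The function name, the industry labels and the figures are hypothetical and are not taken from the paper or from the Provincial KLEMS database.

```python
def dissimilarity_index(values_start, values_end):
    """Half the sum of absolute changes in industry shares between two periods.

    values_start, values_end: dicts mapping an industry label to a level
    (for example real GDP or hours worked). This is an assumed, commonly
    used form of the index, not necessarily the paper's exact definition.
    """
    total_start = sum(values_start.values())
    total_end = sum(values_end.values())
    industries = set(values_start) | set(values_end)
    return 0.5 * sum(
        abs(values_end.get(i, 0.0) / total_end - values_start.get(i, 0.0) / total_start)
        for i in industries
    )


# Hypothetical output levels for one province in two years.
output_start = {"manufacturing": 40.0, "resources": 20.0, "services": 40.0}
output_end = {"manufacturing": 30.0, "resources": 30.0, "services": 40.0}

index = dissimilarity_index(output_start, output_end)
print(round(index, 3))  # 0.1
```

Under this form the index is 0 when the industry shares are unchanged and 1 when activity moves entirely between industries, so a value of 0.1 means that 10% of output would have to be reallocated to restore the starting composition.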

Data (0) (0 results)

No content available at this time.

Analysis (12) (0 to 10 of 12 results)

1. One-sided testing of population domain means in surveys
Articles and reports: 12-001-X202300100001
Description: Recent work in survey domain estimation allows for estimation of population domain means under a priori assumptions expressed in terms of linear inequality constraints. For example, it might be known that the population means are non-decreasing along ordered domains. Imposing the constraints has been shown to provide estimators with smaller variance and tighter confidence intervals. In this paper we consider a formal test of the null hypothesis that all the constraints are binding, versus the alternative that at least one constraint is non-binding. The test of constant versus increasing domain means is a special case. The power of the test is substantially better than the test with the same null hypothesis and an unconstrained alternative. The new test is used with data from the National Survey of College Graduates, to show that salaries are positively related to the subject’s father’s educational level, across fields of study and over several years of cohorts.
Release date: 2023-06-30
2. Linking the Canadian Immigrant Landing File to Hospital Data: A New Data Source for Immigrant Health Research (Archived)
Articles and reports: 11-633-X2016002
Description:
Immigrants comprise an ever-increasing percentage of the Canadian population—at more than 20%, which is the highest percentage among the G8 countries (Statistics Canada 2013a). This figure is expected to rise to 25% to 28% by 2031, when at least one in four people living in Canada will be foreign-born (Statistics Canada 2010).
This report summarizes the linkage of the Immigrant Landing File (ILF) for all provinces and territories, excluding Quebec, to hospital data from the Discharge Abstract Database (DAD), a national database containing information about hospital inpatient and day-surgery events. A deterministic exact-matching approach was used to link data from the 1980-to-2006 ILF and from the DAD (2006/2007, 2007/2008 and 2008/2009) with the 2006 Census, which served as a “bridge” file. This was a secondary linkage in that it used linkage keys created in two previous projects (primary linkages) that separately linked the ILF and the DAD to the 2006 Census. The ILF–DAD linked data were validated by means of a representative sample of 2006 Census records containing immigrant information previously linked to the DAD.
Release date: 2016-08-17
3. The feasibility of adding treatment data to the Canadian Cancer Registry using record linkage (Archived)
Articles and reports: 82-622-X2015009
Description:
The Canadian Cancer Registry (CCR) represents a collaborative effort between Statistics Canada and the thirteen provincial and territorial cancer registries to create a single database to report annually on cancer incidence and survival at the national and jurisdictional level. While gains have been made to ensure high quality, standardized, and comparable data, the CCR currently lacks information on cancer treatment. The Canadian Council of Cancer Registries (CCCR) identified the need to capture treatment data at the national level as a key strategic priority for 2013/2014. Record linkage was identified as one possible approach to fill this information gap.
The purpose of this study is to examine the feasibility of using record linkage to add cancer treatment information for selected cancers: breast, colorectal and prostate. The objectives are twofold: to assess the quality of the linkage processes and the validity of using linked data to estimate cancer treatment rates at the provincial level. The study is based on the Canadian Cancer Registry (2005 to 2008) linked to the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS) for four provinces (Ontario, Manitoba, Nova Scotia and Prince Edward Island). The linkage was proposed by Statistics Canada, the CCCR and the Canadian Institute for Health Information (CIHI). The linkage was approved and conducted at Statistics Canada.
Release date: 2015-11-23
4. Constructing Provincial Time Series: A Discussion of Data Sources and Methods (Archived)
Articles and reports: 13-604-M2015077
Description:
This new dataset increases the information available for comparing the performance of provinces and territories across a range of measures. It combines often fragmented provincial time series data that, as such, are of limited utility for examining the evolution of provincial economies over extended periods. More advanced statistical methods, and models with greater breadth and depth, are difficult to apply to existing fragmented Canadian data. The longitudinal nature of the new provincial dataset remedies this shortcoming. This report explains the construction of the latest vintage of the dataset. The dataset contains the most up-to-date information available.
Release date: 2015-02-12
5. Testing for Provincial Industrial Structural Change through the 2000s (Archived)
Articles and reports: 11F0027M2014092
Geography: Province or territory
Description:
Using data from the Provincial KLEMS database, this paper asks whether provincial economies have undergone structural change in their business sectors since 2000. It does so by applying a measure of industrial change (the dissimilarity index) using measures of output (real GDP) and hours worked. The paper also develops a statistical methodology to test whether the shifts in the industrial composition of output and hours worked over the period are due to random year-over-year changes in industrial structure or long-term systematic change in the structure of provincial economies. The paper is designed to inform discussion and analysis of recent changes in industrial composition at the national level, notably, the decline in manufacturing output and the concomitant rise of resource industries, and the implications of this change for provincial economies.
Release date: 2014-05-07
6. Historical Data Linkage of Tax Records on Labour and Income: The Case of the Living in Canada Survey Pilot (Archived)
Articles and reports: 89-648-X2013002
Geography: Canada
Description:
Data matching is a common practice used to reduce the response burden of respondents and to improve the quality of the information collected from respondents when the linkage method does not introduce bias. However, historical linkage, which consists of linking external records from previous years to the year of the initial wave of a survey, is relatively rare and, until now, had not been used at Statistics Canada. The present paper describes the method used to link the records from the Living in Canada Survey pilot to historical tax data on income and labour (T1 and T4 files). It presents the evolution of the linkage rate going back over time and compares earnings data collected from personal income tax returns with those collected from employers' files. To illustrate the new possibilities of analysis offered by this type of linkage, the study concludes with an earnings profile by age and sex for different cohorts based on year of birth.
Release date: 2013-01-24
7. Variance estimation under composite imputation: The methodology behind SEVANI (Archived)
Articles and reports: 12-001-X201100211605
Description:
Composite imputation is often used in business surveys. The term "composite" means that more than a single imputation method is used to impute missing values for a variable of interest. The literature on variance estimation in the presence of composite imputation is rather limited. To deal with this problem, we consider an extension of the methodology developed by Särndal (1992). Our extension is quite general and easy to implement provided that linear imputation methods are used to fill in the missing values. This class of imputation methods contains linear regression imputation, donor imputation and auxiliary value imputation, sometimes called cold-deck or substitution imputation. It thus covers the most common methods used by national statistical agencies for the imputation of missing values. Our methodology has been implemented in the System for the Estimation of Variance due to Nonresponse and Imputation (SEVANI) developed at Statistics Canada. Its performance is evaluated in a simulation study.
Release date: 2011-12-21
8. Maximum likelihood estimation for contingency tables and logistic regression with incorrectly linked data (Archived)
Articles and reports: 12-001-X201100111444
Description:
Data linkage is the act of bringing together records that are believed to belong to the same unit (e.g., person or business) from two or more files. It is a very common way to enhance dimensions such as time and breadth or depth of detail. Data linkage is often not an error-free process and can lead to linking a pair of records that do not belong to the same unit. There is an explosion of record linkage applications, yet there has been little work on assuring the quality of analyses using such linked files. Naively treating such a linked file as if it were linked without errors will, in general, lead to biased estimates. This paper develops a maximum likelihood estimator for contingency tables and logistic regression with incorrectly linked records. The estimation technique is simple and is implemented using the well-known EM algorithm. A well-known method of linking records in the present context is probabilistic data linking. The paper demonstrates the effectiveness of the proposed estimators in an empirical study which uses probabilistic data linkage.
Release date: 2011-06-29
9. The construction of stratified designs in R with the package stratification (Archived)
Articles and reports: 12-001-X201100111447
Description:
This paper introduces an R package for the stratification of a survey population using a univariate stratification variable X and for the calculation of stratum sample sizes. Non-iterative methods such as the cumulative root frequency method and the geometric stratum boundaries are implemented. Optimal designs, with stratum boundaries that minimize either the CV of the simple expansion estimator for a fixed sample size n or the n value for a fixed CV, can be constructed. Two iterative algorithms are available to find the optimal stratum boundaries. The design can feature a user-defined certainty stratum where all the units are sampled. Take-all and take-none strata can be included in the stratified design as they might lead to smaller sample sizes. The sample size calculations are based on the anticipated moments of the survey variable Y, given the stratification variable X. The package handles conditional distributions of Y given X that are either a heteroscedastic linear model, or a log-linear model. Stratum-specific non-response can be accounted for in the design construction and in the sample size calculations.
Release date: 2011-06-29
10. On the efficiency of randomized probability proportional to size sampling (Archived)
Articles and reports: 12-001-X201100111450
Description:
This paper examines the efficiency of the Horvitz-Thompson estimator from a systematic probability proportional to size (PPS) sample drawn from a randomly ordered list. In particular, the efficiency is compared with that of an ordinary ratio estimator. The theoretical results are confirmed empirically with a simulation study using Dutch data from the Producer Price Index.
Release date: 2011-06-29
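
Result 10 above concerns the Horvitz-Thompson estimator under systematic PPS sampling from a randomly ordered list. As a rough illustration of that design, and not the paper's own code or data, the sketch below shuffles the list, takes a systematic sample along the cumulated size measure, and weights each sampled value by the inverse of its inclusion probability, n · x_i / X. All function names and figures are hypothetical, and the sketch assumes every size measure is below X / n so that no inclusion probability reaches 1.

```python
import random


def systematic_pps_sample(sizes, n, rng):
    """Indices of n units drawn by systematic PPS from a randomly ordered list."""
    total = sum(sizes)
    step = total / n
    if max(sizes) >= step:
        raise ValueError("each size must be below total / n")
    order = list(range(len(sizes)))
    rng.shuffle(order)                      # random ordering of the list
    start = rng.random() * step             # random start in [0, step)
    points = [start + k * step for k in range(n)]
    chosen, cumulative, k = [], 0, 0
    for idx in order:
        cumulative += sizes[idx]
        # A unit is selected when a selection point falls in its size interval.
        while k < n and points[k] < cumulative:
            chosen.append(idx)
            k += 1
    return chosen


def horvitz_thompson_total(y, sizes, sample):
    """Estimate the population total of y with weights 1 / (n * x_i / X)."""
    total = sum(sizes)
    n = len(sample)
    return sum(y[i] / (n * sizes[i] / total) for i in sample)


# Hypothetical size measures x and survey values y that are exactly proportional to x.
sizes = [5, 8, 12, 3, 9, 7, 6, 10]
y = [2 * x for x in sizes]

rng = random.Random(2011)
sample = systematic_pps_sample(sizes, 3, rng)
estimate = horvitz_thompson_total(y, sizes, sample)
print(sample, estimate)
```

Because y is exactly proportional to the size measure in this toy example, every possible sample gives the true total of 120 up to rounding. With real data the estimate varies from sample to sample, and that variability is what the paper compares with the ratio estimator.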

Reference (1) (1 result)

1. Methods for Constructing Life Tables for Canada, Provinces and Territories
Surveys and statistical programs – Documentation: 84-538-X
Geography: Canada
Description: This electronic publication presents the methodology underlying the production of the life tables for Canada, provinces and territories.
Release date: 2023-08-28

Date modified: 2024-06-02