Statistical techniques
Results
All (207) (190 to 200 of 207 results)
- Articles and reports: 12-001-X199300114476
Description:
This paper focuses on how to deal with record linkage errors when engaged in regression analysis. Recent work by Rubin and Belin (1991) and by Winkler and Thibaudeau (1991) provides the theory, computational algorithms, and software necessary for estimating matching probabilities. These advances allow us to update the work of Neter, Maynes, and Ramanathan (1965). Adjustment procedures are outlined and some successful simulations are described. Our results are preliminary and intended largely to stimulate further work.
Release date: 1993-06-15
- Articles and reports: 12-001-X199300114477
Description:
A record-linkage process brings together records from two files into pairs of two records, one from each file, for the purpose of comparison. Each record represents an individual. The status of the pair is a “matched pair” status if the two records in the pair represent the same individual. The status is an “unmatched pair” status if the two records do not represent the same individual. The record-linkage process is governed by an underlying probabilistic process. A record-linkage rule infers the status of each pair of records based on the value of the comparison. The pair is declared a “link” if the inferred status is that of a matched pair, and it is declared a “non-link” if the inferred status is that of an unmatched pair. The discrimination power of a record-linkage rule is the capacity of the rule to designate a maximum number of matched pairs as links, while keeping the rate of unmatched pairs designated as links to a minimum. In general, to construct a discriminatory record-linkage rule, some assumptions must be made on the structure of the underlying probabilistic process. In most of the existing literature, it is assumed that the underlying probabilistic process is an instance of the conditional independence latent class model. However, in many situations, this assumption is false. In fact, many underlying probabilistic processes do not exhibit key properties associated with conditional independence latent class models. The paper introduces more general models. In particular, latent class models with dependencies are studied and it is shown how they can improve the discrimination power of particular record-linkage rules.
Release date: 1993-06-15
- Articles and reports: 12-001-X199300114478
Description:
Record linkage refers to the use of an algorithmic technique for identifying pairs of records in separate data files that correspond to the same individual. This paper discusses a framework for evaluating sources of variation in record linkage based on viewing the procedure as a “black box” that takes input data and produces output (a set of declared matched pairs) that has certain properties. We illustrate the idea with a factorial experiment using census/post-enumeration survey data to assess the influence of a variety of factors thought to affect the accuracy of the procedure. The evaluation of record linkage becomes a standard statistical problem using this experimental framework. The investigation provides answers to several research questions, and it is argued that taking an experimental approach similar to that offered here is essential if progress is to be made in understanding the factors that contribute to the error properties of record-linkage procedures.
Release date: 1993-06-15
- Articles and reports: 12-001-X199300114479
Description:
Matching records in different administrative data bases is a useful tool for conducting epidemiological studies to study relationships between environmental hazards and health status. With large data bases, sophisticated computerized record linkage algorithms can be used to evaluate the likelihood of a match between two records based on a comparison of one or more identifying variables for those records. Since matching errors are inevitable, consideration needs to be given to the effects of such errors on statistical inferences based on the linked files. This article provides an overview of record linkage methodology, and a discussion of the statistical issues associated with linkage errors.
Release date: 1993-06-15
- Articles and reports: 12-001-X198900114572
Description:
The Survey of Income and Program Participation (SIPP) is a new Census Bureau panel survey designed to provide data on the economic situation of persons and families in the United States. The basic datum of SIPP is monthly income, which is reported for each month of the four-month reference period preceding the interview month. The SIPP Record Check Study uses administrative record data to estimate the quality of SIPP estimates for a variety of income sources and transfer programs. The project uses computerized record matching to identify SIPP sample persons in four states who are on record as having received payments from any of nine state or Federal programs, and then compares survey-reported dates and amounts of payments with official record values. The paper describes the project in detail and presents some early findings.
Release date: 1989-06-15
- 196. Methods for adjusting for lack of independence in an application of the Fellegi-Sunter model of record linkage (Archived)
Articles and reports: 12-001-X198900114574
Description:
Let A x B be the product space of two sets A and B which is divided into matches (pairs representing the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide A x B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks (designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) provided a linkage rule that is optimal in the sense that it minimizes the set of possible links. The optimality is dependent on knowledge of certain probabilities that are used in a crucial likelihood ratio. In applying the record linkage model, an independence assumption is often made that allows estimation of the probabilities. If the assumption is not met, then a record linkage procedure using estimates computed under the assumption may not be optimal. This paper contains an examination of methods for adjusting linkage rules when the independence assumption is not valid. The presentation takes the form of an empirical analysis of lists of businesses for which the truth of matches is known. The number of possible links obtained using standard and adjusted computational procedures may be dependent on different samples. Bootstrap methods (Efron 1987) are used to examine the variation due to different samples.
Release date: 1989-06-15
- 197. A brief note on SQL (Archived)
Articles and reports: 12-001-X198800214583
Description:
This note portrays SQL, highlighting its strengths and weaknesses.
Release date: 1988-12-15
- 198. ACTR a generalized automated coding system (Archived)
Articles and reports: 12-001-X198800214586
Description:
A generalized implementation of a method for performing automated coding is described. Traditionally, coding has been performed manually by specially trained personnel, but recently computerized systems have appeared which either eliminate or substantially reduce the need for manual coding. Typically, such systems are limited in use to those applications for which they were originally designed. The system presented here may be used by any application to perform coding of English or French text using any classification scheme.
Release date: 1988-12-15
- 199. QUID, a general automatic coding method (Archived)
Articles and reports: 12-001-X198800214587
Description:
The QUID system, designed and developed in Paris by INSEE (Institut National de la Statistique et des Études Économiques, the National Institute of Statistics and Economic Studies), is an automatic coding system for survey data collected in the form of literal headings expressed in the terminology of the respondent. The system hinges on the use of a very wide knowledge base made up of real phrases coded by experts. This study deals primarily with the preliminary automatic standardization processing of the phrases, and then with the algorithm used to organize the phrase base into an optimized tree pattern. A sorting example is provided in the form of an illustration. At present, the processing of additional coding variables used to complement the information contained in the phrases presents certain difficulties, and these will be examined in detail. The QUID 2 project, an updated version of the system, will be discussed briefly.
Release date: 1988-12-15
- 200. Evaluation of reverse record check estimates of undercoverage in the Canadian Census of Population (Archived)
Articles and reports: 12-001-X198800214595
Description:
Estimates of undercoverage in the Canadian Census of Population have been produced for each Census since 1961, using a Reverse Record Check method. The reliability of the estimates is important to how they are used to assess the quality of the Census data and to identify significant causes of coverage error. It is also critical to the development of methods and procedures to improve coverage for future Censuses. The purpose of this paper is to identify potential sources of error in the Reverse Record Check, which should be understood and addressed, where possible, in using this method to estimate coverage error.
Release date: 1988-12-15
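Entry 196 above describes the Fellegi-Sunter rule, which sorts record pairs into links, possible links, and nonlinks using a likelihood ratio. The following is a minimal sketch of that idea under the independence assumption the entry mentions. The field names, the m- and u-probabilities, and the two thresholds are hypothetical values chosen for illustration only; they are not taken from the paper.

```python
import math

# Hypothetical probabilities, for illustration only:
# m = P(field agrees | pair is a match), u = P(field agrees | pair is a nonmatch).
M_PROBS = {"name": 0.95, "address": 0.80, "phone": 0.70}
U_PROBS = {"name": 0.01, "address": 0.05, "phone": 0.001}


def match_weight(agreement):
    """Sum of log2 likelihood ratios over fields.

    Summing per-field ratios is only valid under the conditional
    independence assumption discussed in the entry above.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_PROBS[field], U_PROBS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, lower, upper):
    """Three-way decision: link, possible link (decision delayed), or nonlink."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "nonlink"
    return "possible link"


# Hypothetical thresholds; in practice they follow from the error-rate bounds.
LOWER, UPPER = 0.0, 8.0

all_agree = {"name": True, "address": True, "phone": True}
name_only = {"name": True, "address": False, "phone": False}
none_agree = {"name": False, "address": False, "phone": False}

for pattern in (all_agree, name_only, none_agree):
    print(pattern, classify(match_weight(pattern), LOWER, UPPER))
```

When the fields are not independent, the summed weight is no longer the true likelihood ratio, which is the situation the adjustment methods in entry 196 address.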
Data (1 result)
- Table: 11-10-0074-01
Geography: Census tract
Frequency: Occasional
Description:
The divergence index (D-index) describes the degree to which families with different income levels mix together in neighbourhoods. It compares the discrete income distribution of each neighbourhood (census tract, CT) to a base distribution, which is the income quintiles of the neighbourhood’s census metropolitan area (CMA).
Release date: 2020-06-22
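The table description above does not state the formula for the D-index. As a hedged sketch, the example below assumes it is the Kullback-Leibler divergence of the tract's income distribution from the base distribution, which is a common definition of a divergence index but may differ from the one used for this table. The function name and the sample counts are hypothetical.

```python
import math


def divergence_index(tract_counts, base_shares):
    """KL divergence of a tract's income distribution from a base distribution.

    tract_counts: number of families in each income bin within the tract.
    base_shares:  share of each bin in the base distribution (must be > 0).
    ASSUMPTION: the D-index is taken to be sum(p * ln(p / q)).
    """
    if len(tract_counts) != len(base_shares):
        raise ValueError("tract_counts and base_shares must have equal length")
    total = sum(tract_counts)
    if total == 0:
        raise ValueError("tract has no families")
    d = 0.0
    for count, q in zip(tract_counts, base_shares):
        p = count / total
        if p > 0:  # a bin with p == 0 contributes nothing to the sum
            d += p * math.log(p / q)
    return d


# Bins defined by CMA income quintiles hold 20% of CMA families each.
quintile_base = [0.2] * 5

mixed_tract = [20, 20, 20, 20, 20]       # mirrors the CMA distribution
concentrated_tract = [0, 0, 0, 0, 100]   # every family in the top quintile

print(divergence_index(mixed_tract, quintile_base))
print(divergence_index(concentrated_tract, quintile_base))
```

Under this assumed definition, a tract that mirrors the CMA scores zero, and the score grows as the tract's families concentrate in fewer income bins.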
Analysis (199) (0 to 10 of 199 results)
- Journals and periodicals: 11-633-X
Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
Release date: 2026-04-24
- Articles and reports: 12-001-X202500200001
Description: Nested error regression models are commonly used to incorporate unit-specific auxiliary variables to improve small area estimates. When the mean structure of the model is misspecified, the design-based mean squared prediction error (MSPE) of Empirical Best Linear Unbiased Predictors (EBLUP) generally increases. The Observed Best Prediction (OBP) method has been proposed with the intent to improve on the design-based MSPE over EBLUP. In this paper, we conduct Monte Carlo simulation experiments to understand the effect of misspecification of mean structures on different small area estimators. Our findings suggest that the OBP using unit-level auxiliary variables does not outperform the EBLUP in terms of design-based MSPE, unless the number of small areas m is extremely large. Conversely, the performance of OBP significantly improves when area-level auxiliary variables are employed. This paper includes both analytical and numerical evidence to demonstrate these observations, providing practical insights for addressing model misspecification in small area estimation (SAE).
Release date: 2025-12-23
- Articles and reports: 12-001-X202500200007
Description: Although probability samples have been regarded as the gold standard for collecting information in population-based studies, non-probability samples have been used frequently in practice due to low cost, convenience, and the lack of a sampling frame for the survey. Naïve estimates based on non-probability samples without any adjustments may be misleading due to selection bias. Recently, a valid data integration approach that includes mass imputation, propensity score weighting, and calibration has been used to improve the representativeness of non-probability samples. The effectiveness of the mass imputation approach depends on the underlying model assumptions. In this paper, we propose using deep learning for mass imputation when combining probability and non-probability samples and compare it with several modern machine learning-based mass imputation approaches, including generalized additive modeling, regression tree, random forest, and XG-boosting. In the simulation study, deep learning-based approaches have been shown to be more robust and effective than other mass imputation approaches against the failure of underlying model assumptions under non-linearity scenarios.
Release date: 2025-12-23
- Articles and reports: 12-001-X202500200008
Description: Classical design-based survey estimation relies on a properly specified sampling design for valid inference. We consider the properties of regression estimation under a misspecified sample design, in which the nominal and true inclusion probabilities do not necessarily match. This general misspecified sample design setting encompasses many challenges in the modern survey environment. Under this setting, an asymptotic analysis of the regression estimator, an expression of the bias, and an expression of the variance are presented. Further, a consistent variance estimator is derived and an expression which estimates the bias in part or in whole is discussed. This latter expression may be used by a practitioner as an indicator of the presence of bias due to misspecification. A simulation study is conducted to support the presented theory.
Release date: 2025-12-23
- Articles and reports: 18-001-X2025001
Description: This paper brings the analysis of business clusters to a more granular geographic scale by developing a methodology for identifying business clusters at the neighbourhood level. The proposed method identifies clusters of businesses at the dissemination block (DB) level, which is one of the most granular spatial units of analysis defined by Statistics Canada. The method is developed with an application to four census metropolitan areas (CMAs) of different sizes and for different industry cluster specifications, including simple 2-digit North American Industry Classification System (NAICS) groups as well as industry clusters resulting from groupings of NAICS codes, as defined by Delgado et al. (2014).
Release date: 2025-10-10
- Articles and reports: 11-522-X202500100019
Description: Accurate and efficient record linkage is crucial for maintaining a comprehensive and current Statistical Business Register (SBR) at Statistics Canada. Linking external business lists to the SBR by name presents computational and methodological challenges, especially as data volumes grow. This paper describes a scalable methodology that employs blocking techniques to constrain the computational search space and integrates multiple similarity measures, ranging from edit distances and n-gram overlaps to embedding-based methods using Sentence-BERT (SBERT), to identify likely matches. By combining simple character-level comparisons with more advanced semantic embedding methods, the approach can adapt to various naming conventions and complexities. While it does not guarantee superior accuracy in all circumstances, it offers a pragmatic balance between computational feasibility and linkage quality.
Release date: 2025-09-08
- Articles and reports: 11-522-X202500100020
Description: At Statistics Canada, many data sets are linked with quasi-identifiers such as the first name, last name, or address. In such cases, linkage errors are a potential concern and must be measured. In that regard, previous studies have shown that the evaluation may be based on modeling the number of links from a given record while accounting for all the interactions among the linkage variables and dispensing with clerical reviews, so long as the decision to link two records does not involve other records. In this communication, the methodology is adapted for a class of practical strategies, which violate this constraint by linking the records in consecutive waves, where a given wave links a subset of the records that are not linked in previous waves. In particular, the linkage may be based on a deterministic wave followed by a probabilistic one.
Release date: 2025-09-08
- Articles and reports: 11-522-X202500100021
Description: Optimal threshold selection is a critical challenge in probabilistic linkage, with significant implications for the accuracy and reliability of linked datasets. This paper analyzes the performance of the neighbour model, a recently proposed error model which models linkage errors by the number of links from each record. Three threshold selection algorithms utilizing the neighbour model were assessed, highlighting the strengths and limitations of each. Their performance was assessed through simulation studies, which demonstrated that methods using the neighbour model achieved lower relative bias compared to two established methods for threshold selection. Additionally, the practical utility was validated through goodness-of-fit tests conducted on four agricultural datasets, showing the potential of the model for use in real-world applications.
Release date: 2025-09-08
- Articles and reports: 11-522-X202500100022
Description: In Canada, T1 Tax forms are used to report personal income, whether earned as an employee or through self-employment. Income from self-employment, or "T1 Business Income" is reported by sole proprietorships or partnerships. A T1 partnership involves two or more legal entities jointly filing for a shared business. T1 business data is received as individual filings, meaning partnerships are received separately for each partner. Internal record linkage within the T1 business database is performed to identify partnerships and prevent overcoverage within the final population of T1 businesses. This new T1 partnership identification process takes advantage of newer algorithms, such as DBSCAN numerical clustering fuzzy matching, to identify internal linkages. Graph theory is used to construct the list of partnerships from the row-pairs identified in the linkage process.
Release date: 2025-09-08
- Articles and reports: 11-522-X202500100023
Description: The latest Canadian Census Health and Environment Cohort (CanCHEC) continues a series of population-based microdata linkages focused on population health research by demographic, social and economic characteristics. The 2021 CanCHEC consists of 95.5% of the 2021 Census long-form sample survey records. The records of survey respondents that could not be linked to the Derived Record Depository and those presumed to be duplicates account for the remaining 4.5%. Linkage-adjusted main and replicate weights allow researchers to estimate and evaluate the variance of summary measures about population health in the presence of missed linked pairs to better understand the experiences of diverse population groups.
Release date: 2025-09-08
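The T1 partnership entry in the list above (11-522-X202500100022) mentions using graph theory to build partnerships from linked row-pairs, without giving details. One plausible reading, sketched here as an assumption rather than the paper's actual method, treats each filing as a node, each linked pair as an edge, and each connected component as one partnership. The function name and the sample filing identifiers are hypothetical.

```python
from collections import defaultdict


def connected_groups(pairs):
    """Group record identifiers into connected components with union-find.

    pairs: iterable of (id_a, id_b) tuples declared as links.
    Returns a sorted list of sorted groups.
    ASSUMPTION: one connected component corresponds to one partnership.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root,
        # halving the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical linked row-pairs produced by an internal linkage step.
linked_pairs = [("F1", "F2"), ("F2", "F3"), ("F7", "F8")]
print(connected_groups(linked_pairs))
```

A consequence of this design is that links are treated as transitive: F1 and F3 land in the same group even though they were never directly paired, so a single false link can merge two unrelated groups.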
Reference (7 results)
- Surveys and statistical programs – Documentation: 84-538-X
Geography: Canada
Description: This electronic publication presents the methodology underlying the production of the life tables for Canada, provinces and territories.
Release date: 2023-08-28
- Surveys and statistical programs – Documentation: 82-225-X200701010508
Description:
The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.
Release date: 2008-01-18
- Surveys and statistical programs – Documentation: 11-522-X20050019476
Description:
The paper will show how, using data published by Statistics Canada and available from member libraries of the CREPUQ, a linkage approach using postal codes makes it possible to link the data from the outcomes file to a set of contextual variables. These variables could then contribute to producing, on an exploratory basis, a better index to explain the varied outcomes of students from schools. In terms of the impact, the proposed index could show more effectively the limitations of ranking students and schools when this information is not given sufficient weight.
Release date: 2007-03-02
- Surveys and statistical programs – Documentation: 68-514-X
Description:
Statistics Canada's approach to gathering and disseminating economic data has developed over several decades into a highly integrated system for collection and estimation that feeds the framework of the Canadian System of National Accounts.
The key to this approach was creation of the Unified Enterprise Survey, the goal of which was to improve the consistency, coherence, breadth and depth of business survey data.
The UES did so by bringing many of Statistics Canada's individual annual business surveys under a common framework. This framework included a single survey frame, a sample design framework, conceptual harmonization of survey content, means of using relevant administrative data, common data collection, processing and analysis tools, and a common data warehouse.
Release date: 2006-11-20
- Surveys and statistical programs – Documentation: 89-612-X
Description:
This paper describes the structure and linkage of two databases: the Longitudinal Administrative Databank (LAD), and the Longitudinal Immigration Database (IMDB). The combined data associate landed immigrant taxfilers on the LAD with their key characteristics upon immigration. The paper highlights how the combined information, referred to here as the LAD_IMDB, enhances and complements the existing separate databases. The paper compares the full IMDB file with the sample of immigrants to assess the representativeness of the sample file.
Release date: 2004-01-05
- Surveys and statistical programs – Documentation: 81-595-M2003005
Geography: Canada
Description:
This paper develops technical procedures that may enable ministries of education to link provincial tests with national and international tests in order to compare standards and report results on a common scale.
Release date: 2003-05-29
- Surveys and statistical programs – Documentation: 85-602-X
Description:
The purpose of this report is to provide an overview of existing methods and techniques making use of personal identifiers to support record linkage. Record linkage can be loosely defined as a methodology for manipulating and / or transforming personal identifiers from individual data records from one or more operational databases and subsequently attempting to match these personal identifiers to create a composite record about an individual. Record linkage is not intended to uniquely identify individuals for operational purposes; however, it does provide probabilistic matches of varying degrees of reliability for use in statistical reporting. Techniques employed in record linkage may also be of use for investigative purposes to help narrow the field of search against existing databases when some form of personal identification information exists.
Release date: 2000-12-05