Statistical methods

Results

All (2,478)

  • Journals and periodicals: 12-206-X
    Description: This report summarizes the annual achievements of the Methodology Research and Development Program (MRDP) sponsored by the Modern Statistical Methods and Data Science Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the agency’s statistical programs; these activities would otherwise be less likely to be carried out during the provision of regular methodology services to those programs. The MRDP also includes activities that provide support in the application of past successful developments in order to promote the use of the results of research and development work. Selected prospective research activities are also presented.
    Release date: 2025-10-10

  • Articles and reports: 18-001-X2025001
    Description: This paper brings the analysis of business clusters to a more granular geographic scale by developing a methodology for identifying business clusters at the neighbourhood level. The proposed method identifies clusters of businesses at the dissemination block (DB) level, one of the most granular spatial units of analysis defined by Statistics Canada. The method is developed with an application to four census metropolitan areas (CMAs) of different sizes and to different industry cluster specifications, including simple 2-digit North American Industry Classification System (NAICS) groups as well as industry clusters resulting from groupings of NAICS codes, as defined by Delgado et al. (2014).
    Release date: 2025-10-10

  • Articles and reports: 11-522-X202500100001
    Description: Synthetic data generation (SDG) is increasingly applied across sectors for privacy-preserving data sharing, de-biasing and augmentation. Each use case requires a distinct set of evaluation metrics that must account for the stochasticity of the SDG process: membership and attribute disclosure vulnerability are critical for privacy; fidelity and downstream task utility apply more broadly; and fairness and diversity are relevant for de-biasing and augmentation, respectively. Drawing on accumulated evidence and exemplar case studies, the paper shows that SDG can perform well across many of these use cases and shares key lessons learned from working with synthetic health data.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100002
    Description: Under the consumer-merchant bipartite network, we apply the indirect sampling approach to estimate merchant payment acceptance through a consumer payment diary. The records of in-person transactions in the consumer diary provide both the merchant sample, via consumer-merchant linkages, and merchant acceptance, via consumers' responses on the methods of payment used and accepted. Among merchants receiving multiple transactions during the diary period, we show that the payment acceptance derived from consumer reports is of high quality, with very few conflicts between usage and perception, or among perceptions. Consumers can therefore serve as both sampling and reporting units in our indirect sampling application, eliminating merchant response burden. We also show that a weight adjustment is needed to account for the bias from unrecorded merchants, which arises from the relatively short duration of the diary (3 days). Finally, the indirect sampling estimates are compared with those from a direct sampling survey, and the results are found to align well.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100003
    Description: In-person data collection is critical for the success of many large government-sponsored surveys. Despite declining response rates and increasing costs, the mode remains the gold standard for meeting the most rigorous requirements of federal survey programs, particularly as part of a multimode data collection strategy (Schober, 2018). However, over the last ten years, critical labor market and workforce changes, exacerbated by the pandemic, have made in-person data collection prohibitive for all but the largest survey organizations. Shifting ideas about job flexibility and job satisfaction, alongside the increasingly technical and demanding nature of the job, have affected recruitment and retention for survey organizations across the U.S. and Europe (Charman et al., 2024). The paper summarizes trends in U.S. field data collector employment and outlines promising practices for recruiting and retaining high-quality field data collectors. It also considers broader ways to structure the field data collector labor force for continued success, including supplementing field data collection with multimode alternatives such as video interviewing and updating value propositions for respondents.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100004
    Description: The Survey of Household Spending (SHS) conducted by Statistics Canada collects paper diaries and shopping receipts as a source of household expenditure data. An auto-capture algorithm was created for SHS 2023 to reduce the manual work statistical clerks perform to extract important information from scanned receipts of common store brands. The algorithm used Tesseract optical character recognition (OCR) to extract text from images of receipts and identified store and product entities using regular expressions (regex). The goal of this study was to enhance the current auto-capture algorithm by experimenting with more advanced OCR and machine learning methods. As a result, PaddleOCR, an open-source OCR toolkit, was selected as the new default OCR engine because of its overall performance in accurately recognizing text, especially digits, across receipts of varying quality. Additionally, entity classifiers based on support vector machines were trained on historical SHS records and existing regex patterns. Using classifiers to categorize the elements on receipts, instead of relying solely on regex patterns, improved product and store recognition. This new algorithm is expected to be used for SHS 2025 to improve auto-capture quality and reduce the manual burden of capturing receipt variables.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100005
    Description: The Physical Flow Account for Plastic Material (PFAPM) aims to enhance environmental-economic analysis by tracking plastic material flows within the Canadian economy. To help streamline this complex process, the project used advanced natural language processing (NLP) techniques, such as large language models (LLMs), to automate sector classification and to summarize the impact of COVID-19 from company reports. By integrating machine learning models and retrieval-augmented generation (RAG) methods, the project significantly reduced the manual workload, improved the efficiency of data analysis and produced higher-quality insights.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100006
    Description: Small area estimation is frequently used to produce estimates at a disaggregated level where direct survey estimation does not have sufficient sample to produce precise estimates. This is often done using the area-level Fay-Herriot model, by assuming that the direct estimates are independent under the design and have a known variance, and by applying a smoothing process to the variance estimates of the direct estimates to better meet that last assumption. Small area estimates are also commonly benchmarked (raked) to direct estimates at an aggregated level. This article shows that wrongly assuming independence can have a large impact on the MSE of the raked estimates. Values of the covariances between direct estimates are thus required for good point and MSE estimates, but estimating those covariances well is difficult given the small sample sizes in some areas. An original way of deriving values for those covariances, by reverse-engineering a hypothetical raking process, is presented.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100007
    Description: This paper applies the Pseudo Maximum Likelihood (PML) estimator to non-probability two-phase sampling when relevant auxiliary information is available from both a probability survey sample and a non-probability survey sample. To accommodate various weight adjustments and to estimate the variance of statistics beyond totals and means, such as medians and quantiles, a simplified pseudo-population bootstrap procedure is proposed to approximate the second-phase variance. Specifically, the simplification ignores the second-phase sampling variability (i.e., treats it as fixed, while in fact it is random) if the first-phase sampling fraction of the non-probability sample is negligible. Using the Bank of Canada 2020 Cash Alternative Survey Wave 2, the performance of the proposed method is compared with alternative methods that either do not explicitly model the selection probability (i.e., raking) or ignore the valuable information from Phase 1 (i.e., Phase-2-Only). The results show that the PML-based approach performs better than the raking and Phase-2-Only estimates in reducing selection bias for payment-related variables in both phases, especially for the low-response youth group. The estimated variances of the PML-based estimates are stable.
    Release date: 2025-09-08

  • Articles and reports: 11-522-X202500100008
    Description: In 2020, Statistics Canada started to use probabilistic web panels as an alternative method of collecting official statistics. In a web panel, respondents to another survey are asked for contact information so that they can be invited to participate in future short surveys. This paper highlights Statistics Canada's experience with panels after four years, including what has been learned about recruiting panel participants and about subsequently collecting data through panel surveys. The way recruitment questions are presented can result in very different participation rates. Moreover, the wealth of auxiliary information available from the recruitment survey can be used to actively manage panel collection operations by predicting the probability of response and using these predictions to target follow-up efforts.
    Release date: 2025-09-08
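
The indirect sampling entry (11-522-X202500100002) reaches merchants through consumer-merchant links. The abstract does not state its estimator; a common choice in indirect sampling is the generalized weight share method (GWSM), sketched below under that assumption. The function names, the data layout and the assumption that each merchant's total number of links in the consumer population is known are all hypothetical, and the paper's adjustment for unrecorded merchants is not reproduced.

```python
from collections import defaultdict


def gwsm_weights(consumer_weights, links, total_links):
    """Generalized weight share method (GWSM) sketch for indirect sampling.

    consumer_weights: {consumer_id: design weight of a sampled consumer}
    links: (consumer_id, merchant_id) pairs observed in the diaries,
           one pair per consumer-merchant link
    total_links: {merchant_id: total number of links the merchant has with
                  the whole consumer population} (assumed known here)
    Returns {merchant_id: estimation weight}.
    """
    shared = defaultdict(float)
    for consumer, merchant in links:
        # Each observed link passes the consumer's weight to the merchant.
        shared[merchant] += consumer_weights[consumer]
    # Sharing the weight over all of the merchant's links gives an unbiased
    # estimator, provided every merchant has at least one link.
    return {m: w / total_links[m] for m, w in shared.items()}


def acceptance_rate(merchant_weights, accepts):
    """Weighted share of linked merchants that accept a payment method."""
    total = sum(merchant_weights.values())
    return sum(w for m, w in merchant_weights.items() if accepts[m]) / total


weights = gwsm_weights(
    {"c1": 100.0, "c2": 50.0},
    [("c1", "m1"), ("c2", "m1"), ("c2", "m2")],
    {"m1": 4, "m2": 1},
)
print(weights)                                              # {'m1': 37.5, 'm2': 50.0}
print(acceptance_rate(weights, {"m1": True, "m2": False}))  # about 0.4286
```

Dividing by the merchant's total link count is what prevents merchants visited by many consumers from being over-represented.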
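
For the small area estimation entry (11-522-X202500100006), a minimal sketch of the area-level Fay-Herriot estimator followed by ratio benchmarking may help fix ideas. It assumes a single auxiliary variable with no intercept, treats the (smoothed) sampling variances and the model variance as known, and uses simple ratio raking; the function names are hypothetical. It does not compute MSEs, which, as the entry notes, depend on the covariances between direct estimates.

```python
def fay_herriot(direct, aux, sampling_var, model_var):
    """Area-level Fay-Herriot EBLUP sketch: theta_i = g_i*y_i + (1 - g_i)*x_i*beta.

    direct: direct estimates y_i; aux: auxiliary values x_i (one covariate,
    no intercept); sampling_var: sampling variances psi_i, treated as known
    (e.g. after smoothing); model_var: model variance sigma_v^2, assumed
    already estimated.
    """
    # Weighted least squares for beta with weights 1 / (sigma_v^2 + psi_i).
    w = [1.0 / (model_var + psi) for psi in sampling_var]
    beta = (sum(wi * x * y for wi, x, y in zip(w, aux, direct))
            / sum(wi * x * x for wi, x in zip(w, aux)))
    estimates = []
    for y, x, psi in zip(direct, aux, sampling_var):
        gamma = model_var / (model_var + psi)  # weight given to the direct estimate
        estimates.append(gamma * y + (1.0 - gamma) * x * beta)
    return estimates


def ratio_benchmark(estimates, area_sizes, aggregate_total):
    """Scale all estimates by one factor so that sum(N_i * theta_i) equals
    the aggregate direct estimate."""
    factor = aggregate_total / sum(n * t for n, t in zip(area_sizes, estimates))
    return [t * factor for t in estimates]


theta = fay_herriot([12.0, 18.0], [1.0, 2.0], [1.0, 1.0], 1.0)
print(theta)                                     # approximately [10.8, 18.6]
print(ratio_benchmark(theta, [1.0, 1.0], 30.0))  # rescaled to sum to 30
```

Areas with larger sampling variance receive a smaller gamma and are pulled harder towards the regression prediction; the single benchmarking factor then redistributes the aggregate discrepancy proportionally.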
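
The SHS entry (11-522-X202500100004) describes a baseline that identifies store and product entities in OCR output with regular expressions. A rough sketch of that kind of rule-based step follows; the store name, the patterns and the receipt layout are invented for illustration, and the support vector machine classifiers that the study added are not shown.

```python
import re

# Hypothetical store pattern; a real system would hold one per common store brand.
STORE_PATTERNS = {"GroceryMart": re.compile(r"\bGROCERY\s*MART\b")}
# Hypothetical item line: a description, then a price with two decimals,
# optionally followed by a one-letter tax code at the end of the line.
ITEM_LINE = re.compile(
    r"^(?P<desc>[A-Z0-9][A-Z0-9 %&'./-]*?)\s+(?P<price>\d+\.\d{2})\s*[A-Z]?$")
NOT_ITEMS = ("SUBTOTAL", "TOTAL", "TAX", "GST", "HST")


def parse_receipt(ocr_text):
    """Return (store, [(description, price), ...]) from OCR'd receipt text."""
    store, items = None, []
    for raw in ocr_text.splitlines():
        line = raw.strip().upper()
        if store is None:
            store = next((name for name, pat in STORE_PATTERNS.items()
                          if pat.search(line)), None)
        match = ITEM_LINE.match(line)
        if match and not match.group("desc").startswith(NOT_ITEMS):
            items.append((match.group("desc"), float(match.group("price"))))
    return store, items


receipt = "Grocery Mart #123\n2% MILK 4L   5.49 F\nBANANAS   1.87\nSUBTOTAL   7.36\n"
print(parse_receipt(receipt))
```

Rules like these are brittle when OCR drops or misreads characters, which is the weakness the entry's classifier-based approach targets.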
Data (10)

  • Public use microdata: 89F0002X
    Description: The SPSD/M is a static microsimulation model designed to analyse financial interactions between governments and individuals in Canada. It can compute taxes paid to, and cash transfers received from, government. It consists of a database, a series of tax/transfer algorithms and models, analytical software and user documentation.
    Release date: 2026-02-12

  • Profile of a community or region: 46-26-0002
    Description: The National Address Register (NAR) is a list of commercial and residential addresses in Canada that are extracted from Statistics Canada's Building Register and deemed non-confidential.
    Release date: 2025-12-19

  • Table: 89-26-0006
    Description: PASSAGES is an open-source dynamic microsimulation model aimed at supporting policy analysis and research relating to Canadian retirement income system outcomes at the individual and family level. The publicly available version includes a synthetic starting database, a model, and documentation. A confidential starting database is also available.
    Release date: 2025-03-12

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including the neighbourhood level. Users can visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery and on topographic and street layers.
    Release date: 2024-08-21

  • Table: 11-10-0074-01
    Geography: Census tract
    Frequency: Occasional
    Description:

    The divergence index (D-index) describes the degree to which families with different income levels are mixing together in neighbourhoods. It compares each neighbourhood’s (census tract, CT) discrete income distribution to a base distribution, which is the income quintiles of the neighbourhood’s census metropolitan area (CMA).

    Release date: 2020-06-22

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30

  • Table: 53-500-X
    Description:

    This report presents the results of a pilot survey conducted by Statistics Canada to measure the fuel consumption of on-road motor vehicles registered in Canada. This study was carried out in connection with the Canadian Vehicle Survey (CVS) which collects information on road activity such as distance traveled, number of passengers and trip purpose.

    Release date: 2004-10-21

  • Table: 13-220-X
    Description: In the 1997 edition, new and revised benchmarks were introduced for 1992 and 1988. The indicators are used to monitor supply, demand and employment for tourism in Canada on a timely basis. The annual tables are derived using the National Income and Expenditure Accounts (NIEA) and various industry and travel surveys. Tables providing actual data and percentage changes for seasonally adjusted current and constant price estimates are included. In addition, an analytical section provides graphs as well as time series of first differences, percentage changes and seasonal factors for selected indicators. Data are published from 1987 onward, and the publication will be available on the day of release. New data are included in the demand tables for non-tourism commodities produced by non-tourism industries and in the employment tables covering direct tourism employment generated by non-tourism industries. This product was commissioned by the Canadian Tourism Commission to provide annual updates for the Tourism Satellite Account.
    Release date: 2003-01-08

  • Table: 11-516-X
    Description:

    The second edition of Historical statistics of Canada was jointly produced by the Social Science Federation of Canada and Statistics Canada in 1983. This volume contains about 1,088 statistical tables on the social, economic and institutional conditions of Canada from the start of Confederation in 1867 to the mid-1970s. The tables are arranged in sections with an introduction explaining the content of each section, the principal sources of data for each table, and general explanatory notes regarding the statistics. In most cases, there is sufficient description of the individual series to enable the reader to use them without consulting the numerous basic sources referenced in the publication.

    The electronic version of this historical publication is available free of charge on the Statistics Canada website as a downloadable document: the text as HTML pages and all tables as individual spreadsheets in comma-separated values (CSV) format, which allows online viewing or downloading.

    Release date: 1999-07-29

  • Table: 82-567-X
    Description:

    The National Population Health Survey (NPHS) is designed to enhance the understanding of the processes affecting health. The survey collects cross-sectional as well as longitudinal data. In 1994/95 the survey interviewed a panel of 17,276 individuals, then returned to interview them a second time in 1996/97. The response rate for these individuals was 96% in 1996/97. Data collection from the panel will continue for up to two decades. For cross-sectional purposes, data were collected for a total of 81,000 household residents in all provinces (except people on Indian reserves or on Canadian Forces bases) in 1996/97.

    This overview illustrates the variety of information available by presenting data on perceived health, chronic conditions, injuries, repetitive strains, depression, smoking, alcohol consumption, physical activity, consultations with medical professionals, use of medications and use of alternative medicine.

    Release date: 1998-07-29
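
The D-index entry (11-10-0074-01) compares each census tract's income distribution with its CMA's income quintiles. Assuming the index is the relative entropy (Kullback-Leibler divergence) of the tract distribution from the base distribution, computed with natural logarithms (both assumptions here, not confirmed by the entry), the calculation for one tract can be sketched as follows; the function name and inputs are hypothetical and the published table may differ in detail.

```python
import math


def divergence_index(counts_by_cma_quintile, base_shares=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Relative entropy of a census tract's income distribution from a base distribution.

    counts_by_cma_quintile: number of families in the tract whose income falls
    in each income quintile of the tract's CMA. By construction the CMA itself
    has 20% of families in each quintile, hence the default base shares.
    Returns 0 when the tract mirrors its CMA; larger values mean less mixing.
    """
    total = sum(counts_by_cma_quintile)
    index = 0.0
    for count, base in zip(counts_by_cma_quintile, base_shares):
        share = count / total
        if share > 0:  # convention: 0 * log(0 / base) = 0
            index += share * math.log(share / base)
    return index


print(divergence_index([20, 20, 20, 20, 20]))  # 0.0: perfectly mixed tract
print(divergence_index([100, 0, 0, 0, 0]))     # log(5), about 1.609: one quintile only
```

With five equal base shares the index is bounded above by log(5), reached when every family in the tract falls in a single CMA quintile.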
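
The SPSD/M entry (89F0002X) describes a static microsimulation model that computes taxes and transfers. The core idea of static microsimulation, applying program rules to each microdata record and aggregating with survey weights without modelling behavioural responses, can be sketched as below. The tax brackets, the transfer rule and all names are hypothetical and unrelated to the actual SPSD/M algorithms.

```python
from dataclasses import dataclass


@dataclass
class Record:
    weight: float  # number of people the microdata record represents
    income: float  # annual taxable income, in dollars


# Hypothetical tax schedule: (upper bound of bracket, marginal rate).
BRACKETS = [(50_000.0, 0.15), (float("inf"), 0.25)]
# Hypothetical income-tested transfer: maximum amount, phase-out start and rate.
TRANSFER_MAX, PHASE_OUT_START, PHASE_OUT_RATE = 1_000.0, 20_000.0, 0.05


def income_tax(income):
    owed, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if income > lower:
            owed += (min(income, upper) - lower) * rate
        lower = upper
    return owed


def transfer(income):
    reduction = max(0.0, income - PHASE_OUT_START) * PHASE_OUT_RATE
    return max(0.0, TRANSFER_MAX - reduction)


def simulate(records):
    """Static microsimulation: apply the rules to each record, then weight up.

    Incomes are held fixed, i.e. no behavioural response to the rules is modelled.
    """
    taxes = sum(r.weight * income_tax(r.income) for r in records)
    transfers = sum(r.weight * transfer(r.income) for r in records)
    return taxes, transfers


print(simulate([Record(100.0, 30_000.0), Record(50.0, 60_000.0)]))
```

Changing a rate in `BRACKETS` and rerunning `simulate` gives a first-order, static estimate of a reform's cost, which is the kind of question such models are used to answer.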
Analysis (2,036)

  • Articles and reports: 12-001-X197900254832
    Description:

    A Hot Deck imputation procedure is defined to be one where an incomplete response is completed by using values from one or more other records on the same file and the choice of these records varies with the record requiring imputation.

    General approaches to Hot Deck imputation are outlined, with emphasis on the interaction between the edit constraints and the imputation procedures. Distance functions can be constructed on a mixture of categorical and numeric fields, can be modified to take account of the relative importance of fields and can discriminate against less desirable donors. Matching fields may be correlated with missing fields, may be linked with missing fields by edits or may be natural stratification variables; but increasing the number of matching fields does not necessarily result in a better match. It is important to audit the imputation process and to summarize its performance.

    Hot Deck procedures should be evaluated to study the bias and reliability of the estimates, donor usage and frequency of imputation failure in terms of a variety of conditions of the data and variations of the imputation procedure. It appears that the only generally available approach to evaluation is by simulation.

    Release date: 1979-12-14

  • Articles and reports: 12-001-X197900254833
    Description:

    This paper looks at the current state of development of social statistics in Canada. Some key concepts related to statistics and social information are defined and discussed. The availability and analysis of administrative data is highlighted, along with the need for social surveys. Suggestions are made about the types of data analysis needed for the development of social decision models to meet policy requirements. Finally, an outline of priorities for future work toward the effective use of social statistics is given.

    Release date: 1979-12-14

  • Articles and reports: 12-001-X197900100001
    Description: This paper discusses the management of information within the context of the information industry and indicates some likely future trends related thereto. The information industry itself is first briefly described. Then the process used in producing information, the organizational structure required for such production, and the legislation relating to the information industry are discussed in turn. Finally, some approaches to solving the problems of the future are suggested.
    Release date: 1979-06-15

  • Articles and reports: 12-001-X197900100002
    Description: This paper includes a description of interviewer techniques and procedures used to minimize non-response, an outline of methods used to monitor and control non-response, and a discussion of how non-respondents are treated in the data processing and estimation stages of the Canadian Labour Force Survey. Recent non-response rates as well as data on the characteristics of non-respondents are also given. It is concluded that a yearly non-response rate of approximately 5 percent is probably the best that can be achieved in the Labour Force Survey.
    Release date: 1979-06-15

  • Articles and reports: 12-001-X197900100003
    Description: Two methods for estimating the correlated response variance of a survey estimator are studied by way of both theoretical comparison and empirical investigation. The variance of these estimators is discussed and the effects of outliers examined. Finally, an improved estimator is developed and evaluated.
    Release date: 1979-06-15

  • Articles and reports: 12-001-X197900100004
    Description: Let U = {1, 2, …, i, …, N} be a finite population of N identifiable units. A known “size measure” x_i is associated with unit i, for i = 1, 2, …, N. A sampling procedure is proposed for selecting a sample of size n (2 < n < N) from the population with probability proportional to size (PPS) and without replacement (WOR). With this method, the inclusion probability of each unit in the population is proportional to its size (IPPS).
    Release date: 1979-06-15

  • Articles and reports: 12-001-X197900100005
    Description: Approximate cutoff rules for stratifying a population into a take-all and take-some universe have been given by Dalenius (1950) and Glasser (1962). They expressed the cutoff value (that value which delineates the boundary of the take-all and take-some) as a function of the mean, the sampling weight and the population variance. Their cutoff values were derived on the assumption that a single random sample of size n was to be drawn without replacement from the population of size N.

    In the present context, exact and approximate cutoff rules have been worked out for a similar situation in which, rather than the sample size, the required precision (coefficient of variation) is given. Note that in many sampling situations, the sampler is given a set of objectives in terms of reliability rather than sample size. The result is particularly useful for determining the take-all/take-some boundary for samples drawn from a known population. The procedure is also extended to ratio estimation.
    Release date: 1979-06-15

  • Articles and reports: 12-001-X197900100006
    Description: Under a sequential sampling plan, the proportion defective in the sample is generally a biased estimator of the population value. In this paper, an unbiased estimator is given. Also, an unbiased estimator of its variance is derived. These results are applied to an estimation problem from the 1976 Canadian Census.
    Release date: 1979-06-15

  • Articles and reports: 12-001-X197800254832
    Description: I.P. Fellegi and D. Holt proposed a systematic approach to automatic edit and imputation. One implementation of this proposal was a Generalized Edit and Imputation System by the Hot-Deck Approach, which was used for the edit and imputation of the 1976 Canadian Census of Population and Housing. This paper discusses that application and evaluates the strengths and weaknesses of the methodology with some empirical evidence. The system is considered in relation to the general issues of editing and imputing survey data, and some directions for future development are also considered.
    Release date: 1978-12-15

  • Articles and reports: 12-001-X197800254833
    Description: Owners of small businesses complain about the quantity of forms they are required to submit to collectors of statistics. Administrative data are an alternative source but do not usually include all the information required by survey takers.

    The “Tax Data Imputation System” makes use of tax data collected from a large number of businesses by Revenue Canada and data obtained by sample survey for a small subset of these businesses. Survey data are imputed (estimated) for all the businesses not actually surveyed using a “hot-deck” technique, with adjustments made to ensure that certain edit rules are satisfied. The results of a simulation study suggest that this procedure has reasonable statistical properties: estimators (of means or totals) are unbiased, with variances comparable in size to those of the corresponding ratio estimators.
    Release date: 1978-12-15
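
The hot deck entry (12-001-X197900254832) describes distance functions over mixed categorical and numeric fields, weights for the relative importance of fields, penalties against less desirable donors and the interaction with edit constraints. A minimal nearest-neighbour sketch combining those elements follows; the function names, the range-scaled distance and the usage-count penalty are assumptions rather than the paper's specification.

```python
def distance(recipient, donor, numeric_ranges, weights):
    """Weighted mixed-type distance over the matching fields listed in `weights`.

    Categorical fields add 0 (match) or 1 (mismatch); numeric fields add the
    absolute difference scaled by the field's range. Weights set importance.
    """
    d = 0.0
    for field, w in weights.items():
        a, b = recipient[field], donor[field]
        if field in numeric_ranges:
            d += w * abs(a - b) / numeric_ranges[field]
        else:
            d += w * (a != b)
    return d


def hot_deck_impute(recipient, donors, missing_fields, numeric_ranges, weights,
                    passes_edits=lambda record: True, usage=None, reuse_penalty=0.0):
    """Fill `missing_fields` from the nearest acceptable donor.

    Donors whose values would make the completed record fail the edits are
    skipped, and `reuse_penalty` discourages donors that have already been
    used (counts kept in `usage`). Returns (completed record, donor index),
    or (None, None) when no donor is acceptable, i.e. an imputation failure.
    """
    usage = {} if usage is None else usage
    best_index, best_score, best_record = None, float("inf"), None
    for i, donor in enumerate(donors):
        candidate = dict(recipient)
        for field in missing_fields:
            candidate[field] = donor[field]
        if not passes_edits(candidate):
            continue
        score = (distance(recipient, donor, numeric_ranges, weights)
                 + reuse_penalty * usage.get(i, 0))
        if score < best_score:
            best_index, best_score, best_record = i, score, candidate
    if best_index is not None:
        usage[best_index] = usage.get(best_index, 0) + 1
    return best_record, best_index


donors = [
    {"province": "ON", "age": 40, "income": 50_000},
    {"province": "QC", "age": 41, "income": 70_000},
    {"province": "ON", "age": 60, "income": 30_000},
]
recipient = {"province": "ON", "age": 42, "income": None}
record, donor = hot_deck_impute(recipient, donors, ["income"], {"age": 80},
                                {"province": 2.0, "age": 1.0})
print(donor, record["income"])  # 0 50000
```

Keeping the `usage` counts across recipients also gives the donor-usage statistics that the entry recommends auditing.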
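
The IPPS entry (12-001-X197900100004) does not describe its selection procedure in the abstract. For comparison, systematic PPS sampling is a widely used, different method that also achieves inclusion probabilities π_i = n x_i / X (with X the population total of the x_i) whenever n x_i ≤ X for every unit. The sketch below is that standard method, not the procedure proposed in the paper, and the function name is hypothetical.

```python
import random


def systematic_pps(sizes, n, rng=random):
    """Systematic PPS sampling without replacement (a standard IPPS method).

    Unit i is selected with probability n * x_i / X, where X = sum(sizes),
    provided n * x_i <= X for every unit; larger units should be removed
    as certainty selections first. Returns the 0-based indices selected.
    """
    total = float(sum(sizes))
    if any(n * x > total for x in sizes):
        raise ValueError("n * x_i exceeds X for some unit; take it with certainty first")
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, j = [], 0.0, 0
    for i, x in enumerate(sizes):
        cumulative += x
        # Unit i covers [cumulative - x_i, cumulative); since x_i <= step,
        # at most one selection point can fall inside it.
        while j < n and points[j] < cumulative:
            selected.append(i)
            j += 1
    return selected


print(systematic_pps([1, 2, 3, 4], 2, rng=random.Random(1)))
```

A known limitation of systematic PPS is that some pairs of units can never be selected together (their joint inclusion probability is zero), so design-unbiased variance estimation is not possible.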
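
The cutoff entry (12-001-X197900100005) sets the take-all/take-some boundary from a precision target rather than a sample size. A brute-force sketch of that idea follows: it assumes simple random sampling without replacement in the take-some stratum, uses the size variable as a stand-in for the survey variable and scans every candidate boundary. It is not the exact or approximate rule derived in the paper, and the function name and the minimum take-some size are hypothetical.

```python
import math
import statistics


def take_all_cutoff(sizes, target_cv, min_take_some=2):
    """Brute-force choice of the take-all/take-some boundary for a CV target (> 0).

    Units are sorted in decreasing size; the k largest are taken with certainty
    and a simple random sample without replacement (SRSWOR) of n units is drawn
    from the remaining N_s. The size variable stands in for the survey variable.
    Returns (k, n, k + n) with the smallest total sample size.
    """
    y = sorted(sizes, reverse=True)
    big_n, grand_total = len(y), sum(y)
    best = (big_n, 0, big_n)  # census of all units: CV = 0
    for k in range(big_n - 1):  # keep at least two units in the take-some stratum
        rest = y[k:]
        n_s = len(rest)
        s2 = statistics.variance(rest)  # S^2 with divisor N_s - 1
        # Var(total) = N_s^2 (1/n - 1/N_s) S^2; setting it to (cv * Y)^2 gives
        # n = N_s^2 S^2 / ((cv * Y)^2 + N_s S^2), rounded up.
        n = math.ceil(n_s ** 2 * s2 / ((target_cv * grand_total) ** 2 + n_s * s2))
        n = min(max(n, min_take_some), n_s)
        if k + n < best[2]:
            best = (k, n, k + n)
    return best


print(take_all_cutoff([1000, 500, 20, 18, 15, 12, 10, 9, 8, 5], 0.05))  # (2, 2, 4)
```

In this skewed example, taking the two largest units with certainty collapses the take-some variance, so a tiny take-some sample meets the 5% CV target.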
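
The sequential sampling entry (12-001-X197900100006) does not name its sampling plan in the abstract. The best-known example of the same phenomenon is inverse binomial sampling, where items are inspected until c defectives are found: the naive proportion c/n is biased upward, while Haldane's estimator (c - 1)/(n - 1) is unbiased for c ≥ 2. The sketch below checks this numerically by summing over the negative binomial distribution; it illustrates the general issue and is not necessarily the estimator derived in the paper. Function names are hypothetical.

```python
from math import comb


def haldane_estimate(c, n):
    """Unbiased estimate of the proportion defective under inverse binomial
    sampling: items are inspected until c >= 2 defectives are found, taking n items."""
    return (c - 1) / (n - 1)


def naive_estimate(c, n):
    """Sample proportion c / n, which is biased upward under this plan."""
    return c / n


def expected_value(estimator, p, c, max_n=5_000):
    """E[estimator] by summing over P(N = n) = C(n-1, c-1) p^c (1-p)^(n-c).

    The sum is truncated at max_n, which is harmless unless p is tiny.
    """
    return sum(comb(n - 1, c - 1) * p ** c * (1 - p) ** (n - c) * estimator(c, n)
               for n in range(c, max_n + 1))


print(expected_value(haldane_estimate, 0.1, 3))  # 0.1 up to rounding error
print(expected_value(naive_estimate, 0.1, 3))    # noticeably above 0.1
```

The identity behind the result is C(n-1, c-1)(c-1)/(n-1) = C(n-2, c-2), which turns the expectation into p times a negative binomial probability sum equal to one.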
Reference (380)

  • Surveys and statistical programs – Documentation: 13-605-X19970018521
    Description:

    A historical revision of the National Economic and Financial Accounts was published on December 12, 1997. This historical revision had three goals.

    Release date: 1997-12-12

  • Notices and consultations: 62-010-X19970023422
    Description:

    The current official time base of the Consumer Price Index (CPI) is 1986=100. This time base was first used when the CPI for June 1990 was released. Statistics Canada is about to convert all price index series to the time base 1992=100. As a result, all constant dollar series will be converted to 1992 dollars. The CPI will shift to the new time base when the CPI for January 1998 is released on February 27th, 1998.

    Release date: 1997-11-17

  • Notices and consultations: 92-125-G
    Description:

    This consultation guide marks the beginning of the content consultation and testing process for the 2001 Census. A broad range of data users, including those in every level of government, national associations, non-government organizations, community groups, businesses and private sector, universities and the general public, will be asked to provide their comments on the questions asked, requirements for future census information, and the identification of data gaps.

    Release date: 1997-10-31

  • Notices and consultations: 87-003-X19970012882
    Geography: Canada
    Description:

    The purpose of this article is to inform Travel-log readers of the availability of a new analytical tool: the National Tourism Indicators. These estimates, which measure trends in tourism in Canada, are placed in perspective here, taking into account the concepts and definitions used in developing them.

    Release date: 1997-01-08

  • Surveys and statistical programs – Documentation: 13-604-M1996035
    Description:

    About once every five years, the System of National Accounts (SNA) is rebased to keep up with the evolution of prices in the economy. In other words, its aggregates at constant prices are recalculated in terms of the prices of a more recent time. Also, the System is revamped about once a decade to introduce new accounting conventions, improved methods of estimation and revised statistical classifications. These revisions will change the gross domestic product (GDP) of the past 70 years. Both types of revision are presently underway, with their results scheduled for release next year.

    This article takes an advance look at the likely effect of rebasing the SNA on the record of growth since 1992. It presents the results of an approximate rebasing of the expenditure-based GDP of the quarterly National Income and Expenditure Accounts (NIEA).

    Release date: 1996-08-30

  • Surveys and statistical programs – Documentation: 11F0019M1995083
    Geography: Canada
    Description:

    This paper examines the robustness of a measure of the average complete duration of unemployment in Canada to a host of assumptions used in its derivation. In contrast to the average incomplete duration of unemployment, which is a lagging cyclical indicator, this statistic is a coincident indicator of the business cycle. The impact of using a steady-state as opposed to a non-steady-state assumption, as well as the impact of various corrections for response bias, is explored. It is concluded that a non-steady-state estimator would be a valuable complement to the statistics on unemployment duration currently released by many statistical agencies, particularly Statistics Canada.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1993001
    Description:

    This paper discusses the advantages and disadvantages of an approach to collecting income data being tested for the Survey of Labour and Income Dynamics (SLID) whereby respondents would be encouraged to refer to their T1 income tax forms.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1993002
    Description:

    The paper provides question wording, lays out the possible responses, and maps out the flow of the questions for the Survey of Labour and Income Dynamics (SLID) labour interview questionnaire.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1993004
    Description:

    This paper provides a description of the data collection procedures and the question wordings for the income and wealth portion of the Survey of Labour and Income Dynamics (SLID), as well as some rationale for the chosen direction.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1993005
    Description:

    This paper presents general observations from the members of the Survey of Labour and Income Dynamics head office project team, a summary of responses from a subset of interviewers in the test who were asked to complete a debriefing questionnaire after completing the test, and detailed comments from the observers from Head Office.

    Release date: 1995-12-30
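
The CPI notice (62-010-X19970023422) describes moving from the 1986=100 time base to 1992=100. Rebasing an index amounts to dividing each value by the average of the old index over the new base period and multiplying by 100; converting current dollars to constant dollars then divides by the rebased index. A minimal sketch follows, with hypothetical function names and made-up index values (a two-month "base" keeps the example short). Official rebased series may be computed from unrounded indexes, so rebasing published rounded values can give slightly different results.

```python
def rebase(index, new_base_periods, base_value=100.0):
    """Re-express an index series on a new time base.

    index: {period: value} on the old base (e.g. 1986=100).
    new_base_periods: the periods that make up the new base (e.g. the twelve
    months of 1992). The new series averages `base_value` over those periods.
    """
    base_average = sum(index[p] for p in new_base_periods) / len(new_base_periods)
    return {p: value / base_average * base_value for p, value in index.items()}


def to_constant_dollars(current_dollars, deflator):
    """Convert a current-dollar amount to base-period dollars using an index with base 100."""
    return current_dollars / deflator * 100.0


# Made-up monthly values on the old base, for illustration only.
old = {"1992-01": 127.0, "1992-02": 129.0, "1998-01": 137.0}
new = rebase(old, ["1992-01", "1992-02"])
print(new["1998-01"])                               # 107.03125
print(to_constant_dollars(2_000.0, new["1998-01"]))
```

Rebasing rescales every value by the same factor, so period-to-period percentage changes are unaffected; only the reference level moves.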
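
The unemployment duration entry (11F0019M1995083) contrasts steady-state and non-steady-state measures of average complete duration. The sketch below assumes durations are grouped in whole months, with group 0 holding those unemployed for less than one month. The steady-state measure divides the stock by the inflow (proxied by group 0); the non-steady-state version chains the current month's continuation rates into a synthetic cohort. Both are simplified illustrations with hypothetical names, not the paper's estimators, and the cap on continuation rates is only a crude stand-in for the response-bias corrections the paper explores.

```python
def steady_state_duration(stock_by_duration):
    """Average complete duration in months under a steady state.

    stock_by_duration[d]: number unemployed with d whole months of elapsed
    duration (d = 0 means less than one month, used as the monthly inflow).
    In a steady state, stock = inflow * average complete duration.
    """
    return sum(stock_by_duration) / stock_by_duration[0]


def synthetic_cohort_duration(current, previous):
    """Non-steady-state sketch: chain this month's continuation rates
    c_d = U_t(d) / U_{t-1}(d-1) into survival probabilities S_d and sum them.

    current, previous: stocks by whole months of duration at months t and t-1.
    Continuation rates above 1 (possible with response error) are capped at 1.
    """
    survival, expected = 1.0, 1.0  # S_0 = 1 for the entering cohort
    for d in range(1, len(current)):
        if previous[d - 1] == 0:
            break
        survival *= min(1.0, current[d] / previous[d - 1])
        expected += survival
    return expected


# Steady state: inflow of 100 per month and a constant continuation rate of 0.8.
steady = [100 * 0.8 ** d for d in range(5)]
print(steady_state_duration(steady))              # about 3.3616
print(synthetic_cohort_duration(steady, steady))  # same value
```

Under a true steady state the two functions agree, because each duration group is then the inflow multiplied by the survival probability; when inflows are changing over the cycle, they diverge, which is the situation the entry's non-steady-state estimator is meant to handle.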