Statistical methods

Key indicators

Selected geographical area: Canada


Results

All (2,299) (30 to 40 of 2,299 results)

  • Stats in brief: 11-637-X
    Description: This product presents data on the Sustainable Development Goals. It provides an overview of the 17 Goals through infographics, leveraging currently available data to report on Canada’s progress towards the 2030 Agenda for Sustainable Development.
    Release date: 2024-01-25

  • Articles and reports: 11-633-X2024001
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years.
    Release date: 2024-01-22

  • Articles and reports: 13-604-M2024001
    Description: This documentation outlines the methodology used to develop the Distributions of household economic accounts published in January 2024 for the reference years 2010 to 2023. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.
    Release date: 2024-01-22

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2024-01-22

  • Stats in brief: 11-001-X202402237898
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2024-01-22

  • Articles and reports: 12-001-X202300200001
    Description: When a Medicare healthcare provider is suspected of billing abuse, a population of payments X made to that provider over a fixed timeframe is isolated. A certified medical reviewer, in a time-consuming process, can determine the overpayment Y = X - (amount justified by the evidence) associated with each payment. Typically, there are too many payments in the population to examine each with care, so a probability sample is selected. The sample overpayments are then used to calculate a 90% lower confidence bound for the total population overpayment. This bound is the amount demanded for recovery from the provider. Unfortunately, classical methods for calculating this bound sometimes fail to provide the 90% confidence level, especially when using a stratified sample.

    In this paper, 166 redacted samples from Medicare integrity investigations are displayed and described, along with 156 associated payment populations. The 7,588 examined (Y, X) sample pairs show (1) Medicare audits have high error rates: more than 76% of these payments were considered to have been paid in error; and (2) the patterns in these samples support an “All-or-Nothing” mixture model for (Y, X) previously defined in the literature. Model-based Monte Carlo testing procedures for Medicare sampling plans are discussed, as well as stratification methods based on anticipated model moments. In terms of viability (achieving the 90% confidence level) a new stratification method defined here is competitive with the best of the many existing methods tested and seems less sensitive to choice of operating parameters. In terms of overpayment recovery (equivalent to precision) the new method is also comparable to the best of the many existing methods tested. Unfortunately, no stratification algorithm tested was ever viable for more than about half of the 104 test populations.
    Release date: 2024-01-03
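
The classical bound this abstract refers to can be sketched as a stratified expansion estimate minus a one-sided normal-approximation margin. This is an illustrative sketch only (function and variable names are our own), not the paper's proposed stratification method:

```python
import math

def lower_bound_90(strata):
    """One-sided 90% lower confidence bound on a population total
    from a stratified simple random sample (normal approximation).
    strata: list of (N_h, sample_overpayments) pairs, where N_h is
    the stratum population size."""
    total, var = 0.0, 0.0
    for N_h, y in strata:
        n_h = len(y)
        mean = sum(y) / n_h
        s2 = sum((v - mean) ** 2 for v in y) / (n_h - 1)
        total += N_h * mean                                # expansion estimate
        var += N_h ** 2 * (1 - n_h / N_h) * s2 / n_h       # with FPC
    z90 = 1.2816  # standard-normal 90th percentile
    return total - z90 * math.sqrt(var)
```

The paper's point is precisely that this normal-based bound can fail to deliver 90% coverage when the overpayment distribution is an "All-or-Nothing" mixture, which motivates the model-based testing of sampling plans.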

  • Articles and reports: 12-001-X202300200002
    Description: Being able to quantify the accuracy (bias, variance) of published output is crucial in official statistics. Output in official statistics is nearly always divided into subpopulations according to some classification variable, such as mean income by categories of educational level. Such output is also referred to as domain statistics. In the current paper, we limit ourselves to binary classification variables. In practice, misclassifications occur and these contribute to the bias and variance of domain statistics. Existing analytical and numerical methods to estimate this effect have two disadvantages. The first disadvantage is that they require that the misclassification probabilities are known beforehand and the second is that the bias and variance estimates are biased themselves. In the current paper we present a new method, a Gaussian mixture model estimated by an Expectation-Maximisation (EM) algorithm combined with a bootstrap, referred to as the EM bootstrap method. This new method does not require that the misclassification probabilities are known beforehand, although it is more efficient when a small audit sample is used that yields a starting value for the misclassification probabilities in the EM algorithm. We compared the performance of the new method with currently available numerical methods: the bootstrap method and the SIMEX method. Previous research has shown that for non-linear parameters the bootstrap outperforms the analytical expressions. For nearly all conditions tested, the bias and variance estimates that are obtained by the EM bootstrap method are closer to their true values than those obtained by the bootstrap and SIMEX methods. We end this paper by discussing the results and possible future extensions of the method.
    Release date: 2024-01-03
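
The core building block of the EM bootstrap method is an EM fit of a Gaussian mixture. A minimal univariate two-component version can be sketched as follows; this is illustrative only and omits the bootstrap layer and the audit-sample initialisation of the misclassification probabilities:

```python
import math

def em_gauss2(x, iters=200):
    """EM for a two-component univariate Gaussian mixture (illustrative).
    Returns the mixing weight of component 1 and the (mean, sd) of
    both components."""
    mu1, mu2 = min(x), max(x)                 # crude initialisation
    s1 = s2 = (max(x) - min(x)) / 4 or 1.0
    pi1 = 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation
        r = []
        for v in x:
            p1 = pi1 * math.exp(-0.5 * ((v - mu1) / s1) ** 2) / s1
            p2 = (1 - pi1) * math.exp(-0.5 * ((v - mu2) / s2) ** 2) / s2
            r.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted parameter updates
        w1 = sum(r); w2 = len(x) - w1
        mu1 = sum(ri * v for ri, v in zip(r, x)) / w1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, x)) / w2
        s1 = math.sqrt(sum(ri * (v - mu1) ** 2 for ri, v in zip(r, x)) / w1) or 1e-9
        s2 = math.sqrt(sum((1 - ri) * (v - mu2) ** 2 for ri, v in zip(r, x)) / w2) or 1e-9
        pi1 = w1 / len(x)
    return pi1, (mu1, s1), (mu2, s2)
```

In the paper's setting, the fitted mixture stands in for the unknown misclassification mechanism, and a bootstrap over the fitted model yields the bias and variance estimates for the domain statistics.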

  • Articles and reports: 12-001-X202300200003
    Description: We investigate small area prediction of general parameters based on two models for unit-level counts. We construct predictors of parameters, such as quartiles, that may be nonlinear functions of the model response variable. We first develop a procedure to construct empirical best predictors and mean square error estimators of general parameters under a unit-level gamma-Poisson model. We then use a sampling importance resampling algorithm to develop predictors for a generalized linear mixed model (GLMM) with a Poisson response distribution. We compare the two models through simulation and an analysis of data from the Iowa Seat-Belt Use Survey.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200004
    Description: We present a novel methodology to benchmark county-level estimates of crop area totals to a preset state total subject to inequality constraints and random variances in the Fay-Herriot model. For planted area of the National Agricultural Statistics Service (NASS), an agency of the United States Department of Agriculture (USDA), it is necessary to incorporate the constraint that the estimated totals, derived from survey and other auxiliary data, are no smaller than administrative planted area totals prerecorded by other USDA agencies except NASS. These administrative totals are treated as fixed and known, and this additional coherence requirement adds to the complexity of benchmarking the county-level estimates. A fully Bayesian analysis of the Fay-Herriot model offers an appealing way to incorporate the inequality and benchmarking constraints, and to quantify the resulting uncertainties, but sampling from the posterior densities involves difficult integration, and reasonable approximations must be made. First, we describe a single-shrinkage model, shrinking the means while the variances are assumed known. Second, we extend this model to accommodate double shrinkage, borrowing strength across means and variances. This extended model has two sources of extra variation, but because we are shrinking both means and variances, it is expected that this second model should perform better in terms of goodness of fit (reliability) and possibly precision. The computations are challenging for both models, which are applied to simulated data sets with properties resembling the Illinois corn crop.
    Release date: 2024-01-03
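
The basic mechanics can be sketched as Fay-Herriot shrinkage followed by a benchmark step that respects the administrative floors. The sketch below uses a crude clamp-and-rescale loop (our own simplification) rather than the paper's fully Bayesian single- and double-shrinkage analysis:

```python
def fay_herriot_benchmark(direct, var_e, synth, sigma2_v, state_total, floors):
    """Illustrative Fay-Herriot shrinkage plus benchmarking with
    inequality constraints. direct: direct survey estimates; var_e:
    their sampling variances; synth: regression-synthetic estimates
    x_i' beta; sigma2_v: model variance; state_total: preset total;
    floors: administrative lower bounds."""
    # Shrinkage: gamma_i = sigma2_v / (sigma2_v + var_e_i)
    est = []
    for d, ve, s in zip(direct, var_e, synth):
        g = sigma2_v / (sigma2_v + ve)
        est.append(g * d + (1 - g) * s)
    # Benchmark: clamp to floors, then redistribute the remaining gap
    # proportionally over the unconstrained counties.
    for _ in range(100):
        est = [max(e, f) for e, f in zip(est, floors)]
        free = [i for i, (e, f) in enumerate(zip(est, floors)) if e > f]
        gap = state_total - sum(est)
        if abs(gap) < 1e-9 or not free:
            break
        share = sum(est[i] for i in free)
        for i in free:
            est[i] += gap * est[i] / share
    return est
```

Unlike this point-estimate sketch, the Bayesian treatment in the paper also propagates the uncertainty induced by the constraints into the posterior.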

  • Articles and reports: 12-001-X202300200005
    Description: Population undercoverage is one of the main hurdles faced by statistical analysis with non-probability survey samples. We discuss two typical scenarios of undercoverage, namely, stochastic undercoverage and deterministic undercoverage. We argue that existing estimation methods under the positivity assumption on the propensity scores (i.e., the participation probabilities) can be directly applied to handle the scenario of stochastic undercoverage. We explore strategies for mitigating biases in estimating the mean of the target population under deterministic undercoverage. In particular, we examine a split population approach based on a convex hull formulation, and construct estimators with reduced biases. A doubly robust estimator can be constructed if a followup subsample of the reference probability survey with measurements on the study variable becomes feasible. Performances of six competing estimators are investigated through a simulation study and issues which require further investigation are briefly discussed.
    Release date: 2024-01-03
Data (9) (9 results)

No content available at this time.

Analysis (1,874) (0 to 10 of 1,874 results)

  • Journals and periodicals: 75F0002M
    Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
    Release date: 2024-04-26

  • Articles and reports: 75F0002M2024005
    Description: The Canadian Income Survey (CIS) has introduced improvements to the methods and data sources used to produce income and poverty estimates with the release of its 2022 reference year estimates. Foremost among these improvements is a significant increase in the sample size for a large subset of the CIS content. The weighting methodology was also improved and the target population of the CIS was changed from persons aged 16 years and over to persons aged 15 years and over. This paper describes the changes made and presents the approximate net result of these changes on the income estimates and data quality of the CIS using 2021 data. The changes described in this paper highlight the ways in which data quality has been improved while having little impact on key CIS estimates and trends.
    Release date: 2024-04-26

  • Articles and reports: 18-001-X2024001
    Description: This study applies small area estimation (SAE) and a new geographic concept called Self-contained Labor Area (SLA) to the Canadian Survey on Business Conditions (CSBC) with a focus on remote work opportunities in rural labor markets. Through SAE modelling, we estimate the proportions of businesses, classified by general industrial sector (service providers and goods producers), that would primarily offer remote work opportunities to their workforce.
    Release date: 2024-04-22

  • Stats in brief: 11-001-X202411338008
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2024-04-22

  • Articles and reports: 11-522-X202200100001
    Description: Record linkage aims at identifying record pairs related to the same unit and observed in two different data sets, say A and B. Fellegi and Sunter (1969) suggest that each record pair be tested as to whether it was generated from the set of matched or unmatched pairs. The decision function consists of the ratio between m(y) and u(y), the probabilities of observing a comparison y of a set of k > 3 key identifying variables in a record pair under the assumptions that the pair is a match or a non-match, respectively. These parameters are usually estimated by means of the EM algorithm using as data the comparisons on all the pairs of the Cartesian product Ω = A × B. These observations (on the comparisons and on the pairs' status as match or non-match) are assumed to be generated independently of other pairs, an assumption characterizing most of the literature on record linkage and implemented in software tools (e.g., RELAIS, Cibella et al. 2012). On the contrary, comparisons y and matching status in Ω are deterministically dependent. As a result, estimates of m(y) and u(y) based on the EM algorithm are usually poor. This fact jeopardizes the effective application of the Fellegi-Sunter method, as well as the automatic computation of quality measures and the possibility of applying efficient methods for model estimation on linked data (e.g., regression functions), as in Chambers et al. (2015). We propose to explore Ω by a set of samples, each one drawn so as to preserve independence of comparisons among the selected record pairs. Simulation results are encouraging.
    Release date: 2024-03-25
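
The Fellegi-Sunter decision function described above is usually applied as a log-weight: pairs whose summed weight exceeds an upper threshold are declared matches. A minimal sketch (names and example probabilities are our own), assuming conditionally independent agreement indicators:

```python
import math

def match_weight(gamma, m, u):
    """Fellegi-Sunter log2 match weight for a comparison vector gamma
    (1 = agree, 0 = disagree) given per-variable m-probabilities
    (P(agree | match)) and u-probabilities (P(agree | non-match))."""
    w = 0.0
    for g, mk, uk in zip(gamma, m, u):
        if g:
            w += math.log2(mk / uk)         # agreement is evidence for a match
        else:
            w += math.log2((1 - mk) / (1 - uk))  # disagreement is evidence against
    return w
```

The paper's concern is upstream of this step: when m and u are estimated by EM over all pairs of A × B, the dependence among pairs biases the estimates, so the weights and thresholds are applied with the wrong parameters.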

  • Articles and reports: 11-522-X202200100002
    Description: The authors used the Splink probabilistic linkage package developed by the UK Ministry of Justice, to link census data from England and Wales to itself to find duplicate census responses. A large gold standard of confirmed census duplicates was available meaning that the results of the Splink implementation could be quality assured. This paper describes the implementation and features of Splink, gives details of the settings and parameters that we used to tune Splink for our particular project, and gives the results that we obtained.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100003
    Description: Estimation at fine levels of aggregation is necessary to better describe society. Small area estimation model-based approaches that combine sparse survey data with rich data from auxiliary sources have been proven useful to improve the reliability of estimates for small domains. Considered here is a scenario where small area model-based estimates, produced at a given aggregation level, needed to be disaggregated to better describe the social structure at finer levels. For this scenario, an allocation method was developed to implement the disaggregation, overcoming challenges associated with data availability and model development at such fine levels. The method is applied to adult literacy and numeracy estimation at the county-by-group-level, using data from the U.S. Program for the International Assessment of Adult Competencies. In this application the groups are defined in terms of age or education, but the method could be applied to estimation of other equity-deserving groups.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100004
    Description: In accordance with Statistics Canada’s long-term Disaggregated Data Action Plan (DDAP), several initiatives have been implemented into the Labour Force Survey (LFS). One of the more direct initiatives was a targeted increase in the size of the monthly LFS sample. Furthermore, a regular Supplement program was introduced, where an additional series of questions are asked to a subset of LFS respondents and analyzed in a monthly or quarterly production cycle. Finally, the production of modelled estimates based on Small Area Estimation (SAE) methodologies resumed for the LFS and will include a wider scope with more analytical value than what had existed in the past. This paper will give an overview of these three initiatives.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100005
    Description: Sampling variance smoothing is an important topic in small area estimation. In this paper, we propose sampling variance smoothing methods for small area proportion estimation. In particular, we consider the generalized variance function and design effect methods for sampling variance smoothing. We evaluate and compare the smoothed sampling variances and small area estimates based on the smoothed variance estimates through analysis of survey data from Statistics Canada. The results from real data analysis indicate that the proposed sampling variance smoothing methods work very well for small area estimation.
    Release date: 2024-03-25
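
A common generalized variance function takes a log-linear form in the estimate. The sketch below fits such a curve by least squares and returns the smoothed variances; it is an illustrative one-parameter-family example, not the specific GVF or design-effect models evaluated in the paper:

```python
import math

def gvf_smooth(props, raw_vars):
    """Generalized-variance-function smoothing (illustrative):
    fit log(v_i) = a + b * log(p_i) by ordinary least squares over the
    small areas, then return the fitted (smoothed) variances."""
    xs = [math.log(p) for p in props]
    ys = [math.log(v) for v in raw_vars]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return [math.exp(a + b * x) for x in xs]
```

Smoothed variances of this kind then feed into the area-level model as the assumed-known sampling variances, which is why their quality matters for the resulting small area estimates.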

  • Articles and reports: 11-522-X202200100006
    Description: The Australian Bureau of Statistics (ABS) is committed to improving access to more microdata, while ensuring privacy and confidentiality is maintained, through its virtual DataLab which supports researchers to undertake complex research more efficiently. Currently, the DataLab research outputs need to follow strict rules to minimise disclosure risks for clearance. However, the clerical-review process is not cost effective and has potential to introduce errors. The increasing number of statistical outputs from different projects can potentially introduce differencing risks even though these outputs from different projects have met the strict output rules. The ABS has been exploring the possibility of providing automatic output checking using the ABS cellkey methodology to ensure that all outputs across different projects are protected consistently to minimise differencing risks and reduce costs associated with output checking.
    Release date: 2024-03-25
Reference (363) (60 to 70 of 363 results)

  • Notices and consultations: 13-605-X201400414107
    Description:

    Beginning in November 2014, International Trade in goods data will be provided on a Balance of Payments (BOP) basis for additional country detail. In publishing this data, BOP-based exports to and imports from 27 countries, referred to as Canada’s Principal Trading Partners (PTPs), will be highlighted for the first time. BOP-based trade in goods data will be available for countries such as China and Mexico, Brazil and India, South Korea, and our largest European Union trading partners, in response to substantial demand for information on these countries in recent years. Until now, Canada’s geographical trading patterns have been examined almost exclusively through analysis of Customs-based trade data. Moreover, BOP trade in goods data for these countries will be available alongside the now quarterly Trade in Services data as well as annual Foreign Direct Investment data for many of these Principal Trading Partners, facilitating country-level international trade and investment analysis using fully comparable data. The objective of this article is to introduce these new measures. This note will first walk users through the key BOP concepts, most importantly the concept of change in ownership. This will serve to familiarize analysts with the Balance of Payments framework for analyzing country-level data, in contrast to Customs-based trade data. Second, some preliminary analysis will be reviewed to illustrate the concepts, with provisional estimates for BOP-based trade with China serving as the principal example. Lastly, we will outline the expansion of quarterly trade in services to generate new estimates of trade for the PTPs and discuss future work in trade statistics.

    Release date: 2014-11-04

  • Surveys and statistical programs – Documentation: 11-522-X201300014258
    Description:

    The National Fuel Consumption Survey (FCS) was created in 2013 and is a quarterly survey designed to analyze distance driven and fuel consumption for passenger cars and other vehicles weighing less than 4,500 kilograms. The sampling frame consists of vehicles extracted from the vehicle registration files, which are maintained by provincial ministries. For collection, the FCS uses car chips for a portion of the sampled units to collect information about the trips taken and the fuel consumed. This new technology offers numerous advantages, for example, reductions in response burden and collection costs, and positive effects on data quality. For the quarters of 2013, 95% of sampled units were surveyed via paper questionnaires and 5% with car chips, and in Q1 2014, 40% of sampled units were surveyed with car chips. This study outlines the methodology of the survey process, examines the advantages and challenges in processing and imputation for the two collection modes, presents some initial results and concludes with a summary of the lessons learned.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014259
    Description:

    In an effort to reduce response burden on farm operators, Statistics Canada is studying alternative approaches to telephone surveys for producing field crop estimates. One option is to publish harvested area and yield estimates in September as is currently done, but to calculate them using models based on satellite and weather data, and data from the July telephone survey. However before adopting such an approach, a method must be found which produces estimates with a sufficient level of accuracy. Research is taking place to investigate different possibilities. Initial research results and issues to consider are discussed in this paper.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014260
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) produces monthly estimates and determines the month-to-month changes for variables such as employment, earnings and hours at detailed industrial levels for Canada, the provinces and territories. In order to improve the efficiency of collection activities for this survey, an electronic questionnaire (EQ) was introduced in the fall of 2012. Given the timeframe allowed for this transition as well as the production calendar of the survey, a conversion strategy was developed for the integration of this new mode. The goal of the strategy was to ensure a good adaptation of the collection environment and also to allow the implementation of a plan of analysis that would evaluate the impact of this change on the results of the survey. This paper will give an overview of the conversion strategy, the different adjustments that were made during the transition period and the results of various evaluations that were conducted. For example, the impact of the integration of the EQ on the collection process, the response rate and the follow-up rate will be presented. In addition, the effect that this new collection mode has on the survey estimates will also be discussed. More specifically, the results of a randomized experiment that was conducted in order to determine the presence of a mode effect will be presented.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014269
    Description:

    The Census Overcoverage Study (COS) is a critical post-census coverage measurement study. Its main objective is to produce estimates of the number of people erroneously enumerated, by province and territory, study the characteristics of individuals counted multiple times and identify possible reasons for the errors. The COS is based on the sampling and clerical review of groups of connected records that are built by linking the census response database to an administrative frame, and to itself. In this paper we describe the new 2011 COS methodology. This methodology has incorporated numerous improvements including a greater use of probabilistic record-linkage, the estimation of linking parameters with an Expectation-Maximization (E-M) algorithm, and the efficient use of household information to detect more overcoverage cases.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014278
    Description:

    In January and February 2014, Statistics Canada conducted a test aiming at measuring the effectiveness of different collection strategies using an online self-reporting survey. Sampled units were contacted using mailed introductory letters and asked to complete the online survey without any interviewer contact. The objectives of this test were to measure the take-up rates for completing an online survey, and to profile the respondents/non-respondents. Different samples and letters were tested to determine the relative effectiveness of the different approaches. The results of this project will be used to inform various social surveys that are preparing to include an internet response option in their surveys. The paper will present the general methodology of the test as well as results observed from collection and the analysis of profiles.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014290
    Description:

    This paper describes a new module that will project families and households by Aboriginal status using the Demosim microsimulation model. The methodology being considered would assign a household/family headship status annually to each individual and would use the headship rate method to calculate the number of annual families and households by various characteristics and geographies associated with Aboriginal populations.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 13-605-X201400214100
    Description:

    Canadian international merchandise trade data are released monthly and may be revised in subsequent releases as new information becomes available. These data are released approximately 35 days following the close of the reference period and represent one of the timeliest economic indicators produced by Statistics Canada. Given their timeliness, some of the data are not received in time and need to be estimated or modelled. This is the case for imports and exports of crude petroleum and natural gas. More specifically, at the time of release, energy trade data are based on an incomplete set of information and are revised as Statistics Canada and National Energy Board information becomes available in the subsequent months. Due to the increasing importance of energy imports and exports and the timeliness of the data, the revisions to energy prices and volumes are having an increasingly significant impact on the monthly revision to Canada’s trade balance. This note explains how the estimates in the initial release are made when data sources are not yet available, and how the original data are adjusted in subsequent releases.

    Release date: 2014-10-03

  • Surveys and statistical programs – Documentation: 99-011-X2011002
    Description:

    The 2011 NHS Aboriginal Peoples Technical Report deals with: (1) Aboriginal ancestry, (2) Aboriginal identity, (3) Registered Indian status and (4) First Nation/Indian band membership.

    The report contains explanations of concepts, data quality, historical comparability and comparability to other sources, as well as information on data collection, processing and dissemination.

    Release date: 2014-05-28
