Statistical methods

Key indicators

Changing any selection will automatically update the page content.

Selected geographical area: Canada

Selected geographical area: Newfoundland and Labrador

Selected geographical area: Prince Edward Island

Selected geographical area: Nova Scotia

Selected geographical area: New Brunswick

Selected geographical area: Quebec

Selected geographical area: Ontario

Selected geographical area: Manitoba

Selected geographical area: Saskatchewan

Selected geographical area: Alberta

Selected geographical area: British Columbia

Selected geographical area: Yukon

Selected geographical area: Northwest Territories

Selected geographical area: Nunavut

Sort Help
entries

Results

All (2,299)

All (2,299) (30 to 40 of 2,299 results)

  • Stats in brief: 11-637-X
    Description: This product presents data on the Sustainable Development Goals. They present an overview of the 17 Goals through infographics by leveraging data currently available to report on Canada’s progress towards the 2030 Agenda for Sustainable Development.
    Release date: 2024-01-25

  • Stats in brief: 11-001-X202402237898
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2024-01-22

  • Articles and reports: 11-633-X2024001
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years.
    Release date: 2024-01-22

  • Articles and reports: 13-604-M2024001
    Description: This documentation outlines the methodology used to develop the Distributions of household economic accounts published in January 2024 for the reference years 2010 to 2023. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.
    Release date: 2024-01-22

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2024-01-22

  • Articles and reports: 12-001-X202300200001
    Description: When a Medicare healthcare provider is suspected of billing abuse, a population of payments X made to that provider over a fixed timeframe is isolated. A certified medical reviewer, in a time-consuming process, can determine the overpayment Y = X - (amount justified by the evidence) associated with each payment. Typically, there are too many payments in the population to examine each with care, so a probability sample is selected. The sample overpayments are then used to calculate a 90% lower confidence bound for the total population overpayment. This bound is the amount demanded for recovery from the provider. Unfortunately, classical methods for calculating this bound sometimes fail to provide the 90% confidence level, especially when using a stratified sample.

    In this paper, 166 redacted samples from Medicare integrity investigations are displayed and described, along with 156 associated payment populations. The 7,588 examined (Y, X) sample pairs show (1) Medicare audits have high error rates: more than 76% of these payments were considered to have been paid in error; and (2) the patterns in these samples support an “All-or-Nothing” mixture model for (Y, X) previously defined in the literature. Model-based Monte Carlo testing procedures for Medicare sampling plans are discussed, as well as stratification methods based on anticipated model moments. In terms of viability (achieving the 90% confidence level) a new stratification method defined here is competitive with the best of the many existing methods tested and seems less sensitive to choice of operating parameters. In terms of overpayment recovery (equivalent to precision) the new method is also comparable to the best of the many existing methods tested. Unfortunately, no stratification algorithm tested was ever viable for more than about half of the 104 test populations.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200002
    Description: Being able to quantify the accuracy (bias, variance) of published output is crucial in official statistics. Output in official statistics is nearly always divided into subpopulations according to some classification variable, such as mean income by categories of educational level. Such output is also referred to as domain statistics. In the current paper, we limit ourselves to binary classification variables. In practice, misclassifications occur and these contribute to the bias and variance of domain statistics. Existing analytical and numerical methods to estimate this effect have two disadvantages. The first disadvantage is that they require that the misclassification probabilities are known beforehand and the second is that the bias and variance estimates are biased themselves. In the current paper we present a new method, a Gaussian mixture model estimated by an Expectation-Maximisation (EM) algorithm combined with a bootstrap, referred to as the EM bootstrap method. This new method does not require that the misclassification probabilities are known beforehand, although it is more efficient when a small audit sample is used that yields a starting value for the misclassification probabilities in the EM algorithm. We compared the performance of the new method with currently available numerical methods: the bootstrap method and the SIMEX method. Previous research has shown that for non-linear parameters the bootstrap outperforms the analytical expressions. For nearly all conditions tested, the bias and variance estimates that are obtained by the EM bootstrap method are closer to their true values than those obtained by the bootstrap and SIMEX methods. We end this paper by discussing the results and possible future extensions of the method.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200003
    Description: We investigate small area prediction of general parameters based on two models for unit-level counts. We construct predictors of parameters, such as quartiles, that may be nonlinear functions of the model response variable. We first develop a procedure to construct empirical best predictors and mean square error estimators of general parameters under a unit-level gamma-Poisson model. We then use a sampling importance resampling algorithm to develop predictors for a generalized linear mixed model (GLMM) with a Poisson response distribution. We compare the two models through simulation and an analysis of data from the Iowa Seat-Belt Use Survey.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200004
    Description: We present a novel methodology to benchmark county-level estimates of crop area totals to a preset state total subject to inequality constraints and random variances in the Fay-Herriot model. For planted area of the National Agricultural Statistics Service (NASS), an agency of the United States Department of Agriculture (USDA), it is necessary to incorporate the constraint that the estimated totals, derived from survey and other auxiliary data, are no smaller than administrative planted area totals prerecorded by other USDA agencies except NASS. These administrative totals are treated as fixed and known, and this additional coherence requirement adds to the complexity of benchmarking the county-level estimates. A fully Bayesian analysis of the Fay-Herriot model offers an appealing way to incorporate the inequality and benchmarking constraints, and to quantify the resulting uncertainties, but sampling from the posterior densities involves difficult integration, and reasonable approximations must be made. First, we describe a single-shrinkage model, shrinking the means while the variances are assumed known. Second, we extend this model to accommodate double shrinkage, borrowing strength across means and variances. This extended model has two sources of extra variation, but because we are shrinking both means and variances, it is expected that this second model should perform better in terms of goodness of fit (reliability) and possibly precision. The computations are challenging for both models, which are applied to simulated data sets with properties resembling the Illinois corn crop.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200005
    Description: Population undercoverage is one of the main hurdles faced by statistical analysis with non-probability survey samples. We discuss two typical scenarios of undercoverage, namely, stochastic undercoverage and deterministic undercoverage. We argue that existing estimation methods under the positivity assumption on the propensity scores (i.e., the participation probabilities) can be directly applied to handle the scenario of stochastic undercoverage. We explore strategies for mitigating biases in estimating the mean of the target population under deterministic undercoverage. In particular, we examine a split population approach based on a convex hull formulation, and construct estimators with reduced biases. A doubly robust estimator can be constructed if a followup subsample of the reference probability survey with measurements on the study variable becomes feasible. Performances of six competing estimators are investigated through a simulation study and issues which require further investigation are briefly discussed.
    Release date: 2024-01-03
Data (9)

Data (9) ((9 results))

No content available at this time.

Analysis (1,874)

Analysis (1,874) (60 to 70 of 1,874 results)

  • Articles and reports: 12-001-X202300100005
    Description: Weight smoothing is a useful technique in improving the efficiency of design-based estimators at the risk of bias due to model misspecification. As an extension of the work of Kim and Skinner (2013), we propose using weight smoothing to construct the conditional likelihood for efficient analytic inference under informative sampling. The Beta prime distribution can be used to build a parameter model for weights in the sample. A score test is developed to test for model misspecification in the weight model. A pretest estimator using the score test can be developed naturally. The pretest estimator is nearly unbiased and can be more efficient than the design-based estimator when the weight model is correctly specified, or the original weights are highly variable. A limited simulation study is presented to investigate the performance of the proposed methods.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100006
    Description: My comments consist of three components: (1) A brief account of my professional association with Chris Skinner. (2) Observations on Skinner’s contributions to statistical disclosure control, (3) Some comments on making inferences from masked survey data.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100007
    Description: I provide an overview of the evolution of Statistical Disclosure Control (SDC) research over the last decades and how it has evolved to handle the data revolution with more formal definitions of privacy. I emphasize the many contributions by Chris Skinner in the research areas of SDC. I review his seminal research, starting in the 1990’s with his work on the release of UK Census sample microdata. This led to a wide-range of research on measuring the risk of re-identification in survey microdata through probabilistic models. I also focus on other aspects of Chris’ research in SDC. Chris was the recipient of the 2019 Waksberg Award and sadly never got a chance to present his Waksberg Lecture at the Statistics Canada International Methodology Symposium. This paper follows the outline that Chris had prepared in preparation for that lecture.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100008
    Description: This brief tribute reviews Chris Skinner’s main scientific contributions.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100009
    Description: In this paper, with and without-replacement versions of adaptive proportional to size sampling are presented. Unbiased estimators are developed for these methods and their properties are studied. In the two versions, the drawing probabilities are adapted during the sampling process based on the observations already selected. To this end, in the version with-replacement, after each draw and observation of the variable of interest, the vector of the auxiliary variable will be updated using the observed values of the variable of interest to approximate the exact selection probability proportional to size. For the without-replacement version, first, using an initial sample, we model the relationship between the variable of interest and the auxiliary variable. Then, utilizing this relationship, we estimate the unknown (unobserved) population units. Finally, on these estimated population units, we select a new sample proportional to size without-replacement. These approaches can significantly improve the efficiency of designs not only in the case of a positive linear relationship, but also in the case of a non-linear or negative linear relationship between the variables. We investigate the efficiencies of the designs through simulations and real case studies on medicinal flowers, social and economic data.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100010
    Description: Precise and unbiased estimates of response propensities (RPs) play a decisive role in the monitoring, analysis, and adaptation of data collection. In a fixed survey climate, those parameters are stable and their estimates ultimately converge when sufficient historic data is collected. In survey practice, however, response rates gradually vary in time. Understanding time-dependent variation in predicting response rates is key when adapting survey design. This paper illuminates time-dependent variation in response rates through multi-level time-series models. Reliable predictions can be generated by learning from historic time series and updating with new data in a Bayesian framework. As an illustrative case study, we focus on Web response rates in the Dutch Health Survey from 2014 to 2019.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100011
    Description: The definition of statistical units is a recurring issue in the domain of sample surveys. Indeed, not all the populations surveyed have a readily available sampling frame. For some populations, the sampled units are distinct from the observation units and producing estimates on the population of interest raises complex questions, which can be addressed by using the weight share method (Deville and Lavallée, 2006). However, the two populations considered in this approach are discrete. In some fields of study, the sampled population is continuous: this is for example the case of forest inventories for which, frequently, the trees surveyed are those located on plots of which the centers are points randomly drawn in a given area. The production of statistical estimates from the sample of trees surveyed poses methodological difficulties, as do the associated variance calculations. The purpose of this paper is to generalize the weight share method to the continuous (sampled population) ? discrete (surveyed population) case, from the extension proposed by Cordy (1993) of the Horvitz-Thompson estimator for drawing points carried out in a continuous universe.
    Release date: 2023-06-30

  • Articles and reports: 75F0002M2022003
    Description: This discussion paper describes the proposed methodology for a Northern Market Basket Measure (MBM-N) for Nunavut, as well as identifies research which could be conducted in preparation for the 2023 review. The paper presents initial MBM-N thresholds and provides preliminary poverty estimates for reference years 2018 to 2021. A review period will follow the release of this paper, during which time Statistics Canada and Employment and Social Development Canada will welcome feedback from interested parties and work with experts, stakeholders, indigenous organizations, federal, provincial and territorial officials to validate the results.
    Release date: 2023-06-21

  • Articles and reports: 11F0019M2023003
    Description: This study combines survey and administrative data to examine the correspondence between paid-employment and self-employment activities reported in each of these data sources by the same individuals. The study also looks at the role of self-employment as a supplemental income source for individuals whose self-declared main labour market activity is wage employment.
    Release date: 2023-06-06

  • Articles and reports: 75F0002M2023001
    Description: This discussion paper describes the work being achieved and undertaken by Statistics Canada, in partnership with the Treasury Board of Canada Secretariat, the Department of Finance Canada and the Privy Council Office, on developing the Quality of Life Framework for Canada and related outputs, including an online Hub. This is the first paper in a series that will provide updates on the progress of work relating to the Framework.
    Release date: 2023-04-19
Reference (363)

Reference (363) (60 to 70 of 363 results)

  • Notices and consultations: 13-605-X201400414107
    Description:

    Beginning in November 2014, International Trade in goods data will be provided on a Balance of Payments (BOP) basis for additional country detail. In publishing this data, BOP-based exports to and imports from 27 countries, referred to as Canada’s Principal Trading Partners (PTPs), will be highlighted for the first time. BOP-based trade in goods data will be available for countries such as China and Mexico, Brazil and India, South Korea, and our largest European Union trading partners, in response to substantial demand for information on these countries in recent years. Until now, Canada’s geographical trading patterns have been examined almost exclusively through analysis of Customs-based trade data. Moreover, BOP trade in goods data for these countries will be available alongside the now quarterly Trade in Services data as well as annual Foreign Direct Investment data for many of these Principal Trading Partners, facilitating country-level international trade and investment analysis using fully comparable data. The objective of this article is to introduce these new measures. This note will first walk users through the key BOP concepts, most importantly the concept of change in ownership. This will serve to familiarize analysts with the Balance of Payments framework for analyzing country-level data, in contrast to Customs-based trade data. Second, some preliminary analysis will be reviewed to illustrate the concepts, with provisional estimates for BOP-based trade with China serving as the principal example. Lastly, we will outline the expansion of quarterly trade in services to generate new estimates of trade for the PTPs and discuss future work in trade statistics.

    Release date: 2014-11-04

  • Surveys and statistical programs – Documentation: 11-522-X201300014258
    Description:

    The National Fuel Consumption Survey (FCS) was created in 2013 and is a quarterly survey that is designed to analyze distance driven and fuel consumption for passenger cars and other vehicles weighing less than 4,500 kilograms. The sampling frame consists of vehicles extracted from the vehicle registration files, which are maintained by provincial ministries. For collection, FCS uses car chips for a part of the sampled units to collect information about the trips and the fuel consumed. There are numerous advantages to using this new technology, for example, reduction in response burden, collection costs and effects on data quality. For the quarters in 2013, the sampled units were surveyed 95% via paper questionnaires and 5% with car chips, and in Q1 2014, 40% of sampled units were surveyed with car chips. This study outlines the methodology of the survey process, examines the advantages and challenges in processing and imputation for the two collection modes, presents some initial results and concludes with a summary of the lessons learned.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014259
    Description:

    In an effort to reduce response burden on farm operators, Statistics Canada is studying alternative approaches to telephone surveys for producing field crop estimates. One option is to publish harvested area and yield estimates in September as is currently done, but to calculate them using models based on satellite and weather data, and data from the July telephone survey. However before adopting such an approach, a method must be found which produces estimates with a sufficient level of accuracy. Research is taking place to investigate different possibilities. Initial research results and issues to consider are discussed in this paper.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014260
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) produces monthly estimates and determines the month-to-month changes for variables such as employment, earnings and hours at detailed industrial levels for Canada, the provinces and territories. In order to improve the efficiency of collection activities for this survey, an electronic questionnaire (EQ) was introduced in the fall of 2012. Given the timeframe allowed for this transition as well as the production calendar of the survey, a conversion strategy was developed for the integration of this new mode. The goal of the strategy was to ensure a good adaptation of the collection environment and also to allow the implementation of a plan of analysis that would evaluate the impact of this change on the results of the survey. This paper will give an overview of the conversion strategy, the different adjustments that were made during the transition period and the results of various evaluations that were conducted. For example, the impact of the integration of the EQ on the collection process, the response rate and the follow-up rate will be presented. In addition, the effect that this new collection mode has on the survey estimates will also be discussed. More specifically, the results of a randomized experiment that was conducted in order to determine the presence of a mode effect will be presented.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014269
    Description:

    The Census Overcoverage Study (COS) is a critical post-census coverage measurement study. Its main objective is to produce estimates of the number of people erroneously enumerated, by province and territory, study the characteristics of individuals counted multiple times and identify possible reasons for the errors. The COS is based on the sampling and clerical review of groups of connected records that are built by linking the census response database to an administrative frame, and to itself. In this paper we describe the new 2011 COS methodology. This methodology has incorporated numerous improvements including a greater use of probabilistic record-linkage, the estimation of linking parameters with an Expectation-Maximization (E-M) algorithm, and the efficient use of household information to detect more overcoverage cases.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014278
    Description:

    In January and February 2014, Statistics Canada conducted a test aiming at measuring the effectiveness of different collection strategies using an online self-reporting survey. Sampled units were contacted using mailed introductory letters and asked to complete the online survey without any interviewer contact. The objectives of this test were to measure the take-up rates for completing an online survey, and to profile the respondents/non-respondents. Different samples and letters were tested to determine the relative effectiveness of the different approaches. The results of this project will be used to inform various social surveys that are preparing to include an internet response option in their surveys. The paper will present the general methodology of the test as well as results observed from collection and the analysis of profiles.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014285
    Description:

    The 2011 National Household Survey (NHS) is a voluntary survey that replaced the traditional mandatory long-form questionnaire of the Canadian census of population. The NHS sampled about 30% of Canadian households and achieved a design-weighted response rate of 77%. In comparison, the last census long form was sent to 20% of households and achieved a response rate of 94%. Based on the long-form data, Statistics Canada traditionally produces two public use microdata files (PUMFs): the individual PUMF and the hierarchical PUMF. Both give information on individuals, but the hierarchical PUMF provides extra information on the household and family relationships between the individuals. To produce two PUMFs, based on the NHS data, that cover the whole country evenly and that do not overlap, we applied a special sub-sampling strategy. Difficulties in the confidentiality analyses have increased because of the numerous new variables, the more detailed geographic information and the voluntary nature of the NHS. This paper describes the 2011 PUMF methodology and how it balances the requirements for more information and for low risk of disclosure.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014290
    Description:

    This paper describes a new module that will project families and households by Aboriginal status using the Demosim microsimulation model. The methodology being considered would assign a household/family headship status annually to each individual and would use the headship rate method to calculate the number of annual families and households by various characteristics and geographies associated with Aboriginal populations.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 13-605-X201400214100
    Description:

    Canadian international merchandise trade data are released monthly and may be revised in subsequent releases as new information becomes available. These data are released approximately 35 days following the close of the reference period and represent one of the timeliest economic indicators produced by Statistics Canada. Given their timeliness, some of the data are not received in time and need to be estimated or modelled. This is the case for imports and exports of crude petroleum and natural gas. More specifically, at the time of release, energy trade data are based on an incomplete set of information and are revised as Statistics Canada and National Energy Board information becomes available in the subsequent months. Due to the increasing importance of energy imports and exports and the timeliness of the data, the revisions to energy prices and volumes are having an increasingly significant impact on the monthly revision to Canada’s trade balance. This note explains how the estimates in the initial release are made when data sources are not yet available, and how the original data are adjusted in subsequent releases.

    Release date: 2014-10-03

  • Surveys and statistical programs – Documentation: 99-011-X2011002
    Description:

    The 2011 NHS Aboriginal Peoples Technical Report deals with: (1) Aboriginal ancestry, (2) Aboriginal identity, (3) Registered Indian status and (4) First Nation/Indian band membership.

    The report contains explanations of concepts, data quality, historical comparability and comparability to other sources, as well as information on data collection, processing and dissemination.

    Release date: 2014-05-28

Browse our partners page to find a complete list of our partners and their associated products.

Date modified: