Data analysis

Results

All (289) (1 to 10 of 289 results)

  • Articles and reports: 36-28-0001202600500003
    Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes the key steps in quantitative impact analysis, including data linkage, cohort construction and the implementation of quasi-causal estimators.
    Release date: 2026-05-27

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2026-05-27

  • Surveys and statistical programs – Documentation: 11-633-X2026001
    Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
    Release date: 2026-03-05

  • Articles and reports: 12-001-X202500200004
    Description: The class of generalized linear models (GLMs) is a flexible generalization of ordinary least squares regression that relates the linear model to the response variable through a link function and assumes the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate the variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also produce a nonsignificant Wald statistic even when the predictors are highly predictive in a GLM. Little previous research has closely investigated multicollinearity diagnostics in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure how much the variance of a parameter estimator is increased by multicollinearity in GLMs. We also extend VIFs and condition indexes to complex survey data, accounting for design features such as weights, clusters and strata. These methods are illustrated using data from a household survey of health and nutrition.
    Release date: 2025-12-23

  • Stats in brief: 89-20-00062025001
    Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled and make smarter, more informed decisions.
    Release date: 2025-12-15

  • Articles and reports: 11-522-X202500100010
    Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in estimating labour market conditions in Canada. Periodically, the LFS revises its data to the most recent versions of the industry and occupation classifications. Differences between versions can be extensive, including high-level and unit-group structural changes, as well as creations, deletions, split-offs and combinations of classification units (classes). Historically, to reconcile split-off classes (where one class splits into multiple classes), a sample of LFS split-off records was manually recoded to the new classification version. Based on the split-off proportions observed in the recoded sample, a random allocation method was then applied to all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to the split-off proportions through linear programming, to revise industry and occupation classifications in the LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus minimizing the impact on the comparability of published labour market indicators.
    Release date: 2025-09-08

  • Articles and reports: 36-28-0001202500300002
    Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
    Release date: 2025-03-26

  • Articles and reports: 12-001-X202400200004
    Description: Without specifying a parametric relationship between the study variable and the covariates, we illustrate the advantage of including a spatial component to better account for the covariates when making Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by constructing the adjacency matrix from the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare two non-spatial models, the Scott-Smith model and the Battese, Harter, and Fuller model, with the spatial models, using an application to BMI data from eight counties in California. Our goal is for neighboring strata to yield similar predictions and for strata that are not neighbors to differ more. Ultimately, the spatial models show less global pooling than the non-spatial models, which was the desired outcome.
    Release date: 2024-12-20

  • Articles and reports: 12-001-X202400200005
    Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the subject of research and of several applications. However, the performance of an optimized ASD is sensitive to changes in response propensities over time, and how adaptation strategies can adjust to such variation is not yet fully understood. In this paper, we propose a robust optimization approach for sequential mixed-mode surveys that employs Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to change over time. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated with a case study, the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of the applicable historical time-series data. We find only a moderate dependence on the budget level, and the dependence on historical data is moderated by the amount of seasonality during the year.
    Release date: 2024-12-20

  • Surveys and statistical programs – Documentation: 11-633-X2024004
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive data source that plays a key role in understanding the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission, along with their economic outcomes and regional (interprovincial) mobility, over a time span of more than 40 years.
    Release date: 2024-12-09

Data (2) (2 results)

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover Statistics Canada's geo-enabled data holdings at various levels of geography, including the neighbourhood level. Users can visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
    Release date: 2024-08-21

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30

Analysis (256) (191 to 200 of 256 results)

  • Articles and reports: 11-522-X20020016746
    Description:

    In 1961, the European Commission launched a harmonized program of qualitative surveys of consumers and company heads (industry, services, construction, retail trade, investment), which today covers more than 40 countries. These qualitative surveys aim to gauge the economic situation of these companies. Results are available a few days after the end of the reference period, well before the results of the quantitative surveys.

    Although qualitative, these surveys have quickly become an essential tool for cyclical diagnosis and short-term economic forecasting. This product shows how these surveys are used by the European Commission, in particular by the Directorate-General for Economic and Financial Affairs (DG ECFIN) and the Statistical Office of the European Communities (EUROSTAT), to evaluate the economic situation of the euro zone.

    The first part of this product briefly presents the harmonized European business and consumer survey program. In the second part, we look at how DG ECFIN calculates a coincident indicator of economic activity using a dynamic factor analysis of the industry survey questions. This type of indicator also makes it possible to study the convergence of the member states' economic cycles. The quantitative short-term indicators for the euro zone are often criticized for their publication delays. In the third part, we look at how EUROSTAT plans to publish flash estimates of the industrial product price index (IPPI) derived from econometric models that integrate the business survey series. Lastly, we show how these surveys can be used to forecast gross domestic product (GDP) and to define proxies for some unavailable key indicators (new orders in industry, etc.).

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016748
    Description:

    Practitioners often use data collected from complex surveys (such as labour force and health surveys involving stratified cluster sampling) to fit logistic regression and other models of interest. A great deal of effort over the last two decades has been spent on developing methods to analyse survey data that take account of design features. This paper looks at an alternative method known as inverse sampling.

    Specialized programs, such as SUDAAN and WESVAR, are also available to implement some of the methods that account for design features. However, these methods require additional information, such as survey weights, design effects or cluster identification in the microdata; thus, another method is needed.

    Inverse sampling (Hinkins et al., Survey Methodology, 1977) provides an alternative approach by undoing the complex data structure so that standard methods can be applied. Repeated subsamples with a simple random structure are drawn, each subsample is analysed by standard methods, and the results are combined to increase efficiency. Although computer-intensive, this method has the potential to preserve the confidentiality of microdata files. A drawback of the method is that it can lead to biased estimates of regression parameters when the subsample sizes are small (as in the case of stratified cluster sampling).

    In this paper, we propose using the estimating equation approach, which combines the subsamples before estimation and thus leads to nearly unbiased estimates of regression parameters regardless of subsample size. This method is computationally less intensive than the original method. We apply the method to cluster-correlated data generated from a nested error linear regression model to illustrate its advantages. A real dataset from a Statistics Canada survey is also analysed using the estimating equation method.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016749
    Description:

    Survey sampling is a statistical domain that has been slow to take advantage of flexible regression methods. This technical paper discusses two approaches that could make these regression methods accessible: adapting the techniques to the complex survey design that was used, or sampling the survey data so that standard techniques become applicable.

    Following the former route, we introduce techniques for scatterplot smoothing and additive models that account for the complex survey structure of the data. The use of penalized least squares in the sampling context is studied as a tool for analysing a general trend in a finite population. We focus on smooth regression with a normal error model. Ties in covariates abound in large-scale surveys, so scatterplot smoothers end up being applied to means. The estimation of smooths (for example, smoothing splines) depends on the sampling design only through the sampling weights, meaning that standard software can be used for estimation. Inference for these curves is more challenging because of the correlations induced by the sampling design. We propose and illustrate tests that account for the sampling design. Illustrative examples, including scatterplot smoothing, additive models and model diagnostics, are given using the Ontario Health Survey. For the latter route, resolving the problem by appropriately sampling the survey data file, we discuss some of the hurdles faced.

    Release date: 2004-09-13

  • Articles and reports: 12-001-X20040016998
    Description:

    The Canadian Labour Force Survey (LFS) was not designed to be a longitudinal survey. However, because respondent households typically remain in the sample for six consecutive months, it is possible to reconstruct six-month fragments of longitudinal data from the monthly records of household members. Such longitudinal microdata, consisting altogether of millions of person-months of individual- and family-level data, are useful for analysing monthly labour market dynamics over relatively long periods of time (25 years and more).

    We use these data to estimate hazard functions describing transitions among three labour market states: self-employed, paid employee and not employed. Data on job tenure for employed respondents and on the date last worked for those not employed, together with the dates of survey responses, allow the construction of models that include terms reflecting seasonality and macroeconomic cycles as well as the duration dependence of each type of transition. In addition, the LFS data allow spousal labour market activity and family composition variables to be included in the hazard models as time-varying covariates. The estimated hazard equations have been incorporated into the LifePaths microsimulation model, where they are used to simulate lifetime employment activity for past, present and future birth cohorts. Simulation results have been validated by comparison with the age profiles of LFS employment/population ratios for the period 1976 to 2001.

    Release date: 2004-07-14

  • Articles and reports: 12-002-X20040016904
    Description:

    This article provides a practical example of developing a survival analysis model. It begins with an overview of the software used, SAS. The next section examines the construction of a longitudinal file and the challenges this may present, particularly explanatory variables whose values change over time. A practical application is then provided to illustrate the survival approach: an analysis of data from the Survey of Labour and Income Dynamics (SLID), specifically panel 1 between January 1993 and December 1998. Survey information in vector form is used to develop a Cox semi-parametric model. Comments are provided on a sample computer program, along with a discussion of how the program handles the main variables. The last section briefly describes the results of a relatively simple model.

    Release date: 2004-04-15

  • Articles and reports: 12-002-X20040016905
    Description:

    Large data sets present several challenges to researchers, particularly less experienced ones. For beginning researchers without experience with large data sets, one of the most time-consuming and frustrating activities can be pruning or parsing the data set down to the variables and sample of interest. Producing an 'efficient' data file can improve hardware and software performance and reduce the researcher's frustration. This article presents one way of producing such a parsed, efficient data file using a program called Stat/Transfer.

    Release date: 2004-04-15

  • Articles and reports: 81-003-X20020026525
    Geography: Canada
    Description:

    This report discusses the methodology for the development of two school engagement scales using items from the National Longitudinal Survey of Children and Youth (NLSCY) teachers' questionnaire.

    Release date: 2003-06-11

  • Journals and periodicals: 85F0036X
    Geography: Canada
    Description:

    This study documents the methodological and technical challenges involved in analysing small groups with a sample survey, including oversampling, response rates, non-response due to language, release feasibility and sampling variability. It is based on the 1999 General Social Survey (GSS) on victimization.

    Release date: 2002-05-14

  • Articles and reports: 92F0138M2001001
    Description:

    Traditionally, Statistics Canada uses standard geographic areas as "containers" for the dissemination of statistical data. However, geographic structures are often used as variables in general applications, for example, to document the rural and urban population in a specific area such as an incorporated municipality (census subdivision). They are rarely cross-tabulated with each other to illustrate and analyse specific social and economic processes, for example, the settlement patterns of the population inside and outside of larger urban centres broken down by urban and rural areas. The introduction of the census metropolitan area and census agglomeration influenced zone (MIZ) concept presents additional opportunities to use geographic structures as variables to analyse census data.

    The objectives of this working paper are to illustrate the advantages of using geographic structures as variables to better analyse social and economic processes, and to initiate a discussion in the user community about using these variables and the potential of this largely untapped capability of the census databases. To achieve these objectives, four examples of geography as a variable are presented: Aboriginal persons living on-reserve and off-reserve in urban and rural areas in Canada; the unemployment rate of persons living in urban and rural areas in Canada; the gross rent of renter households in urban and rural areas in Canada; and the migration flows of persons 15 to 24 years of age between major urban centres and rural and small town areas (MIZ).

    Our intent is to encourage the use of geographic structures as census variables in order to give users tools that enable them to analyse more accurately the social and economic processes that take place in the geographic areas of Canada.

    Release date: 2001-03-16

  • Articles and reports: 12-001-X20000025534
    Description:

    The primary goal of this research is to investigate the validity of Markov latent class analysis (MLCA) estimates of labor force classification error and to evaluate the efficacy of MLCA as an alternative to traditional methods for evaluating data quality.

    Release date: 2001-02-28

Reference (26) (1 to 10 of 26 results)

  • Surveys and statistical programs – Documentation: 11-633-X2026001
    Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
    Release date: 2026-03-05

  • Surveys and statistical programs – Documentation: 11-633-X2024004
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive data source that plays a key role in understanding the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission, along with their economic outcomes and regional (interprovincial) mobility, over a time span of more than 40 years.
    Release date: 2024-12-09

  • Surveys and statistical programs – Documentation: 11-633-X2024001
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive data source that plays a key role in understanding the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission, along with their economic outcomes and regional (interprovincial) mobility, over a time span of more than 35 years.
    Release date: 2024-01-22

  • Surveys and statistical programs – Documentation: 32-26-0006
    Description: This report provides data quality information pertaining to the Agriculture–Population Linkage, such as sources of error, matching process, response rates, imputation rates, sampling, weighting, disclosure control methods and data quality indicators.
    Release date: 2023-08-25

  • Surveys and statistical programs – Documentation: 98-20-00032021011
    Description: This video explains the key concepts of different levels of aggregation of income data such as household and family income; income concepts derived from key income variables such as adjusted income and equivalence scale; and statistics used for income data such as median and average income, quartiles, quintiles, deciles and percentiles.
    Release date: 2023-03-29

  • Surveys and statistical programs – Documentation: 98-20-00032021012
    Description: This video builds on concepts introduced in the other videos on income. It explains the key low-income concepts, namely the Market Basket Measure (MBM), the low-income measure (LIM) and the low-income cut-offs (LICO), and the indicators associated with them, such as the low-income gap and the low-income ratio. These concepts are used to analyse the economic well-being of the population.
    Release date: 2023-03-29

  • Surveys and statistical programs – Documentation: 11-633-X2022009
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive data source that plays a key role in understanding the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission, along with their economic outcomes and regional (interprovincial) mobility, over a time span of more than 35 years.

    This report discusses the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.

    Release date: 2022-12-05

  • Notices and consultations: 98-26-0001
    Description:

    This white paper presents Statistics Canada’s planned approach to the 2021 Census of Population and clearly explains the processes behind the census program, touching on historical, legal, operational and content aspects. Statistics Canada recognizes that it is important not only to conduct the census successfully, but also to be transparent and informative about how that work is accomplished. Painting a Portrait of Canada: The 2021 Census of Population gives readers an exclusive, detailed look at how census data are collected, analyzed and given back to Canadians in the form of high-quality statistical information used to make evidence-based decisions in Canadian society.

    Release date: 2020-07-20

  • Surveys and statistical programs – Documentation: 91F0015M2016012
    Description:

    This article provides information on using family-related variables from the microdata files of Canada’s Census of Population. These files exist internally at Statistics Canada, in the Research Data Centres (RDCs), and as public-use microdata files (PUMFs). This article explains certain technical aspects of all three versions, including the creation of multi-level variables for analytical purposes.

    Release date: 2016-12-22

  • Surveys and statistical programs – Documentation: 11-522-X201700014710
    Description:

    The Data Warehouse has modernized the way the Canadian System of Macroeconomic Accounts (MEA) is produced and analyzed. Its continuing evolution expands the amount and range of analytical work that can be done within the MEA. It brings the needed element of harmonization and confrontation as the macroeconomic accounts move toward full integration. The improvements in quality, transparency and timeliness have strengthened the statistics being disseminated.

    Release date: 2016-03-24