Data analysis

Results

All (289) (40 to 50 of 289 results)

41. Confidentiality Vetting Support: Proportion and round tool using SAS Archived
Stats in brief: 89-20-00082021002
Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to create proportion output for researchers working with confidential data.
Release date: 2022-04-27
42. Confidentiality Vetting Support: Rounding proportions using Stata Archived
Stats in brief: 89-20-00082021003
Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to create proportion output for researchers working with confidential data.
Release date: 2022-04-27
43. Confidentiality Vetting Support: Dominance and homogeneity using the tcensus function (Stata) Archived
Stats in brief: 89-20-00082021004
Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to perform the dominance and homogeneity test while using the Census.
Release date: 2022-04-27
44. Confidentiality Vetting Support: Rounding proportions using Rounder An R Shiny App Archived
Stats in brief: 89-20-00082021005
Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to create proportion output for researchers working with confidential data.
Release date: 2022-04-27
45. Confidentiality Vetting Support: Dominance and homogeneity using R Archived
Stats in brief: 89-20-00082021006
Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to perform the dominance and homogeneity test while using the Census.
Release date: 2022-04-27
46. The Journey of Statistics Canada Data: Why it Matters to You Archived
Stats in brief: 11-627-M2022016
Description:
This infographic explains the steps involved in collecting data for all Statistics Canada household and business surveys. The responses are compiled, analyzed and used to make important decisions and are kept strictly confidential.

Release date: 2022-02-28
47. Data science pipelines @ Istat: challenges and solutions Archived
Articles and reports: 11-522-X202100100029
Description:
In line with the path taken by the European Statistical System, Istat is investing in innovative methods to harness Big Data sources and to use them for the production of new and enriched Official Statistics products. Big Data sources are not, in general, directly tractable with traditional statistical techniques; specific data types such as images and texts, for example, illustrate the Variety dimension of Big Data. This motivates and justifies the growing interest of National Statistical Institutes in data science techniques. Istat is currently using data science techniques, including machine learning techniques, in innovation projects and for the publication of experimental statistics. This paper will provide an overview of the main current projects by Istat and will focus on two specific Big Data-based production pipelines, related to the processing of text sources and imagery sources, respectively. The paper will highlight the main challenges of these two pipelines and the solutions put in place to solve them.

Key Words: Machine Learning; Text Processing; Image Processing; Big Data

Release date: 2021-11-05
48. An Approximate Bayesian Approach to Improving Probability Sample Estimators Using a Supplementary Non-Probability Sample Archived
Articles and reports: 11-522-X202100100008
Description:
Non-probability samples are being increasingly explored by National Statistical Offices as a complement to probability samples. We consider the scenario where the variable of interest and auxiliary variables are observed in both a probability and non-probability sample. Our objective is to use data from the non-probability sample to improve the efficiency of survey-weighted estimates obtained from the probability sample. Recently, Sakshaug, Wisniowski, Ruiz and Blom (2019) and Wisniowski, Sakshaug, Ruiz and Blom (2020) proposed a Bayesian approach to integrating data from both samples for the estimation of model parameters. In their approach, non-probability sample data are used to determine the prior distribution of model parameters, and the posterior distribution is obtained under the assumption that the probability sampling design is ignorable (or not informative). We extend this Bayesian approach to the prediction of finite population parameters under non-ignorable (or informative) sampling by conditioning on appropriate survey-weighted statistics. We illustrate the properties of our predictor through a simulation study.
Key Words: Bayesian prediction; Gibbs sampling; Non-ignorable sampling; Statistical data integration.

Release date: 2021-10-29
49. Relative Performance of Methods Based on Model-Assisted Survey Regression Estimation Archived
Articles and reports: 11-522-X202100100009
Description:
Use of auxiliary data to improve the efficiency of estimators of totals and means through model-assisted survey regression estimation has received considerable attention in recent years. Generalized regression (GREG) estimators, based on a working linear regression model, are currently used in establishment surveys at Statistics Canada and several other statistical agencies. GREG estimators use common survey weights for all study variables and calibrate to known population totals of auxiliary variables. Increasingly, many auxiliary variables are available, some of which may be extraneous. This leads to unstable GREG weights when all the available auxiliary variables, including interactions among categorical variables, are used in the working linear regression model. On the other hand, new machine learning methods, such as regression trees and lasso, automatically select significant auxiliary variables and lead to stable nonnegative weights and possible efficiency gains over GREG. In this paper, a simulation study, based on a real business survey sample data set treated as the target population, is conducted to study the relative performance of GREG, regression trees and lasso in terms of efficiency of the estimators.
Key Words: Model assisted inference; calibration estimation; model selection; generalized regression estimator.

Release date: 2021-10-29
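
As a rough illustration of the GREG idea described in entry 49, the sketch below computes a generalized regression estimate of a total from a working linear model with an intercept and a single auxiliary variable. It is not taken from the paper: the function name `greg_total`, its parameters and the toy numbers are hypothetical, and the design weights `d` are assumed to be given.

```python
def greg_total(y, x, d, pop_size, x_total):
    """GREG estimate of the total of y (sketch, one auxiliary variable).

    y, x     : sample values of the study and auxiliary variables
    d        : design weights (assumed given)
    pop_size : known population size N
    x_total  : known population total of x
    """
    n_hat = sum(d)                                   # weighted estimate of N
    x_hat = sum(di * xi for di, xi in zip(d, x))     # weighted estimate of the x total
    y_hat = sum(di * yi for di, yi in zip(d, y))     # weighted estimate of the y total
    x_bar = x_hat / n_hat
    y_bar = y_hat / n_hat

    # Survey-weighted least squares fit of the working model y = b0 + b1 * x.
    sxx = sum(di * (xi - x_bar) ** 2 for di, xi in zip(d, x))
    sxy = sum(di * (xi - x_bar) * (yi - y_bar) for di, xi, yi in zip(d, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Adjust the weighted total by the gap between known and estimated
    # auxiliary totals, which is what calibrates the estimate to N and x_total.
    return y_hat + b0 * (pop_size - n_hat) + b1 * (x_total - x_hat)


# Toy data in which y = 2 + 3x holds exactly, so the estimate is 2N + 3 * x_total.
estimate = greg_total(y=[5, 8, 11, 14], x=[1, 2, 3, 4], d=[10, 10, 10, 10],
                      pop_size=50, x_total=130)
```

With many auxiliary variables the same adjustment uses a matrix of regression coefficients, which is where the unstable weights mentioned in the description can arise.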
50. Nowcasting Finnish real economic activity using traffic loop data Archived
Articles and reports: 11-522-X202100100018
Description: Statistics Finland started publishing nowcasts of the trend indicator of output (TIO), the monthly indicator of real economic activity, to answer users' needs during the Covid-19 pandemic. The indicator was first published in April 2020, at the very beginning of the pandemic in Finland, and had a monthly release schedule until June 2021. The TIO nowcasts are produced using open-source data on truck traffic volumes at about 100 automatic measuring points in the Helsinki/Uusimaa region and the Economic Sentiment Indicator for Finland. Estimation is done using a machine learning approach and the methodology is based on previous work done by Statistics Finland and ETLA Economic Research.

Key Words: nowcasting; flash estimates; machine learning; experimental statistics.
Release date: 2021-10-29

Data (2) (2 results)

1. Canadian Statistical Geospatial Explorer Hub Archived
Data Visualization: 71-607-X2020010
Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo enabled data holdings of Statistics Canada at various levels of geography including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
Release date: 2024-08-21
2. Housing Data Viewer Archived
Data Visualization: 71-607-X2019010
Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
Release date: 2019-10-30

Analysis (256) (170 to 180 of 256 results)

171. Comparing a rate in a subpopulation to the rate in the full population: How it may be done when using survey data, and available software tools Archived
Articles and reports: 12-002-X20050018030
Description:
People often wish to use survey micro-data to study whether the rate of occurrence of a particular condition in a subpopulation is the same as the rate of occurrence in the full population. This paper describes some alternatives for making inferences about such a rate difference and shows whether and how these alternatives may be implemented in three different survey software packages. The software packages illustrated - SUDAAN, WesVar and Bootvar - all can make use of bootstrap weights provided by the analyst to carry out variance estimation.
Release date: 2005-06-23
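
The kind of calculation entry 171 refers to can be sketched as follows: estimate the difference between the subpopulation rate and the full-population rate with the main survey weights, then repeat the calculation with each set of bootstrap weights to estimate its variance. This is a minimal sketch, not the method of SUDAAN, WesVar or Bootvar. The function names are hypothetical, and the variance formula (mean squared deviation of the replicate estimates around their mean) is one common convention assumed here.

```python
def rate_difference(has_condition, in_subpop, weights):
    """Weighted subpopulation rate minus weighted full-population rate."""
    sub_num = sum(w for w, c, s in zip(weights, has_condition, in_subpop) if c and s)
    sub_den = sum(w for w, s in zip(weights, in_subpop) if s)
    full_num = sum(w for w, c in zip(weights, has_condition) if c)
    return sub_num / sub_den - full_num / sum(weights)


def bootstrap_variance(has_condition, in_subpop, boot_weights):
    """Variance of the rate difference across sets of bootstrap weights.

    Assumption: variance = average squared deviation of the replicate
    estimates from their mean; other conventions exist.
    """
    reps = [rate_difference(has_condition, in_subpop, bw) for bw in boot_weights]
    mean_rep = sum(reps) / len(reps)
    return sum((r - mean_rep) ** 2 for r in reps) / len(reps)


# Toy example: four respondents, two in the subpopulation.
has_condition = [1, 0, 1, 0]
in_subpop = [1, 1, 0, 0]
main_weights = [1, 1, 1, 1]
boot_weights = [[1, 1, 1, 1], [2, 1, 1, 1]]   # hypothetical replicate weights

diff = rate_difference(has_condition, in_subpop, main_weights)
var = bootstrap_variance(has_condition, in_subpop, boot_weights)
```

Because the subpopulation is part of the full population, the two rates are correlated. Recomputing the whole difference under each replicate accounts for that correlation automatically.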
172. Revisions of the Canadian National Tourism Indicators Archived
Stats in brief: 13-604-M2005047
Description:
This paper discusses the revision policy of Canada's National Tourism Indicators (NTI) and summarizes results from some recent studies of data revisions to the NTI. The discussion is timely, as the adoption of explicit data revision policies has been emphasized recently as an essential element in the good governance of statistical systems.
The paper starts with a brief description of the NTI, their underlying conceptual framework, and their sources and methods. Next comes a discussion of the need for data revisions, and an outline of various types of revisions. Then a few sections are devoted to the new NTI revision policy adopted with the first quarter 2004 estimates, and the associated costs and benefits. Revision studies, which have been used to assess quality of national accounts estimates, and the database established to track data revisions to the NTI are described next. Last, results from some recent NTI data revision exercises and studies are summarized.
Release date: 2005-01-28
173. Research and analysis to better understand data collection activities Archived
Articles and reports: 11-522-X20030017604
Description:
This paper explains the scope, objectives and challenges of research and analysis on operations at Statistics Canada and gives some examples of the work accomplished to date.
Release date: 2005-01-26
174. A design-based procedure for the analysis of experiments embedded in complex sample surveys Archived
Articles and reports: 11-522-X20030017702
Description:
This paper proposes a procedure to test hypotheses about differences between sample estimates observed under alternative survey methodologies.
Release date: 2005-01-26
175. A Comparison of Canadian and U.S. Productivity Levels: An Exploration of Measurement Issues Archived
Articles and reports: 11F0027M2005028
Geography: Canada
Description:
This paper examines the level of labour productivity in Canada relative to that of the United States in 1999. In doing so, it addresses two main issues. The first is the comparability of the measures of GDP and labour inputs that the statistical agency in each country produces. Second, it investigates how a price index can be constructed to reconcile estimates of Canadian and U.S. GDP per hour worked that are calculated in Canadian and U.S. dollars respectively. After doing so, and taking into account alternative assumptions about Canada/U.S. prices, the paper provides point estimates that put Canada's labour productivity for the total economy at around 93% of that of the United States. The paper points out that at least a 10 percentage point confidence interval should be applied to these estimates. The size of the range is particularly sensitive to assumptions that are made about import and export prices.
Release date: 2005-01-20
176. Describing the Distribution of Income: Guidelines for Effective Analysis Archived
Articles and reports: 75F0002M2004010
Description:
This document offers a set of guidelines for analysing income distributions. It focuses on the basic intuition of the concepts and techniques instead of the equations and technical details.
Release date: 2004-10-08
177. Comparison of design-based and model-based methods in analyzing complex health survey data: A case study Archived
Articles and reports: 11-522-X20020016708
Description:
In this paper, we discuss the analysis of complex health survey data by using multivariate modelling techniques. Main interests are in various design-based and model-based methods that aim at accounting for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov Chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).
The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.
Release date: 2004-09-13
178. Interval censoring of smoking cessation in the National Population Health Survey Archived
Articles and reports: 11-522-X20020016712
Description:
In this paper, we consider the effect of the interval censoring of cessation time on intensity parameter estimation with regard to smoking cessation and pregnancy. The three waves of the National Population Health Survey allow the methodology of event history analysis to be applied to smoking initiation, cessation and relapse. One issue of interest is the relationship between smoking cessation and pregnancy. If a longitudinal respondent who is a smoker at the first cycle ceases smoking by the second cycle, we know the cessation time to within an interval of length at most a year, since the respondent is asked for the age at which she stopped smoking, and her date of birth is known. We also know whether she is pregnant at the time of the second cycle, and whether she has given birth since the time of the first cycle. For many such subjects, we know the date of conception to within a relatively small interval. If we knew the time of smoking cessation and pregnancy period exactly for each member who experienced one or other of these events between cycles, we could model their temporal relationship through their joint intensities.
Release date: 2004-09-13
179. Application of the delete-a-group jackknife variance estimator to analyses of data from a complex longitudinal survey Archived
Articles and reports: 11-522-X20020016714
Description:
In this highly technical paper, we illustrate the application of the delete-a-group jack-knife variance estimator approach to a particular complex multi-wave longitudinal study, demonstrating its utility for linear regression and other analytic models. The delete-a-group jack-knife variance estimator is proving a very useful tool for measuring variances under complex sampling designs. This technique divides the first-phase sample into mutually exclusive and nearly equal variance groups, deletes one group at a time to create a set of replicates and makes analogous weighting adjustments in each replicate to those done for the sample as a whole. Variance estimation proceeds in the standard (unstratified) jack-knife fashion.
Our application is to the Chicago Health and Aging Project (CHAP), a community-based longitudinal study examining risk factors for chronic health problems of older adults. A major aim of the study is the investigation of risk factors for incident Alzheimer's disease. The current design of CHAP has two components: (1) Every three years, all surviving members of the cohort are interviewed on a variety of health-related topics. These interviews include cognitive and physical function measures. (2) At each of these waves of data collection, a stratified Poisson sample is drawn from among the respondents to the full population interview for detailed clinical evaluation and neuropsychological testing. To investigate risk factors for incident disease, a 'disease-free' cohort is identified at the preceding time point and forms one major stratum in the sampling frame.
We provide proofs of the theoretical applicability of the delete-a-group jack-knife for particular estimators under this Poisson design, paying needed attention to the distinction between finite-population and infinite-population (model) inference. In addition, we examine the issue of determining the 'right number' of variance groups.
Release date: 2004-09-13
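
The delete-a-group jackknife steps described in entry 179 can be sketched for a simple weighted mean. This is a minimal illustration under stated assumptions, not the paper's procedure. Groups are formed by random assignment, the "analogous weighting adjustment" is reduced to rescaling the retained weights by G / (G - 1), and the Poisson subsampling design is ignored. The function name `dagjk_variance` and its parameters are hypothetical.

```python
import random


def dagjk_variance(values, weights, n_groups, seed=0):
    """Delete-a-group jackknife variance of a weighted mean (unstratified sketch)."""
    if n_groups < 2:
        raise ValueError("need at least two variance groups")
    n = len(values)

    # Randomly assign units to mutually exclusive, nearly equal-sized groups.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    group = [0] * n
    for position, unit in enumerate(order):
        group[unit] = position % n_groups

    def weighted_mean(w):
        return sum(wi * yi for wi, yi in zip(w, values)) / sum(w)

    full_estimate = weighted_mean(weights)

    # One replicate per group: zero out that group and rescale the rest
    # (assumed adjustment; a real application would redo all weighting steps).
    factor = n_groups / (n_groups - 1)
    replicates = []
    for g in range(n_groups):
        rep_weights = [0.0 if group[i] == g else weights[i] * factor for i in range(n)]
        replicates.append(weighted_mean(rep_weights))

    # Standard unstratified jackknife variance formula.
    variance = (n_groups - 1) / n_groups * sum(
        (r - full_estimate) ** 2 for r in replicates
    )
    return full_estimate, variance


# With equal weights and one unit per group this reduces to the usual s^2 / n.
mean_est, var_est = dagjk_variance([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], n_groups=5)
```

The number of groups is a tuning choice. Fewer groups mean fewer replicates to compute but a noisier variance estimate, which is the 'right number' question raised in the description.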
180. A comparison of approaches to modelling health and environment Archived
Articles and reports: 11-522-X20020016719
Description:
This study takes a look at the modelling methods used for public health data. Public health has a renewed interest in the impact of the environment on health. Ecological or contextual studies ideally investigate these relationships using public health data augmented with environmental characteristics in multilevel or hierarchical models. In these models, individual respondents in health data are the first level and community data are the second level. Most public health data use complex sample survey designs, which require analyses accounting for the clustering, nonresponse, and poststratification to obtain representative estimates of prevalence of health risk behaviours.
This study uses the Behavioral Risk Factor Surveillance System (BRFSS), a state-specific US health risk factor surveillance system conducted by the Centers for Disease Control and Prevention, which assesses health risk factors in over 200,000 adults annually. BRFSS data are now available at the metropolitan statistical area (MSA) level and provide quality health information for studies of environmental effects. MSA-level analyses combining health and environmental data are further complicated by joint requirements of the survey sample design and the multilevel analyses.
We compare three modelling methods in a study of physical activity and selected environmental factors using BRFSS 2000 data. Each of the methods described here is a valid way to analyse complex sample survey data augmented with environmental information, although each accounts for the survey design and multilevel data structure in a different manner and is thus appropriate for slightly different research questions.
Release date: 2004-09-13

Reference (26) (20 to 30 of 26 results)

21. Life Cycle Bias in the Estimation of Intergenerational Earnings Persistence Archived
Surveys and statistical programs – Documentation: 11F0019M2003207
Geography: Canada
Description:
The estimation of intergenerational earnings mobility is rife with measurement problems since the researcher does not observe permanent, lifetime earnings. Nearly all studies make corrections for mean variation in earnings because of the age differences among respondents. Recent works employ average earnings or instrumental variable methods to address the effects of measurement error as a result of transitory earnings shocks and mis-reporting. However, empirical studies of intergenerational mobility have paid no attention to the changes in earnings variance across the life cycle suggested by economic models of human capital investment.
Using information from the Intergenerational Income Data from Canada and the National Longitudinal Survey and Panel Study of Income Dynamics from the United States, this study finds a strong association between age at observation and estimated earnings persistence. Part of this age-dependence is related to a general increase in transitory earnings variance during the collection of data. An independent effect of life cycle investment is also identified. These findings are then applied to the variation among intergenerational earnings persistence studies. Among studies with similar methodologies, one-third of the variance in published estimates of earnings persistence is attributable to cross-study differences in the age of responding fathers. Finally, these results call into question tests for the importance of credit constraints based on measures of earnings at different points in the life cycle.
Release date: 2003-08-05
22. Statistics Canada Total Work Accounts System: Technical Guide to the 1998 Edition Archived
Surveys and statistical programs – Documentation: 12-584-G
Description:
This book introduces technical aspects of the Statistics Canada Total Work Accounts System (TWAS). The TWAS is designed to facilitate the analysis of issues that require simultaneous consideration of both paid work and unpaid productive work. Its key contribution is to allocate the deemed output of each episode of unpaid work activity to a specific beneficiary or group of beneficiaries (called "destinations"). The guide presents the criteria used to decide the allocation of each work episode to one of the destinations, as well as the pseudo code for DESTIN, the key variable of the System. This pseudo code allows programmers to quickly create the actual programming code needed to derive the DESTIN variable in their own microdata files of diary-based time-use records. The guide also discusses illustrative applications of the System, as well as its key limitations.
Release date: 2002-02-12
23. National tourism indicators: A new tool for analysing tourism in Canada Archived
Notices and consultations: 87-003-X19970012882
Geography: Canada
Description:
The purpose of this article is to inform Travel-log readers of the availability of a new analytical tool - the National Tourism Indicators. These estimates, which measure trends in tourism in Canada, are placed in perspective here, taking into account the concepts and definitions used in developing them.
Release date: 1997-01-08
24. Alternative Measures of the Average Duration of Unemployment Archived
Surveys and statistical programs – Documentation: 11F0019M1995083
Geography: Canada
Description:
This paper examines the robustness of a measure of the average complete duration of unemployment in Canada to a host of assumptions used in its derivation. In contrast to the average incomplete duration of unemployment, which is a lagging cyclical indicator, this statistic is a coincident indicator of the business cycle. The impact of using a steady state as opposed to a non steady state assumption, as well as the impact of various corrections for response bias, are explored. It is concluded that a non steady state estimator would be a valuable complement to the statistics on unemployment duration that are currently released by many statistical agencies, and particularly Statistics Canada.
Release date: 1995-12-30
25. Labour Force Classification in the Survey of Labour and Income Dynamics (SLID): Evaluation of Test 3A Results Archived
Surveys and statistical programs – Documentation: 75F0002M1993014
Description:
This paper presents the results from test 3A of the Survey of Labour and Income Dynamics (SLID), conducted in January 1993, with a view to identify any necessary changes to the questions or to the algorithm used to derive labour force status.
Release date: 1995-12-30
26. SLID Derived Variables: Demographic, Cultural and Geographic Archived
Surveys and statistical programs – Documentation: 75F0002M1994018
Description:
This document describes the demographic, cultural and geographic derived variables for the Survey of Labour and Income Dynamics (SLID).
Release date: 1995-12-30

Date modified: 2026-06-23
