Data analysis

Results

All (274) (40 to 50 of 274 results)

  • Articles and reports: 11-522-X202100100027
    Description:

    Privacy concerns are a barrier to applying remote analytics, including machine learning, on sensitive data via the cloud. In this work, we use a leveled fully Homomorphic Encryption scheme to train an end-to-end supervised machine learning algorithm to classify texts while protecting the privacy of the input data points. We train our single-layer neural network on a large simulated dataset, providing a practical solution to a real-world multi-class text classification task. To improve both accuracy and training time, we train an ensemble of such classifiers in parallel using ciphertext packing.

    Key Words: Privacy Preservation, Machine Learning, Encryption

    Release date: 2021-10-29

  • Articles and reports: 11-522-X202100100022
    Description:

    I provide an overview of the evolution of Statistical Disclosure Control (SDC) research over the last decades and how it has evolved to handle the data revolution with more formal definitions of privacy. I emphasize the many contributions by Chris Skinner in the research areas of SDC. I will review his seminal research, starting in the 1990s with his work on the release of UK Census sample microdata. This led to a wide range of research on measuring the risk of re-identification in survey microdata through probabilistic models. I also focus on other aspects of Chris’ research in SDC. Chris was the recipient of the 2019 Waksberg Award and sadly never got a chance to present his Waksberg Lecture at the Statistics Canada International Methodology Symposium. This paper follows the outline that Chris had prepared for that lecture, which was provided to me by his son, Tom Skinner.

    Key Words: Risk of Re-identification, Data Revolution, Privacy Models, Differential Privacy

    Release date: 2021-10-22

  • Articles and reports: 11-522-X202100100021
    Description: Istat has started a new project for its short-term statistical processes, to satisfy the forthcoming EU Regulation requiring estimates to be released in a shorter time. The assessment and analysis of the current survey process for the Short Term Survey on Turnover in Services (FAS) aims at identifying how the best features of the current methods and practices can be exploited to design a more “efficient” process. In particular, the project is expected to release methods that would allow important economies of scale, scope and knowledge to be applied in general to the short-term statistics (STS) production context, which usually works with a limited number of resources. The analysis of the AS-IS process revealed that the FAS survey incurs substantial editing and imputation (E&I) costs, especially due to the intensive follow-up and interactive editing used for every type of detected error. With this in mind, we tried to exploit the lessons learned by participating in the High-Level Group for the Modernisation of Official Statistics (HLG-MOS, UNECE) project on the Use of Machine Learning in Official Statistics. In this work, we present a first experiment using Random Forest models to: (i) predict which units represent “suspicious” data, (ii) assess the potential use of the predictions on new data and (iii) explore the data to identify hidden rules and patterns. In particular, we focus on the use of Random Forest modelling to compare some alternative methods in terms of error prediction efficiency and to address the major aspects of the new design of the E&I scheme.
    Release date: 2021-10-15

  • Articles and reports: 12-001-X202100100003
    Description:

    One effective way to conduct statistical disclosure control is to use scrambled responses. Scrambled responses can be generated by using a controlled random device. In this paper, we propose using the sample empirical likelihood approach to conduct statistical inference under complex survey design with scrambled responses. Specifically, we propose using a Wilks-type confidence interval for statistical inference. Our proposed method can be used as a general tool for inference with confidential public use survey data files. Asymptotic properties are derived, and a limited simulation study verifies the validity of the theory. We further apply the proposed method to some real applications.

    Release date: 2021-06-24

  • 19-22-0005
    Description:

    In this session, we will attempt to demystify the concept of confidence intervals as they relate to sample data. A practical approach is used, placing emphasis on the meaning and interpretation of results rather than the mathematics. The goal is to make sense of some common challenges faced by data users when interpreting confidence intervals. The session is intended for a beginner audience. Some familiarity with basic statistical concepts would be beneficial but is not required.

    https://www.statcan.gc.ca/eng/wtc/information/19220005

    Release date: 2021-05-28

  • Stats in brief: 89-20-00062021002
    Description:

    This video is intended for viewers who wish to gain a basic understanding of correlation and causality. As a prerequisite, before beginning this video, we highly recommend having already completed our videos titled “What is Data? An Introduction to Data Terminology and Concepts” and “Types of Data: Understanding and Exploring Data”.

    Release date: 2021-05-03

  • Articles and reports: 11-633-X2021003
    Description:

    Canada continues to experience an opioid crisis. While there is solid information on the demographic and geographic characteristics of people experiencing fatal and non-fatal opioid overdoses in Canada, there is limited information on the social and economic conditions of those who experience these events. To fill this information gap, Statistics Canada collaborated with existing partnerships in British Columbia, including the BC Coroners Service, BC Stats, the BC Centre for Disease Control and the British Columbia Ministry of Health, to create the Statistics Canada British Columbia Opioid Overdose Analytical File (BC-OOAF).

    Release date: 2021-02-17

  • Articles and reports: 11-633-X2021001
    Description:

    Using data from the Canadian Housing Survey, this project aimed to construct a measure of social inclusion, using indicators identified by the Canada Mortgage and Housing Corporation (CMHC), to report a social inclusion score for each geographic stratum separately for dwellings that are and are not in social and affordable housing. This project also sought to examine associations between social inclusion and a set of economic, social and health variables.

    Release date: 2021-01-05

  • Articles and reports: 12-001-X202000200004
    Description:

    This article proposes a weight scaling method for Firth’s penalized likelihood for proportional hazards regression models. The method derives a relationship between the penalized likelihood that uses scaled weights and the penalized likelihood that uses unscaled weights, and it shows that the penalized likelihood that uses scaled weights has some desirable properties. A simulation study indicates that the penalized likelihood using scaled weights produces smaller biases in point estimates and standard errors than the penalized likelihood using unscaled weights. The weighted penalized likelihood is applied to estimate hazard rates for heart attacks by using a public-use data set from the National Health and Epidemiology Followup Study (NHEFS). SAS® statements to estimate hazard rates using data from complex surveys are given in the appendix.

    Release date: 2020-12-15

  • Articles and reports: 89-648-X2020004
    Description:

    This technical report is intended to validate the Longitudinal and International Study of Adults (LISA) Wave 4 (2018) Food Security (FSC) module and provide recommendations for analytical use. Section 2 of this report provides an overview of the LISA data. Section 3 provides some background information on food security measures in national surveys and why they are significant in today's literature. Section 4 analyzes FSC data by presenting key descriptive statistics and logic checks using LISA methodology as well as outside researcher information. In section 5, certification validation is carried out by comparing the FSC module used by LISA with those used in other Canadian national surveys. Finally, in section 6, key findings and their implications with regard to LISA are outlined.

    Release date: 2020-11-02
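
As a rough illustration of the topic covered by the confidence-interval session listed above (19-22-0005), the following Python sketch computes an approximate 95% confidence interval for a mean. It is not taken from the session itself: it assumes a simple random sample and the normal approximation (reasonable only for larger samples), and the function name `mean_confidence_interval` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a mean (normal approximation).

    Assumes a simple random sample; complex survey designs need
    design-based variance estimates instead.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical measurements, for illustration only.
low, high = mean_confidence_interval([10, 12, 14, 16, 18])
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The usual reading is that, if the sampling were repeated many times, roughly 95% of the intervals built this way would contain the true population mean.
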
Data (2) (2 results)

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
    Release date: 2024-08-21

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30
Analysis (246) (60 to 70 of 246 results)

  • Articles and reports: 11-637-X202000100015
    Description: As the fifteenth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss by 2030. This 2020 infographic provides an overview of indicators underlying the fifteenth Sustainable Development Goal in support of life on land, and the statistics and data sources used to monitor and report on this goal in Canada.
    Release date: 2020-10-20

  • Articles and reports: 11-637-X202000100016
    Description: As the sixteenth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels by 2030. This 2020 infographic provides an overview of indicators underlying the sixteenth Sustainable Development Goal in support of peace, justice and strong institutions, and the statistics and data sources used to monitor and report on this goal in Canada.
    Release date: 2020-10-20

  • Articles and reports: 11-637-X202000100017
    Description: As the seventeenth goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to strengthen the means of implementation and revitalize the Global Partnership for Sustainable Development by 2030. This 2020 infographic provides an overview of indicators underlying the seventeenth Sustainable Development Goal in support of partnerships for the goals, and the statistics and data sources used to monitor and report on this goal in Canada.
    Release date: 2020-10-20

  • Stats in brief: 89-20-00062020004
    Description:

    In this module, we will explore the concepts of data and statistical information, and the differences between them. You will also learn about the different types of data.

    Release date: 2020-09-23

  • Stats in brief: 89-20-00062020009
    Description:

    By the end of this video, you will learn about the basic concepts of the analytical process: the guiding principles of analysis, the steps of the analytical process, and planning your analysis.

    Release date: 2020-09-23

  • Stats in brief: 89-20-00062020010
    Description:

    In this video, you will learn how to implement your analytical plan. The key steps in implementing your plan include: preparing and checking your data, performing your analysis, and documenting your analytical decisions.

    Release date: 2020-09-23

  • Stats in brief: 89-20-00062020011
    Description:

    In this video, you will learn how to summarize and interpret your data and share your findings. The key elements to communicating your findings are as follows: select your essential findings, summarize and interpret the results, organize and assess your reviews, and prepare for dissemination.

    Release date: 2020-09-23

  • Stats in brief: 89-20-00062020012
    Description:

    In this video, we will review the steps of the analytical process and you will obtain a better understanding of how analysts apply each step of the analytical process by walking through an example. The example that we will discuss is a project that examined the relationship between walkability in neighbourhoods, meaning how well they support physical activity, and actual physical activity for Canadians.

    Release date: 2020-09-23

  • Articles and reports: 12-001-X202000100004
    Description:

    Cut-off sampling is applied when there is a subset of units from the population from which getting the required information is too expensive or difficult and, therefore, those units are deliberately excluded from sample selection. If those excluded units are different from the sampled ones in the characteristics of interest, naïve estimators may be severely biased. Calibration estimators have been proposed to reduce the design-bias. However, when estimating in small domains, they can be inefficient even in the absence of cut-off sampling. Model-based small area estimation methods may prove useful for reducing the bias due to cut-off sampling if the assumed model holds for the whole population. At the same time, for small domains, these methods provide more efficient estimators than calibration methods. Since model-based properties are obtained assuming that the model holds but no model is exactly true, here we analyze the design properties of calibration and model-based procedures for estimation of small domain characteristics under cut-off sampling. Our results confirm that model-based estimators reduce the bias due to cut-off sampling and perform significantly better in terms of design mean squared error.

    Release date: 2020-06-30

  • Articles and reports: 12-001-X201900300009
    Description:

    We discuss a relevant inference for the alpha coefficient (Cronbach, 1951) - a popular ratio-type statistic for the covariances and variances in survey sampling including complex survey sampling with unequal selection probabilities. This study can help investigators who wish to evaluate various psychological or social instruments used in large surveys. For the survey data, we investigate workable confidence intervals by using two approaches: (1) the linearization method using the influence function and (2) the coverage-corrected bootstrap method. The linearization method provides adequate coverage rates with correlated ordinal values that many instruments consist of; however, this method may not be as good with some non-normal underlying distributions, e.g., a multi-lognormal distribution. We suggest that the coverage-corrected bootstrap method can be used as a complement to the linearization method, because the coverage-corrected bootstrap method is computer-intensive. Using the developed methods, we provide the confidence intervals for the alpha coefficient to assess various mental health instruments (Kessler 10, Kessler 6 and Sheehan Disability Scale) for different demographics using data from the National Comorbidity Survey Replication (NCS-R).

    Release date: 2019-12-17
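
For readers unfamiliar with the alpha coefficient mentioned in the last entry above (12-001-X201900300009), here is a minimal Python sketch of the textbook, unweighted point estimate. It is not the article's method: it ignores survey weights and unequal selection probabilities, gives no confidence interval, and the function name `cronbach_alpha` and the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Unweighted Cronbach's alpha.

    `items` is a list of k lists, one per questionnaire item, each holding
    the scores of the same n respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("every item needs one score per respondent")
    # Total score of each respondent across all items.
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: three items answered by five respondents.
scores = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 5],
]
print(round(cronbach_alpha(scores), 3))
```

Values close to 1 indicate that the items vary together; with survey data, a design-based version with weighted variances and covariances would be needed.
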
Reference (22) (10 to 20 of 22 results)

  • Surveys and statistical programs – Documentation: 15-206-X2006004
    Description:

    This paper provides a brief description of the methodology currently used to produce the annual volume of hours worked consistent with the System of National Accounts (SNA). These data are used for labour input in the annual and quarterly measures of labour productivity, as well as in the annual measures of multifactor productivity. For this purpose, hours worked are broken down by educational level and age group, so that changes in the composition of the labour force can be taken into account. They are also used to calculate hourly compensation and the unit labour cost and for simulations of the SNA Input-Output Model; as such, they are integrated as labour force inputs into most SNA satellite accounts (i.e., environment, tourism).

    Release date: 2006-10-27

  • Surveys and statistical programs – Documentation: 62F0026M2005005
    Description:

    This discussion paper reviews the previous research into the subject of presenting historical time series and comparisons in constant dollars for the Survey of Household Spending (SHS), and its predecessor the Family Expenditure Survey (FAMEX). It examines two principal methods of converting spending data into constant dollars. The purpose of this discussion paper is to show interested parties how the two methods differ in complexity of implementation and interpretation.

    Release date: 2005-07-15

  • Notices and consultations: 12-002-X20050018033
    Description:

    Dr. J. Douglas Willms and his staff at the Canadian Research Institute for Social Policy (CRISP) at the University of New Brunswick (Fredericton Campus) have developed a set of files for researchers interested in using Statistics Canada's National Longitudinal Survey of Children and Youth (NLSCY) data sets. "The Files" consist of SPSS data and syntax, which are intended to assist researchers in conducting more efficient longitudinal analyses using NLSCY data.

    Release date: 2005-06-23

  • Surveys and statistical programs – Documentation: 62F0026M2005001
    Description:

    This paper provides some guidance to users on the use of medians and also gives some examples of situations in which the median can be a more appropriate measure than the average.

    Release date: 2005-05-17

  • Surveys and statistical programs – Documentation: 81-595-M2004020
    Geography: Canada
    Description:

    This article discusses the collection and interpretation of statistical data on Canada's trade in culture goods. It defines the products that are included in culture trade and explains how appropriate products are selected from the relevant classification standards.

    This version has been replaced by Culture Goods Trade Data User Guide, Catalogue No. 81-595-MIE2006040.

    Release date: 2004-07-28

  • Surveys and statistical programs – Documentation: 92-388-X
    Description:

    This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.

    Release date: 2004-07-15

  • Surveys and statistical programs – Documentation: 11F0019M2003207
    Geography: Canada
    Description:

    The estimation of intergenerational earnings mobility is rife with measurement problems since the researcher does not observe permanent, lifetime earnings. Nearly all studies make corrections for mean variation in earnings because of the age differences among respondents. Recent works employ average earnings or instrumental variable methods to address the effects of measurement error as a result of transitory earnings shocks and mis-reporting. However, empirical studies of intergenerational mobility have paid no attention to the changes in earnings variance across the life cycle suggested by economic models of human capital investment.

    Using information from the Intergenerational Income Data from Canada and the National Longitudinal Survey and Panel Study of Income Dynamics from the United States, this study finds a strong association between age at observation and estimated earnings persistence. Part of this age-dependence is related to a general increase in transitory earnings variance during the collection of data. An independent effect of life cycle investment is also identified. These findings are then applied to the variation among intergenerational earnings persistence studies. Among studies with similar methodologies, one-third of the variance in published estimates of earnings persistence is attributable to cross-study differences in the age of responding fathers. Finally, these results call into question tests for the importance of credit constraints based on measures of earnings at different points in the life cycle.

    Release date: 2003-08-05

  • Surveys and statistical programs – Documentation: 12-584-G
    Description:

    This book introduces technical aspects of the Statistics Canada Total Work Accounts System (TWAS). The TWAS is designed to facilitate the analysis of issues that require simultaneous consideration of both paid work and unpaid productive work. Its key contribution is to allocate the deemed output of each episode of unpaid work activity to a specific beneficiary or group of beneficiaries (called "destinations"). The guide presents the criteria used to decide the allocation of each work episode to one of the destinations, as well as the pseudo code for DESTIN, the key variable of the System. This pseudo code allows programmers to quickly create the actual programming code needed to derive the DESTIN variable in their own microdata files of diary-based time-use records. The guide also discusses illustrative applications of the System, as well as its key limitations.

    Release date: 2002-02-12

  • Notices and consultations: 87-003-X19970012882
    Geography: Canada
    Description:

    The purpose of this article is to inform Travel-log readers of the availability of a new analytical tool - the National Tourism Indicators. These estimates, which measure trends in tourism in Canada, are placed in perspective here, taking into account the concepts and definitions used in developing them.

    Release date: 1997-01-08

  • Surveys and statistical programs – Documentation: 11F0019M1995083
    Geography: Canada
    Description:

    This paper examines the robustness of a measure of the average complete duration of unemployment in Canada to a host of assumptions used in its derivation. In contrast to the average incomplete duration of unemployment, which is a lagging cyclical indicator, this statistic is a coincident indicator of the business cycle. The impact of using a steady-state as opposed to a non-steady-state assumption, as well as the impact of various corrections for response bias, are explored. It is concluded that a non-steady-state estimator would be a valuable complement to the statistics on unemployment duration that are currently released by many statistical agencies, and particularly Statistics Canada.

    Release date: 1995-12-30
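
To illustrate the point made in the entry on medians above (62F0026M2005001), the short Python sketch below compares the average and the median of a skewed set of values. The income figures are invented for illustration and do not come from any Statistics Canada product.

```python
from statistics import mean, median

# Hypothetical household incomes (dollars); one very high value skews the data.
incomes = [30_000, 32_000, 35_000, 38_000, 1_000_000]

average_income = mean(incomes)
median_income = median(incomes)

print(f"mean:   {average_income:,.0f}")
print(f"median: {median_income:,.0f}")
```

The single extreme value pulls the average far above what most households in the list earn, while the median stays at the middle value, which is why the median is often preferred for skewed data such as income or spending.
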