Results

All (342) (40 to 50 of 342 results)

  • Articles and reports: 11-633-X2019003
    Description:

    This report provides an overview of the definitions and competency frameworks of data literacy, as well as the assessment tools used to measure it. These are based on the existing literature and current practices around the world. Data literacy, or the ability to derive meaningful information from data, is a relatively new concept. However, it is gaining increasing recognition as a vital skillset in the information age. Existing approaches to measuring data literacy—from self-assessment tools to objective measures, and from individual to organizational assessments—are discussed in this report to inform the development of an assessment tool for data literacy in the Canadian public service.

    Release date: 2019-08-14

  • Articles and reports: 13-605-X201900100009
    Description:

    In this paper, a preliminary set of statistical estimates of the amounts invested in Canadian data, databases and data science in recent years is presented. The results indicate rapid growth in investment in data, databases and data science over the last three decades and a significant accumulation of these kinds of capital over time.

    Release date: 2019-07-10

  • Articles and reports: 12-001-X201900200003
    Description:

    Merging available sources of information is becoming increasingly important for improving estimates of population characteristics in a variety of fields. In the presence of several independent probability samples from a finite population, we investigate options for a combined estimator of the population total, based either on a linear combination of the separate estimators or on the combined sample approach. A linear combination estimator based on estimated variances can be biased, as the separate estimators of the population total can be highly correlated with their respective variance estimators. We illustrate the possibility of using the combined sample to estimate the variances of the separate estimators, which results in general pooled variance estimators. These pooled variance estimators use all available information and have the potential to significantly reduce the bias of a linear combination of separate estimators.

    Release date: 2019-06-27

  • Articles and reports: 12-001-X201900200008
    Description:

    High nonresponse occurs in many sample surveys today, including important surveys carried out by government statistical agencies. An adaptive data collection can be advantageous in those conditions: Lower nonresponse bias in survey estimates can be gained, up to a point, by producing a well-balanced set of respondents. Auxiliary variables serve a twofold purpose: Used in the estimation phase, through calibrated adjustment weighting, they reduce, but do not entirely remove, the bias. In the preceding adaptive data collection phase, auxiliary variables also play a major role: They are instrumental in reducing the imbalance in the ultimate set of respondents. For such combined use of auxiliary variables, the deviation of the calibrated estimate from the unbiased estimate (under full response) is studied in the article. We show that this deviation is a sum of two components. The reducible component can be decreased through adaptive data collection, all the way to zero if perfectly balanced response is realized with respect to a chosen auxiliary vector. By contrast, the resisting component changes little or not at all by a better balanced response; it represents a part of the deviation that adaptive design does not get rid of. The relative size of the former component is an indicator of the potential payoff from an adaptive survey design.

    Release date: 2019-06-27

  • Articles and reports: 12-001-X201900200009
    Description:

    In recent years, there has been a strong interest in indirect measures of nonresponse bias in surveys or other forms of data collection. This interest originates from gradually decreasing propensities to respond to surveys parallel to pressures on survey budgets. These developments led to a growing focus on the representativeness or balance of the responding sample units with respect to relevant auxiliary variables. One example of a measure is the representativeness indicator, or R-indicator. The R-indicator is based on the design-weighted sample variation of estimated response propensities. It pre-supposes linked auxiliary data. One of the criticisms of the indicator is that it cannot be used in settings where auxiliary information is available only at the population level. In this paper, we propose a new method for estimating response propensities that does not need auxiliary information for non-respondents to the survey and is based on population auxiliary information. These population-based response propensities can then be used to develop R-indicators that employ population contingency tables or population frequency counts. We discuss the statistical properties of the indicators, and evaluate their performance using an evaluation study based on real census data and an application from the Dutch Health Survey.

    Release date: 2019-06-27

  • Articles and reports: 13-605-X201900100008
    Description:

    This paper aims to expand the current national accounting concepts and statistical methods for measuring data in order to shed light on some highly consequential changes in society that are related to the rising usage of data. The paper concludes by discussing possible methods that can be used to assign an economic value to the various elements in the information chain and tests these concepts and methods by presenting results for Canada as a first attempt to measure the value of data.

    Release date: 2019-06-24

  • Articles and reports: 11-633-X2019002
    Description:

    Survey data collection through mobile devices, such as tablets and smartphones, is underway in Canada. However, little is known about the representativeness of the data collected through these devices. In March 2017, Statistics Canada commissioned survey data collection through the Carrot Rewards Application and included 11 questions on the Carrot Rewards Mobile App Survey (Carrot) drawn from the 2017 Canadian Community Health Survey (CCHS).

    Release date: 2019-06-04

  • Articles and reports: 12-001-X201900100006
    Description:

    The empirical predictor under an area level version of the generalized linear mixed model (GLMM) is extensively used in small area estimation (SAE) for counts. However, this approach does not use the sampling weights or clustering information that are essential for valid inference given the informative samples produced by modern complex survey designs. This paper describes an SAE method that incorporates this sampling information when estimating small area proportions or counts under an area level version of the GLMM. The approach is further extended under a spatial dependent version of the GLMM (SGLMM). The mean squared error (MSE) estimation for this method is also discussed. This SAE method is then applied to estimate the extent of household poverty in different districts of the rural part of the state of Uttar Pradesh in India by linking data from the 2011-12 Household Consumer Expenditure Survey collected by the National Sample Survey Office (NSSO) of India, and the 2011 Indian Population Census. Results from this application indicate a substantial gain in precision for the new methods compared to the direct survey estimates.

    Release date: 2019-05-07

  • Articles and reports: 13-605-X201900100005
    Description:

    The last global financial crisis revealed some important data gaps in countries’ statistics to properly assess the build-up of risk and the interconnectedness in financial markets. These gaps have led to the development of a series of initiatives at the international level with clear deliverables to enhance the quality of the information produced by countries in the area of financial investment statistics, including statistics on securities. The initiative undertaken at Statistics Canada to enhance securities statistics has produced many benefits with important expansions in terms of additional characteristics of instruments issued and held by Canadians, additional sector details as well as increased frequency and timeliness.

    Release date: 2019-04-16

  • Articles and reports: 12-001-X201800254956
    Description:

    In Italy, the Labor Force Survey (LFS) is conducted quarterly by the National Statistical Institute (ISTAT) to produce estimates of the labor force status of the population at different geographical levels. In particular, ISTAT provides LFS estimates of employed and unemployed counts for local Labor Market Areas (LMAs). LMAs are 611 sub-regional clusters of municipalities and are unplanned domains for which direct estimates have overly large sampling errors. This implies the need for Small Area Estimation (SAE) methods. In this paper, we develop a new area level SAE method that uses a Latent Markov Model (LMM) as the linking model. In LMMs, the characteristic of interest, and its evolution in time, is represented by a latent process that follows a Markov chain, usually of first order. Therefore, areas are allowed to change their latent state across time. The proposed model is applied to quarterly data from the LFS for the period 2004 to 2014 and fitted within a hierarchical Bayesian framework using a data augmentation Gibbs sampler. Estimates are compared with those obtained by the classical Fay-Herriot model, by a time-series area level SAE model, and on the basis of data coming from the 2011 Population Census.

    Release date: 2018-12-20

Stats in brief (2 results)

  • Stats in brief: 89-20-00062023001
    Description:

    This course is intended for Government of Canada employees who would like to learn about evaluating the quality of data for a particular use. Whether you are a new employee interested in learning the basics, or an experienced subject matter expert looking to refresh your skills, this course is here to help.

    Release date: 2023-07-17

  • Stats in brief: 89-20-00062022003
    Description:

    By the end of this video you will understand what confidence intervals are, why we use them, and what factors have an impact on them.

    Release date: 2022-05-24

Articles and reports (337) (50 to 60 of 337 results)

  • Articles and reports: 12-001-X201700254871
    Description:

    This paper addresses the question of how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed by a repeated survey with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. This model also makes it possible to utilize the higher frequency of the social media to produce more precise estimates for the sample survey in real time at the moment that statistics for the social media become available but the sample data are not yet available. The concept of cointegration is applied to address the question of to what extent the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201700254897
    Description:

    This note by Chris Skinner presents a discussion of the paper “Sample survey theory and methods: Past, present, and future directions” where J.N.K. Rao and Wayne A. Fuller share their views regarding the developments in sample survey theory and methods covering the past 100 years.

    Release date: 2017-12-21

  • Articles and reports: 13-605-X201700114840
    Description:

    Statistics Canada is presently preparing the statistical system to be able to gauge the impact of the transition from illegal to legal non-medical cannabis use and to shed light on the social and economic activities related to the use of cannabis thereafter. While the system of social statistics captures some information on the use of cannabis, updates will be required to more accurately measure health effects and the impact on the judicial system. Current statistical infrastructure used to more comprehensively measure the use and impacts of substances such as tobacco and alcohol could be adapted to do the same for cannabis. However, available economic statistics are largely silent on the role illegal drugs play in the economy. Both social and economic statistics will need to be updated to reflect the legalization of cannabis, and the challenge is especially great for economic statistics. This paper provides a summary of the work that is now under way toward these ends.

    Release date: 2017-09-15

  • Articles and reports: 11-633-X2017008
    Description:

    The DYSEM microsimulation modelling platform provides a demographic and socioeconomic core that can be readily built upon to develop custom dynamic microsimulation models or applications. This paper describes DYSEM and provides an overview of its intended uses, as well as the methods and data used in its development.

    Release date: 2017-07-28

  • Articles and reports: 12-001-X201600214663
    Description:

    We present theoretical evidence that efforts during data collection to balance the survey response with respect to selected auxiliary variables will improve the chances for low nonresponse bias in the estimates that are ultimately produced by calibrated weighting. One of our results shows that the variance of the bias – measured here as the deviation of the calibration estimator from the (unrealized) full-sample unbiased estimator – decreases linearly as a function of the response imbalance that we assume measured and controlled continuously over the data collection period. An attractive prospect is thus a lower risk of bias if one can manage the data collection to get low imbalance. The theoretical results are validated in a simulation study with real data from an Estonian household survey.

    Release date: 2016-12-20

  • Articles and reports: 11-633-X2016003
    Description:

    Large national mortality cohorts are used to estimate mortality rates for different socioeconomic and population groups, and to conduct research on environmental health. In 2008, Statistics Canada created a cohort linking the 1991 Census to mortality. The present study describes a linkage of the 2001 Census long-form questionnaire respondents aged 19 years and older to the T1 Personal Master File and the Amalgamated Mortality Database. The linkage tracks all deaths over a 10.6-year period (until the end of 2011, to date).

    Release date: 2016-10-26

  • Articles and reports: 11-633-X2016002
    Description:

    Immigrants comprise an ever-increasing percentage of the Canadian population—at more than 20%, which is the highest percentage among the G8 countries (Statistics Canada 2013a). This figure is expected to rise to 25% to 28% by 2031, when at least one in four people living in Canada will be foreign-born (Statistics Canada 2010).

    This report summarizes the linkage of the Immigrant Landing File (ILF) for all provinces and territories, excluding Quebec, to hospital data from the Discharge Abstract Database (DAD), a national database containing information about hospital inpatient and day-surgery events. A deterministic exact-matching approach was used to link data from the 1980-to-2006 ILF and from the DAD (2006/2007, 2007/2008 and 2008/2009) with the 2006 Census, which served as a “bridge” file. This was a secondary linkage in that it used linkage keys created in two previous projects (primary linkages) that separately linked the ILF and the DAD to the 2006 Census. The ILF–DAD linked data were validated by means of a representative sample of 2006 Census records containing immigrant information previously linked to the DAD.

    Release date: 2016-08-17

  • Articles and reports: 11-633-X2016001
    Description:

    Every year, thousands of workers lose their jobs as firms reduce the size of their workforce in response to growing competition, technological changes, changing trade patterns and numerous other factors. Thousands of workers also start a job with a new employer as new firms enter a product market and existing firms expand or replace employees who recently left. This worker reallocation process across employers is generally seen as contributing to productivity growth and rising living standards. To measure this labour reallocation process, labour market indicators such as hiring rates and layoff rates are needed. In response to growing demand for subprovincial labour market information and taking advantage of unique administrative datasets, Statistics Canada is producing hiring rates and layoff rates by economic region of residence. This document describes the data sources, conceptual and methodological issues, and other matters pertaining to these two indicators.

    Release date: 2016-06-27

  • Articles and reports: 12-001-X201600114538
    Description:

    The aim of automatic editing is to use a computer to detect and amend erroneous values in a data set, without human intervention. Most automatic editing methods that are currently used in official statistics are based on the seminal work of Fellegi and Holt (1976). Applications of this methodology in practice have shown systematic differences between data that are edited manually and automatically, because human editors may perform complex edit operations. In this paper, a generalization of the Fellegi-Holt paradigm is proposed that can incorporate a large class of edit operations in a natural way. In addition, an algorithm is outlined that solves the resulting generalized error localization problem. It is hoped that this generalization may be used to increase the suitability of automatic editing in practice, and hence to improve the efficiency of data editing processes. Some first results on synthetic data are promising in this respect.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201600114545
    Description:

    The estimation of quantiles is an important topic not only in the regression framework, but also in sampling theory. A natural alternative or addition to quantiles are expectiles. Expectiles, as a generalization of the mean, have become popular in recent years as they not only give a more detailed picture of the data than the ordinary mean, but can also serve as a basis for calculating quantiles by using their close relationship. We show how to estimate expectiles under sampling with unequal probabilities and how expectiles can be used to estimate the distribution function. The resulting fitted distribution function estimator can be inverted, leading to quantile estimates. We run a simulation study to investigate and compare the efficiency of the expectile-based estimator.

    Release date: 2016-06-22

Journals and periodicals (3 results)

  • Journals and periodicals: 12-605-X
    Description:

    The Record Linkage Project Process Model (RLPPM) was developed by Statistics Canada to identify the processes and activities involved in record linkage. The RLPPM applies to linkage projects conducted at the individual and enterprise level using diverse data sources to create new data sources to meet analytical and operational needs.

    Release date: 2017-06-05

  • Journals and periodicals: 89-639-X
    Geography: Canada
    Description:

    Beginning in late 2006, the Social and Aboriginal Statistics Division of Statistics Canada embarked on the process of review of questions used in the Census and in surveys to produce data about Aboriginal peoples (North American Indian, Métis and Inuit). This process is essential to ensure that Aboriginal identification questions are valid measures of contemporary Aboriginal identification, in all its complexity. Questions reviewed included the following (from the Census 2B questionnaire):

    - the Ethnic origin / Aboriginal ancestry question;
    - the Aboriginal identity question;
    - the Treaty / Registered Indian question; and
    - the Indian band / First Nation Membership question.

    Additional testing was conducted on Census questions with potential Aboriginal response options: the population group question (also known as visible minorities), and the Religion question. The review process to date has involved two major steps: regional discussions with data users and stakeholders, and qualitative testing. The regional discussions with over 350 users of Aboriginal data across Canada were held in early 2007 to examine the four questions used on the Census and other surveys of Statistics Canada. Data users included National Aboriginal organizations, Aboriginal Provincial and Territorial Organizations, Federal, Provincial and local governments, researchers and Aboriginal service organizations. User feedback showed that main areas of concern were data quality, undercoverage, the wording of questions, and the importance of comparability over time.

    Release date: 2009-04-17

  • Journals and periodicals: 89-629-X
    Geography: Canada
    Description:

    Statistics Canada regularly reviews the questions used on the Census and other surveys to ensure that the resulting data are representative of the population. As a first step in the process to review the questions used to produce data about First Nations, Inuit and Métis populations, regional discussions were held with more than 350 users of Aboriginal data in over 40 locations across Canada during the winter, spring and early summer of 2007.

    This report summarizes the main issues raised in these meetings. Four questions used to identify Aboriginal people from the Census and surveys were considered in the discussions.

    Release date: 2008-05-27