Analysis

Results

All (126) (0 to 10 of 126 results)

  • Stats in brief: 89-20-00062022004
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. In this video, we will discuss the importance of considering data ethics throughout the process of producing statistical information.

    As a pre-requisite to this video, make sure to watch the video titled “Data Ethics: An introduction” also available in Statistics Canada’s data literacy training catalogue.

    Release date: 2022-10-17

  • Articles and reports: 11-633-X2022007
    Description:

    This paper investigates how Statistics Canada can increase trust by giving users the ability to authenticate data from its website through digital signatures and blockchain technology.

    Release date: 2022-09-19

  • Articles and reports: 12-001-X202200100002
    Description:

    We consider an intercept only linear random effects model for analysis of data from a two stage cluster sampling design. At the first stage a simple random sample of clusters is drawn, and at the second stage a simple random sample of elementary units is taken within each selected cluster. The response variable is assumed to consist of a cluster-level random effect plus an independent error term with known variance. The objects of inference are the mean of the outcome variable and the random effect variance. With a more complex two stage sampling design, the use of an approach based on an estimated pairwise composite likelihood function has appealing properties. Our purpose is to use our simpler context to compare the results of likelihood inference with inference based on a pairwise composite likelihood function that is treated as an approximate likelihood, in particular treated as the likelihood component in Bayesian inference. In order to provide credible intervals having frequentist coverage close to nominal values, the pairwise composite likelihood function and corresponding posterior density need modification, such as a curvature adjustment. Through simulation studies, we investigate the performance of an adjustment proposed in the literature, and find that it works well for the mean but provides credible intervals for the random effect variance that suffer from under-coverage. We propose possible future directions including extensions to the case of a complex design.

    Release date: 2022-06-21

  • Stats in brief: 89-20-00062022001
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. To properly conduct these processes, data ethics must be upheld in order to ensure the appropriate use of data.

    Release date: 2022-05-24

  • Stats in brief: 89-20-00062022002
    Description:

    This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide data users and producers alike, as they navigate their way through the data journey, in order to gain maximum, long term value.

    Release date: 2022-05-24

  • Articles and reports: 11-633-X2021007
    Description:

    Statistics Canada continues to use a variety of data sources to provide neighbourhood-level variables across an expanding set of domains, such as sociodemographic characteristics, income, services and amenities, crime, and the environment. Yet, despite these advances, information on the social aspects of neighbourhoods is still unavailable. In this paper, answers to the Canadian Community Health Survey on respondents’ sense of belonging to their local community were pooled over the four survey years from 2016 to 2019. Individual responses were aggregated up to the census tract (CT) level.

    Release date: 2021-11-16

  • Articles and reports: 82-003-X202000700002
    Description:

    This paper's objectives are to examine the feasibility of pooling linked population health surveys from three countries, facilitate the examination of health behaviours, and present useful information to assist in the planning of international population health surveillance and research studies.

    Release date: 2020-07-29

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows how controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17

  • Articles and reports: 11-633-X2019003
    Description:

    This report provides an overview of the definitions and competency frameworks of data literacy, as well as the assessment tools used to measure it. These are based on the existing literature and current practices around the world. Data literacy, or the ability to derive meaningful information from data, is a relatively new concept. However, it is gaining increasing recognition as a vital skillset in the information age. Existing approaches to measuring data literacy—from self-assessment tools to objective measures, and from individual to organizational assessments—are discussed in this report to inform the development of an assessment tool for data literacy in the Canadian public service.

    Release date: 2019-08-14

  • Articles and reports: 12-001-X201900200007
    Description:

    When fitting an ordered categorical variable with L > 2 levels to a set of covariates onto complex survey data, it is common to assume that the elements of the population fit a simple cumulative logistic regression model (proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method used for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed by assuming an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates using the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption easily follows.

    Release date: 2019-06-27
Articles and reports (122) (0 to 10 of 122 results)

  • Articles and reports: 12-001-X201900100004
    Description:

    In this paper, we make use of auxiliary information to improve the efficiency of the estimates of the censored quantile regression parameters. Utilizing the information available from previous studies, we computed empirical likelihood probabilities as weights and proposed weighted censored quantile regression. Theoretical properties of the proposed method are derived. Our simulation studies show that our proposed method has advantages compared to standard censored quantile regression.

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201900100005
    Description:

    Small area estimation using area-level models can sometimes benefit from covariates that are observed subject to random errors, such as covariates that are themselves estimates drawn from another survey. Given estimates of the variances of these measurement (sampling) errors for each small area, one can account for the uncertainty in such covariates using measurement error models (e.g., Ybarra and Lohr, 2008). Two types of area-level measurement error models have been examined in the small area estimation literature. The functional measurement error model assumes that the underlying true values of the covariates with measurement error are fixed but unknown quantities. The structural measurement error model assumes that these true values follow a model, leading to a multivariate model for the covariates observed with error and the original dependent variable. We compare and contrast these two models with the alternative of simply ignoring measurement error when it is present (naïve model), exploring the consequences for prediction mean squared errors of use of an incorrect model under different underlying assumptions about the true model. Comparisons done using analytic formulas for the mean squared errors assuming model parameters are known yield some surprising results. We also illustrate results with a model fitted to data from the U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) Program.

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201800254961
    Description:

    In business surveys, it is common to collect economic variables with highly skewed distribution. In this context, winsorization is frequently used to address the problem of influential values. In stratified simple random sampling, there are two methods for selecting the thresholds involved in winsorization. This article comprises two parts. The first reviews the notations and the concept of a winsorization estimator. The second part details the two methods and extends them to the case of Poisson sampling, and then compares them on simulated data sets and on the labour cost and structure of earnings survey carried out by INSEE.

    Release date: 2018-12-20
Journals and periodicals (1) (1 result)

  • Journals and periodicals: 84F0013X
    Geography: Canada, Province or territory
    Description:

    This study was initiated to test the validity of probabilistic linkage methods used at Statistics Canada. It compared the results of data linkages on infant deaths in Canada with infant death data from Nova Scotia and Alberta. It also compared the availability of fetal deaths on the national and provincial files.

    Release date: 1999-10-08