
Results

All (12 results)

  • Stats in brief: 89-20-00062024001
    Description: This short video explains how all levels of government, and the organizations that serve their communities, can use disaggregated data to make evidence-informed public policy decisions. With disaggregated data, policymakers can design more appropriate and effective policies that meet the needs of Canada's diverse population.
    Release date: 2024-07-16

  • Stats in brief: 89-20-00062024002
    Description: This short video explains how the use of disaggregated data can help policymakers to develop more targeted and effective policies by identifying the unique needs and challenges faced by different demographic groups.
    Release date: 2024-07-16

  • Articles and reports: 12-001-X202300200009
    Description: In this paper, we investigate how a big non-probability database can be used to improve estimates of finite population totals from a small probability sample through data integration techniques. In the situation where the study variable is observed in both data sources, Kim and Tam (2021) proposed two design-consistent estimators that can be justified through dual frame survey theory. First, we provide conditions ensuring that these estimators are more efficient than the Horvitz-Thompson estimator when the probability sample is selected using either Poisson sampling or simple random sampling without replacement. Then, we study the class of QR predictors, introduced by Särndal and Wright (1984), to handle the less common case where the non-probability database contains no study variable but auxiliary variables. We also require that the non-probability database is large and can be linked to the probability sample. We provide conditions ensuring that the QR predictor is asymptotically design-unbiased. We derive its asymptotic design variance and provide a consistent design-based variance estimator. We compare the design properties of different predictors, in the class of QR predictors, through a simulation study. This class includes a model-based predictor, a model-assisted estimator and a cosmetic estimator. In our simulation setups, the cosmetic estimator performed slightly better than the model-assisted estimator. These findings are confirmed by an application to La Poste data, which also illustrates that the properties of the cosmetic estimator are preserved irrespective of the observed non-probability sample.
    Release date: 2024-01-03
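
The Horvitz-Thompson estimator that this abstract uses as its efficiency benchmark can be sketched under Poisson sampling; the population values and inclusion probabilities below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical finite population and inclusion probabilities (illustration only).
N = 1000
y = rng.gamma(shape=2.0, scale=50.0, size=N)       # study variable
pi = 0.02 + 0.3 * y / y.max()                      # Poisson inclusion probabilities

# Poisson sampling: each unit k is selected independently with probability pi_k.
sampled = rng.random(N) < pi

# Horvitz-Thompson estimator of the population total: sum of y_k / pi_k over the sample.
t_hat = np.sum(y[sampled] / pi[sampled])
print(t_hat, y.sum())
```

Because each unit is weighted by the inverse of its inclusion probability, the estimator is design-unbiased for the true total regardless of the model generating y.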

  • Articles and reports: 12-001-X202300200011
    Description: The article considers sampling designs for populations that can be represented as an N × M matrix. For instance, when investigating tourist activities, the rows could be locations visited by tourists and the columns days in the tourist season. The goal is to sample cells (i, j) of the matrix when the number of selections within each row and each column is fixed a priori. The ith row sample size represents the number of selected cells within row i; the jth column sample size is the number of selected cells within column j. A matrix sampling design gives an N × M matrix of sample indicators, with entry 1 at position (i, j) if cell (i, j) is sampled and 0 otherwise. The first matrix sampling design investigated has one level of sampling, with row and column sample sizes set in advance: the row sample sizes can vary while the column sample sizes are all equal. The fixed margins can be seen as balancing constraints, and algorithms available for selecting such samples are reviewed. A new estimator for the variance of the Horvitz-Thompson estimator for the mean of survey variable y is then presented. Several levels of sampling might be necessary to account for all the constraints; this involves multi-level matrix sampling designs, which are also investigated.
    Release date: 2024-01-03
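
A minimal sketch of the one-level setting described above, assuming the simplest design in which each column receives an independent simple random sample of the same size (so column margins are fixed exactly while row margins vary); this toy code is not the paper's selection algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: N locations (rows) x M days (columns); values are hypothetical.
N, M = 8, 5
y = rng.uniform(0, 100, size=(N, M))   # survey variable over the cells
c = 3                                  # fixed sample size within each column

# Build the N x M matrix of sample indicators: an SRS of c rows in each column.
S = np.zeros((N, M), dtype=int)
for j in range(M):
    S[rng.choice(N, size=c, replace=False), j] = 1

# Column margins are exactly c by construction.
assert (S.sum(axis=0) == c).all()

# Under this design every cell has inclusion probability c/N, so the
# Horvitz-Thompson estimator of the mean over all N*M cells is:
pi = c / N
y_bar_ht = (S * y / pi).sum() / (N * M)
print(y_bar_ht)
```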

  • Articles and reports: 12-001-X202300200013
    Description: Jean-Claude Deville is one of the most prominent researchers in survey sampling theory and practice. His research on balanced sampling, indirect sampling and calibration in particular is internationally recognized and widely used in official statistics. He was also a pioneer in the field of functional data analysis. This discussion gives us the opportunity to recognize the immense work he accomplished, and to pay tribute to him. In the first part of this article, we briefly recall his contribution to functional principal component analysis. We also detail some recent extensions of his work at the intersection of functional data analysis and survey sampling. In the second part of this paper, we present some extensions of Jean-Claude’s work in indirect sampling. These extensions are motivated by concrete applications and illustrate Jean-Claude’s influence on our work as researchers.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300100006
    Description: My comments consist of three components: (1) a brief account of my professional association with Chris Skinner; (2) observations on Skinner’s contributions to statistical disclosure control; and (3) some comments on making inferences from masked survey data.
    Release date: 2023-06-30

  • Articles and reports: 89-648-X2022001
    Description:

    This report explores the size and nature of the attrition challenges faced by the Longitudinal and International Study of Adults (LISA) survey, as well as the use of a non-response weight adjustment and calibration strategy to mitigate the effects of attrition on the LISA estimates. The study focuses on data from waves 1 (2012) to 4 (2018) and uses practical examples based on selected demographic variables to illustrate how attrition can be assessed and treated.

    Release date: 2022-11-14
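
A minimal sketch of a non-response weight adjustment followed by a calibration step, in the spirit of the strategy described; the sample, adjustment classes, and known population count are invented and this is not LISA's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical wave-1 sample: design weights, an adjustment class, response status.
n = 200
w = np.full(n, 50.0)                       # design weights
group = rng.integers(0, 2, size=n)         # two adjustment classes (e.g., age groups)
respond = rng.random(n) < np.where(group == 0, 0.8, 0.6)

# Non-response adjustment: divide each respondent's weight by the weighted
# response rate of its class, so respondents carry the class's full weight.
w_adj = w.copy()
for g in (0, 1):
    in_g = group == g
    rate = w[in_g & respond].sum() / w[in_g].sum()
    w_adj[in_g] /= rate
w_adj = np.where(respond, w_adj, 0.0)

# Respondents in each class now reproduce that class's full-sample weight total.
for g in (0, 1):
    assert np.isclose(w_adj[group == g].sum(), w[group == g].sum())

# Simple calibration (post-stratification) step: scale to a known population count.
N_known = 10200.0
w_cal = w_adj * N_known / w_adj.sum()
print(w_cal.sum())
```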

  • Articles and reports: 12-001-X202200100003
    Description:

    Use of auxiliary data to improve the efficiency of estimators of totals and means through model-assisted survey regression estimation has received considerable attention in recent years. Generalized regression (GREG) estimators, based on a working linear regression model, are currently used in establishment surveys at Statistics Canada and several other statistical agencies. GREG estimators use common survey weights for all study variables and calibrate to known population totals of auxiliary variables. Increasingly, many auxiliary variables are available, some of which may be extraneous. This leads to unstable GREG weights when all the available auxiliary variables, including interactions among categorical variables, are used in the working linear regression model. On the other hand, new machine learning methods, such as regression trees and lasso, automatically select significant auxiliary variables and lead to stable nonnegative weights and possible efficiency gains over GREG. In this paper, a simulation study, based on a real business survey sample data set treated as the target population, is conducted to study the relative performance of GREG, regression trees and lasso in terms of efficiency of the estimators and properties of associated regression weights. Both probability sampling and non-probability sampling scenarios are studied.

    Release date: 2022-06-21
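
The GREG estimator discussed above can be sketched for a single auxiliary variable under simple random sampling; the population, model, and sample size here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population with one auxiliary variable x whose total is known.
N = 5000
x = rng.uniform(10, 20, size=N)
y = 3.0 * x + rng.normal(0, 2, size=N)
tx = x.sum()                                # known auxiliary total

# SRSWOR sample of size n; design weights d_k = N/n.
n = 250
s = rng.choice(N, size=n, replace=False)
d = N / n

# Working linear regression model fitted on the sample.
X = np.column_stack([np.ones(n), x[s]])
beta = np.linalg.lstsq(X, y[s], rcond=None)[0]

# GREG estimator of the y-total: the Horvitz-Thompson estimate plus a
# regression adjustment that calibrates to the known totals of (1, x).
t_ht = d * y[s].sum()
t_greg = t_ht + (np.array([N, tx]) - d * X.sum(axis=0)) @ beta
print(t_greg, y.sum())
```

Because the working model fits well here, the regression adjustment removes most of the HT estimator's sampling error, which is the efficiency gain the abstract compares against tree- and lasso-based alternatives.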

  • Articles and reports: 18-001-X2020001
    Description:

    This paper presents the methodology used to generate the first nationwide database of proximity measures, along with the results obtained for a first set of ten measures. The computational methods are presented as a generalizable model, since similar methods can now be applied to a multitude of other services or amenities, in a variety of alternative specifications.

    Release date: 2021-02-15
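
As a rough illustration of what a computed proximity measure can look like, the sketch below sums amenity "mass" weighted by inverse distance within a radius; the coordinates, masses, radius, and functional form are all invented and do not reproduce Statistics Canada's actual specification.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical block coordinates and amenity locations (km), for illustration.
blocks = rng.uniform(0, 10, size=(20, 2))
amenities = rng.uniform(0, 10, size=(7, 2))
mass = rng.uniform(1, 5, size=7)           # e.g., amenity size

radius = 3.0
# Pairwise distances between every block and every amenity.
d = np.linalg.norm(blocks[:, None, :] - amenities[None, :, :], axis=2)

# Each amenity within the radius contributes mass / distance; others contribute 0.
contrib = np.where((d > 0) & (d <= radius), mass / np.maximum(d, 1e-9), 0.0)
proximity = contrib.sum(axis=1)
print(proximity.round(2))
```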

  • Articles and reports: 11-633-X2020002
    Description:

    The concepts of urban and rural are widely debated and vary depending on a country’s geopolitical and sociodemographic composition. In Canada, population centres and statistical area classifications are widely used to distinguish urban and rural communities. However, neither of these classifications precisely classifies Canadian communities into urban, rural and remote areas. A group of researchers at Statistics Canada developed an alternative tool called the “remoteness index” to measure the relative remoteness of Canadian communities. This study builds on the remoteness index, which is a continuous index, by examining how it can be classified into five discrete categories of remoteness geographies. When properly categorized, the remoteness index can be a useful tool to distinguish urban, rural and remote communities in Canada, while protecting the privacy and confidentiality of citizens. This study considers five methodological approaches and recommends three methods.

    Release date: 2020-08-11
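
Two generic ways of cutting a continuous index into five discrete classes are sketched below; the simulated index values are invented, and these are common textbook classifiers, not the five candidate methods the study itself evaluates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical continuous remoteness index on [0, 1] for 500 communities.
index = rng.beta(2, 5, size=500)

# 1. Equal intervals over the observed range: five classes of equal width.
equal_edges = np.linspace(index.min(), index.max(), 6)
equal_class = np.digitize(index, equal_edges[1:-1])

# 2. Quintiles: each class holds ~20% of communities.
quint_edges = np.quantile(index, [0.2, 0.4, 0.6, 0.8])
quint_class = np.digitize(index, quint_edges)

print(np.bincount(equal_class, minlength=5), np.bincount(quint_class, minlength=5))
```

The two cuts trade off interpretability of class boundaries against balanced class sizes, which is one reason several categorization methods are worth comparing.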

  • Articles and reports: 82-003-X202000700002
    Description:

    This paper's objectives are to examine the feasibility of pooling linked population health surveys from three countries, facilitate the examination of health behaviours, and present useful information to assist in the planning of international population health surveillance and research studies.

    Release date: 2020-07-29

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting, since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts characterizing the information on links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to situations in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17
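
The indirect-sampling weights at the heart of this setting can be sketched with the generalized weight share method, in the spirit of Deville and Lavallée's work; the farms, households, link matrix, and design weights below are all invented for illustration.

```python
import numpy as np

# Frame population A: 6 farms; target population B: 4 rural households.
# links[i, j] = 1 if farm i is linked to household j.
links = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
L = links.sum(axis=0)          # total number of links of each household (must be > 0)

# Sample from A with known design weights: farms 0, 2 and 4, each with weight 2.
w_A = np.zeros(links.shape[0])
w_A[[0, 2, 4]] = 2.0

# Generalized weight share: the weight of household j is the sum, over sampled
# farms i linked to it, of w_i * l_ij / L_j.
w_B = (w_A[:, None] * links / L[None, :]).sum(axis=0)
print(w_B)   # → [1. 1. 1. 1.]
```

The shared weights let a probability sample drawn from one population (farms) yield valid design weights for the linked population (households), which is exactly the mechanism that makes the links so valuable at the design phase.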
Journals and periodicals (0 results)

No content available at this time.
