Results

All (11) (0 to 10 of 11 results)

  • Articles and reports: 12-001-X201900300003
    Description:

    The widely used formulas for the variance of the ratio estimator may lead to serious underestimates when the sample size is small; see Sukhatme (1954), Koop (1968), Rao (1969), and Cochran (1977, pages 163-164). In order to solve this classical problem, we propose in this paper new estimators for the variance and the mean square error of the ratio estimator that do not suffer from such a large negative bias. Similar estimation formulas can be derived for alternative ratio estimators as discussed in Tin (1965). We compare three mean square error estimators for the ratio estimator in a simulation study.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900300008
    Description:

    Dual frame surveys are useful when no single frame with adequate coverage exists. However, estimators from dual frame designs require knowledge of the frame membership of each sampled unit. When this information is not available from the frame itself, it is often collected from the respondent. When respondents provide incorrect membership information, the resulting estimators of means or totals can be biased. A method for reducing this bias, using accurate membership information obtained about a subsample of respondents, is proposed. The properties of the new estimator are examined and compared to alternative estimators. The proposed estimator is applied to the data from the motivating example, which was a recreational angler survey, using an address frame and an incomplete fishing license frame.

    Release date: 2019-12-17

  • Articles and reports: 11-633-X2019003
    Description:

    This report provides an overview of the definitions and competency frameworks of data literacy, as well as the assessment tools used to measure it. These are based on the existing literature and current practices around the world. Data literacy, or the ability to derive meaningful information from data, is a relatively new concept. However, it is gaining increasing recognition as a vital skillset in the information age. Existing approaches to measuring data literacy—from self-assessment tools to objective measures, and from individual to organizational assessments—are discussed in this report to inform the development of an assessment tool for data literacy in the Canadian public service.

    Release date: 2019-08-14

  • Articles and reports: 13-605-X201900100009
    Description:

    In this paper, a preliminary set of statistical estimates of the amounts invested in Canadian data, databases and data science in recent years is presented. The results indicate rapid growth in investment in data, databases and data science over the last three decades and a significant accumulation of these kinds of capital over time.

    Release date: 2019-07-10

  • Articles and reports: 12-001-X201900200003
    Description:

    Merging available sources of information is becoming increasingly important for improving estimates of population characteristics in a variety of fields. In the presence of several independent probability samples from a finite population, we investigate options for a combined estimator of the population total, based on either a linear combination of the separate estimators or on the combined sample approach. A linear combination estimator based on estimated variances can be biased, as the separate estimators of the population total can be highly correlated with their respective variance estimators. We illustrate the possibility of using the combined sample to estimate the variances of the separate estimators, which results in general pooled variance estimators. These pooled variance estimators use all available information and have the potential to significantly reduce the bias of a linear combination of separate estimators.

    Release date: 2019-06-27

  • Articles and reports: 12-001-X201900200008
    Description:

    High nonresponse occurs in many sample surveys today, including important surveys carried out by government statistical agencies. An adaptive data collection can be advantageous in those conditions: Lower nonresponse bias in survey estimates can be gained, up to a point, by producing a well-balanced set of respondents. Auxiliary variables serve a twofold purpose: Used in the estimation phase, through calibrated adjustment weighting, they reduce, but do not entirely remove, the bias. In the preceding adaptive data collection phase, auxiliary variables also play a major role: They are instrumental in reducing the imbalance in the ultimate set of respondents. For such combined use of auxiliary variables, the deviation of the calibrated estimate from the unbiased estimate (under full response) is studied in the article. We show that this deviation is a sum of two components. The reducible component can be decreased through adaptive data collection, all the way to zero if perfectly balanced response is realized with respect to a chosen auxiliary vector. By contrast, the resisting component changes little or not at all by a better balanced response; it represents a part of the deviation that adaptive design does not get rid of. The relative size of the former component is an indicator of the potential payoff from an adaptive survey design.

    Release date: 2019-06-27

  • Articles and reports: 12-001-X201900200009
    Description:

    In recent years, there has been a strong interest in indirect measures of nonresponse bias in surveys or other forms of data collection. This interest originates from gradually decreasing propensities to respond to surveys, in parallel with pressures on survey budgets. These developments led to a growing focus on the representativeness or balance of the responding sample units with respect to relevant auxiliary variables. One example of a measure is the representativeness indicator, or R-indicator. The R-indicator is based on the design-weighted sample variation of estimated response propensities. It presupposes linked auxiliary data. One of the criticisms of the indicator is that it cannot be used in settings where auxiliary information is available only at the population level. In this paper, we propose a new method for estimating response propensities that does not need auxiliary information for non-respondents to the survey and is based on population auxiliary information. These population-based response propensities can then be used to develop R-indicators that employ population contingency tables or population frequency counts. We discuss the statistical properties of the indicators, and evaluate their performance using an evaluation study based on real census data and an application from the Dutch Health Survey.

    Release date: 2019-06-27

  • Articles and reports: 13-605-X201900100008
    Description:

    This paper aims to expand the current national accounting concepts and statistical methods for measuring data in order to shed light on some highly consequential changes in society that are related to the rising usage of data. The paper concludes by discussing possible methods that can be used to assign an economic value to the various elements in the information chain and tests these concepts and methods by presenting results for Canada as a first attempt to measure the value of data.

    Release date: 2019-06-24

  • Articles and reports: 11-633-X2019002
    Description:

    Survey data collection through mobile devices, such as tablets and smartphones, is underway in Canada. However, little is known about the representativeness of the data collected through these devices. In March 2017, Statistics Canada commissioned survey data collection through the Carrot Rewards Application and included 11 questions on the Carrot Rewards Mobile App Survey (Carrot) drawn from the 2017 Canadian Community Health Survey (CCHS).

    Release date: 2019-06-04

  • Articles and reports: 12-001-X201900100006
    Description:

    The empirical predictor under an area level version of the generalized linear mixed model (GLMM) is extensively used in small area estimation (SAE) for counts. However, this approach does not use the sampling weights or clustering information that are essential for valid inference given the informative samples produced by modern complex survey designs. This paper describes an SAE method that incorporates this sampling information when estimating small area proportions or counts under an area level version of the GLMM. The approach is further extended under a spatial dependent version of the GLMM (SGLMM). The mean squared error (MSE) estimation for this method is also discussed. This SAE method is then applied to estimate the extent of household poverty in different districts of the rural part of the state of Uttar Pradesh in India by linking data from the 2011-12 Household Consumer Expenditure Survey collected by the National Sample Survey Office (NSSO) of India, and the 2011 Indian Population Census. Results from this application indicate a substantial gain in precision for the new methods compared to the direct survey estimates.

    Release date: 2019-05-07