Results

All (10)

  • Articles and reports: 11-522-X202200100020
    Description: The reconciliation of 2021 census dwellings with the new Statistical Building Register (SBgR) presented linkage challenges. The Census of Population collected information from various dwelling types. For a large proportion of the population, mailing addresses were central: they were used to reach people and were collected as contact information. In parallel, the register environment has been evolving. The agency is transitioning from the Address Register (AR) to the SBgR, which holds both mailing and location addresses and also covers non-residential buildings. The reconciliation was conducted using a combination of systems, notably the new Register Matching Engine (RME) for difficult cases. The RME offers a range of sophisticated string comparators. A deterministic linkage approach was used, incorporating some knowledge of the data, such as entropy. Through metadata, the matching expert could also reduce the number of false positives and false negatives.
    Release date: 2024-03-25

  • Articles and reports: 46-28-0001202200100001
    Description:

    When a survey publishes statistics with a quality indicator, it is usually derived from measures based on sampling theory. The production of quality indicators is a significant challenge when statistics are produced using alternative sources for which no sampling is done. This paper describes a new method used to create a quality indicator that combines indicators obtained at different stages of data processing. An example of the application of the method in the Canadian Housing Statistics Program is provided in the Appendix.

    Release date: 2022-01-06

  • Articles and reports: 11-522-X202100100015
    Description: National statistical agencies such as Statistics Canada have a responsibility to convey the quality of statistical information to users. The methods traditionally used to do this are based on measures of sampling error. As a result, they are not adapted to the estimates produced using administrative data, for which the main sources of error are not due to sampling. A more suitable approach to reporting the quality of estimates presented in a multidimensional table is described in this paper. Quality indicators were derived for various post-acquisition processing steps, such as linkage, geocoding and imputation, by estimation domain. A clustering algorithm was then used to combine domains with similar quality levels for a given estimate. Ratings to inform users of the relative quality of estimates across domains were assigned to the groups created. This indicator, called the composite quality indicator (CQI), was developed and experimented with in the Canadian Housing Statistics Program (CHSP), which aims to produce official statistics on the residential housing sector in Canada using multiple administrative data sources.

    Keywords: Unsupervised machine learning, quality assurance, administrative data, data integration, clustering.

    Release date: 2021-10-22

  • Articles and reports: 12-001-X201000211384
    Description:

    The current economic downturn in the US could challenge costly strategies in survey operations. In the Behavioral Risk Factor Surveillance System (BRFSS), ending the monthly data collection at 31 days could be a less costly alternative. However, this could potentially exclude a portion of interviews completed after 31 days (late responders) whose respondent characteristics could be different in many respects from those who completed the survey within 31 days (early responders). We examined whether there are differences between the early and late responders in demographics, health-care coverage, general health status, health risk behaviors, and chronic disease conditions or illnesses. We used 2007 BRFSS data, where a representative sample of the noninstitutionalized adult U.S. population was selected using a random digit dialing method. Late responders were significantly more likely to be male; to report race/ethnicity as Hispanic; to have annual income higher than $50,000; to be younger than 45 years of age; to have less than high school education; to have health-care coverage; and to report good health. They were significantly less likely to report hypertension, diabetes, or being obese. The observed differences between early and late responders on survey estimates may have little influence on national and state-level estimates. As the proportion of late responders may increase in the future, its impact on surveillance estimates should be examined before late responders are excluded from the analysis. Analyses of late responders alone should combine several years of data to produce reliable estimates.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X200900211037
    Description:

    Randomized response strategies, which were originally developed as statistical methods to reduce nonresponse as well as untruthful answering, can also be applied in the field of statistical disclosure control for public use microdata files. In this paper, a standardization of randomized response techniques for the estimation of proportions of identifying or sensitive attributes is presented. The statistical properties of the standardized estimator are derived for general probability sampling. In order to analyse the effect of different choices of the method's implicit "design parameters" on the performance of the estimator, we have to include measures of privacy protection in our considerations. These yield variance-optimum design parameters given a certain level of privacy protection. To this end, the variables have to be classified into different categories of sensitivity. A real-data example applies the technique in a survey on academic cheating behaviour.

    Release date: 2009-12-23

  • Articles and reports: 12-001-X200800210758
    Description:

    We propose a method for estimating the variance of estimators of changes over time, a method that takes account of all the components of these estimators: the sampling design, treatment of non-response, treatment of large companies, correlation of non-response from one wave to another, the effect of using a panel, robustification, and calibration using a ratio estimator. This method, which serves to determine the confidence intervals of changes over time, is then applied to the Swiss survey of value added.

    Release date: 2008-12-23

  • Articles and reports: 12-001-X200800110619
    Description:

    Small area prediction based on random effects, called EBLUP, is a procedure for constructing estimates for small geographical areas or small subpopulations using existing survey data. The total of the small area predictors is often forced to equal the direct survey estimate, and such predictors are said to be calibrated. Several calibrated predictors are reviewed and a criterion that unifies the derivation of these calibrated predictors is presented. The predictor that is the unique best linear unbiased predictor under the criterion is derived and the mean square error of the calibrated predictors is discussed. Implicit in the imposition of the restriction is the possibility that the small area model is misspecified and the predictors are biased. Augmented models with one additional explanatory variable for which the usual small area predictors achieve the self-calibrated property are considered. Simulations demonstrate that calibrated predictors have slightly smaller bias than the usual EBLUP predictor. However, if the bias is a concern, a better approach is to use an augmented model with an added auxiliary variable that is a function of area size. In the simulation, the predictors based on the augmented model had smaller MSE than EBLUP when the incorrect model was used for prediction. Furthermore, there was a very small increase in MSE relative to EBLUP if the auxiliary variable was added to the correct model.

    Release date: 2008-06-26

  • Articles and reports: 11-522-X200600110449
    Description:

    Traditionally, administrative hospital discharge databases have been used mainly for administrative purposes. Recently, health services researchers and population health researchers have been using the databases for a wide variety of studies, in particular studies of health care outcomes. Tools, such as comorbidity indexes, have been developed to facilitate this analysis. Every time the coding system for diagnoses and procedures is revised or a new one is developed, these comorbidity indexes need to be updated. These updates are important in maintaining consistency when trends are examined over time.

    Release date: 2008-03-17

  • Articles and reports: 11-522-X20040018743
    Description:

    To reach homeless people, INED decided to follow the "itinerant services", which unlike the "regular services" for the homeless, try to reach the homeless by visiting them where they live, mostly in public places.

    Release date: 2005-10-27

  • Articles and reports: 11-522-X20030017695
    Description:

    This paper proposes methods to correct a seasonally adjusted series so that its annual totals match those of the raw series. The methods are illustrated with a seasonally adjusted series obtained with either X-11-ARIMA or X-12-ARIMA.

    Release date: 2005-01-26