Results

All (43) (0 to 10 of 43 results)

  • Articles and reports: 12-001-X202200100005
    Description:

    Methodological studies of the effects that human interviewers have on the quality of survey data have long been limited by a critical assumption: that interviewers in a given survey are assigned random subsets of the larger overall sample (also known as interpenetrated assignment). Absent this type of study design, estimates of interviewer effects on survey measures of interest may reflect differences between interviewers in the characteristics of their assigned sample members, rather than recruitment or measurement effects specifically introduced by the interviewers. Previous attempts to approximate interpenetrated assignment have typically used regression models to condition on factors that might be related to interviewer assignment. We introduce a new approach for overcoming this lack of interpenetrated assignment when estimating interviewer effects. This approach, which we refer to as the “anchoring” method, leverages correlations between observed variables that are unlikely to be affected by interviewers (“anchors”) and variables that may be prone to interviewer effects to remove components of within-interviewer correlations that lack of interpenetrated assignment may introduce. We consider both frequentist and Bayesian approaches, where the latter can make use of information about interviewer effect variances in previous waves of a study, if available. We evaluate this new methodology empirically using a simulation study, and then illustrate its application using real survey data from the Behavioral Risk Factor Surveillance System (BRFSS), where interviewer IDs are provided on public-use data files. While our proposed method shares some of the limitations of the traditional approach – namely the need for variables associated with the outcome of interest that are also free of measurement error – it avoids the need for conditional inference and thus has improved inferential qualities when the focus is on marginal estimates, and it shows evidence of further reducing overestimation of larger interviewer effects relative to the traditional approach.

    Release date: 2022-06-21
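
    The core issue here, that estimated interviewer effects absorb differences between interviewers' assigned samples when assignment is not interpenetrated, can be seen in a few lines of simulation. The Python sketch below is an editorial illustration of that problem only, not the paper's anchoring estimator, and every parameter value in it is made up.

        import numpy as np

        rng = np.random.default_rng(1)
        n_int, m = 50, 40                  # interviewers, respondents per interviewer (made up)
        int_sd, area_sd = 0.10, 0.50       # small true interviewer effect, large area differences

        def interviewer_variance(groups):
            # One-way ANOVA (method of moments) estimate of the between-interviewer variance.
            msb = m * groups.mean(axis=1).var(ddof=1)
            msw = groups.var(axis=1, ddof=1).mean()
            return max((msb - msw) / m, 0.0)

        int_eff = rng.normal(0, int_sd, n_int)[:, None]
        noise = rng.normal(0, 1.0, (n_int, m))

        # Non-interpenetrated: each interviewer works a single area, so area
        # differences are confounded with the interviewer.
        area_per_interviewer = rng.normal(0, area_sd, n_int)[:, None]
        y_confounded = int_eff + area_per_interviewer + noise

        # Interpenetrated: every respondent is drawn from a randomly assigned area.
        area_per_respondent = rng.normal(0, area_sd, (n_int, m))
        y_interpenetrated = int_eff + area_per_respondent + noise

        print("true interviewer variance    :", int_sd ** 2)
        print("estimate, interpenetrated    :", round(interviewer_variance(y_interpenetrated), 3))
        print("estimate, non-interpenetrated:", round(interviewer_variance(y_confounded), 3))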

  • Articles and reports: 12-001-X202200100007
    Description:

    Record linkage joins records that reside in separate files but are believed to relate to the same entity. In this paper we approach record linkage as a classification problem and adapt the maximum entropy classification method from machine learning to record linkage, in both its supervised and unsupervised settings. The set of links is chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is fully automatic, unlike the classical approach, which generally requires clerical review to resolve the undecided cases.

    Release date: 2022-06-21
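
    For the supervised setting, maximum entropy classification of record pairs with a binary outcome coincides with logistic regression on field-agreement vectors. The sketch below uses scikit-learn's LogisticRegression on an invented toy comparison file; the paper's unsupervised variant and its uncertainty-based choice of the link set are not reproduced, and the 0.5 cut-off is purely illustrative.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Each row describes one candidate record pair as an agreement vector
        # (e.g., surname, birth year, postal code agree = 1 / disagree = 0).
        X_train = np.array([
            [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],   # labelled matches
            [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],   # labelled non-matches
        ])
        y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])

        # Maximum entropy classification with a binary outcome is logistic regression.
        model = LogisticRegression().fit(X_train, y_train)

        # Score new candidate pairs; links are kept here with a simple probability
        # cut-off, whereas the paper chooses the link set by its associated uncertainty.
        X_new = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 1]])
        match_prob = model.predict_proba(X_new)[:, 1]
        print(np.round(match_prob, 2), match_prob >= 0.5)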

  • Stats in brief: 89-20-00082021001
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to perform the dominance and homogeneity tests when working with Census data.
    Release date: 2022-04-29

  • Stats in brief: 89-20-00082021002
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to create proportion output for researchers working with confidential data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021003
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to create proportion output for researchers working with confidential data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021004
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to perform the dominance and homogeneity tests when working with Census data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021005
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to create proportion output for researchers working with confidential data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021006
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to perform the dominance and homogeneity tests when working with Census data.
    Release date: 2022-04-27
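
    The six videos above walk through these checks in SAS, Stata and R; the plain-Python sketch below only illustrates the general form of the two ideas. The (n, k) values, the homogeneity criterion and the toy cell are assumptions made for this illustration and are not Statistics Canada's vetting parameters.

        import numpy as np

        def dominance_flag(contributions, n=1, k=0.85):
            # (n, k) dominance rule: flag the cell if its n largest contributors
            # account for more than a share k of the cell total.
            x = np.sort(np.asarray(contributions, dtype=float))[::-1]
            return x[:n].sum() > k * x.sum()

        def homogeneity_flag(contributions, tol=1e-9):
            # Simple homogeneity check: flag the cell if every contributor reports
            # (essentially) the same value, since the cell total then reveals each one.
            x = np.asarray(contributions, dtype=float)
            return np.ptp(x) <= tol * max(np.abs(x).max(), 1.0)

        cell = [135.0, 8.0, 4.0, 3.0]                 # one dominant contributor (toy data)
        print(dominance_flag(cell))                   # True  -> needs suppression or review
        print(homogeneity_flag([5.0, 5.0, 5.0]))      # True  -> needs suppression or review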

  • Articles and reports: 11-522-X202100100027
    Description:

    Privacy concerns are a barrier to applying remote analytics, including machine learning, on sensitive data via the cloud. In this work, we use a leveled fully Homomorphic Encryption scheme to train an end-to-end supervised machine learning algorithm to classify texts while protecting the privacy of the input data points. We train our single-layer neural network on a large simulated dataset, providing a practical solution to a real-world multi-class text classification task. To improve both accuracy and training time, we train an ensemble of such classifiers in parallel using ciphertext packing.

    Key Words: Privacy Preservation, Machine Learning, Encryption

    Release date: 2021-10-29

  • Articles and reports: 12-001-X202100100003
    Description:

    One effective way to conduct statistical disclosure control is to use scrambled responses, which can be generated by a controlled random device. In this paper, we propose using the sample empirical likelihood approach to conduct statistical inference with scrambled responses under a complex survey design. Specifically, we propose using a Wilk-type confidence interval for statistical inference. Our proposed method can be used as a general tool for inference with confidential public-use survey data files. Asymptotic properties are derived, and a limited simulation study verifies the validity of the theory. We further apply the proposed method to some real applications.

    Release date: 2021-06-24
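
    Scrambled responses themselves are easy to picture. The sketch below shows one generic additive scrambling device under simple random sampling, where only a noisy value is ever collected and the analyst still recovers the mean; it is not the sample empirical likelihood procedure or the Wilk-type interval developed in the paper, and all numbers are invented.

        import numpy as np

        rng = np.random.default_rng(7)
        n = 2_000
        y = rng.gamma(shape=2.0, scale=10.0, size=n)     # sensitive true values (never released)

        mu_s, sd_s = 50.0, 20.0                          # known scrambling distribution
        z = y + rng.normal(mu_s, sd_s, size=n)           # respondents report only z = y + s

        mean_hat = z.mean() - mu_s                       # unbiased for the mean of y
        se_hat = np.sqrt(z.var(ddof=1) / n)              # Var(z) = Var(y) + sd_s**2
        print("estimate:", round(mean_hat, 2), " true mean:", round(y.mean(), 2))
        print("approx. 95% CI:", np.round(mean_hat + np.array([-1.96, 1.96]) * se_hat, 2))
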
Stats in brief (6)

Articles and reports (37)

  • Articles and reports: 11-633-X2021003
    Description:

    Canada continues to experience an opioid crisis. While there is solid information on the demographic and geographic characteristics of people experiencing fatal and non-fatal opioid overdoses in Canada, there is limited information on the social and economic conditions of those who experience these events. To fill this information gap, Statistics Canada collaborated with existing partnerships in British Columbia, including the BC Coroners Service, BC Stats, the BC Centre for Disease Control and the British Columbia Ministry of Health, to create the Statistics Canada British Columbia Opioid Overdose Analytical File (BC-OOAF).

    Release date: 2021-02-17

  • Articles and reports: 12-001-X202000100005
    Description:

    Selecting the right sample size is central to ensure the quality of a survey. The state of the art is to account for complex sampling designs by calculating effective sample sizes. These effective sample sizes are determined using the design effect of central variables of interest. However, in face-to-face surveys empirical estimates of design effects are often suspected to be conflated with the impact of the interviewers. This typically leads to an over-estimation of design effects and consequently risks misallocating resources towards a higher sample size instead of using more interviewers or improving measurement accuracy. Therefore, we propose a corrected design effect that separates the interviewer effect from the effects of the sampling design on the sampling variance. The ability to estimate the corrected design effect is tested using a simulation study. In this respect, we address disentangling cluster and interviewer variance. Corrected design effects are estimated for data from the European Social Survey (ESS) round 6 and compared with conventional design effect estimates. Furthermore, we show that for some countries in the ESS round 6 the estimates of conventional design effect are indeed strongly inflated by interviewer effects.

    Release date: 2020-06-30
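
    For context, the conventional calculation the abstract starts from is Kish's approximation deff = 1 + (b_bar - 1) * rho together with n_eff = n / deff. The sketch below shows only that arithmetic with invented numbers; the paper's corrected design effect, which removes the interviewer variance from the estimated intraclass correlation before this step, is not reproduced.

        def design_effect(avg_cluster_size, rho):
            # Kish approximation: deff = 1 + (b_bar - 1) * rho.
            return 1.0 + (avg_cluster_size - 1.0) * rho

        def effective_sample_size(n, deff):
            return n / deff

        n, b_bar = 3_000, 15
        rho_naive = 0.06        # intraclass correlation inflated by interviewer effects (invented)
        rho_corrected = 0.03    # hypothetical value after removing the interviewer variance

        for rho in (rho_naive, rho_corrected):
            deff = design_effect(b_bar, rho)
            print(f"rho={rho:.2f}  deff={deff:.2f}  n_eff={effective_sample_size(n, deff):.0f}")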

  • Articles and reports: 12-001-X201900100001
    Description:

    Demographers are facing increasing pressure to disaggregate their estimates and forecasts by characteristics such as region, ethnicity, and income. Traditional demographic methods were designed for large samples, and perform poorly with disaggregated data. Methods based on formal Bayesian statistical models offer better performance. We illustrate with examples from a long-term project to develop Bayesian approaches to demographic estimation and forecasting. In our first example, we estimate mortality rates disaggregated by age and sex for a small population. In our second example, we simultaneously estimate and forecast obesity prevalence disaggregated by age. We conclude by addressing two traditional objections to the use of Bayesian methods in statistical agencies.

    Release date: 2019-05-07
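
    A minimal conjugate sketch can convey the basic Bayesian mechanics for one small age-sex-region cell: with a Poisson likelihood for deaths and a Gamma prior on the rate, the posterior is again Gamma and the estimate is pulled from the noisy crude rate toward the prior mean. This is an editorial illustration with assumed numbers, not one of the models developed in the paper.

        import numpy as np

        # deaths | rate ~ Poisson(exposure * rate),  rate ~ Gamma(a, b) with rate parameter b
        # posterior: rate | deaths ~ Gamma(a + deaths, b + exposure)
        a, b = 2.0, 200.0             # prior mean a/b = 0.01 deaths per person-year (assumed)
        deaths, exposure = 3, 150.0   # a very small age-sex-region cell (toy numbers)

        post_a, post_b = a + deaths, b + exposure
        posterior_mean = post_a / post_b
        crude_rate = deaths / exposure

        rng = np.random.default_rng(0)
        draws = rng.gamma(shape=post_a, scale=1.0 / post_b, size=100_000)
        ci = np.percentile(draws, [2.5, 97.5])

        print(f"crude rate      : {crude_rate:.4f}")
        print(f"posterior mean  : {posterior_mean:.4f}")
        print(f"95% credible int: {ci[0]:.4f} to {ci[1]:.4f}")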

  • Articles and reports: 12-001-X201900100007
    Description:

    The Horvitz-Thompson (HT) estimator is widely used in survey sampling. However, the variance of the HT estimator becomes large when the inclusion probabilities are highly heterogeneous. To overcome this shortcoming, in this paper we propose a hard-threshold method for the first-order inclusion probabilities: we carefully choose a threshold value and then replace the inclusion probabilities smaller than the threshold by the threshold. Through this shrinkage strategy, we construct a new estimator, the improved Horvitz-Thompson (IHT) estimator, for the population total. The IHT estimator substantially improves estimation accuracy, although it introduces a relatively small bias. We derive the IHT estimator’s mean squared error and its unbiased estimator, and theoretically compare the IHT estimator with the HT estimator. We also apply our idea to construct an improved ratio estimator. We numerically analyze simulated and real data sets to illustrate that the proposed estimators are more efficient and robust than the classical estimators.

    Release date: 2019-05-07
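
    The thresholding step is simple enough to write down directly. The sketch below contrasts the Horvitz-Thompson estimate with a thresholded (IHT-style) estimate on an invented Poisson-sampled population; the threshold value is simply assumed here rather than chosen by the selection rule the paper develops.

        import numpy as np

        rng = np.random.default_rng(3)
        N = 10_000
        y = rng.gamma(2.0, 50.0, size=N)                 # population values (toy data)
        pi = rng.uniform(0.001, 0.05, size=N)            # highly heterogeneous inclusion probabilities
        sampled = rng.random(N) < pi                     # Poisson sampling

        def ht_total(y, pi):
            return np.sum(y / pi)

        def iht_total(y, pi, tau):
            # Replace inclusion probabilities below the threshold by the threshold.
            return np.sum(y / np.maximum(pi, tau))

        tau = 0.01                                       # assumed threshold value
        print(f"true total : {y.sum():,.0f}")
        print(f"HT estimate: {ht_total(y[sampled], pi[sampled]):,.0f}")
        print(f"IHT (tau)  : {iht_total(y[sampled], pi[sampled], tau):,.0f}")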

  • Articles and reports: 12-001-X201900100008
    Description:

    This paper studies small area quantile estimation under a unit-level non-parametric nested-error regression model. We assume the small-area-specific error distributions satisfy a semi-parametric density ratio model. We fit the non-parametric model via the penalized spline regression method of Opsomer, Claeskens, Ranalli, Kauermann and Breidt (2008). Empirical likelihood is then applied to estimate the parameters in the density ratio model based on the residuals. This leads to natural area-specific estimates of the error distributions. A kernel method is then applied to obtain smoothed error distribution estimates. These estimates are then used for quantile estimation in two situations: one where we only have knowledge of covariate power means at the population level, and one where we have the covariate values of all sample units in the population. Simulation experiments indicate that the proposed methods for small area quantile estimation work well for quantiles around the median in the first situation, and for a broad range of quantiles in the second situation. A bootstrap mean squared error estimator of the proposed estimators is also investigated. An empirical example based on Canadian income data is included.

    Release date: 2019-05-07
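
    Only the final kernel-smoothing step lends itself to a compact sketch. The code below estimates a Gaussian-kernel-smoothed distribution function from a stand-in set of residuals and inverts it numerically for a chosen quantile; the penalized spline fit, the density ratio model and the empirical likelihood weights from the paper are not reproduced, and the bandwidth rule is a common default rather than the paper's choice.

        import numpy as np
        from scipy.stats import norm
        from scipy.optimize import brentq

        rng = np.random.default_rng(11)
        residuals = rng.standard_t(df=5, size=300)       # stand-in for model residuals

        # Silverman-style default bandwidth (an assumption, not the paper's rule).
        h = 1.06 * residuals.std(ddof=1) * len(residuals) ** (-1 / 5)

        def smoothed_cdf(t):
            # Gaussian-kernel estimate of P(e <= t).
            return norm.cdf((t - residuals) / h).mean()

        def smoothed_quantile(p):
            lo, hi = residuals.min() - 5 * h, residuals.max() + 5 * h
            return brentq(lambda t: smoothed_cdf(t) - p, lo, hi)

        print(round(smoothed_quantile(0.5), 3), round(smoothed_quantile(0.9), 3))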

  • Articles and reports: 11-522-X201700014704
    Description:

    We identify several research areas and topics for methodological research in official statistics. We argue why these are important, and why these are the most important ones for official statistics. We describe the main topics in these research areas and sketch what seem to be the most promising ways to address them. Here we focus on: (i) the quality of national accounts, in particular the rate of growth of GNI; and (ii) big data, in particular how to create representative estimates and how to make the most of big data when this is difficult or impossible. We also touch upon: (i) increasing the timeliness of preliminary and final statistical estimates; and (ii) statistical analysis, in particular of complex and coherent phenomena. These topics are elements of the present Strategic Methodological Research Program that has recently been adopted at Statistics Netherlands.

    Release date: 2016-03-24
Journals and periodicals (0)

No content available at this time.
