Results

All (171) (0 to 10 of 171 results)

  • Articles and reports: 11-522-X202200100007
    Description: With the availability of larger and more diverse data sources, statistical institutes in Europe are inclined to publish statistics on smaller groups than before. Moreover, high-impact global events like the COVID-19 crisis and the situation in Ukraine may also call for statistics on specific subgroups of the population. Publishing on small, targeted groups not only raises questions about the statistical quality of the figures; it also raises issues of statistical disclosure risk. The principle of statistical disclosure control does not depend on the size of the groups the statistics are based on. However, the risk of disclosure does depend on group size: the smaller a group, the higher the risk. Traditional ways to deal with statistical disclosure control and small group sizes include suppressing information and coarsening categories. These methods essentially increase the (mean) group sizes. More recent approaches include perturbative methods that aim to keep group sizes small in order to preserve as much information as possible while sufficiently reducing the disclosure risk. In this paper, we mention some European examples of statistics on special focus groups and discuss the implications for statistical disclosure control. Additionally, we discuss some issues that the use of perturbative methods brings along: their impact on disclosure risk and utility, as well as the challenges in communicating them properly.
    Release date: 2024-03-25
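One family of perturbative methods can be illustrated with unbiased random rounding, a standard disclosure-control technique; this is a generic sketch, not necessarily one of the methods the paper evaluates. Each count is rounded to a multiple of a base in such a way that the expected value of the published figure equals the true count.

```python
import random

def random_round(count, base=5, rng=random.Random(0)):
    """Randomly round a count to a multiple of `base`, unbiased in expectation."""
    remainder = count % base
    if remainder == 0:
        return count
    # Round up with probability remainder/base so E[rounded] == count.
    if rng.random() < remainder / base:
        return count + (base - remainder)
    return count - remainder

cell_counts = [3, 7, 12, 40]
print([random_round(c) for c in cell_counts])
```

Because every published cell is a multiple of the base, an intruder cannot tell whether a small cell hides zero, one, or a few individuals, while totals remain approximately correct on average.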

  • Articles and reports: 11-522-X202200100010
    Description: Growing Up in Québec is a longitudinal population survey that began in the spring of 2021 at the Institut de la statistique du Québec. Among the children targeted by this longitudinal follow-up, some will experience developmental difficulties at some point in their lives. These same children often have characteristics associated with higher sample attrition (low-income families, parents with low levels of education). This article describes the two main challenges we encountered in trying to ensure sufficient representativeness of these children, both in the overall results and in the subpopulation analyses.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100018
    Description: The Longitudinal Social Data Development Program (LSDDP) is a social data integration approach aimed at providing longitudinal analytical opportunities without imposing additional burden on respondents. The LSDDP uses a multitude of signals from different data sources for the same individual, which helps to better understand their interactions and track changes over time. This article looks at how the ethnicity of people in Canada can be estimated at the most detailed disaggregated level possible using the results of a variety of business rules applied to linked data and to the LSDDP denominator. It then shows how improvements were obtained using machine learning methods such as decision tree and random forest techniques.
    Release date: 2024-03-25
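The "business rules" step can be pictured with a toy resolver that reconciles conflicting signals about one person by majority vote, falling back to a source-priority ranking on ties. The field names and priorities below are invented for illustration; they are not the LSDDP's actual rules.

```python
from collections import Counter

def resolve_value(signals):
    """Pick one value from several source signals: majority vote,
    ties broken by source priority (lower number = more trusted)."""
    votes = Counter(s["value"] for s in signals)
    top = max(votes.values())
    candidates = {v for v, n in votes.items() if n == top}
    if len(candidates) == 1:
        return candidates.pop()
    best = min((s for s in signals if s["value"] in candidates),
               key=lambda s: s["priority"])
    return best["value"]

signals = [
    {"value": "A", "priority": 2},
    {"value": "B", "priority": 1},
    {"value": "A", "priority": 3},
]
print(resolve_value(signals))  # prints "A" (majority of sources agree)
```

A machine learning classifier such as a decision tree would replace this hand-written rule with one learned from labelled linkage examples.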

  • Articles and reports: 75F0002M2024002
    Description: This discussion paper describes considerations for applying the Market Basket Measure (MBM) methodology to a purely administrative data source. The paper begins by outlining a rationale for estimating MBM poverty statistics using administrative income data sources. It then explains a proposal for creating annual samples, along with the caveats of creating these samples, followed by a brief analysis using the proposed samples. The paper concludes with potential future improvements to the samples and invites feedback from readers.
    Release date: 2024-02-08
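The core MBM calculation that such an exercise builds on reduces to comparing each family's disposable income against the threshold for its region and family size. A minimal sketch, with thresholds and records invented purely for illustration:

```python
# A family is in poverty under the MBM if its disposable income falls
# below the threshold for its region and family size.
# These thresholds and records are invented, not official MBM values.
thresholds = {("Toronto", 4): 50000, ("Toronto", 2): 35000}

families = [
    {"region": "Toronto", "size": 4, "disposable_income": 42000},
    {"region": "Toronto", "size": 2, "disposable_income": 61000},
    {"region": "Toronto", "size": 4, "disposable_income": 58000},
]

in_poverty = [
    f for f in families
    if f["disposable_income"] < thresholds[(f["region"], f["size"])]
]
poverty_rate = len(in_poverty) / len(families)
print(f"Poverty rate: {poverty_rate:.1%}")  # prints "Poverty rate: 33.3%"
```

The methodological questions the paper raises concern how well an administrative income source can populate the `disposable_income` side of this comparison.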

  • Articles and reports: 11-633-X2024001
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive data source that plays a key role in understanding the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission, as well as their economic outcomes and regional (interprovincial) mobility, over a time span of more than 35 years.
    Release date: 2024-01-22

  • Articles and reports: 12-001-X202300200001
    Description: When a Medicare healthcare provider is suspected of billing abuse, a population of payments X made to that provider over a fixed timeframe is isolated. A certified medical reviewer, in a time-consuming process, can determine the overpayment Y = X - (amount justified by the evidence) associated with each payment. Typically, there are too many payments in the population to examine each with care, so a probability sample is selected. The sample overpayments are then used to calculate a 90% lower confidence bound for the total population overpayment. This bound is the amount demanded for recovery from the provider. Unfortunately, classical methods for calculating this bound sometimes fail to provide the 90% confidence level, especially when using a stratified sample.

    In this paper, 166 redacted samples from Medicare integrity investigations are displayed and described, along with 156 associated payment populations. The 7,588 examined (Y, X) sample pairs show that (1) Medicare audits have high error rates: more than 76% of these payments were considered to have been paid in error; and (2) the patterns in these samples support an “All-or-Nothing” mixture model for (Y, X) previously defined in the literature. Model-based Monte Carlo testing procedures for Medicare sampling plans are discussed, as well as stratification methods based on anticipated model moments. In terms of viability (achieving the 90% confidence level), a new stratification method defined here is competitive with the best of the many existing methods tested and seems less sensitive to the choice of operating parameters. In terms of overpayment recovery (equivalent to precision), the new method is also comparable to the best of the existing methods tested. Unfortunately, no stratification algorithm tested was viable for more than about half of the 104 test populations.
    Release date: 2024-01-03
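The classical bound the first paragraph refers to is typically a one-sided normal-approximation bound on the stratified estimate of the total overpayment. A minimal sketch of that textbook construction (not the paper's new method), with invented strata:

```python
import math
from statistics import NormalDist, mean, stdev

def stratified_lower_bound(strata, alpha=0.10):
    """Normal-approximation one-sided lower confidence bound for a
    population total, from a stratified simple random sample.

    strata: list of (N_h, sample_overpayments) pairs, one per stratum.
    """
    total_est = sum(N * mean(y) for N, y in strata)
    var_est = sum(
        N * N * (1 - len(y) / N) * stdev(y) ** 2 / len(y)  # with fpc
        for N, y in strata
    )
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.2816 for alpha = 0.10
    return total_est - z * math.sqrt(var_est)

strata = [
    (500, [120.0, 0.0, 80.0, 200.0, 0.0]),  # stratum of small payments
    (100, [900.0, 1500.0, 0.0, 1100.0]),    # stratum of large payments
]
print(round(stratified_lower_bound(strata), 2))
```

The paper's point is that when overpayments follow an all-or-nothing mixture, the normality assumption behind this bound can fail, so the bound covers the true total less than 90% of the time.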

  • Articles and reports: 12-001-X202300200006
    Description: Survey researchers are increasingly turning to multimode data collection to deal with declining survey response rates and rising costs. An efficient approach offers the less costly modes first (e.g., web), followed by a more expensive mode for a subsample of the units (e.g., households) within each primary sampling unit (PSU). We present two alternatives to this traditional design. One alternative subsamples PSUs rather than units to constrain costs. The second is a hybrid design that combines a clustered (two-stage) sample with an independent, unclustered sample. Using a simulation, we demonstrate that the hybrid design has considerable advantages.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200009
    Description: In this paper, we investigate how a big non-probability database can be used to improve estimates of finite population totals from a small probability sample through data integration techniques. In the situation where the study variable is observed in both data sources, Kim and Tam (2021) proposed two design-consistent estimators that can be justified through dual frame survey theory. First, we provide conditions ensuring that these estimators are more efficient than the Horvitz-Thompson estimator when the probability sample is selected using either Poisson sampling or simple random sampling without replacement. Then, we study the class of QR predictors, introduced by Särndal and Wright (1984), to handle the less common case where the non-probability database contains no study variable but does contain auxiliary variables. We also require that the non-probability database be large and linkable to the probability sample. We provide conditions ensuring that the QR predictor is asymptotically design-unbiased. We derive its asymptotic design variance and provide a consistent design-based variance estimator. We compare the design properties of different predictors, in the class of QR predictors, through a simulation study. This class includes a model-based predictor, a model-assisted estimator and a cosmetic estimator. In our simulation setups, the cosmetic estimator performed slightly better than the model-assisted estimator. These findings are confirmed by an application to La Poste data, which also illustrates that the properties of the cosmetic estimator are preserved irrespective of the observed non-probability sample.
    Release date: 2024-01-03
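The Horvitz-Thompson estimator used as the baseline in the abstract weights each sampled value by the inverse of its inclusion probability. A minimal sketch under Poisson sampling, with invented data:

```python
import random

def horvitz_thompson_total(sample):
    """Horvitz-Thompson estimator of a population total.

    sample: list of (y_value, inclusion_probability) pairs for the
    units that were actually selected.
    """
    return sum(y / pi for y, pi in sample)

# Poisson sampling: each unit is included independently with probability pi_i.
rng = random.Random(42)
population = [(y, 0.3) for y in range(1, 101)]  # true total = 5050
sample = [(y, pi) for y, pi in population if rng.random() < pi]
print(horvitz_thompson_total(sample))  # unbiased estimate of 5050
```

Dividing by the inclusion probability makes each selected unit "stand in" for the unselected units it represents, which is what makes the estimator design-unbiased regardless of the y-values.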

  • Articles and reports: 12-001-X202300200014
    Description: Many things have been written about Jean-Claude Deville in tributes from the statistical community (see Tillé, 2022a; Tillé, 2022b; Christine, 2022; Ardilly, 2022; and Matei, 2022) and from the École nationale de la statistique et de l’administration économique (ENSAE) and the Société française de statistique. Pascal Ardilly, David Haziza, Pierre Lavallée and Yves Tillé provide an in-depth look at Jean-Claude Deville’s contributions to survey theory. To pay tribute to him, I would like to discuss Jean-Claude Deville’s contribution to the more day-to-day application of methodology for all the statisticians at the Institut national de la statistique et des études économiques (INSEE) and at the public statistics service. To do this, I will use my work experience, and particularly the four years (1992 to 1996) I spent working with him in the Statistical Methods Unit and the discussions we had thereafter, especially in the 2000s on the rolling census.
    Release date: 2024-01-03

  • Articles and reports: 75F0002M2022003
    Description: This discussion paper describes the proposed methodology for a Northern Market Basket Measure (MBM-N) for Nunavut and identifies research that could be conducted in preparation for the 2023 review. The paper presents initial MBM-N thresholds and provides preliminary poverty estimates for reference years 2018 to 2021. A review period will follow the release of this paper, during which Statistics Canada and Employment and Social Development Canada will welcome feedback from interested parties and work with experts, stakeholders, Indigenous organizations, and federal, provincial and territorial officials to validate the results.
    Release date: 2023-06-21