Statistical Methodology Research and Development Program Achievements, 2024/2025
4. Confidentiality and Access
Confidentiality research at Statistics Canada continued to focus on developing new methods and ideas that offer alternative forms of access while ensuring that personal and business information is not disclosed in any way. Progress was made on the projects described below. The team responsible for the Centre for Confidentiality and Access at Statistics Canada also continued to offer consultation services to internal and external partners to help build capacity in disclosure risk identification and treatment (see Section 5.5).
PROJECT: Confidentiality assessment for small area estimates
As indicated in the previous report, Statistics Canada has no official guidance on confidentiality rules for releasing small area estimates and no official study has yet been conducted on the subject.
Progress:
A paper (Tang, 2024) outlining the simulation process and discussing the justifications for proposed confidentiality rules was presented at the 2024 Methodology Symposium and will be published as part of the proceedings.
For more information, please contact:
Cissy Tang (cissy.tang@statcan.gc.ca).
Reference
Tang, C. (2024). Statistical disclosure control analysis for small area estimation. Proceedings: Symposium 2024, The Future of Official Statistics, Statistics Canada, Ottawa, Canada.
PROJECT: Synthetic data
Working towards more options for data users is essential. Creating synthetic data is a way to address confidentiality issues with personal data while retaining as much analytical value as possible. Synthetic data can be especially useful when looking for collaborative opportunities with external stakeholders that may not have access to the confidential microdata.
Progress:
Building on the successful release of the synthetic database for the PASSAGES dynamic microsimulation model, a draft update to the internal guidelines for the creation of synthetic data files was prepared (Gauvin, 2025). The updated guidelines will replace the existing documentation used to guide developers in the creation of synthetic data.
Gauvin (2024) presented her work on developing the synthetic data for the PASSAGES model at the Statistical Society of Canada’s 2024 Annual Meeting.
Yu (2024) presented a review of synthetic data disclosure risk assessment at the 2024 International Methodology Symposium.
For more information, please contact:
Héloïse Gauvin (heloise.gauvin@statcan.gc.ca) or
Steven Thomas (steven.thomas@statcan.gc.ca).
References
Gauvin, H. (2025). Practical Guidelines for the Creation of Synthetic Data Files. Internal document, Statistics Canada.
Gauvin, H. (2024). Creating a synthetic version of a longitudinal and structured file: Challenges and lessons. Proceedings of the Survey Methods Section, Statistical Society of Canada, St. John's, NL, https://ssc.ca/sites/default/files/imce/gauvin_ssc2024.pdf.
Yu, Z. (2024). Synthetic data disclosure risk assessment. Proceedings: Symposium 2024, The Future of Official Statistics, Statistics Canada, Ottawa, Canada.
PROJECT: Optimization strategies for complementary cell suppression
Complementary Cell Suppression (CCS) is a standard method for protecting sensitive cells when releasing tables of magnitude variables: in addition to the sensitive cells themselves, further cells are suppressed so that the sensitive values cannot be recovered from the published totals. This methodology is well developed and supported through Statistics Canada's G-Confid solution, which produces optimal suppression patterns that are valid and that minimize the amount of information suppressed.
Progress:
Statistics Canada has successfully built a beta, Python-based version of G-Confid. It includes an optimal additive rounding solution, a tool for the calculation of sensitivity measures, a procedure for creating an optimal CCS pattern, and an audit program to ensure that patterns are valid. Open-source optimization solvers available through the PuLP Python package were studied as potential replacements and, with careful implementation, were applied to the most complex cases at Statistics Canada.
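To illustrate what a sensitivity measure for a magnitude cell can look like, the following is a minimal sketch of the widely used p-percent rule. The report does not state which sensitivity rules the Python version of G-Confid implements, so the rule choice, the function name `p_percent_sensitive` and the default `p = 10` are assumptions made purely for illustration.

```python
def p_percent_sensitive(contributions, p=10.0):
    """Flag a magnitude cell as sensitive under the p-percent rule.

    Hypothetical helper, not part of G-Confid. The cell is sensitive when
    the second-largest contributor, subtracting its own value from the cell
    total, could estimate the largest contribution to within p percent.
    """
    values = sorted(contributions, reverse=True)
    if not values:
        return False  # an empty cell has nothing to disclose
    largest = values[0]
    second = values[1] if len(values) > 1 else 0.0
    # Contributions that "hide" the largest value from the second largest.
    remainder = sum(values) - largest - second
    return remainder < (p / 100.0) * largest


if __name__ == "__main__":
    # Two dominant contributors and a small remainder: flagged as sensitive.
    print(p_percent_sensitive([100, 50, 3]))
    # Enough smaller contributors to mask the largest: not flagged.
    print(p_percent_sensitive([100, 50, 30, 20]))
```

Cells flagged by a rule of this kind are the primary suppressions; choosing the complementary suppressions is the optimization step, which is not sketched here.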
For more information, please contact:
Steven Thomas (steven.thomas@statcan.gc.ca).
PROJECT: A Utility-Disclosure Risk Framework for Comparing Statistical Disclosure Control Mechanisms
As a National Statistical Organization (NSO), Statistics Canada has been tasked with finding ways to release data at more detailed levels. There is a particular interest in enabling statistical analyses of small subpopulations at finer levels of geography. This is referred to as the Disaggregated Data Action Plan (DDAP) initiative. However, data dissemination must always be done in a manner that complies with the confidentiality provisions of the Statistics Act. A key component of releasing confidentiality-compliant statistical output is to apply Statistical Disclosure Control (SDC) to that output prior to its release.
Progress:
To better suit the needs of users, various SDC methods are being analyzed with the goal of measuring the utility and disclosure risk associated with each. A utility-disclosure risk trade-off framework is being proposed for comparing these methods. By applying the framework, a data disseminator can select the SDC method that best balances disclosure risk and utility in a given situation. An internal working paper is in development and will be considered for external publication.
For more information, please contact:
Joshua Miller (joshua.miller@statcan.gc.ca).
PROJECT: A Utility-Disclosure Risk-Based Impact Assessment of the Implementation of Differential Privacy in the Context of Tabular Data Dissemination
Differential Privacy (DP) is a framework for Statistical Disclosure Control (SDC) that seeks to bound the leakage of private information attributable to any single individual’s contribution to a private database once statistical output is released. This framework was introduced in Dwork, McSherry, Nissim and Smith (2006). A DP-compliant SDC framework has been proposed by academics and government agencies. Specifically, methods that lend themselves to the framework have been employed as alternatives to traditional SDC methods such as rounding and cell suppression. For example, the United States (U.S.) Census Bureau implemented DP algorithms to publish results from its 2020 Census, as described in Abowd, Ashmead, Cumings-Menon, Garfinkel, Heineck, Heiss, Johns, Kifer, Leclerc, Machanavajjhala, Moran, Sexton, Spence and Zhuravlev (2022). It is therefore of great interest for Statistics Canada to fully understand the DP framework and invest in areas where it could translate into substantial improvements to its data dissemination ecosystem.
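For reference, the standard pure DP guarantee can be stated as follows. The notation is an assumption introduced here, not taken from the report: a randomized mechanism $\mathcal{M}$ is $\varepsilon$-differentially private if, for every pair of neighbouring databases $D$ and $D'$ (databases that differ in the record of a single individual) and every set of possible outputs $S$,

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{\varepsilon} \, \Pr\left[\mathcal{M}(D') \in S\right].
```

A smaller $\varepsilon$ forces the two output distributions closer together and therefore gives a stronger guarantee. Other definitions of DP relax this inequality in various ways.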
Although DP is known to provide strong privacy guarantees, it is not without its criticisms. A common criticism of DP is the impact that the methods have on the utility of the published output. Moreover, delineating an appropriate trade-off between utility and disclosure risk can be difficult. In the DP context, this translates to determining an appropriate privacy budget. Furthermore, if improperly implemented, DP can fail to protect sensitive cells. Thus, an assessment of the utility of DP-compliant output, as well as a closer study of the challenges of DP, is warranted. This also entails a comparison of DP-compliant methods to traditional SDC methods.
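The link between the privacy budget and utility can be made concrete with the Laplace mechanism applied to a single count. This is a minimal sketch under stated assumptions, not the algorithm evaluated in this project: the function names, the sensitivity of 1 (one individual changes a count by at most one) and the use of mean absolute error as the utility measure are all illustrative choices.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a noisy count via the Laplace mechanism (hypothetical helper).

    The noise scale is sensitivity / epsilon, so a smaller privacy budget
    epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, n_draws=20000, seed=0):
    """Estimate the average absolute error of the noisy count by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += abs(dp_count(true_count, epsilon, rng) - true_count)
    return total / n_draws


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(mean_abs_error(100, eps), 3))
```

For this mechanism the expected absolute error equals sensitivity / epsilon, so a tenfold tightening of the privacy budget costs a tenfold increase in error. For a small cell, that error can be as large as the count itself.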
Progress:
A detailed paper was written that summarizes the practical challenges faced when adopting a DP framework (Miller, 2025). The paper includes a summary of DP and an analysis of its core properties. A comparison of the different definitions of DP is also included, along with an intuitive explanation of the guarantees that a DP framework provides. Additionally, traditional SDC methods, such as random rounding, are examined through the lens of DP. A major practical barrier that disseminators face when considering the adoption of DP is the selection of an appropriate privacy parameter. This issue is discussed at length in the paper, with some suggestions proposed.
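As a rough illustration of what examining a traditional method through the lens of DP can involve (this is not the analysis in Miller, 2025), the sketch below computes the exact output distribution of unbiased random rounding to base 5 and the smallest privacy parameter for which the outputs on two neighbouring counts would satisfy the pure DP inequality. The function names, the base and the notion of neighbouring counts (counts that differ by one) are assumptions.

```python
import math
from fractions import Fraction


def rounding_distribution(count, base=5):
    """Exact output distribution of unbiased random rounding.

    A non-negative integer count is rounded up to the next multiple of
    `base` with probability remainder / base and down otherwise, so the
    expected output equals the original count.
    """
    remainder = count % base
    lower = count - remainder
    if remainder == 0:
        return {lower: Fraction(1)}
    p_up = Fraction(remainder, base)
    return {lower: 1 - p_up, lower + base: p_up}


def worst_case_log_ratio(count_a, count_b, base=5):
    """Smallest epsilon satisfying the pure DP inequality for these two inputs.

    Returns math.inf when some output is possible for one count but
    impossible for the other, in which case no finite epsilon works.
    """
    dist_a = rounding_distribution(count_a, base)
    dist_b = rounding_distribution(count_b, base)
    worst = 0.0
    for output in set(dist_a) | set(dist_b):
        p_a = dist_a.get(output, Fraction(0))
        p_b = dist_b.get(output, Fraction(0))
        if p_a == 0 or p_b == 0:
            return math.inf
        worst = max(worst, abs(math.log(float(p_a / p_b))))
    return worst


if __name__ == "__main__":
    print(rounding_distribution(7))
    print(worst_case_log_ratio(1, 2))  # finite: both counts can yield 0 or 5
    print(worst_case_log_ratio(0, 1))  # infinite: 0 can never be rounded to 5
```

Under these assumptions, a count that is already a multiple of the base is published unchanged, so an output of 5 rules out a true count of 0. The ratio of output probabilities is therefore unbounded for that pair, and this form of random rounding does not satisfy the pure DP inequality for any finite privacy parameter.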
A DP-compliant SDC algorithm was tested on 2021 Canadian Census data to demonstrate what the adoption of DP could look like in practice. The algorithm that was implemented was based on the U.S. Census Bureau's TopDown algorithm, which was first applied to 2020 U.S. Census data as discussed in Abowd et al. (2022). A key finding of the Canadian Census case study was that the utility of statistical output can be significantly compromised if the goal of the disseminator is to provide meaningful privacy guarantees.
The DP paper by Miller (2025) is under peer review and will be formally released as a branch working paper in the 2025-2026 fiscal year. The research results were also shared in a methodology research seminar.
For more information, please contact:
Joshua Miller (joshua.miller@statcan.gc.ca).
References
Abowd, J.M., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M. and Zhuravlev, P. (2022). The 2020 Census disclosure avoidance system TopDown algorithm. Harvard Data Science Review, (Special Issue 2).
Dwork, C., McSherry, F., Nissim, K. and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality, 7(3), 17-51.
Miller, J. (2025). A Privacy-Utility Based Study of Differential Privacy. Internal report, Statistics Canada.