Practical Applications of Synthetic Data Generation
Articles and reports: 11-522-X202500100001
Description: Synthetic data generation (SDG) is increasingly applied across sectors for privacy-preserving data sharing, de-biasing and augmentation. Each use case requires a distinct set of evaluation metrics that must account for the stochasticity of the SDG process: membership and attribute disclosure vulnerability are critical for privacy; fidelity and downstream task utility apply more broadly; and fairness and diversity are relevant for de-biasing and augmentation, respectively. Drawing on accumulated evidence and exemplar case studies, we show that SDG can perform well across many of these use cases, and we share key learnings from our experience with synthetic health data.
Issue Number: 2025001
Author(s): El Emam, Khaled; Pilgram, Lisa
Main Product: Statistics Canada International Symposium Series: Proceedings