Life in the FastText Lane: Harnessing Linear Programming Constrained Machine Learning for Classifications Revision

Articles and reports: 11-522-X202500100010
Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in estimating labour market conditions in Canada. Periodically, the LFS revises its data to the most recent versions of the industry and occupational classifications. Differences between versions can be extensive, including high-level and unit-group structural changes, as well as the creation, deletion, splitting off and combination of classification units (classes). Historically, to reconcile split-off classes (where one class splits into multiple classes), a sample of LFS split-off records was manually recoded to the new classification version. Based on the split-off proportions observed in the recoded sample, a random allocation method was then applied to all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to the split-off proportions through linear programming, to revise industry and occupation classifications in the LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thereby minimizing the impact on the comparability of published labour market indicators.
Issue Number: 2025001
Author(s): Evans, Justin; Wile, Laura
Main Product: Statistics Canada International Symposium Series: Proceedings
Format: PDF
Release date: September 8, 2025
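
The description combines two ingredients: a text classifier (fastText) that scores each split-off record against the new classes, and a linear program that chooses the assignment agreeing best with those scores while reproducing the split-off proportions. A minimal Python sketch of that idea, assuming a matrix of per-record class probabilities (such as a fastText model's predictions could supply) and a known proportion for each new class. The function names, the largest-remainder rounding, and the objective of maximizing total probability are illustrative assumptions, not the authors' published formulation. It uses NumPy and SciPy's `linprog` with the HiGHS solver, and includes a text-blind random allocation baseline for contrast with the historical approach.

```python
import numpy as np
from scipy.optimize import linprog


def target_counts(proportions, n_records):
    """Turn split-off proportions into integer class counts summing to n_records.
    Largest-remainder rounding is an assumption, not the article's stated rule."""
    p = np.asarray(proportions, dtype=float)
    p = p / p.sum()
    raw = p * n_records
    counts = np.floor(raw).astype(int)
    remainder = n_records - counts.sum()
    # Give leftover records to the classes with the largest fractional parts.
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:remainder]] += 1
    return counts


def random_allocation(proportions, n_records, seed=0):
    """Text-blind baseline: shuffle labels so class counts match the proportions."""
    counts = target_counts(proportions, n_records)
    labels = np.repeat(np.arange(len(counts)), counts)
    return np.random.default_rng(seed).permutation(labels)


def constrained_allocation(probs, proportions):
    """Assign each record to one new class, maximizing total classifier
    probability while matching the target class counts exactly (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    counts = target_counts(proportions, n)

    # Decision variable x[i, j] in [0, 1], flattened row-major (index i*k + j).
    c = -probs.ravel()  # linprog minimizes, so negate to maximize

    # Each record gets exactly one class: sum_j x[i, j] = 1.
    one_class_per_record = np.kron(np.eye(n), np.ones((1, k)))
    # Each class receives its target count: sum_i x[i, j] = counts[j].
    count_per_class = np.kron(np.ones((1, n)), np.eye(k))
    A_eq = np.vstack([one_class_per_record, count_per_class])
    b_eq = np.concatenate([np.ones(n), counts])

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # A basic (vertex) solution of this transportation problem is 0/1;
    # argmax also absorbs tiny floating-point noise in the solver output.
    return res.x.reshape(n, k).argmax(axis=1)


if __name__ == "__main__":
    # Toy scores for 4 split-off records over 2 new classes (made-up numbers).
    probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    labels = constrained_allocation(probs, [0.5, 0.5])
    print(labels.tolist())  # → [0, 0, 1, 1]
```

Because each record takes exactly one class and each class receives exactly its target count, the problem is a transportation problem whose constraint matrix is totally unimodular, so a vertex solution is already integral and no integer programming is needed. In the toy example, the unconstrained top choices would put three of the four records in class 0; the proportion constraint moves the record with the weakest preference (the last one, 0.6 versus 0.4) to class 1. The random baseline meets the same counts but ignores the text scores entirely.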
