Survey Methodology
Semi-automated classification for multi-label open-ended questions

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

by Hyukjun Gweon, Matthias Schonlau and Marika Wenemark

Release date: December 15, 2020

Abstract

In surveys, text answers to open-ended questions are important because they allow respondents to provide more information without constraints. When answers to open-ended questions are classified automatically using supervised learning, the accuracy is often not high enough. Alternatively, a semi-automated classification strategy can be considered: answers in the easy-to-classify group are classified automatically, while answers in the hard-to-classify group are classified manually. This paper presents a semi-automated classification method for multi-label open-ended questions, where text answers may be associated with multiple classes simultaneously. The proposed method effectively combines multiple probabilistic classifier chains while avoiding prohibitive computational costs. The performance evaluation on three different data sets demonstrates the effectiveness of the proposed method.

Key Words: Semi-automated classification; Open-ended questions; Multi-label data.
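
The semi-automated strategy described in the abstract can be illustrated with a minimal sketch. It assumes that some multi-label classifier has already produced a probability for each label of each answer, and it scores the most likely label set under an independence assumption. This is a simplification for illustration only and is not the classifier-chain method proposed in the paper. The function names, the threshold value and the example probabilities are all hypothetical.

```python
from typing import List, Tuple


def label_set_confidence(probs: List[float]) -> float:
    """Confidence of the most likely label set for one answer.

    Assumption (not from the paper): labels are treated as independent,
    so the confidence is the product of max(p, 1 - p) over the labels.
    """
    conf = 1.0
    for p in probs:
        conf *= max(p, 1.0 - p)
    return conf


def split_easy_hard(
    prob_matrix: List[List[float]], threshold: float
) -> Tuple[List[Tuple[int, List[int]]], List[int]]:
    """Split answers into an automatically classified group and a manual group.

    Returns (easy, hard):
      easy: (answer index, predicted 0/1 label vector) for confident answers
      hard: indices of answers to be routed to manual coding
    """
    easy: List[Tuple[int, List[int]]] = []
    hard: List[int] = []
    for i, probs in enumerate(prob_matrix):
        if label_set_confidence(probs) >= threshold:
            # Confident enough: accept the most likely label set.
            easy.append((i, [int(p >= 0.5) for p in probs]))
        else:
            # Not confident enough: leave the answer for a human coder.
            hard.append(i)
    return easy, hard


# Hypothetical per-label probabilities for three answers and three labels.
probs = [
    [0.95, 0.02, 0.90],
    [0.55, 0.40, 0.60],
    [0.10, 0.97, 0.05],
]
easy, hard = split_easy_hard(probs, threshold=0.7)
print("classified automatically:", easy)
print("sent to manual coding:", hard)
```

Raising the threshold sends more answers to manual coding and should make the automatically classified group more accurate; lowering it does the opposite. In practice the threshold would be chosen according to the manual coding budget available.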

How to cite

Gweon, H., Schonlau, M. and Wenemark, M. (2020). Semi-automated classification for multi-label open-ended questions. Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, Vol. 46, No. 2. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2020002/article/00005-eng.htm.

Note

ISSN: 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2020-12-15
