Survey Methodology
Semi-automated classification for multi-label open-ended questions
by Hyukjun Gweon, Matthias Schonlau and Marika Wenemark
- Release date: December 15, 2020
Abstract
In surveys, text answers to open-ended questions are important because they allow respondents to provide more information without constraints. When such answers are classified automatically using supervised learning, the accuracy is often not high enough. Alternatively, a semi-automated classification strategy can be considered: answers in the easy-to-classify group are classified automatically, while answers in the hard-to-classify group are classified manually. This paper presents a semi-automated classification method for multi-label open-ended questions, where text answers may be associated with multiple classes simultaneously. The proposed method effectively combines multiple probabilistic classifier chains while avoiding prohibitive computational costs. The performance evaluation on three different data sets demonstrates the effectiveness of the proposed method.
Key Words: Semi-automated classification; Open-ended questions; Multi-label data.
Table of contents
- Section 1. Introduction
- Section 2. Semi-automated classification for text data
- Section 3. Multi-label classification
- Section 4. The majority-voted-based ensemble of PCC for semi-automated classification
- Section 5. Experiments
- Section 6. Discussion
- References
How to cite
Gweon, H., Schonlau, M. and Wenemark, M. (2020). Semi-automated classification for multi-label open-ended questions. Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, Vol. 46, No. 2. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2020002/article/00005-eng.htm.