Semi-automated classification for multi-label open-ended questions
Section 5. Experiments

5.1  Data

We evaluated the performance of the MEPCC algorithm on three different data sets: Civil disobedience, Immigrant and Happy data. (The Happy data are available upon request from Marika Wenemark, marika.wenemark@liu.se; the Immigrant and Civil disobedience data are available from the GESIS Datorium, http://dx.doi.org/10.7802/1795.) For each data set, an open-ended question was asked of the respondents, and their answers were coded manually, possibly with multiple labels.

The Civil data set was collected to study the cross-cultural equivalence of a survey question about civil disobedience. Behr, Braun, Kaczmirek and Bandilla (2014) first asked respondents a closed-ended question from the ISSP (ISSP Research Group, 2012): “How important is it that citizens may engage in acts of civil disobedience when they oppose government actions?” (Not at all important 1 − Very important 7). The respondents were then asked: “What ideas do you associate with the phrase ‘civil disobedience’? Please give examples.” Answers were classified into 12 labels: non-productive, violence, disturbances, peaceful, listing activities, breadth of actions, breaking law, breaking rules, government:dissatisfaction, government:deep rift, copy/paste from the Internet, other. The survey data were collected in different languages, and we use a merged data set (Spanish, German and Danish) that contains 1,029 observations.

The Immigrant data set was collected to study the cross-national equivalence of measures of xenophobia. In the 2003 International Social Survey Program (ISSP) on National Identity, the questionnaire contained four statements regarding beliefs about immigrants, such as “Immigrants take jobs from people who were born in Germany”. After rating each statement, respondents were asked to answer an open-ended question: “Which type of immigrants were you thinking of when you answered the question? The previous statement was: [text of the corresponding item]”. Braun, Behr and Kaczmirek (2013) classified answers into 14 labels: non-productive, positive, negative, neutral/work, general, Muslim countries, eastern European, Asia, ex-Yugoslavia, EU15, sub Sahara, Sinti/Roma, legal/illegal, other. In this article, we use 1,006 observations from the German survey.

The Happy data set was collected to study the relationship between positive factors and mental health and care needs. Wenemark, Borgstedt-Risberg, Garvin, Dahlin, Jusufbegovic, Gamme, Johansson and Björn (2018) asked respondents: “Name some positive things in your life that are uplifting or make you happy (you may write several things)”. Answers were classified into 13 labels: nothing, relationships (family or romantic), working/studying, health, self-esteem, joy/happiness, well-being: drinking/eating/drugs/sex, spirituality, money, nature, hobbies, culture, and exercise. The data set contains 2,350 observations.

Table 5.1 contains summary statistics about the three data sets.


Table 5.1
Summary statistics of the data sets: total number of observations, number of features and labels, average number of relevant labels, and percentage of observations associated with more than one label ($P_{|L| > 1}$)
# observations # features L av. # of labels $P_{|L| > 1}$
Civil 1,029 305 12 1.15 13.80%
Immigrant 1,006 273 14 1.19 13.72%
Happy 2,350 492 13 2.77 87.40%

5.2  Experimental setup

We compared the proposed MEPCC method against BR, LP and PCC. For PCC, we used the uniform search to reach a predicted label set and the estimated probability of equation (3.1) as the confidence score of the prediction. EPCC was not included in the comparison because its computational cost makes prediction infeasible for our data sets. (In our experiment on the Immigrant data with 14 labels, running the exhaustive search for PCC $(m = 1)$ for a single prediction took over 30 minutes on a single computer (Intel Core i7 CPU with 8 GB RAM). This implies that predicting 200 observations using EPCC $(m = 10)$ would take more than 1,000 hours.) Support vector machines (SVM) (Vapnik, 2000) were used as the base classifier, on unscaled variables, with a linear kernel and tuning parameter $C = 1$. For probabilistic output, the SVM scores were converted into probabilities using Platt’s method (Platt, 2000). The analysis was conducted in R (R Core Team, 2014) using the e1071 package (Meyer, Dimitriadou, Hornik, Weingessel and Leisch, 2014) for SVM.
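For concreteness, the base-classifier step might look as follows in R with e1071. This is a minimal sketch, not the authors' exact code: the document-term matrix x and the 0/1 label vector y are illustrative names.

```r
library(e1071)

# Minimal sketch of the base classifier: a linear SVM (C = 1) on unscaled
# features, with Platt's method for probabilistic output. `x` is an n x p
# document-term matrix and `y` a 0/1 indicator for one label (assumed names).
fit_base <- function(x, y) {
  svm(x, factor(y), kernel = "linear", cost = 1,
      scale = FALSE,        # variables left unscaled, as described above
      probability = TRUE)   # fits Platt's sigmoid on the SVM scores
}

# Estimated probability that the label is present, for each new answer.
predict_prob <- function(model, newx) {
  pred <- predict(model, newx, probability = TRUE)
  attr(pred, "probabilities")[, "1"]
}
```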

For each data set, 5-fold cross-validation (CV) was performed: we randomly divided the data into five equal-sized parts and used four parts as the training data and the remaining part as the test data. Performance was evaluated only on the test data. Each of the five parts was used once as the test data, and the results were averaged.
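For reference, a minimal sketch of this CV scheme in R, reusing the fit_base and predict_prob helpers above (fold assignment and names are illustrative):

```r
# Assign each observation to one of five folds at random; each fold serves
# once as test data while the other four folds form the training data.
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(x)))

cv_predictions <- lapply(1:5, function(i) {
  train <- folds != i
  model <- fit_base(x[train, ], y[train])
  predict_prob(model, x[!train, ])   # evaluated on the held-out fold only
})
```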

5.3  Performance of the MEPCC approach

We first investigated the performance of the MEPCC. The score in equation (4.1) has two components. To demonstrate that both components are helpful, we evaluated the proposed score as well as two variants in which one of the components is removed. That is, we compared the MEPCC with three different scores $\theta$, $\theta_1$ and $\theta_2$:

$$(\text{MEPCC}) \qquad \theta = \left( \frac{\sum_{j \in J} P_j}{|J|} \right) \left( \frac{|J|}{m} \right)$$
$$(\text{MEPCC-1}) \qquad \theta_1 = \frac{\sum_{j \in J} P_j}{|J|}$$
$$(\text{MEPCC-2}) \qquad \theta_2 = \frac{|J|}{m}.$$
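In code, the three scores can be computed from the $m$ PCC predictions for a single test answer. The sketch below assumes the ensemble predicts the label set on which most of the $m$ models agree, with $J$ the set of agreeing models; the input names are illustrative.

```r
# `pred_sets`: character vector of length m, the label set predicted by each
# PCC model (serialized, e.g., "violence+breaking law").
# `probs`: the corresponding estimated probabilities of those label sets.
mepcc_scores <- function(pred_sets, probs) {
  m      <- length(pred_sets)
  modal  <- names(which.max(table(pred_sets)))  # most frequently predicted set
  J      <- pred_sets == modal                  # models agreeing with it
  theta1 <- sum(probs[J]) / sum(J)              # average probability over J
  theta2 <- sum(J) / m                          # agreement rate |J| / m
  c(theta = theta1 * theta2, theta1 = theta1, theta2 = theta2)
}
```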

Prioritizing the text answers based on $\theta_2$ results in many ties; the tied answers were randomly reordered so that subset accuracy could be calculated at each production rate. Figure 5.1 shows the subset accuracy of each approach as a function of the production rate. The text answers with higher scores were classified first; for example, a production rate of 0.2 means that only the 20% of the test data with the highest scores were classified automatically by the models. When the production rate equals 1, there is no difference between the MEPCC variants because the predicted label sets are always the same; the difference lies in how they prioritize the text answers from the easiest-to-classify to the hardest-to-classify. When the production rate was less than 1, MEPCC outperformed MEPCC-1 and MEPCC-2 on all three data sets. The results show that both components in equation (4.1) are helpful for prioritizing the observations.
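The production-rate curves of Figure 5.1 can be traced with a few lines of R. This sketch assumes a vector of scores (e.g., theta from the function above) and a 0/1 indicator of exact-match predictions on the test data; both names are illustrative.

```r
# Subset accuracy among the top `rate` fraction of answers, ranked by score.
accuracy_at_rate <- function(rate, score, correct) {
  n_auto <- ceiling(rate * length(score))
  top    <- order(score, decreasing = TRUE)[seq_len(n_auto)]
  mean(correct[top])   # subset accuracy on the automatically coded part
}

rates <- seq(0.2, 1, by = 0.1)
sapply(rates, accuracy_at_rate, score = theta, correct = correct)
```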

Figure 5.1 Subset accuracy of three variations on MEPCC as a function of production rate

[Figure 5.1: three panels, one per data set (Civil, Immigrant, Happy); subset accuracy (y-axis, 0.6 to 1.0) versus percentage of automated categorization (x-axis, 0.2 to 1.0), comparing MEPCC, MEPCC-1 and MEPCC-2.]

5.4  Effect of the number of PCC models

We then investigated to what extent the number of PCC models affects the predictive performance of MEPCC. Figure 5.2 shows the performance of MEPCC for different numbers of PCC models $(m)$. When $m$ was low, increasing $m$ led to a large improvement in the subset accuracy of MEPCC. However, once there were enough PCC models (e.g., $m = 10$), adding more PCC models did not improve the subset accuracy further. The empirical results show that MEPCC does not require many PCC models to perform well.

Figure 5.2 The effect of the number of PCC models (m) used for MEPCC

[Figure 5.2: three panels, one per data set (Civil, Immigrant, Happy); subset accuracy (y-axis, 0.6 to 1.0) versus percentage of automated categorization (x-axis, 0.2 to 1.0), comparing m = 1, 5, 10, 20 and 30 PCC models.]

5.5  Comparison with other methods

Finally, we investigated the performance of MEPCC $(m = 10)$ compared to the established methods (BR, LP and PCC). For all methods, a production rate of $x\%$ refers to the $x\%$ of the data with the highest scores. MEPCC used $\theta$ as its score, while each of the other approaches used the probability of the predicted label set estimated by that method. Note that when $m = 1$, MEPCC and PCC are identical; the score $\theta$ coincides with the probability of the label set predicted by PCC.

Figures 5.3 and 5.4 show, respectively, the subset accuracy and the Hamming loss of the different methods as a function of the production rate for the three data sets. For the Immigrant and Happy data, MEPCC obtained the highest subset accuracy at most production rates. For the Civil data, MEPCC and LP performed best. In terms of Hamming loss, MEPCC achieved the lowest error at most production rates on all three data sets.

Figure 5.3 Semi-automated results (subset accuracy) for the three data sets from the 5-fold cross-validation

[Figure 5.3: three panels, one per data set (Civil, Immigrant, Happy); subset accuracy (y-axis, 0.5 to 1.0 for Civil and Immigrant, 0.4 to 1.0 for Happy) versus percentage of automated categorization (x-axis, 0.2 to 1.0), comparing BR, LP, PCC and MEPCC.]

Figure 5.4 Semi-automated results (Hamming loss) for the three data sets from the 5-fold cross-validation

[Figure 5.4: three panels, one per data set (Civil, Immigrant, Happy); Hamming loss (y-axis, 0.00 to 0.08 for Civil and Immigrant, 0.00 to 0.10 for Happy) versus percentage of automated categorization (x-axis, 0.2 to 1.0), comparing BR, LP, PCC and MEPCC.]

Next, we considered the performance of each method given target predicted accuracy values. To decide the fraction of automatic categorization, a practitioner will typically set a threshold probability above which texts are coded automatically. For MEPCC, the relationship between true accuracy and the confidence score $(\theta)$ was estimated via cross-validation on the training data. We used Platt’s scaling to convert the confidence scores into probability outputs. Since Platt’s scaling can improve calibration (Niculescu-Mizil and Caruana, 2005), the same technique was also applied to BR, LP and PCC.
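One way to implement this calibration step, sketched in R: fit a logistic regression of the exact-match indicator on the confidence score, using out-of-fold predictions from the training data (Platt's scaling), then map test-set scores to predicted accuracies. The variable names are illustrative.

```r
# `oof_score`, `oof_correct`: out-of-fold confidence scores and 0/1
# exact-match indicators from cross-validation within the training data.
calib <- glm(oof_correct ~ oof_score, family = binomial)

# Predicted subset accuracy for each test answer with score `test_score`.
pred_accuracy <- predict(calib, newdata = data.frame(oof_score = test_score),
                         type = "response")
```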

Table 5.2 illustrates the tradeoff between the percentage of automated predictions and the corresponding subset accuracy of each method as a function of the decision threshold. The threshold refers to the minimum predicted subset accuracy required for automated prediction; it determines which text answers are classified automatically and which are classified manually. For example, if the client requires at least 80% accuracy for automated classification, then approximately 39.3% of the Civil data, 42.5% of the Immigrant data and 27.6% of the Happy data can be classified automatically by MEPCC, with subset accuracies of 0.891, 0.916 and 0.857, respectively. This is a large improvement over BR, which could only automatically classify 9.3% of the Civil data, 12.8% of the Immigrant data and 8.7% of the Happy data, with lower subset accuracies. Table 5.3 shows the relationship between predicted and actual accuracy by aggregating the predictions into ranges for each method and data set. For MEPCC, the actual accuracy falls within the range of the predicted accuracy in most cases, much more often than for the other methods.
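Given the calibrated predictions, the entries of Table 5.2 follow directly. A sketch, assuming pred_accuracy from the calibration step above and a 0/1 exact-match indicator correct on the test data:

```r
# P: fraction of answers coded automatically at the threshold;
# SA: subset accuracy on that automatically coded fraction.
threshold_tradeoff <- function(threshold, pred_accuracy, correct) {
  auto <- pred_accuracy >= threshold   # predicted accuracy clears threshold
  c(P = mean(auto), SA = mean(correct[auto]))
}

t(sapply(c(0.9, 0.8, 0.7, 0.6), threshold_tradeoff,
         pred_accuracy = pred_accuracy, correct = correct))
```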


Table 5.2
Semi-automated results for the three data sets at different decision thresholds. P is the percentage of automated predictions and SA is the subset accuracy of the automated predictions
Data Threshold BR LP PCC MEPCC
P SA P SA P SA P SA
Civil 0.9 0.7% 0.667 16.5% 0.967 0.0% NA 13.0% 0.978
0.8 9.3% 0.893 34.3% 0.898 15.1% 0.787 39.3% 0.891
0.7 18.4% 0.846 46.6% 0.852 36.4% 0.817 45.8% 0.860
0.6 25.4% 0.768 50.6% 0.831 52.1% 0.771 52.9% 0.820
Immigrant 0.9 3.7% 0.858 11.1% 0.959 1.3% 0.558 31.5% 0.947
0.8 12.8% 0.779 30.4% 0.890 27.7% 0.859 42.5% 0.916
0.7 26.6% 0.743 38.6% 0.863 42.4% 0.829 55.1% 0.862
0.6 41.7% 0.715 53.6% 0.806 50.5% 0.795 62.7% 0.839
Happy 0.9 1.3% 0.592 8.9% 0.850 0.1% 0.750 1.0% 0.830
0.8 8.7% 0.734 14.3% 0.802 7.2% 0.726 27.6% 0.857
0.7 32.8% 0.776 17.7% 0.793 29.9% 0.767 43.7% 0.817
0.6 53.2% 0.745 22.2% 0.761 49.2% 0.744 52.0% 0.790

Table 5.3
Semi-automated results for the three data sets at different ranges of predicted accuracy. P is the percentage of automated predictions and SA is the subset accuracy of the automated predictions
Data Predicted accuracy BR LP PCC MEPCC
P SA P SA P SA P SA
Civil [0.9, 1.0] 0.7% 0.667 16.5% 0.967 0.0% NA 13.0% 0.978
[0.8, 0.9) 8.7% 0.896 17.8% 0.834 15.1% 0.787 26.2% 0.846
[0.7, 0.8) 9.0% 0.769 12.2% 0.710 21.3% 0.828 6.5% 0.681
[0.6, 0.7) 7.0% 0.566 4.1% 0.584 15.7% 0.655 7.1% 0.563
Immigrant [0.9, 1.0] 3.7% 0.858 11.1% 0.959 1.3% 0.558 31.5% 0.947
[0.8, 0.9) 9.1% 0.750 19.3% 0.843 26.4% 0.869 11.0% 0.829
[0.7, 0.8) 13.8% 0.710 8.2% 0.747 14.7% 0.757 12.5% 0.688
[0.6, 0.7) 15.1% 0.602 15.0% 0.659 8.1% 0.623 7.7% 0.670
Happy [0.9, 1.0] 1.3% 0.592 8.9% 0.850 0.1% 0.750 1.0% 0.830
[0.8, 0.9) 7.4% 0.755 5.4% 0.717 7.1% 0.730 26.5% 0.858
[0.7, 0.8) 24.0% 0.792 3.4% 0.751 22.7% 0.779 16.2% 0.749
[0.6, 0.7) 20.4% 0.693 4.6% 0.615 19.3% 0.703 8.3% 0.647

Table 5.4 shows the runtime of each method for training the model and predicting all instances in the test data (Intel Core i7 CPU with 8 GB RAM). Unsurprisingly, the runtime of MEPCC at $m = 10$ is roughly 10 times that of PCC in both the training and prediction stages.


Table 5.4
Runtime (in seconds) of each method for the three data sets
Data Stage BR LP PCC MEPCC
Civil Train 1.688 0.641 1.128 11.787
Prediction 0.269 0.044 37.142 374.611
Immigrant Train 1.363 0.510 0.894 8.724
Prediction 0.200 0.056 35.369 334.075
Happy Train 11.160 16.164 7.371 78.293
Prediction 0.567 3.691 177.847 1,746.529
