Occupational coding in Germany is mostly done using dictionary approaches, followed by manual revision of the cases that could not be coded. Since manual coding is expensive, it is desirable to assign a higher number of codes automatically. At the same time, the quality of the automatic coding must at least match that of the manual coding. As a possible solution, we apply different machine learning algorithms to the task, using a substantial amount of manually coded occupations available from recent studies as training data. We assess the feasibility of these methods by evaluating the performance and quality of the algorithms.

Author(s): Bethmann, Arne; Schierholz, Malte; Wenzig, Knut; Zielonka, Markus
