Automatic Classification of Open-Ended Questions: Check-All-That-Apply Questions

2019 ◽  
pp. 089443931986921 ◽  
Author(s):  
Matthias Schonlau ◽  
Hyukjun Gweon ◽  
Marika Wenemark

Text data from open-ended questions in surveys are challenging to analyze and are often ignored. Open-ended questions are important though because they do not constrain respondents’ answers. Where open-ended questions are necessary, often human coders manually code answers. When data sets are large, it is impractical or too costly to manually code all answer texts. Instead, text answers can be converted into numerical variables, and a statistical/machine learning algorithm can be trained on a subset of manually coded data. This statistical model is then used to predict the codes of the remainder. We consider open-ended questions where the answers are coded into multiple labels (all-that-apply questions). For example, in the open-ended question in our Happy example respondents are explicitly told they may list multiple things that make them happy. Algorithms for multilabel data take into account the correlation among the answer codes and may therefore give better prediction results. For example, when giving examples of civil disobedience, respondents talking about “minor nonviolent offenses” were also likely to talk about “crimes.” We compare the performance of two different multilabel algorithms (random k-labelsets [RAKEL], classifier chains [CC]) to the default method of binary relevance (BR), which applies single-label algorithms to each code separately. Performance is evaluated on data from three open-ended questions (Happy, Civil Disobedience, and Immigrant). We found weak bivariate label correlations in the Happy data (90th percentile: 7.6%), and stronger bivariate label correlations in the Civil Disobedience (90th percentile: 17.2%) and Immigrant (90th percentile: 19.2%) data. For the data with stronger correlations, we found that both multilabel methods performed substantially better than BR under 0/1 loss (“at least one label is incorrect”) but made little difference under Hamming loss (average error). For data with weak label correlations, we found no difference in performance between multilabel methods and BR. We conclude that automatic classification of open-ended questions that allow multiple answers may benefit from using multilabel algorithms for 0/1 loss. The degree of correlations among the labels may be a useful prognostic tool.
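
The two loss measures named in this abstract can diverge because 0/1 loss counts whole answers while Hamming loss counts individual label decisions. The following is a minimal sketch of that distinction, not the authors' evaluation code; the function names and the toy label matrices are illustrative assumptions.

```python
def zero_one_loss(Y_true, Y_pred):
    """Fraction of answers for which at least one label is incorrect."""
    wrong = sum(1 for t, p in zip(Y_true, Y_pred) if list(t) != list(p))
    return wrong / len(Y_true)


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label decisions that are incorrect."""
    errors = sum(a != b for t, p in zip(Y_true, Y_pred) for a, b in zip(t, p))
    return errors / (len(Y_true) * len(Y_true[0]))


# Hypothetical coded answers: 4 answers, 3 possible codes each.
Y_true = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

# Two label errors spread over two different answers.
pred_spread = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

# Two label errors concentrated in a single answer.
pred_concentrated = [[0, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

for name, pred in [("spread", pred_spread), ("concentrated", pred_concentrated)]:
    print(name, zero_one_loss(Y_true, pred), round(hamming_loss(Y_true, pred), 4))
```

Both sets of predictions contain the same number of label errors, so their Hamming loss is identical, yet their 0/1 loss differs. A method that gets label combinations right as a whole can therefore improve 0/1 loss without changing Hamming loss much.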

2021 ◽  
Vol 2089 (1) ◽  
pp. 012058
Author(s):  
P. Giriprasad Gaddam ◽  
A Sanjeeva reddy ◽  
R.V. Sreehari

In this article, an automatic classification of cardiac arrhythmias is presented using a transfer deep learning approach based on electrocardiography (ECG) signal analysis. Nowadays, the ECG waveform serves as a powerful tool for analyzing cardiac arrhythmias (irregularities). The goal of the present work is to implement a deep learning algorithm for classifying different cardiac arrhythmias. First, the one-dimensional (1-D) ECG signals are transformed into two-dimensional (2-D) scalogram images using the continuous wavelet transform (CWT). Four categories of ECG waveform were selected from four PhysioNet MIT-BIH databases, namely the Arrhythmia database, the Normal Sinus Rhythm database, the Malignant Ventricular Ectopy database, and the BIDMC Congestive Heart Failure database, to examine the proposed technique. The major interest of the present study is to develop a transfer deep learning algorithm for automatic categorization of these four heart conditions. The final results show that a deep convolutional neural network (CNN) trained on the 2-D scalogram images with a transfer learning technique (AlexNet) achieved an accuracy of 95.67%. Hence, the proposed algorithm appears to be an effective automated heart disease detection tool.
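
The step from a 1-D signal to a 2-D scalogram can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the wavelet, so a complex Morlet wavelet with `w0 = 6` is assumed here, and a synthetic 10 Hz sinusoid stands in for an ECG recording. The function name `morlet_scalogram` and all parameter values are hypothetical.

```python
import cmath
import math


def morlet_scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT|^2 as a list of rows, one row per analysis frequency.

    Assumes a complex Morlet wavelet; each row holds one value per sample.
    """
    n = len(signal)
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * math.pi * f)  # scale (seconds) centred on frequency f
        norm = 1.0 / math.sqrt(scale)
        row = []
        for b in range(n):  # translation (time position) index
            acc = 0j
            for k in range(n):
                t = (k - b) / fs / scale  # dimensionless wavelet argument
                if abs(t) > 4.0:  # Gaussian envelope is very small beyond this
                    continue
                # Conjugated wavelet: Gaussian envelope times complex exponential.
                acc += signal[k] * math.exp(-0.5 * t * t) * cmath.exp(-1j * w0 * t)
            row.append(abs(norm * acc / fs) ** 2)
        rows.append(row)
    return rows


fs = 200.0  # assumed sampling rate in Hz
signal = [math.sin(2.0 * math.pi * 10.0 * k / fs) for k in range(200)]
freqs = [2.0, 5.0, 10.0, 20.0]

scalogram = morlet_scalogram(signal, fs, freqs)
mean_energy = [sum(row) / len(row) for row in scalogram]
dominant = freqs[mean_energy.index(max(mean_energy))]
print(dominant)
```

Each row of `scalogram` is the energy over time at one frequency, so the nested list is the 2-D array that would be rendered as an image before being passed to an image classifier.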


2017 ◽  
Author(s):  
Jie Xie

Acoustic classification of frogs has received increasing attention for its promising application in ecological studies. Various methods have been proposed for classifying frog species, but most assume that a recording contains only a single species. In this study, a method to classify multiple frog species in an audio clip is presented. To be specific, continuous frog recordings are first cropped into audio clips (10 seconds). Then, various time-frequency representations are generated for each 10-s recording. Next, instead of using traditional hand-crafted features, a deep learning algorithm is used to find the most important features. Finally, a binary relevance based multi-label classification approach is proposed to classify simultaneously vocalizing frog species with our proposed features. Experimental results show that our proposed features extracted using deep learning can achieve better classification performance when compared to hand-crafted features for frog call classification.
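
Binary relevance, as used in this abstract, trains one independent binary classifier per label and combines their outputs. The sketch below illustrates that idea only. The nearest-centroid base learner, the two-value feature vectors (imagined as low-band and high-band energy), and the two species labels are all assumptions made for illustration, not the features or classifier from the study.

```python
def fit_centroid(X, y):
    """Toy base learner: centroids of the positive and negative examples.

    Assumes every label has at least one positive and one negative example.
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return centroid(pos), centroid(neg)


def predict_centroid(model, x):
    """Predict 1 if x is closer to the positive centroid, else 0."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return 1 if d_pos < d_neg else 0


def fit_binary_relevance(X, Y):
    """Train one binary model per label column, ignoring the other labels."""
    n_labels = len(Y[0])
    return [fit_centroid(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, x):
    """Concatenate the per-label predictions into one label vector."""
    return [predict_centroid(m, x) for m in models]


# Hypothetical clips: [low-band energy, high-band energy].
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8],
     [0.9, 0.9], [0.8, 0.8], [0.1, 0.1], [0.2, 0.2]]
# Labels: [species A present, species B present].
Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1], [0, 0], [0, 0]]

models = fit_binary_relevance(X, Y)
both = predict_binary_relevance(models, [0.85, 0.85])
only_a = predict_binary_relevance(models, [0.9, 0.15])
neither = predict_binary_relevance(models, [0.1, 0.1])
print(both, only_a, neither)
```

Because each label is predicted separately, a clip can receive zero, one, or several species labels, which is what allows simultaneously vocalizing species to be detected.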


Author(s):  
Dorian Ruiz Alonso ◽  
Claudia Zepeda Cortés ◽  
Hilda Castillo Zacatelco ◽  
José Luis Carballido Carranza ◽  
José Luis García Cué

This work deals with educational text mining, a field of natural language processing applied to education. The objective is to classify the feedback that teachers in online courses give on activities submitted by students, according to the model of Hattie and Timperley (2007), in which feedback may be at the levels of task, process, regulation, praise, and other. Four multi-label classification methods of the data transformation approach (binary relevance, classifier chains, label powerset, and RAkEL-d) are compared in combination with the base algorithms SVM, Random Forest, Logistic Regression, and Naive Bayes. The methodology was applied to a case study in which 11,013 feedback messages written in Spanish were collected from 121 online courses of the Law degree at a public university in Mexico, using the Blackboard learning management system. The results show that the random forest and support vector machine algorithms perform best when combined with the binary relevance and classifier chains transformation methods.
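
Of the transformation methods listed, label powerset is the simplest to show in isolation: each distinct combination of labels becomes one class of an ordinary single-label problem. The sketch below covers only that encoding and decoding step, with hypothetical helper names and made-up label rows ordered as task, process, regulation, praise, other. It is not the code used in the study.

```python
def fit_label_powerset(Y):
    """Map each distinct label combination seen in training to one class id."""
    classes = sorted({tuple(row) for row in Y})
    to_class = {combo: i for i, combo in enumerate(classes)}
    return classes, to_class


def encode(Y, to_class):
    """Turn multi-label rows into single class ids."""
    return [to_class[tuple(row)] for row in Y]


def decode(class_ids, classes):
    """Turn predicted class ids back into multi-label rows."""
    return [list(classes[i]) for i in class_ids]


# Hypothetical feedback labels: [task, process, regulation, praise, other].
Y = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]

classes, to_class = fit_label_powerset(Y)
class_ids = encode(Y, to_class)
print(class_ids)
print(decode(class_ids, classes) == Y)
```

A single-label multiclass classifier would then be trained on `class_ids`. One design consequence is that only label combinations present in the training data can ever be predicted, which is the limitation that RAkEL-style methods address by working on smaller label subsets.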


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Xue Xing ◽  
Dexin Yu ◽  
Wei Zhang

This paper analyzes the problem of meaningless outliers in traffic detector data sets and examines the characteristics of data from monophyletic detectors and multisensor detectors, based on real-time highway data. Building on an analysis of the current random forests algorithm, a learning algorithm with high accuracy and fast speed, a new optimized random forest for filtering outliers in the sample is proposed, which combines a bagging strategy with a boosting strategy. Random forests with different numbers of trees are applied to classify the status of meaningless outliers in traffic detector data sets, based on the traffic parameters of traffic flow, spot mean speed, and roadway occupancy rate. The results show that the optimized random forest model filters meaningless outliers in traffic detector data collected from road intersections more accurately. With the filtered data, a transportation information system can reduce the influence of erroneous data and improve highway traffic information services.

