Multi-label classification of feedbacks

Author(s):  
Dorian Ruiz Alonso ◽  
Claudia Zepeda Cortés ◽  
Hilda Castillo Zacatelco ◽  
José Luis Carballido Carranza ◽  
José Luis García Cué

This work deals with educational text mining, a field of natural language processing applied to education. The objective is to classify the feedback that teachers in online courses give on the activities submitted by students, following the model of Hattie and Timperley (2007), under which feedback may fall at the task, process, regulation, praise or other level. Four multi-label classification methods from the data transformation approach (binary relevance, classifier chains, label powerset and RAkELd) are compared using the base algorithms SVM, Random Forest, Logistic Regression and Naive Bayes. The methodology was applied to a case study in which 11,013 feedback comments written in Spanish were collected from the Blackboard learning management system, covering 121 online courses of the Law degree at a public university in Mexico. The results show that the random forest and support vector machine algorithms perform best when combined with the binary relevance and classifier chains transformation methods.
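
As a rough illustration of what a "data transformation" method does, the sketch below turns a multi-label dataset into single-label targets in two ways: binary relevance (one yes/no target per label) and label powerset (one class per distinct label combination). The label names follow the feedback levels named in the abstract. The toy annotations and the function names are hypothetical, and none of this is the authors' implementation.

```python
# Feedback levels named in the abstract; the toy annotations below are invented.
LABELS = ["task", "process", "regulation", "praise", "other"]


def binary_relevance_targets(label_sets, labels=LABELS):
    """Binary relevance: build one independent 0/1 target vector per label."""
    return {
        label: [1 if label in label_set else 0 for label_set in label_sets]
        for label in labels
    }


def label_powerset_targets(label_sets):
    """Label powerset: map each distinct label combination to one class id."""
    class_ids = {}
    targets = []
    for label_set in label_sets:
        key = frozenset(label_set)
        if key not in class_ids:
            # Class ids are assigned in order of first appearance.
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


# Four hypothetical feedback comments and their annotated levels.
toy = [{"task"}, {"task", "praise"}, {"praise"}, {"task", "praise"}]

br = binary_relevance_targets(toy)
lp, classes = label_powerset_targets(toy)
```

Under binary relevance, each entry of `br` would be passed to its own binary base classifier. Under label powerset, `lp` is an ordinary multi-class target. Classifier chains and RAkELd build on these two ideas, respectively, by feeding earlier label predictions to later classifiers and by applying label powerset to small label subsets.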

2020 ◽  
Vol 12 (3) ◽  
pp. 759
Author(s):  
Jūratė Sužiedelytė Visockienė ◽  
Eglė Tumelienė ◽  
Vida Maliene

H. sosnowskyi (Heracleum sosnowskyi) is a plant that is widespread both in Lithuania and other countries and causes numerous problems. The damage caused by the population of the plant is many-sided: it threatens the biodiversity of the land, poses a risk to human health, and causes considerable economic losses. In order to find effective and comprehensive measures against this invasive plant, it is very important to identify the places and areas where H. sosnowskyi grows, carry out a detailed analysis, and monitor its spread to avoid leaving this process to chance. In this paper, a remote sensing methodology is proposed to identify territories covered with H. sosnowskyi plants (land classification). Two categories of land cover classification were used: supervised (human-guided) and unsupervised (calculated by software). In the application of the supervised method, the average wavelength of the spectrum of H. sosnowskyi was calculated for the classification of the RGB image, and the unsupervised classification by the program was then carried out according to this value. The combination of both classification methods, performed in steps, gave better results than using either one alone. The application of the authors’ proposed methodology is demonstrated in a Lithuanian case study discussed in this paper.


2015 ◽  
Vol 23 (2) ◽  
pp. 20-30 ◽  
Author(s):  
Dávid Kováts ◽  
Andrea Harnos

In this paper, a complex morphological comparison of four Common Nightingale groups (Luscinia megarhynchos) is demonstrated. In total, 121 territorial nightingales were mist-netted and measured individually in four study areas called ‘Bódva’, ‘Felső-Tisza’, ‘Szatmár-Bereg’ and ‘Bátorliget’ in the North-Eastern part of Hungary in 2006–2013. To distinguish groups by morphology, Classification and Regression Trees (CART), Random Forest (RF) and Linear Discriminant Analysis (LDA) methods were used. Comparison of the four studied Common Nightingale groups shows substantial morphological differences in the length of the second, the third and the fourth primaries (P2, P3, P4), in bill length (BL) and bill width (BW), while other characteristics showed greater similarities. Based on the results of all the applied classification methods, birds originating from Szatmár-Bereg were clearly distinguishable from the others. The differences in morphology can be explained by interspecific competition or by phenotypic plasticity resulting from changes in ecological and environmental parameters. Our case study highlights the respective advantages of the classification methods for distinguishing groups with similar morphology and for choosing important variables for classification. In conclusion, broad application of the classification methods RF and CART is highly recommended in comparative ecological studies.


2019 ◽  
pp. 089443931986921 ◽  
Author(s):  
Matthias Schonlau ◽  
Hyukjun Gweon ◽  
Marika Wenemark

Text data from open-ended questions in surveys are challenging to analyze and are often ignored. Open-ended questions are important though because they do not constrain respondents’ answers. Where open-ended questions are necessary, often human coders manually code answers. When data sets are large, it is impractical or too costly to manually code all answer texts. Instead, text answers can be converted into numerical variables, and a statistical/machine learning algorithm can be trained on a subset of manually coded data. This statistical model is then used to predict the codes of the remainder. We consider open-ended questions where the answers are coded into multiple labels (all-that-apply questions). For example, in the open-ended question in our Happy example respondents are explicitly told they may list multiple things that make them happy. Algorithms for multilabel data take into account the correlation among the answer codes and may therefore give better prediction results. For example, when giving examples of civil disobedience, respondents talking about “minor nonviolent offenses” were also likely to talk about “crimes.” We compare the performance of two different multilabel algorithms (random k-labelsets [RAKEL], classifier chains [CC]) to the default method of binary relevance (BR) which applies single-label algorithms to each code separately. Performance is evaluated on data from three open-ended questions (Happy, Civil Disobedience, and Immigrant). We found weak bivariate label correlations in the Happy data (90th percentile: 7.6%), and stronger bivariate label correlations in the Civil Disobedience (90th percentile: 17.2%) and Immigrant (90th percentile: 19.2%) data. For the data with stronger correlations, we found both multilabel methods performed substantially better than BR using 0/1 loss (“at least one label is incorrect”) and had little effect when using Hamming loss (average error). 
For data with weak label correlations, we found no difference in performance between multilabel methods and BR. We conclude that automatic classification of open-ended questions that allow multiple answers may benefit from using multilabel algorithms for 0/1 loss. The degree of correlations among the labels may be a useful prognostic tool.
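
The two evaluation measures named above can be made concrete with a small worked example. The sketch below computes 0/1 loss (an instance counts as wrong if at least one label is incorrect) and Hamming loss (the average error over individual label decisions) on invented label vectors. The data and function names are assumptions for illustration and are not taken from the study.

```python
def zero_one_loss(y_true, y_pred):
    """Fraction of instances whose predicted label vector is not fully correct."""
    wrong = sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)
    return wrong / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are incorrect."""
    n_labels = len(y_true[0])
    errors = sum(
        t != p
        for truth, pred in zip(y_true, y_pred)
        for t, p in zip(truth, pred)
    )
    return errors / (len(y_true) * n_labels)


# Hypothetical coding of four answers into three labels (1 = label applies).
y_true = [(1, 0, 1), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
y_pred = [(1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 0, 1)]

print(zero_one_loss(y_true, y_pred))  # 2 of 4 answers have an error
print(hamming_loss(y_true, y_pred))   # 2 of 12 label decisions are wrong
```

Here two of the four answers contain one wrong label each, so 0/1 loss is 2/4 while Hamming loss is only 2/12. This gap is why a method that gets whole label sets right more often can improve 0/1 loss noticeably while barely changing Hamming loss.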


2004 ◽  
Vol 1 (10) ◽  
Author(s):  
Duncan G. LaBay ◽  
Clare L. Comm

What are student expectations in a traditional course versus a distance learning course? The authors analyze student course selection and expected outcomes from data collected in an undergraduate marketing course at a public university in the Northeast. Key findings reveal that students generally have a favorable predisposition towards online coursework despite their beliefs that online courses require more work and have lower learning outcomes. Further, this case study provides an initial step in better understanding student expectations in online courses as well as in the traditional classroom.


2011 ◽  
Vol 8 (1) ◽  
pp. 101-119 ◽  
Author(s):  
Sergeja Vogrincic ◽  
Zoran Bosnic

The paper presents an approach to the task of automatic document categorization in the field of economics. Since the documents can be annotated with multiple keywords (labels), we approach this task by applying and evaluating multi-label classification methods of supervised machine learning. We describe forming a test corpus of 1015 economic documents that we automatically classify using a tool which integrates ontology construction with text mining methods. In our experimental work, we evaluate three groups of multi-label classification approaches: transformation to single-class problems, specialized multi-label models, and hierarchical/ranking models. The classification accuracies of all tested classification models indicate that there is a potential for using all of the evaluated methods to solve this task. The results show the benefits of using complex groups of approaches which benefit from exploiting dependence between the labels. A good alternative to these approaches is also single-class naive Bayes classifiers coupled with the binary relevance transformation approach.


2018 ◽  
Vol 2 (1) ◽  
pp. 21
Author(s):  
Pedro Urena

Ontology enrichment is a classification problem in which an algorithm categorizes an input conceptual unit in the corresponding node in a target ontology. Conceptual enrichment is of great importance both to Knowledge Engineering and Natural Language Processing, because it helps maximize the efficacy of intelligent systems, making them more adaptable to scenarios where information is produced by means of language. Following previous research on distributional semantics, this paper presents a case study of ontology enrichment using a feature-extraction method which relies on collocational information from corpora. The major advantage of this method is that it can help locate an input unit within its corresponding superordinate node in a taxonomy using a relatively small number of lexical features. In order to evaluate the proposed framework, this paper presents an experiment consisting of the automatic classification of a chemical substance in a taxonomy of toxicology.


2011 ◽  
Vol 15 (2) ◽  
Author(s):  
Philip Ice ◽  
Angela M. Gibson ◽  
Wally Boston ◽  
Dave Becher

Though online enrollments continue to accelerate at a rapid pace, there is significant concern over student retention. With drop rates significantly higher than in face-to-face classes it is imperative that online providers develop an understanding of factors that lead students to disenroll. This study examines course-level disenrollment through the lens of student satisfaction with the projection of Teaching, Social and Cognitive Presence. In comparing the highest and lowest disenrollment quartiles of all courses at American Public University the value of effective Instructional Design and Organization, and initiation of the Triggering Event phase of Cognitive Presence were found to be significant predictors of student satisfaction in the lowest disenrollment quartile. For the highest disenrollment quartile, the lack of follow-through vis-à-vis Facilitation of Discourse and Cognitive Integration were found to be negative predictors of student satisfaction.


2018 ◽  
Vol 71 (3) ◽  
pp. 942-950
Author(s):  
Vania Dias Cruz ◽  
Silvana Sidney Costa Santos ◽  
Jamila Geri Tomaschewski-Barlem ◽  
Bárbara Tarouco da Silva ◽  
Celmira Lange ◽  
...  

ABSTRACT Objective: To assess the health/functioning of the older adult who consumes psychoactive substances through the International Classification of Functioning, Disability and Health, considering the theory of complexity. Method: Qualitative case study, with 11 older adults, held between December 2015 and February 2016 in the state of Rio Grande do Sul, using interviews, documents and non-systematic observation. It was approved by the ethics committee. The analysis followed the propositions of the case study, using the complexity of Morin as theoretical basis. Results: We identified older adults who consider themselves healthy and show alterations - the alterations can be exacerbated by the use of psychoactive substances - of health/functioning expected according to the natural course of aging such as: systemic arterial hypertension; depressive symptoms; dizziness; tinnitus; harmed sleep/rest; and inadequate food and water consumption. Final consideration: The assessment of health/functioning of older adults who use psychoactive substances, guided by complex thinking, exceeds the accuracy limits to risk the understanding of the phenomena in its complexity.


2021 ◽  
pp. 1-13
Author(s):  
Xiaoyan Wang ◽  
Jianbin Sun ◽  
Qingsong Zhao ◽  
Yaqian You ◽  
Jiang Jiang

It is difficult for many classic classification methods to consider expert experience and classify small-sample datasets well. The evidential reasoning rule (ER rule) classifier can solve these problems. The ER rule has strong processing and comprehensive analysis abilities for diversified mixed information and can solve problems with expert experience effectively. Moreover, the initial parameters of the classifier constructed based on the ER rule can be set according to empirical knowledge instead of being trained by a large number of samples, which can help the classifier classify small-sample datasets well. However, the initial parameters of the ER rule classifier need to be optimized, and choosing the best optimization algorithm is still a challenge. Considering these problems, the ER rule classifier with an optimization operator recommendation is proposed in this paper. First, the initial ER rule classifier is constructed based on training samples and expert experience. Second, the adjustable parameters are optimized, in which the optimization operator recommendation strategy is applied to select the best algorithm by partial samples, and then experiments with full samples are carried out. Finally, a case study on a turbofan engine degradation simulation dataset is carried out, and the results indicate that the ER rule classifier has a higher classification accuracy than other classic classifiers, which demonstrates the capability and effectiveness of the proposed ER rule classifier with an optimization operator recommendation.

