Harnessing Multi-label Classification Approaches for Economic Phenomena Categorization

One way to report a country’s economic state is to compile economic phenomena from several sources. The collected data can then be explored by sentiment and by economic category. This research applied and analyzed multiple approaches to multi-label text classification, in addition to providing sentiment analysis of the economic phenomena. Sentiment and single-label category classification were performed with a logistic regression model. Multi-label category classification used logistic regression, support vector machines, k-nearest neighbor, naïve Bayes, and decision trees as base classifiers, with binary relevance, classifier chain, and label power set as the implementation approaches. The results showed that logistic regression works well in sentiment and single-label classification, with classification accuracies of 80.08% and 92.71%, respectively. However, it works poorly as a base classifier in multi-label classification: accuracy dropped to 13.35%, 15.40%, and 30.65% for binary relevance, classifier chain, and label power set, respectively. By contrast, naïve Bayes works best as a base classifier in the label power set approach for multi-label classification, with a classification accuracy of 63.22%, followed by decision trees and support vector machines.
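
As a rough illustration of two of the problem transformations named in this abstract, the sketch below shows how binary relevance and label power set reshape multi-label targets before any base classifier is trained. It is not the authors' code: the category names and toy data are hypothetical, the classifier chain approach is left out for brevity, and exact-match (subset) accuracy is only an assumed reading of "classification accuracy", since the abstract does not define the metric.

```python
def binary_relevance_targets(label_sets, labels):
    """Binary relevance: one independent 0/1 target vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset_targets(label_sets):
    """Label power set: each distinct label combination becomes one class id."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids


def subset_accuracy(true_sets, predicted_sets):
    """Fraction of samples whose predicted label set matches the true set exactly."""
    matches = sum(set(t) == set(p) for t, p in zip(true_sets, predicted_sets))
    return matches / len(true_sets)


# Hypothetical toy data; the category names are made up for illustration.
samples = [{"inflation"}, {"inflation", "trade"}, {"employment"}, {"inflation", "trade"}]
labels = ["employment", "inflation", "trade"]

br = binary_relevance_targets(samples, labels)
lp, class_ids = label_powerset_targets(samples)

# A hypothetical prediction that misses "trade" on the second sample.
predicted = [{"inflation"}, {"inflation"}, {"employment"}, {"inflation", "trade"}]
score = subset_accuracy(samples, predicted)
```

Under exact-match scoring a prediction that gets only part of a label set right counts as fully wrong, which is one possible reason multi-label accuracies can sit well below single-label ones.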

Maschinelles Lernen mit Aussagen zur Modellkompetenz [Machine Learning with Statements on Model Competence]

Zeitschrift für Didaktik der Naturwissenschaften ◽

10.1007/s40573-020-00118-7 ◽

2020 ◽

Vol 26 (1) ◽

pp. 157-172

Author(s):

Dirk Krüger ◽

Moritz Krell

Keyword(s):

Logistic Regression ◽

Support Vector Machines ◽

Decision Trees ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Maschinelles Lernen ◽

Vector Machines

Machine learning methods can help analyze statements from open-format tasks in large samples. Using statements by biology teachers, pre-service biology teachers, and biology education researchers on the five sub-competencies of model competence (N_training = 456; N_classification = 260) as an example, the quality of machine learning with four algorithms (naïve Bayes, logistic regression, support vector machines, and decision trees) is examined. Evidence for the validity of interpreting the codings of individual algorithms is provided by satisfactory to good agreement between human and computer-based coding in training (345–607 statements, depending on the sub-competency); in classification (157–260 statements, depending on the sub-competency), this drops to moderate agreement. Positive correlations between the coded level and the external criterion of response length indicate that coding with naïve Bayes does not yield valid results. Important attributes that the algorithms use in classification correspond to relevant terms in the level definitions of the underlying coding manual. Finally, the article discusses to what extent machine learning with the algorithms used reaches the quality of human coding for statements on model competence and could therefore be used for second coding or in teaching situations.

Efficient and Private Scoring of Decision Trees, based on Pre-Computation Technique with Support Vector Machines and Logistic Regression Model

International Journal of Computing Communications and Networking ◽

10.30534/ijccn/2018/16722018 ◽

2018 ◽

Vol 7 (2) ◽

pp. 96-99

Author(s):

Keyword(s):

Logistic Regression ◽

Support Vector Machines ◽

Regression Model ◽

Decision Trees ◽

Logistic Regression Model ◽

Support Vector ◽

Computation Technique ◽

Vector Machines

A Semantic Scattering model for the automatic interpretation of English genitives

Natural Language Engineering ◽

10.1017/s1351324908004798 ◽

2009 ◽

Vol 15 (2) ◽

pp. 215-239 ◽

Cited By ~ 1

Author(s):

ADRIANA BADULESCU ◽

DAN MOLDOVAN

Keyword(s):

Support Vector Machines ◽

Decision Trees ◽

Naive Bayes ◽

Word Sense Disambiguation ◽

Naïve Bayes ◽

Semantic Relations ◽

Support Vector ◽

Word Sense ◽

Vector Machines ◽

Bayes Algorithm

An important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of the semantic relations expressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently, in the sense that different models should be built for different patterns.

Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression

Journal of Clinical Epidemiology ◽

10.1016/j.jclinepi.2009.11.020 ◽

2010 ◽

Vol 63 (8) ◽

pp. 826-833 ◽

Cited By ~ 168

Author(s):

Daniel Westreich ◽

Justin Lessler ◽

Michele Jonsson Funk

Keyword(s):

Neural Networks ◽

Logistic Regression ◽

Support Vector Machines ◽

Propensity Score ◽

Decision Trees ◽

Support Vector ◽

Vector Machines

Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews Classification

Baltic Journal of Modern Computing ◽

10.22364/bjmc.2017.5.2.05 ◽

2017 ◽

Vol 5 (2) ◽

Cited By ~ 22

Author(s):

Tomas Pranckevičius ◽

Virginijus Marcinkevičius

Keyword(s):

Logistic Regression ◽

Support Vector Machines ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Vector Machines

An investigation of the factors influencing cost system functionality using decision trees, support vector machines and logistic regression

International Journal of Accounting and Information Management ◽

10.1108/ijaim-04-2017-0052 ◽

2019 ◽

Vol 27 (1) ◽

pp. 27-55 ◽

Cited By ~ 1

Author(s):

Cemil Kuzey ◽

Ali Uyar ◽

Dursun Delen

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Support Vector Machines ◽

Decision Trees ◽

Prediction Models ◽

Support Vector ◽

Content Type ◽

Cost System ◽

Factors Influencing ◽

Vector Machines

Purpose: The paper aims to identify and critically analyze the factors influencing cost system functionality (CSF) using several machine learning techniques, including decision trees, support vector machines and logistic regression.

Design/methodology/approach: The study used a self-administered survey to collect the necessary data from companies conducting business in Turkey. Several prediction models are developed and tested; a series of sensitivity analyses is performed on the developed prediction models to assess the ranked importance of factors/variables.

Findings: Certain factors/variables influence CSF much more than others. The findings of the study suggest that utilization of management accounting practices requires a functional cost system, which is supported by a comprehensive cost data management process (i.e. acquisition, storage and utilization).

Research limitations/implications: First, the underlying data were collected using a questionnaire survey; thus, they are subjective and reflect the perceptions of the respondents. Ideally, the data would reflect the firms' practices objectively. Second, the authors measured CSF on a “Yes” or “No” basis, which does not allow survey respondents to reply in between; thus, it might have limited the choices of the respondents. Third, the Likert scales adopted in the measurement of the other constructs might limit the answers of the respondents.

Practical implications: Information technology plays a very important role in the success of CSF practices. That is, successful implementation of a functional cost system relies heavily on a fully integrated information infrastructure capable of constantly feeding CSF with accurate, relevant and timely data.

Originality/value: In addition to providing evidence regarding the factors underlying CSF across a broad range of industries, this study also illustrates the viability of machine learning methods as a research framework to critically analyze domain-specific data.

Exploration of Lymph Node-Negative Breast Cancers by Support Vector Machines, Naïve Bayes, and Decision Trees: A Comparative Study

Handbook of Artificial Intelligence in Biomedical Engineering ◽

10.1201/9781003045564-23 ◽

2020 ◽

pp. 509-524

Author(s):

J. Satya Eswari ◽

Pradeep Singh

Keyword(s):

Lymph Node ◽

Support Vector Machines ◽

Comparative Study ◽

Decision Trees ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Breast Cancers ◽

Node Negative ◽

Vector Machines

Efficient and Private Scoring of Decision Trees, Support Vector Machines and Logistic Regression Models Based on Pre-Computation

IEEE Transactions on Dependable and Secure Computing ◽

10.1109/tdsc.2017.2679189 ◽

2019 ◽

Vol 16 (2) ◽

pp. 217-230 ◽

Cited By ~ 20

Author(s):

Martine De Cock ◽

Rafael Dowsley ◽

Caleb Horst ◽

Raj Katti ◽

Anderson C. A. Nascimento ◽

...

Keyword(s):

Logistic Regression ◽

Support Vector Machines ◽

Decision Trees ◽

Regression Models ◽

Support Vector ◽

Logistic Regression Models ◽

Vector Machines

Avaliando atributos para a classificação de estrutura retórica em resumos científicos [Evaluating Features for Rhetorical Structure Classification in Scientific Abstracts]

Linguamática ◽

10.21814/lm.11.1.273 ◽

2019 ◽

Vol 11 (1) ◽

pp. 41-53

Author(s):

Alessandra Harumi Iriguti ◽

Valéria Delisandra Feltrim

Keyword(s):

Support Vector Machines ◽

Decision Trees ◽

Random Fields ◽

Conditional Random Fields ◽

Naive Bayes ◽

Nearest Neighbors ◽

Support Vector ◽

Word Embeddings ◽

K Nearest Neighbors ◽

Vector Machines

Rhetorical structure classification is an NLP task that seeks to identify the rhetorical components of a discourse and their relationships. This work aimed to automatically identify the sentence-level categories that make up the rhetorical structure of scientific abstracts. Specifically, the goal was to evaluate the impact of different feature sets on the implementation of rhetorical classifiers for scientific abstracts written in Portuguese. To this end, three kinds of features were used: surface features (extracted as TF-IDF values and selected with the chi-squared test), morphosyntactic features (implemented by the AZPort classifier), and features extracted from word embedding models (Word2Vec, Wang2Vec, and GloVe, all pre-trained). These feature sets, as well as their combinations, were used to train classifiers with the following supervised learning algorithms: Support Vector Machines, Naive Bayes, K-Nearest Neighbors, Decision Trees, and Conditional Random Fields (CRF). The classifiers were evaluated by cross-validation on three corpora composed of abstracts of theses and dissertations. The best result, an F1 of 94%, was obtained by the CRF classifier with the following feature combinations: (i) Wang2Vec Skip-gram of dimension 100 with the features from AZPort; (ii) Wang2Vec Skip-gram and GloVe of dimension 300 with the AZPort features; (iii) TF-IDF, AZPort, and embeddings extracted with the Wang2Vec Skip-gram models of dimensions 100 and 300 and GloVe of dimension 300. The results lead to the conclusion that the features from the AZPort classifier were fundamental to the good performance of the CRF classifier, while the combination with word embeddings proved useful for improving the results.
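
To make the surface-feature step more concrete, here is a minimal sketch of TF-IDF weighting in plain Python. It is only an illustration: the abstract does not say which TF-IDF variant was used, so the formula (relative term frequency times log(N/df)), the whitespace tokenizer, and the toy English sentences are all assumptions, and the chi-squared selection step is left out.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical toy sentences standing in for abstract sentences.
docs = [
    "this work evaluates features",
    "this method uses embeddings",
    "this result shows gains",
]
weights = tfidf(docs)
```

With this variant, a term that occurs in every sentence (here "this") gets weight zero, while a term unique to one sentence keeps a positive weight, which is why TF-IDF tends to favour terms that help tell sentence categories apart.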

Directional Support Vector Machines

Applied Sciences ◽

10.3390/app9040725 ◽

2019 ◽

Vol 9 (4) ◽

pp. 725

Author(s):

Diogo Pernes ◽

Kelwin Fernandes ◽

Jaime Cardoso

Keyword(s):

Logistic Regression ◽

Support Vector Machines ◽

Naive Bayes ◽

Real Data ◽

Naïve Bayes ◽

Support Vector ◽

Real Nature ◽

Periodic Data ◽

Vector Machines ◽

Classification Tasks

Several phenomena are represented by directional (angular or periodic) data, from time references on the calendar to geographical coordinates. These values are usually represented as real values restricted to a given range (e.g., [0, 2π)), hiding the real nature of this information. In order to handle these variables properly in supervised classification tasks, alternatives to the naive Bayes classifier and logistic regression were proposed in the past. In this work, we propose directional-aware support vector machines. We address several realizations of the proposed models, studying their kernelized counterparts and their expressiveness. Finally, we validate the performance of the proposed Support Vector Machines (SVMs) against the directional naive Bayes and directional logistic regression with real data, obtaining competitive results.
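
The point about a restricted real range hiding the nature of angular data can be illustrated with a small sketch. This is not the directional SVM proposed in the paper; it only shows a common generic workaround, assumed here for illustration, of mapping an angle to a point on the unit circle so that values near both ends of the range end up close together. The helper names and the hour-of-day example are hypothetical.

```python
import math


def hour_to_angle(hour):
    """Map an hour of the day to an angle in [0, 2*pi)."""
    return 2 * math.pi * (hour % 24) / 24


def encode_angle(theta):
    """Represent an angle as a point (cos, sin) on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def chord_distance(a, b):
    """Euclidean distance between the unit-circle points of two angles."""
    (ax, ay), (bx, by) = encode_angle(a), encode_angle(b)
    return math.hypot(ax - bx, ay - by)


late, early, noon = hour_to_angle(23), hour_to_angle(1), hour_to_angle(12)

# Treated as plain real numbers, 23:00 and 01:00 look far apart.
raw_gap = abs(late - early)

# On the circle they are close, and much closer than 23:00 is to noon.
circular_gap = chord_distance(late, early)
gap_to_noon = chord_distance(late, noon)
```

Here `raw_gap` is large because the wrap-around at 2π is ignored, whereas `circular_gap` stays small and below `gap_to_noon`, matching the intuition that 23:00 and 01:00 are only two hours apart.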
