A supervised machine learning classification algorithm for research articles

Author(s):  
Leonidas Akritidis ◽  
Panayiotis Bozanis
Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1578
Author(s):  
Daniel Szostak ◽  
Adam Włodarczyk ◽  
Krzysztof Walkowiak

The rapid growth of network traffic creates a need for new network technologies. Artificial intelligence provides suitable tools to improve currently used network optimization methods. In this paper, we propose a procedure for network traffic prediction. Based on the characteristics of optical networks (and other network technologies), we focus on the prediction of fixed bitrate levels called traffic levels. We develop and evaluate two approaches based on different supervised machine learning (ML) methods: classification and regression. We examine four different ML models with various selected features. The tested datasets are based on real traffic patterns provided by the Seattle Internet Exchange Point (SIX). The obtained results are analyzed using a new quality metric, which allows researchers to find the best forecasting algorithm in terms of network resource usage and operational costs. Our research shows that regression provides better results than classification for all analyzed datasets. Additionally, the final choice of the most appropriate ML algorithm and model should depend on the network operator's expectations.
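The idea of predicting fixed bitrate levels rather than raw bitrates can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 100 Gbps step size, the function names `to_traffic_level` and `level_errors`, and the sample values. The sketch maps a bitrate to the smallest level that covers it, then counts how many levels a forecast falls short of or exceeds the actual demand, which is the kind of trade-off a resource-oriented quality metric would weigh.

```python
import math


def to_traffic_level(bitrate_gbps: float, step_gbps: float = 100.0) -> int:
    """Return the smallest level index whose capacity covers the bitrate.

    The step size is a hypothetical value chosen for illustration.
    """
    if bitrate_gbps <= 0:
        return 0
    return math.ceil(bitrate_gbps / step_gbps)


def level_errors(actual, predicted, step_gbps: float = 100.0):
    """Count under- and over-provisioned levels across a forecast.

    Returns a tuple (under, over): total levels by which predictions fell
    short of, or exceeded, the level required by the actual traffic.
    """
    under = 0
    over = 0
    for a, p in zip(actual, predicted):
        diff = to_traffic_level(p, step_gbps) - to_traffic_level(a, step_gbps)
        if diff < 0:
            under += -diff
        else:
            over += diff
    return under, over


# Hypothetical bitrates in Gbps, not data from the SIX datasets.
actual_bitrates = [250.0, 90.0, 410.0]
predicted_bitrates = [180.0, 120.0, 400.0]
under, over = level_errors(actual_bitrates, predicted_bitrates)
print(f"under-provisioned levels: {under}, over-provisioned levels: {over}")
```

In this sketch a regression model would output a bitrate that is then quantized with `to_traffic_level`, whereas a classification model would output the level index directly. How under- and over-provisioning are weighted against each other is left open here, since the abstract does not define the metric.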


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nasser Assery ◽  
Yuan (Dorothy) Xiaohong ◽  
Qu Xiuli ◽  
Roy Kaushik ◽  
Sultan Almalki

Purpose: This study aims to propose an unsupervised learning model to evaluate the credibility of disaster-related Twitter data and present a performance comparison with commonly used supervised machine learning models. Design/methodology/approach: First, historical tweets on two recent hurricane events are collected via the Twitter API. Then a credibility scoring system is implemented in which the tweet features are analyzed to give each tweet a credibility score and a credibility label. After that, supervised machine learning classification is implemented using various classification algorithms, and their performance is compared. Findings: The proposed unsupervised learning model could enhance emergency response by providing a fast way to determine the credibility of disaster-related tweets. Additionally, the comparison of the supervised classification models reveals that the Random Forest classifier performs significantly better than the SVM and Logistic Regression classifiers in classifying the credibility of disaster-related tweets. Originality/value: In this paper, an unsupervised 10-point scoring model is proposed to evaluate the credibility of tweets based on user-based and content-based features. This technique could be used to evaluate the credibility of disaster-related tweets in future hurricanes and has the potential to enhance emergency response during critical events. The comparative study of different supervised learning methods has revealed effective supervised learning methods for evaluating the credibility of Twitter data.


2021 ◽  
pp. 177-191
Author(s):  
Natalia V. Revollo ◽  
G. Noelia Revollo Sarmiento ◽  
Claudio Delrieux ◽  
Marcela Herrera ◽  
Rolando González-José

Author(s):  
Tyler F. Rooks ◽  
Andrea S. Dargie ◽  
Valeta Carol Chancey

Abstract A shortcoming of using environmental sensors for the surveillance of potentially concussive events is substantial uncertainty regarding whether the event was caused by head acceleration (“head impacts”) or sensor motion (with no head acceleration). The goal of the present study is to develop a machine learning model to classify environmental sensor data obtained in the field and evaluate the performance of the model against the performance of the proprietary classification algorithm used by the environmental sensor. Data were collected from Soldiers attending sparring sessions conducted under a U.S. Army Combatives School course. Data from one sparring session were used to train a decision tree classification algorithm to identify good and bad signals. Data from the remaining sparring sessions were kept as an external validation set. The performance of the proprietary algorithm used by the sensor was also compared to the trained algorithm performance. The trained decision tree was able to correctly classify 95% of events for internal cross-validation and 88% of events for the external validation set. Comparatively, the proprietary algorithm was only able to correctly classify 61% of the events. In general, the trained algorithm was better able to predict when a signal was good or bad compared to the proprietary algorithm. The present study shows it is possible to train a decision tree algorithm using environmental sensor data collected in the field.


2021 ◽  
Author(s):  
Ravi Iyer ◽  
Elizabeth Seabrook ◽  
Suku Sukunesan ◽  
Maja Nedeljkovic ◽  
Denny Meyer

Abstract We aimed to demonstrate how a large collection of publicly accessible Australian Coroner’s Court case files (n=4459) (2009-2019) can be automatically classified for determination of death by suicide, presence of mental health disorder and sex of deceased via Natural Language Processing (NLP) methods: supervised machine learning and unsupervised dictionary-based and string-search-based approaches. In the machine learning classification of deaths by suicide (Gradient Boosting vs. a Random Forest baseline), we achieved a superior accuracy of 83.3% (sensitivity = 85.1%, specificity = 79.1%), along with an accuracy of 98.3% for the dictionary-based classification of mental health disorder, as defined by the ICD-10 (sensitivity = 99.0%, specificity = 97.9%). Our machine learning approach automatically classified 24.2% (1078/4459) of the case files as referring to deaths by suicide, while 63.7% (2940/4459) were classified as exhibiting a mental health disorder. We employed a two-stage machine learning approach involving feature engineering in the first stage, followed by predictive modelling in the second. Feature engineering involved several steps including removal of low-value text, parts-of-speech analysis, term-document weighting and topic clustering. Predictive classification involved extensive hyperparameter tuning to yield the most accurate model. We validated our models against a manually pre-coded subsample of case files, and also via binary logistic regression to test the contribution of each classified mental health disorder against determinations of deaths by suicide according to extant literature. This validation step confirmed elevated odds of suicide attributed to diagnoses of Depression, Schizophrenia and Obsessive Compulsive Disorder. Finally, we offer a short case study to demonstrate the efficacy of our approach in investigating a subset of case findings referring to suicides resulting from family violence.
We offer a proof-of-concept model that demonstrates an objective and scalable approach to the analysis of legal texts. The use of NLP methods in analysing Coroner's Court case findings has important implications for the ongoing development of a real-time suicide surveillance system in Australia.
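For readers less familiar with the reported measures, accuracy, sensitivity and specificity all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not the study's data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp / fn: positive cases classified correctly / incorrectly
    tn / fp: negative cases classified correctly / incorrectly
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of true positives detected
        "specificity": tn / (tn + fp),    # share of true negatives detected
    }


# Hypothetical counts chosen for illustration only.
metrics = binary_metrics(tp=85, fp=20, tn=80, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Accuracy alone can mislead when one class is much rarer than the other, which is one reason abstracts such as this one report sensitivity and specificity alongside it.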

