Supervised Learning Algorithms: Latest Research Papers

Classification problems are common in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil is a real challenge because of the complexity involved in assigning the correct category code to a good, especially given the tax penalties and legal implications of a misclassification. This work focuses on training a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, the official classification system for import and export products in Brazil. In particular, this article presents results from a Portuguese-language-pretrained BERT model as well as from a multilingual-pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving a Matthews correlation coefficient (MCC) of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods.
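The MCC figure above summarizes a whole multiclass confusion matrix in one number. As a minimal sketch (standard library only), the multiclass Matthews correlation coefficient can be computed directly from true and predicted labels; the labels "a" and "b" below are hypothetical stand-ins for NCM codes, not data from the study.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (Gorodkin's generalization)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    classes = set(true_counts) | set(pred_counts)
    sum_tp = sum(true_counts[k] * pred_counts[k] for k in classes)
    sum_t2 = sum(true_counts[k] ** 2 for k in classes)
    sum_p2 = sum(pred_counts[k] ** 2 for k in classes)
    numerator = correct * n - sum_tp
    denominator = math.sqrt(n * n - sum_p2) * math.sqrt(n * n - sum_t2)
    # Convention: return 0.0 when labels or predictions are all one class.
    return numerator / denominator if denominator else 0.0


# Hypothetical labels: one of four items is misclassified.
print(matthews_corrcoef(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # ≈ 0.577
```

Unlike plain accuracy, MCC stays low when a classifier only does well on the majority code, which is one reason it suits skewed label distributions.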

A comparative study using supervised learning for anomaly detection in network traffic

Journal of Physics: Conference Series, 2022, Vol 2161 (1), pp. 012030. DOI: 10.1088/1742-6596/2161/1/012030

Author(s): R Garg, S Mukherjee

Keyword(s): Intrusion Detection; Supervised Learning; Local System; Intrusion Detection Systems; Network Intrusion Detection; Detection Systems; Noisy Information; Network Intrusion; Network Intrusion Detection Systems; Supervised Learning Algorithms

A user connects to hundreds of remote networks daily, some of which may be compromised by malicious sources. To address this problem, a variety of network intrusion detection systems have been built, which aim to detect harmful networks before they establish a connection with the user’s local system. This paper proposes a model for anomaly-based network intrusion detection systems (NIDS) by comparing the accuracy of various supervised learning algorithms. Two datasets with different properties, in terms of data volume and use cases, were used and analysed. Feature engineering was performed to retrieve the most informative features of both datasets, and only the top 25% of features were used to build the models: a smaller feature subset not only reduces the cost of collecting the data but also removes redundant and noisy information. Two different data-splitting methods were used to train the models, and each method showed different trends across the ML models.
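The abstract does not say how the top 25% of features were ranked. As a minimal sketch, assuming the absolute Pearson correlation between each feature and the label as the scoring rule (an illustrative choice, not necessarily the authors' method), the selection step could look like this; feature names such as "f0" are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_quarter_features(rows, labels, names):
    """Rank features by |corr(feature, label)| and keep the top 25% (at least one)."""
    columns = list(zip(*rows))  # transpose row-major data into feature columns
    scores = {name: abs(pearson(list(col), labels)) for name, col in zip(names, columns)}
    keep = max(1, math.ceil(len(names) * 0.25))
    return sorted(names, key=lambda name: scores[name], reverse=True)[:keep]


# Hypothetical traffic records with four numeric features and a 0/1 attack label.
rows = [[1, 0, 5, 3], [2, 1, 4, 3], [3, 0, 6, 3], [4, 1, 5, 3]]
print(top_quarter_features(rows, [0, 1, 0, 1], ["f0", "f1", "f2", "f3"]))  # ['f1']
```

A filter score like this is cheap and model-agnostic; wrapper or embedded methods (for example, tree-based importances) would be drop-in alternatives for the ranking step.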

Deep Semi-Supervised Image Classification Algorithms: a Survey

JUCS - Journal of Universal Computer Science, 2021, Vol 27 (12), pp. 1390-1407. DOI: 10.3897/jucs.77029

Author(s): Ani Vanyan, Hrant Khachatrian

Keyword(s): Machine Learning; Image Classification; Supervised Learning; Classification Accuracy; Classification Algorithms; The Past; Supervised Learning Algorithms; Supervised Image Classification; Learning Focused; Remarkable Progress

Semi-supervised learning is a branch of machine learning focused on improving the performance of models when labeled data are scarce but a large number of unlabeled examples is available. Over the past five years there has been remarkable progress in designing algorithms that achieve reasonable image classification accuracy with labels for only 0.1% of the samples. In this survey, we describe most of the recently proposed deep semi-supervised learning algorithms for image classification and identify the main research trends in the field. Next, we compare several components of the algorithms, discuss the challenges of reproducing results in this area, and highlight recent applications of methods originally developed for semi-supervised learning.
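A recurring ingredient of the surveyed methods is pseudo-labeling (self-training): the model labels unlabeled examples it is confident about and retrains on them. The sketch below is a deliberately non-deep illustration, assuming a nearest-centroid classifier and a hypothetical distance-margin threshold `min_margin` as the confidence test; it is not any specific algorithm from the survey.

```python
import math


def centroids(points, labels):
    """Mean point for each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(cents, p):
    """Return (label, margin): nearest centroid and its distance gap to the runner-up."""
    dists = sorted((math.dist(p, c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled, labels, unlabeled, min_margin=1.0, rounds=5):
    """Pseudo-labeling loop: adopt confident predictions, refit, and repeat."""
    X, y, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(X, y)
        confident, rest = [], []
        for p in pool:
            label, margin = predict(cents, p)
            (confident if margin >= min_margin else rest).append((p, label))
        if not confident:
            break  # nothing passes the confidence test; stop early
        for p, label in confident:
            X.append(p)
            y.append(label)
        pool = [p for p, _ in rest]
    return centroids(X, y), pool


# Two labeled points, four unlabeled ones; the ambiguous midpoint stays unlabeled.
cents, pool = self_train([(0, 0), (10, 0)], ["cat", "dog"], [(1, 0), (2, 0), (9, 0), (5, 0)])
print(cents, pool)
```

Deep variants replace the centroid model with a network and the margin with a softmax-confidence threshold, but the label-then-retrain loop has the same shape.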

YouTube based religious hate speech and extremism detection dataset with machine learning baselines

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-9. DOI: 10.3233/jifs-219264

Author(s): Noman Ashraf, Abid Rafiq, Sabur Butt, Hafiz Muhammad Faisal Shehzad, Grigori Sidorov, ...

Keyword(s): Machine Learning; Social Networking; Social Networking Sites; Nearest Neighbor; Hate Speech; Support Vector; K Nearest Neighbor; Speech Detection; Supervised Learning Algorithms; Youtube Videos

On YouTube, billions of videos are watched and millions of short messages are posted each day. YouTube, along with other social networking sites, is used by individuals and extremist groups to spread hatred among users. In this paper, we consider religion as the domain most targeted for spreading hate speech among people of different religions. We present a methodology for detecting religion-based hate videos on YouTube. Messages posted on YouTube videos generally express users’ opinions about that video. We provide a novel dataset for religious hate speech detection in YouTube comments. The proposed methodology applies data mining techniques to comments extracted from religious videos in order to filter religion-oriented messages and detect the videos that are used to spread hate. The supervised learning algorithms Support Vector Machine (SVM), Logistic Regression (LR), and k-Nearest Neighbor (k-NN) are used to provide baseline results.
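Of the three baselines, k-NN is the easiest to show end to end. A minimal sketch, assuming a bag-of-words representation with cosine similarity (the paper's actual text features are not described here); the comments and the "keep"/"flag" labels are hypothetical and deliberately benign.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train_texts, train_labels, text, k=3):
    """Majority label among the k training comments most similar to `text`."""
    query = bag_of_words(text)
    scored = sorted(
        ((cosine(query, bag_of_words(t)), label) for t, label in zip(train_texts, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


train = ["great informative video", "thanks for sharing this video", "nice video",
         "spam spam click link", "click this link now", "free link spam"]
labels = ["keep", "keep", "keep", "flag", "flag", "flag"]
print(knn_predict(train, labels, "click the spam link"))  # flag
```

An odd `k` avoids vote ties in two-class problems; in practice TF-IDF weighting usually works better than raw counts for short comments.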

Learning network embeddings using small graphlets

Social Network Analysis and Mining, 2021, Vol 12 (1). DOI: 10.1007/s13278-021-00846-9

Author(s): Luce le Gorrec, Philip A. Knight, Auguste Caen

Keyword(s): State Of The Art; Learning Algorithms; Selection Procedure; Principal Component; Feature Extraction Method; Learning Network; Convolutional Networks; Original Dataset; Supervised Learning Algorithms; As Graph

Techniques for learning vectorial representations of graphs (graph embeddings) have recently emerged as an effective approach to facilitating machine learning on graphs. Some of the most popular methods involve sophisticated features such as graph kernels or convolutional networks. In this work, we introduce two straightforward supervised learning algorithms based on counts of small graphlets, combined with a dimension-reduction step. The first relies on a classic feature extraction method powered by principal component analysis (PCA). The second is a feature selection procedure also based on PCA. Despite their conceptual simplicity, these embeddings are arguably more meaningful than some popular alternatives and, at the same time, are competitive with state-of-the-art methods. We illustrate this second point on a downstream classification task. We then use our algorithms in a novel setting, namely to analyze author relationships in Wikipedia articles, for which we present an original dataset. Finally, we provide empirical evidence suggesting that our methods could also be adapted to unsupervised learning algorithms.
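The embeddings start from counts of small graphlets. The exact graphlet set and normalization the authors use are not given in this abstract, so the sketch below assumes the connected graphlets on two and three nodes (edges, open wedges, triangles) for an undirected simple graph; vectors like these would then feed the PCA step.

```python
from itertools import combinations


def small_graphlet_counts(edges):
    """Count connected 2- and 3-node induced graphlets in an undirected simple graph.

    Returns (edges, open_wedges, triangles): open wedges are induced paths on
    three nodes, triangles are 3-cliques. Self-loops and duplicate edges are ignored.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    # A node of degree d is the centre of C(d, 2) wedges, open or closed.
    wedges = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # Each triangle is found once from each of its three vertices.
    triangles = sum(
        1
        for u in adj
        for v, w in combinations(sorted(adj[u]), 2)
        if w in adj[v]
    ) // 3
    # Every triangle closes exactly three wedges, so subtract them to get open ones.
    return n_edges, wedges - 3 * triangles, triangles


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
print(small_graphlet_counts([(0, 1), (1, 2), (2, 0), (2, 3)]))  # (4, 2, 1)
```

Counting per node instead of per graph (how many triangles each node sits in, and so on) gives node-level features in the same spirit.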

Early Stage Identification of COVID-19 Patients in Mexico Using Machine Learning: A Case Study for the Tijuana General Hospital

Information, 2021, Vol 12 (12), pp. 490. DOI: 10.3390/info12120490

Author(s): Cristián Castillo-Olea, Roberto Conte-Galván, Clemente Zuñiga, Alexandra Siono, Angelica Huerta, ...

Keyword(s): Machine Learning; Mathematical Model; Logistic Regression; General Hospital; Early Stage; Individual Characteristics; Supervised Learning Algorithms; Neural Network Algorithm; The Individual; Logistic Regression Algorithm

Background: COVID-19, the disease behind the current pandemic, is an acute illness of global concern. It is an infectious disease caused by SARS-CoV-2, a recently discovered coronavirus. Most people who become ill with COVID-19 experience mild, moderate, or severe symptoms. To help make quick decisions about treatment and isolation needs, it is useful to determine which significant variables indicate infection cases in the population served by the Tijuana General Hospital (Hospital General de Tijuana). An artificial intelligence (machine learning) mathematical model was developed to identify early-stage significant variables in COVID-19 patients.

Methods: The individual characteristics of the study subjects included age, gender, age group, symptoms, comorbidities, diagnosis, and outcomes. A mathematical model based on supervised learning algorithms was developed that identifies the significant variables predicting a COVID-19 diagnosis with high precision.

Results: Automatic algorithms were used to analyze the data. For Systolic Arterial Hypertension (SAH), the Logistic Regression algorithm achieved 91.0% area under the ROC curve (AUC), 80% accuracy (CA), 80% F1, 80% recall, and 80.1% precision for the selected variables. For Diabetes Mellitus (DM), the Logistic Regression algorithm obtained 91.2% AUC, 89.2% accuracy, 88.8% F1, 89.7% precision, and 89.2% recall for the selected variables. The neural network algorithm showed better results for patients with obesity, obtaining 83.4% AUC, 91.4% accuracy, 89.9% F1, 90.6% precision, and 91.4% recall.

Conclusions: Statistical analyses revealed that the most significant predictive symptoms in patients with SAH, DM, and obesity were fatigue and myalgias/arthralgias. The third dominant symptom in people with SAH and DM was odynophagia.
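The results above combine threshold metrics (accuracy, precision, recall, F1) with a ranking metric (AUC). A minimal sketch of their standard binary definitions, with the positive class coded as 1 and AUC computed as the probability that a random positive scores above a random negative; the labels and scores are hypothetical. The averaging scheme used in the study is not stated, and many toolkits report class-weighted averages, which are computed differently.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities, thresholded at 0.5 for the class labels.
y_true = [1, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.35, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(binary_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

Because AUC depends only on the ordering of scores, a model can have a high AUC and still a mediocre accuracy at a fixed threshold, as the obesity result above illustrates.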

Combating money laundering with machine learning – applicability of supervised-learning algorithms at cryptocurrency exchanges

Journal of Money Laundering Control, 2021, Vol ahead-of-print. DOI: 10.1108/jmlc-09-2021-0106

Author(s): Eric Pettersson Ruiz, Jannis Angelis

Keyword(s): Machine Learning; Supervised Learning; Money Laundering; Qualitative Interviews; Learning Algorithms; Content Type; Study Results; Supervised Learning Algorithms; Multiple Accounts; Combat Money Laundering

Purpose: This study aims to explore how to deanonymize cryptocurrency money launderers with the help of machine learning (ML). Money is laundered through cryptocurrencies by distributing funds across multiple accounts and then re-exchanging the crypto back. This exchange is carried out through cryptocurrency exchanges. Current preventive efforts are outdated, and ML may provide novel ways to identify illicit currency movements. Hence, this study investigates the applicability of ML for combating money laundering that uses cryptocurrency.

Design/methodology/approach: Four supervised-learning algorithms were compared using the Bitcoin Elliptic Dataset. The method covered a quantitative analysis of algorithmic performance, capturing differences in three key evaluation metrics: F1-score, precision and recall. Two complementary qualitative interviews were conducted at cryptocurrency exchanges to identify the fit and applicability of the algorithms.

Findings: The study results show that the ML tools currently implemented for preventing money laundering at cryptocurrency exchanges are all too slow and need to be optimized for the task. The results also show that, while no single algorithm is best at detecting money-laundering transactions, the decision tree algorithm is the most suitable for adoption by cryptocurrency exchanges.

Originality/value: Given the growth of cryptocurrency use, this study explores the newly developed field of algorithmic tools for combating illicit currency movement, in particular in the growing arena of cryptocurrencies. The study results provide new insights into the applicability of ML as a tool to combat money laundering at cryptocurrency exchanges.
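Since the study singles out decision trees, it may help to see the step a tree repeats at every node: picking the threshold on a feature that makes the two resulting groups as pure as possible. A minimal sketch using Gini impurity on one numeric feature; the feature values and the 0/1 "illicit" labels are hypothetical, not taken from the Elliptic dataset, and a full tree would apply this recursively across all features.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Best split 'value <= t' on one numeric feature by weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    n = len(values)
    # Skip the largest value so the right-hand group is never empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical per-transaction feature; label 1 marks an illicit transaction.
print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (3, 0.0)
```

The resulting rules ("feature <= 3") are easy for compliance staff to audit, which is one plausible reason a tree fits an exchange's workflow.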

Developing a Framework for Detecting Phishing URLs using Machine Learning

International Journal of Emerging Technology and Advanced Engineering, 2021, Vol 11 (11), pp. 61-67. DOI: 10.46338/ijetae1121_08

Author(s): Nguyen Tung Lam

Keyword(s): Machine Learning; Supervised Learning; Learning Algorithms; End Users; Research Results; Supervised Learning Algorithms; User Data; Abnormal Behaviors

Attacks that target end users through phishing URLs are very dangerous today. With this technique, attackers can steal user data or take control of the system. Therefore, detecting phishing URLs early is essential. In this paper, we propose a method for detecting phishing URLs based on supervised learning algorithms and abnormal behaviors extracted from URLs. Based on the research results, we then build a framework for detecting phishing URLs for end users. The novelty and advantage of our proposed method is that abnormal behaviors are extracted from URLs that are monitored and collected directly from attack campaigns, instead of relying on old, inefficient datasets.

Keywords: phishing URLs; detecting phishing URLs; abnormal behaviors of phishing URLs; Machine learning
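The abstract does not list which abnormal behaviors are extracted from each URL. As an assumption-laden sketch, the function below computes a few lexical signals commonly used in phishing-URL work (IP-address hosts, "@" symbols, many subdomains, hyphens, length); the feature set is hypothetical and chosen for illustration, and the resulting dictionaries would be the input to a supervised classifier.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical lexical features of a URL, of the kind used as phishing signals."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


print(url_features("http://user@192.168.1.10/login"))
```

Purely lexical features need no network access, so they can be computed at the end user's side before a page loads; behavioral signals gathered from live campaigns, as the paper describes, would extend this vector.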

Textile Wastewater Treatment in a Spinning Disc Reactor: Improved Performances—Experimental, Modeling and SVM Optimization

Processes, 2021, Vol 9 (11), pp. 2003. DOI: 10.3390/pr9112003

Author(s): Carmen Zaharia, Florin Leon, Silvia Curteanu, Eugenia Teodora Iacob-Tudose

Keyword(s): Wastewater Treatment; Suspended Solids; Textile Wastewater; Primary Treatment; Fenton Oxidation; Ferrous Ions; Supervised Learning Algorithms; Fenton Oxidation Process; Textile Wastewater Treatment; Spinning Disc

The paper presents an experimental study of the treatment of a real textile wastewater using spinning disc (SD) technology, either alone or combined with an advanced Fenton oxidation step. SD efficiency was investigated by studying color, suspended solids, and turbidity removal at different feed flow rates (10–30 L/h) and disc rotation speeds (100–1500 rpm). The data revealed increasing removal trends and made it possible to establish the highest removal values. Based on the experimental results, the treatment efficiency of SD technology was reasonably good, and the wastewater (WW) indicators can therefore be improved within relatively short periods of time. Additionally, based on supervised learning algorithms, the study includes treatment modeling for turbidity and color removal, followed by optimization of turbidity removal using the best learned models. The satisfactory results of the modeling and optimization procedures provide useful predictions for the treatment processes studied. Furthermore, a Fenton oxidation process was applied together with SD technology to minimize color and solids content. The influence of pH and of hydrogen peroxide and ferrous ion concentrations was also investigated to establish the highest removal efficiencies. Overall, SD technology applied to textile effluent treatment proved to be an appropriate and efficient alternative to the classical mechanical step used in primary treatment and, when combined with an advanced oxidation process in the secondary step, yielded improvements of 62.84% and 69.46% in color and suspended solids removal, respectively.
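The abstract does not describe how the learned models were optimized. One simple approach, sketched here under stated assumptions, is a grid search over the experimental operating ranges (10–30 L/h feed flow rate, 100–1500 rpm disc speed) that returns the setting with the highest predicted removal. `toy_model` is an invented stand-in for a trained SVM regressor, not the paper's model, and its optimum means nothing physically.

```python
def grid_search_optimum(model, flow_range=(10.0, 30.0), speed_range=(100.0, 1500.0), steps=15):
    """Return (flow, speed, predicted_removal) maximizing model(flow, speed) on a grid."""
    def grid(lo, hi):
        # `steps` evenly spaced values from lo to hi, inclusive.
        return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

    best = None
    for q in grid(*flow_range):
        for s in grid(*speed_range):
            value = model(q, s)
            if best is None or value > best[2]:
                best = (q, s, value)
    return best


def toy_model(flow, speed):
    """Hypothetical surrogate for a learned turbidity-removal model (percent)."""
    return 80.0 - 0.05 * (flow - 10.0) ** 2 - 1e-5 * (speed - 1200.0) ** 2


print(grid_search_optimum(toy_model))  # (10.0, 1200.0, 80.0)
```

Grid search is transparent and needs only model predictions, but its resolution is limited by `steps`; a finer grid or a derivative-free optimizer could refine the best grid point.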

Supervised Learning Algorithms: Recently Published Documents

Predicting bend-induced heterogeneity in sediment microbial communities by integrating bacteria-based index of biotic integrity and supervised learning algorithms

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

A comparative study using supervised learning for anomaly detection in network traffic

Deep Semi-Supervised Image Classification Algorithms: a Survey

YouTube based religious hate speech and extremism detection dataset with machine learning baselines

Learning network embeddings using small graphlets

Early Stage Identification of COVID-19 Patients in Mexico Using Machine Learning: A Case Study for the Tijuana General Hospital

Combating money laundering with machine learning – applicability of supervised-learning algorithms at cryptocurrency exchanges

Developing a Framework for Detecting Phishing URLs using Machine Learning

Textile Wastewater Treatment in a Spinning Disc Reactor: Improved Performances—Experimental, Modeling and SVM Optimization
