Multiclass Confusion Matrix Reduction Method and Its Application on Net Promoter Score Classification Problem

The current paper presents a novel method for reducing a multiclass confusion matrix into a 2×2 version, enabling the use of relevant performance metrics and methods, such as the receiver operating characteristic and area under the curve, for the assessment of different classification algorithms. The reduction method is based on class grouping and leads to a special type of matrix called the reduced confusion matrix. The developed method is then used to assess state-of-the-art machine learning algorithms applied to the net promoter score classification problem in the field of customer experience analytics, indicating the value of the proposed method in real-world classification problems.
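
The idea of reducing a multiclass confusion matrix through class grouping can be illustrated with a minimal Python sketch. It is not the authors' exact definition of the reduced confusion matrix: it assumes rows are actual classes and columns are predicted classes, and it simply treats a chosen group of classes as "positive" and all others as "negative". The function name `reduce_confusion_matrix`, the three-class layout, and all counts are hypothetical.

```python
def reduce_confusion_matrix(cm, positive):
    """Collapse a KxK confusion matrix into [[TP, FN], [FP, TN]].

    Assumption: cm[actual][predicted] holds counts, and the class
    indices in `positive` are grouped into a single positive class.
    """
    pos = set(positive)
    tp = fn = fp = tn = 0
    for actual, row in enumerate(cm):
        for predicted, count in enumerate(row):
            if actual in pos and predicted in pos:
                tp += count  # actual and predicted both in the positive group
            elif actual in pos:
                fn += count  # positive-group sample predicted outside the group
            elif predicted in pos:
                fp += count  # outside sample predicted inside the group
            else:
                tn += count  # actual and predicted both outside the group
    return [[tp, fn], [fp, tn]]


# Hypothetical 3-class counts: 0 = detractor, 1 = passive, 2 = promoter
cm = [
    [50, 8, 2],
    [10, 30, 10],
    [3, 7, 80],
]

# Group "promoter" alone as the positive class
reduced = reduce_confusion_matrix(cm, positive=[2])
```

Once the matrix is in 2×2 form, binary metrics such as sensitivity or specificity can be computed from it in the usual way. Note that under this simple grouping, confusions between two classes inside the same group count as correct.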

Multi-Class Confusion Matrix Reduction method and its application on Net Promoter Score classification problem

The 14th PErvasive Technologies Related to Assistive Environments Conference ◽

10.1145/3453892.3461323 ◽

2021 ◽

Author(s):

Ioannis Markoulidakis ◽

George Kopsiaftis ◽

Ioannis Rallis ◽

Ioannis Georgoulas

Keyword(s):

Reduction Method ◽

Confusion Matrix ◽

Classification Problem ◽

Matrix Reduction ◽

Net Promoter Score ◽

Net Promoter

Logistic Regression for Health Profiling

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1294.0886s219 ◽

2019 ◽

Vol 8 (6S2) ◽

pp. 974-977

Keyword(s):

Logistic Regression ◽

Ordinal Data ◽

Binary Classification ◽

Classification Problem ◽

Machine Learning Algorithms ◽

Classification Problems ◽

Specific Product ◽

Logistic Analysis ◽

Medical Researcher ◽

Diabetes Prediction

Logistic regression is used to predict a probability when many risk factors are involved. Medical researchers increasingly use logistic analysis for binary and ordinal data. Logistic regression is applied to several classification problems, such as spam detection, diabetes prediction, whether a customer will purchase a specific product or switch to a competitor, and whether a customer will click on a given advertisement link. For two-class classification, logistic regression is one of the simplest and most common machine learning algorithms, and it is easy to use as a baseline approach for any binary classification problem. It is also a fundamental concept in deep learning. Logistic regression measures and describes the relationship between a binary dependent variable and the independent variables.
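
As a rough illustration of how logistic regression turns risk factors into a probability, the sketch below applies the standard sigmoid function to a weighted sum of features. The weights, bias, feature values, and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
import math


def predict_probability(features, weights, bias):
    """Return P(y = 1 | features) under a logistic regression model."""
    # Linear combination of the risk factors
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with two risk factors
weights = [0.8, -0.3]
bias = -0.1

p = predict_probability([1.0, 2.0], weights, bias)
predicted_class = 1 if p >= 0.5 else 0  # assumed threshold of 0.5
```

In practice the weights would be fitted to data, for example by maximum likelihood, rather than chosen by hand.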

Exploring Symmetry of Binary Classification Performance Metrics

Symmetry ◽

10.3390/sym11010047 ◽

2019 ◽

Vol 11 (1) ◽

pp. 47 ◽

Cited By ~ 1

Author(s):

Amalia Luque ◽

Alejandro Carrasco ◽

Alejandro Martín ◽

Juan Ramón Lama

Keyword(s):

Performance Metrics ◽

Binary Classification ◽

Confusion Matrix ◽

Full Range ◽

Classification Performance ◽

Classification Problems ◽

Performance Metric ◽

Selection For ◽

Proper Performance ◽

Insight Into

Selecting the proper performance metric constitutes a key issue for most classification problems in the field of machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on a binary confusion matrix and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments, which cover the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results obtained show that, in all cases, three and only three types of symmetries arise: labelling inversion (between positive and negative classes); scoring inversion (concerning good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics have been shown to be independent of the imbalance in the dataset and two cross-symmetries have been identified. The results regarding their symmetries reveal a deeper insight into the behaviour of various performance metrics and offer an indicator to properly interpret their values and a guide for their selection for certain specific applications.
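
The labelling inversion mentioned above can be made concrete with a small Python sketch. It is only an illustration under assumed counts, not the paper's formal definition: swapping the positive and negative labels exchanges TP with TN and FN with FP, so accuracy keeps its value while sensitivity and specificity trade places. All function names and numbers are hypothetical.

```python
def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def sensitivity(tp, fn, fp, tn):
    return tp / (tp + fn)


def specificity(tp, fn, fp, tn):
    return tn / (tn + fp)


def invert_labels(tp, fn, fp, tn):
    """Swap the positive and negative classes.

    Old negatives become positives, so TP <-> TN and FN <-> FP.
    Returns the counts in the same (tp, fn, fp, tn) order.
    """
    return tn, fp, fn, tp


# Hypothetical binary confusion-matrix counts: (TP, FN, FP, TN)
counts = (80, 10, 12, 98)
inverted = invert_labels(*counts)

# Accuracy is unchanged by labelling inversion
same_accuracy = accuracy(*counts) == accuracy(*inverted)

# Sensitivity before inversion equals specificity after inversion
swapped = sensitivity(*counts) == specificity(*inverted)
```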

Performance Evaluation of Different Machine Learning Classification Algorithms for Disease Diagnosis

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101.oa5 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-28

Author(s):

Munder Abdulatef Al-Hashem ◽

Ali Mohammad Alqudah ◽

Qasem Qananwah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Confusion Matrix ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification

Knowledge extraction within the healthcare field is a very challenging task because of problems such as noise and imbalanced datasets, which are obtained from clinical studies where uncertainty and variability are common. Lately, a large number of machine learning algorithms have been considered and evaluated to check their validity for use in the medical field. Usually, the classification algorithms are compared against medical experts who specialize in diagnosing certain diseases, and performance metrics provide an effective methodological evaluation of the classifiers. The performance metrics comprise four criteria, including accuracy, sensitivity, and specificity, computed from the confusion matrix of each algorithm used. We utilized eight well-known machine learning algorithms and evaluated their performance on six medical datasets. Based on the experimental results, we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall across the datasets used and can be used for diagnosing various diseases.

Classification model for accuracy and intrusion detection using machine learning approach

PeerJ Computer Science ◽

10.7717/peerj-cs.437 ◽

2021 ◽

Vol 7 ◽

pp. e437

Author(s):

Arushi Agarwal ◽

Purushottam Sharma ◽

Mohammed Alshehri ◽

Ahmed A. Mohamed ◽

Osama Alfarraj

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Nearest Neighbor ◽

Performance Metrics ◽

Detection System ◽

Confusion Matrix ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor

In today’s cyber world, the demand for the internet is increasing day by day, which raises concern about network security. The aim of an Intrusion Detection System (IDS) is to provide approaches against many fast-growing network attacks (e.g., DDoS attack, Ransomware attack, Botnet attack, etc.), as it blocks the harmful activities occurring in the network system. In this work, three machine learning classification algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN), were used to measure accuracy and reduce processing time on the UNSW-NB15 dataset and to find the best-suited algorithm, one that can efficiently learn the patterns of suspicious network activities. The data gathered from the feature set comparison was then applied as input to the IDS as data feeds to train the system for future intrusion behavior prediction and analysis, using the best-fit algorithm chosen from the above three based on the performance metrics found. Also, the classification reports (Precision, Recall, and F1-score) and confusion matrix were generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.

Application Programming Interface for a Customer Experience Analysis Tool

10.3233/faia210092 ◽

2021 ◽

Author(s):

George Kopsiaftis ◽

Ioannis Georgoulas ◽

Ioannis Rallis ◽

Ioannis Markoulidakis ◽

Kostis Tzanettis ◽

...

Keyword(s):

Graphical Representation ◽

Application Programming Interface ◽

Customer Experience ◽

Machine Learning Algorithms ◽

Analysis Tool ◽

Net Promoter Score ◽

Application Programming ◽

Net Promoter ◽

User Friendly ◽

Programming Interface

This paper analyzes the architecture of an application programming interface (API) developed for a novel customer experience (CX) tool. The CX tool aims to monitor customer satisfaction based on several experience attributes and metrics, such as the Net Promoter Score. The API aims to create an efficient and user-friendly environment that allows users to utilize all the available features of the customer experience system, including the exploitation of state-of-the-art machine learning algorithms, the analysis of the data, and the graphical representation of the results.

Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

Mathematical Problems in Engineering ◽

10.1155/2012/793490 ◽

2012 ◽

Vol 2012 ◽

pp. 1-24 ◽

Cited By ~ 6

Author(s):

Lei La ◽

Qiao Guo ◽

Dequan Yang ◽

Qimin Cao

Keyword(s):

Chinese Text ◽

Text Categorization ◽

Nearest Neighbor ◽

Classification Problem ◽

Support Vector ◽

Classification Problems ◽

Adaboost Algorithm ◽

Novel Method ◽

Categorization System ◽

Multi Class Classification

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization are challenged by methods based on support vector machines (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multi-class AdaBoost algorithm that avoids reducing the multi-class classification problem to multiple two-class classification problems. This novel method is more effective and keeps the accuracy advantage of existing AdaBoost. An adaptive group-based kNN method is proposed to build more accurate weak classifiers and thereby keep the number of basis classifiers within an acceptable range. To further enhance performance, the weak classifiers are combined into a strong classifier through a double iterative weighting scheme, yielding an adaptive group-based kNN boosting algorithm (AGkNN-AdaBoost). We implement AGkNN-AdaBoost in a Chinese text categorization system. Experimental results show that the proposed classification algorithm achieves better precision and recall than many other text categorization methods, including traditional AdaBoost. In addition, its processing speed is significantly higher than that of the original AdaBoost and many other classic categorization algorithms.

Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements

Processes ◽

10.3390/pr8060638 ◽

2020 ◽

Vol 8 (6) ◽

pp. 638 ◽

Cited By ~ 1

Author(s):

Simon Orozco-Arias ◽

Johan S. Piña ◽

Reinel Tabares-Soto ◽

Luis F. Castillo-Ossa ◽

Romain Guyot ◽

...

Keyword(s):

Machine Learning ◽

Transposable Elements ◽

Performance Metrics ◽

Confusion Matrix ◽

Machine Learning Algorithms ◽

Invariance Properties ◽

Negative Class ◽

Precision Recall Curve ◽

And Training ◽

Selection Of

Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated for TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most commonly used way to compare measures is by using empirical analysis, a non-result-based methodology has been proposed, called measure invariance properties. These properties are calculated on the basis of whether a given measure changes its value under certain modifications in the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies applied ML to detect and classify TEs, there are no works evaluating performance metrics in TE tasks. Here, we analyzed 26 different metrics utilized in binary, multiclass, and hierarchical classifications, through bibliographic sources, and their invariance properties. Then, we corroborated our findings utilizing freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even using highly unbalanced datasets, multimodal negative class, and training datasets with errors or outliers. Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.
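
To show what it means for the F1-score to be calculated from other metrics, the following Python sketch computes it as the harmonic mean of precision and recall. The counts are hypothetical and are not results from the paper.

```python
def precision(tp, fp):
    # Fraction of predicted positives that are truly positive
    return tp / (tp + fp)


def recall(tp, fn):
    # Fraction of actual positives that were detected
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts from a binary confusion matrix
tp, fp, fn = 80, 12, 10
f1 = f1_score(tp, fp, fn)
```

Neither precision nor recall uses the true-negative count, so the F1-score does not change when only the number of true negatives changes. That is one example of the kind of invariance property the abstract refers to.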

Machine learning applications to predict two-phase flow patterns

PeerJ Computer Science ◽

10.7717/peerj-cs.798 ◽

2021 ◽

Vol 7 ◽

pp. e798

Author(s):

Harold Brayan Arteaga-Arteaga ◽

Alejandro Mora-Rubio ◽

Frank Florez ◽

Nicolas Murcia-Orjuela ◽

Cristhian Eduardo Diaz-Ortega ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Two Phase Flow ◽

Classification Problem ◽

Flow Patterns ◽

Machine Learning Algorithms ◽

Test Machine ◽

Phase Flow ◽

Classification Problems ◽

Two Phase

Recent advances in artificial intelligence with traditional machine learning algorithms and deep learning architectures solve complex classification problems. This work presents the performance of different artificial intelligence models to classify two-phase flow patterns, showing the best alternatives for this specific classification problem using two-phase flow regimes (liquid and gas) in pipes. Flow patterns are affected by physical variables such as superficial velocity, viscosity, density, and superficial tension. They also depend on the construction characteristics of the pipe, such as the angle of inclination and the diameter. We selected 12 databases (9,029 samples) to train and test machine learning models, considering these variables that influence the flow patterns. The primary dataset is Shoham (1982), containing 5,675 samples with six different flow patterns. An extensive set of metrics validated the results obtained. The most relevant characteristics for training the models using the Shoham (1982) dataset are gas and liquid superficial velocities, angle of inclination, and diameter. Regarding the algorithms, the Extra Trees model classifies the flow patterns with the highest degree of fidelity, achieving an accuracy of 98.8%.

Prediction of GPCR-Ligand Binding Using Machine Learning Algorithms

Computational and Mathematical Methods in Medicine ◽

10.1155/2018/6565241 ◽

2018 ◽

Vol 2018 ◽

pp. 1-5

Author(s):

Sangmin Seo ◽

Jonghwan Choi ◽

Soon Kil Ahn ◽

Kil Won Kim ◽

Jaekwang Kim ◽

...

Keyword(s):

Ligand Binding ◽

3D Structure ◽

Area Under The Curve ◽

G Protein Coupled Receptors ◽

Machine Learning Algorithms ◽

Amino Acid Motif ◽

Average Area ◽

Novel Method ◽

Cycle Structures ◽

G Protein Coupled

We propose a novel method that predicts binding of G-protein coupled receptors (GPCRs) and ligands. The proposed method uses hub and cycle structures of ligands and amino acid motif sequences of GPCRs, rather than the 3D structure of a receptor or similarity of receptors or ligands. The experimental results show that these new features can be effective in predicting GPCR-ligand binding (average area under the curve [AUC] of 0.944), because they are thought to include hidden properties of good ligand-receptor binding. Using the proposed method, we were able to identify novel ligand-GPCR bindings, some of which are supported by several studies.
