Multiclass Confusion Matrix Reduction Method and Its Application on Net Promoter Score Classification Problem

Technologies ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 81
Author(s):  
Ioannis Markoulidakis ◽  
Ioannis Rallis ◽  
Ioannis Georgoulas ◽  
George Kopsiaftis ◽  
Anastasios Doulamis ◽  
...  

The current paper presents a novel method for reducing a multiclass confusion matrix to a 2×2 version, enabling the use of the relevant performance metrics and methods, such as the receiver operating characteristic and the area under the curve, for the assessment of different classification algorithms. The reduction method is based on class grouping and leads to a special type of matrix called the reduced confusion matrix. The developed method is then used to assess state-of-the-art machine learning algorithms applied to the net promoter score classification problem in the field of customer experience analytics, indicating the value of the proposed method in real-world classification problems.
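The abstract does not spell out the reduction itself, so the following is only a minimal sketch of one plausible class-grouping scheme, not the authors' reduced confusion matrix. It assumes rows are true classes and columns are predicted classes; the function name `reduce_confusion_matrix` and the sample counts are hypothetical.

```python
def reduce_confusion_matrix(cm, positive_classes):
    """Collapse a KxK confusion matrix into a 2x2 matrix by class grouping.

    Assumption (not from the paper): rows are true classes, columns are
    predicted classes, and every class in `positive_classes` is merged into
    a single "positive" group while all other classes form the "negative" group.
    Returns [[TP, FN], [FP, TN]].
    """
    pos = set(positive_classes)
    tp = fn = fp = tn = 0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i in pos and j in pos:
                tp += count   # true group positive, predicted group positive
            elif i in pos:
                fn += count   # true group positive, predicted group negative
            elif j in pos:
                fp += count   # true group negative, predicted group positive
            else:
                tn += count   # both negative
    return [[tp, fn], [fp, tn]]


# Hypothetical 3-class matrix; classes 0 and 1 are grouped as "positive".
cm = [
    [50, 5, 2],
    [4, 30, 6],
    [1, 7, 45],
]
reduced = reduce_confusion_matrix(cm, positive_classes=[0, 1])
print(reduced)
```

Note that confusions between classes inside the same group count as correct after grouping, which is one reason a grouped 2×2 matrix can look better than the multiclass matrix it came from.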

When an event depends on many risk factors, logistic regression is used to predict its probability. Medical researchers increasingly use logistic analysis for binary and ordinal data. Several classification problems, such as spam detection, use logistic regression. Other examples include diabetes prediction, whether a customer will purchase a specific product or switch to a competitor, and whether a customer will click on a given advertisement link. For two-class classification, logistic regression is one of the simplest and most common machine learning algorithms, and it is easy to use as a baseline approach for any binary classification problem. It is also a fundamental concept in deep learning. Logistic regression measures and describes the relationship between a binary dependent variable and the independent variables.
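As a rough illustration of how logistic regression turns risk factors into a probability, the sketch below applies the standard logistic (sigmoid) function to a weighted sum of features. The weights, bias, and feature values are made-up numbers for illustration, not fitted to any dataset.

```python
import math


def predict_probability(weights, bias, features):
    """Return P(y = 1 | features) under a logistic regression model."""
    # Linear combination of the independent variables.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The logistic function maps any real z into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with two risk factors.
p = predict_probability(weights=[0.8, -0.5], bias=0.1, features=[1.0, 2.0])
label = 1 if p >= 0.5 else 0  # a 0.5 threshold is a common default choice
print(round(p, 4), label)
```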


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 47 ◽  
Author(s):  
Amalia Luque ◽  
Alejandro Carrasco ◽  
Alejandro Martín ◽  
Juan Ramón Lama

Selecting the proper performance metric constitutes a key issue for most classification problems in the field of machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on a binary confusion matrix and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments, which cover the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results obtained show that, in all cases, three and only three types of symmetries arise: labelling inversion (between positive and negative classes); scoring inversion (concerning good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics have been shown to be independent of the imbalance in the dataset and two cross-symmetries have been identified. The results regarding their symmetries reveal a deeper insight into the behaviour of various performance metrics and offer an indicator to properly interpret their values and a guide for their selection for certain specific applications.
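The labelling inversion described above can be illustrated on a single binary confusion matrix. This is a small sketch with hypothetical counts, and the three metrics shown are chosen for illustration rather than taken from the paper's list of ten: swapping the positive and negative labels leaves accuracy unchanged and exchanges sensitivity with specificity.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from binary confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def invert_labels(tp, fn, fp, tn):
    """Swap the roles of the positive and negative classes."""
    # Old true negatives become true positives, old false positives become
    # false negatives, and so on.
    return tn, fp, fn, tp


counts = (40, 10, 5, 45)  # hypothetical (TP, FN, FP, TN)
before = metrics(*counts)
after = metrics(*invert_labels(*counts))
print(before)
print(after)
```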


Author(s):  
Munder Abdulatef Al-Hashem ◽  
Ali Mohammad Alqudah ◽  
Qasem Qananwah

Knowledge extraction in the healthcare field is a very challenging task because of problems such as noise and imbalanced datasets. Such datasets are obtained from clinical studies, where uncertainty and variability are common. Lately, a large number of machine learning algorithms have been considered and evaluated to check their validity for use in the medical field. Usually, the classification algorithms are compared against medical experts who specialize in diagnosing certain diseases, and performance metrics provide an effective methodological evaluation of the classifiers. The performance metrics include accuracy, sensitivity, and specificity, derived from the confusion matrix of each algorithm. We used eight well-known machine learning algorithms and evaluated their performance on six different medical datasets. Based on the experimental results, we conclude that the XGBoost and K-Nearest Neighbor classifiers performed best overall across the datasets and can be used for diagnosing various diseases.
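The metrics named in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions with hypothetical counts; it does not reproduce the paper's experiments.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of diseased cases detected
        "specificity": tn / (tn + fp),   # share of healthy cases cleared
    }


# Hypothetical test set of 100 patients.
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
print(m)
```

On imbalanced medical data, accuracy alone can be misleading, which is why sensitivity and specificity are usually reported alongside it.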


2021 ◽  
Vol 7 ◽  
pp. e437
Author(s):  
Arushi Agarwal ◽  
Purushottam Sharma ◽  
Mohammed Alshehri ◽  
Ahmed A. Mohamed ◽  
Osama Alfarraj

In today’s cyber world, the demand for the internet is increasing day by day, and with it the concern for network security. The aim of an Intrusion Detection System (IDS) is to provide defenses against many fast-growing network attacks (e.g., DDoS, ransomware, and botnet attacks) by blocking harmful activities occurring in the network system. In this work, three machine learning classification algorithms, Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN), were evaluated for detection accuracy and processing time on the UNSW-NB15 dataset to find the best-suited algorithm, one that can efficiently learn the patterns of suspicious network activity. The data gathered from the feature set comparison was then fed to the IDS to train the system for future intrusion behavior prediction and analysis, using the best-fit algorithm chosen from the three based on the performance metrics obtained. In addition, classification reports (precision, recall, and F1-score) and confusion matrices were generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.
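The classification report mentioned above combines three quantities that can be computed from confusion-matrix counts. The following is a minimal sketch using the standard definitions and hypothetical counts for a single "attack" class; it is not the authors' evaluation code.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score for one class."""
    precision = tp / (tp + fp)  # how many flagged events were real attacks
    recall = tp / (tp + fn)     # how many real attacks were flagged
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one class.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```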


2021 ◽  
Author(s):  
George Kopsiaftis ◽  
Ioannis Georgoulas ◽  
Ioannis Rallis ◽  
Ioannis Markoulidakis ◽  
Kostis Tzanettis ◽  
...  

This paper analyzes the architecture of an application programming interface (API) developed for a novel customer experience (CX) tool. The CX tool aims to monitor customer satisfaction based on several experience attributes and metrics, such as the Net Promoter Score. The API aims to create an efficient and user-friendly environment that allows users to access all the available features of the customer experience system, including state-of-the-art machine learning algorithms, data analysis, and graphical representation of the results.


2012 ◽  
Vol 2012 ◽  
pp. 1-24 ◽  
Author(s):  
Lei La ◽  
Qiao Guo ◽  
Dequan Yang ◽  
Qimin Cao

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization are challenged by methods based on support vector machines (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multi-class AdaBoost algorithm to avoid reducing the multi-class classification problem to multiple two-class classification problems. This novel method is more effective and keeps the accuracy advantage of existing AdaBoost. An adaptive group-based kNN method is proposed to build more accurate weak classifiers and thereby keep the number of basis classifiers within an acceptable range. To further enhance performance, weak classifiers are combined into a strong classifier through a double iterative weighting scheme, yielding an adaptive group-based kNN boosting algorithm (AGkNN-AdaBoost). We implement AGkNN-AdaBoost in a Chinese text categorization system. Experimental results show that the proposed classification algorithm achieves better precision and recall than many other text categorization methods, including traditional AdaBoost. In addition, its processing speed is significantly higher than that of the original AdaBoost and many other classic categorization algorithms.


Processes ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 638 ◽  
Author(s):  
Simon Orozco-Arias ◽  
Johan S. Piña ◽  
Reinel Tabares-Soto ◽  
Luis F. Castillo-Ossa ◽  
Romain Guyot ◽  
...  

Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated for TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most commonly used way to compare measures is empirical analysis, a non-result-based methodology has been proposed, called measure invariance properties. These properties are calculated on the basis of whether a given measure changes its value under certain modifications in the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies have applied ML to detect and classify TEs, there are no works evaluating performance metrics in TE tasks. Here, we analyzed 26 different metrics used in binary, multiclass, and hierarchical classifications, through bibliographic sources, and their invariance properties. Then, we corroborated our findings using freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even with highly unbalanced datasets, multimodal negative classes, and training datasets with errors or outliers. Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.


2021 ◽  
Vol 7 ◽  
pp. e798
Author(s):  
Harold Brayan Arteaga-Arteaga ◽  
Alejandro Mora-Rubio ◽  
Frank Florez ◽  
Nicolas Murcia-Orjuela ◽  
Cristhian Eduardo Diaz-Ortega ◽  
...  

Recent advances in artificial intelligence with traditional machine learning algorithms and deep learning architectures solve complex classification problems. This work presents the performance of different artificial intelligence models to classify two-phase flow patterns, showing the best alternatives for this specific classification problem using two-phase flow regimes (liquid and gas) in pipes. Flow patterns are affected by physical variables such as superficial velocity, viscosity, density, and superficial tension. They also depend on the construction characteristics of the pipe, such as the angle of inclination and the diameter. We selected 12 databases (9,029 samples) to train and test machine learning models, considering these variables that influence the flow patterns. The primary dataset is Shoham (1982), containing 5,675 samples with six different flow patterns. An extensive set of metrics validated the results obtained. The most relevant characteristics for training the models using Shoham (1982) dataset are gas and liquid superficial velocities, angle of inclination, and diameter. Regarding the algorithms, the Extra Trees model classifies the flow patterns with the highest degree of fidelity, achieving an accuracy of 98.8%.


2018 ◽  
Vol 2018 ◽  
pp. 1-5
Author(s):  
Sangmin Seo ◽  
Jonghwan Choi ◽  
Soon Kil Ahn ◽  
Kil Won Kim ◽  
Jaekwang Kim ◽  
...  

We propose a novel method that predicts binding of G-protein coupled receptors (GPCRs) and ligands. The proposed method uses hub and cycle structures of ligands and amino acid motif sequences of GPCRs, rather than the 3D structure of a receptor or similarity of receptors or ligands. The experimental results show that these new features can be effective in predicting GPCR-ligand binding (average area under the curve [AUC] of 0.944), because they are thought to include hidden properties of good ligand-receptor binding. Using the proposed method, we were able to identify novel ligand-GPCR bindings, some of which are supported by several studies.

