Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Molecules ◽  
2019 ◽  
Vol 24 (15) ◽  
pp. 2811 ◽  
Author(s):  
Rácz ◽  
Bajusz ◽  
Héberger

Machine learning classification algorithms are widely used to predict and classify molecular properties such as toxicity or biological activity. Predicting whether a molecule is toxic or non-toxic is important because testing on living animals has both ethical and cost drawbacks. The quality of classification models can be assessed with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison using different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities), and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and of 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
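The sensitivity of performance metrics to dataset composition can be illustrated with a minimal sketch. It is not the authors' protocol: the label vectors are hypothetical, and the classifier is a trivial one that predicts "non-toxic" for every molecule. The helper names (`confusion_counts`, `accuracy`, `balanced_accuracy`) are introduced here for illustration only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a 2-class problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical label vectors (1 = toxic, 0 = non-toxic); not data from the study.
imbalanced_true = [1] * 10 + [0] * 90
balanced_true = [1] * 50 + [0] * 50
always_negative = [0] * 100  # trivial classifier: everything is non-toxic

for name, y_true in (("imbalanced", imbalanced_true), ("balanced", balanced_true)):
    counts = confusion_counts(y_true, always_negative)
    print(name, accuracy(*counts), balanced_accuracy(*counts))
```

With these assumed inputs, plain accuracy is 0.9 on the imbalanced set and 0.5 on the balanced set, while balanced accuracy is 0.5 in both cases. The same useless classifier therefore looks good or bad under accuracy depending only on class proportions.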

Author(s):  
Abdullahi Adeleke ◽  
Noor Azah Samsudin ◽  
Mohd Hisyam Abdul Rahim ◽  
Shamsul Kamal Ahmad Khalid ◽  
Riswan Efendi

Machine learning involves training systems to make decisions without being explicitly programmed. An important machine learning task is classification, in which machines are trained to make predictions from predefined labels. Classification is broadly categorized into three distinct groups: single-label (SL), multi-class, and multi-label (ML) classification. This research work presents an application of a multi-label classification (MLC) technique to automating the labeling of Quranic verses. MLC has been gaining attention in recent years because of the growing number of studies based on real-world classification problems with multi-label data. In traditional classification problems, patterns are associated with a single label from a set of disjoint labels. In MLC, however, an instance of data is associated with a set of labels. In this paper, three standard MLC methods, binary relevance (BR), classifier chain (CC), and label powerset (LP), are implemented with four baseline classifiers: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (k-NN), and J48. The research methodology adopts the multi-label problem transformation (PT) approach. The results are validated using six conventional performance metrics: Hamming loss, accuracy, one-error, micro-F1, macro-F1, and average precision. The classifiers achieved accuracy above 70%. Overall, SVM achieved the best results with the CC and LP algorithms.
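The problem transformation idea can be sketched as follows, assuming each instance carries a set of labels. The sketch shows only how binary relevance and label powerset reshape the targets, plus a Hamming loss computation. It omits classifier chains and the baseline classifiers, and the label names and function names are hypothetical rather than taken from the paper.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Binary relevance: one independent 0/1 target vector per label."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def label_powerset_targets(label_sets):
    """Label powerset: each distinct label combination becomes one class id."""
    class_ids = {}
    targets = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


def hamming_loss(true_sets, pred_sets, all_labels):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    errors = sum(len(set(t) ^ set(p)) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(all_labels))


# Hypothetical topic labels for four instances; not the paper's label set.
all_labels = ["faith", "prayer", "charity"]
true_sets = [{"faith", "prayer"}, {"charity"}, {"faith", "prayer"}, {"faith"}]
pred_sets = [{"faith"}, {"charity"}, {"faith", "prayer"}, {"faith", "charity"}]

br = binary_relevance_targets(true_sets, all_labels)
lp_targets, lp_classes = label_powerset_targets(true_sets)
print(br["faith"], br["prayer"], br["charity"])
print(lp_targets)
print(hamming_loss(true_sets, pred_sets, all_labels))
```

Binary relevance yields one single-label problem per label, so any baseline classifier can be trained on each target vector. Label powerset yields a single multi-class problem whose classes are the label combinations seen in the data.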


2021 ◽  
Vol 16 (11) ◽  
pp. P11041
Author(s):  
Y. Yeyu ◽  
J. Wenbao ◽  
H. Daqian ◽  
S. Aiyun ◽  
C. Can ◽  
...  

The identification of magnesium and aluminum in scrap metal recycling has long been a difficult problem. In this paper, a material identification method using multi-energy X-ray transmission (ME-XRT), based on a photon counting detector (PCD) and a machine learning algorithm, is proposed and used to identify and classify magnesium and aluminum. The method includes three main steps: using the PCD to obtain X-ray attenuation images in five energy bins, feature extraction, and machine learning classification. The performance of several machine learning models was compared for the fine-grained classification task. The prediction results demonstrate that the best recognition rates achieved for aluminum and magnesium are 96.43% and 98.81%, respectively.


2020 ◽  
Vol 13 (5) ◽  
pp. 508-523 ◽  
Author(s):  
Guan‐Hua Huang ◽  
Chih‐Hsuan Lin ◽  
Yu‐Ren Cai ◽  
Tai‐Been Chen ◽  
Shih‐Yen Hsu ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gabriel A. Colozza-Gama ◽  
Fabiano Callegari ◽  
Nikola Bešič ◽  
Ana C. de J. Paviza ◽  
Janete M. Cerutti

Somatic mutations in cancer driver genes can help diagnosis, prognosis and treatment decisions. Formalin-fixed paraffin-embedded (FFPE) specimens are the main source of DNA for somatic mutation detection. To overcome the constraints of DNA isolated from FFPE, we compared pyrosequencing and ddPCR analysis for absolute quantification of the BRAF V600E mutation in DNA extracted from FFPE specimens and compared the results to the qualitative detection information obtained by Sanger sequencing. Sanger sequencing was able to detect the BRAF V600E mutation only when it was present in more than 15% of total alleles. Although the sensitivity of ddPCR is higher than that observed for Sanger sequencing, it was less consistent than pyrosequencing, likely due to droplet classification bias of FFPE-derived DNA. To address the droplet allocation bias in ddPCR analysis, we compared different algorithms for automated droplet classification and then correlated these findings with those obtained from pyrosequencing. By including non-classifiable droplets (rain) in the ddPCR analysis, it was possible to obtain better qualitative classification of droplets and better quantitative classification than when rain droplets were excluded, when considering pyrosequencing results. Notably, only the machine learning k-NN algorithm was able to automatically classify the samples, surpassing manual classification based on no-template controls, which shows promise in clinical practice.
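As a rough illustration of how a k-NN classifier assigns a droplet to a cluster, the sketch below uses majority voting among the nearest labeled droplets. It is not the pipeline used in the study: the two-channel amplitudes, the class names, and the choice of k = 3 are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_classify(point, training, k=3):
    """Label a point by majority vote among its k nearest training points."""
    neighbours = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Synthetic (channel 1, channel 2) fluorescence amplitudes with assumed labels.
training = [
    ((1000, 1200), "negative"),
    ((1100, 1000), "negative"),
    ((900, 1100), "negative"),
    ((5000, 1100), "positive"),
    ((5200, 900), "positive"),
    ((4800, 1000), "positive"),
]

print(knn_classify((4000, 1000), training))
print(knn_classify((1500, 1000), training))
```

With this synthetic data, the droplet at (4000, 1000) is assigned to the positive cluster and the one at (1500, 1000) to the negative cluster, because in each case the three nearest training droplets share that label.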


2021 ◽  
Vol 79 ◽  
pp. 52-58
Author(s):  
Arnaldo Stanzione ◽  
Renato Cuocolo ◽  
Francesco Verde ◽  
Roberta Galatola ◽  
Valeria Romeo ◽  
...  

2021 ◽  
Vol 11 (3) ◽  
pp. 92
Author(s):  
Mehdi Berriri ◽  
Sofiane Djema ◽  
Gaëtan Rey ◽  
Christel Dartigues-Pallez

Today, many students enroll in higher education courses that do not suit them and end up failing. The purpose of this study is to give counselors better knowledge so that they can offer future students courses that match their profile. The second objective is to allow the teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is possible thanks to a machine learning algorithm called Random Forest, which classifies the students according to their results. We had to process data, generate models using our algorithm, and combine the results obtained to reach a better final prediction. We tested our method on different use cases, from two classes to five classes. These sets of classes represent different intervals of the student average, which ranges from 0 to 20. An accuracy of 75% was achieved with a set of five classes, and up to 85% with sets of two and three classes.
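The mapping from a student average on the 0 to 20 scale to a class can be sketched as below. The abstract does not state the interval boundaries, so equal-width intervals are an assumption here, and `grade_to_class` is a hypothetical helper name.

```python
def grade_to_class(average, n_classes, max_grade=20.0):
    """Map an average in [0, max_grade] to a class index in 0..n_classes-1.

    Assumes equal-width intervals; the top grade falls in the last class.
    """
    if not 0.0 <= average <= max_grade:
        raise ValueError("average must lie between 0 and max_grade")
    width = max_grade / n_classes
    return min(int(average // width), n_classes - 1)


# Hypothetical averages, binned under the two-class and five-class setups.
for avg in (3.9, 11.5, 20.0):
    print(avg, grade_to_class(avg, 2), grade_to_class(avg, 5))
```

Under this assumption, five classes correspond to intervals of width 4 and two classes to intervals of width 10. Fewer, wider intervals make the classification task coarser, which is consistent with the higher accuracy reported for two and three classes.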


Heliyon ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e06257
Author(s):  
Ennio Idrobo-Ávila ◽  
Humberto Loaiza-Correa ◽  
Rubiel Vargas-Cañas ◽  
Flavio Muñoz-Bolaños ◽  
Leon van Noorden
