Benchmarking Analysis of the Accuracy of Classification Methods Related to Entropy

Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 850
Author(s):  
Yolanda Orenes ◽  
Alejandro Rabasa ◽  
Jesus Javier Rodriguez-Sala ◽  
Joaquin Sanchez-Soriano

The machine learning literature offers numerous methods for solving classification problems. We propose two new performance measures to analyze such methods. These measures are defined using the concept of proportional reduction of classification error with respect to three benchmark classifiers: the random classifier and two intuitive classifiers, which are based on how a non-expert could perform classification simply by applying a frequentist approach. We show that these three simple methods are closely related to different aspects of the entropy of the dataset. Therefore, these measures account to some extent for the entropy of the dataset when evaluating the performance of classifiers. This allows us to measure the improvement in the classification results compared to simple methods and, at the same time, how entropy affects classification capacity. To illustrate how these new performance measures can be used to analyze classifiers while taking into account the entropy of the dataset, we carry out an intensive experiment in which we use the well-known J48 algorithm and a UCI repository dataset from which we have previously selected a subset of the most relevant attributes. We then carry out an extensive experiment in which we consider four heuristic classifiers and 11 datasets.
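The idea of a proportional reduction of classification error can be sketched as follows. This is a simplified illustration, not the authors' definitions: the abstract does not spell out the two intuitive classifiers, so the sketch assumes one that always predicts the most frequent class and one that guesses in proportion to the class frequencies, both using the class labels only. The function names are hypothetical.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class in the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels).values())


def benchmark_errors(labels):
    """Expected error of three simple benchmarks (assumed definitions).

    random:       uniform guess over the k observed classes
    majority:     always predict the most frequent class
    proportional: guess each class with its observed frequency
    """
    probs = class_probabilities(labels)
    k = len(probs)
    return {
        "random": 1 - 1 / k,
        "majority": 1 - max(probs.values()),
        "proportional": 1 - sum(p * p for p in probs.values()),
    }


def proportional_error_reduction(classifier_error, benchmark_error):
    """Fraction of the benchmark's error removed by the classifier."""
    if benchmark_error == 0:
        raise ValueError("benchmark error is zero; reduction is undefined")
    return (benchmark_error - classifier_error) / benchmark_error


if __name__ == "__main__":
    labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
    errors = benchmark_errors(labels)
    print("entropy (bits):", shannon_entropy(labels))
    for name, err in errors.items():
        # 0.2 is a made-up error rate for some trained classifier.
        print(name, proportional_error_reduction(0.2, err))
```

A value near 1 means the classifier removes almost all of the benchmark's error; a value at or below 0 means it does no better than the simple method. Under these assumed definitions, a more skewed class distribution (lower entropy) lowers the majority and proportional benchmark errors, so the same raw error rate yields a smaller reduction.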

Author(s):  
Yan Zhao ◽  
Yiyu Yao

Classification is one of the main tasks in machine learning, data mining, and pattern recognition. Compared with the extensively studied automation approaches, the interactive approaches, centered on human users, are less explored. This chapter studies interactive classification at 3 levels. At the philosophical level, the motivations and a process-based framework of interactive classification are proposed. At the technical level, a granular computing model is suggested for re-examining not only existing classification problems, but also interactive classification problems. At the application level, an interactive classification system (ICS), using a granule network as the search space, is introduced. ICS allows multi-strategies for granule tree construction, and enhances the understanding and interpretation of the classification process. Interactive classification is complementary to the existing classification methods.


2021 ◽  
Vol 13 (1) ◽  
pp. 11-19
Author(s):  
Mingxing Gong

Machine learning models have been widely used in numerous classification problems, and performance measures play a critical role in machine learning model development, selection, and evaluation. This paper provides a comprehensive overview of performance measures in machine learning classification. In addition, we propose a framework to construct a novel evaluation metric based on the voting results of three performance measures, each of which has strengths and limitations. The new metric can be proven to be better than accuracy in terms of consistency and discriminancy.


2014 ◽  
Vol 2014 ◽  
pp. 1-9
Author(s):  
Ting Wang ◽  
Sheng-Uei Guan ◽  
Ka Lok Man ◽  
T. O. Ting

Eye state identification is a common time-series classification problem and an active topic in recent research. Electroencephalography (EEG) is widely used in eye state classification to detect a person's cognitive state. Previous research has validated the feasibility of machine learning and statistical approaches for EEG eye state classification. This paper proposes a novel approach for EEG eye state identification using incremental attribute learning (IAL) based on neural networks. IAL is a novel machine learning strategy that gradually imports and trains features one by one. Previous studies have verified that such an approach is applicable to a number of pattern recognition problems. However, little of this previous research on IAL focused on its application to time-series problems. Therefore, it is still unknown whether IAL can be employed to cope with time-series problems such as EEG eye state classification. Experimental results in this study demonstrate that, with proper feature extraction and feature ordering, IAL can not only cope efficiently with time-series classification problems but also exhibit better classification performance, in terms of classification error rates, than conventional and some other approaches.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Micheal Olaolu Arowolo ◽  
Marion Olubunmi Adebiyi ◽  
Charity Aremu ◽  
Ayodele A. Adebiyi

Abstract. Researchers have recently produced unique spans of genetic data, and there is a trend in genetic exploration toward using machine learning integrated analysis and the virtual combination of adaptive data to solve classification problems. Detecting ailments and infections at an early stage is a key concern and a huge challenge for researchers in the fields of machine learning classification and bioinformatics. Understanding which genes contribute to diseases remains a major challenge for many researchers. This study reviews various works on dimensionality reduction techniques, which reduce sets of features so that data are grouped effectively with less computational processing time, and on classification methods that contribute to the advances of the RNA-Sequencing approach.


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate prediction performance. Different techniques may provide different accuracies, so it is imperative to use the most suitable method, the one that provides the best results. This research seeks to provide a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.


2020 ◽  
Vol 89 ◽  
pp. 20-29
Author(s):  
Sh. K. Kadiev ◽  
R. Sh. Khabibulin ◽  
P. P. Godlevskiy ◽  
V. L. Semikov ◽  
...  

Introduction. An overview of research in the field of classification as a method of machine learning is given. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems in various subject areas is also relevant. Goal and objectives. The purpose of the study is to analyze papers on the classification as a machine learning method. To achieve the objective, it is necessary to solve the following tasks: 1) to identify the most used classification methods in machine learning; 2) to highlight the advantages and disadvantages of each of the selected methods; 3) to analyze the possibility of using classification methods in intelligent systems to support management decisions to solve issues of forecasting, prevention and elimination of emergencies. Methods. To obtain the results, general scientific and special methods of scientific knowledge were used - analysis, synthesis, generalization, as well as the classification method. Results and discussion thereof. According to the results of the analysis, studies with a mathematical formulation and the availability of software developments were identified. The issues of classification in the implementation of machine learning in the development of intelligent decision support systems are considered. Conclusion. The analysis revealed that enough algorithms were used to perform the classification while sorting the acquired knowledge within the subject area. The implementation of an accurate classification is one of the fundamental problems in the development of management decision support systems, including for fire and emergency prevention and response. Timely and effective decision by officials of operational shifts for the disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.


Symmetry ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1116
Author(s):  
Zeba Mahmood ◽  
Vacius Jusas

This paper introduces a blockchain-based federated learning (FL) framework with incentives for participating nodes to enhance the accuracy of classification problems. Machine learning technology has been rapidly developed and changed from a global perspective for the past few years. The FL framework is based on the Ethereum blockchain and creates an autonomous ecosystem, where nodes compete to improve the accuracy of classification problems. With privacy being one of the biggest concerns, FL makes use of the blockchain-based approach to ensure privacy and security. Another important technology that underlies the FL framework is zero-knowledge proofs (ZKPs), which ensure that data uploaded to the network are accurate and private. Basically, ZKPs allow nodes to compete fairly by only submitting accurate models to the parameter server and get rewarded for that. We have conducted an analysis and found that ZKPs can help improve the accuracy of models submitted to the parameter server and facilitate the honest participation of all nodes in FL.


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 314
Author(s):  
Jim Samuel ◽  
G. G. Md. Nawaz Ali ◽  
Md. Mokhlesur Rahman ◽  
Ek Esawi ◽  
Yana Samuel

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.


Mathematics ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 936
Author(s):  
Jianli Shao ◽  
Xin Liu ◽  
Wenqing He

Imbalanced data exist in many classification problems, and classifying imbalanced data poses remarkable challenges in machine learning. Among different classifiers, the support vector machine (SVM) and its variants are popular in machine learning thanks to their flexibility and interpretability. However, the performance of SVMs suffers when the data are imbalanced, which is a typical data structure in the multi-category classification problem. In this paper, we employ the data-adaptive SVM with scaled kernel functions to classify instances for a multi-class population. We propose a multi-class data-dependent kernel function for the SVM that considers class imbalance and the spatial association among instances so that the classification accuracy is enhanced. Simulation studies demonstrate the superb performance of the proposed method, and a real multi-class prostate cancer image dataset is employed as an illustration. Not only does the proposed method outperform the competitor methods in terms of commonly used accuracy measures such as the F-score and G-means, but it also successfully detects more than 60% of instances from the rare class in the real data, while the competitors detect less than 20% of the rare class instances. The proposed method will benefit other scientific research fields, such as multiple region boundary detection.
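The two accuracy measures named in this abstract can be sketched for the multi-class case as follows. The abstract does not say which variants the authors use, so this sketch assumes the macro-averaged F1 score and the G-mean defined as the geometric mean of the per-class recalls; the function names are hypothetical.

```python
import math


def per_class_counts(y_true, y_pred):
    """True positives, false positives and false negatives per class."""
    classes = sorted(set(y_true) | set(y_pred))
    stats = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            stats[truth]["tp"] += 1
        else:
            stats[truth]["fn"] += 1  # missed an instance of the true class
            stats[pred]["fp"] += 1   # wrongly assigned to the predicted class
    return stats


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for s in per_class_counts(y_true, y_pred).values():
        denom = 2 * s["tp"] + s["fp"] + s["fn"]
        scores.append(2 * s["tp"] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def g_mean(y_true, y_pred):
    """Geometric mean of the recalls of classes present in y_true."""
    recalls = []
    for s in per_class_counts(y_true, y_pred).values():
        support = s["tp"] + s["fn"]
        if support == 0:
            continue  # class appears only in predictions; recall undefined
        recalls.append(s["tp"] / support)
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    # Toy labels with a rare class 2, for illustration only.
    y_true = [0, 0, 0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
    print("macro F1:", macro_f1(y_true, y_pred))
    print("G-mean:", g_mean(y_true, y_pred))
```

Because the G-mean multiplies the per-class recalls, a classifier that misses the rare class entirely scores zero regardless of how well it handles the majority classes, which is why such measures are preferred over plain accuracy on imbalanced data.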


2021 ◽  
Vol 13 (4) ◽  
pp. 787
Author(s):  
Lei Zhou ◽  
Ting Luo ◽  
Mingyi Du ◽  
Qiang Chen ◽  
Yang Liu ◽  
...  

Machine learning has been successfully used for object recognition within images. Due to the complexity of the spectrum and texture of construction and demolition waste (C&DW), it is difficult to construct an automatic identification method for C&DW based on machine learning and remote sensing data sources. Machine learning includes many types of algorithms; however, different algorithms and parameters have different identification effects on C&DW. Exploring the optimal method for automatic remote sensing identification of C&DW is an important approach for the intelligent supervision of C&DW. This study investigates the megacity of Beijing, which is facing high risk of C&DW pollution. To improve the classification accuracy of C&DW, buildings, vegetation, water, and crops were selected as comparative training samples based on the Google Earth Engine (GEE), and Sentinel-2 was used as the data source. Three classification methods of typical machine learning algorithms (classification and regression trees (CART), random forest (RF), and support vector machine (SVM)) were selected to classify the C&DW from remote sensing images. Using empirical methods, the experimental trial method, and the grid search method, the optimal parameterization scheme of the three classification methods was studied to determine the optimal method of remote sensing identification of C&DW based on machine learning. Through accuracy evaluation and ground verification, the overall recognition accuracies of CART, RF, and SVM for C&DW were 73.12%, 98.05%, and 85.62%, respectively, under the optimal parameterization scheme determined in this study. Among these algorithms, RF was a better C&DW identification method than were CART and SVM when the number of decision trees was 50. This study explores the robust machine learning method for automatic remote sensing identification of C&DW and provides a scientific basis for intelligent supervision and resource utilization of C&DW.

