Benchmarking Analysis of the Accuracy of Classification Methods Related to Entropy

In the machine learning literature we can find numerous methods to solve classification problems. We propose two new performance measures to analyze such methods. These measures are defined by using the concept of proportional reduction of classification error with respect to three benchmark classifiers: a random classifier and two intuitive classifiers based on how a non-expert person could perform classification simply by applying a frequentist approach. We show that these three simple methods are closely related to different aspects of the entropy of the dataset. Therefore, these measures partly account for entropy in the dataset when evaluating the performance of classifiers. This allows us to measure the improvement in the classification results compared to simple methods, and at the same time how entropy affects classification capacity. To illustrate how these new performance measures can be used to analyze classifiers taking into account the entropy of the dataset, we carry out an intensive experiment in which we use the well-known J48 algorithm and a UCI repository dataset on which we have previously selected a subset of the most relevant attributes. Then we carry out an extensive experiment in which we consider four heuristic classifiers and 11 datasets.
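
As a rough illustration of proportional reduction of classification error, the sketch below compares a classifier's error rate with three simple benchmarks computed from class frequencies alone. The benchmark definitions used here (uniform random guessing, always predicting the most frequent class, and guessing in proportion to class frequencies) and the function names are assumptions made for this example; they may differ from the definitions in the paper.

```python
from collections import Counter


def benchmark_errors(labels):
    """Expected error rates of three simple benchmarks (assumed definitions)."""
    n = len(labels)
    freqs = [count / n for count in Counter(labels).values()]
    k = len(freqs)
    return {
        # Guess one of the k classes uniformly at random.
        "random": 1 - 1 / k,
        # Always predict the most frequent class.
        "majority": 1 - max(freqs),
        # Guess class c with probability equal to its frequency p_c.
        "proportional": 1 - sum(p * p for p in freqs),
    }


def proportional_error_reduction(classifier_error, benchmark_error):
    """Fraction of the benchmark's error that the classifier removes."""
    if benchmark_error == 0:
        raise ValueError("benchmark error is zero; reduction is undefined")
    return (benchmark_error - classifier_error) / benchmark_error


# Hypothetical dataset with class frequencies 0.6, 0.3 and 0.1.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
errors = benchmark_errors(labels)
reduction_vs_majority = proportional_error_reduction(0.2, errors["majority"])
```

With these hypothetical frequencies, a classifier with a 20% error rate removes half of the majority-class benchmark's error (0.4), so the reduction is 0.5. The more evenly the classes are spread, the higher the benchmark errors, which is one way the class distribution enters the measure.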

Interactive Classification Using a Granule Network

Novel Approaches in Cognitive Informatics and Natural Intelligence ◽

10.4018/978-1-60566-170-4.ch016 ◽

2011 ◽

pp. 235-245

Author(s):

Yan Zhao ◽

Yiyu Yao

Keyword(s):

Machine Learning ◽

Data Mining ◽

Pattern Recognition ◽

Classification System ◽

Search Space ◽

Classification Methods ◽

Classification Problems ◽

Computing Model ◽

Tree Construction ◽

Learning Data

Classification is one of the main tasks in machine learning, data mining, and pattern recognition. Compared with the extensively studied automation approaches, the interactive approaches, centered on human users, are less explored. This chapter studies interactive classification at 3 levels. At the philosophical level, the motivations and a process-based framework of interactive classification are proposed. At the technical level, a granular computing model is suggested for re-examining not only existing classification problems, but also interactive classification problems. At the application level, an interactive classification system (ICS), using a granule network as the search space, is introduced. ICS allows multi-strategies for granule tree construction, and enhances the understanding and interpretation of the classification process. Interactive classification is complementary to the existing classification methods.

A Novel Performance Measure for Machine Learning Classification

International Journal of Managing Information Technology ◽

10.5121/ijmit.2021.13101 ◽

2021 ◽

Vol 13 (1) ◽

pp. 11-19

Author(s):

Mingxing Gong

Keyword(s):

Machine Learning ◽

Performance Measures ◽

Critical Role ◽

Model Development ◽

Performance Measure ◽

Classification Problems ◽

Comprehensive Overview ◽

Machine Learning Classification ◽

And Performance ◽

Evaluation Metric

Machine learning models have been widely used in numerous classification problems, and performance measures play a critical role in machine learning model development, selection, and evaluation. This paper gives a comprehensive overview of performance measures in machine learning classification. In addition, we propose a framework to construct a novel evaluation metric that is based on the voting results of three performance measures, each of which has strengths and limitations. The new metric can be proved to be better than accuracy in terms of consistency and discriminancy.
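
To make the idea of a vote-based comparison concrete, here is a minimal sketch in which two models are compared measure by measure and the one that wins on more measures is preferred. The abstract does not name the three measures or the exact voting rule, so the measures (accuracy, F1, AUC), the scores, and the function name below are all hypothetical.

```python
def vote_between_models(scores_a, scores_b):
    """Return 'a', 'b', or 'tie' by majority vote over shared measures.

    Both arguments map a measure name to a score where higher is better.
    """
    votes_a = votes_b = 0
    for measure, score_a in scores_a.items():
        score_b = scores_b[measure]
        if score_a > score_b:
            votes_a += 1
        elif score_b > score_a:
            votes_b += 1
    if votes_a > votes_b:
        return "a"
    if votes_b > votes_a:
        return "b"
    return "tie"


# Hypothetical scores: model A has the higher accuracy, but model B
# is better on the other two measures and therefore wins the vote.
model_a = {"accuracy": 0.90, "f1": 0.70, "auc": 0.80}
model_b = {"accuracy": 0.88, "f1": 0.75, "auc": 0.85}
winner = vote_between_models(model_a, model_b)
```

The example shows why such a vote can disagree with accuracy alone: a model with slightly lower accuracy can still be preferred when the other measures favor it.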

EEG Eye State Identification Using Incremental Attribute Learning with Time-Series Classification

Mathematical Problems in Engineering ◽

10.1155/2014/365101 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 6

Author(s):

Ting Wang ◽

Sheng-Uei Guan ◽

Ka Lok Man ◽

T. O. Ting

Keyword(s):

Machine Learning ◽

Time Series ◽

Hot Spot ◽

Error Rates ◽

Classification Error ◽

Time Series Classification ◽

Classification Problems ◽

State Identification ◽

State Classification ◽

Attribute Learning

Eye state identification is a common time-series classification problem and a hot spot in recent research. Electroencephalography (EEG) is widely used in eye state classification to detect a person's cognitive state. Previous research has validated the feasibility of machine learning and statistical approaches for EEG eye state classification. This paper proposes a novel approach for EEG eye state identification using incremental attribute learning (IAL) based on neural networks. IAL is a novel machine learning strategy which gradually imports and trains features one by one. Previous studies have verified that such an approach is applicable to a number of pattern recognition problems. However, little of this previous research on IAL focused on its application to time-series problems. Therefore, it is still unknown whether IAL can be employed to cope with time-series problems like EEG eye state classification. Experimental results in this study demonstrate that, with proper feature extraction and feature ordering, IAL can not only efficiently cope with time-series classification problems, but also exhibit better classification performance in terms of classification error rates in comparison with conventional and some other approaches.

A survey of dimension reduction and classification methods for RNA-Seq data on malaria vector

Journal Of Big Data ◽

10.1186/s40537-021-00441-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Micheal Olaolu Arowolo ◽

Marion Olubunmi Adebiyi ◽

Charity Aremu ◽

Ayodele A. Adebiyi

Keyword(s):

Machine Learning ◽

Early Stage ◽

Integrated Analysis ◽

Classification Methods ◽

Rna Seq ◽

Classification Problems ◽

Machine Learning Classification ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques ◽

Huge Challenge

Recently, unique spans of genetic data have been produced by researchers, and there is a trend in genetic exploration toward using machine learning integrated analysis and virtual combination of adaptive data in the solution of classification problems. Detection of ailments and infections at an early stage is a key concern and a huge challenge for researchers in the field of machine learning classification and bioinformatics. Understanding the genes that contribute to diseases is a major challenge for many researchers. This study reviews various works on dimensionality reduction techniques, which reduce sets of features so that data are grouped effectively with less computational processing time, and on classification methods that contribute to the advances of the RNA-Sequencing approach.

A Comparative Study of Different Machine Learning Algorithms for Disease Prediction

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0177 ◽

2017 ◽

Vol 7 (7) ◽

pp. 172

Author(s):

Anantvir Singh Romana

Keyword(s):

Machine Learning ◽

Subsequent Treatment ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Prediction ◽

Classification Problems ◽

Learning Techniques ◽

Neural Network Classifiers ◽

Diagnostic Detection

Accurate diagnostic detection of a disease in a patient is critical; it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Different techniques may achieve different accuracies, and it is therefore imperative to use the most suitable method, the one that provides the best results. This research seeks to provide a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree and neural network classifiers on breast cancer and diabetes datasets.

Review of classification studies for machine learning in the development of intelligent management decision support systems

Technology of technosphere safety ◽

10.25257/tts.2020.3.89.20-29 ◽

2020 ◽

Vol 89 ◽

pp. 20-29

Author(s):

Sh. K. Kadiev ◽

R. Sh. Khabibulin ◽

P. P. Godlevskiy ◽

V. L. Semikov ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Mathematical Models ◽

Decision Support Systems ◽

Support Systems ◽

Management Decision ◽

Classification Methods ◽

Advantages And Disadvantages ◽

Intelligent Management ◽

Management Decision Support

Introduction. An overview of research in the field of classification as a method of machine learning is given. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems in various subject areas is also relevant. Goal and objectives. The purpose of the study is to analyze papers on classification as a machine learning method. To achieve this objective, it is necessary to solve the following tasks: 1) to identify the most used classification methods in machine learning; 2) to highlight the advantages and disadvantages of each of the selected methods; 3) to analyze the possibility of using classification methods in intelligent systems that support management decisions on forecasting, prevention and elimination of emergencies. Methods. To obtain the results, general scientific and special methods of scientific knowledge were used: analysis, synthesis, generalization, as well as the classification method. Results and discussion. According to the results of the analysis, studies with a mathematical formulation and available software developments were identified. The issues of classification in the implementation of machine learning in the development of intelligent decision support systems are considered. Conclusion. The analysis revealed that enough algorithms were used to perform the classification while sorting the acquired knowledge within the subject area. The implementation of an accurate classification is one of the fundamental problems in the development of management decision support systems, including those for fire and emergency prevention and response. Timely and effective decision-making by officials of operational shifts for disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.

Implementation Framework for a Blockchain-Based Federated Learning Model for Classification Problems

Symmetry ◽

10.3390/sym13071116 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1116

Author(s):

Zeba Mahmood ◽

Vacius Jusas

Keyword(s):

Machine Learning ◽

Learning Model ◽

Global Perspective ◽

Learning Technology ◽

Classification Problems ◽

Zero Knowledge ◽

Privacy And Security ◽

Implementation Framework ◽

The Past

This paper introduces a blockchain-based federated learning (FL) framework with incentives for participating nodes to enhance the accuracy of classification problems. Machine learning technology has been rapidly developed and changed from a global perspective for the past few years. The FL framework is based on the Ethereum blockchain and creates an autonomous ecosystem, where nodes compete to improve the accuracy of classification problems. With privacy being one of the biggest concerns, FL makes use of the blockchain-based approach to ensure privacy and security. Another important technology that underlies the FL framework is zero-knowledge proofs (ZKPs), which ensure that data uploaded to the network are accurate and private. Basically, ZKPs allow nodes to compete fairly by only submitting accurate models to the parameter server and get rewarded for that. We have conducted an analysis and found that ZKPs can help improve the accuracy of models submitted to the parameter server and facilitate the honest participation of all nodes in FL.

COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Information ◽

10.3390/info11060314 ◽

2020 ◽

Vol 11 (6) ◽

pp. 314 ◽

Cited By ~ 17

Author(s):

Jim Samuel ◽

G. G. Md. Nawaz Ali ◽

Md. Mokhlesur Rahman ◽

Ek Esawi ◽

Yana Samuel

Keyword(s):

Machine Learning ◽

The United States ◽

Classification Methods ◽

Reasonable Accuracy ◽

Bayes Method ◽

Inaccurate Information ◽

Public Sentiment ◽

Research Article ◽

Textual Data ◽

Data Visualizations

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.

Kernel Based Data-Adaptive Support Vector Machines for Multi-Class Classification

Mathematics ◽

10.3390/math9090936 ◽

2021 ◽

Vol 9 (9) ◽

pp. 936

Author(s):

Jianli Shao ◽

Xin Liu ◽

Wenqing He

Keyword(s):

Machine Learning ◽

Spatial Association ◽

Class Imbalance ◽

Imbalanced Data ◽

Real Data ◽

Kernel Functions ◽

Support Vector ◽

Classification Problems ◽

Rare Class ◽

Data Adaptive

Imbalanced data exist in many classification problems, and their classification poses remarkable challenges in machine learning. The support vector machine (SVM) and its variants are popular among classifiers in machine learning thanks to their flexibility and interpretability. However, the performance of SVMs suffers when the data are imbalanced, which is a typical data structure in the multi-category classification problem. In this paper, we employ the data-adaptive SVM with scaled kernel functions to classify instances for a multi-class population. We propose a multi-class data-dependent kernel function for the SVM by considering class imbalance and the spatial association among instances so that the classification accuracy is enhanced. Simulation studies demonstrate the superb performance of the proposed method, and a real multi-class prostate cancer image dataset is employed as an illustration. Not only does the proposed method outperform the competitor methods in terms of commonly used accuracy measures such as the F-score and G-means, but it also successfully detects more than 60% of instances from the rare class in the real data, while the competitors detect less than 20% of the rare class instances. The proposed method will benefit other scientific research fields, such as multiple region boundary detection.
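
For readers unfamiliar with the measures named above, the following sketch computes a macro-averaged F-score and a G-mean for a multi-class problem. It assumes one common set of definitions (F1 averaged over classes, and the G-mean taken as the geometric mean of per-class recalls); the paper may use different variants, and the labels below are hypothetical.

```python
import math
from collections import Counter


def macro_f1_and_g_mean(y_true, y_pred):
    """Macro-averaged F1 and geometric mean of per-class recalls."""
    actual = Counter(y_true)        # instances per true class
    predicted = Counter(y_pred)     # instances per predicted class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    recalls, f1_scores = [], []
    for cls in sorted(actual):
        recall = true_pos[cls] / actual[cls]
        # A class that is never predicted gets precision 0.
        precision = true_pos[cls] / predicted[cls] if predicted[cls] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    g_mean = math.prod(recalls) ** (1 / len(recalls))
    return sum(f1_scores) / len(f1_scores), g_mean


# Hypothetical imbalanced labels: class 0 is common, class 2 is rare.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
macro_f1, g_mean = macro_f1_and_g_mean(y_true, y_pred)
```

Because the G-mean multiplies the per-class recalls, a classifier that never detects the rare class scores zero no matter how well it does on the common classes, which is why this measure is often reported for imbalanced data.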

Machine Learning Comparison and Parameter Setting Methods for the Detection of Dump Sites for Construction and Demolition Waste Using the Google Earth Engine

Remote Sensing ◽

10.3390/rs13040787 ◽

2021 ◽

Vol 13 (4) ◽

pp. 787

Author(s):

Lei Zhou ◽

Ting Luo ◽

Mingyi Du ◽

Qiang Chen ◽

Yang Liu ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Google Earth ◽

Construction And Demolition Waste ◽

Parameterization Scheme ◽

Classification Methods ◽

Demolition Waste ◽

Optimal Method ◽

Identification Method ◽

Google Earth Engine

Machine learning has been successfully used for object recognition within images. Due to the complexity of the spectrum and texture of construction and demolition waste (C&DW), it is difficult to construct an automatic identification method for C&DW based on machine learning and remote sensing data sources. Machine learning includes many types of algorithms; however, different algorithms and parameters have different identification effects on C&DW. Exploring the optimal method for automatic remote sensing identification of C&DW is an important approach for the intelligent supervision of C&DW. This study investigates the megacity of Beijing, which is facing high risk of C&DW pollution. To improve the classification accuracy of C&DW, buildings, vegetation, water, and crops were selected as comparative training samples based on the Google Earth Engine (GEE), and Sentinel-2 was used as the data source. Three classification methods of typical machine learning algorithms (classification and regression trees (CART), random forest (RF), and support vector machine (SVM)) were selected to classify the C&DW from remote sensing images. Using empirical methods, the experimental trial method, and the grid search method, the optimal parameterization scheme of the three classification methods was studied to determine the optimal method of remote sensing identification of C&DW based on machine learning. Through accuracy evaluation and ground verification, the overall recognition accuracies of CART, RF, and SVM for C&DW were 73.12%, 98.05%, and 85.62%, respectively, under the optimal parameterization scheme determined in this study. Among these algorithms, RF was a better C&DW identification method than were CART and SVM when the number of decision trees was 50. This study explores the robust machine learning method for automatic remote sensing identification of C&DW and provides a scientific basis for intelligent supervision and resource utilization of C&DW.
