Induced Subgraph Game for Ensemble Selection

Ensemble methodology has proved to be one of the strongest machine learning techniques. In spite of its huge success, most ensemble methods tend to generate an unnecessarily large number of classifiers, which increases memory storage and computational cost and can even reduce the generalization performance of the ensemble. Ensemble selection addresses these shortcomings by searching for a subset of individual classifiers that performs as well as, or better than, the entire ensemble. In this paper, we formulate the ensemble selection problem as a coalitional game played on a graph. The proposed game aims to capture two crucial concepts that affect the performance of an ensemble: accuracy and diversity. Most importantly, it uses the Shapley value to rank every classifier by its contribution to keeping a proper balance between these two notions. To demonstrate the validity and effectiveness of the proposed approach, we carried out experimental comparisons with several major selection techniques on 35 UCI benchmark datasets. The results reveal that our approach significantly improves on the original ensemble and performs better than the other methods in terms of classification accuracy, pruning ratio, and computational cost.
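
The abstract ranks classifiers by their Shapley value in a game played on a graph. The following is a minimal sketch assuming the standard induced subgraph game, in which a coalition's worth is the total weight of the edges (self-loops included) whose endpoints all belong to it. Using self-loop weights for individual accuracy and edge weights for pairwise diversity, and the weights themselves, are hypothetical illustrations rather than the paper's construction. For this game the Shapley value has the closed form phi_i = w_ii + (1/2) * sum_{j != i} w_ij, which the code checks against brute-force enumeration of player orderings.

```python
from itertools import permutations
from math import factorial

def coalition_value(coalition, weights):
    """Induced subgraph game: total weight of edges (self-loops included)
    whose endpoints both lie in the coalition. `weights` is symmetric."""
    members = sorted(coalition)
    total = 0.0
    for pos, i in enumerate(members):
        for j in members[pos:]:  # j >= i, so each edge is counted once
            total += weights[i][j]
    return total

def shapley_brute_force(weights):
    """Average marginal contribution of each player over all orderings (exponential cost)."""
    n = len(weights)
    phi = [0.0] * n
    for order in permutations(range(n)):
        coalition = set()
        for player in order:
            before = coalition_value(coalition, weights)
            coalition.add(player)
            phi[player] += coalition_value(coalition, weights) - before
    return [p / factorial(n) for p in phi]

def shapley_closed_form(weights):
    """Closed form for induced subgraph games: own self-loop plus half of each incident edge."""
    n = len(weights)
    return [weights[i][i] + 0.5 * sum(weights[i][j] for j in range(n) if j != i)
            for i in range(n)]

def rank_classifiers(weights):
    """Classifier indices sorted by decreasing Shapley value."""
    phi = shapley_closed_form(weights)
    return sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)
```

Under these assumptions, a pruned ensemble could be formed by keeping the top-ranked classifiers returned by `rank_classifiers`; the brute-force routine is included only to validate the closed form on small inputs.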


Evaluation of Ensemble Machines in Breast Cancer Prediction

Intelligent Systems and Computer Technology - Advances in Parallel Computing, 2020. DOI: 10.3233/apc200173

Author(s): Leena Nesamani S, Nirmala Sugirtha Rajini S

Keyword(s): Breast Cancer, Ensemble Methods, Machine Learning Techniques, Prediction System, Weighted Voting, Cancer Prediction, Learning Techniques, Deadly Disease

Breast cancer is one of the deadliest diseases affecting women, and its cause has not yet been clearly established. Early diagnosis can help physicians treat the disease, which could otherwise prove fatal. Machine learning techniques are employed to detect breast cancer with greater accuracy. Individual classifiers employed in this process predicted the disease less accurately than ensemble models. Ensemble methods use a group of classifiers, each of which classifies the data independently, and then combine the individual results by weighted voting of their predictions. Ensemble machines perform better than individual models and improve the accuracy of the prediction system. This paper examines and evaluates different ensemble machines used in breast cancer prediction and attempts to identify combinations that outperform existing ones.
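
The abstract describes combining individual classifier outputs by weighted voting. A minimal sketch, assuming each classifier emits a hard class label and carries a non-negative weight (for example, its validation accuracy; the weighting scheme used in the paper is not stated here). The function names are hypothetical, and ties are broken in favour of the label that appears first.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight for one sample.

    predictions: one label per classifier.
    weights: one non-negative weight per classifier (an assumed scheme).
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight per classifier")
    scores = defaultdict(float)
    order = []  # first-seen order of labels, used to break ties
    for label, w in zip(predictions, weights):
        if label not in scores:
            order.append(label)
        scores[label] += w
    return max(order, key=lambda label: scores[label])

def ensemble_predict(per_classifier_preds, weights):
    """per_classifier_preds[k][n] is classifier k's label for sample n."""
    return [weighted_vote(list(sample), weights) for sample in zip(*per_classifier_preds)]
```

With equal weights this reduces to plain majority voting; unequal weights let one strong classifier outvote several weaker ones that agree.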


Multirate Processing with Selective Subbands and Machine Learning for Efficient Arrhythmia Classification

Sensors, 2021, Vol 21 (4), pp. 1511. DOI: 10.3390/s21041511

Author(s): Saeed Mian Qaisar, Alaeddine Mihoub, Moez Krichen, Humaira Nisar

Keyword(s): Machine Learning, Signal Reconstruction, Computational Cost, Machine Learning Techniques, Feature Selection, Frequency Content, Fixed Rate, ECG Signals, Learning Techniques, Multirate Processing

The use of wearable gadgets in cloud-based health monitoring systems is growing. Signal compression and computational and power efficiency play an essential part in this scenario. In this context, we propose an efficient method for the diagnosis of cardiovascular diseases based on electrocardiogram (ECG) signals. The method combines multirate processing, wavelet decomposition, frequency content-based subband coefficient selection, and machine learning techniques. Multirate processing and feature selection are used to reduce the amount of information processed, thus reducing the computational complexity of the proposed system relative to equivalent fixed-rate solutions. Frequency content-dependent subband coefficient selection enhances the compression gain and reduces the transmission activity and computational cost of the subsequent cloud-based classification. We used the MIT-BIH dataset for our experiments. To avoid overfitting and bias, the performance of the considered classifiers is studied using five-fold cross-validation (5CV) and a novel partial blind protocol. The designed method achieves more than a 12-fold computational gain while ensuring appropriate signal reconstruction. The compression gain is 13-fold compared to fixed-rate counterparts, and the highest classification accuracies are 97.06% and 92.08% for the 5CV and partial blind cases, respectively. The results suggest that cardiac arrhythmias can feasibly be detected using the proposed approach.
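
The method combines wavelet decomposition with frequency content-based subband coefficient selection. The sketch below illustrates that idea under stated assumptions: it uses an orthonormal Haar wavelet, a hypothetical rule that keeps only subbands holding at least a given share of the signal energy, and a compression gain defined as input samples divided by retained coefficients. The paper's actual wavelet, selection criterion, and multirate front end are not reproduced.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return subbands [A_L, D_L, ..., D_1], coarsest first."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def select_subbands(subbands, min_energy_share=0.05):
    """Indices of subbands carrying at least `min_energy_share` of total energy (assumed rule)."""
    energies = [sum(c * c for c in band) for band in subbands]
    total = sum(energies) or 1.0
    return [i for i, e in enumerate(energies) if e / total >= min_energy_share]

def compression_gain(signal_len, subbands, kept):
    """Ratio of input samples to coefficients actually retained."""
    return signal_len / sum(len(subbands[i]) for i in kept)
```

Because the orthonormal Haar transform preserves energy, discarding low-energy subbands of a smooth signal removes little of its content while reducing the number of coefficients to transmit.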


Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing

Sensors, 2021, Vol 21 (8), pp. 2849. DOI: 10.3390/s21082849

Author(s): Sungbum Jun

Keyword(s): Decision Tree, Evolutionary Algorithm, Decision Trees, Manufacturing Systems, Ensemble Methods, Machine Learning Techniques, Learning Techniques, Industrial Internet, Tree Models, Real World Datasets

Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of sensor data has created a need to leverage such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention because of the need to implement reliable manufacturing systems and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. To improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for the discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. Experimental results on two real-world sensor datasets showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and that it was competitive with ensemble methods, which use multiple decision trees. Even though the ensemble methods performed slightly better, DIMPLED has a more interpretable structure while maintaining an appropriate performance level.
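
DIMPLED searches for multiple split points with an evolutionary algorithm. The following is not DIMPLED itself but a minimal sketch of the underlying idea under simplifying assumptions: a single numeric attribute, information gain as the fitness of a set of cut points, and a (1+1) evolutionary strategy with Gaussian mutation. The names `evolve_cuts` and `info_gain` and all parameter defaults are hypothetical.

```python
import math
import random
from bisect import bisect_right
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, cuts):
    """Information gain of discretizing `values` into intervals at sorted `cuts`."""
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(bisect_right(cuts, v), []).append(y)
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in bins.values())
    return entropy(labels) - remainder

def evolve_cuts(values, labels, n_cuts=2, generations=200, sigma=0.5, seed=0):
    """(1+1) evolutionary search: mutate every cut point with Gaussian noise
    and keep the child whenever its information gain is not worse."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    parent = sorted(rng.uniform(lo, hi) for _ in range(n_cuts))
    best = info_gain(values, labels, parent)
    for _ in range(generations):
        # Clamp mutated cuts to the attribute's range and keep them sorted.
        child = sorted(min(hi, max(lo, c + rng.gauss(0, sigma))) for c in parent)
        score = info_gain(values, labels, child)
        if score >= best:
            parent, best = child, score
    return parent, best
```

Accepting equal-fitness children lets the search drift across plateaus where several cut placements give the same gain; a multi-attribute version would evolve one cut list per attribute and score the resulting tree instead.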


Detection of Botnet Based Attacks on Network

Handbook of Research on Network Forensics and Analysis Techniques - Advances in Information Security, Privacy, and Ethics, 2018, pp. 101-116. DOI: 10.4018/978-1-5225-4100-4.ch007

Author(s): Prachi

Keyword(s): Large Scale, Flow Analysis, Traffic Analysis, High Accuracy, Machine Learning Techniques, Botnet Detection, Learning Techniques, Benchmark Datasets, Traffic Flow Analysis

This chapter describes how botnets, which are increasingly becoming the leading cyber threat on the web, also serve as the key platform for carrying out large-scale distributed attacks. Although a substantial amount of research has been done on botnet detection and analysis, bot-masters keep adopting new techniques, such as code encryption and obfuscation, to make botnets more sophisticated, destructive, and hard to detect. This chapter proposes a new model to detect botnet behavior based on traffic analysis and machine learning techniques. Because traffic analysis does not depend on payload analysis, the proposed technique is immune to code encryption and other evasion techniques commonly used by bot-masters. The chapter analyzes benchmark datasets as well as traffic generated in real time to determine the feasibility of botnet detection using traffic flow analysis. Experimental results clearly indicate that the proposed model can classify network traffic as botnet or normal traffic with high accuracy and low false-positive rates.
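
The key property claimed is that traffic analysis does not inspect payloads. A minimal sketch of payload-independent flow feature extraction, assuming packets are grouped into flows by the usual 5-tuple (source, destination, source port, destination port, protocol). The `Packet` record and the particular features are hypothetical choices, and the resulting vectors would then be fed to a classifier.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp: float  # seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int         # bytes on the wire; payload content is never inspected

def flow_features(packets):
    """Group packets by 5-tuple and compute header-level statistics per flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)
    features = {}
    for key, pkts in flows.items():
        times = sorted(p.timestamp for p in pkts)
        sizes = [p.size for p in pkts]
        duration = times[-1] - times[0]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_size": sum(sizes) / len(sizes),
            "duration": duration,
            # Mean gap between consecutive packets in the flow.
            "mean_interarrival": duration / (len(pkts) - 1) if len(pkts) > 1 else 0.0,
        }
    return features
```

Since only timing and size information is used, encrypting or obfuscating the payload leaves these features unchanged.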


Intelligent Neural Network Schemes for Multi-Class Classification

Applied Sciences, 2019, Vol 9 (19), pp. 4036. DOI: 10.3390/app9194036

Cited By ~ 1

Author(s): You, Wu, Lee, Liu

Keyword(s): Neural Network, Clustering Algorithm, Classification Problem, Machine Learning Techniques, Training Dataset, Reduction Techniques, Learning Techniques, Benchmark Datasets, Dimensionality Reduction Techniques, Multi Class Classification

Multi-class classification is a very important technique in engineering applications, e.g., mechanical systems, mechanics and design innovations, and applied materials in nanotechnologies. A large amount of research has been done on single-label classification, where each object is associated with a single category. However, in many application domains an object can belong to two or more categories, and multi-label classification is needed. Traditionally, statistical methods were used; more recently, machine learning techniques, in particular neural networks, have been proposed to solve the multi-class classification problem. In this paper, we develop radial basis function (RBF)-based neural network schemes for single-label and multi-label classification, respectively. The number of hidden nodes and the parameters of the basis functions are determined automatically by applying an iterative self-constructing clustering algorithm to the given training dataset, and the biases and weights are derived optimally by least squares. Dimensionality reduction techniques are adopted and integrated to help reduce the overfitting associated with RBF networks. Experimental results on benchmark datasets are presented to show the effectiveness of the proposed schemes.
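
The abstract fixes the RBF hidden layer by clustering and then derives the output weights and biases by least squares. A minimal sketch of the least-squares step, assuming Gaussian basis functions with a shared width `sigma` and centers supplied directly rather than produced by the paper's self-constructing clustering algorithm. The small `ridge` term is an added assumption for numerical stability, and plain Python is used so the example has no dependencies.

```python
import math

def gaussian(x, c, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-d2 / (2 * sigma ** 2))

def design_matrix(X, centers, sigma):
    # One column per basis function plus a constant column for the bias.
    return [[gaussian(x, c, sigma) for c in centers] + [1.0] for x in X]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def fit_rbf(X, y, centers, sigma, ridge=1e-8):
    """Least-squares output weights (last entry is the bias) via the normal equations."""
    Phi = design_matrix(X, centers, sigma)
    m = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * t for row, t in zip(Phi, y)) for i in range(m)]
    return solve(A, b)

def predict_rbf(X, centers, sigma, w):
    return [sum(p * wi for p, wi in zip(row, w)) for row in design_matrix(X, centers, sigma)]
```

On XOR-style points, two centers at (0, 1) and (1, 0) make the targets a linear function of the hidden-layer outputs, something no linear model on the raw inputs can achieve.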


Ensemble Learning to Improve the Prediction of Fetal Macrosomia and Large-for-Gestational Age

Journal of Clinical Medicine, 2020, Vol 9 (2), pp. 380. DOI: 10.3390/jcm9020380

Cited By ~ 1

Author(s): Shangyuan Ye, Hui Zhang, Fuyan Shi, Jing Guo, Suzhen Wang, ...

Keyword(s): Ensemble Learning, Gestational Age, Prediction Accuracy, Ensemble Methods, Machine Learning Techniques, Fetal Macrosomia, Large For Gestational Age, Empirical Formulas, Learning Techniques

Background: The objective of this study was to investigate the use of ensemble methods to improve the prediction of fetal macrosomia and large for gestational age from prenatal ultrasound imaging measurements. Methods: We evaluated and compared the prediction accuracies of nonlinear and quadratic mixed-effects models coupled with 26 different empirical formulas for estimating fetal weight in predicting large fetuses at birth. The data for the investigation were taken from the Successive Small-for-Gestational-Age-Births study. Ensemble methods, a class of machine learning techniques, were used to improve the prediction accuracies by combining the individual models and empirical formulas. Results: The prediction accuracy of the individual statistical models and empirical formulas varied considerably in predicting macrosomia but varied less in predicting large for gestational age. Two ensemble methods, voting and stacking, combined with model selection, can combine the strengths of the individual models and formulas and improve the prediction accuracy. Conclusions: Ensemble learning can improve the prediction of fetal macrosomia and large for gestational age and has the potential to assist obstetricians in clinical decisions.
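
The abstract reports that voting and stacking with model selection improve on individual formulas. A minimal sketch of a voting variant, assuming each model or formula produces a fetal-weight estimate in grams, models are selected by mean absolute error on a validation set, the selected estimates are averaged (soft voting), and macrosomia is flagged above 4000 g, a commonly used cutoff. The data structures, function names, and threshold are illustrative assumptions; stacking (training a meta-model on the base predictions) is not shown.

```python
def select_models(val_estimates, val_truth, k):
    """Keep the k models with the lowest mean absolute error on validation data.

    val_estimates: {model_name: [estimate per fetus]} (hypothetical layout).
    """
    def mae(est):
        return sum(abs(e - t) for e, t in zip(est, val_truth)) / len(val_truth)
    return sorted(val_estimates, key=lambda name: mae(val_estimates[name]))[:k]

def ensemble_macrosomia(estimates, selected, threshold_g=4000.0):
    """Average the selected models' estimates per fetus and flag large fetuses."""
    n = len(next(iter(estimates.values())))
    avg = [sum(estimates[m][i] for m in selected) / len(selected) for i in range(n)]
    return avg, [w > threshold_g for w in avg]
```

Selecting models on held-out data before averaging keeps poorly calibrated formulas from dragging the combined estimate away from the better ones.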


Using Semi-Supervised Learning in Cellular Automata for Edge Detection of Image Segmentation

International Journal Of Engineering And Computer Science, 2018, Vol 7 (02), pp. 23613-23619. DOI: 10.18535/ijecs/v7i2.16

Author(s): Draiya A. Alaswad, Yasser F. Hassan

Keyword(s): Image Segmentation, Cellular Automata, Edge Detection, Supervised Learning, Unlabeled Data, Machine Learning Techniques, Natural Image, Clustering Method, Learning Techniques

Semi-supervised learning is an area of increasing importance in machine learning; its techniques make use of both labeled and unlabeled data. The goal of using both labeled and unlabeled data is to build better learners than could be built from either kind of data alone. Semi-supervised learning investigates how to use the information in both labeled and unlabeled examples to perform better than supervised learning. In this paper, we present a new method for edge detection in image segmentation using cellular automata with modified Game of Life rules and the K-means algorithm. We use a semi-supervised clustering method, which can jointly learn a fusion by making use of the unlabeled data. The learning goal is to classify each pixel in the image as edge or non-edge. We applied the semi-supervised method to edge detection in natural images and measured its performance on the Berkeley Segmentation Dataset and Benchmark. The experimental results demonstrate the accuracy and efficiency of the proposed method.
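
The edge detector relies on cellular automata with modified Game of Life rules. The paper's modification is not described here, so the sketch below shows only the standard Conway rule (a dead cell with exactly three live neighbours becomes alive; a live cell with two or three live neighbours survives) applied to a binary map, such as a hypothetical map of candidate edge pixels. Treating cells outside the grid as dead is an assumption.

```python
def life_step(grid):
    """One synchronous update of a binary grid with Conway's B3/S23 rule.

    Cells outside the grid are treated as dead (an assumed boundary condition).
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        return sum(grid[rr][cc]
                   for rr in range(max(0, r - 1), min(rows, r + 2))
                   for cc in range(max(0, c - 1), min(cols, c + 2))
                   if (rr, cc) != (r, c))

    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = live_neighbours(r, c)
            # Birth with exactly 3 neighbours; survival with 2 or 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return new
```

Under the standard rule an isolated pixel has no live neighbours and disappears after one step, which hints at how CA rules can suppress isolated noise in an edge map; a modified rule set would change the birth and survival conditions.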


Identification of Plant Leaf Disease using Machine Learning Techniques

International Journal of Recent Technology and Engineering - 2, 2019, Vol 8 (3), pp. 6077-6081. DOI: 10.35940/ijrte.c5621.098319

Cited By ~ 1

Keyword(s): Plant Disease, Digital Camera, Machine Learning Techniques, Support Vector, Leaf Sample, Learning Techniques, Artificial Neural Network (ANN), Identification Techniques, Processing Techniques

Plant disease identification and classification is a major area of research, as the majority of people in India depend on agriculture as their main source of income and food. Identifying diseases in crops is challenging because the manual identification techniques currently used rely on expert advice, which may not be efficient. Decisions about a variety of diseases are made based on leaf features. In this paper, an automated framework is introduced that can detect and classify leaf diseases accurately. Leaf images are acquired using a digital camera. Pre-processing, segmentation, and feature extraction are performed on the acquired images, and the extracted features are passed to the classifiers to classify the diseases. The aim of this work is to classify and distinguish leaf samples based on their features. The work is carried out with Artificial Neural Network (ANN), Support Vector Machine (SVM), and Naive Bayes classifiers, and their results are analyzed. On the given dataset, ANN performed better than the other two classifiers.
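
Naive Bayes is one of the three classifiers compared. A minimal Gaussian Naive Bayes sketch, assuming each leaf sample has already been reduced to a numeric feature vector; the feature values and class names in the test are hypothetical, and the paper's feature extraction is not reproduced. A small constant is added to every variance to avoid division by zero.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Per-class priors plus an independent normal distribution for each feature."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing  # constant added to every variance

    def fit(self, X, y):
        by_class = defaultdict(list)
        for x, label in zip(X, y):
            by_class[label].append(x)
        n = len(X)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / len(col) + self.var_smoothing
                         for col, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_posterior(self, x, label):
        # Log prior plus the sum of per-feature Gaussian log-likelihoods.
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))

    def predict(self, X):
        return [max(self.stats, key=lambda label: self._log_posterior(x, label)) for x in X]
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.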


Iterative machine learning applied to annotation of text datasets

2021. DOI: 10.5753/eniac.2021.18268

Author(s): Thiago Abdo, Fabiano Silva

Keyword(s): Machine Learning, Deep Learning, Learning Algorithms, Computational Cost, Machine Learning Techniques, Learning Approaches, Learning Techniques, High Computational Cost

The purpose of this paper is to analyze different machine learning approaches and algorithms for integration as automated assistance in a tool that aids the creation of new annotated datasets. We evaluate how they scale in an environment without dedicated machine learning hardware. In particular, we study the impact on a dataset with few examples and on one that is still being constructed. We experiment with deep learning algorithms (BERT) and classical learning algorithms with a lower computational cost (W2V and GloVe combined with RF and SVM). Our experiments show that deep learning algorithms have a performance advantage over classical techniques. However, deep learning algorithms have a high computational cost, making them unsuitable for an environment with reduced hardware resources. Simulations using active and iterative machine learning techniques to assist the creation of new datasets are conducted. For these simulations, we use the classical learning algorithms because of their lower computational cost. The knowledge gathered from our experimental evaluation aims to support the creation of a tool for building new text datasets.
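
Active learning assists annotation by asking the human to label the items the current model is least sure about. A minimal sketch of one common query strategy, margin sampling, assuming the model exposes a class-probability vector for each unlabeled item. The paper does not state which query strategy it uses, so this is an illustrative choice, and `margin_sampling` is a hypothetical name.

```python
def margin_sampling(probabilities, batch_size):
    """Indices of the unlabeled items whose two most likely classes are closest.

    probabilities: one class-probability list per unlabeled item.
    """
    def margin(p):
        top = sorted(p, reverse=True)
        return top[0] - (top[1] if len(top) > 1 else 0.0)
    # Smallest margin first, i.e. the most ambiguous items.
    return sorted(range(len(probabilities)), key=lambda i: margin(probabilities[i]))[:batch_size]
```

In an iterative annotation loop, the selected items would be labeled, added to the training set, and the classical model (for example, an SVM over word embeddings) retrained before the next query.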


Application of Machine Learning Algorithms in Stock Market Prediction

Handbook of Research on Smart Technology Models for Business and Industry - Advances in Computational Intelligence and Robotics, 2020, pp. 153-180. DOI: 10.4018/978-1-7998-3645-2.ch007

Author(s): Sumit Kumar, Sanlap Acharya

Keyword(s): Machine Learning, Unsupervised Learning, Supervised Learning, Stock Prices, Stock Price, Short Term Memory, Machine Learning Algorithms, Machine Learning Techniques, Learning Techniques

The prediction of stock prices has always been a very challenging problem for investors. Using machine learning techniques to predict stock prices is also a favourite topic among academics working in this domain. This chapter discusses five supervised and two unsupervised learning techniques for stock price prediction and compares the performance of all the algorithms. Among the supervised learning techniques, the Long Short-Term Memory (LSTM) algorithm performed best, whereas among the unsupervised learning techniques, the Restricted Boltzmann Machine (RBM) performed best. RBM was found to perform even better than LSTM.
