Creating Ensemble Classifiers with Information Entropy Diversity Measure

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jiangbo Zou ◽  
Xiaokang Fu ◽  
Lingling Guo ◽  
Chunhua Ju ◽  
Jingjing Chen

Ensemble classifiers improve classification accuracy by incorporating the decisions made by their component classifiers. Creating an ensemble classifier involves two steps: generating base classifiers, and then combining the base classifiers so that the ensemble as a whole achieves maximum accuracy. A major problem in creating ensemble classifiers is ensuring both the classification accuracy and the diversity of the component classifiers. In this paper, we propose an ensemble classifier generating algorithm to improve the accuracy of ensemble classification and to maximize the diversity of its component classifiers. In this algorithm, information entropy is introduced to measure the diversity of component classifiers, and a cyclic iterative optimization selection tactic is applied to select component classifiers from the base classifiers, with the number of component classifiers dynamically adjusted to minimize system cost. It is demonstrated that our method has an obviously lower memory cost and higher classification accuracy than existing classifier methods.
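As a rough illustration of how information entropy can quantify disagreement among component classifiers, the sketch below computes the Shannon entropy of the votes cast on each sample and averages it over all samples. This is one common entropy-style diversity measure and is not necessarily the formulation used in the paper; the function names and the toy predictions are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Shannon entropy (in bits) of the label distribution among one sample's votes."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ensemble_diversity(predictions):
    """Mean per-sample vote entropy.

    predictions: one list of predicted labels per classifier, all of equal length.
    """
    per_sample = list(zip(*predictions))  # regroup so each tuple holds one sample's votes
    return sum(vote_entropy(v) for v in per_sample) / len(per_sample)

# Hypothetical predictions from three component classifiers on four samples.
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
print(ensemble_diversity(preds))
```

A value of 0 means every classifier agrees on every sample; larger values indicate more disagreement, which a selection procedure could trade off against accuracy.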

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jyoti Godara ◽  
Isha Batra ◽  
Rajni Aron ◽  
Mohammad Shabaz

Cognitive science is a technology which focuses on analyzing the human brain using the application of DM. Databases are utilized to gather and store large volumes of data. The authenticated information is extracted using measures. This research work focuses on detecting sarcasm in text data. It introduces a scheme to detect sarcasm based on the PCA algorithm, the K-means algorithm, and ensemble classification. Four ensemble classifiers are designed with the objective of detecting sarcasm. The first ensemble classification algorithm (SKD) is the combination of SVM, KNN, and decision tree. In the second ensemble classifier (SLD), SVM, logistic regression, and decision tree classifiers are combined for sarcasm detection. In the third ensemble model (MLD), MLP, logistic regression, and decision tree are combined, and the last one (SLM) is the combination of MLP, logistic regression, and SVM. The proposed model is implemented in Python and tested on five datasets of different sizes. The performance of the models is tested with regard to various metrics.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Nasrin Ostvar ◽  
Amir Masoud Eftekhari Moghadam

In recent years, ensemble classification methods have been widely investigated in both industry and literature in the field of machine learning and artificial intelligence. The main advantage of this approach is that it benefits from a set of classifiers, instead of a single classifier, with the aim of improving prediction performance measures such as accuracy. Selecting the base classifiers and the method for combining them are the most challenging issues in ensemble classifiers. In this paper, we propose a heterogeneous dynamic ensemble classifier (HDEC) which uses multiple classification algorithms. The main advantage of using heterogeneous algorithms is increased diversity among the base classifiers, which is key to the success of an ensemble system. In this method, we first train many classifiers with the original data. They are then evaluated on their strength in recognizing positive or negative instances, using the true positive rate and the true negative rate, respectively. In the next step, the classifiers are categorized into two groups according to their efficiency on these measures. Finally, the outputs of the two groups are compared with each other to generate the final prediction. To evaluate the proposed approach, it was applied to 12 datasets from the UCI and LIBSVM repositories, and two popular prediction performance metrics, accuracy and geometric mean, were calculated. The experimental results show the superiority of the proposed approach in comparison to other state-of-the-art methods.
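The measures named in this abstract can be sketched in a few lines. The example below computes the true positive rate, the true negative rate and their geometric mean for binary predictions, and then splits classifiers into two groups. The grouping rule (a classifier joins the positive group when its TPR is at least its TNR) is an assumption made for illustration, as the abstract does not state the exact criterion; the classifier names and data are hypothetical, and both classes are assumed to be present in `y_true`.

```python
import math

def rates(y_true, y_pred):
    """Return (TPR, TNR) for binary labels where 1 is positive and 0 is negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return tp / pos, tn / neg

def g_mean(y_true, y_pred):
    """Geometric mean of the true positive rate and the true negative rate."""
    tpr, tnr = rates(y_true, y_pred)
    return math.sqrt(tpr * tnr)

def split_by_strength(y_true, predictions):
    """Assumed rule: a classifier joins the positive group when TPR >= TNR."""
    positive_group, negative_group = [], []
    for name, y_pred in predictions.items():
        tpr, tnr = rates(y_true, y_pred)
        (positive_group if tpr >= tnr else negative_group).append(name)
    return positive_group, negative_group

# Hypothetical ground truth and predictions from two classifiers.
y_true = [1, 1, 1, 0, 0, 0]
predictions = {
    "clf_a": [1, 1, 1, 1, 0, 0],  # finds every positive, misses one negative
    "clf_b": [1, 0, 0, 0, 0, 0],  # finds every negative, misses two positives
}
print(split_by_strength(y_true, predictions))
print(g_mean(y_true, predictions["clf_a"]))
```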


2020 ◽  
Vol 13 (2) ◽  
pp. 37 ◽  
Author(s):  
Tomasz Pisula

This publication presents the methodological aspects of designing a scoring model for early prediction of bankruptcy using ensemble classifiers. The main goal of the research was to develop a scoring model (with good classification properties) that can be applied in practice to assess the risk of bankruptcy of enterprises in various sectors. For the data sample, which included 1739 Polish businesses (of which 865 were bankrupt and 875 had no risk of bankruptcy), a genetic algorithm was applied to select the optimum set of 19 bankruptcy indicators, on the basis of which the classification accuracy of a number of ensemble classifier model variants (boosting, bagging and stacking) was estimated and verified. The classification effectiveness of ensemble models was compared with eight classical individual models which made use of single classifiers. A GBM-based ensemble classifier model offering superior classification capabilities was used in practice to design a scoring model, which was applied in comparative evaluation and bankruptcy risk analysis for businesses from various sectors and of different sizes from the Podkarpackie Voivodeship in 2018 (over a time horizon of up to two years). The approach applied can also be used to assess credit risk for corporate borrowers.


2020 ◽  
Author(s):  
H.M.Fazlul Haque ◽  
Fariha Arifin ◽  
Sheikh Adilina ◽  
Muhammod Rafsanjani ◽  
Swakkhar Shatabda

The information of a cell is primarily contained in deoxyribonucleic acid (DNA). Information flows from DNA to protein sequences via ribonucleic acid (RNA) through transcription and translation. These entities are vital for the genetic process. Recent developments in epigenetics also show the importance of the genetic material and of knowledge of its attributes and functions. However, growth in the known attributes and functionalities of these entities is still slow because in vitro experimental methods are time-consuming and expensive. In this paper, we propose an ensemble classification algorithm called SubFeat to predict the functionalities of biological entities from different types of datasets. Our model uses a novel feature-subspace-based ensemble method. It divides the feature space into subspaces, which are then used to train individual classifier models, and the ensemble is built on these base classifiers using a weighted majority voting mechanism. SubFeat was tested on four datasets comprising two DNA, one RNA and one protein dataset, and it outperformed all the existing single classifiers as well as the ensemble classifiers. SubFeat is made available as a Python-based tool, along with a user manual. It is freely accessible here: https://github.com/fazlulhaquejony/SubFeat.
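The two mechanisms named in this abstract, splitting the feature space into subspaces and combining base classifiers by weighted majority voting, can be sketched generically as follows. This is not the SubFeat implementation. The contiguous partitioning scheme, the function names, the labels and the weights (for example, per-classifier validation accuracies) are all assumptions made for illustration.

```python
from collections import defaultdict

def split_features(n_features, n_subspaces):
    """Partition feature indices 0..n_features-1 into contiguous, near-equal subspaces."""
    base, extra = divmod(n_features, n_subspaces)
    subspaces, start = [], 0
    for i in range(n_subspaces):
        size = base + (1 if i < extra else 0)  # the first `extra` subspaces get one more feature
        subspaces.append(list(range(start, start + size)))
        start += size
    return subspaces

def weighted_majority_vote(votes, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

print(split_features(7, 3))
# One base classifier per subspace casts a vote; weights are hypothetical.
print(weighted_majority_vote(["promoter", "non-promoter", "non-promoter"], [0.9, 0.4, 0.3]))
```

In the toy vote, a single highly weighted classifier outweighs two weaker ones, which is the difference between weighted voting and a plain majority count.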


The amount of data is increasing rapidly day by day, and various technologies have been introduced to analyze it. Traditional data mining approaches perform data analysis through classification algorithms. A single classifier can be used for this analysis, and sometimes multiple or combined classifiers are used instead. The performance of an ensemble classifier is better than that of a single classifier, and numerous ensemble classifiers have been introduced on the basis of improved accuracy. This paper reviews various ensemble classifiers based on their accuracy.


2019 ◽  
Vol 8 (3) ◽  
pp. 3686-3694

Tumour detection medical applications use classification techniques to categorize malicious and non-malicious tumour features in order to provide an efficient medical diagnosis of the individual under investigation. To enable efficient classification, feature extraction methods are used to eliminate redundant features and obtain the most relevant ones. However, challenges concerning the dimension and quantum of tumour datasets persist. To address them, this paper aims to maximize malicious tumour classification accuracy using two reliable ensemble classifiers, namely Bootstrap Aggregation and k-nearest neighbour. Tumour features are extracted by Aggregate Linear Discriminant Analysis (LDA), and the feature distance is calculated with an iterative scattering matrix algorithm. The extracted features are further refined by aggregation to select the most effective feature values. An ensemble classifier technique is then employed to construct malicious and non-malicious tumour classes. The tumour classification is based on an ensemble of bagging and k-nearest neighbour. Simulation on the Tumour Repository data set shows that the proposed ensemble classifiers have considerably better tumour detection accuracy than existing conventional techniques. Numerical performance evaluations show an 8% improvement in tumour classification accuracy for malicious tumour detection by the proposed method.
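The abstract does not specify how bagging and k-nearest neighbour are combined, so the following is only a generic sketch of bootstrap aggregation over k-NN base learners: each model is fit on a bootstrap sample (drawn with replacement) and the final label is the majority vote. The function names, parameter defaults and two-dimensional toy features are hypothetical and stand in for extracted tumour features.

```python
import random
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority label among its k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def bagged_knn_predict(train, query, n_models=15, k=3, seed=0):
    """Majority vote over k-NN models, each using its own bootstrap sample."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # bootstrap: sample with replacement
        votes.append(knn_predict(sample, query, k))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical, well-separated toy data: (feature vector, label) pairs.
train = (
    [((i * 0.1, i * 0.1), "non-malicious") for i in range(10)]
    + [((5 + i * 0.1, 5 + i * 0.1), "malicious") for i in range(10)]
)
print(bagged_knn_predict(train, (0.2, 0.3)))
print(bagged_knn_predict(train, (5.5, 5.2)))
```

An odd number of models and an odd `k` avoid ties in the binary case.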


2018 ◽  
Vol 7 (1) ◽  
pp. 57-72
Author(s):  
H.P. Vinutha ◽  
Poornima Basavaraju

Day by day, network security is becoming a more challenging task. Intrusion detection systems (IDSs) are one of the methods used to monitor network activities. Data mining algorithms play a major role in the field of IDS. The NSL-KDD'99 dataset is used to study network traffic patterns, which helps identify possible attacks taking place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L and U2R. In the proposed methodology, the false positive rate is reduced and the detection rate improved by reducing the dimensionality of the dataset, since using all 41 attributes in detection is not good practice. Four feature selection methods, Chi-Square, SU, Gain Ratio and Information Gain, are used to evaluate the attributes, and unimportant features are removed to reduce the dimension of the data. Ensemble classification techniques such as Boosting, Bagging, Stacking and Voting are used to observe the detection rate separately with three base algorithms: Decision stump, J48 and Random forest.
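Of the feature selection measures listed, information gain is the simplest to show. The sketch below scores a discrete attribute by how much it reduces the entropy of the class label; attributes with low scores would be candidates for removal. The attribute names and the four-row toy sample are hypothetical and are not taken from the NSL-KDD data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy sample: two attributes and a class label.
labels = ["normal", "normal", "DoS", "DoS"]
features = {
    "attr_protocol": ["tcp", "tcp", "udp", "udp"],  # separates the classes perfectly
    "attr_flag": ["a", "b", "a", "b"],              # carries no class information
}
ranked = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
print(ranked)
```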


Author(s):  
Antonio Giovannetti ◽  
Gianluca Susi ◽  
Paola Casti ◽  
Arianna Mencattini ◽  
Sandra Pusil ◽  
...  

In this paper, we present the novel Deep-MEG approach, in which image-based representations of magnetoencephalography (MEG) data are combined with ensemble classifiers based on deep convolutional neural networks. To predict the early signs of Alzheimer’s disease (AD), functional connectivity (FC) measures between the brain bio-magnetic signals originating from spatially separated brain regions are used as MEG data representations for the analysis. After stacking the FC indicators relative to different frequency bands into multiple images, a deep transfer learning model is used to extract different sets of deep features and to derive improved classification ensembles. The proposed Deep-MEG architectures were tested on a set of resting-state MEG recordings and their corresponding magnetic resonance imaging scans, from a longitudinal study involving 87 subjects. Accuracy values of 89% and 87% were obtained, respectively, for the early prediction of AD conversion in a sample of 54 mild cognitive impairment subjects and in a sample of 87 subjects, including 33 healthy controls. These results indicate that the proposed Deep-MEG approach is a powerful tool for detecting early alterations in the spectral–temporal connectivity profiles and in their spatial relationships.


Author(s):  
Ahlam Fuad ◽  
Amany bin Gahman ◽  
Rasha Alenezy ◽  
Wed Ateeq ◽  
Hend Al-Khalifa

The plural of paucity is one type of broken plural used in classical Arabic. It is used when the number of people or objects ranges from three to 10. Based on our evaluation of four current state-of-the-art Arabic morphological analyzers, there is a lack of identification of broken plural words, specifically the plural of paucity. Therefore, this paper presents “[Formula: see text]” Qillah (paucity), a morphological extension that is built on top of other morphological analyzers and uses a hybrid rule-based and lexicon-based approach to enhance the identification of the plural of paucity. Two versions of Qillah were developed, one based on the FARASA morphological analyzer and the other on the CALIMA Star analyzer, as these are some of the best-performing morphological analyzers. We designed two experiments to evaluate the effectiveness of our proposed solution based on a collection of 402 different Arabic words. The version based on CALIMA Star achieved a maximum accuracy of 93% in identifying the plural-of-paucity words compared to the baselines. It also achieved a maximum accuracy of 98% compared to the baselines in identifying the plurality of the words.


Author(s):  
Snehlata Sewakdas Dongre ◽  
Latesh G. Malik

A data stream is a giant amount of data generated uncontrollably at a rapid rate by many applications, such as call detail records, log records and sensor applications. Data stream mining has attracted the attention of many researchers. A rising problem in data streams is the handling of concept drift. A good algorithm should adapt to changes and handle concept drift properly. An ensemble classification method is a group of classifiers that work in a collaborative manner. This chapter covers all aspects of data stream classification. Its mission is to discuss various techniques that use collaborative filtering for data stream mining, and its main concern is to familiarize the reader with the data stream domain and data stream mining. Instead of a single classifier, a group of classifiers is used to enhance classification accuracy. Collaborative filtering plays an important role in how the different classifiers work collaboratively within the ensemble to achieve a goal.

