Hybrid Ensemble Learning Methods for Classification of Microarray Data

2020 ◽  
pp. 707-725
Author(s):  
Sujata Dash

Efficient classification and feature extraction techniques provide an effective way of diagnosing cancers from microarray datasets. Conventional classification techniques have been observed to have major limitations in discriminating genes accurately. Such problems can, however, be addressed to a great extent by ensemble techniques. This paper proposes a hybrid RotBagg ensemble framework to address this problem. The technique integrates the Rotation Forest and Bagging ensembles, which preserves the basic characteristics of an ensemble architecture, i.e., diversity and accuracy. Three different feature selection techniques are employed to select subsets of genes and so improve the effectiveness and generalization of the RotBagg ensemble. Its efficiency is validated on five microarray datasets and compared with the results of the base learners. The experimental results show that correlation-based FRFR combined with the PCA-based RotBagg ensemble forms a highly efficient classification model.


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Zena M. Hira ◽  
Duncan F. Gillies

We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and are widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition, the complicated relations among the different genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to give a clearer idea of when to use each of them to save computational time and resources.


Healthcare ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 884
Author(s):  
Antonio García-Domínguez ◽  
Carlos E. Galván-Tejada ◽  
Ramón F. Brena ◽  
Antonio A. Aguileta ◽  
Jorge I. Galván-Tejada ◽  
...  

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, which has even been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can produce erroneous measurements because of possible damage or mishandling. A non-invasive data source for a children’s activity classification model makes the monitoring system in which it is applied more reliable. This work proposes the use of environmental sound as a data source for generating children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks. The focus is on recognizing activities that could potentially trigger domestic accidents, for use in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. Most of the generated models, which combine the feature selection methods with the classifiers, achieve an accuracy greater than 97%, from which we conclude that the proposal is effective at recognizing activities that could potentially trigger domestic accidents.


Author(s):  
VLADIMIR NIKULIN ◽  
TIAN-HSIANG HUANG ◽  
GEOFFREY J. MCLACHLAN

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier such as linear regression, support vector machine or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
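The feature selection step described above can be illustrated with a minimal sketch. It assumes a two-class problem with labels 0 and 1, both classes present, and it scores each feature by how far its Mann-Whitney (Wilcoxon rank-sum) AUC lies from 0.5. The abstract does not define the authors' exact "separation type" criterion, so the scoring rule and the function names below (average_ranks, separation_score, select_top_k) are illustrative assumptions, not the authors' implementation.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def separation_score(feature, labels):
    """Distance of the rank-sum AUC from 0.5 (0 = no separation, 0.5 = perfect)."""
    ranks = average_ranks(feature)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos  # assumes both classes are present
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U statistic
    return abs(u / (n_pos * n_neg) - 0.5)


def select_top_k(X, labels, k):
    """Indices of the k columns of X with the highest separation score."""
    n_features = len(X[0])
    scores = [separation_score([row[j] for row in X], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): column 0 separates the classes, column 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [7.0, 2.0], [8.0, 6.0], [9.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
selected = select_top_k(X, labels, 1)
```

In a leave-one-out evaluation, a selection step like this would normally be repeated inside each fold on the training samples only, so that the held-out sample does not influence which features are kept.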


2021 ◽  
Vol 5 (3) ◽  
pp. 527-533
Author(s):  
Yoga Religia ◽  
Amali Amali

The quality of an airline's services cannot be measured from the company's point of view, but must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its independence assumption is currently rarely discussed. Some literature suggests using attribute weighting to relax the independence assumption, which can be done through feature selection using particle swarm optimization (PSO) and the genetic algorithm (GA). This study compares PSO and GA optimization of Naïve Bayes for the classification of Airline Passenger Satisfaction data taken from www.kaggle.com. After testing, the best performance is obtained by the Naïve Bayes model with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.


Lung cancer is a serious illness that contributes to an increased mortality rate globally. Identifying lung cancer at an early stage is a likely way of improving the survival rate of patients. Generally, a Computed Tomography (CT) scan is used to find the location of the tumor and determine the stage of the cancer. Existing works have presented effective diagnosis and classification models for CT lung images. This paper designs an effective diagnosis and classification model for CT lung images. The presented model involves several stages, namely pre-processing, segmentation, feature extraction and classification. The initial stage includes an adaptive histogram based equalization (AHE) model for image enhancement and a bilateral filtering (BF) model for noise removal. The pre-processed images are fed into the second stage, a watershed segmentation model, to segment the images effectively. Then, a deep learning based Xception model is applied for prominent feature extraction, and classification is performed by a logistic regression (LR) classifier. A comprehensive simulation is carried out on a benchmark dataset to verify the effective classification of the lung CT images. The outcome indicated the outstanding performance of the presented model on the applied test images.


Sensor Review ◽  
2019 ◽  
Vol 39 (1) ◽  
pp. 107-120 ◽  
Author(s):  
Deepika Kishor Nagthane ◽  
Archana M. Rajurkar

Purpose: Breast cancer is one of the main reasons for the increase in the mortality rate among women. Accurate early detection of breast cancer seems to be the only solution for diagnosis. In the field of breast cancer research, many new computer-aided diagnosis systems have been developed to reduce diagnostic test false positives, which arise because of the subtle appearance of breast cancer tissues. The purpose of this study is to develop a diagnosis technique for breast cancer using the LCFS and TreeHiCARe classifier model.
Design/methodology/approach: The proposed diagnosis methodology begins with a pre-processing procedure. Subsequently, feature extraction is performed, in which the image features that preserve the characteristics of the breast tissues are extracted. Feature selection is then performed by the proposed least-mean-square (LMS)-Cuckoo search feature selection (LCFS) algorithm. The features are selected from the vast range of features extracted from the images with the help of the optimal cut point provided by the LCS algorithm. Then, the image transaction database table is developed using the keywords of the training images and the feature vectors. The transaction resembles the itemset, and the association rules are generated from the transaction representation based on the a priori algorithm with high conviction ratio and lift. After association rule generation, the proposed TreeHiCARe classifier model is applied in the diagnosis methodology. In the TreeHiCARe classifier, a new feature index is developed for selecting a central feature for the decision tree, centered on which the images are classified as normal or abnormal.
Findings: The performance of the proposed method is validated against existing works using accuracy, sensitivity and specificity measures. Experiments with the proposed method on the Mammographic Image Analysis Society database classified normal and abnormal cancerous mammogram images with an accuracy of 0.8289, a sensitivity of 0.9333 and a specificity of 0.7273.
Originality/value: This paper proposes a new approach for a breast cancer diagnosis system using mammogram images. The proposed method uses two new algorithms: LCFS and TreeHiCARe. LCFS is used to select optimal feature split points, and TreeHiCARe is the decision tree classifier model based on association rule agreements.
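The three measures reported above follow directly from the counts in a binary confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical and are NOT taken from the paper: they were chosen only because they happen to round to the reported figures. The function name diagnostic_metrics is likewise an illustrative assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: abnormal images classified correctly/incorrectly
    tn/fp: normal images classified correctly/incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # fraction of all images classified correctly
        "sensitivity": tp / (tp + fn),   # fraction of abnormal images detected
        "specificity": tn / (tn + fp),   # fraction of normal images cleared
    }


# Hypothetical counts (not from the paper), used only to illustrate the formulas.
metrics = diagnostic_metrics(tp=70, fn=5, tn=56, fp=21)
rounded = {name: round(value, 4) for name, value in metrics.items()}
```

A high sensitivity paired with a lower specificity, as in these figures, means few abnormal cases are missed at the cost of more false alarms on normal images.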


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4749
Author(s):  
Shaorong Zhang ◽  
Zhibin Zhu ◽  
Benxin Zhang ◽  
Bao Feng ◽  
Tianyou Yu ◽  
...  

The common spatial pattern (CSP) is a very effective feature extraction method in motor imagery based brain computer interfaces (BCI), but its performance depends on the selection of the optimal frequency band. Although many research works have been proposed to improve CSP, most of them suffer from large computation costs and long feature extraction times. To this end, three new feature extraction methods based on CSP and a new feature selection method based on non-convex log regularization are proposed in this paper. Firstly, EEG signals are spatially filtered by CSP, and then three new feature extraction methods are proposed, named CSP-Wavelet, CSP-WPD and CSP-FB, respectively. For CSP-Wavelet and CSP-WPD, the discrete wavelet transform (DWT) or wavelet packet decomposition (WPD) is used to decompose the spatially filtered signals, and then the energy and standard deviation of the wavelet coefficients are extracted as features. For CSP-FB, the spatially filtered signals are filtered into multiple bands by a filter bank (FB), and then the logarithm of the variance of each band is extracted as a feature. Secondly, a sparse optimization method regularized with a non-convex log function, which we call LOG, is proposed for feature selection, and an optimization algorithm for LOG is given. Finally, ensemble learning is used for secondary feature selection and classification model construction. Combining the feature extraction and feature selection methods, a total of three new EEG decoding methods are obtained, namely CSP-Wavelet+LOG, CSP-WPD+LOG, and CSP-FB+LOG. Four public motor imagery datasets are used to verify the performance of the proposed methods. Compared to existing methods, the proposed methods achieved the highest average classification accuracy of 88.86, 83.40, 81.53, and 80.83 in datasets 1–4, respectively. The feature extraction time of CSP-FB is the shortest. The experimental results show that the proposed methods can effectively improve the classification accuracy and reduce the feature extraction time. Considering both classification accuracy and feature extraction time, CSP-FB+LOG has the best performance and can be used for real-time BCI systems.
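The final step of the CSP-FB feature extraction described above, taking the logarithm of the variance of each band-filtered signal, can be sketched as follows. The sketch assumes that CSP spatial filtering and the filter bank have already been applied, neither of which is shown, and that no signal is constant (a zero variance has no logarithm). The nested-list input layout and the function name log_variance_features are illustrative assumptions rather than the authors' code, as is the use of the population variance.

```python
import math
from statistics import pvariance


def log_variance_features(band_signals):
    """Flatten band-filtered signals into one log-variance feature per signal.

    band_signals: list of bands; each band is a list of spatially filtered
    components; each component is a list of samples.
    """
    return [math.log(pvariance(component))
            for band in band_signals
            for component in band]


# Toy input (hypothetical): 2 bands x 1 component, 4 samples each.
bands = [
    [[1.0, -1.0, 1.0, -1.0]],   # variance 1 -> log-variance 0
    [[2.0, -2.0, 2.0, -2.0]],   # variance 4 -> log-variance log(4)
]
features = log_variance_features(bands)
```

The resulting vector has one entry per band and component, which is the kind of input a sparse feature selection step would then prune.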

