Correlation-Based Ensemble Feature Selection Using Bioinspired Algorithms and Classification Using Backpropagation Neural Network

A framework for clinical diagnosis has been designed and implemented that uses bioinspired algorithms for feature selection and a gradient descent backpropagation neural network for classification. The clinical data are subjected to data preprocessing, feature selection, and classification. Hot deck imputation is used to handle missing values, and min-max normalization is used for data transformation. Feature selection follows a wrapper approach that employs three bioinspired algorithms, namely Differential Evolution, Lion Optimization, and Glowworm Swarm Optimization, with the accuracy of an AdaBoostSVM classifier as the fitness function. Each bioinspired algorithm selects a subset of features, yielding three feature subsets. Correlation-based ensemble feature selection is then performed to select the optimal features from the three subsets, and these features are used to train the gradient descent backpropagation neural network. Ten-fold cross-validation is used to train and test the classifier. The Hepatitis dataset and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California Irvine (UCI) Machine Learning Repository are used to evaluate classification accuracy. The framework achieves an accuracy of 98.47% on the WDBC dataset and 95.51% on the Hepatitis dataset. It can be tailored to develop clinical decision-making systems for other health disorders to assist physicians in clinical diagnosis.
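
The min-max normalization and ten-fold cross-validation steps named in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function names `min_max_normalize` and `k_fold_indices`, the [0, 1] target range, and the interleaved fold assignment are assumptions made for the example.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature has no spread; map every value to new_min.
        return [new_min for _ in column]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in column]


def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs; each sample is tested once."""
    indices = list(range(n_samples))
    # Interleaved assignment of samples to folds (an assumption of this sketch).
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, test))
    return splits
```

In a full pipeline the normalization bounds would normally be computed on the training part of each fold only, so that the held-out fold does not leak into preprocessing.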

Genetic Algorithm-Based Feature Selection and Optimization of Backpropagation Neural Network Parameters for Classification of Breast Cancer Using MicroRNA Profiles

2019 3rd International Conference on Informatics and Computational Sciences (ICICoS) ◽

10.1109/icicos48119.2019.8982530 ◽

2019 ◽

Cited By ~ 1

Author(s):

Amazona Adorada ◽

Adi Wibowo

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Genetic Algorithm ◽

Feature Selection ◽

Backpropagation Neural Network ◽

Network Parameters

Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

Computational and Mathematical Methods in Medicine ◽

10.1155/2015/460189 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 41

Author(s):

Kindie Biredagn Nahato ◽

Khanna Nehemiah Harichandran ◽

Kannan Arputharaj

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Heart Disease ◽

Missing Values ◽

Classification Model ◽

Backpropagation Neural Network ◽

Data Set ◽

Knowledge Mining ◽

Clinical Dataset ◽

Two Stages

The availability of clinical datasets and knowledge mining methodologies encourages researchers to extract knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist clinicians in decision making. The objective of this research is to build a classifier that predicts the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. This work uses the rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) and has two stages. The first stage handles missing values to obtain a smooth dataset and selects appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained with the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
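
As a rough illustration of the indiscernibility relation used in the first stage, the sketch below groups records that share the same values on a chosen attribute subset. It then measures how well that subset determines the decision attribute, using the standard rough set dependency degree. The function names and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def indiscernibility_classes(rows, attrs):
    """Group row indices whose values agree on every attribute in attrs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def dependency_degree(rows, attrs, decision):
    """Fraction of rows lying in classes that map to a single decision value."""
    consistent = 0
    for block in indiscernibility_classes(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


# Hypothetical toy records (not drawn from any dataset named above).
RECORDS = [
    {"fever": "yes", "cough": "yes", "disease": "yes"},
    {"fever": "yes", "cough": "no", "disease": "yes"},
    {"fever": "no", "cough": "yes", "disease": "no"},
    {"fever": "no", "cough": "no", "disease": "no"},
]
```

In this toy table `fever` alone yields a dependency degree of 1.0 while `cough` alone yields 0.0, so `{"fever"}` preserves the classification power of the full attribute set and would be a reduct.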

Optimization of Neural Network using Nelder Mead in Breast Cancer Classification

International Journal of Intelligent Engineering and Systems ◽

10.22266/ijies2020.1231.29 ◽

2020 ◽

Vol 13 (6) ◽

pp. 330-337

Author(s):

Edi Kusuma ◽

Guruh Shidik ◽

Ricardus Pramunendar ◽

...

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Statistical Techniques ◽

Breast Cancer Dataset ◽

Backpropagation Neural Network ◽

Cancer Dataset ◽

Average Performance ◽

Classification Technique ◽

Average Accuracy ◽

Hidden Layer

Classification is a data mining technique that is considered supervised learning. Classification techniques such as the Backpropagation Neural Network (BPNN) have been utilized in several fields to increase human productivity. BPNN can give better results (more natural) compared with other statistical techniques. However, the learning process of BPNN can produce inefficient synapse weights in each hidden layer, and these ineffective weights can affect the performance of the network. In this research, BPNN optimization using Nelder Mead to identify the appearance of breast cancer is proposed. The datasets used are the Breast Cancer Coimbra Dataset (BCCD) and the Wisconsin Breast Cancer Dataset (WBCD). Testing with accuracy and k-fold validation shows better performance compared with the original BPNN. The best average performance on BCCD is seen in the fifth fold, with 76.5217% accuracy. The highest average result on WBCD is seen in the fourth fold, with 91.1765% average accuracy.
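
The abstract does not say exactly how Nelder Mead is combined with backpropagation. The sketch below only shows the general idea of tuning network weights with a derivative-free simplex search. It is a basic Nelder-Mead variant with reflection, expansion, inside contraction, and shrink steps, applied to the three weights of a single sigmoid neuron. The toy samples, the function names, and the step and iteration settings are all assumptions made for the example.

```python
import math


def nelder_mead(f, x0, step=0.5, iters=300):
    """Minimize f with a basic Nelder-Mead simplex search (no gradients)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best vertex.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Hypothetical toy samples: two inputs and a binary target.
SAMPLES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]


def neuron_loss(weights):
    """Mean squared error of one sigmoid neuron with weights (w1, w2, bias)."""
    w1, w2, bias = weights
    total = 0.0
    for (x1, x2), target in SAMPLES:
        z = max(-60.0, min(60.0, w1 * x1 + w2 * x2 + bias))  # avoid overflow
        output = 1.0 / (1.0 + math.exp(-z))
        total += (output - target) ** 2
    return total / len(SAMPLES)


start = [0.0, 0.0, 0.0]
tuned = nelder_mead(neuron_loss, start)
```

Because the search keeps its best vertex at every step, `neuron_loss(tuned)` is never higher than `neuron_loss(start)`. For a real network the weight vector would be much longer, which is where simplex methods typically become slow.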

Detection of breast cancer using the infinite feature selection with genetic algorithm and deep neural network

Distributed and Parallel Databases ◽

10.1007/s10619-021-07355-w ◽

2021 ◽

Author(s):

S. S. Ittannavar ◽

R. H. Havaldar

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Genetic Algorithm ◽

Feature Selection ◽

Deep Neural Network

Integrating ensemble systems biology feature selection and bimodal deep neural network for breast cancer prognosis prediction

Scientific Reports ◽

10.1038/s41598-021-92864-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Li-Hsin Cheng ◽

Te-Cheng Hsu ◽

Che Lin

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Feature Selection ◽

Systems Biology ◽

Ensemble Learning ◽

Microarray Data ◽

Deep Neural Network ◽

Prediction Models ◽

Biological Knowledge ◽

Prognosis Prediction

Breast cancer is a heterogeneous disease. To guide proper treatment decisions for each patient, robust prognostic biomarkers, which allow reliable prognosis prediction, are necessary. Gene feature selection based on microarray data is an approach to discover potential biomarkers systematically. However, standard pure-statistical feature selection approaches often fail to incorporate prior biological knowledge and select genes that lack biological insights. Besides, due to the high dimensionality and low sample size properties of microarray data, selecting robust gene features is an intrinsically challenging problem. We hence combined systems biology feature selection with ensemble learning in this study, aiming to select genes with biological insights and robust prognostic predictive power. Moreover, to capture breast cancer's complex molecular processes, we adopted a multi-gene approach to predict the prognosis status using deep learning classifiers. We found that all ensemble approaches could improve feature selection robustness, wherein the hybrid ensemble approach led to the most robust result. Among all prognosis prediction models, the bimodal deep neural network (DNN) achieved the highest test performance, further verified by survival analysis. In summary, this study demonstrated the potential of combining ensemble learning and bimodal DNN in guiding precision medicine.

Design of novel multi filter union feature selection framework for breast cancer dataset

Concurrent Engineering ◽

10.1177/1063293x211016046 ◽

2021 ◽

pp. 1063293X2110160

Author(s):

Dinesh Morkonda Gunasekaran ◽

Prabha Dhandayudam

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Care Center ◽

Feature Selection Method ◽

Selection Method ◽

Cancer Center ◽

Breast Cancer Dataset ◽

Data Set ◽

Health Care Center ◽

Cancer Data

Nowadays women are commonly diagnosed with breast cancer. Feature selection is an important step in constructing a classification framework. We propose a Multi filter union (MFU) feature selection method for breast cancer datasets. The feature selection process uses a union model based on the random forest algorithm and the Logistic regression (LG) algorithm to select important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset of the selected dataset. The experiments are run first on the Wisconsin diagnostic breast cancer dataset and then on a real dataset from a women's health care center. The results show that the proposed approach achieves high performance and efficiency compared with existing feature selection algorithms.
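
The union idea can be sketched in a few lines. The abstract does not give the selection rule, so this example assumes that each filter produces an importance score per feature, that the top `k` features of each filter are kept, and that the final subset is their union. The feature names and score values are hypothetical placeholders, not results from the paper.

```python
def top_k(scores, k):
    """Return the k feature names with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def union_select(score_sets, k):
    """Union of each filter's top-k features, kept in first-seen order."""
    selected = []
    for scores in score_sets:
        for name in top_k(scores, k):
            if name not in selected:
                selected.append(name)
    return selected


# Hypothetical importance scores standing in for the two filters.
rf_scores = {"radius": 0.40, "texture": 0.10, "area": 0.35, "smoothness": 0.15}
lr_scores = {"radius": 0.90, "texture": 0.70, "area": 0.20, "smoothness": 0.10}

selected = union_select([rf_scores, lr_scores], k=2)
```

With these placeholder scores the union keeps `radius`, `area`, and `texture`. A feature ranked highly by either filter survives, which is the main difference from an intersection-based ensemble.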

Identification of Bio-Markers for Breast Cancer Detection through Data Mining Methods

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1141.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 763-769

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Logistic Regression ◽

Feature Selection ◽

Discriminant Analysis ◽

Classification Tree ◽

Partial Least Square ◽

Diagnostic Methods ◽

Support Vector ◽

Breast Cancer Dataset

Worldwide, breast cancer is the leading type of cancer in women, accounting for 25% of all cases. Survival rates in developed countries are higher than those in developing countries. This has led to the importance of computer-aided diagnostic methods for early detection of breast cancer, which eventually reduces the death rate. This paper examines biomarkers that can be used to predict breast cancer from anthropometric data. This experimental study computes and compares various classification models (Binary Logistic Regression, Ball Vector Machine (BVM), C4.5, Partial Least Square (PLS) for Classification, Classification Tree, Cost sensitive Classification Tree, Cost sensitive Decision Tree, Support Vector Machine for Classification, Core Vector Machine, ID3, K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Log-Reg TRIRLS, Multi Layer Perceptron (MLP), Multinomial Logistic Regression (MLR), Naïve Bayes (NB), PLS for Discriminant Analysis, PLS for LDA, Random Tree (RT), Support Vector Machine (SVM)) on the UCI Coimbra breast cancer dataset. Feature selection algorithms (Backward Logit, Fisher Filtering, Forward Logit, ReliefF, Step disc) are applied to find the minimum set of attributes that achieves better accuracy. To validate the accuracy results, jack-knife cross-validation is conducted for the algorithms. The Core Vector Machine classification algorithm outperforms the other nineteen algorithms with an accuracy of 82.76%, sensitivity of 76.92%, and specificity of 87.50% for the three selected attributes, Age, Glucose, and Resistin, using the ReliefF feature selection algorithm.
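
The three reported measures all derive from a confusion matrix, as the sketch below shows. The abstract does not publish the underlying counts. The counts used here (10 true positives, 3 false negatives, 14 true negatives, 2 false positives) are an assumption chosen only because they reproduce the reported percentages. Other counts in the same ratios would do so as well.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # all correct / all cases
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
    }


# Assumed counts, not stated in the source; see the note above.
metrics = diagnostic_metrics(tp=10, fn=3, tn=14, fp=2)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can hide poor detection of the positive (diseased) class when the classes are imbalanced.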

Backpropagation neural network for processing of missing data in breast cancer detection

IRBM ◽

10.1016/j.irbm.2021.06.010 ◽

2021 ◽

Author(s):

L. Zhang ◽

Hongyan Cui ◽

Bingqing Liu ◽

Chao Zhang ◽

Berthold K.P. Horn

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Missing Data ◽

Cancer Detection ◽

Breast Cancer Detection ◽

Backpropagation Neural Network

HIOC: a hybrid imputation method to predict missing values in medical datasets

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-03-2021-0042 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Pooja Rani ◽

Rajneesh Kumar ◽

Anurag Jain

Keyword(s):

Breast Cancer ◽

Heart Disease ◽

Missing Values ◽

Imputation Method ◽

Support Vector ◽

Correct Prediction ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Content Type ◽

Imputation Methods

Purpose: Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.

Design/methodology/approach: The proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. Performance of HIOC has been compared to MICE, KNN, and mean and mode methods. Four classifiers, support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT), have been used to evaluate the performance of imputation methods.

Findings: The results show that HIOC performed efficiently even with a high rate of missing values. It reduced root mean square error (RMSE) by up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy by up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset.

Originality/value: The proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.
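
To make the RMSE comparison concrete, the sketch below scores one of the baseline methods named above, mean imputation, against known ground truth. It does not implement HIOC itself. The function names and the toy column are assumptions made for the example, with `None` marking a missing entry.

```python
import math


def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def imputation_rmse(true_values, imputed, missing_idx):
    """RMSE between true and imputed values at the originally missing positions."""
    squared = [(true_values[i] - imputed[i]) ** 2 for i in missing_idx]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy column: the true values and a copy with two entries removed.
true_column = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_column = [1.0, None, 3.0, None, 5.0]
missing = [i for i, v in enumerate(observed_column) if v is None]

filled = mean_impute(observed_column)
error = imputation_rmse(true_column, filled, missing)
```

A percentage reduction such as those reported in the findings would then be computed as `(baseline_rmse - new_rmse) / baseline_rmse * 100` for two methods evaluated on the same missing positions.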

Efficient feature selection using one-pass generalized classifier neural network and binary bat algorithm with a novel fitness function

Soft Computing ◽

10.1007/s00500-019-04218-6 ◽

2019 ◽

Vol 24 (6) ◽

pp. 4575-4587 ◽

Cited By ~ 2

Author(s):

Akshata K. Naik ◽

Venkatanareshbabu Kuppili ◽

Damodar Reddy Edla

Keyword(s):

Neural Network ◽

Feature Selection ◽

Fitness Function ◽

Bat Algorithm
