Multi-Machine Learning Binary Classification, Feature Selection and Comparison Technique for Predicting Death Events Related to Heart Disease

Abstract We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is common to use word or part-of-speech n-grams. This results in a large set of features, of which only a small subset may be good indicators of sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 percentage points.
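
The multi-class information gain criterion named in this abstract can be illustrated with a minimal sketch. It assumes binary presence/absence features (such as n-gram indicators) and discrete rating classes; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """Entropy reduction over all classes at once when splitting on a binary feature."""
    n = len(labels)
    with_f = [y for f, y in zip(feature_present, labels) if f]
    without_f = [y for f, y in zip(feature_present, labels) if not f]
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


# Hypothetical toy data: three rating classes, two candidate n-gram features.
ratings = [1, 1, 2, 2, 3, 3]
ngram_a = [1, 1, 0, 0, 0, 0]  # appears only in class-1 reviews
ngram_b = [1, 0, 1, 0, 1, 0]  # spread evenly over all classes

# Rank features by information gain, most informative first.
ranked = sorted(
    {"ngram_a": ngram_a, "ngram_b": ngram_b}.items(),
    key=lambda item: information_gain(item[1], ratings),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because the score is computed against the full class distribution, no one-vs-rest decomposition is needed, which is the sense in which the criterion is inherently multi-class.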

Multiple similarly effective solutions exist for biomedical feature selection and classification problems

Scientific Reports ◽

10.1038/s41598-017-13184-8 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 9

Author(s):

Jiamei Liu ◽

Cheng Xu ◽

Weifeng Yang ◽

Yayun Shu ◽

Weiwei Zheng ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Association Studies ◽

Binary Classification ◽

Learning Algorithms ◽

Optimal Solution ◽

Machine Learning Algorithms ◽

Disease Classification ◽

Genome Wide Association Studies ◽

Classification Problems

Abstract Binary classification is widely used to support decisions on various biomedical big data questions, such as clinical drug trials comparing treated participants with controls, and genome-wide association studies (GWASs) comparing participants with and without a phenotype. A machine learning model is trained for this purpose by optimizing its power to discriminate samples from the two groups. However, most classification algorithms tend to generate one locally optimal solution according to the input dataset and the mathematical presumptions about the dataset. Here we demonstrated, from the aspects of both disease classification and feature selection, that multiple different solutions may have similar classification performance. So the existing machine learning algorithms may have ignored a whole school of fish by catching only one good one. Since most existing machine learning algorithms generate a solution by optimizing a mathematical goal, considering both the generated solution and the ignored ones may be essential for understanding the biological mechanisms behind the classification question under investigation.

AN ANALYSIS ON FEATURE SELECTION METHODS, CLUSTERING AND CLASSIFICATION USED IN HEART DISEASE PREDICTION – A MACHINE LEARNING APPROACH

Journal of Critical Reviews ◽

10.31838/jcr.07.06.27 ◽

2020 ◽

Vol 7 (06) ◽

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Learning Approach ◽

Disease Prediction ◽

Selection Methods ◽

Machine Learning Approach ◽

Clustering And Classification

Consensus of Feature Selection Methods and Reduced Generalization Gap Model to Improve Diagnosis of Heart Disease

Journal of Scientific Research ◽

10.3329/jsr.v13i3.53290 ◽

2021 ◽

Vol 13 (3) ◽

pp. 901-913

Author(s):

S. Gupta ◽

R. R. Sedamkar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Missing Values ◽

Performance Metrics ◽

Model Performance ◽

Regression Tree ◽

Classification And Regression Tree ◽

Proposed Model ◽

Time Required

Enhancing the diagnostic ability of Machine Learning models for acceptable prediction in the healthcare community is still a concern. Critical care disease datasets are available online, and researchers have experimented on them with differing numbers of instances and features for similar disease prediction tasks. Further, different Machine Learning (ML) models have different preprocessing requirements. The Framingham heart disease data is multicollinear and has missing values. Thus, the proposed model aims to explore the differential preprocessing needs of ML models, followed by feature selection in consensus with domain experts and feature extraction to resolve multicollinearity issues. Missing values have been imputed differently for each feature. The work also identifies the optimal train set size by plotting a learning curve and choosing the size that gives the minimum generalization gap. When testing is done on this hyperparameter-tuned model, performance is enhanced with respect to the F score weighted by support and stratification, since the data is imbalanced. Experimental results demonstrate improvement in performance metrics, i.e., weighted F score, precision, recall, and accuracy by up to 3%, and F1 score by 8%, for the Logistic Regression Classifier with the proposed model. Further, the time required for hyperparameter tuning is reduced by 50% for tree-based models, particularly Classification and Regression Tree (CART).
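
As an illustration of the support-weighted F score mentioned in this abstract, the following minimal sketch computes a per-class F1 and averages it with weights proportional to each class's share of the true labels. The function names and the toy labels are hypothetical; the paper's own evaluation code is not shown in this listing.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by class support in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(count / n * per_class_f1(y_true, y_pred, label)
               for label, count in support.items())


# Hypothetical imbalanced toy labels: four negatives, two positives.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support keeps the majority class from being under-counted, but it also means a poor minority-class F1 moves the average less than it would under an unweighted (macro) average.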

Sequential Feature Selection and Machine Learning Algorithm-Based Patient’s Death Events Prediction and Diagnosis in Heart Disease

SN Computer Science ◽

10.1007/s42979-020-00370-1 ◽

2020 ◽

Vol 1 (6) ◽

Author(s):

Ritu Aggrawal ◽

Saurabh Pal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Sequential Feature Selection

A Machine Learning Framework for Feature Selection in Heart Disease Classification Using Improved Particle Swarm Optimization with Support Vector Machine Classifier

Programming and Computer Software ◽

10.1134/s0361768818060129 ◽

2018 ◽

Vol 44 (6) ◽

pp. 388-397 ◽

Cited By ~ 7

Author(s):

J. Vijayashree ◽

H. Parveen Sultana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Particle Swarm Optimization ◽

Heart Disease ◽

Support Vector Machine Classifier ◽

Disease Classification ◽

Support Vector ◽

Swarm Optimization ◽

Learning Framework

Review on Feature Selection and Classification using Neuro-Fuzzy Approaches

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2016100102 ◽

2016 ◽

Vol 7 (4) ◽

pp. 28-44 ◽

Cited By ~ 4

Author(s):

Saroj Biswas ◽

Monali Bordoloi ◽

Biswajit Purkayastha

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Computation Time ◽

Recent Survey ◽

Learning Problems ◽

Fuzzy Approach ◽

Redundant Data ◽

Classification Feature ◽

Research Article ◽

Neuro Fuzzy

This research article attempts to provide a recent survey of neuro-fuzzy approaches for feature selection and classification. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and for providing some insight to the user about the symbolic knowledge embedded within the network. The neuro-fuzzy approach combines the merits of neural networks and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction to, and a recent survey of, neuro-fuzzy approaches for feature selection and classification across a wide range of machine learning problems. Some of the existing neuro-fuzzy models are also applied to standard datasets to demonstrate the applicability of neuro-fuzzy approaches.

Coronary Heart Disease Interpretation Based on Deep Neural Network

Computer Engineering and Applications Journal ◽

10.18495/comengapp.v8i1.288 ◽

2019 ◽

Vol 8 (1) ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Annisa Darmawahyuni ◽

Siti Nurmaini ◽

Firdaus Firdaus

Keyword(s):

Neural Network ◽

Machine Learning ◽

Coronary Heart Disease ◽

Heart Disease ◽

Predictive Accuracy ◽

Intelligent System ◽

Learning Algorithm ◽

Binary Classification ◽

Confusion Matrix ◽

Diagnostic Methods

The population with coronary heart disease (CHD) increases every year, with a significant number of deaths. Moreover, mortality from coronary heart disease has the highest prevalence in Indonesia, at 1.5 percent. Misdiagnosis of coronary heart disease is a major factor contributing to death. To prevent misdiagnosis of CHD, an intelligent system has been designed. This paper proposes a simulation that can be used to diagnose coronary heart disease with better performance than traditional diagnostic methods. Some studies have developed systems using a conventional neural network or other machine learning algorithms, but the results did not show good performance. Building on the conventional neural network, a deep neural network (DNN) is proposed as our model in this work. The neural network is known as a supervised learning algorithm that performs well on classification tasks. In the DNN model, binary classification was implemented to diagnose CHD as present (represented by “1”) or absent (represented by “0”). To support performance analysis on the UCI machine learning repository heart disease dataset, the ROC curve and the confusion matrix were used in this work. The overall predictive accuracy, sensitivity, and specificity obtained were 96%, 99%, and 92%, respectively.
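
The relationship between the confusion matrix and the three reported metrics can be sketched as follows. This is a minimal illustration that assumes the abstract's label coding (1 for CHD present, 0 for CHD absent); the function names and the toy predictions are hypothetical and do not reproduce the paper's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy labels: three patients with CHD, two without.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(diagnostic_metrics(y_true, y_pred))
```

Sensitivity and specificity are reported separately because accuracy alone can hide a model that misses positive cases when the two classes are unevenly represented.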

An improved feature selection approach for chronic heart disease detection

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i6.3001 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3501-3506

Author(s):

S. J. Sushma ◽

Tsehay Admassu Assegie ◽

D. C. Vinutha ◽

S. Padmashree

Keyword(s):

Feature Selection ◽

Heart Disease ◽

Binary Classification ◽

Classification Model ◽

Computational Time ◽

Disease Detection ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Detection Model ◽

Sequential Feature Selection

Irrelevant features in a heart disease dataset affect the performance of a binary classification model. Consequently, eliminating irrelevant and redundant features from the training set with a feature selection algorithm significantly improves the performance of the classification model on heart disease detection. Sequential feature selection (SFS) is a successful algorithm that improves the performance of the classification model on heart disease detection and reduces computational time complexity. In this study, the SFS algorithm is implemented to improve classifier performance on heart disease detection by removing irrelevant features and training a model on the optimal features. Furthermore, exhaustive and permutation-based feature selection algorithms are implemented and compared with the SFS algorithm. The implemented and existing feature selection algorithms are evaluated using the real-world Pima Indian heart disease dataset, and the results suggest that the SFS algorithm outperforms the exhaustive and permutation-based feature selection algorithms. Overall, the results look promising, and a more effective heart disease detection model is developed, with an accuracy of 99.3%.
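
The greedy idea behind sequential feature selection can be sketched in a few lines. This is a minimal forward-selection illustration, not the authors' implementation: it assumes a leave-one-out 1-nearest-neighbour accuracy as the scoring function, and the function names and toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen columns."""
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(X):
            if j == i:
                continue  # never use a sample as its own neighbour
            d = sum((row[f] - other[f]) ** 2 for f in subset)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def sequential_forward_selection(X, y, k, score=loo_1nn_accuracy):
    """Greedily add the feature that most improves the score until k are chosen."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: column 0 separates the classes, column 1 does not,
# column 2 is only weakly related to the label.
X = [
    [0.0, 5.0, 0.1],
    [0.1, 1.0, 0.9],
    [0.2, 4.0, 0.2],
    [1.0, 4.5, 0.8],
    [1.1, 1.5, 0.3],
    [1.2, 5.5, 0.7],
]
y = [0, 0, 0, 1, 1, 1]
print(sequential_forward_selection(X, y, k=1))
```

Each round evaluates only the remaining candidates, so selecting k of d features costs on the order of k·d model evaluations, whereas an exhaustive search must score every subset.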
