Hepatitis Detection using Random Forest based on SVM-RFE (Recursive Feature Elimination) Feature Selection and SMOTE

Feature selection is of central importance in many kinds of applications. However, identifying the vital features among the available set for a given problem is still considered a cumbersome process. Researchers have proposed a wide variety of techniques over time, each with its own focus. Familiar existing methods include Particle Swarm Optimisation (PSO), Genetic Algorithm (GA) and Simulated Annealing (SA). While these methods are established, emerging methods provide promising results in comparison. This article analyses such methods, namely LASSO, Boruta, Recursive Feature Elimination (RFE), Regularised Random Forest (RRF) and DALEX. Datasets of varying sizes are considered to assess the importance of feature selection among the available features. The results are also discussed in terms of the available features and the features selected by each method studied.


Recursive Feature Elimination with Ridge Regression (L2) Machine Learning Hybrid Feature Selection Algorithm for Diabetic Prediction using Random Forest Classifier.

10.21203/rs.3.rs-742641/v1 ◽

2021 ◽

Author(s):

K venkatachalam ◽

P Prabhu ◽

B saravana Balaji ◽

Mohamed Abouhawwash ◽

R Rajadevi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Ridge Regression ◽

Feature Selection Method ◽

Selection Method ◽

Recursive Feature Elimination ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Data Set

Abstract In day-to-day life, the incidence of diabetes is increasing because the body is unable to metabolize glucose. Predicting diabetes accurately is an important research area, and many researchers have proposed techniques to predict this disease using data mining and machine learning methods. In prediction, feature selection is one of the key concepts in preprocessing, ensuring that only the features relevant to the disease are used for prediction. This improves prediction accuracy. Selecting the right features from the whole feature set is a complicated process, and many researchers are concentrating on it to produce predictive models with high accuracy. In this proposed work, the wrapper-based feature selection method called Recursive Feature Elimination (RFE) is combined with Ridge regression (L2) to form a hybrid L2-regularized feature selection algorithm that addresses the overfitting problem on the data set. Overfitting is a major problem in feature selection: the model does not fit new data well because the training data is small. Ridge regression is mainly used to overcome the overfitting problem. Once the features are selected using the proposed feature selection method, a random forest classifier is used to classify the data based on the selected features. The proposed work is evaluated on the PIDD data set, and the results are compared with existing algorithms to demonstrate the accuracy of the proposed algorithm. The results show that the proposed algorithm predicts diabetes with higher accuracy than the other existing algorithms.
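The pipeline described in this abstract (RFE driven by an L2-regularized linear model, followed by a random forest on the surviving features) might look roughly like the sketch below. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data as a stand-in for PIDD, and the choices of `RidgeClassifier`, `alpha=1.0` and `n_features_to_select=4` are illustrative assumptions, not values taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the PIDD data set (assumption): 8 features, binary label.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# RFE repeatedly fits the ridge model and drops the feature with the
# smallest coefficient magnitude until 4 features remain (hypothetical count).
model = make_pipeline(
    StandardScaler(),
    RFE(estimator=RidgeClassifier(alpha=1.0), n_features_to_select=4, step=1),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# Boolean mask over the 8 input features: True means the feature was kept.
selected_mask = model.named_steps["rfe"].support_
accuracy = model.score(X_test, y_test)
print("selected features:", selected_mask.nonzero()[0])
print("test accuracy:", round(accuracy, 3))
```

Placing the selector inside the pipeline means the features are chosen from the training split only, so the held-out accuracy is not inflated by selection on test data.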


An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN

10.21437/interspeech.2017-1794 ◽

2017 ◽

Cited By ~ 3

Author(s):

Tin Lay Nwe ◽

Huy Dat Tran ◽

Wen Zheng Terence Ng ◽

Bin Ma

Keyword(s):

Feature Selection ◽

Random Forest ◽

Bhattacharyya Distance ◽

Sound Classification


MetalExplorer, a Bioinformatics Tool for the Improved Prediction of Eight Types of Metal-Binding Sites Using a Random Forest Algorithm with Two-Step Feature Selection

Current Bioinformatics ◽

10.2174/2468422806666160618091522 ◽

2017 ◽

Vol 12 (6) ◽

Cited By ~ 6

Author(s):

Jiangning Song ◽

Chen Li ◽

Cheng Zheng ◽

Jerico Revote ◽

Ziding Zhang ◽

...

Keyword(s):

Feature Selection ◽

Random Forest ◽

Metal Binding ◽

Binding Sites ◽

Random Forest Algorithm ◽

Bioinformatics Tool ◽

Metal Binding Sites


Application of GA Feature Selection on Naive Bayes, Random Forest and SVM for Credit Card Fraud Detection

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317228 ◽

2020 ◽

Author(s):

Yakub K. Saheed ◽

Moshood A. Hambali ◽

Micheal O. Arowolo ◽

Yinusa A. Olasupo

Keyword(s):

Feature Selection ◽

Random Forest ◽

Credit Card ◽

Naive Bayes ◽

Fraud Detection ◽

Naïve Bayes ◽

Credit Card Fraud


PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features

International Journal of Molecular Sciences ◽

10.3390/ijms22052704 ◽

2021 ◽

Vol 22 (5) ◽

pp. 2704

Author(s):

Andi Nur Nilamyani ◽

Firda Nurul Auliah ◽

Mohammad Ali Moni ◽

Watshara Shoombuatong ◽

Md Mehedi Hasan ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Web Application ◽

Computational Prediction ◽

Vital Role ◽

Machine Learning Algorithms ◽

Recursive Feature Elimination ◽

Post Translational Modification ◽

Multiple Sequence ◽

Sequence Features

Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before the biological experimentation. Herein, we developed a computational predictor PredNTS by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under a curve (AUC) of 0.910 using five-fold cross validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. The PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web-application with the curated datasets of the PredNTS is publicly available.
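The general shape of this approach (per-encoding RFE with a random forest, one RF model per encoding, then a linear combination of the RF probability scores evaluated by five-fold cross-validation) can be sketched as follows. This is an illustration under assumptions, not the PredNTS code: the two "encodings" are arbitrary column blocks of synthetic data, the equal weights are hypothetical, and the feature counts are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Synthetic data; each column block plays the role of one sequence encoding (assumption).
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           random_state=0)
encodings = {"encoding_a": X[:, :20], "encoding_b": X[:, 20:]}

# Fixed seed so every encoding is evaluated on the same five folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

def rf_scores(X_enc, y, cv):
    """Out-of-fold positive-class probabilities from RFE + RF on one encoding."""
    pipe = make_pipeline(
        # RFE ranks features by the random forest's feature_importances_.
        RFE(RandomForestClassifier(n_estimators=50, random_state=0),
            n_features_to_select=8, step=4),
        RandomForestClassifier(n_estimators=50, random_state=0),
    )
    return cross_val_predict(pipe, X_enc, y, cv=cv, method="predict_proba")[:, 1]

scores = {name: rf_scores(X_enc, y, cv) for name, X_enc in encodings.items()}

# Hypothetical equal weights; the abstract does not state the weights used.
weights = {name: 1.0 / len(scores) for name in scores}
combined = sum(weights[name] * scores[name] for name in scores)

auc = roc_auc_score(y, combined)
print("combined five-fold AUC:", round(auc, 3))
```

Because the weights sum to one, the combined score stays in the unit interval and can be thresholded like a single-model probability.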


Train delays prediction based on feature selection and random forest

2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) ◽

10.1109/itsc45102.2020.9294653 ◽

2020 ◽

Author(s):

Yuanyuan Ji ◽

Wei Zheng ◽

Hairong Dong ◽

Pengfei Gao

Keyword(s):

Feature Selection ◽

Random Forest


Radiogenomic modeling predicts survival-associated prognostic groups in glioblastoma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab004 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

Abdullah Feroze ◽

Eric C Holland ◽

Linda Shapiro ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Area Under The Curve ◽

Selection Method ◽

Recursive Feature Elimination ◽

Signal Abnormality ◽

Mri Features ◽

Mri Scans

Abstract Background: Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods: Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results: We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions: We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
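The final modelling step named in the Results (ridge logistic regression on a 15-dimensional PCA embedding, scored by cross-validated AUC) could be sketched as below. This covers only that last step under assumptions: the authors' novel feature selection method is not described here and is not implemented, the data are synthetic with 46 samples to mirror the cohort size, and the 200 input columns, the regularization strength and the five-fold split are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 46 "patients", 200 already-selected features (assumption).
X, y = make_classification(n_samples=46, n_features=200, n_informative=10,
                           random_state=0)

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=15, random_state=0),       # 15-dimensional embedding
    # LogisticRegression applies an L2 (ridge) penalty by default;
    # C=1.0 is an illustrative strength, not the paper's value.
    LogisticRegression(C=1.0, max_iter=1000),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("mean AUC:", round(float(np.mean(fold_aucs)), 3),
      "+/-", round(float(np.std(fold_aucs)), 3))
```

Fitting the scaler and PCA inside the pipeline keeps the embedding from seeing the held-out fold, which matters with so few samples relative to the number of features.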
