Parkinson disease prediction using feature selection technique in machine learning

Background: Anti-inflammatory peptides (AIPs) are potent therapeutic agents for inflammatory and autoimmune disorders because of their high specificity and minimal toxicity under normal conditions. Identifying AIPs is therefore an important step toward discovering novel and efficient AIP-based therapeutics. Recently, three computational approaches based on machine learning algorithms have been developed that can effectively identify potential AIPs. However, these three existing predictors still face several challenges. Objective: A novel machine learning algorithm is needed to improve AIP prediction accuracy. Methods: This study attempts to improve the recognition of AIPs by employing multiple primary sequence-based feature descriptors and an efficient feature selection strategy. We rank features with four enhanced minimal redundancy maximal relevance (emRMR) methods and then apply wrapper methods, built on seven different classifiers and the sequential forward selection (SFS) algorithm, to obtain a hybrid feature selection technique, emRMR-SFS, that optimizes the feature vectors. Furthermore, by evaluating seven classifiers trained with the optimal feature subset, we developed an extremely randomized tree (ERT) based predictor, named PREDAIP, for identifying AIPs. Results: We systematically compared the performance of PREDAIP with the existing tools on an independent test dataset, and the comparison demonstrates the effectiveness of PREDAIP. The correlation criterion used in emRMR affects which optimal feature subset is selected at the SFS-wrapper stage, which justifies considering different correlation criteria in emRMR. Conclusion: We expect that PREDAIP will be useful for the high-throughput prediction of AIPs and the development of AIP-based therapeutics.
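
The two-stage idea described in the Methods (a filter ranking followed by a forward wrapper search) can be illustrated with a minimal sketch. This is not the authors' emRMR-SFS: the abstract does not specify the correlation criteria or the search details, so the sketch assumes Pearson correlation as the single criterion, a simplified forward pass over the ranking, and a leave-one-out 1-nearest-neighbour score as a stand-in for the seven classifiers. All function names and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def mrmr_rank(columns, labels):
    """Filter stage: greedily order features by relevance minus mean redundancy."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    remaining, ranked = list(range(len(columns))), []

    def score(j):
        if not ranked:
            return relevance[j]
        redundancy = mean([abs(pearson(columns[j], columns[k])) for k in ranked])
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def loo_1nn_accuracy(columns, labels, subset):
    """Assumed wrapper score: leave-one-out accuracy of a 1-nearest-neighbour rule."""
    rows = [[columns[j][i] for j in subset] for i in range(len(labels))]
    hits = 0
    for i, row in enumerate(rows):
        others = [k for k in range(len(rows)) if k != i]
        nearest = min(others, key=lambda k: sum((a - b) ** 2 for a, b in zip(row, rows[k])))
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def forward_select(ranked, evaluate):
    """Wrapper stage: walk the ranking, keeping a feature only if the score improves."""
    selected, best = [], float("-inf")
    for j in ranked:
        trial = evaluate(selected + [j])
        if trial > best:
            selected, best = selected + [j], trial
    return selected, best


# Toy data (made up for illustration): three features, six samples.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # informative
    [0.2, 0.4, 0.0, 1.8, 2.0, 1.6],  # scaled copy of the first, hence redundant
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # weakly informative
]
ranked = mrmr_rank(columns, labels)
selected, accuracy = forward_select(ranked, lambda s: loo_1nn_accuracy(columns, labels, s))
print(ranked, selected, accuracy)
```

Swapping `pearson` for another correlation measure changes the ranking handed to the wrapper stage, which is the sensitivity the Results describe.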

Skin disease prediction using ensemble methods and a new hybrid feature selection technique

Iran Journal of Computer Science ◽ 10.1007/s42044-020-00058-y ◽ 2020 ◽ Vol 3 (4) ◽ pp. 207-216

Author(s): Anurag Kumar Verma ◽ Saurabh Pal ◽ B. B. Tiwari

Keyword(s): Feature Selection ◽ Skin Disease ◽ Ensemble Methods ◽ Disease Prediction ◽ Feature Selection Technique ◽ Selection Technique

Credit Decision Support Based on Real Set of Cash Loans Using Integrated Machine Learning Algorithms

Electronics ◽ 10.3390/electronics10172099 ◽ 2021 ◽ Vol 10 (17) ◽ pp. 2099

Author(s): Paweł Ziemba ◽ Jarosław Becker ◽ Aneta Becker ◽ Aleksandra Radomska-Zalas ◽ Mateusz Pawluk ◽ ...

Keyword(s): Machine Learning ◽ Feature Selection ◽ Decision Support ◽ Binary Classification ◽ Machine Learning Algorithms ◽ Superior Performance ◽ Feature Selection Technique ◽ Selection Technique ◽ Feature Discretization ◽ The Impact

One of the important research problems for financial institutions is the assessment of credit risk and the decision whether to grant or refuse a loan. Machine learning-based methods are increasingly employed to solve such problems. However, selecting an appropriate feature selection technique, sampling mechanism, and/or classifier for credit decision support is very challenging and can affect the quality of the loan recommendations. To address this task, this article examines the effectiveness of various data science techniques for credit decision support. In particular, a processing pipeline was designed that consists of methods for data resampling, feature discretization, feature selection, and binary classification, and we suggest building decision models that leverage pertinent methods for each of these stages. The feasibility of the selected models was analyzed through rigorous experiments on real data describing clients' ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance in solving the underlying problem.
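
The abstract does not say which resampling or discretization methods the pipeline uses. As a rough illustration of what those two preprocessing stages can look like, the sketch below assumes random oversampling of the minority class and equal-width binning; both choices, the function names, and the toy loan amounts are assumptions made for illustration, not the authors' configuration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest one (assumed resampling strategy)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def equal_width_bins(values, n_bins=3):
    """Map each numeric value to a bin index in [0, n_bins - 1]
    using bins of equal width (assumed discretization strategy)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Toy example: three repaid loans (label 0) and one default (label 1).
rows = [[1000], [2000], [3000], [9000]]
labels = [0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
amount_bins = equal_width_bins([r[0] for r in rows], n_bins=4)
print(Counter(balanced_labels), amount_bins)
```

In a real experiment, resampling would be applied to the training split only, so that duplicated rows do not leak into the evaluation data.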

Analysis of Machine Learning Algorithms with Feature Selection for Intrusion Detection using UNSW-NB15 Dataset

International Journal of Network Security & Its Applications ◽ 10.5121/ijnsa.2021.13102 ◽ 2021 ◽ Vol 13 (1) ◽ pp. 21-31

Author(s): Geeta Kocher ◽ Gulshan Kumar

Keyword(s): Machine Learning ◽ Feature Selection ◽ Intrusion Detection ◽ Stochastic Gradient Descent ◽ Feature Selection Technique ◽ Selection Technique ◽ Machine Learning Classifiers ◽ Network Intrusion ◽ Learning Classifiers ◽ Positive Rate

In recent times, various machine learning classifiers have been used to improve network intrusion detection, and researchers have proposed many solutions for intrusion detection in the literature. However, these classifiers are typically trained on older datasets, which limits their detection accuracy, so there is a need to train them on the latest dataset. In this paper, UNSW-NB15, the latest dataset, is used to train machine learning classifiers. The selected classifiers, K-Nearest Neighbors (KNN), Stochastic Gradient Descent (SGD), Random Forest (RF), Logistic Regression (LR), and Naïve Bayes (NB), are drawn from the taxonomy of classifiers based on lazy and eager learners. Chi-Square, a filter-based feature selection technique, is applied to the UNSW-NB15 dataset to reduce irrelevant and redundant features. The performance of the classifiers is measured in terms of Accuracy, Mean Squared Error (MSE), Precision, Recall, F1-Score, True Positive Rate (TPR), and False Positive Rate (FPR), with and without the feature selection technique, and a comparative analysis of these classifiers is carried out.
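
For readers unfamiliar with the chi-square filter and the reported metrics, the following sketch shows one common formulation of each. It assumes categorical features scored with the classic contingency-table chi-square statistic and binary labels encoded as 0 and 1; the paper's exact procedure is not given in the abstract, and the feature names and data below are invented.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of a categorical feature against the class label."""
    n = len(labels)
    feature_counts, label_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in feature_counts.items():
        for y, ny in label_counts.items():
            expected = nf * ny / n  # count expected if feature and label were independent
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def select_k_best(columns, labels, k):
    """Keep the k feature names with the largest chi-square statistic."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def binary_metrics(y_true, y_pred):
    """Metrics derived from the confusion matrix, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0

    precision, recall = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "mse": ratio(fp + fn, len(pairs)),  # with 0/1 labels, MSE equals the error rate
        "precision": precision,
        "recall": recall,  # recall is the same quantity as TPR
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }


# Toy data with hypothetical feature names.
labels = [1, 1, 0, 0]
columns = {
    "feature_a": ["x", "x", "y", "y"],  # perfectly aligned with the label
    "feature_b": ["p", "q", "p", "q"],  # independent of the label
}
kept = select_k_best(columns, labels, k=1)
metrics = binary_metrics(y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 1])
print(kept, metrics)
```

Because the filter scores each feature independently of any classifier, the same reduced feature set can be reused across all five classifiers being compared.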

Performance analysis of machine learning models for intrusion detection system using Gini Impurity-based Weighted Random Forest (GIWRF) feature selection technique

Cybersecurity ◽ 10.1186/s42400-021-00103-8 ◽ 2022 ◽ Vol 5 (1)

Author(s): Raisa Abedin Disha ◽ Sajjad Waheed

Keyword(s): Machine Learning ◽ Feature Selection ◽ Random Forest ◽ Performance Analysis ◽ Intrusion Detection ◽ Intrusion Detection System ◽ Detection System ◽ Experimental Result ◽ Feature Selection Technique ◽ Selection Technique

To protect the network, resources, and sensitive data, the intrusion detection system (IDS) has become a fundamental component of organizations, preventing cybercriminal activities. Several approaches have been introduced and implemented so far to thwart malicious activities. Given the effectiveness of machine learning (ML) methods, the proposed approach applies several ML models to intrusion detection. To evaluate the performance of the models, the UNSW-NB 15 and Network TON_IoT datasets were used for offline analysis. Both datasets are newer than the NSL-KDD dataset and represent modern-day attacks. The performance analysis was carried out by training and testing Decision Tree (DT), Gradient Boosting Tree (GBT), Multilayer Perceptron (MLP), AdaBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models on the binary classification task. Because the performance of an IDS deteriorates with a high-dimensional feature vector, an optimal set of features was selected through a Gini Impurity-based Weighted Random Forest (GIWRF) model used as an embedded feature selection technique. This technique employs Gini impurity as the splitting criterion of the trees and adjusts the weights of the two classes of the imbalanced data so that the learning algorithm accounts for the class distribution. Based on the importance scores, 20 features were selected from the UNSW-NB 15 dataset and 10 features from the Network TON_IoT dataset. The experimental results revealed that DT performed better with the feature selection technique than the other models trained in this experiment. Moreover, the proposed GIWRF-DT outperformed other existing methods surveyed in the literature in terms of the F1 score.
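
The core quantity behind this kind of importance score is the class-weighted Gini impurity and the decrease in it produced by a split. The sketch below computes that decrease for a single threshold split per feature, a deliberately reduced stand-in for a full weighted random forest. It assumes "balanced" class weights (total samples divided by the number of classes times the class count), which the abstract does not confirm, and the feature names and data are invented.

```python
from collections import Counter


def balanced_weights(labels):
    """Assumed weighting: n_samples / (n_classes * class_count) for each class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


def weighted_gini(labels, weights):
    """Return (Gini impurity, total weight) with each sample weighted by its class."""
    mass = Counter()
    for y in labels:
        mass[y] += weights[y]
    total = sum(mass.values())
    if total == 0:
        return 0.0, 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values()), total


def best_split_gain(column, labels, weights):
    """Largest weighted impurity decrease over all thresholds of one numeric feature."""
    parent, total = weighted_gini(labels, weights)
    best = 0.0
    for threshold in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= threshold]
        right = [y for x, y in zip(column, labels) if x > threshold]
        g_left, w_left = weighted_gini(left, weights)
        g_right, w_right = weighted_gini(right, weights)
        best = max(best, parent - (w_left * g_left + w_right * g_right) / total)
    return best


def top_k_features(columns, labels, k):
    """Rank features by their best single-split gain and keep the top k names."""
    weights = balanced_weights(labels)
    gains = {name: best_split_gain(col, labels, weights) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Imbalanced toy data: six normal records (0) and two attacks (1).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
columns = {
    "separating": [1, 2, 3, 4, 5, 6, 10, 11],
    "noisy": [1, 10, 2, 11, 3, 5, 4, 6],
}
weights = balanced_weights(labels)
print(top_k_features(columns, labels, k=1))
```

With these weights the two classes contribute equal total mass, so the parent impurity is 0.5 even though attacks are only a quarter of the samples; without weighting, the majority class would dominate the impurity estimate.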
