A comprehensive investigation of the impact of feature selection techniques on crashing fault residence prediction models

Each Android application requests a set of permissions at installation time, and these permissions can serve as features for permission-based identification of Android malware. Recently, ensemble feature selection techniques have received increasing attention over conventional techniques in a range of applications. In this work, a cluster-based voted ensemble feature selection technique, combining five base wrapper approaches from R libraries, is proposed for identifying the most prominent set of features for predictive modeling of Android malware. The proposed method preserves both desirable properties of an ensemble feature selector: accuracy and diversity. In addition, five different data partitioning ratios are considered, and the impact of those ratios on the predictive model is measured using the coefficient of determination (R-squared) and the root mean square error. The proposed strategy yields significantly better outcomes in terms of the number of selected features and classification accuracy.
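The voting step of an ensemble feature selector can be illustrated with a minimal sketch. It assumes that each base selector returns a list of chosen permission names and that a feature is kept when at least `min_votes` selectors agree on it. The function name, the threshold, and the permission lists are hypothetical, and the clustering stage mentioned in the abstract is not modeled here.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep every feature chosen by at least `min_votes` base selectors."""
    # Each selector contributes at most one vote per feature.
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of five base selectors (illustrative values only).
base_selections = [
    ["SEND_SMS", "READ_CONTACTS", "INTERNET"],
    ["SEND_SMS", "READ_SMS", "INTERNET"],
    ["SEND_SMS", "READ_CONTACTS", "RECEIVE_BOOT_COMPLETED"],
    ["READ_SMS", "SEND_SMS", "READ_CONTACTS"],
    ["INTERNET", "SEND_SMS", "READ_PHONE_STATE"],
]

selected = vote_features(base_selections, min_votes=3)
print(selected)
```

Raising `min_votes` trades coverage for agreement: a higher threshold keeps fewer features, each backed by more of the base selectors.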


Using Textual and Economic Features to Predict the RMB Exchange Rate

10.47260/amae/1168 ◽

2021 ◽

pp. 139-158

Author(s):

Yi-Chen Chung ◽

Hsien-Ming Chou ◽

Chih-Neng Hung ◽

Chihli Hung

Keyword(s):

Feature Selection ◽

Exchange Rate ◽

Ensemble Learning ◽

Prediction Models ◽

Gaussian Process Regression ◽

Support Vector ◽

RMB Exchange Rate ◽

The Exchange Rate ◽

Basic Prediction ◽

Feature Selection Techniques

This research proposes an integrated framework that uses textual and economic features to predict the exchange rate of the TWD (Taiwan dollar) against the RMB (Chinese Renminbi). The exchange rate is affected by the current economic situation and by expectations for the future economic climate. Exchange rate forecasting studies focus mainly on overall economic indices and the actual exchange rate, but overlook the influence of news. This research considers both textual and economic factors and builds three basic prediction models, i.e., multiple linear regression (MLR), support vector regression (SVR), and Gaussian process regression (GPR), for the prediction of the RMB exchange rate. In addition to the three basic prediction models, this research uses ensemble learning and feature selection techniques to improve prediction performance. Our experiments demonstrate that textual features also play an important role in predicting the RMB exchange rate. The SVR model is shown to outperform the other models, and the MLR model is shown to perform worst. The ensemble of the three basic models performs better than its individual counterparts. Finally, the models that use feature selection techniques generally demonstrate improved results, and different feature selection techniques are shown to suit different prediction models. JEL classification numbers: D80, F31, F47. Keywords: Exchange rate prediction, Text mining, Ensemble learning, Time series forecasting.


Feature Selection Techniques for the Analysis of Discriminative Features in Temporal and Frontal Lobe Epilepsy: A Comparative Study

The Open Biomedical Engineering Journal ◽

10.2174/1874120702115010001 ◽

2021 ◽

Vol 15 (1) ◽

pp. 1-15

Author(s):

Behrooz Abbaszadeh ◽

Cesar Alexandre Domingues Teixeira ◽

Mustapha C.E. Yagoub

Keyword(s):

Feature Selection ◽

Frontal Lobe ◽

Time Domain ◽

Life Quality ◽

Epileptic Seizures ◽

AR Model ◽

Model Parameters ◽

Frontal Lobe Epilepsy ◽

The Impact ◽

Feature Selection Techniques

Background: Because about 30% of epileptic patients suffer from refractory epilepsy, an efficient automatic seizure prediction tool is in great demand to improve their quality of life. Methods: In this work, time-domain features that discriminate between preictal and interictal states were extracted from the intracranial electroencephalograms of twelve patients, i.e., six with temporal and six with frontal lobe epilepsy. The performance of three types of feature selection methods was compared using the Matthews correlation coefficient (MCC). Results: Kruskal-Wallis, a non-parametric approach, was found to perform better than the other approaches because it is simple, consumes fewer resources, and maintains the highest MCC score. The impact of dividing the electroencephalogram signals into various sub-bands was investigated as well. The strong performance of Kruskal-Wallis suggests that univariate features such as complexity and interquartile ratio (IQR), along with autoregressive (AR) model parameters and the maximum (MAX) cross-correlation, are important for efficiently predicting epileptic seizures. Conclusion: The proposed approach has the potential to be implemented on a low-power device by considering a few simple time-domain characteristics for a specific sub-band. Given that there is little literature on frontal lobe epilepsy, the results of this work can be considered promising.
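For reference, the MCC used above as the comparison metric can be computed directly from the four confusion-matrix counts. This is a minimal sketch of the standard formula; the function name is chosen for illustration, and returning 0.0 when the denominator vanishes is a common convention rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: undefined cases (an empty row or column) score 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))  # perfect prediction
print(matthews_corrcoef(tp=5, tn=5, fp=5, fn=5))  # no better than chance
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it less misleading than accuracy when preictal and interictal segments are imbalanced.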


The Impact of Feature Selection Techniques on the Performance of Predicting Parkinson’s Disease

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2018.11.02 ◽

2018 ◽

Vol 10 (11) ◽

pp. 14-29

Author(s):

Abdullah Al Imran ◽


Ananya Rahman ◽

Humayoun Kabir ◽

Shamsur Rahim

Keyword(s):

Parkinson’s Disease ◽

Feature Selection ◽

The Impact ◽

Feature Selection Techniques


A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models

2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) ◽

10.1109/msr.2017.18 ◽

2017 ◽

Cited By ~ 17

Author(s):

Baljinder Ghotra ◽

Shane McIntosh ◽

Ahmed E. Hassan

Keyword(s):

Feature Selection ◽

Large Scale ◽

Defect Classification ◽

Classification Models ◽

Large Scale Study ◽

The Impact ◽

Feature Selection Techniques


Supervised Machine Learning Algorithms for Bioelectromagnetics: Prediction Models and Feature Selection Techniques Using Data from Weak Radiofrequency Radiation Effect on Human and Animals Cells

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17124595 ◽

2020 ◽

Vol 17 (12) ◽

pp. 4595

Author(s):

Malka N. Halgamuge

Keyword(s):

Machine Learning ◽

Experimental Data ◽

Feature Selection ◽

Exposure Time ◽

New Technologies ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Animal Cells ◽

Feature Selection Techniques

The emergence of new technologies to incorporate and analyze data with high-performance computing has expanded our capability to predict incidents accurately. Supervised machine learning (ML) can be utilized for fast and consistent prediction and to better capture the underlying pattern of the data. We develop a prediction strategy, for the first time, using supervised ML to observe the possible impact of weak radiofrequency electromagnetic fields (RF-EMF) on human and animal cells without performing in-vitro laboratory experiments. We extracted laboratory experimental data from 300 peer-reviewed scientific publications (1990–2015) describing 1127 experimental case studies of the response of human and animal cells to RF-EMF. We used domain knowledge, Principal Component Analysis (PCA), and the Chi-squared feature selection technique to select six optimal features for computation and cost-efficiency. We then developed grouping or clustering strategies to allocate these selected features into five different laboratory experiment scenarios. The dataset was tested with ten different classifiers, and the outputs were estimated using the k-fold cross-validation method. Assessing a classifier’s prediction performance is critical for judging its suitability. Hence, a detailed comparison was made of the percentage of model accuracy (PCC), Root Mean Squared Error (RMSE), precision, sensitivity (recall), 1 − specificity, Area under the ROC Curve (AUC), and precision-recall (PRC Area) for each classification method. Our findings suggest that the Random Forest algorithm excels in all groups across all performance measures, with AUC = 0.903 at k-fold = 60. A robust correlation was observed between the specific absorption rate (SAR) and frequency, and between cumulative effect or exposure time and SAR×time (the impact of accumulated SAR within the exposure time) of RF-EMF. In contrast, the relationship between frequency and exposure time was not significant. In the future, with more experimental data, the sample size can be increased, leading to more accurate work.
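The k-fold cross-validation procedure mentioned above splits the samples into k folds and holds each fold out once for testing. The sketch below generates the index splits only. It assumes contiguous, unshuffled folds for readability, whereas practical pipelines usually shuffle or stratify first. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging a metric such as AUC over the folds uses every case for evaluation once.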


Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Symmetry ◽

10.3390/sym12071147 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1147 ◽

Cited By ~ 2

Author(s):

Abdullateef O. Balogun ◽

Shuib Basri ◽

Saipunidzam Mahamad ◽

Said J. Abdulkadir ◽

Malek A. Almomani ◽

...

Keyword(s):

Feature Selection ◽

Empirical Study ◽

Prediction Models ◽

Empirical Studies ◽

Experimental Results ◽

Defect Prediction ◽

Software Defect Prediction ◽

Search Methods ◽

Software Defect ◽

The Impact

Feature selection (FS) is a feasible solution for mitigating the high-dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). Moreover, many empirical studies on the impact and effectiveness of FS methods on SDP models report contradictory experimental results and inconsistent findings. These contradictions can be attributed to study limitations such as small datasets, limited FS search methods, and unsuitable prediction models within the scope of the respective studies. It is hence critical to conduct an extensive empirical study to address these contradictions, to guide researchers, and to buttress the scientific tenacity of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The resulting prediction models were evaluated based on accuracy and AUC values. Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used for statistical ranking of the studied FS methods. The experimental results showed that there is no single best FS method, as their respective performance depends on the choice of classifiers, performance evaluation metrics, and dataset. However, we recommend the use of statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, respectively, in SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended, as it outperforms the conventional SFS and LHS-based WFS methods.
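A filter feature ranking method scores each feature independently of any classifier and then sorts the features by score. The sketch below uses the absolute Pearson correlation with the defect label as the score. That choice is an assumption made for illustration and is not claimed to be one of the 46 methods studied. The metric names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def rank_features(columns, labels):
    """Rank feature names by |correlation with the label|, best first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical module metrics; label 1 marks a defective module.
columns = {
    "loc": [10, 200, 35, 400, 50, 320],
    "constant": [1, 1, 1, 1, 1, 1],
    "num_comments": [5, 3, 6, 4, 5, 4],
}
labels = [0, 1, 0, 1, 0, 1]

ranking = rank_features(columns, labels)
print(ranking)
```

A ranking alone does not say how many features to keep; a cut-off (top-k or a score threshold) still has to be chosen separately.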


Impact of restricted forward greedy feature selection technique on bug prediction

10.7287/peerj.preprints.1411v1 ◽

2015 ◽

Author(s):

Muthukumaran Kasinathan ◽

Lalita Bhanu Murthy Neti

Keyword(s):

Feature Selection ◽

Prediction Models ◽

Source Code ◽

Feature Selection Technique ◽

Code Metrics ◽

Change Metrics ◽

Misclassification Rates ◽

The Individual ◽

The Impact ◽

Source Code Metrics

Several change metrics and source code metrics have been introduced and shown to be effective in bug prediction. Researchers have performed comparative studies of bug prediction models built using the individual metrics as well as combinations of these metrics. In this paper, we investigate the impact of feature selection on bug prediction models by analyzing the misclassification rates of these models with and without feature selection in place. We conduct our experiments on five open source projects, considering numerous change metrics and source code metrics. This study also aims to identify a reliable subset of metrics that is common to all projects.
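Forward greedy selection guided by the misclassification rate can be sketched as follows. This is the plain greedy scheme, not the restricted variant named in the title, whose details the abstract does not give. The leave-one-out 1-nearest-neighbour classifier and the toy data are assumptions chosen only to keep the example self-contained.

```python
def loo_error(rows, labels, features):
    """Leave-one-out misclassification rate of a 1-nearest-neighbour classifier."""
    mistakes = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        mistakes += labels[nearest] != labels[i]
    return mistakes / len(rows)


def forward_greedy(rows, labels, n_features):
    """Add the feature that lowers the error most; stop when nothing improves."""
    selected, best_error = [], float("inf")
    remaining = set(range(n_features))
    while remaining:
        candidate, error = min(
            ((f, loo_error(rows, labels, selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if error >= best_error:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_error = error
    return selected, best_error


# Toy data: column 0 separates the classes, column 1 is large-scale noise.
rows = [[1.0, 10], [1.2, 50], [0.9, 90], [3.0, 12], [3.1, 52], [2.9, 88]]
labels = [0, 0, 0, 1, 1, 1]

selected, error = forward_greedy(rows, labels, n_features=2)
print("all features:", loo_error(rows, labels, [0, 1]))
print("selected:", selected, "error:", error)
```

Comparing the error on all features with the error on the selected subset mirrors the with-and-without comparison described in the abstract.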


A STUDY OF SOFTWARE METRIC SELECTION TECHNIQUES: STABILITY ANALYSIS AND DEFECT PREDICTION MODEL PERFORMANCE

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013600105 ◽

2013 ◽

Vol 22 (05) ◽

pp. 1360010 ◽

Cited By ~ 6

Author(s):

HUANJING WANG ◽

TAGHI M. KHOSHGOFTAAR ◽

QIANHUI (ALTHEA) LIANG

Keyword(s):

Feature Selection ◽

Prediction Model ◽

Prediction Models ◽

Model Performance ◽

Classification Model ◽

Defect Prediction ◽

Feature Selection Technique ◽

Selection Technique ◽

Metric Selection ◽

Feature Selection Techniques

Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very little research has considered both stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability, while two different versions of ReliefF show the most stability, followed by the PRC- and AUC-based threshold-based feature selection techniques. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while stability and classification performance are correlated for some rankers, this is not true for others, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
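Stability of a feature selection technique can be quantified by comparing the subsets it picks on perturbed versions of a dataset. The sketch below uses the mean pairwise Jaccard similarity. The abstract does not name the stability measure the authors used, so this choice is an assumption. The metric names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(subsets):
    """Mean pairwise Jaccard similarity across runs of one selector."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Subsets chosen on three perturbed copies of a dataset (illustrative only).
stable_runs = [["loc", "cbo", "wmc"], ["loc", "cbo", "wmc"], ["loc", "cbo", "rfc"]]
unstable_runs = [["loc", "cbo"], ["wmc", "rfc"], ["dit", "noc"]]

print(stability(stable_runs))
print(stability(unstable_runs))
```

A score near 1 means the selector returns almost the same subset after modules are added or removed; it says nothing about how well the resulting model predicts, which is why the two criteria have to be assessed separately.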


Empirical assessment of feature selection techniques in defect prediction models using web applications

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-18473 ◽

2019 ◽

Vol 36 (6) ◽

pp. 6567-6578

Author(s):

Ruchika Malhotra ◽

Anjali Sharma

Keyword(s):

Feature Selection ◽

Web Applications ◽

Prediction Models ◽

Defect Prediction ◽

Empirical Assessment ◽

Defect Prediction Models ◽

Feature Selection Techniques



An Ensemble Voted Feature Selection Technique for Predictive Modeling of Malwares of Android
