The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving software product quality before release through periodic testing is one of the most expensive activities in software projects. Because resources for testing modules are limited, it is important to identify fault-prone modules and concentrate testing resources on them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies in this field seek the connection between the features of software modules and their fault-proneness. Some features used by predictive algorithms are ineffective and reduce prediction accuracy, so feature selection methods are widely used to improve the performance of models that predict fault-prone modules. In this study, we propose a feature selection method that combines several filter feature selection methods into a fused weighted filter method. The proposed method improved both the convergence rate of feature selection and the prediction accuracy. Results on ten datasets from NASA and PROMISE indicate that the proposed method improves the accuracy and convergence of software fault prediction.
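
The abstract does not say which filters are fused or how the weights are chosen. As a rough illustration only, the sketch below assumes each filter yields one relevance score per feature, rescales the scores to [0, 1], and takes a weighted average. The function names, the min-max rescaling, and the example weights are all assumptions, not the authors' method.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_filter_scores(score_lists, weights):
    """Weighted average of min-max normalised filter scores, one value per feature.

    score_lists: one list of per-feature scores for each filter (hypothetical input).
    weights:     one non-negative weight per filter (assumed to be given).
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per filter")
    normalised = [min_max(scores) for scores in score_lists]
    total = sum(weights)
    n_features = len(score_lists[0])
    return [
        sum(w * norm[j] for w, norm in zip(weights, normalised)) / total
        for j in range(n_features)
    ]


def top_k(fused, k):
    """Indices of the k features with the highest fused score."""
    return sorted(range(len(fused)), key=lambda j: fused[j], reverse=True)[:k]


# Two hypothetical filters scoring three features on different scales.
filter_a = [0.0, 1.0, 0.5]
filter_b = [10.0, 0.0, 5.0]
fused = fuse_filter_scores([filter_a, filter_b], weights=[3.0, 1.0])
selected = top_k(fused, 2)
```

Rescaling before fusing keeps a filter with a large numeric range from dominating the weighted average.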

A Hybrid Feature Selection Method for Software Fault Prediction

IEICE Transactions on Information and Systems ◽

10.1587/transinf.2019edp7033 ◽

2019 ◽

Vol E102.D (10) ◽

pp. 1966-1975

Author(s):

Yiheng JIAN ◽

Xiao YU ◽

Zhou XU ◽

Ziyi MA

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault

FECS: A Cluster Based Feature Selection Method for Software Fault Prediction with Noises

2015 IEEE 39th Annual Computer Software and Applications Conference ◽

10.1109/compsac.2015.66 ◽

2015 ◽

Cited By ~ 9

Author(s):

Wangshu Liu ◽

Shulong Liu ◽

Qing Gu ◽

Xiang Chen ◽

Daoxu Chen

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault

A Novel Feature Selection Method for Software Fault Prediction Model

2019 Annual Reliability and Maintainability Symposium (RAMS) ◽

10.1109/rams.2019.8768923 ◽

2019 ◽

Author(s):

Can Cui ◽

Bin Liu ◽

Guoqi Li

Keyword(s):

Feature Selection ◽

Prediction Model ◽

Feature Selection Method ◽

Selection Method ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault

An AIS based feature selection method for software fault prediction

2014 Iranian Conference on Intelligent Systems (ICIS) ◽

10.1109/iraniancis.2014.6802598 ◽

2014 ◽

Cited By ~ 1

Author(s):

A. Soleimani ◽

F. Asdaghi

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault

A Comparative Analysis of Filter-Based Feature Selection Methods for Software Fault Prediction

Research and Development on Information and Communication Technology ◽

10.32913/mic-ict-research-vn.v2021.n1.969 ◽

2021 ◽

pp. 1-7

Author(s):

Thị Minh Phương Hà ◽

Thi My Hanh Le ◽

Thanh Binh Nguyen

Keyword(s):

Feature Selection ◽

Prediction Models ◽

Information Gain ◽

Feature Selection Method ◽

Computation Time ◽

Fault Prediction ◽

Software Systems ◽

Selection Methods ◽

Software Fault Prediction

The rapid growth of data has become a huge challenge for software systems. The quality of a fault prediction model depends on the quality of the software dataset. High-dimensional data is a major problem that affects the performance of fault prediction models. To deal with the dimensionality problem, various researchers have proposed feature selection. Feature selection provides an effective solution by eliminating irrelevant and redundant features, reducing computation time, and improving the accuracy of the machine learning model. In this study, we survey and synthesize filter-based feature selection with several search methods and algorithms. In addition, five filter-based feature selection methods are analyzed using five different classifiers over datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental results show that the Chi-Square and Information Gain methods had the best influence on the results of predictive models compared with other filter ranking methods.
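
For readers unfamiliar with the two filters the abstract singles out, the following is a minimal sketch of how Chi-Square and Information Gain can score one discrete feature against a class label. It uses the textbook definitions in plain Python. The toy data and function names are illustrative assumptions and do not reproduce the study's tooling or datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, fc in feature_counts.items():
        for l, lc in label_counts.items():
            expected = fc * lc / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Toy example: labels mark faulty (1) versus clean (0) modules.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # perfectly aligned with the label
uninformative = [0, 1, 0, 1]  # independent of the label
ranking = sorted(
    {"informative": informative, "uninformative": uninformative}.items(),
    key=lambda item: information_gain(item[1], labels),
    reverse=True,
)
```

A filter method ranks every feature by such a score and keeps the top-ranked ones. Continuous software metrics would need to be discretized first for these two statistics.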

A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data; such data also contain many gene expression values but few samples. To increase the classification accuracy and processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method with two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using performance metrics such as Accuracy, Recall, Precision, and F1-Score. The experimental results show that the proposed method outperforms the other feature selection methods.
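
The abstract gives no formula for its rank aggregation step. One plausible reading, sketched below purely as an assumption, maps each rank through a Gaussian membership function centred on the best rank and averages the memberships across filters. The centre, the `sigma` value, and the gene names are hypothetical, and the wrapper phase (IBPSO with an SVM evaluator) is not shown.

```python
import math


def gaussian_membership(x, centre, sigma):
    """Standard Gaussian membership function: 1 at the centre, decaying with distance."""
    return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))


def aggregate_ranks(rankings, sigma=2.0):
    """Combine several rankings (feature -> rank, 1 = best) into one ordering.

    Each rank is mapped to a membership in "highly ranked" (centre assumed at rank 1),
    and the memberships are averaged across the filters.
    """
    features = list(rankings[0])
    scores = {
        f: sum(gaussian_membership(r[f], 1, sigma) for r in rankings) / len(rankings)
        for f in features
    }
    order = sorted(features, key=lambda f: scores[f], reverse=True)
    return order, scores


# Hypothetical ranks of three genes from three filters.
relief = {"g1": 1, "g2": 2, "g3": 3}
mrmr = {"g1": 2, "g2": 1, "g3": 3}
fc = {"g1": 1, "g2": 3, "g3": 2}
order, scores = aggregate_ranks([relief, mrmr, fc])
```

The aggregated ordering would then be truncated to a candidate subset before any wrapper search.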

Combining feature selection, feature learning and ensemble learning for software fault prediction

2019 11th International Conference on Knowledge and Systems Engineering (KSE) ◽

10.1109/kse.2019.8919292 ◽

2019 ◽

Author(s):

Hung Duy Tran ◽

LE Thi My Hanh ◽

Nguyen Thanh Binh

Keyword(s):

Feature Selection ◽

Ensemble Learning ◽

Feature Learning ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault

Taxonomy of machine learning algorithms in software fault prediction using object oriented metrics

Procedia Computer Science ◽

10.1016/j.procs.2018.05.115 ◽

2018 ◽

Vol 132 ◽

pp. 993-1001 ◽

Cited By ~ 7

Author(s):

Ajmer Singh ◽

Rajesh Bhatia ◽

Anita Singhrova

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Object Oriented ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault ◽

Object Oriented Metrics

A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

GULDEN UCHYIGIT ◽

KEITH CLARK

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major problem in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that can be used to determine a document's class, while the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, where the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study, including a new feature selection method called the GU metric. The other feature selection methods evaluated in this study are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
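
The F1 and F2 scores mentioned above are, under the usual convention, instances of the F-beta measure with beta = 1 and beta = 2. That the abstract uses this convention is an assumption. A minimal sketch with made-up precision and recall values:

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the familiar F1.
    """
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical classifier with modest precision but perfect recall.
f1 = f_beta(precision=0.5, recall=1.0, beta=1)
f2 = f_beta(precision=0.5, recall=1.0, beta=2)
```

Here `f2` exceeds `f1` because the F2 score rewards the high recall more strongly.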

Diagnostic Performance of 2D and 3D T2WI-Based Radiomics Features With Machine Learning Algorithms to Distinguish Solid Solitary Pulmonary Lesion

Frontiers in Oncology ◽

10.3389/fonc.2021.683587 ◽

2021 ◽

Vol 11 ◽

Author(s):

Qi Wan ◽

Jiaxuan Zhou ◽

Xiaoying Xia ◽

Jianfeng Hu ◽

Peng Wang ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Diagnostic Performance ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

Selection Methods ◽

Linear Discriminant ◽

2D And 3D

Objective: To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance (MR) T2-weighted imaging (T2WI). Material and Methods: A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test datasets (n = 40). A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3–9), were compared. Ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), precision-recall plot, and Matthews Correlation Coefficient (MCC) were used to evaluate the performance of machine learning approaches. Results: The 3D features were significantly superior to 2D features, showing many more machine learning combinations with AUC greater than 0.7 in both validation and test groups (129 vs. 11). The feature selection methods Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE) and the classifiers Logistic Regression (LR), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and Gaussian Process (GP) had relatively better performance. The best performance of 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed results similar to those of 3D features. Incorporating clinical features with 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively. Conclusions: After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because of the availability of more machine learning algorithmic combinations with better performance. The feature selection methods ANOVA and RFE and the classifiers LR, LDA, SVM, and GP are more likely to demonstrate better diagnostic performance for 3D features in the current study.
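
The count of 1260 models follows from the full grid of options listed in the abstract (3 × 2 × 3 × 10 × 7). The sketch below enumerates that grid with placeholder option names, since the abstract names only some of the methods, and adds the standard confusion-matrix formula for the Matthews Correlation Coefficient. The option labels and confusion-matrix values are illustrative assumptions, not data from the study.

```python
import math
from itertools import product

# Placeholder labels: only the number of options in each stage comes from the abstract.
normalizations = [f"norm_{i}" for i in range(3)]
reductions = [f"reduce_{i}" for i in range(2)]
selectors = [f"select_{i}" for i in range(3)]
classifiers = [f"clf_{i}" for i in range(10)]
feature_counts = list(range(3, 10))  # 3 to 9 features inclusive, i.e. 7 settings

pipelines = list(product(normalizations, reductions, selectors, classifiers, feature_counts))
n_models = len(pipelines)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion matrix for one pipeline on a 40-patient test set.
example_mcc = mcc(tp=22, tn=8, fp=4, fn=6)
```

MCC ranges from -1 to 1 and stays informative when the classes are imbalanced, which is one reason it is often reported alongside AUC.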
