Software Bug Prediction Employing Feature Selection and Deep Learning

Artificial intelligence and its branches, such as Machine Learning (ML) and Deep Learning, have recently advanced in many vital fields, including robotics, smart cars, smart cities, health care, and software engineering. Software bug prediction is one of the most important uses of ML in software engineering. Feature selection is an ML method that aims to reduce the set of features used for building models. In this paper, we propose using the Chi-Square feature selection method to calculate feature importance and then building ML models, first with the top ten most important features and then with the top five, based on three well-known ML classification algorithms: Support Vector Machine (SVM), Naïve Bayes (NB), and Linear Discriminant Analysis (LDA). We also add and further explore the effectiveness of a new metric, code smell intensity. Compared with the baseline, our approach improved average accuracy across nine datasets by up to 5.12%, 4.15%, and 1% for the NB, SVM, and LDA classifiers, respectively.
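
The ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes categorical (here binary) features and uses the Pearson chi-square statistic of the feature-versus-label contingency table. The function names, feature names, and toy data are all hypothetical, and continuous software metrics would first need to be binned.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_val, f_count in feature_counts.items():
        for l_val, l_count in label_counts.items():
            # Expected cell count if the feature and the label were independent.
            expected = f_count * l_count / n
            diff = observed[(f_val, l_val)] - expected
            score += diff * diff / expected
    return score


def top_k_features(rows, labels, names, k):
    """Score every feature column and return the k highest-scoring names."""
    columns = list(zip(*rows))
    scores = {name: chi_square_score(col, labels) for name, col in zip(names, columns)}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: each row is (has_long_method, has_many_params).
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0]  # 1 = buggy module, 0 = clean module
selected, scores = top_k_features(
    rows, labels, ["has_long_method", "has_many_params"], k=1
)
print(selected, scores)
```

Under these assumptions, the top-ten and top-five settings in the abstract would correspond to `k=10` and `k=5`, with a classifier then trained only on the selected columns.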

A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection

Neural Computing and Applications ◽

10.1007/s00521-021-06047-x ◽

2021 ◽

Author(s):

Canan Batur Şahin ◽

Laith Abualigah

Keyword(s):

Feature Selection ◽

Deep Learning ◽

Static Analysis ◽

Selection Model ◽

Vulnerability Detection

Enhancement of email spam detection using improved deep learning algorithms for cyber security

Journal of Computer Security ◽

10.3233/jcs-200111 ◽

2021 ◽

pp. 1-34

Author(s):

Kadam Vikas Samarthrao ◽

Vandana M. Rohokale

Keyword(s):

Feature Selection ◽

Deep Learning ◽

Visual Features ◽

Spam Detection ◽

Learning Approaches ◽

Learning Technique ◽

Text Features ◽

Optimal Feature Selection ◽

Optimal Feature ◽

Email Spam

Email remains an essential part of our lives and a means of better communication on the internet. One challenge is that spam emails consume a large amount of space and bandwidth. A shortcoming of state-of-the-art spam filtering methods, the misclassification of genuine emails as spam (false positives), is a growing challenge for the internet world. The literature provides various algorithms for email spam classification, depending on the classification technique. This paper aims to develop a novel spam detection model for improved cybersecurity. The proposed model involves several phases: dataset acquisition, feature extraction, optimal feature selection, and detection. Initially, a benchmark email dataset is collected that includes both text and image data. Next, feature extraction is performed using two sets of features: text features and visual features. For the text features, Term Frequency-Inverse Document Frequency (TF-IDF) is extracted. For the visual features, the color correlogram and Gray-Level Co-occurrence Matrix (GLCM) are determined. Since the extracted feature vector is long, optimal feature selection is performed by a new meta-heuristic algorithm called the Fitness Oriented Levy Improvement-based Dragonfly Algorithm (FLI-DA). Once the optimal features are selected, detection is performed by a hybrid learning technique composed of two deep learning approaches, a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). To improve the performance of existing deep learning approaches, the number of hidden neurons in the RNN and CNN is optimized by the same FLI-DA. Finally, the optimized hybrid CNN-RNN technique classifies the data into spam and ham. The experimental outcomes show the ability of the proposed method to classify spam email using improved deep learning.
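
The TF-IDF text feature mentioned in the abstract can be illustrated with a small sketch. It assumes whitespace tokenization, relative term frequency, and the unsmoothed `log(N / df)` form of inverse document frequency. The abstract does not state which variant the authors used, and the function name and sample emails are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical two-email corpus.
emails = ["win cash now", "meeting agenda now"]
vectors = tf_idf(emails)
print(vectors[0])
```

In this toy corpus, a term that occurs in every email ("now") gets weight zero, while terms specific to one email keep a positive weight. Down-weighting ubiquitous terms in this way is what makes TF-IDF useful as an input to a later feature selection stage.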

Forecasting air pollutant concentration using a novel spatiotemporal deep learning model based on clustering, feature selection and empirical wavelet transform

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.149654 ◽

2021 ◽

pp. 149654

Author(s):

Jusong Kim ◽

Xiaoli Wang ◽

Chollyong Kang ◽

Jinwon Yu ◽

Penghui Li

Keyword(s):

Feature Selection ◽

Deep Learning ◽

Wavelet Transform ◽

Learning Model ◽

Air Pollutant ◽

Pollutant Concentration ◽

Model Based ◽

Empirical Wavelet Transform ◽

Deep Learning Model

Feature Selection and Deep Learning for Deterioration Prediction of the Bridges

Journal of Performance of Constructed Facilities ◽

10.1061/(asce)cf.1943-5509.0001653 ◽

2021 ◽

Vol 35 (6) ◽

pp. 04021078

Author(s):

Jinsong Zhu ◽

Yanlei Wang

Keyword(s):

Feature Selection ◽

Deep Learning

Continuous Software Bug Prediction

10.1145/3475716.3475790 ◽

2021 ◽

Author(s):

Song Wang ◽

Junjie Wang ◽

Jaechang Nam ◽

Nachiappan Nagappan

Keyword(s):

Software Bug Prediction ◽

Software Bug

Developing Software Bug Prediction Models Using Various Software Metrics as the Bug Indicators

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2015.060209 ◽

2015 ◽

Vol 6 (2) ◽

Author(s):

Varuna Gupta ◽

Dr. N. ◽

Dr. Tarun

Keyword(s):

Software Metrics ◽

Prediction Models ◽

Software Bug Prediction ◽

Software Bug

Wave2Vec: Vectorizing Electroencephalography Bio-Signal for Prediction of Brain Disease

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph15081750 ◽

2018 ◽

Vol 15 (8) ◽

pp. 1750 ◽

Cited By ~ 4

Author(s):

Seonho Kim ◽

Jungjoon Kim ◽

Hong-Woo Chun

Keyword(s):

Artificial Intelligence ◽

Time Series ◽

Feature Selection ◽

Deep Learning ◽

Natural Language Processing ◽

Data Analysis ◽

Natural Language ◽

Real Number ◽

Real Time ◽

Language Processing

Interest in research on health and medical information analysis based on artificial intelligence, especially deep learning techniques, has recently been increasing. Most research in this field has focused on finding new knowledge for predicting and diagnosing disease by revealing the relations between disease and various features of the data. These features are extracted by analyzing clinical pathology data, such as EHR (electronic health records), and academic literature, using data analysis, natural language processing, and similar techniques. However, more research is still needed on applying the latest artificial intelligence-based data analysis techniques to bio-signal data, which are continuous physiological records such as EEG (electroencephalography) and ECG (electrocardiogram). Unlike other types of data, bio-signal data take the form of time series of real numbers, and applying deep learning to them raises many issues in preprocessing, learning, and analysis. These issues include feature selection and learning stages that remain black boxes, difficulty in recognizing and identifying effective features, and high computational complexity. To address these issues, this paper provides an encoding-based Wave2Vec time series classifier model that combines signal processing with deep learning-based natural language processing techniques. To demonstrate its advantages, we report the results of three experiments conducted with EEG data from the University of California, Irvine, a real-world benchmark bio-signal dataset. The bio-signals (in the form of waves), which are real-number time series, are first encoded into a sequence of symbols, or into a sequence of wavelet patterns that are in turn converted into symbols. The proposed model then vectorizes the symbols by learning the sequence with deep learning-based natural language processing. A model for each class can be constructed by learning from the vectorized wavelet patterns and training data. The resulting models can be used to predict and diagnose diseases by classifying new data. By converting real-number time series into sequences of symbols, the proposed method makes the data more readable and the feature selection and learning processes more intuitive. It also makes influential patterns easier to recognize and identify. Furthermore, the data simplification achieved by encoding drastically reduces computational complexity without degrading analysis performance, which facilitates the real-time, large-capacity data analysis that is essential for real-time diagnosis systems.
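
The idea of turning a real-number time series into a sequence of symbols can be sketched with simple equal-width quantization. This is an assumed, generic encoding chosen for illustration only. The abstract does not specify the Wave2Vec encoding scheme, and the function name, alphabet, and sample values below are hypothetical.

```python
def encode_signal(samples, alphabet="abcd"):
    """Map each sample to a symbol by equal-width binning of the signal's range."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A flat signal carries no amplitude information; use the first symbol.
        return alphabet[0] * len(samples)
    width = (hi - lo) / len(alphabet)
    symbols = []
    for value in samples:
        # Clamp so the maximum sample falls into the last bin.
        index = min(int((value - lo) / width), len(alphabet) - 1)
        symbols.append(alphabet[index])
    return "".join(symbols)


# Hypothetical signal fragment.
encoded = encode_signal([0.0, 1.0, 2.0, 3.0, 4.0])
print(encoded)
```

Once a signal is a string of symbols, it can be split into "words" and passed to sequence-learning methods from natural language processing, which is the general direction the abstract describes.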

Review on Deep Learning in Feature Selection

Advances in Intelligent Systems and Computing - The 10th International Conference on Computer Engineering and Networks ◽

10.1007/978-981-15-8462-6_49 ◽

2020 ◽

pp. 439-447

Author(s):

Yizhuo Zhang ◽

Yiwei Liu ◽

Chi-Hua Chen

Keyword(s):

Feature Selection ◽

Deep Learning
