Bidirectional Recurrent Neural Network Language Model: Cross Entropy Churn Metrics for Defect Prediction Modelling

Software Defect Prediction (SDP) is an active research area within Software Quality Assurance (SQA). Many existing studies build defect prediction models from traditional software metric sets using machine learning, detecting bugs over only a limited number of source code lines. Building on this prior work, this paper focuses on predicting defects directly in source code. The aim is to improve software quality through precise defect prediction, helping developers locate and fix bugs, make better use of resources, reduce test effort, minimize cost, and improve the quality of the software. A new approach is introduced to improve prediction performance using a bidirectional recurrent neural network language model (bidirectional RNNLM), a deep neural network. To build the defect prediction model, a defect learner framework is proposed: first, a neural language model is built; this language model learns deep semantic features of source code and is used to train and test the model. The language model is then combined with traditional software metric sets to measure the code and locate defects. The language model probabilities and a Cross-Entropy with Abstract Syntax Tree (CE-AST) metric are used to evaluate defect proneness and serve as the metric label. A K-NN classifier is used to classify the metric label. Learning the RNN with the backpropagation through time (BPTT) algorithm provides a further improvement, raising prediction performance in finding dynamic errors.
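The core cross-entropy measurement can be sketched as follows; the bigram table here is only a toy stand-in for the trained bidirectional RNNLM, and the tokens and probabilities are illustrative, not from the paper:

```python
import math

def cross_entropy(tokens, prob):
    """Average per-token cross-entropy of a token sequence under a
    language model. prob(prev, tok) returns P(tok | prev); lower
    cross-entropy means the code looks more 'natural' to the model,
    so unusually high values flag defect-prone code."""
    total = 0.0
    for prev, tok in zip(tokens, tokens[1:]):
        p = max(prob(prev, tok), 1e-12)  # guard against log(0)
        total += -math.log2(p)
    return total / max(len(tokens) - 1, 1)

# Toy stand-in for the trained model: a bigram probability table.
bigram = {("if", "("): 0.9, ("(", "x"): 0.5, ("x", ")"): 0.8}
lm = lambda prev, tok: bigram.get((prev, tok), 0.01)

score = cross_entropy(["if", "(", "x", ")"], lm)
```

Modules whose average cross-entropy sits well above the corpus norm would be labeled defect-prone before the K-NN classification step.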

2019
Vol 9 (13)
pp. 2764
Author(s):  
Abdullateef Oluwagbemiga Balogun
Shuib Basri
Said Jadid Abdulkadir
Ahmad Sobri Hashim

Software Defect Prediction (SDP) models are built using software metrics derived from software systems. The quality of SDP models depends largely on the quality of the software metrics (dataset) used to build them. High dimensionality is one of the data quality problems that affect the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, the choice of FS method for SDP is still a problem, as most empirical studies on FS methods for SDP produce contradictory and inconsistent outcomes. FS methods behave differently due to their different underlying computational characteristics, which could stem from the choice of search method used in FS, because the impact of FS depends on that choice. It is hence imperative to comparatively analyze the performance of FS methods under different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature subset selection (FSS) methods were evaluated using four different classifiers over five software defect datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental analysis showed that applying FS improves the predictive performance of classifiers, and that the performance of FS methods can vary across datasets and classifiers. Among the FFR methods, Information Gain demonstrated the greatest improvements in the performance of the prediction models. Among the FSS methods, Consistency Feature Subset Selection based on Best First Search had the best influence on the prediction models. However, prediction models based on FFR proved to be more stable than those based on FSS methods. Hence, we conclude that FS methods improve the performance of SDP models, and that there is no single best FS method, as their performance varied according to the dataset and the choice of prediction model.
However, we recommend the use of FFR methods as the prediction models based on FFR are more stable in terms of predictive performance.
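As a concrete illustration of the best-performing FFR method, a minimal information-gain ranker over discrete metrics might look like the sketch below; the toy metrics and defect labels are invented purely for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Information gain of a discrete feature w.r.t. the defect label:
    H(label) minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Rank two toy binarized metrics against a defect label (1 = defective).
labels   = [1, 1, 0, 0, 1, 0]
loc_high = [1, 1, 0, 0, 1, 0]   # a perfectly predictive metric
churn    = [0, 1, 0, 1, 0, 1]   # a nearly uninformative metric
gains = {"loc_high": info_gain(loc_high, labels),
         "churn": info_gain(churn, labels)}
```

Keeping the top-k metrics by this score is the essence of filter feature ranking: no classifier is consulted during selection.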


Defect prediction performance is significant for attaining software quality and for understanding previous errors. In this work, classification accuracy, precision, recall, and F-measure are used to assess various classifiers. The artificial neural network optimization assumes that more than two algorithms are implemented for one optimization, and uses a heuristic to choose the best algorithm to apply in a particular situation. A hybrid optimization approach to linkage design is used for the dimensional synthesis of the mechanism. The ANN models are assisted in their convergence towards a global minimum by a multi-directional search algorithm incorporated in the GA. The results show that the classification accuracy of the NN-hybrid shuffled frog algorithm is better by about 5.94% than that of the fuzzy classifiers, by about 3.59% than NN-LM training, and by about 1.42% than the NN-shuffled frog algorithm.


2019
Vol 9 (19)
pp. 4182
Author(s):  
Pu Yan
Li Zhuo
Jiafeng Li
Hui Zhang
Jing Zhang

Pedestrian attributes (such as gender, age, hairstyle, and clothing) can effectively represent the appearance of pedestrians. These are high-level semantic features that are robust to illumination, deformation, etc. Therefore, they can be widely used in person re-identification, video structuring analysis and other applications. In this paper, a pedestrian attributes recognition method for surveillance scenarios using a multi-task lightweight convolutional neural network is proposed. Firstly, the labels of the attributes for each pedestrian image are integrated into a label vector. Then, a multi-task lightweight Convolutional Neural Network (CNN) is designed, consisting of five convolutional layers, three pooling layers and two fully connected layers, to extract the deep features of pedestrian images. Considering that the data distribution of the datasets is unbalanced, the loss function is improved based on the sigmoid cross-entropy, and a scale factor is added to balance the amount of data for each attribute. By training the network, a mapping relationship between the deep features of pedestrian images and the integrated label vector of their attributes is established, which can be used to predict each attribute of a pedestrian. The experiments were conducted on two public pedestrian attributes datasets in surveillance scenarios, namely PETA and RAP. The results show that, compared with state-of-the-art pedestrian attributes recognition methods, the proposed method achieves superior accuracy: 91.88% on PETA and 87.44% on RAP, respectively.
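The balanced multi-label loss can be sketched roughly as below. The exact scale factor used in the paper may differ; weighting positives by 1/r and negatives by 1/(1-r) is one simple assumption, and the logits and ratios are invented:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def balanced_bce(logits, targets, pos_ratio):
    """Multi-label sigmoid cross-entropy with per-attribute scale factors.
    pos_ratio[i] is the fraction of training images carrying attribute i;
    weighting positive terms by 1/r and negative terms by 1/(1-r) is one
    simple way to keep rare attributes from being drowned out (the paper's
    exact factor may differ)."""
    loss = 0.0
    for z, t, r in zip(logits, targets, pos_ratio):
        p = min(max(sigmoid(z), 1e-12), 1 - 1e-12)
        w = (1.0 / r) if t == 1 else (1.0 / (1.0 - r))
        loss += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss / len(logits)

# A confident miss on a rare attribute is penalised much harder
# than the same miss on a common attribute.
rare_miss   = balanced_bce([-2.0], [1], pos_ratio=[0.05])
common_miss = balanced_bce([-2.0], [1], pos_ratio=[0.5])
```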


Author(s):  
Taku Matsumoto
Yutaka Watanobe
Keita Nakamura
Yunosuke Teshima

Logical errors in source code can be detected using probabilities obtained from a language model trained by a recurrent neural network (RNN). Using these probabilities and suitable thresholds, places that are likely to be logic errors can be enumerated. However, when the threshold is set inappropriately, users may miss true logical errors because of overly passive extraction, or be shown unnecessary elements produced by excessive extraction. Moreover, the probabilities output by the language model differ for each task, so the threshold should be selected appropriately. In this paper, we propose a logic error detection algorithm using an RNN together with an automatic threshold determination method. The proposed method selects thresholds using incorrect codes and can enhance the detection performance of the trained language model. To evaluate the proposed method, experiments were conducted with data from an online judge system, an educational system that provides automated judging for many programming tasks. The experimental results show that the selected thresholds improve the logic error detection performance of the trained language model.
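The automatic threshold determination can be sketched as a search over candidate thresholds on codes whose error positions are known; this illustrates the idea, not the paper's exact algorithm, and the probabilities and labels below are invented:

```python
def select_threshold(probs, is_error, candidates):
    """Pick the probability threshold that maximises F1 on a set of
    known-incorrect codes: tokens whose language-model probability
    falls below the threshold are flagged as logic-error candidates.
    probs and is_error are parallel per-token lists."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for p, e in zip(probs, is_error) if p < t and e)
        fp = sum(1 for p, e in zip(probs, is_error) if p < t and not e)
        fn = sum(1 for p, e in zip(probs, is_error) if p >= t and e)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Low-probability tokens at positions 1 and 3 are the true errors.
probs    = [0.9, 0.02, 0.7, 0.05, 0.8]
is_error = [0,   1,    0,   1,    0]
t, f1 = select_threshold(probs, is_error, [0.01, 0.1, 0.5])
```

A too-small candidate (0.01) flags nothing, while 0.1 recovers both errors without false positives, which is exactly the trade-off the method tunes automatically per task.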


Entropy
2021
Vol 23 (11)
pp. 1536
Author(s):  
Yiping Yang
Xiaohui Cui

Text classification is a fundamental research direction that aims to assign tags to text units. Recently, graph neural networks (GNN) have exhibited some excellent properties in textual information processing. Furthermore, pre-trained language models have also achieved promising results in many tasks. However, many text processing methods cannot model a single text unit’s structure, or they ignore semantic features. To solve these problems and comprehensively utilize the text’s structural information and semantic information, we propose a Bert-Enhanced text Graph Neural Network model (BEGNN). For each text, we construct a text graph separately according to the co-occurrence relationships of words and use a GNN to extract text features. Moreover, we employ Bert to extract semantic features. The former takes structural information into account, while the latter focuses on modeling semantic information. Finally, we interact and aggregate these two features of different granularity to get a more effective representation. Experiments on standard datasets demonstrate the effectiveness of BEGNN.
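The per-document graph construction step can be sketched as follows; the window size of 2 and the example sentence are assumptions made for illustration, not details from the paper:

```python
def cooccurrence_graph(tokens, window=2):
    """Build an undirected co-occurrence graph for one text: words are
    nodes, and an edge links any pair of distinct words appearing within
    a sliding window of the given size. Each document gets its own graph,
    which a GNN then processes for structural features."""
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                edges.add(tuple(sorted((tokens[i], tokens[j]))))
    return edges

edges = cooccurrence_graph("the cat sat on the mat".split())
```

The GNN output over this graph would then be fused with Bert's semantic embedding of the same text to form the final representation.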


2019
Vol 7 (1)
pp. 22-28
Author(s):  
V. Nirmala
A. Rajagopal

We implemented a working prototype of a deep learning module that appears to understand Newton’s third law of motion. In this paper, a Google BERT neural network model was trained using a transfer learning technique on a synthetic dataset of simple physics problems, scoped to Newton’s third law problems that require understanding of concepts such as action and reaction, the magnitude and direction of forces, and simple vector concepts in physics problems. The model handles Newton’s third law assuming certain boundaries on the language model of the word problems. A working prototype of this AI can be accessed at the given website. This paper also contributes the source code for reproducible results. This novel idea can be extended to more science topics. Applications of this interdisciplinary area of AI and physics have impact not just in robotics and computational physics, but also in how science uses AI in the future.


2019
Vol 2019
pp. 1-14
Author(s):  
Guisheng Fan
Xuyang Diao
Huiqun Yu
Kang Yang
Liqiong Chen

In order to improve software reliability, software defect prediction is applied during software maintenance to identify potential bugs. Traditional methods of software defect prediction mainly focus on designing static code metrics, which are input into machine learning classifiers to predict defect probabilities of the code. However, these hand-crafted metrics do not capture the syntactic structure and semantic information of programs; such information is more significant than manual metrics and can yield a more accurate predictive model. In this paper, we propose a framework called defect prediction via attention-based recurrent neural network (DP-ARNN). More specifically, DP-ARNN first parses the abstract syntax trees (ASTs) of programs and extracts token sequences from them. It then encodes these sequences, which serve as the inputs of DP-ARNN, by dictionary mapping and word embedding. After that, it can automatically learn syntactic and semantic features. Furthermore, it employs the attention mechanism to generate further significant features for accurate defect prediction. To validate our method, we choose seven open-source Java projects in Apache, using F1-measure and area under the curve (AUC) as evaluation criteria. The experimental results show that, on average, DP-ARNN improves F1-measure by 14% and AUC by 7% compared with state-of-the-art methods.
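The first two DP-ARNN steps, AST flattening and dictionary mapping, can be sketched as below. The paper parses Java programs; Python’s stdlib `ast` module is used here purely as an illustration of the same idea:

```python
import ast

def ast_tokens(source):
    """Flatten a program's AST into a sequence of node-type names,
    mimicking DP-ARNN's first step of extracting tokens from ASTs."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]

def encode(tokens, vocab=None):
    """Dictionary-map tokens to integer ids, building the vocabulary
    on the fly; these ids would feed a word-embedding layer."""
    vocab = {} if vocab is None else vocab
    ids = []
    for tok in tokens:
        ids.append(vocab.setdefault(tok, len(vocab) + 1))  # 0 reserved for padding
    return ids, vocab

tokens = ast_tokens("def f(x):\n    return x + 1")
ids, vocab = encode(tokens)
```

In the full framework, the resulting id sequences pass through an embedding layer into the attention-based recurrent network, which learns the syntactic and semantic features end to end.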


2021
Vol 28 (2)
Author(s):  
Aftab Ali
Naveed Khan
Mamun Abu-Tair
Joost Noppen
Sally McClean
...

Correlated quality metrics extracted from a source code repository can be utilized to design a model that automatically predicts defects in a software system. The extracted metrics inevitably form highly unbalanced data, since the number of defects in a good-quality software system should be far smaller than the number of normal instances. It is also a fact that selecting the best discriminating features significantly improves the robustness and accuracy of a prediction model. The contribution of this paper is therefore twofold: first, it selects the best discriminating features for accurately predicting a defect in a software component; second, cost-sensitive logistic regression and decision tree ensemble-based prediction models are applied to those features to precisely predict defects. The proposed models are compared with the most recent schemes in the literature in terms of accuracy, area under the curve, and recall. The models are evaluated using 11 datasets, and it is evident from the results and analysis that the proposed prediction models outperform the schemes in the literature.
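A minimal sketch of the cost-sensitive idea, using plain stochastic gradient descent and an invented one-feature toy dataset; the paper's actual model, features, and cost weighting will differ:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_cost_sensitive(X, y, w_pos, lr=0.1, epochs=1000):
    """Logistic regression whose gradient weights defective instances
    (y = 1) by w_pos, so misclassifying the rare defect class costs
    more during training; a sketch of cost-sensitive learning."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            cost = w_pos if yi == 1 else 1.0
            g = cost * (p - yi)
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

# Unbalanced toy data: one defect among five modules.
X = [[0.1], [0.2], [0.15], [0.1], [0.9]]
y = [0, 0, 0, 0, 1]
w, b = fit_cost_sensitive(X, y, w_pos=5.0)
p_defect = sigmoid(w[0] * 0.9 + b)
```

Without the cost factor, the single defective instance would contribute little to the loss; the weight forces the boundary to respect the minority class.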

