Search of Similar Programs Using Code Metrics and Big Data-Based Assessment of Software Reliability

Towards a software defect proneness model: feature selection

Applied Aspects of Information Technology ◽

10.15276/aait.04.2021.5 ◽

2021 ◽

Vol 4 (4) ◽

pp. 354-365

Author(s):

Vitaliy S. Yakovyna ◽

Ivan I. Symets

Keyword(s):

Principal Component Analysis ◽

Feature Selection ◽

Random Forest ◽

Software Reliability ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Tree Classifier ◽

Code Metrics ◽

Software Code

This article focuses on improving static models of software reliability by using machine learning methods to select the software code metrics that most strongly affect reliability. The study used a merged dataset from the PROMISE Software Engineering repository, containing test data for the software modules of five programs and twenty-one code metrics. For the prepared sample, the features that most affect software code quality were selected using the following feature selection methods: Boruta, Stepwise selection, Exhaustive Feature Selection, Random Forest Importance, LightGBM Importance, Genetic Algorithms, Principal Component Analysis, and the Xverse Python package. Based on voting over the results of these methods, a static (deterministic) model of software reliability was built that relates the probability of a defect in a software module to its code metrics. The model includes the following code metrics: branch count of a program, McCabe’s lines of code and cyclomatic complexity, and Halstead’s total number of operators and operands, intelligence, volume, and effort. The effectiveness of the feature selection methods was compared, in particular by studying how the choice of method affects classification accuracy with the following classifiers: Random Forest, Support Vector Machine, k-Nearest Neighbors, Decision Tree, AdaBoost, and Gradient Boosting. Using any of the feature selection methods increased classification accuracy by at least ten percent compared with the original dataset, which confirms the importance of this step for predicting software defects from metric datasets that contain many highly correlated code metrics.
For most classifiers, the best prediction accuracy was reached with the feature set obtained from the proposed static model of software reliability. It was also shown that individual methods, such as Autoencoder, Exhaustive Feature Selection, and Principal Component Analysis, can be used with an insignificant loss of classification and prediction accuracy.
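
The idea of voting over the outputs of several feature selection methods can be illustrated with a minimal sketch. The abstract does not state the voting rule, so the threshold (`min_votes`) is an assumption, and the per-method selections and metric names below are hypothetical placeholders, not the study's results.

```python
from collections import Counter


def vote_on_features(selections, min_votes):
    """Return (feature, votes) pairs for features picked by at least
    `min_votes` selection methods, most-voted first."""
    votes = Counter()
    for features in selections.values():
        # Each method contributes at most one vote per feature.
        votes.update(set(features))
    kept = [(name, count) for name, count in votes.items() if count >= min_votes]
    # Sort by vote count (descending), then alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical output of three selection methods (illustrative only).
selections = {
    "boruta": ["branch_count", "loc", "cyclomatic_complexity", "volume"],
    "stepwise": ["loc", "effort", "volume"],
    "rf_importance": ["branch_count", "loc", "effort", "comment_lines"],
}

winners = vote_on_features(selections, min_votes=2)
for name, count in winners:
    print(f"{name}: {count} votes")
```

Raising `min_votes` yields a smaller, more conservative feature set; lowering it keeps features that only a single method considered important.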

Big data and similarity-based software reliability assessment: The technique and applied tools

2018 IEEE 9th International Conference on Dependable Systems, Services and Technologies (DESSERT) ◽

10.1109/dessert.2018.8409182 ◽

2018 ◽

Author(s):

Svitlana Yaremchuk ◽

Vyacheslav Kharchenko

Keyword(s):

Big Data ◽

Software Reliability ◽

Reliability Assessment

A study of software reliability on big data open source software

International Journal of Systems Assurance Engineering and Management ◽

10.1007/s13198-019-00777-x ◽

2019 ◽

Vol 10 (2) ◽

pp. 242-250 ◽

Cited By ~ 1

Author(s):

Ranjan Kumar ◽

Subhash Kumar ◽

Sanjay K. Tiwari

Keyword(s):

Big Data ◽

Open Source ◽

Open Source Software ◽

Software Reliability

Software reliability assessment tool based on fault data clustering and hazard rate model considering cloud computing with big data

2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions) ◽

10.1109/icrito.2015.7359208 ◽

2015 ◽

Author(s):

Yoshinobu Tamura ◽

Shigeru Yamada

Keyword(s):

Cloud Computing ◽

Big Data ◽

Software Reliability ◽

Data Clustering ◽

Hazard Rate ◽

Assessment Tool ◽

Reliability Assessment ◽

Rate Model ◽

Hazard Rate Model

Software Reliability and Cost Analysis Considering Service User for Cloud with Big Data

International Journal of Reliability Quality and Safety Engineering ◽

10.1142/s0218539317500097 ◽

2017 ◽

Vol 24 (02) ◽

pp. 1750009 ◽

Cited By ~ 1

Author(s):

Yoshinobu Tamura ◽

Tomoya Takeuchi ◽

Shigeru Yamada

Keyword(s):

Cloud Computing ◽

Big Data ◽

Software Reliability ◽

Process Model ◽

Reliability Assessment ◽

Jump Diffusion ◽

Data Partitioning ◽

Software Cost ◽

Jump Diffusion Model ◽

Cloud User

At present, cloud computing with big data is regarded as a next-generation software service paradigm. However, few effective methods of software reliability analysis that take big data and cloud computing into account have been presented. In particular, it is important to consider optimal data partitioning for cloud computing with big data. In this setting, it is useful for software managers to estimate the total software cost in order to allocate the optimal data area to cloud users. In this paper, we propose a method of component-oriented reliability assessment based on a neural network to support optimal data partitioning for cloud computing with big data. Moreover, we propose a method of system-wide reliability assessment based on a jump diffusion process model that considers big data on cloud computing. Furthermore, we formulate an optimal maintenance problem based on the jump diffusion model. Considering the contract cost for the maximum number of subscribers as cloud users, we find the optimum maintenance time by minimizing the total software cost.
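
For readers unfamiliar with the term, a jump diffusion process combines continuous Brownian noise with occasional discrete jumps. The sketch below simulates one path of a generic process dX = mu dt + sigma dW + J dN with an Euler scheme. It is not the reliability model proposed in the paper: the function name, all parameter values, and the Gaussian jump-size distribution are assumptions chosen only for illustration.

```python
import math
import random


def simulate_jump_diffusion(x0, mu, sigma, jump_rate, jump_mean, jump_std,
                            horizon, steps, seed=0):
    """Simulate one path of a generic jump diffusion process (Euler scheme).

    Jumps arrive at rate `jump_rate`; within each small step of length dt,
    a jump occurs with probability jump_rate * dt (valid when that is << 1).
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    dt = horizon / steps
    x = x0
    path = [x]
    for _ in range(steps):
        # Continuous part: drift plus scaled Gaussian noise.
        diffusion = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Discrete part: at most one jump per step (Bernoulli approximation).
        jump = 0.0
        if rng.random() < jump_rate * dt:
            jump = rng.gauss(jump_mean, jump_std)
        x += diffusion + jump
        path.append(x)
    return path


# Assumed, illustrative parameters.
path = simulate_jump_diffusion(x0=0.0, mu=1.0, sigma=0.5, jump_rate=2.0,
                               jump_mean=1.0, jump_std=0.2,
                               horizon=1.0, steps=200, seed=42)
print(f"final value after {len(path) - 1} steps: {path[-1]:.3f}")
```

Setting `sigma` and `jump_rate` to zero reduces the path to pure drift, which is a quick sanity check on the discretization.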

Software Reliability Analysis Considering the Fault Detection Trends for Big Data on Cloud Computing

Lecture Notes in Electrical Engineering - Industrial Engineering, Management Science and Applications 2015 ◽

10.1007/978-3-662-47200-2_106 ◽

2015 ◽

pp. 1021-1030 ◽

Cited By ~ 4

Author(s):

Yoshinobu Tamura ◽

Shigeru Yamada

Keyword(s):

Cloud Computing ◽

Big Data ◽

Fault Detection ◽

Reliability Analysis ◽

Software Reliability

Development of software reliability models using a hybrid approach and validation of the proposed models using big data

The Journal of Supercomputing ◽

10.1007/s11227-018-2457-8 ◽

2018 ◽

Vol 76 (4) ◽

pp. 2252-2265 ◽

Cited By ~ 1

Author(s):

P. Govindasamy ◽

R. Dillibabu

Keyword(s):

Big Data ◽

Software Reliability ◽

Hybrid Approach ◽

Reliability Models ◽

Software Reliability Models

Find Out About 'Big Data' to Track Outcomes

ASHA Leader ◽

10.1044/leader.an5.18022013.59 ◽

2013 ◽

Vol 18 (2) ◽

pp. 59-59

Keyword(s):

Big Data

U.S., Canada Collaborate on Big Data in ASD Research

ASHA Leader ◽

10.1044/leader.nib4.20122015.16 ◽

2015 ◽

Vol 20 (12) ◽

pp. 16-16

Keyword(s):

Big Data

Correlating Personality and Actual Phone Usage

Journal of Individual Differences ◽

10.1027/1614-0001/a000139 ◽

2014 ◽

Vol 35 (3) ◽

pp. 158-165 ◽

Cited By ~ 38

Author(s):

Christian Montag ◽

Konrad Błaszkiewicz ◽

Bernd Lachmann ◽

Ionut Andone ◽

Rayna Sariyska ◽

...

Keyword(s):

Big Data ◽

Social Network ◽

Mobile Phone ◽

Mobile Phones ◽

Self Report ◽

Psychological Variables ◽

New Approach ◽

Report Data ◽

Self Report Data ◽

Voice Calls

In the present study we link self-report data on personality to behavior recorded on the mobile phone. This new approach from Psychoinformatics collects data from humans in everyday life. It demonstrates the fruitful collaboration between psychology and computer science, combining Big Data with psychological variables. Given the large number of variables that can be tracked on a smartphone, the present study focuses on the traditional features of mobile phones, namely incoming and outgoing calls and SMS. We observed the telephone/SMS usage of N = 49 participants for 5 weeks via our custom-developed mobile phone app. Extraversion was positively associated with nearly all related telephone call variables. In particular, extraverts directly reach out to their social network via voice calls.
