Applying Variant Variable Regularized Logistic Regression for Modeling Software Defect Predictor

Software defects are one of the main contributors to information technology waste and lead to rework, consuming considerable time and money. Software defect prediction aims to prevent defects by classifying modules as defective or not defective. Many researchers have studied software defect prediction using the public NASA MDP datasets, but these datasets still suffer from class imbalance and noisy attributes. Class imbalance can be addressed with SMOTE (Synthetic Minority Over-sampling Technique), and noisy attributes can be handled by selecting features with Particle Swarm Optimization (PSO). This research therefore integrates SMOTE and PSO with two machine learning classifiers, naïve Bayes and logistic regression. Experiments on 8 NASA MDP datasets, each divided into training and testing data, show that the SMOTE + PSO integration improves the performance of both classifiers, with the highest average AUC (Area Under Curve) of 0.89 for logistic regression and 0.86 for naïve Bayes in training, better than either classifier achieves without combining the two.
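As a rough illustration of the SMOTE step mentioned in this abstract, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal standard-library version, not the authors' implementation; the function name `smote`, its parameters, and the toy metric values are assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two module metrics per row (values are made up).
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
clean = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(12)]

# Oversample the defective class until both classes have the same size.
new_points = smote(defective, n_synthetic=len(clean) - len(defective), k=3)
balanced_defective = defective + new_points
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies. In practice the oversampling should be applied to the training split only, so that the test data keeps its original class distribution.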

A Novel Feature Selection Method Based on Maximum Likelihood Logistic Regression for Imbalanced Learning in Software Defect Prediction

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/5/5 ◽

2020 ◽

Vol 17 (5) ◽

pp. 721-730

Author(s):

Kamal Bashir ◽

Tianrui Li ◽

Mahama Yahaya

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Feature Selection ◽

Maximum Likelihood ◽

Defect Prediction ◽

Feature Subset ◽

Software Defect Prediction ◽

Software Defect ◽

Optimal Feature Subset ◽

Optimal Feature

The most frequently used machine learning feature ranking approaches fail to present an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), RelieF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing the class distribution in the training data. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets in their sampled and unsampled forms to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets derived from the sampled and unsampled datasets. The performance of the models, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), is compared across all FS methods considered. The Analysis Of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the other FS techniques, in both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
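The AUC used here to compare models can be read as the probability that a randomly chosen defective module receives a higher score than a randomly chosen non-defective one. The sketch below computes it that way with the standard library only. The labels and the scores of `model_a` and `model_b` are hypothetical and do not come from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = defective) and scores from two made-up models.
labels = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]
model_b = [0.6, 0.5, 0.4, 0.7, 0.2, 0.9]

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration. Rank-based formulas give the same value more efficiently on large datasets.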

Software Defect-Based Prediction Using Logistic Regression: Review and Challenges

10.1007/978-981-16-4641-6_20 ◽

2021 ◽

pp. 233-248

Author(s):

Jayanti Goyal ◽

Ripu Ranjan Sinha

Keyword(s):

Logistic Regression ◽

Software Defect

Outcomes Research in Hydrocephalus Treatment

Cases on Health Outcomes and Clinical Data Mining ◽

10.4018/978-1-61520-723-7.ch011 ◽

2010 ◽

pp. 225-244

Author(s):

Damien Wilburn

Keyword(s):

Cerebrospinal Fluid ◽

Logistic Regression ◽

Kernel Density ◽

Mortality Rates ◽

Healthcare Research ◽

National Inpatient Sample ◽

Modeling Software ◽

Hydrocephalus Treatment ◽

One Year ◽

The Brain

Hydrocephalus is a disorder in which cerebrospinal fluid (CSF) is unable to drain efficiently from the brain. This paper presents a set of exploratory analyses comparing attributes of inpatients under one year old diagnosed with hydrocephalus, using data provided by the Agency for Healthcare Research and Quality (AHRQ) as part of the National Inpatient Sample (NIS). The general methods include calculation of summary statistics, kernel density estimation, logistic regression, linear regression, and the production of figures and charts using the statistical data modeling software SAS. It was determined that younger infants show higher mortality rates; additionally, males are more likely to present with hydrocephalus and cost slightly more on average than females, despite the distribution curves for length of stay appearing virtually identical between genders. Diagnoses and procedures expected for non-hydrocephalic infants showed a negative correlation in the logistic model. Overall, the study validates much of the literature and expands it with a cost analysis approach.

A Hybrid Data Preprocessing Technique based on Maximum Likelihood Logistic Regression with Filtering for Enhancing Software Defect Prediction

2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE) ◽

10.1109/iske47853.2019.9170328 ◽

2019 ◽

Author(s):

Kamal Bashir ◽

Tayseer Ali ◽

Mahama Yahaya ◽

Ahmed Saad Hussein

Keyword(s):

Logistic Regression ◽

Maximum Likelihood ◽

Data Preprocessing ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Hybrid Data ◽

Preprocessing Technique

On modeling software defect repair time

Empirical Software Engineering ◽

10.1007/s10664-008-9064-x ◽

2008 ◽

Vol 14 (2) ◽

pp. 165-186 ◽

Cited By ~ 19

Author(s):

Rattikorn Hewett ◽

Phongphun Kijsanayothin

Keyword(s):

Repair Time ◽

Software Defect ◽

Modeling Software ◽

Defect Repair

Bayesian Logistic Regression for software defect prediction (S)

Proceedings of the 30th International Conference on Software Engineering and Knowledge Engineering ◽

10.18293/seke2018-181 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jinu M Sunil ◽

Lov Kumar ◽

Lalita Bhanu Murthy Neti

Keyword(s):

Logistic Regression ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Bayesian Logistic Regression

How Useful is the Power Law of Practice for Recognizing Practice in Concentration Tests?

European Journal of Psychological Assessment ◽

10.1027/1015-5759.23.3.157 ◽

2007 ◽

Vol 23 (3) ◽

pp. 157-165 ◽

Cited By ~ 4

Author(s):

Carmen Hagemeister

Keyword(s):

Logistic Regression ◽

Reaction Time ◽

Power Law ◽

Error Rate ◽

Reaction Times ◽

Practice Effect ◽

Practice Effects ◽

Rate Decrease ◽

Small Practice

When concentration tests are completed repeatedly, reaction time and error rate decrease considerably, but the underlying ability does not improve. To overcome this validity problem, this study tested whether the practice effect between tests and within tests can be useful in determining whether persons have already completed a test. The power law of practice postulates that practice effects are greater in unpracticed than in practiced persons. Two experiments were carried out in which the participants completed the same tests at the beginning and at the end of two test sessions set about 3 days apart. In both experiments, logistic regression could indeed classify persons according to previous practice through the practice effect between the tests at the beginning and at the end of the session and, less well but still significantly, through the practice effect within the first test of the session. Further analyses showed that the practice effects correlated more highly with the initial performance than was to be expected for mathematical reasons; typically, persons with long reaction times have larger practice effects. Thus, small practice effects alone do not allow one to conclude that a person has worked on the test before.

Perceived Effectiveness of Random Testing for Alcohol and Drugs in the Australian Aviation Community

Aviation Psychology and Applied Human Factors ◽

10.1027/2192-0923/a000031 ◽

2012 ◽

Vol 2 (2) ◽

pp. 72-81

Author(s):

Christina M. Rudin-Brown ◽

Eve Mitsopoulos-Rubens ◽

Michael G. Lenné

Keyword(s):

Logistic Regression ◽

Online Survey ◽

Random Testing ◽

Regression Analyses ◽

Perceived Effectiveness ◽

Student Pilots ◽

Alcohol And Drugs ◽

Positive Attitudes ◽

One Year ◽

Alcohol And Other Drugs

Random testing for alcohol and other drugs (AODs) in individuals who perform safety-sensitive activities as part of their aviation role was introduced in Australia in April 2009. One year later, an online survey (N = 2,226) was conducted to investigate attitudes, behaviors, and knowledge regarding random testing and to gauge perceptions regarding its effectiveness. Private, recreational, and student pilots were less likely than industry personnel to report being aware of the requirement (86.5% versus 97.1%), to have undergone testing (76.5% versus 96.1%), and to know of others who had undergone testing (39.9% versus 84.3%), and they had more positive attitudes toward random testing than industry personnel. However, logistic regression analyses indicated that random testing is more effective at deterring AOD use among industry personnel.
