Mining Quantitative Rules in a Software Project Data Set

2007, Vol 3, pp. 518-527
Author(s):  
Shuji Morisaki ◽  
Akito Monden ◽  
Haruaki Tamada ◽  
Tomoko Matsumura ◽  
Ken-ichi Matsumoto

2018, Vol 232, pp. 03017
Author(s):  
Jie Zhang ◽  
Gang Wang ◽  
Haobo Jiang ◽  
Fangzheng Zhao ◽  
Guilin Tian

Software defect prediction has been an important part of software engineering research since the 1970s. The technique uses the measurement and defect information of historical software modules to predict defects in new software modules. Currently, most software defect prediction models are built on data from a single software project: the training data used to construct the model and the test data used to validate it come from the same project. In practice, however, for projects with little historical data or for entirely new projects, traditional prediction methods show poor forecasting performance; when historical data are insufficient, the defect prediction model cannot be trained adequately and high prediction accuracy is difficult to achieve. Cross-project prediction, in turn, faces the problem of differences in data distribution. To address these problems, this paper presents a software defect prediction model that combines transfer (migration) learning with a traditional software defect prediction model and uses existing project data sets to predict defects across projects. The main work of this article includes: 1) Data preprocessing, including feature correlation analysis and noise reduction, which reduces the interference of over-fitting and noisy data on the prediction results. 2) Transfer learning, which analyzes two different but related project data sets and reduces the impact of differences in data distribution. 3) Artificial neural networks: to handle the class imbalance in the data set, an artificial neural network with dynamic selection of training samples is used to reduce the influence of the skewed ratio of positive and negative samples on the prediction results. The Relink and AEEEM project data sets are used to evaluate performance in terms of the F-measure, the ROC curve, and AUC. Experiments show that the model achieves high predictive performance.
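
As a rough illustration of the cross-project setup described above, the sketch below uses synthetic stand-ins for the source and target projects rather than the actual Relink/AEEEM metrics: per-project standardisation plays the role of the distribution-alignment step, majority-class undersampling plays the role of dynamic training-sample selection, and a small scikit-learn neural network serves as the classifier. None of these choices is claimed to match the authors' exact pipeline.

```python
# Minimal sketch of cross-project defect prediction (assumed setup, not the
# authors' exact pipeline).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for a source project (training) and a target project
# (testing) whose feature distributions differ.
X_src = rng.normal(0.0, 1.0, size=(500, 20))
y_src = (X_src[:, 0] + rng.normal(0, 0.5, 500) > 0.8).astype(int)
X_tgt = rng.normal(0.5, 1.5, size=(300, 20))
y_tgt = (X_tgt[:, 0] + rng.normal(0, 0.5, 300) > 1.3).astype(int)

def zscore(X):
    # Standardising each project separately is one simple way to reduce
    # the difference in data distributions between projects.
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

X_src_n, X_tgt_n = zscore(X_src), zscore(X_tgt)

# Undersample the majority (non-defective) class so positive and negative
# samples are balanced before training.
pos, neg = np.where(y_src == 1)[0], np.where(y_src == 0)[0]
keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_src_n[keep], y_src[keep])

proba = clf.predict_proba(X_tgt_n)[:, 1]
print("F-measure:", f1_score(y_tgt, (proba > 0.5).astype(int)))
print("AUC      :", roc_auc_score(y_tgt, proba))
```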


Author(s):  
Nuthan Munaiah ◽  
Steven Kroh ◽  
Craig Cabrey ◽  
Meiyappan Nagappan

Software forges like GitHub host millions of repositories. Software engineering researchers have been able to take advantage of such a large corpus of potential study subjects with the help of tools like GHTorrent and Boa. However, the simplicity of querying comes with a caveat: there are limited means of separating the signal (e.g. repositories containing engineered software projects) from the noise (e.g. repositories containing homework assignments). The proportion of noise in a random sample of repositories could skew a study and may lead researchers to unrealistic, potentially inaccurate, conclusions. We argue that it is imperative to have the ability to sieve out the noise in such large repository forges. We propose a framework, and present a reference implementation of the framework as a tool called reaper, to enable researchers to select GitHub repositories that contain evidence of an engineered software project. We identify software engineering practices (called dimensions) and propose means of validating their existence in a GitHub repository. We used reaper to measure the dimensions of 1,994,977 GitHub repositories. We then used the data set to train classifiers capable of predicting whether a given GitHub repository contains an engineered software project. The performance of the classifiers was evaluated using a set of 200 repositories with known ground-truth classification. We also compared the performance of the classifiers to other approaches to classification (e.g. number of GitHub Stargazers) and found our classifiers to outperform existing approaches. The stargazer-based classifier exhibited high precision (96%) but correspondingly low recall (27%). In contrast, our best classifier exhibited both high precision (82%) and high recall (83%). The stargazer-based criterion offers precision but fails to recall a significant portion of the population.
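
As a rough illustration of the precision/recall comparison above, the sketch below contrasts a stargazer-threshold rule with a classifier trained on hypothetical repository dimension scores. The feature names, thresholds, and synthetic labels are assumptions for demonstration only and are not reaper's actual dimensions or data.

```python
# Synthetic comparison of a stargazer-threshold rule vs. a classifier
# trained on repository "dimension" scores (assumed features and data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
n = 1000
engineered = rng.random(n) < 0.3                      # ground-truth labels
# Hypothetical dimension scores (e.g. history, CI, testing, documentation):
# engineered repositories score higher on average.
dims = rng.normal(loc=engineered[:, None] * 0.8, scale=1.0, size=(n, 4))
# Only some engineered repositories are popular, so stargazer counts are a
# precise but low-recall signal (mirroring the pattern reported above).
popular = engineered & (rng.random(n) < 0.3)
stars = rng.poisson(np.where(popular, 40, 2))

half = n // 2
test = slice(half, None)

# Baseline: call a repository "engineered" if it has many stargazers.
star_pred = stars[test] >= 10
print("stargazer rule :",
      "precision", round(precision_score(engineered[test], star_pred), 2),
      "recall", round(recall_score(engineered[test], star_pred), 2))

# Dimension-based classifier trained on the first half of the sample.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(dims[:half], engineered[:half])
dim_pred = clf.predict(dims[test])
print("dimension model:",
      "precision", round(precision_score(engineered[test], dim_pred), 2),
      "recall", round(recall_score(engineered[test], dim_pred), 2))
```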


2021, Vol 108 (Supplement_7)
Author(s):  
Nandu Nair ◽  
Vasileios Kalatzis ◽  
Madhavi Gudipati ◽  
Anne Gaunt ◽  
Vishnu Machineni

Abstract Aims During the period December 2018 to November 2019 a total of 84 cases were entered on the NELA website, whereas HES data suggested 392 laparotomies. This suggests a possible case ascertainment of 21%, prompting us to look at our data acquisition in detail. Methods The NELA data from January to March 2020 were interrogated using the NELA website and hospital records. Results Analysis revealed that during this period 45 patients had a laparotomy recorded on NELA, whereas the hospital database recorded 68 laparotomies. Of the 45 cases entered on the NELA database, only 1 patient had a complete data set entered, 22 cases had 87% data entry, and 22 cases had <50% of the data fields completed. Firstly, we were not capturing all patients who underwent an emergency laparotomy, and secondly, our data entry for the patients we did report was incomplete. This led us to engage in a quality improvement project with the following measures - Conclusions We re-assessed the case ascertainment and completeness of data collection in the period April 2020 to June 2020; the case ascertainment rate increased to 54% and all the entries were complete and locked.
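
For reference, a small worked check of the case-ascertainment arithmetic quoted above (NELA entries divided by laparotomies recorded in hospital or HES data); the helper function is illustrative and uses only the counts stated in the abstract.

```python
# Worked check of the case-ascertainment figures quoted in the abstract;
# the helper function is illustrative and only uses the stated counts.
def ascertainment(nela_entries: int, recorded_laparotomies: int) -> float:
    """Share of recorded laparotomies that were also entered on NELA."""
    return nela_entries / recorded_laparotomies

print(f"Dec 2018 - Nov 2019 (NELA 84 vs HES 392): {ascertainment(84, 392):.0%}")
print(f"Jan - Mar 2020 (NELA 45 vs hospital 68) : {ascertainment(45, 68):.0%}")
```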


Author(s):  
Lei Shi ◽  
Linda Newnes ◽  
Steve Culley ◽  
Bruce Allen

Abstract An engineering service project can be highly interactive, collaborative, and distributed. The implementation of such projects generates, utilizes, and shares large amounts of data and heterogeneous digital objects. The resulting information overload prevents the effective reuse of project data and knowledge, and makes it difficult to understand project characteristics. To address these issues, this paper emphasizes the use of data mining and machine learning techniques to improve the process of understanding project characteristics. The work presented in this paper proposes an automatic model and a set of analytical approaches for learning and predicting the characteristics of engineering service projects. To evaluate the model and demonstrate its functionality, an industrial data set from the aerospace sector is used as a case study. This work shows that the proposed model can enable project members to gain a comprehensive understanding of project characteristics from a multidimensional perspective, and that it has the potential to support them in evidence-based design and decision making.
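
As a rough illustration of the multidimensional characteristic analysis described above, the sketch below clusters a few made-up per-project features and assigns a new project to a characteristic group; the paper's actual model, features, and industrial aerospace data set are not reproduced here.

```python
# Minimal sketch of multidimensional project-characteristic analysis with
# made-up per-project features and a simple clustering step.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Hypothetical per-project features: duration (months), digital objects
# produced, collaborating sites, and emails exchanged.
projects = np.column_stack([
    rng.integers(3, 36, 200),
    rng.integers(50, 5000, 200),
    rng.integers(1, 8, 200),
    rng.integers(100, 20000, 200),
]).astype(float)

scaler = StandardScaler().fit(projects)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(projects))

# A new engineering service project is characterised by the group whose
# historical projects it most resembles.
new_project = scaler.transform([[12, 800, 3, 4000]])
print("characteristic group:", model.predict(new_project)[0])
```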


Author(s):  
George Chatzikonstantinou ◽  
Kostas Kontogiannis ◽  
Ioanna-Maria Attarian
