A new imputation method for small software project data sets

2007 ◽  
Vol 80 (1) ◽  
pp. 51-62 ◽  
Author(s):  
Qinbao Song ◽  
Martin Shepperd

2018 ◽  
Vol 232 ◽  
pp. 03017
Author(s):  
Jie Zhang ◽  
Gang Wang ◽  
Haobo Jiang ◽  
Fangzheng Zhao ◽  
Guilin Tian

Software defect prediction has been an important part of software engineering research since the 1970s. The technique uses the measurement and defect information of historical software modules to predict defects in new software modules. Currently, most software defect prediction models are built on data from a single software project: the training sets used to construct the model and the test sets used to validate it come from the same project. In practice, however, for projects with little historical data, or for entirely new projects, traditional prediction methods show poor forecasting performance. When historical data are insufficient, a traditional defect prediction model cannot be trained adequately, and high prediction accuracy is difficult to achieve. In cross-project prediction, the problem we face instead is the difference in data distributions between projects. To address these problems, this paper presents a software defect prediction model that combines transfer learning with a traditional defect prediction model, using existing project data sets to predict software defects across projects. The main work of this article includes: 1) Data preprocessing. This part includes data feature correlation analysis, noise reduction and so on, which mitigates the interference of over-fitting and noisy data with the prediction results. 2) Transfer learning. This part analyzes two different but related project data sets and reduces the impact of their data distribution differences. 3) Artificial neural networks. To address the class imbalance in the data sets, an artificial neural network with dynamic selection of training samples is used to reduce the influence of the skewed ratio of positive and negative samples on the prediction results.
The Relink and AEEEM project data sets are used to evaluate performance in terms of the F-measure, ROC curves, and AUC. Experiments show that the model achieves high predictive performance.
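The abstract evaluates the cross-project model with the F-measure and AUC. As a minimal, library-free sketch (not the paper's actual implementation), the two metrics can be computed from true labels, hard predictions, and predicted scores as follows:

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the fraction of
    (positive, negative) pairs where the positive example scores higher,
    counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a defect-prediction setting, `y_true` would mark which modules turned out defective and `scores` would be the neural network's predicted defect probabilities.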


Author(s):  
SUMANTH YENDURI ◽  
S. S. IYENGAR

In this study, we compare the performance of four different imputation strategies, ranging from the commonly used listwise deletion to model-based approaches such as maximum likelihood, in enhancing completeness in incomplete software project data sets. We evaluate the impact of each of these methods by implementing them on six different real-time software project data sets, which are classified into categories based on their inherent properties. The reliability of the data sets constructed using these techniques is further tested by building prediction models using stepwise regression. The experimental results are noted and the findings are discussed.
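Listwise deletion is the only strategy the abstract names explicitly besides maximum likelihood. As an illustration of how deletion-based and substitution-based strategies differ, here is a small sketch contrasting listwise deletion with simple mean imputation; mean imputation is chosen only as a familiar baseline, not necessarily one of the four strategies actually studied:

```python
def listwise_delete(rows):
    """Keep only the rows that have no missing (None) values.
    Complete but potentially much smaller data set."""
    return [r for r in rows if None not in r]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.
    Keeps every row, at the cost of shrinking column variance."""
    cols = list(zip(*rows))
    means = []
    for col in cols:
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]
```

On a data set with scattered missing values, listwise deletion can discard most rows, which is one reason the study compares it against model-based alternatives.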


Neurosurgery ◽  
2011 ◽  
Vol 68 (2) ◽  
pp. 496-505 ◽  
Author(s):  
Alexandra J. Golby ◽  
Gordon Kindlmann ◽  
Isaiah Norton ◽  
Alexander Yarmarkovich ◽  
Steven Pieper ◽  
...  

Abstract BACKGROUND: Diffusion tensor imaging (DTI) infers the trajectory and location of large white matter tracts by measuring the anisotropic diffusion of water. DTI data may then be analyzed and presented as tractography for visualization of the tracts in 3 dimensions. Despite the important information contained in tractography images, their usefulness for neurosurgical planning has been limited by the inability to identify the critical structures within the mass of demonstrated fibers and to clarify their relationship to the tumor. OBJECTIVE: To develop a method that allows interactive querying of tractography data sets for surgical planning and to provide a working software package for the research community. METHODS: The tool was implemented within an open source software project. Echo-planar DTI at 3 T was performed on 5 patients, followed by tensor calculation. Software was developed that allowed the placement of a dynamic seed point for local selection of fibers and for fiber display around a segmented structure, both with tunable parameters. A neurosurgeon was trained in the use of the software in < 1 hour and used it to review cases. RESULTS: Tracts near the tumor and critical structures were interactively visualized in 3 dimensions to determine their spatial relationships to the lesion. Tracts were selected using 3 methods: anatomical and functional magnetic resonance imaging-defined regions of interest, distance from the segmented tumor volume, and dynamic seed-point spheres. CONCLUSION: Interactive tractography successfully enabled inspection of white matter structures in proximity to lesions, critical structures, and functional cortical areas, allowing the surgeon to explore the relationships between them.
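The dynamic seed-point query described in METHODS amounts to keeping every fiber that passes within a tunable radius of a movable 3D point. A minimal geometric sketch of that selection step, with fibers represented as lists of 3D points (the actual tool operates on full tractography data inside an open source package, not on toy lists like these), might look like:

```python
def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_fibers(fibers, seed, radius):
    """Keep every fiber with at least one point inside the seed sphere.
    `seed` is the movable query point; `radius` is the tunable parameter."""
    r2 = radius ** 2  # compare squared distances to avoid square roots
    return [f for f in fibers if any(dist2(p, seed) <= r2 for p in f)]
```

Because the selection re-runs cheaply as the seed moves, the surgeon can drag the sphere through the volume and watch the displayed fiber subset update interactively.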


2019 ◽  
Vol 6 (339) ◽  
pp. 73-98
Author(s):  
Małgorzata Aleksandra Misztal

The problem of incomplete data and its implications for drawing valid conclusions from statistical analyses is not tied to any particular scientific domain; it arises in economics, sociology, education, the behavioural sciences and medicine. Almost all standard statistical methods presume that every object has information on every variable to be included in the analysis, and the typical approach to missing data is simply to delete them. However, this leads to ineffective and biased results and is not recommended in the literature. The state-of-the-art technique for handling missing data is multiple imputation. In the paper, selected multiple imputation methods were considered, with special attention paid to using principal component analysis (PCA) as an imputation method. The goal of the study was to assess the quality of PCA‑based imputation compared to two other multiple imputation techniques: multivariate imputation by chained equations (MICE) and missForest. The comparison was made by artificially simulating different proportions (10–50%) and mechanisms of missing data using 10 complete data sets from the UCI Machine Learning Repository. Missing values were then imputed using MICE, missForest and the PCA‑based method (MIPCA), and the normalised root mean square error (NRMSE) was calculated as a measure of imputation accuracy. On the basis of the conducted analyses, missForest can be recommended as the multiple imputation method providing the lowest imputation error rates for all types of missingness. PCA‑based imputation does not perform well in terms of accuracy.
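NRMSE definitions vary; one common convention (used, for example, by missForest) normalises the RMSE over the imputed entries by the standard deviation of the corresponding true values. Assuming that convention, which is an assumption here since the paper's exact normalisation is not reproduced in this abstract, a sketch for a single flattened variable:

```python
import math


def nrmse(imputed, truth, missing_mask):
    """Normalised RMSE over the originally-missing entries only.
    missing_mask[i] is True where the value was missing and later imputed.
    Normalisation by the standard deviation of the true values is an
    assumed convention (as in missForest), not taken from the paper."""
    errs = [(a - t) ** 2
            for a, t, m in zip(imputed, truth, missing_mask) if m]
    rmse = math.sqrt(sum(errs) / len(errs))
    true_vals = [t for t, m in zip(truth, missing_mask) if m]
    mean = sum(true_vals) / len(true_vals)
    var = sum((t - mean) ** 2 for t in true_vals) / len(true_vals)
    return rmse / math.sqrt(var)
```

A value near 0 means the imputations closely recover the hidden truth; values near 1 mean the method does no better than guessing the mean, which is how the study can rank MICE, missForest and MIPCA on a common scale.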


Author(s):  
George Chatzikonstantinou ◽  
Kostas Kontogiannis ◽  
Ioanna-Maria Attarian
