Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

2008 ◽  
Vol 81 (12) ◽  
pp. 2361-2370 ◽  
Author(s):  
Qinbao Song ◽  
Martin Shepperd ◽  
Xiangru Chen ◽  
Jun Liu

2018 ◽
Vol 232 ◽  
pp. 03017
Author(s):  
Jie Zhang ◽  
Gang Wang ◽  
Haobo Jiang ◽  
Fangzheng Zhao ◽  
Guilin Tian

Software defect prediction has been an important part of software engineering research since the 1970s. The technique uses the measurement and defect information of historical software modules to predict defects in new software modules. Currently, most software defect prediction models are built on data from a single software project: the training sets used to construct the model and the test sets used to validate it come from the same project. In practice, however, traditional prediction methods perform poorly for projects with little historical data or for entirely new projects: when historical data are insufficient, the defect prediction model cannot be adequately trained, and high prediction accuracy is difficult to achieve. Cross-project prediction, in turn, faces the problem of differences in data distribution between projects. To address these problems, this paper presents a software defect prediction model that combines transfer learning with a traditional software defect prediction model and uses existing project data sets to predict software defects across projects. The main work of this article includes: 1) Data preprocessing, covering feature correlation analysis, noise reduction, and related steps, which mitigates the interference of over-fitting and noisy data with the prediction results. 2) Transfer learning, which analyzes two different but related project data sets and reduces the impact of their differing data distributions. 3) Artificial neural networks: to address the class imbalance in the data sets, an artificial neural network with dynamic selection of training samples is used to reduce the influence of the imbalance between positive and negative samples on the prediction results. The Relink and AEEEM data sets are used to evaluate performance via the F-measure, the ROC curve, and AUC. Experiments show that the model has high predictive performance.
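As a rough illustration of the pipeline summarized above, the following sketch trains a small neural network on one project and evaluates it on a differently distributed one. It assumes synthetic stand-ins for the Relink and AEEEM data and uses simple per-project standardization in place of the paper's transfer-learning step, whose details the abstract does not give; all names are illustrative.

```python
# Hypothetical sketch of the cross-project pipeline described above.
# Per-project standardization stands in for the paper's transfer-learning
# component; the dynamic sample-selection step is omitted for brevity.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for a source project (e.g., Relink) and a
# differently distributed target project (e.g., AEEEM).
X_src = rng.normal(0.0, 1.0, (500, 20))
y_src = (X_src[:, 0] + 0.5 * X_src[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)
X_tgt = rng.normal(0.5, 1.5, (300, 20))          # shifted distribution
y_tgt = (X_tgt[:, 0] + 0.5 * X_tgt[:, 1] + rng.normal(0, 0.5, 300) > 0.5).astype(int)

# 1) Preprocessing: keep only features with some correlation to the label
#    in the source project (a toy stand-in for feature correlation analysis).
keep = np.abs(np.corrcoef(X_src.T, y_src)[-1, :-1]) > 0.01
X_src, X_tgt = X_src[:, keep], X_tgt[:, keep]

# 2) "Transfer" step: project both data sets onto a common scale so the
#    classifier is less sensitive to the distribution shift.
X_src = StandardScaler().fit_transform(X_src)
X_tgt = StandardScaler().fit_transform(X_tgt)

# 3) Neural network trained on the source project only.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_src, y_src)

# Evaluation on the target project: F-measure and AUC, as in the abstract.
proba = clf.predict_proba(X_tgt)[:, 1]
print("F-measure:", f1_score(y_tgt, (proba > 0.5).astype(int)))
print("AUC:      ", roc_auc_score(y_tgt, proba))
```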


Author(s):  
SUMANTH YENDURI ◽  
S. S. IYENGAR

In this study, we compare the performance of four different imputation strategies, ranging from the commonly used listwise deletion to model-based approaches such as maximum likelihood, for enhancing completeness in incomplete software project data sets. We evaluate the impact of each of these methods by applying them to six different real-time software project data sets, which are classified into different categories based on their inherent properties. The reliability of the data sets constructed with these techniques is further tested by building prediction models using stepwise regression. The experimental results are reported and the findings discussed.
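A hedged sketch of this kind of comparison, using scikit-learn imputers on synthetic data: plain linear regression stands in for the study's stepwise regression, and IterativeImputer stands in loosely for a maximum-likelihood-style approach; none of this reproduces the study's actual implementation.

```python
# Illustrative comparison of imputation strategies on an incomplete
# data set: each strategy completes the data, then a regression model
# is fit and its in-sample R^2 reported.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.3, 200)

# Knock out 15% of the entries at random to simulate missingness.
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.15] = np.nan

strategies = {
    "listwise deletion": None,  # handled separately below
    "mean imputation": SimpleImputer(strategy="mean"),
    "k-NN imputation": KNNImputer(n_neighbors=5),
    "iterative (ML-like)": IterativeImputer(random_state=1),
}

for name, imputer in strategies.items():
    if imputer is None:
        rows = ~np.isnan(X_miss).any(axis=1)   # keep complete cases only
        Xc, yc = X_miss[rows], y[rows]
    else:
        Xc, yc = imputer.fit_transform(X_miss), y
    model = LinearRegression().fit(Xc, yc)
    print(f"{name:20s} in-sample R^2 = {r2_score(yc, model.predict(Xc)):.3f}")
```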


Neurosurgery ◽  
2011 ◽  
Vol 68 (2) ◽  
pp. 496-505 ◽  
Author(s):  
Alexandra J. Golby ◽  
Gordon Kindlmann ◽  
Isaiah Norton ◽  
Alexander Yarmarkovich ◽  
Steven Pieper ◽  
...  

Abstract BACKGROUND: Diffusion tensor imaging (DTI) infers the trajectory and location of large white matter tracts by measuring the anisotropic diffusion of water. DTI data may then be analyzed and presented as tractography for visualization of the tracts in 3 dimensions. Despite the important information contained in tractography images, their usefulness for neurosurgical planning has been limited by the inability to identify the critical structures within the mass of demonstrated fibers and to clarify their relationship to the tumor. OBJECTIVE: To develop a method that allows interactive querying of tractography data sets for surgical planning and to provide a working software package for the research community. METHODS: The tool was implemented within an open source software project. Echo-planar DTI at 3 T was performed on 5 patients, followed by tensor calculation. Software was developed that allowed the placement of a dynamic seed point for local selection of fibers and for fiber display around a segmented structure, both with tunable parameters. A neurosurgeon was trained in the use of the software in < 1 hour and used it to review the cases. RESULTS: Tracts near the tumor and critical structures were interactively visualized in 3 dimensions to determine their spatial relationships to the lesion. Tracts were selected using 3 methods: anatomical and functional magnetic resonance imaging-defined regions of interest, distance from the segmented tumor volume, and dynamic seed-point spheres. CONCLUSION: Interactive tractography successfully enabled inspection of white matter structures that were in proximity to lesions, critical structures, and functional cortical areas, allowing the surgeon to explore the relationships between them.
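The dynamic seed-point selection described in METHODS amounts to keeping every fiber that passes within a chosen radius of a movable seed. A minimal geometric sketch of that operation on synthetic streamlines follows; it does not use the actual tool's API, and all names are illustrative.

```python
# Toy illustration of seed-sphere fiber selection: keep each streamline
# that passes within `radius` mm of a movable seed point, as in the
# dynamic seed-point spheres described in the abstract.
import numpy as np

rng = np.random.default_rng(2)

def make_streamline(n_points: int = 50) -> np.ndarray:
    """A synthetic fiber: a smooth random walk of 3-D points (mm)."""
    steps = rng.normal(0, 0.5, (n_points, 3))
    return np.cumsum(steps, axis=0) + rng.uniform(-20, 20, 3)

def select_fibers(streamlines, seed, radius):
    """Return the streamlines with at least one point inside the sphere."""
    return [s for s in streamlines
            if np.min(np.linalg.norm(s - seed, axis=1)) <= radius]

fibers = [make_streamline() for _ in range(1000)]
seed = np.array([0.0, 0.0, 0.0])     # moved interactively in the real tool
selected = select_fibers(fibers, seed, radius=5.0)
print(f"{len(selected)} of {len(fibers)} fibers pass within 5 mm of the seed")
```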


Author(s):  
George Chatzikonstantinou ◽  
Kostas Kontogiannis ◽  
Ioanna-Maria Attarian

Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 351
Author(s):  
Nezamoddin N. Kachouie ◽  
Meshal Shutaywi

Background: A common task in machine learning is clustering data into different groups based on similarities. Clustering methods can be divided into two groups: linear and nonlinear. A commonly used linear clustering method is K-means. Its extension, kernel K-means, is a nonlinear technique that utilizes a kernel function to project the data to a higher-dimensional space, where the projected data are then clustered into different groups. Different kernels do not perform similarly when applied to different datasets. Methods: A kernel function might be relevant for one application but perform poorly when projecting data for another. In turn, choosing the right kernel for an arbitrary dataset is a challenging task. To address this challenge, a potential approach is to aggregate the clustering results so as to obtain an impartial clustering result regardless of the selected kernel function. The main challenge then is how to aggregate the clustering results. A potential solution is to combine them using a weight function. In this work, we introduce Weighted Mutual Information (WMI) for calculating weights for different clustering methods based on their performance, and use these weights to combine the results. The performance of each method is evaluated using a training set with known labels. Results: We applied the proposed Weighted Mutual Information to four data sets that cannot be linearly separated. We also tested the method under different noise conditions. Conclusions: Our results show that the proposed Weighted Mutual Information method is impartial, does not rely on a single kernel, and performs better than each individual kernel, especially under high noise.
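A rough sketch of the aggregation idea on synthetic two-moons data: each kernel's clustering is weighted by its normalized mutual information with known labels on a training subset, and a weighted co-association matrix is then clustered to obtain the consensus. Spectral clustering stands in for kernel K-means here, and the paper's exact WMI combination rule may differ.

```python
# Illustrative kernel-clustering aggregation: weight each kernel's
# result by its normalized mutual information with known training
# labels, then cluster a weighted co-association matrix. This is a
# sketch of the idea, not the paper's exact WMI formulation.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering, AgglomerativeClustering
from sklearn.metrics import normalized_mutual_info_score

X, y = make_moons(n_samples=300, noise=0.08, random_state=3)
train = np.arange(60)                      # small labeled training subset

kernels = {"rbf": SpectralClustering(2, affinity="rbf", random_state=3),
           "knn": SpectralClustering(2, affinity="nearest_neighbors",
                                     random_state=3)}

labelings, weights = {}, {}
for name, model in kernels.items():
    labels = model.fit_predict(X)
    labelings[name] = labels
    # Weight = clustering quality on the labeled subset.
    weights[name] = normalized_mutual_info_score(y[train], labels[train])

total = sum(weights.values())
weights = {k: w / total for k, w in weights.items()}

# Weighted co-association matrix: weighted frequency with which each
# pair of points lands in the same cluster across the kernels.
n = len(X)
coassoc = np.zeros((n, n))
for name, labels in labelings.items():
    coassoc += weights[name] * (labels[:, None] == labels[None, :])

consensus = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)
print("consensus NMI vs truth:", normalized_mutual_info_score(y, consensus))
```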


2007 ◽  
Vol 3 ◽  
pp. 518-527 ◽  
Author(s):  
Shuji Morisaki ◽  
Akito Monden ◽  
Haruaki Tamada ◽  
Tomoko Matsumura ◽  
Ken-ichi Matsumoto
