Cross-Project Change Prediction Using Meta-Heuristic Techniques

2019 ◽  
Vol 10 (1) ◽  
pp. 43-61 ◽  
Author(s):  
Ankita Bansal ◽  
Sourabh Jajoria

Changes in software systems are inevitable. Identifying change-prone modules can help developers focus efforts and resources on them. In this article, the authors conduct various intra-project and cross-project change predictions. They use the distributional characteristics of the datasets to generate rules that can be used for successful change prediction, and they analyze the effectiveness of meta-heuristic decision trees in generating such rules for cross-project change prediction. The employed meta-heuristic algorithms are hybrid decision tree genetic algorithms and oblique decision trees with evolutionary learning. The authors compare the performance of these meta-heuristic algorithms with the C4.5 decision tree model. They observe that the accuracy of the C4.5 decision tree is 73.33%, whereas the accuracies of the hybrid decision tree genetic algorithm and the oblique decision tree are 75.00% and 75.56%, respectively. These values indicate that distributional characteristics are helpful in identifying a suitable training set for cross-project change prediction.
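To illustrate the general idea of using distributional characteristics to pick a training project, the following minimal Python sketch selects, among candidate source projects, the one whose per-feature means and standard deviations are closest to the target project, and then trains a decision tree on it. It is only an illustration of the approach described above: scikit-learn's DecisionTreeClassifier implements CART rather than C4.5 or the meta-heuristic trees studied by the authors, and all data and variable names are invented for this sketch.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def characteristics(X):
    # Simple distributional profile: per-feature mean and standard deviation.
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

def select_training_project(candidates, X_target):
    # Pick the candidate project whose profile is closest to the target project.
    target_profile = characteristics(X_target)
    distances = [np.linalg.norm(characteristics(X_c) - target_profile)
                 for X_c, _ in candidates]
    return candidates[int(np.argmin(distances))]

rng = np.random.default_rng(1)
# Three hypothetical source projects (module metrics + change-proneness labels).
candidates = [(rng.normal(loc=m, size=(100, 4)), rng.integers(0, 2, size=100))
              for m in (0.0, 2.0, 5.0)]
X_target = rng.normal(loc=2.1, size=(80, 4))   # target project metrics

X_train, y_train = select_training_project(candidates, X_target)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("predicted change-proneness for first 5 target modules:", model.predict(X_target)[:5])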


2019 ◽  
Vol 2019 (1) ◽  
pp. 266-286 ◽  
Author(s):  
Anselme Tueno ◽  
Florian Kerschbaum ◽  
Stefan Katzenbeisser

Decision trees are widespread machine learning models used for data classification and have many applications in areas such as healthcare, remote diagnostics, spam filtering, etc. In this paper, we address the problem of privately evaluating a decision tree on private data. In this scenario, the server holds a private decision tree model and the client wants to classify its private attribute vector using the server’s private model. The goal is to obtain the classification while preserving the privacy of both the decision tree and the client input. After the computation, only the classification result is revealed to the client, while nothing is revealed to the server. Many existing protocols require a constant number of rounds. However, some of these protocols perform as many comparisons as there are decision nodes in the entire tree, and others transform the whole plaintext decision tree into an oblivious program, resulting in higher communication costs. The main idea of our novel solution is to represent the tree as an array. We then execute only d comparisons, where d is the depth of the tree. Each comparison is performed using a small garbled circuit, which outputs secret shares of the index of the next node. We obtain the inputs to the comparison by obliviously indexing the tree and the attribute vector. We implement oblivious array indexing using either garbled circuits, Oblivious Transfer, or Oblivious RAM (ORAM). Using ORAM, this results in the first protocol with sub-linear cost in the size of the tree. We implemented and evaluated our solution using the different array indexing procedures mentioned above. As a result, we are not only able to provide the first protocol with sub-linear cost for large trees, but also to reduce the communication cost for the large real-world data set “Spambase” from 18 MB to 1.2 MB and the computation time from 17 seconds to less than 1 second in a LAN setting, compared to the best related work.
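For intuition, here is a plaintext Python sketch of the array representation and the depth-bounded traversal described above: the tree is stored as an array of nodes and classification touches only d of them, where d is the tree depth. The cryptographic machinery is deliberately omitted; in the actual protocol every array access would be replaced by oblivious indexing (garbled circuits, Oblivious Transfer, or ORAM) and every comparison by a small garbled circuit. The node layout and field names are assumptions made for this illustration only.

# Node layout (illustrative): (attribute_index, threshold, left_child, right_child, leaf_label)
tree = [
    (0, 5.0, 1, 2, None),                  # index 0: root, compare attribute 0 with 5.0
    (1, 3.0, 3, 4, None),                  # index 1: internal node on attribute 1
    (None, None, None, None, "spam"),      # index 2: leaf
    (None, None, None, None, "ham"),       # index 3: leaf
    (None, None, None, None, "spam"),      # index 4: leaf
]
depth = 2                                  # d: the depth of the tree

def classify(tree, x, depth):
    idx = 0
    for _ in range(depth):                 # exactly d comparisons
        attr, thr, left, right, label = tree[idx]
        if label is not None:              # shorter paths simply stay at the leaf
            break
        idx = left if x[attr] <= thr else right
    return tree[idx][4]

print(classify(tree, [4.0, 2.0], depth))   # -> ham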


Probability estimates produced by decision trees may not be directly useful because of their poor quality, yet good probability estimates are desired in many applications. Many techniques have been proposed for obtaining good probability estimates from decision trees. Two such techniques are identified here: the first is single-tree-based aggregation over mismatched attribute values of instances, and the second is bagging, which is costly and less comprehensible. Therefore, in this paper a single aggregated probability estimation decision tree model is proposed to improve the quality of decision-tree probability estimates, and the performance of the new technique is evaluated using the area under the curve (AUC). The proposed technique computes aggregate scores based on the matched attribute values of test tuples.
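As a point of reference for why raw decision-tree probability estimates are poor, leaf frequencies are often smoothed with a Laplace correction. The sketch below shows that standard baseline only; it is not the aggregation technique proposed in the paper above, and the binary-class leaf counts are illustrative.

def leaf_probability(class_count, leaf_total, n_classes):
    # Laplace-corrected leaf estimate: (k + 1) / (n + C) instead of the raw k / n.
    return (class_count + 1.0) / (leaf_total + n_classes)

# A pure leaf with 8 out of 8 positive instances no longer claims probability 1.0:
print(leaf_probability(8, 8, 2))   # 0.9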


Author(s):  
Bin-Bin Yang ◽  
Song-Qing Shen ◽  
Wei Gao

Decision trees have attracted much attention during the past decades. Previous decision trees include axis-parallel and oblique decision trees; both try to find the best splits via exhaustive search or heuristic algorithms in each iteration. Oblique decision trees generally simplify the tree structure and achieve better performance, but they incur higher computational cost and are usually initialized with the best axis-parallel splits. This work presents the Weighted Oblique Decision Tree (WODT), based on continuous optimization with random initialization. We assign each instance a different weight for the child nodes at every internal node, and then obtain a split by optimizing the continuous and differentiable objective function of weighted information entropy. Extensive experiments show the effectiveness of the proposed algorithm.
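The following is a minimal sketch, not the authors' implementation, of what a differentiable weighted-entropy objective for a single oblique split could look like: a sigmoid of a linear projection softly assigns each instance to the left or right child, and the resulting weighted entropy is minimized from a random initialization. The synthetic data, parameter names, and the use of scipy.optimize are assumptions made for this illustration.

import numpy as np
from scipy.optimize import minimize

def weighted_split_entropy(params, X, y, n_classes):
    w, b = params[:-1], params[-1]
    p_left = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # soft membership in the left child
    objective = 0.0
    for side in (p_left, 1.0 - p_left):            # left child, right child
        total = side.sum() + 1e-12
        entropy = 0.0
        for c in range(n_classes):
            pc = side[y == c].sum() / total        # weighted class proportion
            if pc > 0:
                entropy -= pc * np.log2(pc)
        objective += (total / len(y)) * entropy    # weight child entropy by its mass
    return objective

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
x0 = rng.normal(size=X.shape[1] + 1)               # random initialization of the split
result = minimize(weighted_split_entropy, x0, args=(X, y, 2), method="Nelder-Mead")
print("optimized oblique split parameters:", result.x)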


2017 ◽  
Author(s):  
Robbi Rahim ◽  
Efori Buulolo ◽  
Natalia Silalahi ◽  
Fadlina

One of the impacts of an earthquake is heavy damage, and the tsunamis it can trigger kill no fewer people. One reason for the many deaths is that the impact of earthquakes is difficult to predict. Data on earthquakes that occurred earlier can be used to predict earthquakes that may happen in the future. One algorithm that can be used for such prediction is C4.5. The result of the C4.5 algorithm is a decision tree whose branches describe the characteristics or conditions of an earthquake and whose leaves give the decision, where the decision is the outcome of modeling the earthquakes that occurred.


Author(s):  
Sujuan Jia ◽  
Yajing Pang

Vast amounts of data in the higher education system are used to analyse and evaluate teaching quality, so that the key factors that affect the quality of teaching can be predicted. In addition, the learner's personalized behaviour can also serve as a data source for predicting teaching results. This paper proposes a decision tree model that takes the teaching quality data and the statistical analysis results of the learner's personalized behaviour as inputs. The model is based on an improved C4.5 decision tree algorithm, which uses the Fayyad boundary point decision theorem to effectively reduce the computation time spent on selecting split thresholds. In this algorithm, an iterative analysis mechanism is introduced in combination with changes in the data on the learner's personalized behaviour, so as to dynamically adjust the final teaching evaluation result. Finally, based on the actual statistical data of one academic year, the teaching quality evaluation was effectively completed and a direction for future teaching prediction was proposed.
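The boundary point result mentioned above (due to Fayyad and Irani) states that an entropy-minimizing cut point for a continuous attribute always lies between two adjacent examples of different classes, so only those midpoints need to be evaluated. A minimal Python sketch of this candidate reduction, with illustrative data, is given below; it shows the idea only and is not the paper's implementation.

def boundary_point_thresholds(values, labels):
    # Return candidate cut points that lie between adjacent examples of different classes.
    pairs = sorted(zip(values, labels))
    candidates = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:          # class change between adjacent distinct values
            candidates.append((v1 + v2) / 2.0)
    return candidates

print(boundary_point_thresholds([1.0, 2.0, 3.0, 4.0, 5.0], ['A', 'A', 'B', 'B', 'A']))
# -> [2.5, 4.5] instead of all four midpoints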


2018 ◽  
Vol 10 (3) ◽  
pp. 106
Author(s):  
Mirza Suljic ◽  
Edin Osmanbegovic ◽  
Željko Dobrović

The subject of this paper is metamodeling and its application in the field of scientific research. The main goal is to explore the possibilities of integrating two methods: questionnaires and decision trees. The questionnaire method is established as one of the methods for collecting data, while the decision tree method represents an alternative way of presenting and analyzing decision-making situations. These two methods are not completely independent; on the contrary, there is a strong natural bond between them. Therefore, the result is a common meta-model that, through shared concepts and the use of metamodeling, connects the two methods: questionnaires and decision trees. The obtained results can be used to create a CASE tool or a repository suitable for exchange between different systems. The proposed meta-model is not necessarily the final product; it could be further developed by adding more entities that capture additional data.


2019 ◽  
Vol 5 (1) ◽  
pp. 23-28
Author(s):  
Astrid Noviriandini ◽  
Nurajijah Nurajijah

This research helps students and teachers anticipate problems early in the learning period in order to obtain maximum learning outcomes. The methods used are the C4.5 decision tree algorithm and the Naïve Bayes algorithm. The purpose of this study was to compare and evaluate the C4.5 decision tree model and Naïve Bayes to find out which algorithm has higher accuracy in predicting student achievement. Learning achievement can be measured by report card grades. After comparing the two algorithms, the learning achievement predictions were obtained. The results showed that the Naïve Bayes algorithm had an accuracy of 95.67% and an AUC of 0.999, which falls into the Excellent Classification range, while the C4.5 algorithm had an accuracy of 90.91% and an AUC of 0.639, which falls into the Poor Classification range. Thus, the Naïve Bayes algorithm can better predict student achievement.
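A hedged sketch of this kind of comparison using scikit-learn is shown below. Note that scikit-learn's DecisionTreeClassifier implements CART rather than C4.5, and the synthetic dataset merely stands in for the report-card data used in the study, so the printed numbers will not match the reported results.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Placeholder data standing in for student report-card features and achievement labels.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("Decision tree (CART)", DecisionTreeClassifier(random_state=42))]:
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]
    print(name,
          "accuracy =", round(accuracy_score(y_te, model.predict(X_te)), 4),
          "AUC =", round(roc_auc_score(y_te, scores), 4))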

