A New Aggregated Attribute Values Match Technique for Improving the Quality of Probability Estimated Decision Trees

The probability estimates produced by decision trees are often too poor to be used directly, yet accurate probability estimates are needed in many applications. Many techniques have been proposed for obtaining good probability estimates from decision trees. Two such optimal techniques are identified here. The first is single-tree aggregation of the mismatched attribute values of instances. The second is bagging, which is costly and less comprehensible. This paper therefore proposes a technique based on a single aggregated probability estimation decision tree model to improve the probability estimates of decision trees, and evaluates the new technique using the area under the curve (AUC). The proposed technique computes aggregate scores based on the matched attribute values of test tuples.
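As a rough illustration of the AUC evaluation mentioned above, the following sketch computes AUC as the fraction of positive/negative pairs that the scores rank correctly. The function name `auc_score` and the example labels and scores are assumptions made for illustration; they are not taken from the paper.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of predicted probabilities or scores, same length.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four test tuples (illustrative only).
example_auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The pairwise form is quadratic in the number of examples, which is fine for a small check; a sort-based rank computation would be the usual choice for large test sets.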

2017
Vol 2017
pp. 1-6
Author(s):
Zhong Xin
Lin Hua
Xu-Hong Wang
Dong Zhao
Cai-Guo Yu
...  

We reanalyzed previous data to develop a more simplified decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. Then, the model was validated in another rural town. Only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) with fewer branches were used in the new model. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detecting diabetes were calculated. The AUC values in internal and external validation groups were 0.708 and 0.629, respectively. Subjects with high risk of diabetes had significantly higher HOMA-IR, but no significant difference in HOMA-B was observed. This simple tool will help general practitioners and residents assess the risk of diabetes quickly and easily. This study also validates the strong associations of insulin resistance and early stage of diabetes, suggesting that more attention should be paid to the current model in rural Chinese adult populations.
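The four screening metrics named in this abstract all follow from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `screening_metrics` and the counts in the example are hypothetical and are not data from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts.

    tp: true positives, fp: false positives,
    tn: true negatives, fn: false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the tool flags
        "specificity": tn / (tn + fp),  # share of non-cases the tool clears
        "ppv": tp / (tp + fp),          # share of flagged subjects who are cases
        "npv": tn / (tn + fn),          # share of cleared subjects who are non-cases
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = screening_metrics(tp=30, fp=20, tn=80, fn=10)
```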


2019
Vol 2019 (1)
pp. 266-286
Author(s):
Anselme Tueno
Florian Kerschbaum
Stefan Katzenbeisser

Decision trees are widespread machine learning models used for data classification and have many applications in areas such as healthcare, remote diagnostics, and spam filtering. In this paper, we address the problem of privately evaluating a decision tree on private data. In this scenario, the server holds a private decision tree model and the client wants to classify its private attribute vector using the server’s private model. The goal is to obtain the classification while preserving the privacy of both the decision tree and the client input. After the computation, only the classification result is revealed to the client, while nothing is revealed to the server. Many existing protocols require a constant number of rounds. However, some of these protocols perform as many comparisons as there are decision nodes in the entire tree, and others transform the whole plaintext decision tree into an oblivious program, resulting in higher communication costs. The main idea of our novel solution is to represent the tree as an array. We then execute only d comparisons, where d is the depth of the tree. Each comparison is performed using a small garbled circuit, which outputs secret shares of the index of the next node. We get the inputs to the comparison by obliviously indexing the tree and the attribute vector. We implement oblivious array indexing using either garbled circuits, Oblivious Transfer, or Oblivious RAM (ORAM). Using ORAM, this results in the first protocol with sublinear cost in the size of the tree. We implemented and evaluated our solution using the different array indexing procedures mentioned above. As a result, we are not only able to provide the first protocol with sublinear cost for large trees, but also reduce the communication cost for the large real-world data set “Spambase” from 18 MB to 1.2 MB and the computation time from 17 seconds to less than 1 second in a LAN setting, compared to the best related work.
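To make the array representation concrete, here is a plaintext sketch of evaluating a tree stored as an array with exactly d comparison steps. It offers no privacy at all: in the protocol described above, each comparison runs inside a garbled circuit and each array lookup is oblivious, neither of which is modelled here. The `Node` layout, the `<=` branch convention, and the toy tree are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    attr: int = 0                 # index into the attribute vector (decision nodes)
    threshold: float = 0.0        # comparison threshold (decision nodes)
    left: int = 0                 # array index of the next node if x[attr] <= threshold
    right: int = 0                # array index of the next node otherwise
    label: Optional[str] = None   # class label; set only on leaves


def evaluate(tree, x, depth):
    """Walk an array-encoded tree using exactly `depth` steps.

    Leaves reached early simply stay in place, so the number of steps
    does not depend on which path the input takes.
    """
    idx = 0
    for _ in range(depth):
        node = tree[idx]
        if node.label is not None:
            continue  # already at a leaf: keep the index unchanged
        idx = node.left if x[node.attr] <= node.threshold else node.right
    return tree[idx].label


# Toy tree of depth 2 (hypothetical thresholds and labels).
toy_tree = [
    Node(attr=0, threshold=5.0, left=1, right=2),
    Node(label="ham"),
    Node(attr=1, threshold=0.5, left=3, right=4),
    Node(label="ham"),
    Node(label="spam"),
]
result = evaluate(toy_tree, [7.0, 0.9], depth=2)
```

Running a fixed number of steps regardless of the path taken mirrors the idea of performing d comparisons rather than one per decision node.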


2018
Vol 10 (3)
pp. 106
Author(s):
Mirza Suljic
Edin Osmanbegovic
Željko Dobrović

The subject of this paper is metamodeling and its application in the field of scientific research. The main goal is to explore the possibilities of integrating two methods: questionnaires and decision trees. The questionnaire method is established as one of the methods for data collection, while the decision tree method represents an alternative way of presenting and analyzing decision-making situations. These two methods are not completely independent; on the contrary, there is a strong natural bond between them. The result is therefore a common meta-model that connects the two methods, questionnaires and decision trees, through shared concepts and the use of metamodeling. The obtained results can be used to create a CASE tool or a repository suitable for exchange between different systems. The proposed meta-model is not necessarily the final product; it could be further developed by adding more entities that hold other data.


2020
Vol 34 (04)
pp. 6413-6421
Author(s):
Mike Wu
Sonali Parbhoo
Michael Hughes
Ryan Kindle
Leo Celi
...  

The lack of interpretability remains a barrier to adopting deep neural networks across many safety-critical domains. Tree regularization was recently proposed to encourage a deep neural network's decisions to resemble those of a globally compact, axis-aligned decision tree. However, it is often unreasonable to expect a single tree to predict well across all possible inputs. In practice, doing so could lead to neither interpretable nor performant optima. To address this issue, we propose regional tree regularization – a method that encourages a deep model to be well-approximated by several separate decision trees specific to predefined regions of the input space. Across many datasets, including two healthcare applications, we show our approach delivers simpler explanations than other regularization schemes without compromising accuracy. Specifically, our regional regularizer finds many more “desirable” optima compared to global analogues.


Author(s):  
Linfeng He
Shuo Wang

With the improvement in people’s quality of life, more and more people choose to travel abroad in their leisure time. Large differences in lifestyle can cause culture shock, which can make a trip a relatively unpleasant experience. This is especially so in popular resorts in Europe and America, where a tip is expected whenever a service is provided, a practice that confuses many tourists from Asia, Africa, and even European countries. Using decision trees, this study aims to help such tourists by taking the New York City Taxi and Limousine Commission’s database as its research target and analysing the factors that may cause variation in tipping rates. It is found that distance and duration contribute most to the trip fare. Besides accurately predicting the tip amount, it is also necessary to know how each factor contributes to the trip; linear regression is therefore used to check the validity of the result, and a 14.1% rate of all distance as a fee is achieved. This study provides a model, which can also be used in many other places, that tremendously improves people’s quality of life, and the main idea is related to the fuzzy system in solving social problems.


Author(s):  
Dariusz AMPUŁA

The article presents a brief history of the creation of decision trees and defines the purpose of the undertaken work. The process of building a classification tree according to the CHAID method is shown, with particular attention paid to the disadvantages, advantages, and characteristic features of this method, as well as to the formal requirements for building this model. The tree-building method for UZRGM (Universal Modernised Fuze of Hand Grenades) fuzes is characterized, specifying the features of the tested hand grenade fuzes and the predictors needed to create a correct tree model. A classification tree was built based on the test results, taking the accepted post-diagnostic decision as a qualitative dependent variable. A schema of the designed tree for the first diagnostic tests, its full structure, and the sizes of the individual classes in each node are shown. The matrix of incorrect classifications was determined, which indicates the share of incorrect predictions, i.e., the correctness of the performed classification. A sheet with the risk assessment and standard error for the learning sample and the v-fold cross-validation is presented. Using selected examples, the quality of the resulting predictive model was assessed by means of a graph of the cumulative lift coefficient and the ROC curve.


PLoS ONE
2021
Vol 16 (7)
pp. e0255033
Author(s):
Bohao Wang
Zhiquan He
Zhijie Yi
Chun Yuan
Wenshuai Suo
...  

Background: Severe fever with thrombocytopenia syndrome (SFTS) is a serious infectious disease with a fatality rate of up to 30%. Identifying the severity of SFTS precisely and quickly is important in clinical practice. Methods: From June to July 2020, 71 patients admitted to the Infectious Department of Joint Logistics Support Force No. 990 Hospital were enrolled in this study. The most frequently observed symptoms and laboratory parameters on admission were collected from patients’ electronic records. Decision trees were built to identify the severity of SFTS. Accuracy and Youden’s index were calculated to evaluate the identification capacity of the models. Results: Clinical characteristics, including body temperature (p = 0.011), the size of the lymphadenectasis (p = 0.021), and cough (p = 0.017), and neurologic symptoms, including lassitude (p<0.001), limb tremor (p<0.001), hypersomnia (p = 0.009), coma (p = 0.018), and dysphoria (p = 0.008), were significantly different between the mild and severe groups. As for laboratory parameters, PLT (p = 0.006), AST (p<0.001), LDH (p<0.001), and CK (p = 0.003) were significantly different between the mild and severe groups of SFTS patients. A decision tree based on laboratory parameters and one based on demographic and clinical characteristics were built. Compared with the decision tree based on demographic and clinical characteristics, the decision tree based on laboratory parameters had a stronger prediction capacity because of its higher accuracy and Youden’s index. Conclusion: Decision trees can be applied to predict the severity of SFTS.
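For reference, Youden's index, used above to compare the two trees, has the standard definition below. The symbol J and the confusion-matrix abbreviations (TP, FN, TN, FP) are notation introduced here, not taken from the abstract.

```latex
J = \text{sensitivity} + \text{specificity} - 1
  = \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1
```

J ranges from -1 to 1, with 0 corresponding to a classifier no better than chance and 1 to perfect separation of the mild and severe groups.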


Sensors
2020
Vol 20 (21)
pp. 5979
Author(s):
Piotr Lipinski
Edyta Brzychczy
Radoslaw Zimroz
Radoslaw Zimroz

Monitoring the condition of rotating machinery, especially planetary gearboxes, is a challenging problem. In most of the available approaches, diagnostic procedures are related to advanced signal pre-processing/feature extraction methods or advanced data (features) analysis by using artificial intelligence. In this paper, the second approach is explored, so an application of decision trees for the classification of spectral-based 15D vectors of diagnostic data is proposed. The novelty of this paper is that by a combination of spectral analysis and the application of decision trees to a set of spectral features, we are able to take advantage of the multidimensionality of diagnostic data and classify/recognize the gearbox condition almost faultlessly even in non-stationary operating conditions. The diagnostics of time-varying systems are a complicated issue due to time-varying probability densities estimated for features. Using multidimensional data instead of an aggregated 1D feature, it is possible to improve the efficiency of diagnostics. It can be underlined that in comparison to previous work related to the same data, where the aggregated 1D variable was used, the efficiency of the proposed approach is around 99% (ca. 19% better). We tested several algorithms: classification and regression trees with the Gini index and entropy, as well as the random tree. We compare the obtained results with the K-nearest neighbors classification algorithm and meta-classifiers, namely: random forest and AdaBoost. As a result, we created the decision tree model with 99.74% classification accuracy on the test dataset.
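The Gini index and entropy mentioned above are the two usual impurity measures for choosing splits in classification trees. The sketch below shows their standard definitions on a list of class labels; the function names and the example labels are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


# Hypothetical node holding two gearbox-condition classes in equal numbers.
mixed = ["good", "good", "faulty", "faulty"]
mixed_gini = gini(mixed)
mixed_entropy = entropy(mixed)
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, so a split is preferred when it lowers the weighted impurity of the child nodes.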


2019
Vol 10 (1)
pp. 43-61
Author(s):
Ankita Bansal
Sourabh Jajoria

Changes in software systems are inevitable. Identifying change-prone modules can help developers focus their efforts and resources on them. In this article, the authors conduct various intra-project and cross-project change predictions. The authors use the distributional characteristics of the dataset to generate rules that can be used for successful change prediction. The authors analyze the effectiveness of meta-heuristic decision trees in generating rules for successful cross-project change prediction. The meta-heuristic algorithms employed are hybrid decision tree genetic algorithms and oblique decision trees with evolutionary learning. The authors compare the performance of these meta-heuristic algorithms with the C4.5 decision tree model. The authors observe that the accuracy of the C4.5 decision tree is 73.33%, whereas the accuracies of the hybrid decision tree genetic algorithm and the oblique decision tree are 75.00% and 75.56%, respectively. These values indicate that distributional characteristics are helpful in identifying a suitable training set for cross-project change prediction.

