Improving process operations using support vector machines and decision trees

AIChE Journal ◽  
2005 ◽  
Vol 51 (2) ◽  
pp. 526-543 ◽  
Author(s):  
Gorden T. Jemwa ◽  
Chris Aldrich
2014 ◽  
Vol 16 (6) ◽  
pp. 1265-1279 ◽  
Author(s):  
Robert Richard Harvey ◽  
Edward Arthur McBean

Closed-circuit television inspection technology is traditionally used to identify aging sewer pipes requiring rehabilitation. While these inspections provide essential information on the condition of pipes hidden from day-to-day view, they are expensive and often limited to small portions of an entire sewer system. Municipalities may benefit from utilizing predictive analytics to leverage existing inspection datasets so that reliable predictions of condition are available for pipes that have not yet been inspected. The predictive capabilities of data mining systems, namely support vector machines (SVMs) and decision tree classifiers, are demonstrated using a case study of sanitary sewer pipe inspection data collected by the municipality of Guelph, Ontario, Canada. The modeling algorithms are implemented using open-source software and are tuned to counteract the negative impact on predictive performance resulting from class imbalance common within pipe inspection datasets. The decision tree classifier outperforms SVM for this classification task – achieving an acceptable area under the receiver operating characteristic curve of 0.77 and an overall accuracy of 76% on a stratified test set. Although predicting individual pipe condition is a notoriously difficult task, decision trees are found to be a useful screening tool for planning future inspection-related activities.


2009 ◽  
Vol 15 (2) ◽  
pp. 215-239 ◽  
Author(s):  
ADRIANA BADULESCU ◽  
DAN MOLDOVAN

AbstractAn important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of thesemantic relationsexpressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently in the sense that different models should be built for different patterns.


2020 ◽  
Vol 10 (19) ◽  
pp. 6979
Author(s):  
Minho Ryu ◽  
Kichun Lee

Support vector machines (SVMs) are a well-known classifier due to their superior classification performance. They are defined by a hyperplane, which separates two classes with the largest margin. In the computation of the hyperplane, however, it is necessary to solve a quadratic programming problem. The storage cost of a quadratic programming problem grows with the square of the number of training sample points, and the time complexity is proportional to the cube of the number in general. Thus, it is worth studying how to reduce the training time of SVMs without compromising the performance to prepare for sustainability in large-scale SVM problems. In this paper, we proposed a novel data reduction method for reducing the training time by combining decision trees and relative support distance. We applied a new concept, relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected support vector candidates improved the training speed for large-scale SVM problems. In experiments, we demonstrated that our approach significantly reduced the training time while maintaining good classification performance in comparison with existing approaches.


2012 ◽  
pp. 414-427 ◽  
Author(s):  
Marco Vannucci ◽  
Valentina Colla ◽  
Silvia Cateni ◽  
Mirko Sgarbi

In this chapter a survey on the problem of classification tasks in unbalanced datasets is presented. The effect of the imbalance of the distribution of target classes in databases is analyzed with respect to the performance of standard classifiers such as decision trees and support vector machines, and the main approaches to improve the generally not satisfactory results obtained by such methods are described. Finally, two typical applications coming from real world frameworks are introduced, and the uses of the techniques employed for the related classification tasks are shown in practice.


Sign in / Sign up

Export Citation Format

Share Document