Improving process operations using support vector machines and decision trees

Closed-circuit television inspection technology is traditionally used to identify aging sewer pipes requiring rehabilitation. While these inspections provide essential information on the condition of pipes hidden from day-to-day view, they are expensive and often limited to small portions of an entire sewer system. Municipalities may benefit from utilizing predictive analytics to leverage existing inspection datasets so that reliable predictions of condition are available for pipes that have not yet been inspected. The predictive capabilities of data mining systems, namely support vector machines (SVMs) and decision tree classifiers, are demonstrated using a case study of sanitary sewer pipe inspection data collected by the municipality of Guelph, Ontario, Canada. The modeling algorithms are implemented using open-source software and are tuned to counteract the negative impact on predictive performance resulting from class imbalance common within pipe inspection datasets. The decision tree classifier outperforms SVM for this classification task – achieving an acceptable area under the receiver operating characteristic curve of 0.77 and an overall accuracy of 76% on a stratified test set. Although predicting individual pipe condition is a notoriously difficult task, decision trees are found to be a useful screening tool for planning future inspection-related activities.
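The two metrics reported above, area under the ROC curve and overall accuracy, can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not the Guelph data, and the function names are made up for illustration. AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counting one half.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = pipe in poor condition (the minority class).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.55, 0.9]
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)

# A model that always predicts "good condition" still looks accurate
# on imbalanced data, but its AUC shows it cannot rank pipes at all.
constant = [0.0] * len(labels)
auc_const = roc_auc(labels, constant)
acc_const = accuracy(labels, constant)

print(f"model:    AUC={auc:.3f} accuracy={acc:.3f}")
print(f"constant: AUC={auc_const:.3f} accuracy={acc_const:.3f}")
```

The constant predictor shows why accuracy alone can mislead when one class dominates, which is one reason to report AUC alongside it.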


A Semantic Scattering model for the automatic interpretation of English genitives

Natural Language Engineering ◽

10.1017/s1351324908004798 ◽

2009 ◽

Vol 15 (2) ◽

pp. 215-239 ◽

Cited By ~ 1

Author(s):

ADRIANA BADULESCU ◽

DAN MOLDOVAN

Keyword(s):

Support Vector Machines ◽

Decision Trees ◽

Naive Bayes ◽

Word Sense Disambiguation ◽

Naïve Bayes ◽

Semantic Relations ◽

Support Vector ◽

Word Sense ◽

Vector Machines ◽

Bayes Algorithm

An important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of the semantic relations expressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently in the sense that different models should be built for different patterns.
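The abstract does not say which variant of the f-measure is reported. Assuming the common balanced form (F1), it is the harmonic mean of precision P and recall R, where TP, FP, and FN are the counts of true positives, false positives, and false negatives:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}
```

For example, a hypothetical classifier with P = 0.8 and R = 0.8 has F1 = 0.8. When P and R differ, the harmonic mean lies closer to the smaller of the two.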


Woodland Cover Change Assessment Using Decision Trees, Support Vector Machines and Artificial Neural Networks Classification Algorithms

2011 Fourth International Conference on Intelligent Computation Technology and Automation ◽

10.1109/icicta.2011.363 ◽

2011 ◽

Cited By ~ 3

Author(s):

Xidong Jiang ◽

Meizhen Lin ◽

Junlei Zhao

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Support Vector Machines ◽

Decision Trees ◽

Support Vector ◽

Classification Algorithms ◽

Change Assessment ◽

Vector Machines ◽

Artificial Neural


Support vector machines, Decision Trees and Neural Networks for auditor selection

Journal of Computational Methods in Sciences and Engineering ◽

10.3233/jcm-2008-8305 ◽

2008 ◽

Vol 8 (3) ◽

pp. 213-224 ◽

Cited By ~ 10

Author(s):

Efstathios Kirkos ◽

Charalambos Spathis ◽

Yannis Manolopoulos

Keyword(s):

Neural Networks ◽

Support Vector Machines ◽

Decision Trees ◽

Support Vector ◽

Vector Machines ◽

Auditor Selection


Selection of Support Vector Candidates Using Relative Support Distance for Sustainability in Large-Scale Support Vector Machines

Applied Sciences ◽

10.3390/app10196979 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6979

Author(s):

Minho Ryu ◽

Kichun Lee

Keyword(s):

Support Vector Machines ◽

Quadratic Programming ◽

Decision Trees ◽

Programming Problem ◽

Large Scale ◽

Classification Performance ◽

Quadratic Programming Problem ◽

Support Vector ◽

Training Time ◽

Vector Machines

Support vector machines (SVMs) are well-known classifiers due to their superior classification performance. They are defined by a hyperplane that separates two classes with the largest margin. Computing the hyperplane, however, requires solving a quadratic programming problem. The storage cost of this problem grows with the square of the number of training sample points, and the time complexity is, in general, proportional to the cube of that number. Thus, it is worth studying how to reduce the training time of SVMs without compromising performance, in preparation for sustainable large-scale SVM problems. In this paper, we propose a novel data reduction method that reduces training time by combining decision trees and relative support distance. We apply a new concept, relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected support vector candidates improve the training speed for large-scale SVM problems. In experiments, we demonstrate that our approach significantly reduces training time while maintaining good classification performance in comparison with existing approaches.
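The quadratic storage and cubic time growth described above can be made concrete with a rough back-of-the-envelope sketch. It assumes a dense kernel matrix stored as 8-byte floats and takes the cubic time scaling at face value. The sample counts and function names are hypothetical and are not taken from the paper.

```python
def gram_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix (8 bytes = one float64 entry)."""
    return n_samples * n_samples * bytes_per_entry


def cubic_time_ratio(n_before, n_after):
    """Ratio of training costs if time scales with the cube of the sample count."""
    return (n_before / n_after) ** 3


# Hypothetical sizes: the full training set versus a reduced candidate set.
full, reduced = 100_000, 10_000

full_gb = gram_matrix_bytes(full) / 1e9
reduced_gb = gram_matrix_bytes(reduced) / 1e9
speedup = cubic_time_ratio(full, reduced)

print(f"full set:    {full_gb:.1f} GB for the kernel matrix")
print(f"reduced set: {reduced_gb:.1f} GB for the kernel matrix")
print(f"estimated training-time ratio: {speedup:.0f}x")
```

Under these assumptions, keeping one tenth of the points cuts storage by a factor of 100 and the estimated training cost by a factor of 1000, which is the motivation for selecting support vector candidates before training.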


Artificial Intelligence Techniques for Unbalanced Datasets in Real World Classification Tasks

Machine Learning ◽

10.4018/978-1-60960-818-7.ch304 ◽

2012 ◽

pp. 414-427 ◽

Cited By ~ 1

Author(s):

Marco Vannucci ◽

Valentina Colla ◽

Silvia Cateni ◽

Mirko Sgarbi

Keyword(s):

Artificial Intelligence ◽

Support Vector Machines ◽

Decision Trees ◽

Real World ◽

Support Vector ◽

Artificial Intelligence Techniques ◽

Vector Machines ◽

Classification Tasks

This chapter presents a survey of classification on unbalanced datasets. The effect of an imbalanced distribution of target classes is analyzed with respect to the performance of standard classifiers such as decision trees and support vector machines, and the main approaches for improving the generally unsatisfactory results obtained by such methods are described. Finally, two typical real-world applications are introduced, and the techniques employed for the related classification tasks are shown in practice.
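The abstract does not name the specific approaches surveyed. One widely used option, shown here only as an illustrative assumption, is cost-sensitive learning with class weights inversely proportional to class frequency, so that each class contributes equally to the training loss. The function name and the 90/10 split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n / (k * n_c): n samples, k classes, n_c samples in class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 90 majority-class and 10 minority-class samples.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# After weighting, both classes carry the same total weight.
total_majority = 90 * weights[0]
total_minority = 10 * weights[1]

print(weights)
print(total_majority, total_minority)
```

Many decision tree and SVM implementations accept per-class weights of this kind. Resampling the training set is another common alternative.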
