A Data-Driven Exploratory Approach for Level Curve Estimation With Autonomous Underwater Agents

This contribution presents a method to estimate environmental boundaries with mobile agents. The agents sample a concentration field of interest at their respective positions and infer a level curve of the unknown field. The presented method is based on support vector machines (SVMs), whereby the concentration level of interest serves as the decision boundary. The field itself does not have to be estimated in order to obtain the level curve which makes the method computationally very appealing. A myopic strategy is developed to pick locations that yield most informative concentration measurements. Cooperative operations of multiple agents are demonstrated by dividing the domain in Voronoi tessellations. Numerical studies demonstrate the feasibility of the method on a real data set of the California coastal area. The exploration strategy is benchmarked against random walk which it clearly outperforms.

Download Full-text

An Algebraic Approach to Clustering and Classification with Support Vector Machines

Mathematics ◽

10.3390/math10010128 ◽

2022 ◽

Vol 10 (1) ◽

pp. 128

Author(s):

Güvenç Arslan ◽

Uğur Madran ◽

Duygu Soyoğlu

Keyword(s):

Support Vector Machines ◽

Clustering Algorithm ◽

Algebraic Approach ◽

Real Data ◽

Training Data ◽

Support Vector ◽

Intermediate Step ◽

Data Set ◽

Vector Machines ◽

Clustering And Classification

In this note, we propose a novel classification approach by introducing a new clustering method, which is used as an intermediate step to discover the structure of a data set. The proposed clustering algorithm uses similarities and the concept of a clique to obtain clusters, which can be used with different strategies for classification. This approach also reduces the size of the training data set. In this study, we apply support vector machines (SVMs) after obtaining clusters with the proposed clustering algorithm. The proposed clustering algorithm is applied with different strategies for applying SVMs. The results for several real data sets show that the performance is comparable with the standard SVM while reducing the size of the training data set and also the number of support vectors.

Download Full-text

An Expert System Based on Fisher Score and LS-SVM for Cardiac Arrhythmia Diagnosis

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/849674 ◽

2013 ◽

Vol 2013 ◽

pp. 1-6 ◽

Cited By ~ 19

Author(s):

Ersen Yılmaz

Keyword(s):

Expert System ◽

Cardiac Arrhythmia ◽

Feature Space ◽

Support Vector ◽

Feature Subset ◽

Fisher Score ◽

Data Set ◽

Second Stage ◽

Vector Machines ◽

Two Stages

An expert system having two stages is proposed for cardiac arrhythmia diagnosis. In the first stage, Fisher score is used for feature selection to reduce the feature space dimension of a data set. The second stage is classification stage in which least squares support vector machines classifier is performed by using the feature subset selected in the first stage to diagnose cardiac arrhythmia. Performance of the proposed expert system is evaluated by using an arrhythmia data set which is taken from UCI machine learning repository.

Download Full-text

Patent Innovation Factors Evolution Based on P-SVM and GCCA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.39.247 ◽

2010 ◽

Vol 39 ◽

pp. 247-252

Author(s):

Sheng Xu ◽

Zhi Juan Wang ◽

Hui Fang Zhao

Keyword(s):

Genetic Algorithm ◽

Support Vector Machines ◽

Correlation Coefficient ◽

Network Architecture ◽

Innovation Management ◽

Model Development ◽

Real Data ◽

Support Vector ◽

Vector Machines ◽

Gray Correlation

A two-stage neural network architecture constructed by combining potential support vector machines (P-SVM) with genetic algorithm (GA) and gray correlation coefficient analysis (GCCA) is proposed for patent innovation factors evolution. The enterprises patent innovation is complex to conduct due to its nonlinearity of influenced factors. It is necessary to make a trade off among these factors when some of them conflict firstly. A novel way about nonlinear regression model with the potential support vector machines (P-SVM) is presented in this paper. In the model development, the genetic algorithm is employed to optimize P-SVM parameters selection. After the selected key factors by the PSVM with GA model, the main factors that affect patent innovation generation have been quantitatively studied using the method of gray correlation coefficient analysis. Using a set of real data in China, the results show that the methods developed in this paper can provide valuable information for patent innovation management and related municipal planning projects.

Download Full-text

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

BioMed Research International ◽

10.1155/2016/4248026 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Abbas Akkasi ◽

Ekrem Varoğlu ◽

Nazife Dimililer

Keyword(s):

Conditional Random Fields ◽

Named Entity Recognition ◽

Classification Performance ◽

Entity Recognition ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Rule Based ◽

Named Entity ◽

Vector Machines

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization where raw text is segmented into tokens. This study proposes an enhanced rule based tokenizer, ChemTok, which utilizes rules extracted mainly from the train data set. The main novelty of ChemTok is the use of the extracted rules in order to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperforms all classifiers trained on the output of the other two tokenizers in terms of classification performance, and the number of incorrectly segmented entities.

Download Full-text

A Review of Machine Learning Techniques for Anomaly Detection in Static Graphs

Implementing Computational Intelligence Techniques for Security Systems Design - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-2418-3.ch007 ◽

2020 ◽

pp. 146-162

Author(s):

Hesham M. Al-Ammal

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Anomaly Detection ◽

Real Life ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Learning Techniques ◽

Vector Machines

Detection of anomalies in a given data set is a vital step in several applications in cybersecurity; including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships, communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent advances in research utilized machine learning methods for anomaly detection over graphs. This chapter will concentrate on static graphs (both labeled and unlabeled), and the chapter summarizes some of these recent studies in machine learning for anomaly detection in graphs. This includes methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter will reflect the success and challenges of using these methods in the context of graph-based anomaly detection.

Download Full-text

Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.62211 ◽

2016 ◽

Vol 23 (2) ◽

pp. 124 ◽

Cited By ~ 2

Author(s):

Douglas Detoni ◽

Cristian Cechinel ◽

Ricardo Araujo Matsumura ◽

Daniela Francisco Brauner

Keyword(s):

Machine Learning ◽

At Risk ◽

At Risk Students ◽

Drop Out ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Student Dropout ◽

Vector Machines ◽

Machine Learning Models

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail in their courses. Machine Learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and Adaboost decision trees) under different scenarios. We provide evidences that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) as an attempt to avoid students drop out.

Download Full-text

The Application of Machine Learning to a General Risk–Need Assessment Instrument in the Prediction of Criminal Recidivism

Criminal Justice and Behavior ◽

10.1177/0093854820969753 ◽

2020 ◽

pp. 009385482096975

Author(s):

Mehdi Ghasemi ◽

Daniel Anvari ◽

Mahshid Atapour ◽

J. Stephen wormith ◽

Keira C. Stockdale ◽

...

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Characteristic Curve ◽

Assessment Instrument ◽

Support Vector ◽

Data Set ◽

Applied Machine Learning ◽

Vector Machines ◽

Individual Scores

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.

Download Full-text

Feature selection using binary particle swarm optimization and support vector machines for medical diagnosis

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2012-0009 ◽

2012 ◽

Vol 57 (5) ◽

Cited By ~ 5

Author(s):

Mohammad Reza Daliri

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Support Vector Machines ◽

Particle Swarm ◽

Support Vector ◽

Emission Computed Tomography ◽

Binary Particle Swarm Optimization ◽

Data Set ◽

Swarm Optimization ◽

Vector Machines

AbstractIn this article, we propose a feature selection strategy using a binary particle swarm optimization algorithm for the diagnosis of different medical diseases. The support vector machines were used for the fitness function of the binary particle swarm optimization. We evaluated our proposed method on four databases from the machine learning repository, including the single proton emission computed tomography heart database, the Wisconsin breast cancer data set, the Pima Indians diabetes database, and the Dermatology data set. The results indicate that, with selected less number of features, we obtained a higher accuracy in diagnosing heart, cancer, diabetes, and erythematosquamous diseases. The results were compared with the traditional feature selection methods, namely, the F-score and the information gain, and a superior accuracy was obtained with our method. Compared to the genetic algorithm for feature selection, the results of the proposed method show a higher accuracy in all of the data, except in one. In addition, in comparison with other methods that used the same data, our approach has a higher performance using less number of features.

Download Full-text

A new classification strategy for human activity recognition using cost sensitive support vector machines for imbalanced data

Kybernetes ◽

10.1108/k-07-2014-0138 ◽

2014 ◽

Vol 43 (8) ◽

pp. 1150-1164 ◽

Cited By ~ 9

Author(s):

Bilal M’hamed Abidine ◽

Belkacem Fergani ◽

Mourad Oussalah ◽

Lamya Fergani

Keyword(s):

Support Vector Machines ◽

Probabilistic Models ◽

Conditional Random Fields ◽

Performance Metrics ◽

Imbalanced Data ◽

Sampling Technique ◽

Support Vector ◽

Data Set ◽

Content Type ◽

Vector Machines

Purpose – The task of identifying activity classes from sensor information in smart home is very challenging because of the imbalanced nature of such data set where some activities occur more frequently than others. Typically probabilistic models such as Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are known as commonly employed for such purpose. The paper aims to discuss these issues. Design/methodology/approach – In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost Sensitive Support Vector Machines (CS-SVM) with an adaptive tuning of cost parameter in order to handle imbalanced data problem. Findings – The results have demonstrated the usefulness of the approach through comparison with state of art of approaches including HMM, CRF, the traditional C-Support vector machines (C-SVM) and the Cost-Sensitive-SVM (CS-SVM) for classifying the activities using binary and ubiquitous sensors. Originality/value – Performance metrics in the experiment/simulation include Accuracy, Precision/Recall and F measure.

Download Full-text