Comparative Analysis of Relevance for SVM-Based Interactive Document Retrieval

Author(s):  
Hiroshi Murata ◽  
◽  
Takashi Onoda ◽  
Seiji Yamada ◽  

Support Vector Machines (SVMs) were applied to interactive document retrieval that uses active learning. In such a retrieval system, the degree of relevance is evaluated by using a signed distance from the optimal hyperplane. It is not clear, however, how the signed distance in SVMs has characteristics of vector space model. We therefore formulated the degree of relevance by using the signed distance in SVMs and comparatively analyzed it with a conventional Rocchio-based method. Although vector normalization has been utilized as preprocessing for document retrieval, few studies explained why vector normalization was effective. Based on our comparative analysis, we theoretically show the effectiveness of normalizing document vectors in SVM-based interactive document retrieval. We then propose a cosine kernel that is suitable for SVM-based interactive document retrieval. The effectiveness of the method was compared experimentally with conventional relevance feedback for Boolean, Term Frequency and Term Frequency-Inverse Document Frequency representations of document vectors. Experimental results for a Text REtrieval Conference data set showed that the cosine kernel is effective for all document representations, especially Term Frequency representation.

2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ersen Yılmaz

An expert system having two stages is proposed for cardiac arrhythmia diagnosis. In the first stage, Fisher score is used for feature selection to reduce the feature space dimension of a data set. The second stage is classification stage in which least squares support vector machines classifier is performed by using the feature subset selected in the first stage to diagnose cardiac arrhythmia. Performance of the proposed expert system is evaluated by using an arrhythmia data set which is taken from UCI machine learning repository.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization where raw text is segmented into tokens. This study proposes an enhanced rule based tokenizer, ChemTok, which utilizes rules extracted mainly from the train data set. The main novelty of ChemTok is the use of the extracted rules in order to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperforms all classifiers trained on the output of the other two tokenizers in terms of classification performance, and the number of incorrectly segmented entities.


Author(s):  
Hsien-Chung Lin ◽  
Eugen Solowjow ◽  
Masayoshi Tomizuka ◽  
Edwin Kreuzer

This contribution presents a method to estimate environmental boundaries with mobile agents. The agents sample a concentration field of interest at their respective positions and infer a level curve of the unknown field. The presented method is based on support vector machines (SVMs), whereby the concentration level of interest serves as the decision boundary. The field itself does not have to be estimated in order to obtain the level curve which makes the method computationally very appealing. A myopic strategy is developed to pick locations that yield most informative concentration measurements. Cooperative operations of multiple agents are demonstrated by dividing the domain in Voronoi tessellations. Numerical studies demonstrate the feasibility of the method on a real data set of the California coastal area. The exploration strategy is benchmarked against random walk which it clearly outperforms.


Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several applications in cybersecurity; including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships, communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent advances in research utilized machine learning methods for anomaly detection over graphs. This chapter will concentrate on static graphs (both labeled and unlabeled), and the chapter summarizes some of these recent studies in machine learning for anomaly detection in graphs. This includes methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter will reflect the success and challenges of using these methods in the context of graph-based anomaly detection.


2016 ◽  
Vol 23 (2) ◽  
pp. 124 ◽  
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail in their courses. Machine Learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and Adaboost decision trees) under different scenarios. We provide evidences that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) as an attempt to avoid students drop out.


2020 ◽  
pp. 009385482096975
Author(s):  
Mehdi Ghasemi ◽  
Daniel Anvari ◽  
Mahshid Atapour ◽  
J. Stephen wormith ◽  
Keira C. Stockdale ◽  
...  

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.


Sign in / Sign up

Export Citation Format

Share Document