Beyond Cross-Validation—Accuracy Estimation for Incremental and Active Learning Models

2020 ◽  
Vol 2 (3) ◽  
pp. 327-346
Author(s):  
Christian Limberg ◽  
Heiko Wersing ◽  
Helge Ritter

For incremental machine-learning applications it is often important to robustly estimate the system accuracy during training, especially if humans perform the supervised teaching. Cross-validation and interleaved test/train error are the standard supervised approaches here. We propose a novel semi-supervised accuracy estimation approach that clearly outperforms these two methods. We introduce the Configram Estimation (CGEM) approach to predict the accuracy of any classifier that delivers confidences. By calculating classification confidences for unseen samples, it is possible to train an offline regression model, capable of predicting the classifier's accuracy on novel data in a semi-supervised fashion. We evaluate our method with several diverse classifiers and on analytical and real-world benchmark data sets for both incremental and active learning. The results show that our novel method improves accuracy estimation over standard methods and requires less supervised training data after deployment of the model. We demonstrate the application of our approach to a challenging robot object recognition task, where the human teacher can use our method to judge sufficient training.
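A minimal sketch of the underlying idea (not the authors' implementation): summarize a classifier's confidences on a batch of unlabeled data into a fixed-length histogram (the "configram"), and fit a regressor offline on (confidence summary, measured accuracy) pairs so that accuracy on new data can be predicted from confidences alone. For brevity, the toy regressor below uses a single summary feature (mean confidence) rather than the full histogram; all data values are invented for illustration.

```python
def configram(confidences, bins=5):
    """Normalized histogram of confidence values in [0, 1]."""
    counts = [0] * bins
    for c in confidences:
        counts[min(int(c * bins), bins - 1)] += 1
    total = len(confidences)
    return [n / total for n in counts]

def fit_linear(xs, ys):
    """Closed-form simple linear regression y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Offline training: one summary feature (mean confidence) per batch,
# paired with the accuracy measured on that batch's held-out labels.
batches = [([0.55, 0.6, 0.5], 0.52), ([0.8, 0.85, 0.9], 0.88),
           ([0.7, 0.65, 0.75], 0.70)]
xs = [sum(conf) / len(conf) for conf, _ in batches]
ys = [acc for _, acc in batches]
a, b = fit_linear(xs, ys)

# Deployment: predict accuracy from confidences alone (no labels needed).
new_conf = [0.78, 0.82, 0.8]
predicted_acc = a * (sum(new_conf) / len(new_conf)) + b
```

The semi-supervised benefit is visible in the last two lines: once the regressor exists, estimating accuracy on novel data requires no further labels.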


2016 ◽  
Vol 12 (4) ◽  
pp. 448-476 ◽  
Author(s):  
Amir Hosein Keyhanipour ◽  
Behzad Moshiri ◽  
Maryam Piroozmand ◽  
Farhad Oroumchian ◽  
Ali Moeini

Purpose – Learning to rank algorithms inherently face many challenges. The most important are the high dimensionality of the training data, the dynamic nature of Web information resources, and the lack of click-through data. High dimensionality of the training data affects both the effectiveness and the efficiency of learning algorithms. Moreover, most learning to rank benchmark data sets do not include click-through data, a very rich source of information about the search behavior of users dealing with ranked lists of search results. To address these limitations, this paper introduces a novel learning to rank algorithm that uses a set of complex click-through features in a reinforcement learning (RL) model. These features are calculated from the click-through information present in the data set, or even from data sets without any explicit click-through information.

Design/methodology/approach – The proposed ranking algorithm (QRC-Rank) applies RL techniques to a set of calculated click-through features. QRC-Rank is a two-step process. In the first step, the Transformation phase, a compact benchmark data set is created that contains a set of click-through features. These features are calculated from the original click-through information available in the data set and constitute a compact representation of it. To find the most effective click-through features, a number of scenarios are investigated. The second phase is Model-Generation, in which an RL model is built to rank the documents. This model is created by applying temporal difference learning methods such as Q-Learning and SARSA.

Findings – The proposed learning to rank method, QRC-Rank, is evaluated on the WCL2R and LETOR4.0 data sets. Experimental results demonstrate that QRC-Rank outperforms state-of-the-art learning to rank methods such as SVMRank, RankBoost, ListNet and AdaRank on the precision and normalized discounted cumulative gain evaluation criteria. The use of the click-through features calculated from the training data set is a major contributor to the performance of the system.

Originality/value – This paper demonstrates the viability of the proposed features, which provide a compact representation of the click-through data in a learning to rank application. These compact click-through features are calculated from the original features of the learning to rank benchmark data set. In addition, a Markov Decision Process model is proposed for the learning to rank problem using RL, including the sets of states and actions, the rewarding strategy and the transition function.
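To make the temporal difference component concrete, here is a minimal tabular Q-learning backup of the kind the Model-Generation phase applies. The states and actions below are hypothetical stand-ins (e.g., a state could encode the click-through features at the current rank position, and an action the choice of which document to place next); they are not taken from the paper.

```python
def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One temporal-difference (Q-learning) backup on a tabular Q function."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q

Q = {}
actions = ["doc_a", "doc_b"]
# Reward the transition that places a frequently clicked document first.
q_learning_update(Q, "start", "doc_a", reward=1.0, next_state="rank_2",
                  actions=actions)
```

SARSA differs only in bootstrapping from the action actually taken next rather than the greedy maximum.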



Author(s):  
Stephanie R Debats ◽  
Lyndon D Estes ◽  
David R Thompson ◽  
Kelly K Caylor

Sub-Saharan Africa and other developing regions of the world are dominated by smallholder farms, which are characterized by small, heterogeneous, and often indistinct field patterns. In previous work, we developed an algorithm for mapping both smallholder and commercial agricultural fields that includes efficient extraction of a vast set of simple, highly correlated, and interdependent features, followed by a random forest classifier. In this paper, we demonstrate how active learning can be incorporated into the algorithm to create smaller, more efficient training data sets, reducing computational resources, minimizing the need for humans to hand-label data, and boosting performance. We designed a patch-based uncertainty metric to drive the active learning framework, based on the regular grid of a crowdsourcing platform, and demonstrated how subject matter experts can be replaced with fleets of crowdsourcing workers. Our active learning algorithm achieved performance similar to that of an algorithm trained on randomly selected data, but with 62% fewer data samples.
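The selection loop can be sketched with a standard least-confidence uncertainty metric; the paper's actual metric is patch-based and tied to the crowdsourcing grid, so this simplified version scores individual samples instead. `predict_proba` is a hypothetical stand-in for the random forest's class-probability output.

```python
def least_confidence(probs):
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - max(probs)

def select_most_uncertain(pool, predict_proba, n=1):
    """Pick the n pool samples the current model is least sure about."""
    scored = sorted(pool, key=lambda x: least_confidence(predict_proba(x)),
                    reverse=True)
    return scored[:n]

# Toy model: two-class probabilities, ambiguous near 0.5.
pool = [0.1, 0.48, 0.9, 0.52]
proba = lambda x: [x, 1.0 - x]
to_label = select_most_uncertain(pool, proba, n=2)
```

The selected samples are then sent to crowdsourcing workers for labeling, and the classifier is retrained on the enlarged labeled set.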



2017 ◽  
Vol 8 (3) ◽  
pp. 24-36 ◽  
Author(s):  
Rabindra K. Barik ◽  
Rojalina Priyadarshini ◽  
Nilamadhab Dash

This paper presents an extensive experimental study of a key idea, Target Optimization (TO), applied prior to the training process of artificial machines. Generally, during the training of an artificial machine, the output is computed from two important parameters, the input and the target. In common practice, the input is taken from the training data while the target is chosen randomly, and may therefore not be relevant to the corresponding training data; as a result, the overall training of the neural network becomes inefficient. The present study puts forward TO as an efficient methodology for addressing this problem. The proposed work implements the concept of TO and compares the outcomes with those of conventional classifiers. In this regard, different benchmark data sets are used to compare the effect of TO on data classification using the Particle Swarm Optimization (PSO) and Gravitational Search Algorithm (GSA) optimization techniques.
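For reference, a minimal PSO implementation (not the paper's TO pipeline): the same machinery could, in principle, search for better target vectors instead of random ones. Here it simply minimizes f(x) = x² in one dimension to show the velocity and position update rules.

```python
import random

def pso(f, n_particles=10, iters=50, lo=-5.0, hi=5.0, w=0.5, c1=1.5, c2=1.5):
    """Minimize f with a basic particle swarm (1-D, fixed seed for repeatability)."""
    random.seed(0)
    xs = [random.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]                        # each particle's best position
    gbest = min(xs, key=f)               # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            # velocity = inertia + pull toward personal best + pull toward global best
            vs[i] = (w * vs[i] + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest

best = pso(lambda x: x * x)
```

GSA follows the same population-based pattern but replaces the velocity rule with gravitational attraction between candidate solutions.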



2015 ◽  
Vol 41 (2) ◽  
pp. 293-336 ◽  
Author(s):  
Li Dong ◽  
Furu Wei ◽  
Shujie Liu ◽  
Ming Zhou ◽  
Ke Xu

We present a statistical parsing framework for sentence-level sentiment classification in this article. Unlike previous works that use syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence. We show that complicated phenomena in sentiment analysis (e.g., negation, intensification, and contrast) can be handled the same way as simple and straightforward sentiment expressions in a unified and probabilistic way. We formulate the sentiment grammar upon Context-Free Grammars (CFGs), and provide a formal description of the sentiment parsing framework. We develop the parsing model to obtain possible sentiment parse trees for a sentence, from which the polarity model is proposed to derive the sentiment strength and polarity, and the ranking model is dedicated to selecting the best sentiment tree. We train the parser directly from examples of sentences annotated only with sentiment polarity labels but without any syntactic annotations or polarity annotations of constituents within sentences. Therefore we can obtain training data easily. In particular, we train a sentiment parser, s.parser, from a large number of review sentences with users' ratings as rough sentiment polarity labels. Extensive experiments on existing benchmark data sets show significant improvements over baseline sentiment classification approaches.
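The compositional phenomena the parser handles can be illustrated with toy hand-written rules: negators flip polarity, intensifiers scale strength. The real framework learns such compositions inside a CFG-based parser rather than hard-coding them; the lexicon and scaling factors below are invented for illustration.

```python
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
LEXICON = {"good": 1.0, "bad": -1.0, "great": 2.0}

def phrase_polarity(tokens):
    """Fold modifiers over a head sentiment word, right to left."""
    score = LEXICON.get(tokens[-1], 0.0)
    for tok in reversed(tokens[:-1]):
        if tok in NEGATORS:
            score = -score               # negation flips polarity
        elif tok in INTENSIFIERS:
            score *= INTENSIFIERS[tok]   # intensification scales strength
    return score
```

For instance, "not very good" composes to a negative score even though "good" alone is positive, which is the kind of structure a flat bag-of-words classifier misses.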



Author(s):  
Manuel Haussmann ◽  
Fred Hamprecht ◽  
Melih Kandemir

Model selection is treated as a standard performance-boosting step in many machine learning applications. Once all other properties of a learning problem are fixed, the model is selected by grid search on a held-out validation set. This is strictly inapplicable to active learning: within the standard workflow, the acquisition function is chosen among available heuristics a priori, and its success is observed only after the labeling budget is already exhausted. More importantly, none of the earlier studies reports an acquisition heuristic so consistently successful that it stands out as the single best choice. We present a method to break this vicious circle by defining the acquisition function as a learnable predictor and training it with reinforcement feedback collected from each labeling round. As active learning is a scarce-data regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all heuristics would agree, and learn a policy to warp the top portion of this ranking in the way most beneficial for the character of a specific data distribution. Our system consists of a Bayesian neural net (the predictor), a bootstrap acquisition function, a probabilistic state definition, and another Bayesian policy network that can effectively incorporate this input distribution. We observe on three benchmark data sets that our method always manages either to invent a new superior acquisition function or to adapt itself to the a priori unknown best-performing heuristic for each specific data set.
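A rough sketch of the "warp the top of a heuristic ranking" idea: a base heuristic (here, predictive entropy) pre-filters the pool, and a learned scalar weight (a crude stand-in for the Bayesian policy network) re-ranks the survivors by blending the heuristic with a second, hypothetical representativeness signal. Everything except the general pattern is invented.

```python
import math

def entropy(probs):
    """Predictive entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def representativeness(x):
    return 1.0 - abs(x - 0.5)            # toy density proxy

def acquire(pool, predict_proba, policy_weight, top_m=3):
    """Heuristic pre-filter, then a learned warp of the survivors."""
    ranked = sorted(pool, key=lambda x: entropy(predict_proba(x)),
                    reverse=True)
    survivors = ranked[:top_m]
    def warped(x):
        return (entropy(predict_proba(x))
                + policy_weight * representativeness(x))
    return max(survivors, key=warped)

proba = lambda x: [x, 1.0 - x]
chosen = acquire([0.1, 0.45, 0.5, 0.9], proba, policy_weight=0.2)
```

In the paper the warp is itself trained from reinforcement feedback after each labeling round, rather than fixed as it is here.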



2019 ◽  
Vol 5 ◽  
pp. e194 ◽  
Author(s):  
Hyukjun Gweon ◽  
Matthias Schonlau ◽  
Stefan H. Steiner

The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class (kCNN): the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class membership using these distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and its ensemble version on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
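A sketch of the class-conditional idea on 1-D data: for each class, find the distance from the query to its kth nearest neighbor within that class, then turn the distances into posterior estimates (a closer kth neighbor means a higher posterior). The inverse-distance posterior below is an illustrative choice; the paper's exact formula may differ.

```python
def kth_class_distance(x, class_points, k):
    """Distance from x to its kth nearest neighbor within one class."""
    dists = sorted(abs(x - p) for p in class_points)
    return dists[k - 1]

def kcnn_posteriors(x, data_by_class, k=1):
    """Posterior estimate per class, proportional to inverse kth distance."""
    inv = {c: 1.0 / (kth_class_distance(x, pts, k) + 1e-12)
           for c, pts in data_by_class.items()}
    z = sum(inv.values())
    return {c: v / z for c, v in inv.items()}

data = {"a": [0.0, 0.2, 0.4], "b": [1.0, 1.2, 1.4]}
post = kcnn_posteriors(0.3, data, k=1)
label = max(post, key=post.get)
```

Unlike plain kNN with small k, whose probability estimates are restricted to multiples of 1/k, the class-conditional distances yield a continuous posterior.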



2021 ◽  
Vol 72 ◽  
pp. 1163-1214
Author(s):  
Konstantinos Nikolaidis ◽  
Stein Kristiansen ◽  
Thomas Plagemann ◽  
Vera Goebel ◽  
Knut Liestøl ◽  
...  

Good training data is a prerequisite for developing useful machine learning applications. However, in many domains existing data sets cannot be shared due to privacy regulations (e.g., from medical studies). This work investigates a simple yet unconventional approach for anonymized data synthesis that enables third parties to benefit from such anonymized data. We explore the feasibility of learning implicitly from visually unrealistic, task-relevant stimuli, which are synthesized by exciting the neurons of a trained deep neural network; neuronal excitation can thus be used to generate synthetic stimuli. The stimuli data is used to train new classification models. Furthermore, we extend this framework to inhibit representations that are associated with specific individuals. We use sleep monitoring data from both an open and a large closed clinical study, and electroencephalogram sleep stage classification data, to evaluate whether (1) end-users can create and successfully use customized classification models, and (2) the identity of participants in the study is protected. Extensive comparative empirical investigation shows that different algorithms trained on the stimuli are able to generalize successfully on the same task as the original model. Architectural and algorithmic similarity between the new and original models plays an important role in performance. For similar architectures, the performance is close to that of using the original data (e.g., an accuracy difference of 0.56%-3.82% and a kappa coefficient difference of 0.02-0.08). Further experiments show that the stimuli can provide state-of-the-art resilience against adversarial association and membership inference attacks.
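A minimal sketch of generating a stimulus by exciting a neuron: for a single linear neuron a(x) = w · x, gradient ascent on a norm-constrained input drives x toward the direction of w. Deep-network versions of this activation-maximization recipe use backpropagated gradients instead, but the loop has the same shape; this toy is not the paper's method.

```python
import math

def excite(w, steps=100, lr=0.1):
    """Gradient ascent on the input of a linear neuron, on the unit sphere."""
    x = [0.01] * len(w)                  # start from a small input
    for _ in range(steps):
        # gradient of a(x) = w . x with respect to x is simply w
        x = [xi + lr * wi for xi, wi in zip(x, w)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]      # project back to the unit sphere
    return x

w = [3.0, 4.0]
stimulus = excite(w)                     # converges toward w / ||w||
```

The resulting stimulus maximally excites the neuron while looking nothing like natural input data, which is the property the anonymization scheme exploits.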



2019 ◽  
Vol 9 (19) ◽  
pp. 4060 ◽  
Author(s):  
Maria Jose Castro-Bleda ◽  
Eszter Iklódi ◽  
Gábor Recski ◽  
Gábor Borbély

A novel method for finding linear mappings among word embeddings for several languages, taking as pivot a shared, multilingual embedding space, is proposed in this paper. Previous approaches learned translation matrices between two specific languages, while this method learns translation matrices between a given language and a shared, multilingual space. The system was first trained on bilingual, and later on multilingual corpora as well. In the first case, two different training data sets were used: Dinu's English–Italian benchmark data, and English–Italian translation pairs extracted from the PanLex database. In the second case, only the PanLex database was used. With its best setting, the system performs significantly better on the English–Italian pair than the baseline system of Mikolov et al., and it provides performance comparable to more sophisticated systems. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
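The core operation is learning a matrix W with XW ≈ Z, where the rows of X are source-language embeddings and the rows of Z their counterparts in the shared space. Real systems solve the overdetermined least-squares problem; the toy below uses a square, invertible X so the solution reduces to W = X⁻¹Z, with invented 2-D "embeddings" for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

X = [[1.0, 0.0], [0.0, 2.0]]             # toy source-language embeddings
Z = [[2.0, 1.0], [0.0, 4.0]]             # their images in the shared space
W = matmul(inv2(X), Z)                   # exact fit: W = X^-1 Z

# Map a new source vector into the shared space.
mapped = matmul([[3.0, 1.0]], W)[0]
```

Because every language maps into the same pivot space, any language pair can then be bridged by composing a forward map with an inverse map, without a dedicated bilingual matrix.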



Author(s):  
Kai Liu ◽  
Hua Wang ◽  
Fei Han ◽  
Hao Zhang

Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operation across different times of day, months, and seasons introduces new challenges arising from significant variations in environment appearance. In this paper, we propose a novel method to learn a location representation that integrates the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations due to long-term visual changes, we formulate our objective to use non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes the ratio of the ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations can have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results validate the effectiveness of our new method in long-term visual place recognition applications.
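For readers unfamiliar with the notation, the ℓ2,1-norm in the objective is the sum of the ℓ2-norms of a matrix's rows. Unlike the squared Frobenius norm, it does not square per-row errors, so a few rows with large residuals (e.g., landmarks hit by drastic appearance change) do not dominate the objective; the matrix below is invented for illustration.

```python
import math

def l21_norm(m):
    """l2,1-norm: sum of the Euclidean norms of the rows of m."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in m)

A = [[3.0, 4.0], [0.0, 5.0]]
value = l21_norm(A)                      # 5 + 5 = 10
```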





