Learning with Partial Supervision

2012 ◽  
pp. 1880-1888
Author(s):  
Abdelhamid Bouchachia

Recently, the fields of machine learning, pattern recognition, and data mining have witnessed a new research stream, learning with partial supervision (LPS), also known as semi-supervised learning. This learning scheme is motivated by the fact that acquiring the labeling information of data can be quite costly and is sometimes prone to mislabeling. The general spectrum of learning from data is envisioned in Figure 1. As shown, in many situations the data is neither perfectly nor completely labeled.

LPS aims at using the available labeled samples to guide the process of building classification and clustering machinery and to help boost their accuracy. Basically, LPS is a combination of two learning paradigms, supervised and unsupervised, where the former deals exclusively with labeled data and the latter is concerned with unlabeled data. Hence the following questions:

- Can we improve supervised learning with unlabeled data?
- Can we guide unsupervised learning by incorporating a few labeled samples?

Typical LPS applications are medical diagnosis (Bouchachia & Pedrycz, 2006a), facial expression recognition (Cohen et al., 2004), text classification (Nigam et al., 2000), protein classification (Weston et al., 2003), and several natural language processing applications such as word sense disambiguation (Niu et al., 2005) and text chunking (Ando & Zhang, 2005).

Because LPS is still a young but active research field, it lacks a survey outlining the existing approaches and research trends. In this chapter, we take a step towards such an overview. We discuss (i) the background of LPS, (ii) the main focus of our LPS research and the underlying assumptions behind LPS, and (iii) future directions and challenges of LPS research.
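One concrete illustration of the first question is self-training: a classifier trained on the labeled samples pseudo-labels the unlabeled ones it is most confident about, then retrains on the enlarged set. The sketch below is a minimal illustrative baseline under assumed NumPy-array inputs, not the chapter's own method; the base learner, confidence threshold, and round limit are arbitrary choices.

```python
# Minimal self-training sketch (illustrative only, not the chapter's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unl, threshold=0.95, max_rounds=10):
    """Iteratively grow the labeled set with confident pseudo-labels."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(X_unl) == 0:
            break
        proba = clf.predict_proba(X_unl)
        confident = proba.max(axis=1) >= threshold  # trust only confident predictions
        if not confident.any():
            break
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unl[confident]])    # absorb pseudo-labeled samples
        y_lab = np.concatenate([y_lab, pseudo])
        X_unl = X_unl[~confident]
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)  # retrain
    return clf
```

scikit-learn ships a ready-made variant of this loop as sklearn.semi_supervised.SelfTrainingClassifier.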


2020 ◽  
Vol 1 ◽  
pp. 1-18
Author(s):  
Amine Medad ◽  
Mauro Gaio ◽  
Ludovic Moncla ◽  
Sébastien Mustière ◽  
Yannick Le Nir

Abstract. Discourse may contain both named and nominal entities. Most common nouns or nominal mentions in natural language do not have a single, simple meaning but rather a number of related meanings. This form of ambiguity led to the development of a task in natural language processing known as Word Sense Disambiguation. Recognition and categorisation of named and nominal entities is an essential step for Word Sense Disambiguation methods. Up to now, named entity recognition and categorisation systems have mainly focused on the annotation, categorisation and identification of named entities. This paper focuses on the annotation and identification of spatial nominal entities. We explore the combination of the transfer learning principle and supervised learning algorithms in order to build a system that detects spatial nominal entities. For this purpose, different supervised learning algorithms are evaluated with three different context sizes on two manually annotated datasets built from Wikipedia articles and hiking description texts. The studied algorithms were selected for one or more of their specific properties potentially useful in solving our problem. The results of the first phase of experiments reveal that the selected algorithms perform similarly in terms of their ability to detect spatial nominal entities. The study also confirms the importance of the window size used to describe the context when the word-embedding principle is used to represent the semantics of each word.
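As a minimal sketch of the context representation studied above, the snippet below averages the word embeddings found in a window of k words on each side of a candidate noun; the toy embedding table and all names are illustrative assumptions, not the authors' exact setup.

```python
# Window-of-context features from word embeddings (illustrative sketch).
import numpy as np

def window_features(tokens, index, embeddings, k=2, dim=100):
    """Average the embeddings of the k words on each side of tokens[index]."""
    lo, hi = max(0, index - k), min(len(tokens), index + k + 1)
    context = [t for i, t in enumerate(tokens[lo:hi], start=lo) if i != index]
    vecs = [embeddings[t] for t in context if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy usage with a made-up embedding table; "path" is the candidate noun.
sentence = "follow the path along the river".split()
emb = {w: np.random.rand(100) for w in sentence}
x = window_features(sentence, 2, emb, k=2)
# x can then feed any of the compared supervised classifiers.
```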


2016 ◽  
Vol 55 ◽  
pp. 1025-1058 ◽  
Author(s):  
Osman Başkaya ◽  
David Jurgens

Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context, and successful approaches are known to benefit many applications in Natural Language Processing. Although supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words. While unsupervised techniques have been proposed to overcome this data sparsity problem, such techniques have not outperformed supervised methods. In this paper, we propose a new approach to building semi-supervised WSD systems that combines a small amount of sense-annotated data with information from Word Sense Induction, a fully unsupervised technique that automatically learns the different senses of a word based on how it is used. In three experiments, we show how sense induction models may be effectively combined to ultimately produce high-performance semi-supervised WSD systems that exceed the performance of state-of-the-art supervised WSD techniques trained on the same sense-annotated data. We anticipate that our results and released software will also benefit evaluation practices for sense induction systems and those working in low-resource languages by demonstrating how to quickly produce accurate WSD systems with minimal annotation effort.
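One simple way to realize such a combination, sketched below under assumed inputs (precomputed context vectors for one target word), is to cluster the word's contexts as an induction step and then label each cluster by majority vote over the few sense-annotated instances it contains; the paper's actual combination methods are more elaborate.

```python
# Induced clusters + a few sense labels -> WSD (illustrative sketch).
from collections import Counter, defaultdict

import numpy as np
from sklearn.cluster import KMeans

def wsi_to_wsd(context_vectors, labeled_idx, labeled_senses, n_clusters=5):
    """Predict a sense for every context of one target word."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(context_vectors)
    votes = defaultdict(Counter)
    for i, sense in zip(labeled_idx, labeled_senses):
        votes[clusters[i]][sense] += 1              # annotated instances vote
    fallback = Counter(labeled_senses).most_common(1)[0][0]
    mapping = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return [mapping.get(c, fallback) for c in clusters]  # unvoted clusters back off

# Toy usage: 20 contexts of one word, 3 of them sense-annotated.
X = np.random.rand(20, 50)
print(wsi_to_wsd(X, labeled_idx=[0, 5, 9], labeled_senses=["bank#1", "bank#2", "bank#1"]))
```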


2018 ◽  
Vol 10 (10) ◽  
pp. 3729 ◽  
Author(s):  
Hei Wang ◽  
Yung Chi ◽  
Ping Hsin

With the advent of the knowledge economy, firms often compete for intellectual property rights. Being the first to acquire high-potential patents can assist firms in achieving future competitive advantages. To identify patents capable of being developed, firms often search for a focus by using existing patent documents. Because of the rapid development of technology, the number of patent documents is immense. A prominent topic among current firms is how to use this large number of patent documents to discover new business opportunities while avoiding conflicts with existing patents. In the search for technological opportunities, a crucial task is to present results in the form of an easily understood visualization. Currently, natural language processing can help in achieving this goal. In natural language processing, word sense disambiguation (WSD) is the problem of determining which “sense” (meaning) of a word is activated in a given context. Given a word and its possible senses, as defined by a dictionary, we classify the occurrence of the word in context into one or more of its sense classes. The features of the context (such as neighboring words) provide evidence for these classifications. Current methods for patent document analysis warrant improvement in areas such as the analysis of many dimensions and the development of recommendation methods. This study proposes a visualization method that supports semantics, reduces the number of dimensions formed by terms, and can easily be understood by users. Since polysemous words occur frequently in patent documents, we also propose a WSD method to decrease the calculated degrees of distortion between terms. An analysis of outlier distributions is used to construct a patent map capable of distinguishing similar patents. During the development of new strategies, the constructed patent map can assist firms in understanding patent distributions in commercial areas, thereby preventing patent infringement caused by the development of similar technologies. Subsequently, technological opportunities can be recommended according to the patent map, aiding firms in assessing relevant patents in commercial areas early and in sustainably achieving future competitive advantages.
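The dictionary-based view of WSD described above can be made concrete with a simplified Lesk overlap: each sense is scored by how many words its gloss shares with the context surrounding the target. The glosses below are toy assumptions, not the paper's actual WSD method.

```python
# Simplified Lesk: pick the sense whose gloss overlaps the context most.
def lesk(context_words, senses):
    """senses maps sense_id -> gloss string; returns the best-matching sense."""
    context = {w.lower() for w in context_words}
    overlap = lambda gloss: len(context & set(gloss.lower().split()))
    return max(senses, key=lambda s: overlap(senses[s]))

senses = {
    "cell_battery": "a device that stores chemical energy as electricity",
    "cell_biology": "the smallest structural unit of a living organism",
}
print(lesk("the patent claims a lithium energy storage device".split(), senses))
# -> cell_battery
```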


Author(s):  
Stylianos Asteriadis ◽  
Nikos Nikolaidis ◽  
Ioannis Pitas ◽  
...  

Facial feature localization is an important task in numerous applications of face image analysis, including face recognition and verification, facial expression recognition, driver's alertness estimation, and head pose estimation. The area has therefore been a very active research field for many years, and a multitude of methods appear in the literature. Depending on the targeted application, the proposed methods have different characteristics and are designed to perform in different setups. Thus, a method of general applicability still seems beyond the current state of the art. This chapter offers an up-to-date literature review of facial feature detection algorithms. A review of the image databases and performance metrics used to benchmark these algorithms is also provided.


Author(s):  
Marina Sokolova ◽  
Stan Szpakowicz

This chapter presents applications of machine learning techniques to traditional problems in natural language processing, including part-of-speech tagging, entity recognition and word-sense disambiguation. People usually solve such problems without difficulty, or at least do a very good job. Linguistics may suggest labour-intensive ways of manually constructing rule-based systems. It is, however, the easy availability of large collections of texts that has made machine learning a method of choice for processing volumes of data well above human capacity. One of the main purposes of text processing is information and knowledge extraction of all kinds from such large text collections. The machine learning methods discussed in this chapter have stimulated wide-ranging research in natural language processing and helped build applications with serious deployment potential.


2011 ◽  
pp. 503-521
Author(s):  
Flavius Frasincar ◽  
Jethro Borsje ◽  
Leonard Levering

This article proposes Hermes, a Semantic Web-based framework for building personalized news services. It makes use of ontologies for knowledge representation, natural language processing techniques for semantic text analysis, and semantic query languages for specifying the wanted information. Hermes is supported by an implementation of the framework, the Hermes News Portal, a tool which allows users personalized online access to news items. The Hermes framework and its associated implementation aim at advancing the state of the art of semantic approaches for personalized news services by employing Semantic Web standards, exploiting domain information, using a word sense disambiguation procedure, and being able to express temporal constraints for the desired news items.
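As a rough sketch of how a semantic query language can combine conceptual and temporal constraints on news items, the snippet below uses rdflib with a hypothetical news ontology namespace; it is illustrative only, not the Hermes implementation.

```python
# SPARQL selection of news items with a temporal constraint (illustrative).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NEWS = Namespace("http://example.org/news#")  # hypothetical ontology namespace

g = Graph()
item = URIRef("http://example.org/news/item1")
g.add((item, RDF.type, NEWS.NewsItem))
g.add((item, NEWS.about, NEWS.AppleInc))
g.add((item, NEWS.published, Literal("2010-05-01T09:00:00", datatype=XSD.dateTime)))

query = """
PREFIX news: <http://example.org/news#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?item WHERE {
    ?item a news:NewsItem ;
          news:about news:AppleInc ;
          news:published ?date .
    FILTER (?date >= "2010-04-01T00:00:00"^^xsd:dateTime)  # temporal constraint
}
"""
for row in g.query(query):
    print(row.item)
```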


2019 ◽  
Vol 26 (5) ◽  
pp. 438-446 ◽  
Author(s):  
Ahmad Pesaranghader ◽  
Stan Matwin ◽  
Marina Sokolova ◽  
Ali Pesaranghader

Abstract Objective In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semi-supervised and knowledge-based methods; however, they train a separate classifier for each ambiguous term, necessitating large amounts of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods Built on recent advances in deep learning, our deepBioWSD model leverages a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed using their text definitions; then, after initializing the network with these embeddings, it is trained on all (available) training data collectively. The method also includes a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner. Results We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD, achieving state-of-the-art performance of 96.82% macro accuracy. Conclusions Apart from the disambiguation improvement and unsupervised training, deepBioWSD requires considerably less expert-labeled data, as it learns the target and context terms jointly. These merits make deepBioWSD conveniently deployable in real-time biomedical applications.
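A minimal sketch of the shared-network idea follows: one BiLSTM encodes the context of any ambiguous term, and the sense is chosen by similarity against precomputed sense embeddings (eg, derived from sense definitions). Dimensions, mean pooling, and all names are illustrative assumptions, not the deepBioWSD configuration.

```python
# One shared BiLSTM scoring senses by similarity (illustrative sketch).
import torch
import torch.nn as nn

class SharedWSD(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, emb_dim)   # map context into sense space

    def forward(self, token_ids, sense_vectors):
        # token_ids: (batch, seq_len); sense_vectors: (n_senses, emb_dim)
        h, _ = self.lstm(self.emb(token_ids))
        context = self.proj(h.mean(dim=1))           # mean-pooled context encoding
        return context @ sense_vectors.T             # one similarity score per sense

model = SharedWSD(vocab_size=5000)
scores = model(torch.randint(0, 5000, (2, 12)), torch.randn(4, 100))
pred = scores.argmax(dim=1)  # index of the best-scoring sense per example
```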

