scholarly journals NP Animacy Identification for Anaphora Resolution

2007 ◽  
Vol 29 ◽  
pp. 79-103 ◽  
Author(s):  
C. Orasan ◽  
R. J. Evans

In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates, and as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is a rule-based one which uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in the cases where animate entities are identified with high precision.

Author(s):  
Raymond J. Mooney

This article introduces the type of symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. Machine learning is the study of computational systems that improve performance on some task with experience. Most machine learning methods concern the task of categorizing examples described by a set of features. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Finally, this article reviews the applications to a variety of these problems, such as morphology, part-of-speech tagging, word-sense disambiguation, syntactic parsing, semantic parsing, information extraction, and anaphora resolution.


2017 ◽  
Vol 21 (1) ◽  
pp. 515-522 ◽  
Author(s):  
Muhammad Abid ◽  
Asad Habib ◽  
Jawad Ashraf ◽  
Abdul Shahid

Author(s):  
Abarna Ramprakash

Money laundering is the illegal process of concealing the origins of money obtained illegally by passing it through a complex sequence of banking transfers. Currently banks use rule based systems to identify the suspicious transactions which could be used for money laundering. However these systems generate a large number of false positives which leads the banks to spend a huge amount of money and time in investigating the false positives. Hence, in this paper, the monitoring of transactions is to be done using XGBoost machine learning algorithm in order to reduce the number of false positives and to increase the probability of identifying true positives.


2018 ◽  
Vol 25 (7) ◽  
pp. 800-808 ◽  
Author(s):  
Yue Wang ◽  
Kai Zheng ◽  
Hua Xu ◽  
Qiaozhu Mei

Abstract Objective Medical word sense disambiguation (WSD) is challenging and often requires significant training with data labeled by domain experts. This work aims to develop an interactive learning algorithm that makes efficient use of expert’s domain knowledge in building high-quality medical WSD models with minimal human effort. Methods We developed an interactive learning algorithm with expert labeling instances and features. An expert can provide supervision in 3 ways: labeling instances, specifying indicative words of a sense, and highlighting supporting evidence in a labeled instance. The algorithm learns from these labels and iteratively selects the most informative instances to ask for future labels. Our evaluation used 3 WSD corpora: 198 ambiguous terms from Medical Subject Headings (MSH) as MEDLINE indexing terms, 74 ambiguous abbreviations in clinical notes from the University of Minnesota (UMN), and 24 ambiguous abbreviations in clinical notes from Vanderbilt University Hospital (VUH). For each ambiguous term and each learning algorithm, a learning curve that plots the accuracy on the test set against the number of labeled instances was generated. The area under the learning curve was used as the primary evaluation metric. Results Our interactive learning algorithm significantly outperformed active learning, the previous fastest learning algorithm for medical WSD. Compared to active learning, it achieved 90% accuracy for the MSH corpus with 42% less labeling effort, 35% less labeling effort for the UMN corpus, and 16% less labeling effort for the VUH corpus. Conclusions High-quality WSD models can be efficiently trained with minimal supervision by inviting experts to label informative instances and provide domain knowledge through labeling/highlighting contextual features.


2002 ◽  
Vol 8 (4) ◽  
pp. 311-325 ◽  
Author(s):  
V. HOSTE ◽  
I. HENDRICKX ◽  
W. DAELEMANS ◽  
A. VAN DEN BOSCH

Various Machine Learning (ML) approaches have been demonstrated to produce relatively successful Word Sense Disambiguation (WSD) systems. There are still unexplained differences among the performance measurements of different algorithms, hence it is warranted to deepen the investigation into which algorithm has the right ‘bias’ for this task. In this paper, we show that this is not easy to accomplish, due to intricate interactions between information sources, parameter settings, and properties of the training data. We investigate the impact of parameter optimization on generalization accuracy in a memory-based learning approach to English and Dutch WSD. A ‘word-expert’ architecture was adopted, yielding a set of classifiers, each specialized in one single wordform. The experts consist of multiple memory-based learning classifiers, each taking different information sources as input, combined in a voting scheme. We optimized the architectural and parametric settings for each individual word-expert by performing cross-validation experiments on the learning material. The results of these experiments show that the variation of both the algorithmic parameters and the information sources available to the classifiers leads to large fluctuations in accuracy. We demonstrate that optimization per word-expert leads to an overall significant improvement in the generalization accuracies of the produced WSD systems.


Sign in / Sign up

Export Citation Format

Share Document