Disambiguation and Filtering Methods in Using Web Knowledge for Coreference Resolution

The authors investigate two publicly available Web knowledge bases, Wikipedia and Yago, in an attempt to leverage semantic information and increase the performance level of a state-of-the-art coreference resolution engine. They extract semantic compatibility and aliasing information from Wikipedia and Yago, and incorporate it into a coreference resolution system. The authors show that using such knowledge with no disambiguation and filtering does not bring any improvement over the baseline, mirroring the previous findings (Ponzetto & Poesio, 2009). They propose, therefore, a number of solutions to reduce the amount of noise coming from Web resources: using disambiguation tools for Wikipedia, pruning Yago to eliminate the most generic categories and imposing additional constraints on affected mentions. The evaluation experiments on the ACE-02 corpus show that the knowledge, extracted from Wikipedia and Yago, improves the system’s performance by 2-3 percentage points.

Download Full-text

Evaluation-driven design of a robust coreference resolution system

Natural Language Engineering ◽

10.1017/s135132490300319x ◽

2003 ◽

Vol 9 (3) ◽

pp. 281-306 ◽

Cited By ~ 1

Author(s):

ANDREI POPESCU-BELIS

Keyword(s):

Working Memory ◽

Theoretical Analysis ◽

Coreference Resolution ◽

Knowledge Sources ◽

Narrative Texts ◽

Data Annotation ◽

Evaluation Measures ◽

Optimal Behavior ◽

Resolution System

In this paper, we describe a system for coreference resolution and emphasize the role of evaluation for its design. The goal of the system is to group referring expressions (identified beforehand in narrative texts) into sets of coreferring expressions that correspond to discourse entities. Several knowledge sources are distinguished, such as referential compatibility between a referring expression and a discourse entity, activation factors for discourse entities, size of working memory, or meta-rules for the creation of discourse entities. For each of them, the theoretical analysis of its relevance is compared to scores obtained through evaluation. After looping through all knowledge sources, an optimal behavior is chosen, then evaluated on test data. The paper also discusses evaluation measures as well as data annotation, and compares the present approach to others in the field.

Download Full-text

Entity Type Recognition for Heterogeneous Semantic Graphs

AI Magazine ◽

10.1609/aimag.v36i1.2569 ◽

2015 ◽

Vol 36 (1) ◽

pp. 75-86 ◽

Cited By ~ 4

Author(s):

Jennifer Sleeman ◽

Tim Finin ◽

Anupam Joshi

Keyword(s):

Machine Learning ◽

Background Knowledge ◽

Knowledge Bases ◽

Heterogeneous Data ◽

Unstructured Data ◽

Supervised Machine Learning ◽

Coreference Resolution ◽

Multiple Sources ◽

Fine Grained ◽

High Level

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the datas ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.

Download Full-text

Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias throughout the Machine Learning Lifecyle

Computational Linguistics ◽

10.1162/coli_a_00413 ◽

2021 ◽

pp. 1-47

Author(s):

Yang Trista Cao ◽

Hal Daumé

Keyword(s):

Machine Learning ◽

Quality Of Service ◽

English Text ◽

Coreference Resolution ◽

Resolution System ◽

Building Systems ◽

Systematic Biases ◽

Build Systems

Abstract Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing datasets for trans-exclusionary biases, and develop two new datasets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.

Download Full-text

Knowledge Derived From Wikipedia For Computing Semantic Relatedness

Journal of Artificial Intelligence Research ◽

10.1613/jair.2308 ◽

2007 ◽

Vol 30 ◽

pp. 181-212 ◽

Cited By ~ 91

Author(s):

S. P. Ponzetto ◽

M. Strube

Keyword(s):

Machine Learning ◽

Knowledge Base ◽

Search Engine ◽

Semantic Network ◽

Semantic Relatedness ◽

Coreference Resolution ◽

Valuable Resource ◽

Resolution System

Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.

Download Full-text

Multi-pass Sieve Coreference Resolution System for Polish

Lecture Notes in Computer Science - Language, Data, and Knowledge ◽

10.1007/978-3-319-59888-8_20 ◽

2017 ◽

pp. 222-236

Author(s):

Bartłomiej Nitoń ◽

Maciej Ogrodniczuk

Keyword(s):

Coreference Resolution ◽

Resolution System

Download Full-text

Answering Definition Questions Using Web Knowledge Bases

Lecture Notes in Computer Science - Natural Language Processing – IJCNLP 2005 ◽

10.1007/11562214_44 ◽

2005 ◽

pp. 498-506 ◽

Cited By ~ 9

Author(s):

Zhushuo Zhang ◽

Yaqian Zhou ◽

Xuanjing Huang ◽

Lide Wu

Keyword(s):

Knowledge Bases ◽

Web Knowledge

Download Full-text

Improving mention detection for Basque based on a deep error analysis

Natural Language Engineering ◽

10.1017/s1351324916000206 ◽

2016 ◽

Vol 23 (3) ◽

pp. 351-384 ◽

Cited By ~ 2

Author(s):

ANDER SORALUZE ◽

OLATZ ARREGI ◽

XABIER ARREGI ◽

ARANTZA DÍAZ DE ILARRAZA

Keyword(s):

Error Analysis ◽

Coreference Resolution ◽

Linguistic Processing ◽

Exact Matching ◽

Standard Data ◽

Percentage Points ◽

Classification Of Error ◽

F Measure ◽

Error Types

AbstractThis paper presents the improvement process of a mention detector for Basque. The system is rule-based and takes into account the characteristics of mentions in Basque. A classification of error types is proposed based on the errors that occur during mention detection. A deep error analysis distinguishing error types and causes is presented and improvements are proposed. At the final stage, the system obtains an F-measure of 74.57% under the Exact Matching protocol and of 80.57% under Lenient Matching. We also show the performance of the mention detector with gold standard data as input, in order to omit errors caused by the previous stages of linguistic processing. In this scenario, we obtain an F-measure of 85.89% with Strict Matching and of 89.06% with Lenient Matching, i.e., a difference of 11.32 and 8.49 percentage points, respectively. Finally, how improvements in mention detection affect coreference resolution is analysed.

Download Full-text

A high-performance coreference resolution system using a constraint-based multi-agent strategy

10.3115/1220355.1220430 ◽

2004 ◽

Cited By ~ 1

Author(s):

Zhou GuoDong ◽

Su Jian

Keyword(s):

High Performance ◽

Coreference Resolution ◽

Resolution System ◽

Multi Agent

Download Full-text

Why does online collaboration work? Dependent judgments in sequential collaboration improve accuracy

10.31234/osf.io/w4xdk ◽

2021 ◽

Author(s):

Maren Mayer ◽

Daniel W. Heck

Keyword(s):

Semantic Information ◽

Knowledge Bases ◽

Online Collaboration ◽

The Internet ◽

General Knowledge ◽

Wisdom Of Crowds ◽

Sequential Process ◽

High Quality ◽

Improve Accuracy ◽

Working Together

On the internet, people often collaborate to generate extensive knowledge bases such as Wikipedia for semantic information or OpenStreetMap for geographic information. When contributing to such online projects, individual judgments follow a sequential process in which one contributor creates an entry and other contributors have the possibility to modify, extend, and correct the entry by making incremental changes. We refer to this way of working together as sequential collaboration because it is characterized by dependent judgments that are based on the latest judgment available. Since the process of correcting each other in sequential collaboration has not yet been studied systematically, we compare the accuracy of sequential collaboration and wisdom of crowds, the aggregation of a set of independent judgments. In three experiments with groups of four or six individuals, accuracy for answering general knowledge questions increased within sequences of judgments in which participants had the possibility to correct the judgment of the previous contributor. Moreover, the ﬁnal individual judgments in sequential collaboration were slightly more accurate than the averaged judgments in wisdom of crowds. This shows that collaboration can beneﬁt from the dependency of individual judgments, thus explaining why large collaborative online projects often provide data of high quality.

Download Full-text

A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature

10.18653/v1/2021.crac-1.5 ◽

2021 ◽

Author(s):

Andreas van Cranenburgh ◽

Esther Ploeger ◽

Frank van den Berg ◽

Remi Thüss

Keyword(s):

Coreference Resolution ◽

Rule Based ◽

Resolution System ◽

Dutch Literature

Download Full-text