Coreference Resolution: Toward End-to-End and Cross-Lingual Systems

Information ◽

10.3390/info11020074 ◽

2020 ◽

Vol 11 (2) ◽

pp. 74 ◽

Cited By ~ 1

Author(s):

André Ferreira Cruz ◽

Gil Rocha ◽

Henrique Lopes Cardoso

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Coreference Resolution ◽

Language Understanding ◽

External Resources ◽

End To End ◽

Cross Lingual ◽

Open Issues

The task of coreference resolution has attracted considerable attention in the literature due to its importance in deep language understanding and its potential as a subtask in a variety of complex natural language processing problems. In this study, we outlined the field’s terminology, described existing metrics, their differences and shortcomings, as well as the available corpora and external resources. We analyzed existing state-of-the-art models and approaches, and reviewed recent advances and trends in the field, namely end-to-end systems that jointly model different subtasks of coreference resolution, and cross-lingual systems that aim to overcome the challenges of less-resourced languages. Finally, we discussed the main challenges and open issues faced by coreference resolution systems.

Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00313 ◽

2020 ◽

Vol 8 ◽

pp. 264-280

Author(s):

Sascha Rothe ◽

Shashi Narayan ◽

Aliaksei Severyn

Keyword(s):

Natural Language Processing ◽

Empirical Study ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Text Summarization ◽

Neural Models ◽

Language Understanding ◽

Sequence Generation ◽

Compute Time

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state of the art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/521 ◽

2021 ◽

Author(s):

Rexhina Blloshmi ◽

Simone Conia ◽

Rocco Tripodi ◽

Roberto Navigli

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Great Success ◽

Semantic Role ◽

Semantic Role Labeling ◽

Complex Predicate ◽

Input Sentence ◽

End To End

Despite the recent great success of the sequence-to-sequence paradigm in Natural Language Processing, the majority of current studies in Semantic Role Labeling (SRL) still frame the problem as a sequence labeling task. In this paper we go against the flow and propose GSRL (Generating Senses and RoLes), the first sequence-to-sequence model for end-to-end SRL. Our approach benefits from recently-proposed decoder-side pretraining techniques to generate both sense and role labels for all the predicates in an input sentence at once, in an end-to-end fashion. Evaluated on standard gold benchmarks, GSRL achieves state-of-the-art results in both dependency- and span-based English SRL, proving empirically that our simple generation-based model can learn to produce complex predicate-argument structures. Finally, we propose a framework for evaluating the robustness of an SRL model in a variety of synthetic low-resource scenarios which can aid human annotators in the creation of better, more diverse, and more challenging gold datasets. We release GSRL at github.com/SapienzaNLP/gsrl.

A State-of-the-Art Review of Nigerian Languages Natural Language Processing Research

Advances in IT Standards and Standardization Research - Developing Countries and Technology Inclusion in the 21st Century Information Society ◽

10.4018/978-1-7998-3468-7.ch008 ◽

2021 ◽

pp. 147-167

Author(s):

Toluwase Victor Asubiaro ◽

Ebelechukwu Gloria Igwe

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Optical Character Recognition ◽

State Of The Art ◽

Resource Development ◽

African Languages ◽

Low Resource ◽

Research Areas ◽

Cross Lingual

African languages, including those that are native to Nigeria, are low-resource languages because they lack basic computing resources such as language-dependent hardware keyboards. Speakers of these low-resource languages are therefore unfairly deprived of information access on the internet. There is no information about the level of progress that has been made on the computation of Nigerian languages. Hence, this chapter presents a state-of-the-art review of natural language processing for Nigerian languages. The review reveals that only four Nigerian languages (Hausa, Ibibio, Igbo, and Yoruba) have been significantly studied in published NLP papers. Creating alternatives to hardware keyboards is one of the most popular research areas, and means such as automatic diacritics restoration, virtual keyboards, and optical character recognition have been explored. There was also an inclination towards speech and computational morphological analysis. Resource development and knowledge representation modeling of the languages using rapid resource development and cross-lingual methods are recommended.

Neural Discourse Segmentation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/949 ◽

2019 ◽

Author(s):

Jing Li

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Word Embeddings ◽

External Knowledge ◽

Coherence Relations ◽

End To End ◽

Discourse Units

Identifying discourse structures and coherence relations in a piece of text is a fundamental task in natural language processing. The first step of this process is segmenting sentences into clause-like units called elementary discourse units (EDUs). Traditional solutions to discourse segmentation rely heavily on carefully designed features. In this demonstration, we present SegBot, a system that splits a given piece of text into a sequence of EDUs by using an end-to-end neural segmentation model. Our model does not require hand-crafted features or external knowledge except word embeddings, yet it outperforms state-of-the-art solutions to discourse segmentation.

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum ◽

10.1145/3458553.3458554 ◽

2019 ◽

Vol 53 (2) ◽

pp. 3-10

Author(s):

Muthu Kumar Chandrasekaran ◽

Philipp Mayr

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Digital Libraries ◽

State Of The Art ◽

Shared Task ◽

Processing Information ◽

Joint Workshop

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.

A Word-Based Chinese Language Understanding System

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001488000042 ◽

1988 ◽

Vol 02 (01) ◽

pp. 25-35

Author(s):

Tian-Shun Yao

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Chinese Language ◽

Computer Programs ◽

World Knowledge ◽

Knowledge Source ◽

Language Understanding ◽

Language Analysis ◽

The World

A word-based Chinese language understanding system has been developed on the basis of a word-based theory of natural language processing. In the light of psychological language analysis and the features of the Chinese language, this theory is presented together with a description of the computer programs based on it. At the heart of the system are the definitions of a Total Information Dictionary and the World Knowledge Source that it uses. The purpose of this research is to develop a system that can understand not only individual Chinese sentences but also whole texts.

Guru

Cross-Disciplinary Advances in Applied Natural Language Processing ◽

10.4018/978-1-61350-447-5.ch011 ◽

2012 ◽

pp. 156-171 ◽

Cited By ~ 8

Author(s):

Andrew M. Olney ◽

Natalie K. Person ◽

Arthur C. Graesser

Keyword(s):

Natural Language Processing ◽

Knowledge Representation ◽

Natural Language ◽

Language Processing ◽

Natural Language Understanding ◽

Natural Language Generation ◽

Language Understanding ◽

Language Generation ◽

Processing Techniques

The authors discuss Guru, a conversational expert ITS. Guru is designed to mimic expert human tutors using advanced applied natural language processing techniques including natural language understanding, knowledge representation, and natural language generation.

Reasoning about Quantities in Natural Language

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00118 ◽

2015 ◽

Vol 3 ◽

pp. 1-13 ◽

Cited By ~ 11

Author(s):

Subhro Roy ◽

Tim Vieira ◽

Dan Roth

Keyword(s):

Elementary School ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Computational Approach ◽

Quantitative Reasoning ◽

Language Understanding ◽

Numerical Reasoning ◽

Key Steps

Little work from the Natural Language Processing community has targeted the role of quantities in Natural Language Understanding. This paper takes some key steps towards facilitating reasoning about quantities expressed in natural language. We investigate two different tasks of numerical reasoning. First, we consider Quantity Entailment, a new task formulated to understand the role of quantities in general textual inference tasks. Second, we consider the problem of automatically understanding and solving elementary school math word problems. In order to address these quantitative reasoning problems we first develop a computational approach which we show to successfully recognize and normalize textual expressions of quantities. We then use these capabilities to further develop algorithms to assist reasoning in the context of the aforementioned tasks.

Convolution–deconvolution word embedding: An end-to-end multi-prototype fusion embedding method for natural language processing

Information Fusion ◽

10.1016/j.inffus.2019.06.009 ◽

2020 ◽

Vol 53 ◽

pp. 112-122 ◽

Cited By ~ 9

Author(s):

Kai Shuang ◽

Zhixuan Zhang ◽

Jonathan Loo ◽

Sen Su

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Word Embedding ◽

Embedding Method ◽

End To End

Textual entailment graphs

Natural Language Engineering ◽

10.1017/s1351324915000108 ◽

2015 ◽

Vol 21 (5) ◽

pp. 699-724 ◽

Cited By ~ 6

Author(s):

Lili Kotlerman ◽

Ido Dagan ◽

Bernardo Magnini ◽

Luisa Bentivogli

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Gold Standard ◽

State Of The Art ◽

Text Analytics ◽

Joint Work ◽

Gold Standard Dataset ◽

Textual Entailment ◽

Interesting Task

In this work, we present a novel type of graph for natural language processing (NLP), namely textual entailment graphs (TEGs). We describe the complete methodology we developed for the construction of such graphs and provide some baselines for this task by evaluating relevant state-of-the-art technology. We situate our research in the context of text exploration, since it was motivated by joint work with industrial partners in the text analytics area. Accordingly, we present our motivating scenario and the first gold-standard dataset of TEGs. However, while our own motivation and the dataset focus on the text exploration setting, we suggest that TEGs can have different usages and that automatic creation of such graphs is an interesting task for the community.
