State of the art in Computational Linguistics

We propose a new deterministic approach to coreference resolution that combines the global information and precise features of modern machine-learning models with the transparency and modularity of deterministic, rule-based systems. Our sieve architecture applies a battery of deterministic coreference models one at a time from highest to lowest precision, where each model builds on the previous model's cluster output. The two stages of our sieve-based architecture, a mention detection stage that heavily favors recall, followed by coreference sieves that are precision-oriented, offer a powerful way to achieve both high precision and high recall. Further, our approach makes use of global information through an entity-centric model that encourages the sharing of features across all mentions that point to the same real-world entity. Despite its simplicity, our approach gives state-of-the-art performance on several corpora and genres, and has also been incorporated into hybrid state-of-the-art coreference systems for Chinese and Arabic. Our system thus offers a new paradigm for combining knowledge in rule-based systems that has implications throughout computational linguistics.
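
The sieve idea can be illustrated with a minimal sketch. It assumes mentions are plain strings, uses a crude last-token head heuristic, and defines two hypothetical sieves (exact string match, then head match) applied from higher to lower precision. Each sieve compares whole clusters rather than single mention pairs, echoing the entity-centric design described above. This is not the authors' implementation, which uses many more sieves and rich mention features.

```python
from typing import Callable, List

# Mentions are plain strings here; a real system uses rich mention objects.
Cluster = List[str]
Sieve = Callable[[Cluster, Cluster], bool]


def head(mention: str) -> str:
    # Hypothetical head finder: the last token, lowercased (crude English heuristic).
    return mention.split()[-1].lower()


def exact_match(a: Cluster, b: Cluster) -> bool:
    # Entity-centric: any mention of one cluster may match any mention of the other.
    return any(x.lower() == y.lower() for x in a for y in b)


def head_match(a: Cluster, b: Cluster) -> bool:
    return any(head(x) == head(y) for x in a for y in b)


def run_sieves(mentions: List[str], sieves: List[Sieve]) -> List[Cluster]:
    # Start from singleton clusters, one per detected mention.
    clusters: List[Cluster] = [[m] for m in mentions]
    for sieve in sieves:  # ordered from highest to lowest precision
        merged: List[Cluster] = []
        for cluster in clusters:
            # Link to the first earlier cluster this sieve accepts, else keep it separate.
            for target in merged:
                if sieve(target, cluster):
                    target.extend(cluster)
                    break
            else:
                merged.append(cluster)
        clusters = merged  # the next sieve builds on this sieve's output
    return clusters
```

For example, `run_sieves(["Barack Obama", "the president", "Obama", "President Obama"], [exact_match, head_match])` groups the three "Obama" mentions and leaves "the president" alone, since neither toy sieve can link it.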

Towards computational models of multilingual sentence processing

10.31234/osf.io/kefmz ◽

2019 ◽

Author(s):

Stefan L. Frank

Keyword(s):

Computational Linguistics ◽

Sentence Processing ◽

Computational Models ◽

State Of The Art ◽

Cognitive Models ◽

The State ◽

Cognitive Modelling ◽

Current Review ◽

Multiple Languages ◽

Human Sentence Processing

Although computational models can simulate aspects of human sentence processing, research on this topic has remained almost exclusively limited to the single-language case. The current review presents an overview of the state of the art in computational cognitive models of sentence processing, and discusses how recent sentence-processing models can be used to study bi- and multilingualism. Recent results from cognitive modelling and computational linguistics suggest that phenomena specific to bilingualism can emerge from systems that have no dedicated components for handling multiple languages. Hence, accounting for human bi-/multilingualism may not require models that are much more sophisticated than those for the monolingual case.

Computational Analysis of Storylines

10.1017/9781108854221 ◽

2021 ◽

Keyword(s):

Computational Linguistics ◽

Language Processing ◽

Computational Analysis ◽

State Of The Art ◽

Relevant Information ◽

Event Extraction ◽

Multidisciplinary Research ◽

Narrative Structures ◽

Current State ◽

Event Representations

Event structures are central in Linguistics and Artificial Intelligence research: people can easily refer to changes in the world, identify their participants, distinguish relevant information, and have expectations of what can happen next. Part of this process is based on mechanisms similar to narratives, which are at the heart of information sharing. But it remains difficult to automatically detect events or automatically construct stories from such event representations. This book explores how to handle today's massive news streams and provides multidimensional, multimodal, and distributed approaches, such as automated deep learning, to capture events and narrative structures involved in a 'story'. This overview of the current state of the art in event extraction, temporal and causal relations, and storyline extraction aims to establish a new multidisciplinary research community with a common terminology and research agenda. Graduate students and researchers in natural language processing, computational linguistics, and media studies will benefit from this book.

A Biscriptual Morphological Transducer for Crimean Tatar

10.33011/computel.v1i.423 ◽

2019 ◽

Author(s):

Francis M. Tyers ◽

Jonathan N. Washington ◽

Darya Kavitskaya ◽

Memduh Gökırmak

Keyword(s):

Computational Linguistics ◽

Morphological Analysis ◽

State Of The Art ◽

Full Range ◽

The State ◽

Loan Words ◽

The Core ◽

Finite State ◽

Morphological Modelling ◽

Crimean Tatar

This paper describes a weighted finite-state morphological transducer for Crimean Tatar that can analyse and generate in both Latin and Cyrillic orthographies. The transducer was developed by a team including a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise. Dealing with two orthographic systems in the same transducer is challenging because they employ different strategies for spelling loan words and for encoding the full range of the language's phonemes and their interactions. We develop the core transducer using the Latin orthography and then design a separate transliteration transducer to map the surface forms to Cyrillic. To help control the non-determinism in the orthographic mapping, we use weights to prioritise forms seen in the corpus. We evaluate all components of the system, finding an accuracy above 90% for morphological analysis and near 90% for orthographic conversion. This constitutes the state of the art for Crimean Tatar morphological modelling and, to our knowledge, is the first biscriptual single morphological transducer for any language.
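
The idea of using corpus evidence to resolve an ambiguous orthographic mapping can be sketched without finite-state machinery. The letter table below is a hypothetical toy, NOT the real Latin-to-Cyrillic rules for Crimean Tatar, and plain Python enumeration stands in for the weighted transliteration transducer: every ambiguous letter expands into all Cyrillic options, and candidates attested more often in a corpus are ranked first.

```python
from itertools import product
from typing import Dict, List

# Hypothetical toy mapping for illustration only (not the real orthography rules).
# One Latin letter may correspond to several Cyrillic spellings.
TOY_MAP: Dict[str, List[str]] = {
    "a": ["а"], "k": ["к"], "l": ["л"], "m": ["м"], "t": ["т"],
    "o": ["о", "ё"],  # ambiguous entry: the source of non-determinism
}


def candidates(latin: str) -> List[str]:
    # Expand every letter into its options; raises KeyError for unmapped letters.
    options = [TOY_MAP[ch] for ch in latin]
    return ["".join(parts) for parts in product(*options)]


def transliterate(latin: str, corpus_counts: Dict[str, int]) -> List[str]:
    # Forms seen more often in the corpus come first; ties are broken
    # alphabetically so the ranking is deterministic.
    return sorted(candidates(latin), key=lambda c: (-corpus_counts.get(c, 0), c))
```

With `transliterate("tom", {"тём": 5})` the attested form "тём" is ranked above the unattested "том"; with an empty corpus the order falls back to the alphabetical tie-break.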

Computational Linguistics in Support of Linguistic Theory

Linguistic Issues in Language Technology ◽

10.33011/lilt.v3i.1213 ◽

2010 ◽

Vol 3 ◽

Author(s):

Emily M. Bender ◽

D. Terence Langendoen

Keyword(s):

Computational Methods ◽

Computational Linguistics ◽

State Of The Art ◽

Theory Development ◽

Linguistic Theory ◽

Linguistic Data ◽

Current State

In this paper, we overview the ways in which computational methods can serve the goals of analysis and theory development in linguistics, and encourage the reader to become involved in the emerging cyberinfrastructure for linguistics. We survey examples from diverse subfields of how computational methods are already being used, describe the current state of the art in cyberinfrastructure for linguistics, sketch a pie-in-the-sky view of where the field could go, and outline steps that linguists can take now to bring about better access to and use of linguistic data through cyberinfrastructure.

Argumentation and meaning

Journal of Argumentation in Context ◽

10.1075/jaic.00005.osw ◽

2020 ◽

Vol 9 (1) ◽

pp. 1-18

Author(s):

Steve Oswald ◽

Sara Greco ◽

Johanna Miecznikowski ◽

Chiara Pollaroli ◽

Andrea Rocci

Keyword(s):

Discourse Analysis ◽

Computational Linguistics ◽

State Of The Art ◽

Private Sphere ◽

The Political ◽

Complex Relationship ◽

Special Issue ◽

Pragmatic Inference ◽

Introductory Discussion

This special issue aims to explore the semantic and pragmatic dimensions of meaning in terms of their significance and relevance in the study of argumentation. Accordingly, the contributors to the project, who all presented their work at the 2nd Argumentation and Language conference, held in Lugano in February 2018, were specifically asked to produce papers that explicitly tackle the importance of the study of meaning for that of argumentative practices. All papers therefore cover at least one aspect of this complex relationship between argumentation and meaning, which contributes to delivering a state-of-the-art panorama on the issue. Drawing from computational linguistics, semantics, pragmatics, and discourse analysis, the contributions to this special issue illuminate how the study of meaning in its different forms may provide valuable insights for the study of people’s argumentative practices in different contexts, ranging from the political to the private sphere. This introductory discussion tackles specific aspects of the intricate relationship between pragmatic inference and argumentative inference (that is, between meaning and argumentation), provides a brief survey of existing interfaces between the study of meaning and that of argumentation, and concludes with a presentation of the contributions to this special issue.

Natural language interfaces to databases – an introduction

Natural Language Engineering ◽

10.1017/s135132490000005x ◽

1995 ◽

Vol 1 (1) ◽

pp. 29-81 ◽

Cited By ~ 283

Author(s):

I. Androutsopoulos ◽

G.D. Ritchie ◽

P. Thanisch

Keyword(s):

Natural Language ◽

Computational Linguistics ◽

State Of The Art ◽

Query Languages ◽

Natural Language Interfaces ◽

Advantages And Disadvantages ◽

Current State ◽

Database Updates ◽

Graphical Interfaces ◽

History Of

This paper is an introduction to natural language interfaces to databases (NLIDBs). A brief overview of the history of NLIDBs is first given. Some advantages and disadvantages of NLIDBs are then discussed, comparing NLIDBs to formal query languages, form-based interfaces, and graphical interfaces. An introduction to some of the linguistic problems NLIDBs have to confront follows, for the benefit of readers less familiar with computational linguistics. The discussion then moves on to NLIDB architectures, portability issues, restricted natural language input systems (including menu-based NLIDBs), and NLIDBs with reasoning capabilities. Some less explored areas of NLIDB research are then presented, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs. The paper ends with reflections on the current state of the art.
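
A toy restricted-input NLIDB shows the basic idea of mapping a constrained question onto a formal query. The one-table schema, the two question patterns, and the regex-to-SQL mapping are hypothetical simplifications for illustration, not an architecture taken from the paper. Captured words are bound as SQL parameters rather than spliced into the query string.

```python
import re
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical toy schema and rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("Ann", "sales", 50000), ("Bob", "it", 60000), ("Cy", "it", 55000)],
    )
    return conn


# Restricted natural-language patterns paired with parameterised SQL templates.
PATTERNS = [
    (re.compile(r"who works in (\w+)\??$", re.I),
     "SELECT name FROM employee WHERE dept = ? ORDER BY name"),
    (re.compile(r"how many employees work in (\w+)\??$", re.I),
     "SELECT COUNT(*) FROM employee WHERE dept = ?"),
]


def answer(conn: sqlite3.Connection, question: str) -> list:
    for pattern, sql in PATTERNS:
        m = pattern.match(question.strip())
        if m:
            # Bind the captured department name as a query parameter.
            return [row[0] for row in conn.execute(sql, (m.group(1).lower(),))]
    raise ValueError("question not covered by the restricted grammar")
```

Here `answer(make_db(), "Who works in IT?")` returns the names in the `it` department. Anything outside the two patterns is rejected, which is the usual trade-off of restricted-input interfaces: predictable behaviour at the cost of coverage.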

The Semantics and Collocations Relation in Food Reviews

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128372 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Fazel Keshtkar ◽

Ledong Shi ◽

Syed Ahmad Chan Bukhari

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Topic Modeling ◽

State Of The Art ◽

Semantic Relation ◽

The Other ◽

Good Place ◽

The Common

Finding our favorite dishes has become a hard task as restaurants provide more choices and varieties. However, restaurant comments and reviews are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorize dishes and find semantic relations among them based on reviewers’ comments and menu descriptions. Our goal is to implement state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualizations, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations among dishes and their reviews. We also aim to extract the common patterns between different dishes across restaurants and review comments and, in reverse, to explore dishes through their semantic relations. A dataset of restaurant-related articles, with the dishes located within those articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model can help users find a dish from a few descriptive words and their interests. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cuisines: American, e.g., ‘steak’, ‘hamburger’; Chinese, e.g., ‘stir fry’, ‘dumplings’; Japanese, e.g., ‘sushi’.
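
One way a description-based dish lookup could work on top of word embeddings is sketched below. The abstract does not specify the retrieval step, so averaging word vectors and ranking by cosine similarity is an assumption. The 3-dimensional vectors are hypothetical stand-ins for trained word2vec embeddings.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical 3-d word vectors; a real system would load trained word2vec embeddings.
WORD_VECS: Dict[str, Vector] = {
    "grilled": [0.9, 0.1, 0.0], "beef": [0.8, 0.2, 0.1],
    "rice": [0.1, 0.9, 0.2], "raw": [0.0, 0.3, 0.9], "fish": [0.1, 0.2, 0.8],
}


def mean_vector(words: List[str]) -> Vector:
    # Average the vectors of known words; unknown words are skipped.
    known = [WORD_VECS[w] for w in words if w in WORD_VECS]
    if not known:
        raise ValueError("no known words in description")
    return [sum(dims) / len(known) for dims in zip(*known)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_dishes(query: List[str], dishes: Dict[str, List[str]]) -> List[str]:
    q = mean_vector(query)
    # Score each dish by the cosine between the query centroid and its description centroid.
    scores = {name: cosine(q, mean_vector(desc)) for name, desc in dishes.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With `dishes = {"steak": ["grilled", "beef"], "sushi": ["raw", "fish", "rice"]}`, the query `["raw", "fish"]` ranks "sushi" first.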

A state-of-the-art of semantic change computation

Natural Language Engineering ◽

10.1017/s1351324918000220 ◽

2018 ◽

Vol 24 (5) ◽

pp. 649-676 ◽

Cited By ~ 5

Author(s):

Xuri Tang

Keyword(s):

Data Visualization ◽

Computational Linguistics ◽

Comprehensive Evaluation ◽

State Of The Art ◽

The State ◽

Word Sense ◽

Semantic Change ◽

Evaluation Data ◽

Visualization Techniques ◽

Core Issues

This paper reviews the state of the art of one emergent field in computational linguistics: semantic change computation. It summarizes the literature by proposing a framework that identifies five components in the field: diachronic corpus, diachronic word sense characterization, change modelling, evaluation, and data visualization. Despite the field's potential, the review shows that current studies mainly focus on testing hypotheses of semantic change from theoretical linguistics and that several core issues remain to be tackled: the need for diachronic corpora for languages other than English, the comparison and development of approaches to diachronic word sense characterization and change modelling, the need for comprehensive evaluation data, and further exploration of data visualization techniques for hypothesis justification.
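
As an illustration of the change-modelling component, one common technique (not necessarily one singled out by the review) scores each word by the cosine distance between its vectors from two time periods. The sketch assumes both embedding spaces are already aligned, for example by orthogonal Procrustes. The 2-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def change_scores(early: Dict[str, Vector], late: Dict[str, Vector]) -> Dict[str, float]:
    # Assumes the two spaces are already aligned; distances between independently
    # trained, unaligned spaces are not meaningful. Only shared words are scored.
    shared = early.keys() & late.keys()
    return {w: cosine_distance(early[w], late[w]) for w in sorted(shared)}
```

A word whose vector moves far between periods (here the hypothetical "broadcast") gets a score near 1, while a stable word stays near 0. Words seen in only one period are skipped, which is one reason diachronic corpus coverage matters.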

Argumentation Mining in User-Generated Web Discourse

Computational Linguistics ◽

10.1162/coli_a_00276 ◽

2017 ◽

Vol 43 (1) ◽

pp. 125-179 ◽

Cited By ~ 34

Author(s):

Ivan Habernal ◽

Iryna Gurevych

Keyword(s):

Computational Linguistics ◽

State Of The Art ◽

Research Field ◽

Source Codes ◽

Machine Learning Methods ◽

Argumentation Mining ◽

Gold Standard Corpus ◽

Data Source ◽

Multiple Domains ◽

Annotation Study

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges posed by the variety of registers, multiple domains, and unrestricted, noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.
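
Identifying argument components is often cast as token-level sequence labeling with BIO tags. The sketch assumes that framing and the component labels "Claim" and "Premise", and it covers only the decoding step that turns predicted tags into component spans. The tagger itself is out of scope.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (label, start token, end token exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if not continues:
            if label is not None:
                spans.append((label, start, i))
            # "B-" opens a span; a stray "I-" without a matching "B-" is treated
            # leniently as a new span; "O" means no open span.
            label = name if prefix in ("B", "I") else None
            start = i
    return spans
```

For the tags `["O", "B-Claim", "I-Claim", "O", "B-Premise", "I-Premise", "I-Premise"]` the decoder yields a claim over tokens 1 to 3 and a premise over tokens 4 to 7 (end-exclusive).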