State of the art in Computational Linguistics

Author(s):  
Giacomo Ferrari
2013 ◽  
Vol 39 (4) ◽  
pp. 885-916 ◽  
Author(s):  
Heeyoung Lee ◽  
Angel Chang ◽  
Yves Peirsman ◽  
Nathanael Chambers ◽  
Mihai Surdeanu ◽  
...  

We propose a new deterministic approach to coreference resolution that combines the global information and precise features of modern machine-learning models with the transparency and modularity of deterministic, rule-based systems. Our sieve architecture applies a battery of deterministic coreference models one at a time from highest to lowest precision, where each model builds on the previous model's cluster output. The two stages of our sieve-based architecture, a mention detection stage that heavily favors recall, followed by coreference sieves that are precision-oriented, offer a powerful way to achieve both high precision and high recall. Further, our approach makes use of global information through an entity-centric model that encourages the sharing of features across all mentions that point to the same real-world entity. Despite its simplicity, our approach gives state-of-the-art performance on several corpora and genres, and has also been incorporated into hybrid state-of-the-art coreference systems for Chinese and Arabic. Our system thus offers a new paradigm for combining knowledge in rule-based systems that has implications throughout computational linguistics.
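
The sieve idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the two sieves, their matching rules, and all names (`exact_match_sieve`, `head_match_sieve`, `run_sieves`) are assumptions chosen for illustration. The sketch shows only the control flow: sieves run from highest to lowest precision, and each one operates on the clusters left by the previous one, so a decision can use every mention already attached to an entity.

```python
from typing import Callable, List, Optional

Cluster = List[str]
# A sieve inspects one cluster plus the clusters already kept in this pass
# and returns the index of the cluster to merge into, or None.
Sieve = Callable[[Cluster, List[Cluster]], Optional[int]]


def exact_match_sieve(cluster: Cluster, earlier: List[Cluster]) -> Optional[int]:
    """Hypothetical high-precision sieve: case-insensitive exact string match."""
    names = {m.lower() for m in cluster}
    for i, other in enumerate(earlier):
        if names & {m.lower() for m in other}:
            return i
    return None


def head_match_sieve(cluster: Cluster, earlier: List[Cluster]) -> Optional[int]:
    """Hypothetical lower-precision sieve: the last token stands in for the head word."""
    heads = {m.lower().split()[-1] for m in cluster}
    for i, other in enumerate(earlier):
        if heads & {m.lower().split()[-1] for m in other}:
            return i
    return None


def run_sieves(mentions: List[str], sieves: List[Sieve]) -> List[Cluster]:
    """Apply sieves in the given order; each pass starts from the previous pass's clusters."""
    clusters: List[Cluster] = [[m] for m in mentions]  # every mention starts alone
    for sieve in sieves:
        merged: List[Cluster] = []
        for cluster in clusters:
            target = sieve(cluster, merged)
            if target is None:
                merged.append(list(cluster))
            else:
                # Cluster-level merge: all mentions of the entity move together.
                merged[target].extend(cluster)
        clusters = merged
    return clusters


mentions = ["Barack Obama", "the president", "Obama", "barack obama", "a president"]
result = run_sieves(mentions, [exact_match_sieve, head_match_sieve])
```

Because the sieves are plain functions in an ordered list, one can be added, removed, or reordered without touching the others, which is the modularity the abstract attributes to rule-based systems. The sketch assumes non-empty mention strings.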


2019 ◽  
Author(s):  
Stefan L. Frank

Although computational models can simulate aspects of human sentence processing, research on this topic has remained almost exclusively limited to the single language case. The current review presents an overview of the state of the art in computational cognitive models of sentence processing, and discusses how recent sentence-processing models can be used to study bi- and multilingualism. Recent results from cognitive modelling and computational linguistics suggest that phenomena specific to bilingualism can emerge from systems that have no dedicated components for handling multiple languages. Hence, accounting for human bi-/multilingualism may not require models that are much more sophisticated than those for the monolingual case.


2021 ◽  

Event structures are central in Linguistics and Artificial Intelligence research: people can easily refer to changes in the world, identify their participants, distinguish relevant information, and have expectations of what can happen next. Part of this process is based on mechanisms similar to narratives, which are at the heart of information sharing. But it remains difficult to automatically detect events or automatically construct stories from such event representations. This book explores how to handle today's massive news streams and provides multidimensional, multimodal, and distributed approaches, like automated deep learning, to capture events and narrative structures involved in a 'story'. This overview of the current state of the art in event extraction, temporal and causal relations, and storyline extraction aims to establish a new multidisciplinary research community with a common terminology and research agenda. Graduate students and researchers in natural language processing, computational linguistics, and media studies will benefit from this book.


2019 ◽  
Author(s):  
Francis M. Tyers ◽  
Jonathan N. Washington ◽  
Darya Kavitskaya ◽  
Memduh Gökırmak

This paper describes a weighted finite-state morphological transducer for Crimean Tatar able to analyse and generate in both Latin and Cyrillic orthographies. This transducer was developed by a team including a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise. Dealing with two orthographic systems in the same transducer is challenging as they employ different strategies to deal with the spelling of loan words and encode the full range of the language's phonemes and their interaction. We develop the core transducer using the Latin orthography and then design a separate transliteration transducer to map the surface forms to Cyrillic. To help control the non-determinism in the orthographic mapping, we use weights to prioritise forms seen in the corpus. We perform an evaluation of all components of the system, finding an accuracy above 90% for morphological analysis and near 90% for orthographic conversion. This comprises the state of the art for Crimean Tatar morphological modelling, and, to our knowledge, is the first biscriptual single morphological transducer for any language.
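
The role of weights in resolving a non-deterministic orthographic mapping can be sketched as follows. This is a toy illustration, not the transducer described in the abstract: the character mapping (`TOY_MAPPING`) is invented and does not reflect real Crimean Tatar orthography, and the smoothed negative-log weighting is an assumed scheme in which lower weight means preferred, as in weighted finite-state toolkits that use the tropical semiring.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical mapping for illustration only. The source character "x" is
# ambiguous and can surface as either "X" or "Y" in the target script.
TOY_MAPPING: Dict[str, List[str]] = {"a": ["A"], "b": ["B"], "x": ["X", "Y"]}


def candidates(word: str) -> List[str]:
    """Enumerate every target-script form the ambiguous mapping allows."""
    options = [TOY_MAPPING.get(ch, [ch]) for ch in word]
    return ["".join(combo) for combo in product(*options)]


def ranked(word: str, corpus_counts: Dict[str, int]) -> List[Tuple[float, str]]:
    """Rank candidates by weight; forms seen more often in the corpus get lower weight."""
    forms = candidates(word)
    total = sum(corpus_counts.get(f, 0) for f in forms)
    scored = [
        # Add-one smoothing keeps unseen forms available, just more costly.
        (-math.log((corpus_counts.get(f, 0) + 1) / (total + len(forms))), f)
        for f in forms
    ]
    # sorted() is stable, so equally weighted forms keep generation order.
    return sorted(scored, key=lambda pair: pair[0])


best_seen = ranked("axb", {"AYB": 3})[0][1]
best_unseen = ranked("axb", {})[0][1]
```

In a real transducer the alternatives would be encoded as weighted arcs rather than enumerated, but the effect is the same: all conversions remain possible, and the corpus-attested one is returned first.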


2010 ◽  
Vol 3 ◽  
Author(s):  
Emily M. Bender ◽  
D. Terence Langendoen

In this paper, we overview the ways in which computational methods can serve the goals of analysis and theory development in linguistics, and encourage the reader to become involved in the emerging cyberinfrastructure for linguistics. We survey examples from diverse subfields of how computational methods are already being used, describe the current state of the art in cyberinfrastructure for linguistics, sketch a pie-in-the-sky view of where the field could go, and outline steps that linguists can take now to bring about better access to and use of linguistic data through cyberinfrastructure.


2020 ◽  
Vol 9 (1) ◽  
pp. 1-18
Author(s):  
Steve Oswald ◽  
Sara Greco ◽  
Johanna Miecznikowski ◽  
Chiara Pollaroli ◽  
Andrea Rocci

This special issue aims to explore the semantic and pragmatic dimensions of meaning in terms of their significance and relevance in the study of argumentation. Accordingly, the contributors to the project, who have all presented their work during the 2nd Argumentation and Language conference, which took place in Lugano in February 2018, have been specifically instructed to produce papers which explicitly tackle the importance of the study of meaning for that of argumentative practices. All papers therefore cover at least one aspect of this complex relationship between argumentation and meaning, which contributes to delivering a state-of-the-art panorama on the issue. Drawing from computational linguistics, semantics, pragmatics and discourse analysis, the contributions to this special issue will illuminate how the study of meaning in its different forms may provide valuable insights for the study of people’s argumentative practices in different contexts, ranging from the political to the private sphere. This introductory discussion tackles specific aspects of the intricate relationship between pragmatic inference and argumentative inference – that is, between meaning and argumentation –, provides a brief survey of existing interfaces between the study of meaning and that of argumentation, and concludes with a presentation of the contributions to this special issue.


1995 ◽  
Vol 1 (1) ◽  
pp. 29-81 ◽  
Author(s):  
I. Androutsopoulos ◽  
G.D. Ritchie ◽  
P. Thanisch

This paper is an introduction to natural language interfaces to databases (NLIDBs). A brief overview of the history of NLIDBs is first given. Some advantages and disadvantages of NLIDBs are then discussed, comparing NLIDBs to formal query languages, form-based interfaces, and graphical interfaces. An introduction to some of the linguistic problems NLIDBs have to confront follows, for the benefit of readers less familiar with computational linguistics. The discussion then moves on to NLIDB architectures, portability issues, restricted natural language input systems (including menu-based NLIDBs), and NLIDBs with reasoning capabilities. Some less explored areas of NLIDB research are then presented, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs. The paper ends with reflections on the current state of the art.


Author(s):  
Fazel Keshtkar ◽  
Ledong Shi ◽  
Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task as restaurants provide more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorise dishes and find semantic relations among them based on reviewers' comments and menu descriptions. Our goal is to implement state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualizations, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations among dishes and their reviews. We also aim to extract the common patterns between different dishes across restaurants and review comments and, in reverse, to explore dishes that share semantic relations. A dataset of restaurant-related articles, with the dishes located within those articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model can help users find a dish from a few words of description and their interests. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American, e.g. 'steak', 'hamburger'; Chinese, e.g. 'stir fry', 'dumplings'; Japanese, e.g. 'sushi'.


2018 ◽  
Vol 24 (5) ◽  
pp. 649-676 ◽  
Author(s):  
XURI TANG

This paper reviews the state of the art of one emergent field in computational linguistics: semantic change computation. It summarizes the literature by proposing a framework that identifies five components in the field: diachronic corpus, diachronic word sense characterization, change modelling, evaluation and data visualization. Despite the field's potential, the review shows that current studies are mainly focused on testing hypotheses of semantic change from theoretical linguistics and that several core issues remain to be tackled: the need for diachronic corpora for languages other than English, the comparison and development of approaches to diachronic word sense characterization and change modelling, the need for comprehensive evaluation data, and further exploration of data visualization techniques for hypothesis justification.


2017 ◽  
Vol 43 (1) ◽  
pp. 125-179 ◽  
Author(s):  
Ivan Habernal ◽  
Iryna Gurevych

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.

