Applications of term identification technology: 
domain description and content characterisation

The identification and extraction of technical terms is one of the better understood and most robust Natural Language Processing (NLP) technologies within the current state of the art of language engineering. In generic information management contexts, terms have been used primarily for procedures seeking to identify a set of phrases that is useful for tasks such as text indexing, computational lexicology, and machine-assisted translation: such tasks make important use of the assumption that terminology is representative of a given domain. This paper discusses an extension of basic terminology identification technology for the application to two higher level semantic tasks: domain description, the specification of the technical domain of a document, and content characterisation, the construction of a compact, coherent and useful representation of the topical content of a text. With these extensions, terminology identification becomes the foundation of an operational environment for document processing and content abstraction.

Download Full-text

Computational Analysis of Storylines

10.1017/9781108854221 ◽

2021 ◽

Keyword(s):

Computational Linguistics ◽

Language Processing ◽

Computational Analysis ◽

State Of The Art ◽

Relevant Information ◽

Event Extraction ◽

Multidisciplinary Research ◽

Narrative Structures ◽

Current State ◽

Event Representations

Event structures are central in Linguistics and Artificial Intelligence research: people can easily refer to changes in the world, identify their participants, distinguish relevant information, and have expectations of what can happen next. Part of this process is based on mechanisms similar to narratives, which are at the heart of information sharing. But it remains difficult to automatically detect events or automatically construct stories from such event representations. This book explores how to handle today's massive news streams and provides multidimensional, multimodal, and distributed approaches, like automated deep learning, to capture events and narrative structures involved in a 'story'. This overview of the current state-of-the-art on event extraction, temporal and casual relations, and storyline extraction aims to establish a new multidisciplinary research community with a common terminology and research agenda. Graduate students and researchers in natural language processing, computational linguistics, and media studies will benefit from this book.

Download Full-text

A System for Extracting Study Design Parameters from Nutritional Genomics Abstracts

Journal of Integrative Bioinformatics ◽

10.1515/jib-2013-222 ◽

2013 ◽

Vol 10 (2) ◽

pp. 82-93 ◽

Cited By ~ 2

Author(s):

Cassidy Kelly ◽

Hui Yang

Keyword(s):

Language Processing ◽

Study Design ◽

State Of The Art ◽

Design Parameters ◽

Regular Expressions ◽

Journal Articles ◽

Current State ◽

Novel Approach ◽

Algorithmic Framework ◽

Nutritional Genomics

Summary The extraction of study design parameters from biomedical journal articles is an important problem in natural language processing (NLP). Such parameters define the characteristics of a study, such as the duration, the number of subjects, and their profile. Here we present a system for extracting study design parameters from sentences in article abstracts. This system will be used as a component of a larger system for creating nutrigenomics networks from articles in the nutritional genomics domain. The algorithms presented consist of manually designed rules expressed either as regular expressions or in terms of sentence parse structure. A number of filters and NLP tools are also utilized within a pipelined algorithmic framework. Using this novel approach, our system performs extraction at a finer level of granularity than comparable systems, while generating results that surpass the current state of the art.

Download Full-text

Discourse structure and language technology

Natural Language Engineering ◽

10.1017/s1351324911000337 ◽

2011 ◽

Vol 18 (4) ◽

pp. 437-490 ◽

Cited By ~ 34

Author(s):

B. WEBBER ◽

M. EGG ◽

V. KORDONI

Keyword(s):

Natural Language ◽

State Of The Art ◽

Discourse Structure ◽

Algorithm Performance ◽

Language Engineering ◽

Language Technology ◽

Current State ◽

Technology Applications ◽

Formal Properties

AbstractAn increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases of which discourse is structured along with some of their formal properties. It then lays out the current state-of-the-art with respect to algorithms for recognizing these different structures, and how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse structure-enabled technology.

Download Full-text

Natural language interfaces to databases

The Knowledge Engineering Review ◽

10.1017/s0269888900005476 ◽

1990 ◽

Vol 5 (4) ◽

pp. 225-249 ◽

Cited By ~ 52

Author(s):

Ann Copestake ◽

Karen Sparck Jones

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Central Process ◽

Current State ◽

Natural Language Question ◽

The One ◽

Language Question ◽

And Task

AbstractThis paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.

Download Full-text

Asian language processing: current state-of-the-art

Language Resources and Evaluation ◽

10.1007/s10579-007-9041-9 ◽

2007 ◽

Vol 40 (3-4) ◽

pp. 203-218

Author(s):

Chu-Ren Huang ◽

Takenobu Tokunaga ◽

Sophia Yat Mei Lee

Keyword(s):

Language Processing ◽

State Of The Art ◽

Current State ◽

Asian Language

Download Full-text

Anniversary article: Then and now: 25 years of progress in natural language engineering

Natural Language Engineering ◽

10.1017/s1351324919000081 ◽

2019 ◽

Vol 25 (3) ◽

pp. 405-418

Author(s):

John Tait ◽

Yorick Wilks

Keyword(s):

Natural Language ◽

Speech Processing ◽

State Of The Art ◽

Language Engineering ◽

Part Of Speech Tagging ◽

Ethical Implications ◽

Part Of Speech ◽

Current State ◽

Metaphor Processing ◽

Speech Tagging

AbstractThe paper reviews the state of the art of natural language engineering (NLE) around 1995, when this journal first appeared, and makes a critical comparison with the current state of the art in 2018, as we prepare the 25th Volume. Specifically the then state of the art in parsing, information extraction, chatbots, and dialogue systems, speech processing and machine translation are briefly reviewed. The emergence in the 1980s and 1990s of machine learning (ML) and statistical methods (SM) is noted. Important trends and areas of progress in the subsequent years are identified. In particular, the move to the use of n-grams or skip grams and/or chunking with part of speech tagging and away from whole sentence parsing is noted, as is the increasing dominance of SM and ML. Some outstanding issues which merit further research are briefly pointed out, including metaphor processing and the ethical implications of NLE.

Download Full-text

An Overview of Biomolecular Event Extraction from Scientific Documents

Computational and Mathematical Methods in Medicine ◽

10.1155/2015/571381 ◽

2015 ◽

Vol 2015 ◽

pp. 1-19 ◽

Cited By ~ 2

Author(s):

Jorge A. Vanegas ◽

Sérgio Matos ◽

Fabio González ◽

José L. Oliveira

Keyword(s):

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Biomedical Literature ◽

Event Extraction ◽

Automatic Extraction ◽

Biological Processes ◽

Scientific Texts ◽

Research Areas ◽

Current State

This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed.

Download Full-text

Meemi: A simple method for post-processing and integrating cross-lingual word embeddings

Natural Language Engineering ◽

10.1017/s1351324921000280 ◽

2021 ◽

pp. 1-23

Author(s):

Yerai Doval ◽

Jose Camacho-Collados ◽

Luis Espinosa-Anke ◽

Steven Schockaert

Keyword(s):

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Orthogonal Transformation ◽

Word Embeddings ◽

Initial Alignment ◽

Simple Method ◽

Word Similarity ◽

Current State ◽

Cross Lingual

Abstract Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation which preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach both improves the integration of the monolingual spaces and the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to effectively obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state-of-the-art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference.

Download Full-text

We are not ready yet: limitations of transfer learning for Disease Named Entity Recognition

10.1101/2021.07.11.451939 ◽

2021 ◽

Author(s):

Lisa Langnickel ◽

Juliane Fluck

Keyword(s):

Transfer Learning ◽

Language Processing ◽

State Of The Art ◽

Named Entity Recognition ◽

Clinical Applications ◽

Entity Recognition ◽

Data Sets ◽

Named Entity ◽

Current State ◽

Biomedical Named Entity Recognition

Intense research has been done in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models are used in a variety of biomedical and clinical applications. For the available data sets, these models show excellent results - partly exceeding the inter-annotator agreements. However, biomedical named entity recognition applied on COVID-19 preprints shows a performance drop compared to the results on available test data. The question arises how well trained models are able to predict on completely new data, i.e. to generalize. Based on the example of disease named entity recognition, we investigate the robustness of different machine learning-based methods - thereof transfer learning - and show that current state-of-the-art methods work well for a given training and the corresponding test set but experience a significant lack of generalization when applying to new data. We therefore argue that there is a need for larger annotated data sets for training and testing.

Download Full-text

Evidence from neurolinguistic methodologies: Can it actually inform linguistic/language acquisition theories and translate to evidence-based applications?

Second language Research ◽

10.1177/0267658316644010 ◽

2016 ◽

Vol 34 (1) ◽

pp. 125-143 ◽

Cited By ~ 16

Author(s):

Leah Roberts ◽

Jorge González Alonso ◽

Christos Pliatsikas ◽

Jason Rothman

Keyword(s):

Language Acquisition ◽

Language Processing ◽

State Of The Art ◽

Point Of View ◽

Language Teachers ◽

Evidence Based ◽

Special Issue ◽

L2 Acquisition ◽

The Past ◽

Current State

This special issue is a testament to the recent burgeoning interest by theoretical linguists, language acquisitionists and teaching practitioners in the neuroscience of language. It offers a highly valuable, state-of-the-art overview of the neurophysiological methods that are currently being applied to questions in the field of second language (L2) acquisition, teaching and processing. Research in the area of neurolinguistics has developed dramatically in the past 20 years, providing a wealth of exciting findings, many of which are discussed in the articles in this issue of the journal. The goal of this commentary is twofold. The first is to critically assess the current state of neurolinguistic data from the point of view of language acquisition and processing – informed by the articles that comprise this special issue and the literature as a whole – pondering how the neuroscience of language/processing might inform us with respect to linguistic and language acquisition theories. The second goal is to offer some links from implications of exploring the first goal towards informing language teachers and the creation of linguistically and neurolinguistically-informed evidence-based pedagogies for non-native language teaching.

Download Full-text