Asian language processing: current state-of-the-art

2007 ◽  
Vol 40 (3-4) ◽  
pp. 203-218
Author(s):  
Chu-Ren Huang ◽  
Takenobu Tokunaga ◽  
Sophia Yat Mei Lee
2021 ◽  

Event structures are central in Linguistics and Artificial Intelligence research: people can easily refer to changes in the world, identify their participants, distinguish relevant information, and have expectations of what can happen next. Part of this process is based on mechanisms similar to narratives, which are at the heart of information sharing. But it remains difficult to automatically detect events or automatically construct stories from such event representations. This book explores how to handle today's massive news streams and provides multidimensional, multimodal, and distributed approaches, like automated deep learning, to capture events and narrative structures involved in a 'story'. This overview of the current state-of-the-art on event extraction, temporal and causal relations, and storyline extraction aims to establish a new multidisciplinary research community with a common terminology and research agenda. Graduate students and researchers in natural language processing, computational linguistics, and media studies will benefit from this book.


2013 ◽  
Vol 10 (2) ◽  
pp. 82-93 ◽  
Author(s):  
Cassidy Kelly ◽  
Hui Yang

Summary The extraction of study design parameters from biomedical journal articles is an important problem in natural language processing (NLP). Such parameters define the characteristics of a study, such as the duration, the number of subjects, and their profile. Here we present a system for extracting study design parameters from sentences in article abstracts. This system will be used as a component of a larger system for creating nutrigenomics networks from articles in the nutritional genomics domain. The algorithms presented consist of manually designed rules expressed either as regular expressions or in terms of sentence parse structure. A number of filters and NLP tools are also utilized within a pipelined algorithmic framework. Using this novel approach, our system performs extraction at a finer level of granularity than comparable systems, while generating results that surpass the current state of the art.
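
As a rough illustration of the rule-based idea described above, a regular expression can pull a study duration and a subject count out of a single abstract sentence. This is a minimal sketch only: the patterns, the function name `extract_study_parameters`, and the example sentence are hypothetical and are not the rules used by the authors' system.

```python
import re

# Hypothetical patterns for illustration only; not the rules from the paper.
# Duration: a number followed by a time unit, e.g. "12-week" or "6 months".
DURATION_RE = re.compile(
    r"\b(\d+)[\s-]*(day|week|month|year)s?\b", re.IGNORECASE
)
# Subjects: a number followed by a noun naming the study population.
SUBJECTS_RE = re.compile(
    r"\b(\d+)\s+(?:healthy\s+)?"
    r"(subjects|participants|patients|volunteers|men|women)\b",
    re.IGNORECASE,
)


def extract_study_parameters(sentence):
    """Return the first duration and subject count found in a sentence."""
    params = {}
    duration = DURATION_RE.search(sentence)
    if duration:
        params["duration"] = (int(duration.group(1)), duration.group(2).lower())
    subjects = SUBJECTS_RE.search(sentence)
    if subjects:
        params["subjects"] = (int(subjects.group(1)), subjects.group(2).lower())
    return params


if __name__ == "__main__":
    example = "A 12-week randomized trial enrolled 45 healthy volunteers."
    print(extract_study_parameters(example))
```

A system of the kind the abstract describes would combine many such rules with parse-based rules and filters; the sketch covers only the regular-expression part.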


1990 ◽  
Vol 5 (4) ◽  
pp. 225-249 ◽  
Author(s):  
Ann Copestake ◽  
Karen Sparck Jones

Abstract This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.


1999 ◽  
Vol 5 (1) ◽  
pp. 17-44 ◽  
Author(s):  
BRANIMIR BOGURAEV ◽  
CHRISTOPHER KENNEDY

The identification and extraction of technical terms is one of the better understood and most robust Natural Language Processing (NLP) technologies within the current state of the art of language engineering. In generic information management contexts, terms have been used primarily for procedures seeking to identify a set of phrases that is useful for tasks such as text indexing, computational lexicology, and machine-assisted translation: such tasks make important use of the assumption that terminology is representative of a given domain. This paper discusses an extension of basic terminology identification technology for the application to two higher level semantic tasks: domain description, the specification of the technical domain of a document, and content characterisation, the construction of a compact, coherent and useful representation of the topical content of a text. With these extensions, terminology identification becomes the foundation of an operational environment for document processing and content abstraction.


2015 ◽  
Vol 2015 ◽  
pp. 1-19 ◽  
Author(s):  
Jorge A. Vanegas ◽  
Sérgio Matos ◽  
Fabio González ◽  
José L. Oliveira

This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed.


2021 ◽  
pp. 1-23
Author(s):  
Yerai Doval ◽  
Jose Camacho-Collados ◽  
Luis Espinosa-Anke ◽  
Steven Schockaert

Abstract Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation which preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach both improves the integration of the monolingual spaces and the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to effectively obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state-of-the-art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference.
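
The averaging step mentioned above can be pictured with a small sketch. It assumes the two monolingual spaces have already been aligned (the orthogonal step is not shown) and only computes the average vector of each word and its translation, which is the point the additional transformation aims to bring both vectors closer to. How that transformation is fitted is not stated in the abstract and is omitted here; the function name and the toy vectors are hypothetical.

```python
def average_translation_pairs(src_vecs, tgt_vecs, dictionary):
    """Compute the midpoint of each aligned (source, target) word pair.

    src_vecs / tgt_vecs: dicts mapping a word to a list of floats, assumed
    to live in the same, already aligned space.
    dictionary: iterable of (source_word, target_word) translation pairs.
    """
    averaged = {}
    for src_word, tgt_word in dictionary:
        u = src_vecs[src_word]
        v = tgt_vecs[tgt_word]
        # Component-wise mean of the two vectors.
        averaged[(src_word, tgt_word)] = [(a + b) / 2.0 for a, b in zip(u, v)]
    return averaged


if __name__ == "__main__":
    # Toy, made-up vectors purely for illustration.
    english = {"dog": [1.0, 0.0]}
    spanish = {"perro": [0.0, 1.0]}
    print(average_translation_pairs(english, spanish, [("dog", "perro")]))
```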


2021 ◽  
Author(s):  
Lisa Langnickel ◽  
Juliane Fluck

Biomedical natural language processing has been the subject of intense research. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. For the available data sets, these models show excellent results, in some cases exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a performance drop compared to the results on available test data. This raises the question of how well trained models can predict on completely new data, i.e., generalize. Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well for a given training set and the corresponding test set but show a significant lack of generalization when applied to new data. We therefore argue that there is a need for larger annotated data sets for training and testing.


2016 ◽  
Vol 34 (1) ◽  
pp. 125-143 ◽  
Author(s):  
Leah Roberts ◽  
Jorge González Alonso ◽  
Christos Pliatsikas ◽  
Jason Rothman

This special issue is a testament to the recent burgeoning interest by theoretical linguists, language acquisitionists and teaching practitioners in the neuroscience of language. It offers a highly valuable, state-of-the-art overview of the neurophysiological methods that are currently being applied to questions in the field of second language (L2) acquisition, teaching and processing. Research in the area of neurolinguistics has developed dramatically in the past 20 years, providing a wealth of exciting findings, many of which are discussed in the articles in this issue of the journal. The goal of this commentary is twofold. The first is to critically assess the current state of neurolinguistic data from the point of view of language acquisition and processing – informed by the articles that comprise this special issue and the literature as a whole – pondering how the neuroscience of language/processing might inform us with respect to linguistic and language acquisition theories. The second goal is to offer some links from implications of exploring the first goal towards informing language teachers and the creation of linguistically and neurolinguistically-informed evidence-based pedagogies for non-native language teaching.


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2312
Author(s):  
Tom Bolton ◽  
Tooska Dargahi ◽  
Sana Belguith ◽  
Mabrook S. Al-Rakhami ◽  
Ali Hassan Sodhro

Since the purchase of Siri by Apple, and its release with the iPhone 4S in 2011, virtual assistants (VAs) have grown in number and popularity. The sophisticated natural language processing and speech recognition employed by VAs enable users to interact with them conversationally, almost as they would with another human. To service user voice requests, VAs transmit large amounts of data to their vendors; these data are processed and stored in the Cloud. The potential data security and privacy issues involved in this process provided the motivation to examine the current state of the art in VA research. In this study, we identify peer-reviewed literature that focuses on security and privacy concerns surrounding these assistants, including current trends in addressing how voice assistants are vulnerable to malicious attacks and worries that the VA is recording without the user’s knowledge or consent. The findings show that not only are these worries manifold, but there is a gap in the current state of the art, and no current literature reviews on the topic exist. This review sheds light on future research directions, such as providing solutions to perform voice authentication without an external device, and the compliance of VAs with privacy regulations.


1995 ◽  
Vol 38 (5) ◽  
pp. 1126-1142 ◽  
Author(s):  
Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done—including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests).

