Computational Analysis of Storylines

2021

Event structures are central in Linguistics and Artificial Intelligence research: people can easily refer to changes in the world, identify their participants, distinguish relevant information, and form expectations of what can happen next. Part of this process is based on mechanisms similar to narratives, which are at the heart of information sharing. But it remains difficult to automatically detect events or automatically construct stories from such event representations. This book explores how to handle today's massive news streams and provides multidimensional, multimodal, and distributed approaches, such as automated deep learning, to capture the events and narrative structures involved in a 'story'. This overview of the current state of the art in event extraction, temporal and causal relations, and storyline extraction aims to establish a new multidisciplinary research community with a common terminology and research agenda. Graduate students and researchers in natural language processing, computational linguistics, and media studies will benefit from this book.

2015
Vol 2015
pp. 1-19
Author(s): Jorge A. Vanegas, Sérgio Matos, Fabio González, José L. Oliveira

This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed.
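As a hedged illustration of the rule-based end of the spectrum this review covers, the sketch below pairs a tiny trigger dictionary with a crude entity heuristic. The trigger list, gene-name pattern, and example sentence are invented for illustration; real systems rely on curated lexicons, parse features, and learned classifiers.

```python
import re

# Toy trigger dictionary mapping trigger words to event types, in the
# spirit of rule-based biomolecular event extraction (triggers invented
# for illustration only).
TRIGGERS = {
    "phosphorylation": "Phosphorylation",
    "expression": "Gene_expression",
    "binds": "Binding",
}
GENE_PATTERN = re.compile(r"\b[A-Z][A-Za-z0-9-]+\b")  # crude gene-name heuristic

def extract_events(sentence: str):
    """Return (event_type, trigger, candidate_arguments) tuples."""
    events = []
    candidates = GENE_PATTERN.findall(sentence)
    for word in re.findall(r"\w+", sentence.lower()):
        if word in TRIGGERS:
            events.append((TRIGGERS[word], word, candidates))
    return events

print(extract_events("STAT3 phosphorylation is induced by IL-6."))
# [('Phosphorylation', 'phosphorylation', ['STAT3', 'IL-6'])]
```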


Author(s): Wang Chen, Yifan Gao, Jiani Zhang, Irwin King, Michael R. Lyu

Keyphrase generation (KG) aims to generate a set of keyphrases for a given document, and is a fundamental task in natural language processing (NLP). Most previous methods solve this problem in an extractive manner, while recently several attempts have been made under the generative setting using deep neural networks. However, state-of-the-art generative methods simply treat the document title and the document main body equally, ignoring the leading role the title plays in the overall document. To solve this problem, we introduce a new model called the Title-Guided Network (TG-Net) for the automatic keyphrase generation task, based on the encoder-decoder architecture with two new features: (i) the title is additionally employed as a query-like input, and (ii) a title-guided encoder gathers the relevant information from the title to each word in the document. Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models by a large margin, especially for documents with either very low or very high title length ratios.
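A minimal sketch of the title-guided idea, assuming scaled dot-product attention from each document word to the title words; this illustrates the mechanism described above, not the TG-Net implementation itself (the function and variable names are ours):

```python
import numpy as np

def title_guided_context(doc_states, title_states):
    """For each document word, attend over the title and gather a
    title-aware context vector (scaled dot-product attention).

    doc_states:   (doc_len, d) encoder states for the document body
    title_states: (title_len, d) encoder states for the title
    """
    d = doc_states.shape[-1]
    scores = doc_states @ title_states.T / np.sqrt(d)      # (doc_len, title_len)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over title words
    return weights @ title_states                          # (doc_len, d)

# Toy run: 5 document words, 3 title words, hidden size 8.
rng = np.random.default_rng(0)
ctx = title_guided_context(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(ctx.shape)  # (5, 8)
```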


Author(s): Alex Dexter, Spencer A. Thomas, Rory T. Steven, Kenneth N. Robinson, Adam J. Taylor, ...

High-dimensionality omics and hyperspectral imaging datasets present difficult challenges for feature extraction and data mining, due to huge numbers of features that cannot be simultaneously examined. The sample numbers and variables of these methods are constantly growing as new technologies are developed, and computational analysis needs to evolve to keep up with growing demand. Current state-of-the-art algorithms can handle some routine datasets but struggle when datasets grow above a certain size. We present an approach that trains deep neural networks to perform non-linear dimensionality reduction, in particular t-distributed stochastic neighbour embedding (t-SNE), to overcome the prior limitations of these methods.

One-sentence summary: Analysis of prohibitively large datasets by combining deep learning via neural networks with non-linear dimensionality reduction.
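One common way to realize this idea is a parametric embedding: fit exact t-SNE on a tractable subset, then train a neural network to reproduce the mapping so it can embed arbitrarily many points. The sketch below, using scikit-learn, is an illustration of that pattern under our own assumptions, not the authors' pipeline; the data, layer sizes, and subset size are invented.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 50))          # stand-in for a large omics matrix

# 1. Run exact t-SNE on a subset small enough to be tractable.
subset = X[:1000]
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(subset)

# 2. Train a neural network to map high-dimensional points to the
#    t-SNE coordinates, so the embedding generalizes to unseen points.
net = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=500, random_state=0)
net.fit(subset, emb)

# 3. Embed the full (arbitrarily large) dataset with a cheap forward pass.
full_embedding = net.predict(X)
print(full_embedding.shape)              # (5000, 2)
```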


2010
Vol 3
Author(s): Emily M. Bender, D. Terence Langendoen

In this paper, we overview the ways in which computational methods can serve the goals of analysis and theory development in linguistics, and encourage the reader to become involved in the emerging cyberinfrastructure for linguistics. We survey examples from diverse subfields of how computational methods are already being used, describe the current state of the art in cyberinfrastructure for linguistics, sketch a pie-in-the-sky view of where the field could go, and outline steps that linguists can take now to bring about better access to and use of linguistic data through cyberinfrastructure.


Author(s): Karine Megerdoomian

This chapter introduces the fields of Computational Linguistics (CL)—the computational modelling of linguistic representations and theories—and Natural Language Processing (NLP)—the design and implementation of tools for automated language understanding and production—and discusses some of the existing tensions between the formal approach to linguistics and the current state of research and development in CL and NLP. The chapter goes on to explain the specific challenges faced by CL and NLP for Persian, many of them deriving from the intricacies the Perso-Arabic script presents for automatically identifying word and phrase boundaries in text, as well as difficulties in the automatic processing of compound words and light verb constructions. The chapter then provides an overview of the state of the art in current and recent CL and NLP for Persian. It concludes with areas for improvement and suggestions for future directions.
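A concrete instance of the word-boundary problem described above is the zero-width non-joiner (ZWNJ, U+200C), which links morphemes inside one Persian orthographic word without a visible space. The minimal sketch below (our example word, not one from the chapter) shows why naive normalization breaks tokenization:

```python
# The zero-width non-joiner (ZWNJ, U+200C) joins morphemes inside a single
# Persian orthographic word without a visible space, e.g. the imperfective
# prefix "mi-" in "miravam" ("I go"):
word = "می\u200cروم"   # می + ZWNJ + روم

# Whitespace tokenization sees one token, because ZWNJ is not whitespace:
print(word.split())                          # ['می\u200cروم']

# But a naive normalizer that maps ZWNJ to a space breaks the word apart,
# one reason Persian boundary detection needs script-aware preprocessing:
print(word.replace("\u200c", " ").split())   # ['می', 'روم']
```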


2013
Vol 10 (2)
pp. 82-93
Author(s): Cassidy Kelly, Hui Yang

The extraction of study design parameters from biomedical journal articles is an important problem in natural language processing (NLP). Such parameters define the characteristics of a study, such as the duration, the number of subjects, and their profile. Here we present a system for extracting study design parameters from sentences in article abstracts. This system will be used as a component of a larger system for creating nutrigenomics networks from articles in the nutritional genomics domain. The algorithms presented consist of manually designed rules expressed either as regular expressions or in terms of sentence parse structure. A number of filters and NLP tools are also utilized within a pipelined algorithmic framework. Using this novel approach, our system performs extraction at a finer level of granularity than comparable systems, while generating results that surpass the current state of the art.
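To give the flavor of such manually designed rules, here is a minimal sketch with two invented regular-expression patterns, one for subject counts and one for study duration; the actual system combines many rules with parse-structure constraints and NLP filters in a pipeline:

```python
import re

# Two illustrative rules in the spirit of the approach described above.
# (Patterns invented for illustration; they are far simpler than the
# paper's rule set.)
SUBJECTS = re.compile(r"\b(\d+)\s+(?:subjects|participants|patients)\b", re.I)
DURATION = re.compile(r"\bfor\s+(\d+)\s+(days?|weeks?|months?|years?)\b", re.I)

def extract_design_parameters(sentence: str) -> dict:
    params = {}
    if m := SUBJECTS.search(sentence):
        params["n_subjects"] = int(m.group(1))
    if m := DURATION.search(sentence):
        params["duration"] = f"{m.group(1)} {m.group(2)}"
    return params

print(extract_design_parameters(
    "The 120 participants received the supplement for 12 weeks."))
# {'n_subjects': 120, 'duration': '12 weeks'}
```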


1995
Vol 1 (1)
pp. 29-81
Author(s): I. Androutsopoulos, G.D. Ritchie, P. Thanisch

This paper is an introduction to natural language interfaces to databases (NLIDBs). A brief overview of the history of NLIDBs is first given. Some advantages and disadvantages of NLIDBs are then discussed, comparing NLIDBs to formal query languages, form-based interfaces, and graphical interfaces. An introduction to some of the linguistic problems NLIDBs have to confront follows, for the benefit of readers less familiar with computational linguistics. The discussion then moves on to NLIDB architectures, portability issues, restricted natural language input systems (including menu-based NLIDBs), and NLIDBs with reasoning capabilities. Some less explored areas of NLIDB research are then presented, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs. The paper ends with reflections on the current state of the art.
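To make the idea of a restricted natural language input system concrete, here is a toy pattern-to-template sketch in the spirit of the systems the paper surveys; the schema, patterns, and SQL templates are invented for illustration, and real NLIDBs use far richer grammars and semantic interpretation:

```python
import re

# Each accepted question shape maps to a SQL template: a deliberately
# restricted input language. (Schema and patterns are hypothetical.)
PATTERNS = [
    (re.compile(r"who works in the (\w+) department", re.I),
     "SELECT name FROM employees WHERE department = '{0}'"),
    (re.compile(r"what is the salary of (\w+)", re.I),
     "SELECT salary FROM employees WHERE name = '{0}'"),
]

def to_sql(question: str) -> str:
    for pattern, template in PATTERNS:
        if m := pattern.search(question):
            return template.format(*m.groups())
    raise ValueError("question outside the restricted language")

print(to_sql("Who works in the sales department?"))
# SELECT name FROM employees WHERE department = 'sales'
```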


Author(s): Fazel Keshtkar, Ledong Shi, Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task, since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorise various dishes and find semantic relations among them based on reviewers' comments and menu descriptions. Our goal is to apply state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations within dishes and their reviews. We also aim to extract the common patterns between different dishes across restaurants and review comments and, in reverse, to explore dishes through their semantic relations. A dataset of articles related to restaurants, with dishes located within the articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to assist users in finding a dish from several words of description and their interests. Our dataset contains 1,000 articles from a food review agency covering a variety of dishes from different cultures: American (e.g., 'steak', 'hamburger'), Chinese (e.g., 'stir fry', 'dumplings'), and Japanese (e.g., 'sushi').
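As a hedged sketch of the word2vec-plus-t-SNE pipeline the study describes, the snippet below trains word vectors on a tiny invented review corpus with gensim and projects a few dish vectors to 2-D with scikit-learn's t-SNE; the corpus, hyperparameters, and dish list are ours, not the paper's:

```python
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

# Tiny stand-in corpus of tokenized review sentences (invented; the study
# uses 1,000 food-review articles).
sentences = [
    ["the", "sushi", "was", "fresh", "and", "delicate"],
    ["crispy", "dumplings", "with", "a", "savory", "stir", "fry"],
    ["a", "juicy", "steak", "next", "to", "a", "greasy", "hamburger"],
] * 50  # repeat so word2vec has enough co-occurrence counts

# Learn dense word vectors from the review text.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)

# Project dish vectors to 2-D with t-SNE to inspect semantic neighborhoods.
dishes = ["sushi", "dumplings", "steak", "hamburger"]
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(
    model.wv[dishes])
print(dict(zip(dishes, coords.round(2).tolist())))
```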


1990
Vol 5 (4)
pp. 225-249
Author(s): Ann Copestake, Karen Sparck Jones

This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.
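To illustrate that central translation step, here is a minimal sketch of a question mapped to a small intermediate logical form that is then rendered as SQL; the schema and the question interpretation are invented, and real front ends derive the logical form from a linguistic analysis of the question:

```python
from dataclasses import dataclass

# A minimal intermediate representation for the translation pipeline:
# question -> logical form -> SQL. (Schema is hypothetical.)
@dataclass
class Query:
    select: str
    table: str
    where_col: str
    where_val: str

    def to_sql(self) -> str:
        return (f"SELECT {self.select} FROM {self.table} "
                f"WHERE {self.where_col} = '{self.where_val}'")

# "Which suppliers are located in Paris?" might be interpreted as:
logical_form = Query(select="name", table="suppliers",
                     where_col="city", where_val="Paris")
print(logical_form.to_sql())
# SELECT name FROM suppliers WHERE city = 'Paris'
```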


1999
Vol 5 (1)
pp. 17-44
Author(s): Branimir Boguraev, Christopher Kennedy

The identification and extraction of technical terms is one of the better understood and most robust Natural Language Processing (NLP) technologies within the current state of the art of language engineering. In generic information management contexts, terms have been used primarily for procedures seeking to identify a set of phrases that is useful for tasks such as text indexing, computational lexicology, and machine-assisted translation: such tasks make important use of the assumption that terminology is representative of a given domain. This paper discusses an extension of basic terminology identification technology for the application to two higher level semantic tasks: domain description, the specification of the technical domain of a document, and content characterisation, the construction of a compact, coherent and useful representation of the topical content of a text. With these extensions, terminology identification becomes the foundation of an operational environment for document processing and content abstraction.
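One widely used measure for ranking the candidate terms such technology produces is the C-value of Frantzi and Ananiadou, sketched below; this is offered as a representative instance of terminology identification, not necessarily the authors' own algorithm, and the candidate frequencies are invented:

```python
import math
from collections import defaultdict

def c_value(frequencies: dict) -> dict:
    """Rank multiword term candidates with the C-value measure.
    `frequencies` maps candidate terms (tuples of words) to corpus counts."""
    # For each candidate, collect the longer candidates that contain it.
    nested_in = defaultdict(list)
    for term in frequencies:
        for longer in frequencies:
            if len(longer) > len(term) and any(
                    longer[i:i + len(term)] == term
                    for i in range(len(longer) - len(term) + 1)):
                nested_in[term].append(longer)

    scores = {}
    for term, freq in frequencies.items():
        weight = math.log2(len(term))
        parents = nested_in[term]
        if parents:
            # Discount occurrences accounted for by longer terms.
            freq = freq - sum(frequencies[p] for p in parents) / len(parents)
        scores[term] = weight * freq
    return scores

freqs = {("real", "time"): 5,
         ("real", "time", "clock"): 3,
         ("expert", "system"): 8}
print(c_value(freqs))
```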

