Natural language processing: State of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine

The 4 th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5 th edition of the CL-SciSumm Shared Task.

Download Full-text

Textual entailment graphs

Natural Language Engineering ◽

10.1017/s1351324915000108 ◽

2015 ◽

Vol 21 (5) ◽

pp. 699-724 ◽

Cited By ~ 6

Author(s):

LILI KOTLERMAN ◽

IDO DAGAN ◽

BERNARDO MAGNINI ◽

LUISA BENTIVOGLI

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Gold Standard ◽

State Of The Art ◽

Text Analytics ◽

Joint Work ◽

Gold Standard Dataset ◽

Textual Entailment ◽

Interesting Task

AbstractIn this work, we present a novel type of graphs for natural language processing (NLP), namely textual entailment graphs (TEGs). We describe the complete methodology we developed for the construction of such graphs and provide some baselines for this task by evaluating relevant state-of-the-art technology. We situate our research in the context of text exploration, since it was motivated by joint work with industrial partners in the text analytics area. Accordingly, we present our motivating scenario and the first gold-standard dataset of TEGs. However, while our own motivation and the dataset focus on the text exploration setting, we suggest that TEGs can have different usages and suggest that automatic creation of such graphs is an interesting task for the community.

Download Full-text

Using Natural Language Processing and Artificial Intelligence to Explore the Nutrition and Sustainability of Recipes and Food

Frontiers in Artificial Intelligence ◽

10.3389/frai.2020.621577 ◽

2021 ◽

Vol 3 ◽

Author(s):

Marieke van Erp ◽

Christian Reynolds ◽

Diana Maynard ◽

Alain Starke ◽

Rebeca Ibáñez Martín ◽

...

Keyword(s):

Artificial Intelligence ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Knowledge ◽

State Of The Art ◽

Interdisciplinary Approach ◽

Comprehensive Analysis ◽

The State ◽

Use Cases

In this paper, we discuss the use of natural language processing and artificial intelligence to analyze nutritional and sustainability aspects of recipes and food. We present the state-of-the-art and some use cases, followed by a discussion of challenges. Our perspective on addressing these is that while they typically have a technical nature, they nevertheless require an interdisciplinary approach combining natural language processing and artificial intelligence with expert domain knowledge to create practical tools and comprehensive analysis for the food domain.

Download Full-text

Natural Language Processing

Annual Review of Applied Linguistics ◽

10.1017/s0267190500001446 ◽

1996 ◽

Vol 16 ◽

pp. 70-85 ◽

Cited By ~ 5

Author(s):

Thomas C. Rindflesch

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Domain Knowledge ◽

The State ◽

Point Of View ◽

Computer Applications ◽

Significant Progress ◽

Future Directions

Work in computational linguistics began very soon after the development of the first computers (Booth, Brandwood and Cleave 1958), yet in the intervening four decades there has been a pervasive feeling that progress in computer understanding of natural language has not been commensurate with progress in other computer applications. Recently, a number of prominent researchers in natural language processing met to assess the state of the discipline and discuss future directions (Bates and Weischedel 1993). The consensus of this meeting was that increased attention to large amounts of lexical and domain knowledge was essential for significant progress, and current research efforts in the field reflect this point of view.

Download Full-text

The state of the art in text mining and natural language processing for pharmacogenomics

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2012.08.001 ◽

2012 ◽

Vol 45 (5) ◽

pp. 825-826 ◽

Cited By ~ 6

Author(s):

Adrien Coulet ◽

K. Bretonnel Cohen ◽

Russ B. Altman

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

The State

Download Full-text

Biomedical Concept Recognition Using Deep Neural Sequence Models

10.1101/530337 ◽

2019 ◽

Cited By ~ 2

Author(s):

Negacy D. Hailu ◽

Michael Bada ◽

Asmelash Teka Hadgu ◽

Lawrence E. Hunter

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Conditional Random Field ◽

Concept Recognition ◽

Performance Improvements ◽

Art Performance

AbstractBackgroundthe automated identification of mentions of ontological concepts in natural language texts is a central task in biomedical information extraction. Despite more than a decade of effort, performance in this task remains below the level necessary for many applications.Resultsrecently, applications of deep learning in natural language processing have demonstrated striking improvements over previously state-of-the-art performance in many related natural language processing tasks. Here we demonstrate similarly striking performance improvements in recognizing biomedical ontology concepts in full text journal articles using deep learning techniques originally developed for machine translation. For example, our best performing system improves the performance of the previous state-of-the-art in recognizing terms in the Gene Ontology Biological Process hierarchy, from a previous best F1 score of 0.40 to an F1 of 0.70, nearly halving the error rate. Nearly all other ontologies show similar performance improvements.ConclusionsA two-stage concept recognition system, which is a conditional random field model for span detection followed by a deep neural sequence model for normalization, improves the state-of-the-art performance for biomedical concept recognition. Treating the biomedical concept normalization task as a sequence-to-sequence mapping task similar to neural machine translation improves performance.

Download Full-text

Text: An R-package for Analyzing and Visualizing Human Language Using Natural Language Processing and Deep Learning

10.31234/osf.io/293kt ◽

2021 ◽

Author(s):

Oscar Nils Erik Kjell ◽

H. Andrew Schwartz ◽

Salvatore Giorgi

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Rating Scale ◽

State Of The Art ◽

R Package ◽

Language Models ◽

Categorical Variables ◽

Human Language

The language that individuals use for expressing themselves contains rich psychological information. Recent significant advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language such as machine translation. However, these state-of-the-art methods have not yet been made easily accessible for psychology researchers, nor designed to be optimal for human-level analyses. This tutorial introduces text (www.r-text.org), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. Text is both a modular solution for accessing state-of-the-art language models and an end-to-end solution catered for human-level analyses. Hence, text provides user-friendly functions tailored to test hypotheses in social sciences for both relatively small and large datasets. This tutorial describes useful methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf as well as providing a framework for the advanced users to build on for novel techniques and analysis pipelines. The reader learns about six methods: 1) textEmbed: to transform text to traditional or modern transformer-based word embeddings (i.e., numeric representations of words); 2) textTrain: to examine the relationships between text and numeric/categorical variables; 3) textSimilarity and 4) textSimilarityTest: to computing semantic similarity scores between texts and significance test the difference in meaning between two sets of texts; and 5) textProjection and 6) textProjectionPlot: to examine and visualize text within the embedding space according to latent or specified construct dimensions (e.g., low to high rating scale scores).

Download Full-text

The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview (Preprint)

10.2196/preprints.23375 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yanshan Wang ◽

Sunyang Fu ◽

Feichen Shen ◽

Sam Henry ◽

Ozlem Uzuner ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Shared Task ◽

Data Set ◽

Clinical Text ◽

Clinical Notes ◽

Clinical Domain ◽

Semantic Textual Similarity

BACKGROUND Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. OBJECTIVE Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain. METHODS We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. RESULTS Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs. CONCLUSIONS The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.

Download Full-text

NATURAL LANGUAGE PROCESSING AND BIM IN AECO SECTOR: A STATE OF THE ART

Proceedings of International Structural Engineering and Construction ◽

10.14455/isec.2020.7(2).con-22 ◽

2020 ◽

Vol 7 (2) ◽

Author(s):

Giuseppe Martino Di Giuda ◽

Mirko Locatelli ◽

Elena Seghezzi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Structured Data ◽

Information Modeling ◽

Construction Process ◽

Management Techniques ◽

Design Requirements ◽

Further Development

The research provides state-of-the-art theories, methods, and applications of Natural Language Processing in a BIM approach to define advantages, weaknesses, and potential developments. Traditionally, the design and construction process requirements are expressed through verbal qualitative evaluations instead of using numerical and structured data. Information modeling and management methods can hardly manage requirements expressed by means of qualitative and unstructured expressions. This paper aims to define the state of the art on Natural Language Processing applications for the numerical translation of basic design requirements in the AECO sector. The major advantages of this research would be the optimization and automation of numerical definition and validation of requirements in a structured data form. Structured data can, in fact, be easily managed in a BIM approach. NLP supporting information modeling methods allows the application of requirements engineering and management techniques to handle construction processes from a data-driven perspective. An investigation of the setting of a decision support system, based on NLP, for the definition and validation of requirements in the early stages of the construction process, is also provided as a potential further development.

Download Full-text

Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings

10.31219/osf.io/j76y3 ◽

2018 ◽

Cited By ~ 1

Author(s):

Debanjan Mahata ◽

John Kuriakose ◽

Rajiv Ratn Shah ◽

Roger Zimmermann

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Keyphrase Extraction ◽

Text Documents ◽

Benchmark Datasets

Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.

Download Full-text