End-to-end drug entity recognition and adverse effect relation extraction via principal neighbourhood aggregation network

With the advent of Web 2.0, many online platforms now produce massive amounts of textual data. With ever-increasing textual data at hand, extracting information nuggets from it is of immense importance. One possible approach to harnessing this unstructured textual data effectively is to transform it into structured text. Hence, this study presents an overview of approaches that can be applied to extract key insights from textual data in a structured way. To this end, the review focuses mainly on Named Entity Recognition and Relation Extraction: the former deals with identifying named entities, and the latter with extracting relations between sets of entities. This study covers early approaches as well as the developments made to date using machine learning models. The survey finds that deep-learning-based hybrid and joint models currently define the state of the art. It also observes that annotated benchmark datasets are not available for various textual-data sources such as Twitter and other social forums, and this scarcity of datasets has resulted in relatively little progress in these domains. Additionally, most state-of-the-art techniques are offline and computationally expensive. Finally, with the increasing focus on deep-learning frameworks, there is a need to understand and explain the processes underlying deep architectures.
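
The structuring step the survey describes can be illustrated with a minimal sketch: NER output is often encoded as BIO tags (B- begins an entity, I- continues it, O marks other tokens), and decoding those tags yields typed entity spans that a relation extraction step can then pair up. The BIO scheme, the DRUG/EFFECT labels, and the example sentence are assumptions for illustration, not drawn from the survey.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from BIO tags.

    B-X starts an entity of type X, I-X continues it, O is outside any entity.
    An I-X that does not continue an open entity of type X starts a new one
    (a common lenient convention, assumed here).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Close any open entity.
            if current_type is not None:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_type:
            # Start a new entity, closing the previous one first.
            if current_type is not None:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        else:
            current_tokens.append(token)
    if current_type is not None:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Aspirin", "may", "cause", "stomach", "bleeding", "."]
tags = ["B-DRUG", "O", "O", "B-EFFECT", "I-EFFECT", "O"]
print(bio_to_spans(tokens, tags))
# [('Aspirin', 'DRUG'), ('stomach bleeding', 'EFFECT')]
```

A relation extraction step would then classify pairs of the decoded spans, for example producing a triple such as (Aspirin, adverse_effect, stomach bleeding), where the relation label is hypothetical.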

An Attention-Based Model Using Character Composition of Entities in Chinese Relation Extraction

Information | 10.3390/info11020079 | 2020 | Vol 11 (2) | pp. 79 | Cited By ~ 2

Author(s): Xiaoyu Han, Yue Zhang, Wenkai Zhang, Tinglei Huang

Keyword(s): Language Processing, Large Scale, Named Entity Recognition, Relation Extraction, Entity Recognition, Additional Information, Named Entity, Proposed Model, The Relationship, Crucial Part

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful for relation extraction. Additional information such as entity types obtained through NER (Named Entity Recognition) and descriptions provided by a knowledge base both have their limitations. There is, however, another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially on large-scale datasets. This assumption has not been verified before, mainly because of the lack of large-scale Chinese relation datasets. In this paper, we first generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model that uses the characters composing the entities. Results on the generated dataset show that these characters provide useful information for the Chinese relation extraction task. Using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.
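
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea under stated assumptions: the attention query is the average embedding of the characters that form the two entities, sentence tokens are scored by dot product against that query, and a softmax turns the scores into attention weights. The toy embeddings and the entity pair 北京 (Beijing) / 中国 (China) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    # Element-wise average of equal-length vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_char_attention(token_vecs: List[Vector], entity_chars: str,
                          char_emb: Dict[str, Vector]) -> Tuple[List[float], Vector]:
    """Attend over sentence tokens using a query built from entity characters.

    Assumption: the query is the mean embedding of the characters that make up
    both entities; scores are plain dot products with each token vector.
    """
    query = mean([char_emb[c] for c in entity_chars])
    weights = softmax([dot(query, t) for t in token_vecs])
    # Context vector: attention-weighted sum of the token vectors.
    context = [sum(w * t[d] for w, t in zip(weights, token_vecs))
               for d in range(len(query))]
    return weights, context


# Toy 2-d embeddings for the characters of two hypothetical entities, 北京 and 中国.
char_emb = {"北": [1.0, 0.5], "京": [1.0, -0.5], "中": [0.5, 0.0], "国": [1.5, 0.0]}
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
weights, context = entity_char_attention(tokens, "北京中国", char_emb)
print([round(w, 3) for w in weights])  # [0.422, 0.155, 0.422]
```

Tokens whose vectors align with the entity-character query receive more weight, which is the mechanism by which such a model could highlight the part of the sentence expressing the relation.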

End-to-End Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-Level Vs. Character-Level

Communications in Computer and Information Science - Computational Linguistics | 10.1007/978-981-10-8438-6_18 | 2018 | pp. 219-232 | Cited By ~ 5

Author(s): Thai-Hoang Pham, Phuong Le-Hong

Keyword(s): Neural Network, Recurrent Neural Network, Named Entity Recognition, Network Models, Entity Recognition, Neural Network Models, Named Entity, Word Level, End To End

Information Extraction of Cybersecurity Concepts: An LSTM Approach

Applied Sciences | 10.3390/app9193945 | 2019 | Vol 9 (19) | pp. 3945 | Cited By ~ 4

Author(s): Houssem Gasmi, Jannik Laval, Abdelaziz Bouras

Keyword(s): Neural Network, Domain Knowledge, Conditional Random Fields, Short Term Memory, Network Models, Relation Extraction, Entity Recognition, Feature Engineering, Neural Network Models, Feature Based

Extracting cybersecurity entities and the relationships between them from online textual resources such as articles, bulletins, and blogs, and converting these resources into more structured and formal representations, has important applications in cybersecurity research and is valuable for professional practitioners. Previous work on this task relied mainly on feature-based models, which are time-consuming and require labor-intensive feature engineering to describe the properties of entities, domain knowledge, entity context, and linguistic characteristics. Therefore, to alleviate the need for feature engineering, we propose using neural network models, specifically long short-term memory (LSTM) models, for Named Entity Recognition (NER) and Relation Extraction (RE). We evaluated the proposed models on two tasks. The first task performs NER and compares the results against the state-of-the-art Conditional Random Fields (CRFs) method. The second task performs RE using three LSTM models and compares their results to assess which model is most suitable for the cybersecurity domain. The proposed models achieved competitive performance with less feature-engineering work. We demonstrate that exploiting neural network models in cybersecurity text mining is effective and practical.
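
Both the CRF baseline and the CRF output layers commonly placed on top of LSTM taggers select the best label sequence with Viterbi decoding. A minimal sketch follows; the tag set (with MAL as a hypothetical malware label) and all emission and transition scores are invented toy values, not taken from the paper or its implementation.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][y]: score of tag y at position t (e.g. from features or an LSTM).
    transitions[(y_prev, y)]: score of moving from y_prev to y; missing pairs score 0.
    """
    # best[y] = best total score of any path ending in tag y at the current position
    best = {y: emissions[0][y] for y in tags}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in tags:
            prev, score = max(
                ((p, best[p] + transitions.get((p, y), 0.0)) for p in tags),
                key=lambda pair: pair[1])
            new_best[y] = score + emissions[t][y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(best, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


tags = ["O", "B-MAL", "I-MAL"]  # MAL: hypothetical malware entity type
emissions = [{"O": 0.1, "B-MAL": 2.0, "I-MAL": 0.5},
             {"O": 0.3, "B-MAL": 0.2, "I-MAL": 1.5},
             {"O": 2.0, "B-MAL": 0.1, "I-MAL": 0.4}]
transitions = {("O", "I-MAL"): -10.0}  # an entity cannot continue from O
print(viterbi(emissions, transitions, tags))  # ['B-MAL', 'I-MAL', 'O']
```

The transition table is what lets a CRF enforce label-sequence constraints such as "I- never follows O", something an independent per-token classifier cannot do.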

DeepVar: An End-to-End Deep Learning Approach for Genomic Variant Recognition in Biomedical Literature

Proceedings of the AAAI Conference on Artificial Intelligence | 10.1609/aaai.v34i01.5399 | 2020 | Vol 34 (01) | pp. 598-605

Author(s): Chaoran Cheng, Fei Tan, Zhi Wei

Keyword(s): Deep Learning, Large Data, Biomedical Literature, Entity Recognition, Learning Approach, Learning Approaches, Genomic Variants, Low Resource, End To End, Genomic Variant

We consider the problem of Named Entity Recognition (NER) on biomedical scientific literature, and more specifically the recognition of genomic variants. NER has achieved significant success in recent years on canonical tasks, where large datasets are generally available. However, it remains challenging in many domain-specific areas, especially those where only small gold-standard annotations can be obtained. In addition, genomic variant entities exhibit diverse linguistic heterogeneity and differ markedly from the entities characterized in existing canonical NER tasks. State-of-the-art machine learning approaches rely heavily on arduous feature engineering to characterize these unique patterns. In this work, we present the first successful end-to-end deep learning approach that bridges the gap between generic NER algorithms and low-resource applications, using genomic variant recognition as the case. Our proposed model achieves promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other similar low-resource NER applications.
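
Genomic variant mentions such as the HGVS-style c.76A>T (an illustrative example, not taken from the paper) show why generic pipelines struggle: a standard word tokenizer splits the mention into five pieces, whereas a character-level encoding keeps every symbol and lets an end-to-end model learn the pattern without hand-crafted features. The sketch below contrasts the two views; the tokenizer regex and the vocabulary convention are assumptions, and this is not presented as DeepVar's actual input representation.

```python
import re
from typing import Dict, List


def word_tokens(text: str) -> List[str]:
    # Generic tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    # Index 0 is reserved for padding / unseen characters (assumed convention).
    chars = sorted({c for text in texts for c in text})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode_chars(text: str, vocab: Dict[str, int]) -> List[int]:
    # One integer ID per character; unseen characters map to 0.
    return [vocab.get(c, 0) for c in text]


sentence = "The c.76A>T mutation was observed."
print(word_tokens(sentence))
# ['The', 'c', '.', '76A', '>', 'T', 'mutation', 'was', 'observed', '.']

vocab = build_char_vocab([sentence])
print(len(encode_chars("c.76A>T", vocab)))  # 7: one ID per character, mention kept whole
```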

Boundaries and edges rethinking: An end-to-end neural model for overlapping entity relation extraction

Information Processing & Management | 10.1016/j.ipm.2020.102311 | 2020 | Vol 57 (6) | pp. 102311 | Cited By ~ 4

Author(s): Hao Fei, Yafeng Ren, Donghong Ji

Keyword(s): Relation Extraction, Neural Model, Entity Relation Extraction, End To End

An Unsupervised Approach for Cause-Effect Relation Extraction from Biomedical Text

Natural Language Processing and Information Systems - Lecture Notes in Computer Science | 10.1007/978-3-319-91947-8_43 | 2018 | pp. 419-427

Author(s): Raksha Sharma, Girish Palshikar, Sachin Pawar

Keyword(s): Relation Extraction, Biomedical Text, Unsupervised Approach, Effect Relation

RCorp: a resource for chemical disease semantic extraction in Chinese

BMC Medical Informatics and Decision Making | 10.1186/s12911-019-0936-3 | 2019 | Vol 19 (S5) | Cited By ~ 1

Author(s): Yueping Sun, Li Hou, Lu Qin, Yan Liu, Jiao Li, ...

Keyword(s): Chronic Disease, Combination Therapy, Medical Literature, Relation Extraction, Entity Recognition, Agreement Score, Disease Entities, Result Analysis, Synergistic Combinations, Disease Specific

Background: To robustly identify synergistic combinations of drugs, high-throughput screenings are desirable. Automatically identifying such relations in published papers with machine-learning-based tools would be of great help. To support chemical-disease semantic relation extraction, especially for chronic diseases, we manually annotated a chronic-disease-specific corpus for combination therapy discovery in Chinese (RCorp).

Methods: We extracted abstracts from a Chinese medical literature server and followed the annotation framework of the BioCreative CDR corpus, with the guidelines modified to cover relations related to combination therapy. An annotation tool was incorporated into the standard annotation process.

Results: The resulting RCorp consists of 339 Chinese biomedical articles with 2367 annotated chemicals, 2113 diseases, 237 symptoms, 164 chemical-induce-disease relations, 163 chemical-induce-symptom relations, and 805 chemical-treat-disease relations. Each annotation includes both the mention text spans and normalized concept identifiers. Inter-annotator agreement, measured by F score, is 0.883 for chemical entities and 0.791 for disease entities. The F score for chemical-treat-disease relations is 0.788 after unifying the entity mentions.

Conclusions: We extracted and manually annotated a chronic-disease-specific corpus for combination therapy discovery in Chinese. Analysis of the corpus demonstrates its quality for combination-therapy-related knowledge discovery. The annotated corpus should be a useful resource for modelling entity recognition and relation extraction tools. An evaluation based on the corpus will be held in the future.
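
Inter-annotator agreement "measured by F score" can be computed by treating one annotator's spans as the reference and the other's as predictions: with overlap o, precision is o/|B| and recall is o/|A|, and their harmonic mean simplifies to 2o/(|A| + |B|), so the result is symmetric in the two annotators. A minimal sketch, assuming exact span matching; the spans below are invented toy data, not RCorp annotations, and the corpus authors' exact matching criteria are not stated here.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def agreement_f1(a: Set[Span], b: Set[Span]) -> float:
    """Exact-match F1 between two annotators; symmetric in a and b.

    Treating a as reference and b as prediction gives precision |a&b|/|b| and
    recall |a&b|/|a|; their harmonic mean simplifies to 2|a&b| / (|a| + |b|).
    """
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement (assumption)
    return 2 * len(a & b) / (len(a) + len(b))


annotator_1 = {(0, 3, "Chemical"), (10, 14, "Disease"), (20, 25, "Chemical")}
annotator_2 = {(0, 3, "Chemical"), (10, 14, "Disease"), (30, 34, "Disease")}
print(agreement_f1(annotator_1, annotator_2))  # 0.6666666666666666
```

Per-type figures, such as the separate chemical and disease scores reported in the abstract, follow from filtering both sets to one entity type before calling the function.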

Dual CNN for Relation Extraction with Knowledge-Based Attention and Word Embeddings

Computational Intelligence and Neuroscience | 10.1155/2019/6789520 | 2019 | Vol 2019 | pp. 1-10 | Cited By ~ 1

Author(s): Jun Li, Guimin Huang, Jianheng Chen, Yabing Wang

Keyword(s): Knowledge Base, Relation Extraction, Background Knowledge, Word Embedding, Entity Recognition, Instance Selection, Attention Model, Training Tool, Knowledge Based, Proposed Model

Relation extraction is a critical underlying task of textual understanding. However, existing methods have shortcomings in instance selection and lack background knowledge for entity recognition. In this paper, we propose a knowledge-based attention model that makes full use of supervised information from a knowledge base to select an entity. We also design a dual convolutional neural network (CNN) method, motivated by the observation that word embeddings trained with a single tool are limited. The proposed model combines a CNN with an attention mechanism. It feeds the word embeddings and supervised information from the knowledge base into the CNN, performs convolution and pooling, and combines the knowledge base and CNN outputs in the fully connected layer. In this way, the model not only obtains better entity representations but also improves relation extraction performance with the help of rich background knowledge. The experimental results demonstrate that the proposed model achieves competitive performance.
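
A minimal sketch of one reading of the dual-CNN design: each of two embedding sources (which tools the authors used is not stated in the abstract; word2vec and GloVe would be typical choices) feeds its own convolution-and-max-pooling channel, and the pooled features are concatenated with a knowledge-base feature vector before a fully connected layer. The knowledge-based attention component is omitted, and all weights, the KB feature, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = word positions (or filter window rows), cols = dimensions


def conv1d_max(seq: Matrix, filt: Matrix) -> float:
    """Slide one filter over the word sequence, apply ReLU, then max-pool."""
    window, dim = len(filt), len(filt[0])
    if len(seq) < window:
        raise ValueError("sentence shorter than filter window")
    responses = []
    for start in range(len(seq) - window + 1):
        s = sum(filt[k][d] * seq[start + k][d] for k in range(window) for d in range(dim))
        responses.append(max(0.0, s))  # ReLU
    return max(responses)


def dual_cnn_features(emb_a: Matrix, emb_b: Matrix, filters_a: List[Matrix],
                      filters_b: List[Matrix], kb_vec: List[float]) -> List[float]:
    # One convolutional channel per embedding source; pooled outputs are
    # concatenated with knowledge-base features before the dense layer.
    feats_a = [conv1d_max(emb_a, f) for f in filters_a]
    feats_b = [conv1d_max(emb_b, f) for f in filters_b]
    return feats_a + feats_b + kb_vec


def fully_connected(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    # One output score per relation class: W x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


# Toy 3-word sentence embedded by two hypothetical tools (2-d vectors each).
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.5]]
features = dual_cnn_features(emb_a, emb_b,
                             filters_a=[[[1.0, 0.0], [0.0, 1.0]]],
                             filters_b=[[[1.0, 1.0], [-1.0, 0.0]]],
                             kb_vec=[0.5])  # hypothetical KB feature
scores = fully_connected(features, [[1.0, -1.0, 2.0], [0.0, 1.0, 0.0]], [0.0, 0.5])
print(features, scores)  # [2.0, 1.0, 0.5] [2.0, 1.5]
```

Keeping the two embedding sources in separate channels lets each filter bank specialize to one embedding space, while the late concatenation mirrors the abstract's description of combining the knowledge base and CNN in the fully connected layer.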
