Named Entity Recognition of Traditional Chinese Medicine Patents Based on BiLSTM-CRF

With the growing popularity of traditional Chinese medicine (TCM) worldwide and the increasing awareness of intellectual property protection, the number of TCM patent applications is growing year by year. TCM patents contain rich medical, legal, and economic information. Effective text mining of TCM patents is of great theoretical and practical significance (e.g., for the R&D of new medicines, patent infringement litigation, and patent acquisition). Named entity recognition (NER) is a fundamental task in natural language processing and a crucial step before in-depth analysis of TCM patents. In this paper, a method combining a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field (BiLSTM-CRF) is proposed to automatically recognize entities of interest (i.e., herb names, disease names, symptoms, and therapeutic effects) in the abstract texts of TCM patents. By virtue of the capabilities of deep learning methods, the semantic information in the context can be learned without feature engineering. Experiments show that the BiLSTM-CRF-based method outperforms various baseline methods.
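The CRF layer of a BiLSTM-CRF tagger is usually decoded with the Viterbi algorithm. The sketch below illustrates only that decoding step and is not the authors' implementation: the per-token emission scores (which a BiLSTM would produce), the transition scores, and the tag names `O`, `B-HERB`, and `I-HERB` are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the score of `tag` at token t (assumed to come
    from a BiLSTM); transitions[(prev, cur)] is the CRF transition score,
    defaulting to 0.0 when a pair is missing.
    """
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at token t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token input.
TAGS = ["O", "B-HERB", "I-HERB"]
TRANSITIONS = {("O", "I-HERB"): -10.0, ("B-HERB", "I-HERB"): 1.0}
EMISSIONS = [
    {"O": 0.1, "B-HERB": 2.0, "I-HERB": 0.5},
    {"O": 0.2, "B-HERB": 0.3, "I-HERB": 1.5},
    {"O": 1.0, "B-HERB": 0.1, "I-HERB": 0.2},
]
print(viterbi(EMISSIONS, TRANSITIONS, TAGS))
```

The strongly negative `("O", "I-HERB")` score shows why a CRF layer is often added on top of a BiLSTM: it lets the decoder reject tag sequences that are individually plausible per token but invalid as a whole.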

A Nested Named Entity Recognition Method for Traditional Chinese Medicine Records

Advances in Artificial Intelligence and Security - Communications in Computer and Information Science ◽

10.1007/978-3-030-78615-1_43 ◽

2021 ◽

pp. 488-497

Author(s):

Haifeng Xu ◽

Honglan Liu ◽

Qi Jia ◽

Yuxiao Zhan ◽

Yan Zhang ◽

...

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Named Entity Recognition ◽

Entity Recognition ◽

Recognition Method ◽

Named Entity

Named Entity Recognition in Traditional Chinese Medicine Clinical Cases Combining BiLSTM-CRF with Knowledge Graph

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽

10.1007/978-3-030-29551-6_48 ◽

2019 ◽

pp. 537-548 ◽

Cited By ~ 2

Author(s):

Zhe Jin ◽

Yin Zhang ◽

Haodan Kuang ◽

Liang Yao ◽

Wenjin Zhang ◽

...

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Named Entity Recognition ◽

Entity Recognition ◽

Knowledge Graph ◽

Named Entity ◽

Clinical Cases

Research on Named Entity Recognition of Traditional Chinese Medicine Electronic Medical Records

Health Information Science - Lecture Notes in Computer Science ◽

10.1007/978-3-030-61951-0_6 ◽

2020 ◽

pp. 61-67

Author(s):

Feng Lin ◽

Dan Xie

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Electronic Medical Records ◽

Medical Records ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

COST-SENSITIVE STRUCTURED PERCEPTRON INCORPORATING CATEGORY HIERARCHY FOR NAMED ENTITY RECOGNITION

Journal of Information and Communication Technology ◽

10.32890/jict.14.2015.8153 ◽

2015 ◽

Author(s):

Shohei Higashiyama ◽

Blondel Mathieu ◽

Kazuhiro Seki ◽

Kuniaki Uehara

Keyword(s):

Cost Function ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Superior Performance ◽

Support Vector ◽

Named Entity ◽

Vector Machines ◽

Representative Work ◽

Correct Category

Named Entity Recognition (NER) is a fundamental natural language processing task for the identification and classification of expressions into predefined categories, such as person and organization. Existing NER systems usually target about 10 categories and do not incorporate analysis of category relations. However, categories often belong naturally to some predefined hierarchy. In such cases, the distance between categories in the hierarchy becomes a rich source of information that can be exploited. This is intuitively useful, particularly when the categories are numerous. On that account, this paper proposes an NER approach that can leverage category hierarchy information by introducing, in the structured perceptron framework, a cost function that more strongly penalizes category predictions that are more distant from the correct category in the hierarchy. Experimental results on the GENIA biomedical text corpus indicate the effectiveness of the proposed approach as compared with the case where no cost function is utilized. In addition, the proposed approach demonstrates superior performance over a representative work using multi-class support vector machines on the same corpus. A possible direction to further improve the proposed approach is to investigate more elaborate cost functions than the simple additive cost adopted in this work.
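One way to picture a hierarchy-aware cost is as the path length between two categories in the category tree, added to the model score during training. The sketch below is a minimal illustration under that assumption, since the abstract does not give the exact formulation. The hierarchy in `PARENT` is a hypothetical toy tree, not the GENIA ontology, and `cost_augmented_prediction` is an illustrative helper name.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); not the GENIA ontology.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "substance": "entity",
    "source": "entity",
    "protein": "substance",
    "dna": "substance",
    "cell_line": "source",
}


def ancestors(category: str) -> List[str]:
    """Path from `category` up to the root, both inclusive."""
    path = []
    node: Optional[str] = category
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between two categories."""
    up_b = {node: depth for depth, node in enumerate(ancestors(b))}
    for depth_a, node in enumerate(ancestors(a)):
        if node in up_b:  # lowest common ancestor
            return depth_a + up_b[node]
    raise ValueError("categories are not in the same tree")


def cost_augmented_prediction(scores: Dict[str, float], gold: str) -> str:
    """Pick the category maximizing model score plus additive cost.

    During perceptron training this selects the most strongly violating
    category, so distant mistakes trigger updates more readily.
    """
    return max(scores, key=lambda c: scores[c] + tree_distance(gold, c))


scores = {"protein": 2.0, "dna": 1.5, "cell_line": 0.5}
print(tree_distance("protein", "dna"))        # sibling categories
print(tree_distance("protein", "cell_line"))  # different subtrees
print(cost_augmented_prediction(scores, gold="protein"))
```

With these toy numbers the plain model score prefers `protein`, while the cost-augmented choice is the most distant category, which is the candidate a cost-sensitive update would push down hardest.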

Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Natural Language Processing and Chinese Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60457-8_39 ◽

2020 ◽

pp. 481-489

Author(s):

Bingyan Song ◽

Zhenshan Bao ◽

YueZhang Wang ◽

Wenbo Zhang ◽

Chao Sun

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

Improving Distantly-Supervised Named Entity Recognition for Traditional Chinese Medicine Text via a Novel Back-Labeling Approach

IEEE Access ◽

10.1109/access.2020.3015056 ◽

2020 ◽

Vol 8 ◽

pp. 145413-145421

Author(s):

Dezheng Zhang ◽

Chao Xia ◽

Cong Xu ◽

Qi Jia ◽

Shibing Yang ◽

...

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Labeling Approach

An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing

Data ◽

10.3390/data6070071 ◽

2021 ◽

Vol 6 (7) ◽

pp. 71

Author(s):

Gonçalo Carnaz ◽

Mário Antunes ◽

Vitor Beires Nogueira

Keyword(s):

Machine Learning ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Automatic Identification ◽

Named Entities ◽

Related Data ◽

Named Entity ◽

Chain Of Custody ◽

Evidence Collection

Criminal investigations collect and analyze the facts related to a crime, from which the investigators can deduce evidence to be used in court. Criminal investigation is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents is a helping hand to the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. There exists a wide set of dedicated tools, but they have a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained for the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
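The reported mean F1-score (0.733) is lower than the harmonic mean of the reported mean precision and recall (about 0.763). One plausible explanation, which is an assumption because the abstract does not say how the means were computed, is that the three metrics were each averaged over entity classes. The sketch below shows that effect with hypothetical per-class counts; the class names and numbers are not from the paper.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall, and F1."""
    per_class = [prf(*c) for c in counts.values()]
    n = len(per_class)
    return (
        sum(m[0] for m in per_class) / n,
        sum(m[1] for m in per_class) / n,
        sum(m[2] for m in per_class) / n,
    )


# Hypothetical counts for two entity classes.
counts = {"PERSON": (9, 1, 1), "PLACE": (2, 0, 8)}
mean_p, mean_r, mean_f1 = macro_average(counts)
f1_of_means = 2 * mean_p * mean_r / (mean_p + mean_r)
print(round(mean_p, 3), round(mean_r, 3), round(mean_f1, 3), round(f1_of_means, 3))
```

Because F1 is a harmonic mean, a class with very unbalanced precision and recall pulls the averaged F1 down more than it pulls down the averaged precision and recall.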

Towards corpus and model: Hierarchical structured-attention-based features for Indonesian named entity recognition

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202286 ◽

2021 ◽

pp. 1-12

Author(s):

Yingwen Fu ◽

Nankai Lin ◽

Xiaotian Lin ◽

Shengyi Jiang

Keyword(s):

Language Processing ◽

State Of The Art ◽

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Neural Models ◽

Performance Models ◽

Named Entity ◽

High Resource ◽

Benchmark Datasets

Named entity recognition (NER) is fundamental to natural language processing (NLP). Most state-of-the-art research on NER is based on pre-trained language models (PLMs) or classic neural models. However, this research is mainly oriented to high-resource languages such as English. For Indonesian, related resources (both datasets and technology) are not yet well developed. Besides, affixes are an important part of word composition in Indonesian, which makes character and token features essential for token-wise Indonesian NLP tasks. However, the features extracted by current top-performing models are insufficient. In this paper, targeting the Indonesian NER task, we build an Indonesian NER dataset (IDNER) comprising over 50 thousand sentences (over 670 thousand tokens) to alleviate the shortage of labeled resources in Indonesian. Furthermore, we construct a hierarchical structured-attention-based model (HSA) for Indonesian NER to extract sequence features from different perspectives. Specifically, we use an enhanced convolutional structure as well as an enhanced attention structure to extract deeper features from characters and tokens. Experimental results show that HSA achieves competitive performance on IDNER and three benchmark datasets.

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00008 ◽

2019 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Tomasz Oliwa ◽

Steven B. Maron ◽

Leah M. Chase ◽

Samantha Lomnicki ◽

Daniel V.T. Catenacci ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Classification Model ◽

Supervised Machine Learning ◽

Named Entity ◽

Pathology Reports

PURPOSE: Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS: Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS: We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION: Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.

An Attention-Based Model Using Character Composition of Entities in Chinese Relation Extraction

Information ◽

10.3390/info11020079 ◽

2020 ◽

Vol 11 (2) ◽

pp. 79 ◽

Cited By ~ 2

Author(s):

Xiaoyu Han ◽

Yue Zhang ◽

Wenkai Zhang ◽

Tinglei Huang

Keyword(s):

Language Processing ◽

Large Scale ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Additional Information ◽

Named Entity ◽

Proposed Model ◽

The Relationship ◽

Crucial Part

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful in relation extraction. Additional information such as the entity type obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base both have their limitations. Nevertheless, there is another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially on large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, first, we generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. By using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.
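The idea of letting entity characters steer attention over a sentence can be sketched with plain dot-product attention. This is an illustrative simplification and not the architecture from the paper: it assumes the query is the mean of the entity's character vectors, and the two-dimensional vectors below are hypothetical stand-ins for learned embeddings.

```python
import math
from typing import List

Vector = List[float]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def attention_weights(query: Vector, keys: List[Vector]) -> List[float]:
    """Dot-product attention: one weight per key, summing to 1."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Hypothetical embeddings: two entity characters and three sentence tokens.
entity_chars = [[1.0, 0.0], [1.0, 0.0]]
sentence = [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]

query = mean_vector(entity_chars)
weights = attention_weights(query, sentence)
print([round(w, 3) for w in weights])
```

The token whose vector aligns best with the entity-character query receives the largest weight, which is the sense in which attention can highlight the part of the sentence that expresses the relation.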
