Machine learning based biomedical named entity recognition

Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and have shown good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER: clustering-based representation, distributional representation, and word embeddings. We selected one algorithm for each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, compared with the systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks.
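
The abstract describes adding three kinds of word representation features on top of a baseline feature set. A minimal sketch of how such features might be combined for a single token follows, assuming a CRF-style learner that takes one dict of features per token. The lookup tables BROWN_PATHS, DIST_VECTORS, and EMBEDDINGS, the prefix lengths, and the baseline features are hypothetical toy choices, not the paper's settings; the abstract does not name the algorithm chosen for each WR type.

```python
from typing import Dict, List

# Hypothetical toy lookup tables; a real system would induce these from unlabeled text.
BROWN_PATHS: Dict[str, str] = {"IL-2": "0110101", "gene": "1101"}  # cluster bit-strings
DIST_VECTORS: Dict[str, List[float]] = {"IL-2": [0.1, -0.3]}       # distributional vectors
EMBEDDINGS: Dict[str, List[float]] = {"IL-2": [0.5, 0.2, -0.1]}    # word embeddings

PREFIX_LENGTHS = (4, 6, 10, 20)  # assumed granularities for cluster-path prefixes


def baseline_features(token: str) -> Dict[str, object]:
    """Simple lexical and orthographic features (illustrative, not the paper's set)."""
    return {
        "word.lower": token.lower(),
        "is_upper": token.isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "has_hyphen": "-" in token,
        "suffix3": token[-3:],
    }


def wr_features(token: str) -> Dict[str, object]:
    """Add one feature group per WR type; unknown tokens simply get no WR features."""
    feats: Dict[str, object] = {}
    path = BROWN_PATHS.get(token)
    if path is not None:
        for n in PREFIX_LENGTHS:
            # Prefixes expose the cluster hierarchy at coarse-to-fine levels.
            feats[f"brown.p{n}"] = path[:n]
    for i, v in enumerate(DIST_VECTORS.get(token, [])):
        feats[f"dist.{i}"] = v
    for i, v in enumerate(EMBEDDINGS.get(token, [])):
        feats[f"emb.{i}"] = v
    return feats


def token_features(token: str) -> Dict[str, object]:
    """Baseline features combined with all three WR feature groups."""
    feats = baseline_features(token)
    feats.update(wr_features(token))
    return feats


print(token_features("IL-2"))
```

Keeping each WR type in its own feature namespace makes it easy to ablate one type at a time, which is how the complementarity of the three types would be measured.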

Biomedical Named Entity Recognition: A Survey of Machine-Learning Tools

Theory and Applications for Advanced Text Mining ◽

10.5772/51066 ◽

2012 ◽

Cited By ~ 18

Author(s):

David Campos ◽

Sergio Matos ◽

Jose Luis

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Tools ◽

Named Entity ◽

Biomedical Named Entity Recognition

A CRF Based Machine Learning Approach for Biomedical Named Entity Recognition

2017 Second International Conference on Recent Trends and Challenges in Computational Models (ICRTCCM) ◽

10.1109/icrtccm.2017.23 ◽

2017 ◽

Cited By ~ 1

Author(s):

U. Kanimozhi ◽

D. Manjula

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Named Entity ◽

Machine Learning Approach ◽

Biomedical Named Entity Recognition

A comparative study of biomedical named entity recognition methods based machine learning approach

2014 Third IEEE International Colloquium in Information Science and Technology (CIST) ◽

10.1109/cist.2014.7016641 ◽

2014 ◽

Cited By ~ 3

Author(s):

Mohammed Rais ◽

Abdelmonaime Lachkar ◽

Abdelhamid Lachkar ◽

Said El Alaoui Ouatik

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Named Entity ◽

Machine Learning Approach ◽

Biomedical Named Entity Recognition

CBLNER: A Multi-models Biomedical Named Entity Recognition System Based on Machine Learning

Intelligent Computing Theories and Application - Lecture Notes in Computer Science ◽

10.1007/978-3-030-26969-2_5 ◽

2019 ◽

pp. 51-60

Author(s):

Gong Lejun ◽

Liu Xiaolin ◽

Yang Xuemin ◽

Zhang Lipeng ◽

Jia Yao ◽

...

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Named Entity ◽

Biomedical Named Entity Recognition

Biomedical Named Entity Recognition Using the SVM Methodologies and bio Tagging Schemes

Revista de Chimie ◽

10.37358/rc.21.4.8456 ◽

2020 ◽

Vol 72 (4) ◽

pp. 52-64

Author(s):

Thiyagu Meenachisundaram ◽

Manjula Dhanabalachandran

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Machine Learning Algorithms ◽

Entity Recognition ◽

Local Context ◽

Named Entity ◽

Conventional Machine ◽

Context Features ◽

Linear Text ◽

Biomedical Named Entity Recognition

Biomedical Named Entity Recognition (BNER) is the identification of entities such as drugs, genes, and chemicals in biomedical text, which supports information extraction from the domain literature. It enables the extraction of information such as drug profiles, similar or related drugs, and associations between drugs and their targets. Although many machine learning methods have been applied to this task, there is still room for improvement, particularly for biologically related chemical entities, which vary widely in structure and properties. This new approach combines two state-of-the-art algorithms and aims to improve performance by applying it to varied sets of features, including linguistic, orthographic, morphological, domain, and local context features. It uses the sequence-tagging capability of CRF to identify entity boundaries and the classification efficiency of SVM to detect entity subtypes. The method is tested on two datasets containing different types of entities: 1) the GENIA corpus and 2) the CHEMDNER corpus. The results show that the proposed hybrid method improves BNER performance compared with conventional machine learning algorithms. The paper also presents a detailed study of SVM and its methodologies; linear and nonlinear text classification are covered in Section 3, and the final section describes the results and the evaluation of the proposed method.
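
The hybrid design described above, a CRF that marks entity boundaries with BIO tags followed by an SVM that assigns each detected entity a subtype, can be sketched as two decoupled stages. In this minimal sketch, the boundary tags are assumed to be already predicted by the sequence tagger, and `classify_subtype` is a hypothetical stand-in for the trained SVM; the toy rule in the example is for illustration only and is not the paper's classifier.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Stage 1 output: decode boundary-only BIO tags ("B", "I", "O") into spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B") or (tag.startswith("I") and start is None):
            # A "B" always opens a new entity; a stray "I" after "O" is treated leniently as one.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def hybrid_tag(tokens: List[str], boundary_tags: List[str],
               classify_subtype: Callable[[List[str]], str]) -> List[Tuple[str, str]]:
    """Stage 2: assign a subtype to every span found by the boundary tagger."""
    results = []
    for start, end in bio_to_spans(boundary_tags):
        mention = tokens[start:end]
        results.append((" ".join(mention), classify_subtype(mention)))
    return results


def toy_subtype(mention: List[str]) -> str:
    # Hypothetical rule standing in for a trained SVM subtype classifier.
    return "GENE" if any(ch.isdigit() for tok in mention for ch in tok) else "CHEMICAL"


tokens = ["Aspirin", "inhibits", "COX", "-", "2", "."]
tags = ["B", "O", "B", "I", "I", "O"]
print(hybrid_tag(tokens, tags, toy_subtype))  # [('Aspirin', 'CHEMICAL'), ('COX - 2', 'GENE')]
```

Separating boundary detection from subtype assignment lets each stage use its own feature set, which is the motivation for pairing a sequence model with a per-entity classifier.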

An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing

Data ◽

10.3390/data6070071 ◽

2021 ◽

Vol 6 (7) ◽

pp. 71

Author(s):

Gonçalo Carnaz ◽

Mário Antunes ◽

Vitor Beires Nogueira

Keyword(s):

Machine Learning ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Automatic Identification ◽

Named Entities ◽

Related Data ◽

Named Entity ◽

Chain Of Custody ◽

Evidence Collection

Criminal investigations collect and analyze the facts related to a crime, from which the investigators can deduce evidence to be used in court. It is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. Computerized processing of these documents supports the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they cannot process criminal reports in Portuguese, because no annotated corpus exists for that purpose. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained with the classification of the annotated named-entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Examples of such tasks include sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
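
The reported mean F1 (0.733) is lower than the harmonic mean of the reported mean precision and recall (about 0.763), which is consistent with averaging per-entity-type scores (macro-averaging); the abstract does not state which averaging was used. A minimal sketch of macro-averaged precision, recall, and F1 follows; the entity types and counts in the example are hypothetical.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one entity type (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, ...]:
    """Average precision, recall and F1 over entity types, each type weighted equally."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def harmonic_mean(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical per-type counts, for illustration only.
p, r, f = macro_average({"PERSON": (8, 2, 2), "LOCATION": (1, 0, 3)})
print(p, r, f)                # about 0.9, 0.525, 0.6
print(harmonic_mean(p, r))    # about 0.663, not equal to the macro F1 of 0.6
```

Because each type's F1 is computed before averaging, a type with unbalanced precision and recall pulls the macro F1 below the F1 of the averaged precision and recall.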

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00008 ◽

2019 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Tomasz Oliwa ◽

Steven B. Maron ◽

Leah M. Chase ◽

Samantha Lomnicki ◽

Daniel V.T. Catenacci ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Classification Model ◽

Supervised Machine Learning ◽

Named Entity ◽

Pathology Reports

PURPOSE Robust institutional tumor banks depend on continuous sample curation; otherwise, subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.

RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. The class-specific named-entity recognition models showed strong precision, recall, and F1.

CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
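
To make the evaluation protocol concrete, here is a minimal sketch of k-fold cross-validation for a document classifier such as the internal-versus-external report step: every report is held out exactly once and classified by a model trained on the other folds. The `train` callback, the keyword rule, and the toy documents are hypothetical; the abstract does not describe the actual classifier or its features.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Predictor = Callable[[str], str]


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k non-overlapping folds of a shuffled order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size


def cross_val_accuracy(docs: Sequence[str], labels: Sequence[str], k: int,
                       train: Callable[[List[str], List[str]], Predictor]) -> float:
    """Pooled accuracy: each document is predicted once, by a model that never saw it."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(docs), k):
        predict = train([docs[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


def keyword_rule(docs: List[str], labels: List[str]) -> Predictor:
    # Hypothetical "trainer" that ignores its data and returns a fixed keyword rule.
    return lambda d: "external" if "consult" in d else "internal"


docs = ["outside consultation"] * 6 + ["surgical pathology"] * 4
labels = ["external"] * 6 + ["internal"] * 4
print(cross_val_accuracy(docs, labels, k=5, train=keyword_rule))  # 1.0
```

With 188 reports and k = 10, the folds contain 19 or 18 reports each. Stratifying folds by class would keep the internal/external ratio similar across folds; this sketch does not stratify.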
