A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets

A Weakly-Supervised Named Entity Recognition Machine Learning Approach for Emergency Medical Services Clinical Audit

Clinical performance audits are routinely performed in Emergency Medical Services (EMS) to ensure adherence to treatment protocols, to identify individual areas of weakness for remediation, and to discover systemic deficiencies that should guide the development of the training syllabus. At present, these audits are performed by manual chart review, which is time-consuming and laborious. In this paper, we report a weakly-supervised machine learning approach to train a named entity recognition (NER) model that can be used for automatic EMS clinical audits. The dataset used in this study contained 58,898 unlabeled ambulance incidents encountered by the Singapore Civil Defence Force from 1st April 2019 to 30th June 2019. With only 5% of the data labeled, we successfully trained three different models to perform the NER task, achieving F1 scores of around 0.981 under entity type matching evaluation and around 0.976 under strict evaluation. The BiLSTM-CRF model was one to two orders of magnitude lighter and faster than our BERT-based models. Our proposed proof-of-concept approach may improve the efficiency of clinical audits and can also help with EMS database research. Further external validation of this approach is needed.
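
The abstract reports F1 under two schemes without defining them. A common convention, assumed here and not confirmed by the source, is that strict evaluation requires identical boundaries and entity type, while entity type matching only requires overlapping boundaries with the same type. The sketch below illustrates that reading; the span format, the helper names, and the entity labels are all hypothetical.

```python
from typing import Callable, List, Tuple

# A span is (start, end_exclusive, entity_type). This format is an assumption.
Span = Tuple[int, int, str]


def strict_match(a: Span, b: Span) -> bool:
    """Same boundaries and same entity type."""
    return a == b


def type_match(a: Span, b: Span) -> bool:
    """Same entity type and at least one overlapping token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def precision_recall_f1(
    gold: List[Span], pred: List[Span], is_match: Callable[[Span, Span], bool]
) -> Tuple[float, float, float]:
    """Precision counts matched predictions; recall counts matched gold spans."""
    matched_pred = sum(any(is_match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(is_match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations for one report (labels are illustrative only).
gold = [(0, 2, "DRUG"), (5, 6, "DOSE"), (8, 10, "PROCEDURE")]
pred = [(0, 2, "DRUG"), (5, 7, "DOSE"), (8, 10, "DRUG")]

strict_scores = precision_recall_f1(gold, pred, strict_match)
type_scores = precision_recall_f1(gold, pred, type_match)
print("strict:", strict_scores)
print("type  :", type_scores)
```

In this toy case the second prediction has the right type but a boundary that is one token too long, so it counts only under entity type matching. Under this reading the strict score can never exceed the type-matching score, which is consistent with the ordering of the two reported figures.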

An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing

Data ◽ 10.3390/data6070071 ◽ 2021 ◽ Vol 6 (7) ◽ pp. 71

Author(s): Gonçalo Carnaz ◽ Mário Antunes ◽ Vitor Beires Nogueira

Keyword(s): Machine Learning ◽ Language Processing ◽ Named Entity Recognition ◽ Entity Recognition ◽ Automatic Identification ◽ Named Entities ◽ Related Data ◽ Named Entity ◽ Chain Of Custody ◽ Evidence Collection

Criminal investigations collect and analyze the facts related to a crime, from which investigators can derive evidence to be used in court. Criminal investigation is a multidisciplinary and applied science that includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities mentioned in the investigation. The computerized processing of these documents supports criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they are unable to process criminal reports in the Portuguese language, because an annotated corpus for that purpose does not exist. This paper presents an annotated corpus composed of a collection of anonymized crime-related documents extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained for the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools that detect and correlate entities in the documents. Example tasks are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
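
The three reported means are not tied together by the F1 formula: the harmonic mean of 0.808 and 0.722 is about 0.763, not 0.733. One possible explanation, assumed here and not stated in the abstract, is that precision, recall, and F1 were each averaged over entity classes separately. The sketch below shows that a mean of per-class F1 scores can fall below the F1 of the mean precision and recall; the class names and per-class values are hypothetical and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical per-class (precision, recall) pairs, for illustration only.
per_class = {
    "PERSON": (0.95, 0.90),
    "LOCATION": (0.85, 0.80),
    "LICENSE_PLATE": (0.60, 0.40),
}

n = len(per_class)
mean_precision = sum(p for p, _ in per_class.values()) / n
mean_recall = sum(r for _, r in per_class.values()) / n

# Macro F1: average the per-class F1 scores.
macro_f1 = sum(f1_score(p, r) for p, r in per_class.values()) / n

# Alternative: F1 computed from the already-averaged precision and recall.
f1_of_means = f1_score(mean_precision, mean_recall)

print(f"mean precision = {mean_precision:.3f}, mean recall = {mean_recall:.3f}")
print(f"macro F1       = {macro_f1:.3f}")
print(f"F1 of means    = {f1_of_means:.3f}")

# The same formula applied to the means reported in the abstract.
reported_f1_of_means = f1_score(0.808, 0.722)
print(f"F1 of reported means = {reported_f1_of_means:.3f}")
```

Because the harmonic mean is concave, the macro F1 is never larger than the F1 of the averaged precision and recall, so a reported mean F1 below 0.763 is consistent with per-class averaging.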

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽ 10.1200/cci.19.00008 ◽ 2019 ◽ pp. 1-8 ◽ Cited By ~ 2

Author(s): Tomasz Oliwa ◽ Steven B. Maron ◽ Leah M. Chase ◽ Samantha Lomnicki ◽ Daniel V.T. Catenacci ◽ ...

Keyword(s): Machine Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Named Entity Recognition ◽ Entity Recognition ◽ Classification Model ◽ Supervised Machine Learning ◽ Named Entity ◽ Pathology Reports

PURPOSE: Robust institutional tumor banks depend on continuous sample curation; without it, subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies the pathology specimen elements needed to locate specimens for future use, in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS: Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline. The pipeline consists of a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.

RESULTS: We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy under 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance.

CONCLUSION: Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.

A Comparative Study of Dictionary-based and Machine Learning-based Named Entity Recognition in Pashto

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval ◽ 10.1145/3443279.3443307 ◽ 2020

Author(s): Rafiullah Momand ◽ Shakirullah Waseeb ◽ Ahmad Masood Latif Rai

Keyword(s): Machine Learning ◽ Comparative Study ◽ Named Entity Recognition ◽ Entity Recognition ◽ Named Entity

Recurrent Neural Network-Based Model for Named Entity Recognition with Improved Word Embeddings

IETE Journal of Research ◽ 10.1080/03772063.2021.2006805 ◽ 2021 ◽ pp. 1-7

Author(s): Archana Goyal ◽ Vishal Gupta ◽ Manish Kumar

Keyword(s): Neural Network ◽ Recurrent Neural Network ◽ Named Entity Recognition ◽ Entity Recognition ◽ Word Embeddings ◽ Named Entity

A CRF Based Machine Learning Approach for Biomedical Named Entity Recognition

A comparative study of biomedical named entity recognition methods based machine learning approach

A scalable machine-learning approach for semi-structured named entity recognition

Named Entity Recognition in Crime Using Machine Learning Approach

Named Entity Recognition Using Hybrid Machine Learning Approach

A Weakly-Supervised Named Entity Recognition Machine Learning Approach for Emergency Medical Services Clinical Audit
