A Project for a Historical Subject Rubricator-Thesaurus Based on the Inspert Combinatorial Search Query Builder, in the Context of the Prospects for Corpus Historiography

Language Processing

Corpus Linguistics

Entity Recognition

Historical Science

Historical Texts

Named Entity

The Subject

Analytical Services

The paper discusses the relevance and challenges of developing corpus historiography in light of the principles and methods of corpus linguistics. Drawing on examples of various domain-specific thesauri and search-and-analytics services, it presents a project for a subject heading thesaurus for text markup that takes into account the specifics of the subject matter of historical science. The structure of the rubricator and the functions of the corpus of historical texts are correlated with methods of automated natural language processing: named-entity recognition and frame-based NLP systems.
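To make the idea of thesaurus-driven markup concrete, here is a minimal Python sketch, assuming a rubricator stored as a plain mapping from subject headings to the term variants that signal them. The headings, terms, and the `mark_up` function are hypothetical illustrations; they are not taken from the paper and do not reproduce the Inspert query builder or its syntax.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rubricator fragment: subject heading -> term variants.
THESAURUS: Dict[str, List[str]] = {
    "Person": ["Peter the Great", "Catherine II"],
    "Event": ["Battle of Poltava", "Northern War"],
    "Place": ["Poltava", "St. Petersburg"],
}

def mark_up(text: str, thesaurus: Dict[str, List[str]]) -> List[Tuple[int, int, str, str]]:
    """Tag thesaurus terms in `text` as (start, end, heading, surface) spans.

    Candidates are scanned left to right; at equal start positions the longest
    term wins, and any candidate overlapping an accepted span is dropped, so
    "Battle of Poltava" suppresses the "Poltava" nested inside it.
    """
    candidates = []
    for heading, terms in thesaurus.items():
        for term in terms:
            # \b anchors the match at word boundaries; re.escape handles "." etc.
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
                candidates.append((m.start(), m.end(), heading, m.group()))
    candidates.sort(key=lambda c: (c[0], -(c[1] - c[0])))
    spans: List[Tuple[int, int, str, str]] = []
    last_end = 0
    for start, end, heading, surface in candidates:
        if start >= last_end:
            spans.append((start, end, heading, surface))
            last_end = end
    return spans

for span in mark_up("Peter the Great won the Battle of Poltava in 1709.", THESAURUS):
    print(span)
```

A dictionary lookup like this is only a baseline: it misses inflected forms and names absent from the thesaurus, which is where the named-entity recognition and frame-based methods mentioned in the abstract come in.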

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics

10.1200/cci.19.00008

2019

pp. 1-8

Cited By ~ 2

Author(s):

Tomasz Oliwa

Steven B. Maron

Leah M. Chase

Samantha Lomnicki

Daniel V.T. Catenacci

...

Keyword(s):

Machine Learning

Natural Language

Language Processing

Entity Recognition

Classification Model

Supervised Machine Learning

Named Entity

Pathology Reports

PURPOSE Robust institutional tumor banks depend on continuous sample curation; otherwise, subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation was to develop a natural language processing method that dynamically identifies the existing pathology specimen elements needed to locate specimens for future use, in a manner that other institutions can re-implement. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite pipeline based on natural language processing, consisting of a supervised machine learning classification step that separates notes into internal (primary review) and external (consultation) reports; a named-entity recognition step that obtains the label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could otherwise have been overlooked. Our classification model achieved 100% accuracy under 10-fold cross-validation. The class-specific named-entity recognition models showed strong precision, recall, and F1 scores. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable, automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
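The precision, recall, and F1 figures reported for the named-entity recognition step are standard entity-level metrics. A minimal sketch, assuming exact matching of (start, end, type) tuples; the offsets below are invented, and only the entity types (label, date, location) echo the abstract. This is not the authors' evaluation code.

```python
from typing import Set, Tuple

# An entity is (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]

def prf1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact matching."""
    tp = len(gold & predicted)   # correctly predicted entities
    fp = len(predicted - gold)   # spurious predictions
    fn = len(gold - predicted)   # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical offsets for one report: the LOCATION prediction is misplaced.
gold = {(0, 8, "LABEL"), (10, 20, "DATE"), (22, 30, "LOCATION")}
predicted = {(0, 8, "LABEL"), (10, 20, "DATE"), (31, 35, "LOCATION")}
print(prf1(gold, predicted))
```

Here one misplaced entity counts both as a false positive and as a false negative, so all three scores drop to two thirds.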

Probing Patient Messages Enhanced by Natural Language Processing: A Top-Down Message Corpus Analysis

Health Data Science

10.34133/2021/1504854

2021

Vol 2021

pp. 1-10

Author(s):

George Mastorakos

Aditya Khurana

Ming Huang

Sunyang Fu

Ahmad P. Tafti

...

Keyword(s):

Natural Language

Language Processing

Corpus Analysis

Entity Recognition

Message Content

Named Entity

Medical Concepts

Insight Into

Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify the content of these messages and automate their triage has great potential to enhance clinical efficiency. We characterize the contents of a corpus of patient-generated portal messages using NLP methods, aiming to demonstrate descriptive analyses of patient text that can contribute to the development of future sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as Active Symptom, Logistical, Prescription, or Update, we used NER (named entity recognition) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and the keywords within them. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The “Findings” medical concept had the largest number of keywords across all groupings of content types and departments. “Anatomical Sites” and “Disorders” keywords were more prevalent in Active Symptom messages, while “Drugs” keywords were most prevalent in Prescription messages. Logistical messages tended to have lower proportions of “Anatomical Sites,” “Disorders,” “Drugs,” and “Findings” keywords than other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. Insight into the content of, and differences among, message themes can inform the development of more robust NLP models.
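The hierarchical distribution analysis amounts to counting concept-type keywords per message type and normalizing the counts. A sketch, assuming the NER output has already been reduced to (message_type, concept_type) pairs; the sample pairs are made up, and this is neither the authors' code nor UMLS tooling.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

def concept_proportions(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Share of each concept type among the keywords of each message type.

    `pairs` holds one (message_type, concept_type) entry per extracted keyword.
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for message_type, concept_type in pairs:
        counts[message_type][concept_type] += 1
    proportions: Dict[str, Dict[str, float]] = {}
    for message_type, counter in counts.items():
        total = sum(counter.values())
        proportions[message_type] = {c: n / total for c, n in counter.items()}
    return proportions

# Invented sample; real pairs would come from an NER pass over the messages.
sample = [
    ("Prescription", "Drugs"),
    ("Prescription", "Drugs"),
    ("Prescription", "Findings"),
    ("Active Symptom", "Anatomical Sites"),
    ("Active Symptom", "Findings"),
]
print(concept_proportions(sample))
```

Adding a department level would mean keying the counter on (department, message_type) instead of message_type alone.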

Advances in Computer and Electrical Engineering - Handbook of Research on Engineering Innovations and Technology Management in Organizations

Advances in Computational Linguistics and Text Processing Frameworks

10.4018/978-1-7998-2772-6.ch012

2020

pp. 217-244

Author(s):

Ayush Srivastav

Hera Khan

Amit Kumar Mishra

Keyword(s):

Neural Networks

Natural Language

Computational Linguistics

Language Processing

Text Processing

Entity Recognition

Named Entity

Part Of Speech

The chapter provides a comprehensive account of the major methodologies and advances in the field of Natural Language Processing. The most popular models used over time for Natural Language Processing are discussed, along with their applications to specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It then provides insight into text preprocessing and its methods, such as stemming, lemmatization, and stop-word removal, followed by part-of-speech tagging and named entity recognition. Next, the chapter elaborates on the concept of word embedding, its various types, and common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing follows, and then neural networks and their advanced forms, such as Recursive Neural Networks and Seq2seq models, that are used in computational linguistics. A brief description of chatbots and Memory Networks concludes the chapter.
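As a toy illustration of the preprocessing steps the chapter lists (regex tokenization, stop-word removal, stemming): the stop-word list and suffix rules below are illustrative assumptions, and `naive_stem` is a crude suffix stripper, not the Porter stemmer or a lemmatizer.

```python
import re
from typing import List

# Tiny illustrative stop-word list; real pipelines use much larger lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}
# Naive suffix rules for demonstration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text: str) -> List[str]:
    # Lowercase, then take runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def naive_stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters of stem to avoid over-stripping.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> List[str]:
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("Stemming and lemmatization are common preprocessing steps"))
```

The output `['stemm', 'lemmatization', 'common', 'preprocess', 'step']` shows why production pipelines rely on tested stemmers or dictionary-based lemmatization rather than ad hoc suffix rules.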

Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements

Journal of the American Medical Informatics Association

10.1136/amiajnl-2013-001837

2014

Vol 21 (3)

pp. 406-413

Cited By ~ 20

Author(s):

Todd Lingren

Louise Deleger

Katalin Molnar

Haijun Zhai

Jareen Meinzen-Derr

...

Keyword(s):

Clinical Trial

Language Processing

Gold Standard

Entity Recognition

Potential Bias

Named Entity

The Impact

Standard Development

DeNERT-KG: Named Entity and Relation Extraction Model Using DQN, Knowledge Graph, and BERT

Applied Sciences

10.3390/app10186429

2020

Vol 10 (18)

pp. 6429

Author(s):

SungMin Yang

SoYeop Yoo

OkRan Jeong

Keyword(s):

Natural Language

Language Processing

Language Model

Artificial Intelligence Technology

Relation Extraction

Entity Recognition

Knowledge Graph

Named Entity

Alongside research on artificial intelligence technology, natural language processing, the field concerned with understanding and processing human (natural) language, is also being actively studied. For computers to learn on their own, the ability to understand natural language is very important. The field of natural language processing involves a wide variety of tasks; we focus on named entity recognition and relation extraction, which are considered the most important for understanding sentences. We propose DeNERT-KG, a model that extracts subjects, objects, and relationships to capture the meaning inherent in a sentence. The named entity recognition (NER) model for extracting subjects and objects is built on the BERT language model and a Deep Q-Network, and a knowledge graph is applied for relation extraction. The DeNERT-KG model can extract the subject, type of subject, object, type of object, and relationship from a sentence, and the model is verified through experiments.

Automated Construction Specification Review with Named Entity Recognition Using Natural Language Processing

Journal of Construction Engineering and Management

10.1061/(asce)co.1943-7862.0001953

2021

Vol 147 (1)

pp. 04020147

Author(s):

Seonghyeon Moon

Gitaek Lee

Seokho Chi

Hyunchul Oh

Keyword(s):

Natural Language

Language Processing

Entity Recognition

Named Entity

Construction Specification

Named Entity Recognition in Natural Language Processing: A Systematic Review

10.1007/978-981-16-3346-1_66

2021

pp. 817-828

Author(s):

Abhishek Sharma

Amrita

Sudeshna Chakraborty

Shivam Kumar

Keyword(s):

Systematic Review

Natural Language

Language Processing

Entity Recognition

Named Entity

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective

Large Language Models for Latvian Named Entity Recognition

10.3233/faia200603

2020

Author(s):

Rinalds Vīksna

Inguna Skadiņa

Keyword(s):

Natural Language

Language Processing

Entity Recognition

Language Models

Named Entity

Language Data

Transformer-based language models pre-trained on large corpora have demonstrated good results on multiple natural language processing tasks, including named entity recognition (NER), for widely used languages. In this paper, we investigate the role of BERT models in the NER task for Latvian. We introduce a BERT model pre-trained on Latvian language data. We demonstrate that the Latvian BERT model, pre-trained on large Latvian corpora, achieves better results than multilingual BERT (M-BERT) (an average F1-measure of 81.91 vs. 78.37 on a dataset with nine named entity types, and 79.72 vs. 78.83 on another dataset with seven types) and outperforms previously developed Latvian NER systems.
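BERT-based NER models like those described here are typically trained as token classifiers, and a common output format is BIO tags that must be decoded into entity spans. A minimal decoder sketch; the abstract does not state the tagging scheme of the datasets, and the Latvian sentence and tag names below are hypothetical.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags to (start_token, end_token_exclusive, type) spans.

    An I- tag that does not continue an entity of the same type is treated
    as the start of a new entity (a common lenient convention).
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    current: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current):
            if current is not None:
                spans.append((start, i, current))
            start, current = i, tag[2:]
        elif tag == "O":
            if current is not None:
                spans.append((start, i, current))
            start, current = None, None
        # Otherwise: an I- tag continuing the current entity; nothing to do.
    if current is not None:
        spans.append((start, len(tags), current))
    return spans

# "Krišjānis Barons lived in Riga"; tags are illustrative, not model output.
tokens = ["Krišjānis", "Barons", "dzīvoja", "Rīgā"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
for start, end, label in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

Span-level F1 scores like those quoted above are usually computed on decoded spans such as these rather than on individual token tags.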

Deep learning in clinical natural language processing: a methodical review

Journal of the American Medical Informatics Association

10.1093/jamia/ocz200

2019

Vol 27 (3)

pp. 457-470

Cited By ~ 25

Author(s):

Stephen Wu

Kirk Roberts

Surabhi Datta

Jingcheng Du

Zongcheng Ji

...

Keyword(s):

Neural Networks

Deep Learning

Natural Language

Computational Linguistics

Language Processing

Recurrent Neural Networks

Entity Recognition

Named Entity

Objective This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning the methods, scope, and context of current research. Materials and Methods We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. Results Publications on DL in clinical NLP more than doubled each year through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a “long tail” of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. Discussion Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (e.g., the preference for recurrent neural networks in sequence-labeling named entity recognition), while others were surprisingly nuanced (e.g., the scarcity of French-language clinical NLP with deep learning). Conclusion Deep learning has not yet fully penetrated clinical NLP but is growing rapidly. This review highlighted both the popular and unique trends in this active field.

Evaluation of a Concept Mapping Task Using Named Entity Recognition and Normalization in Unstructured Clinical Text

Journal of Healthcare Informatics Research

10.1007/s41666-020-00079-z

2020

Vol 4 (4)

pp. 395-410

Author(s):

Sapna Trivedi

Roger Gildersleeve

Sandra Franco

Andrew S. Kanter

Afzal Chaudhry

Keyword(s):

Pilot Study

Language Processing

Wide Spectrum

Entity Recognition

Free Text

Semantic Type

Test Set

Named Entity

In this pilot study, we explore the feasibility and accuracy of using a query in a commercial natural language processing engine for a named entity recognition and normalization task, extracting a wide spectrum of clinical concepts from free-text clinical letters. Editorial guidance developed by two independent clinicians was used to annotate sixty anonymized clinic letters to create the gold standard. Concepts were categorized by semantic type, and labels were applied to indicate contextual attributes such as negation. The natural language processing (NLP) engine was Linguamatics I2E version 5.3.1, equipped with an algorithm for contextualizing words and phrases and an ontology of terms from Intelligent Medical Objects to which those tokens were mapped. Performance of the engine was assessed on a training set of the documents using precision, recall, and the F1 score, with subset analysis for semantic type, accurate negation, exact versus partial conceptual matching, and discontinuous text. The engine underwent tuning, and the final performance was determined on a test set. On the test set, F1 scores were 0.81 (strict criteria) and 0.84 (relaxed criteria) when correct negation was not required, and 0.75 and 0.77, respectively, when it was. F1 scores were higher when concepts were derived from continuous text only. This pilot study showed that a commercially available NLP engine delivered good overall results for identifying a wide spectrum of structured clinical concepts. Such a system holds promise for extracting concepts from free text to populate problem lists or for data mining projects.
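The abstract reports F1 under strict and relaxed criteria. One common convention, assumed here and not necessarily the one used in the study, counts a strict match only for an identical span and semantic type, and a relaxed match for any overlap with the same type. A sketch with greedy one-to-one matching and invented character offsets:

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, semantic_type)

def exact(a: Span, b: Span) -> bool:
    # Strict: identical boundaries and identical semantic type.
    return a == b

def overlap(a: Span, b: Span) -> bool:
    # Relaxed: same semantic type and at least one shared character.
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]

def f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    match: Callable[[Span, Span], bool] = overlap if relaxed else exact
    used: Set[int] = set()  # indices of gold spans already matched (one-to-one)
    tp = 0
    for p in pred:
        for j, g in enumerate(gold):
            if j not in used and match(p, g):
                used.add(j)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical spans: the "Drug" prediction is truncated but overlaps the gold span.
gold = [(0, 12, "Disorder"), (20, 29, "Drug")]
pred = [(0, 12, "Disorder"), (20, 25, "Drug")]
print(f1(gold, pred), f1(gold, pred, relaxed=True))
```

The partially matched "Drug" span is a miss under strict scoring and a hit under relaxed scoring, which is why relaxed scores in such evaluations are typically higher.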