Named-Entity-Recognition-Based Automated System for Diagnosing Cybersecurity Situations in IoT Networks

The aim of this paper was to enhance the process of diagnosing and detecting possible vulnerabilities within an Internet of Things (IoT) system by using a named entity recognition (NER)-based solution. In both research and practice, security system management experts rely on a large variety of heterogeneous security data sources, which are usually available in the form of natural language. This is challenging as the process is very time consuming and it is difficult to stay up to date with the constant findings in the areas of security threats, vulnerabilities, attacks, countermeasures, and risks. The proposed system is conceived as a semantic indexing solution of existing vulnerabilities and serves as an information tool for security management experts. By integrating the proposed system, the users can easily discover the potential vulnerabilities of their IoT devices. The proposed solution integrates ontologies and NER techniques in order to obtain a high rate of automation with the scope of reaching a self-maintained and up-to-date system in terms of vulnerabilities and common exposures knowledge. To achieve this, a total of 312 CVEs (common vulnerabilities and exposures) specific to the IoT field were identified. CVEs are arguably one of the most important cybersecurity resources nowadays, containing information about the latest discovered vulnerabilities. This set is further used as data corpus for an NER model designed to identify the main entities and relations that are relevant to IoT security. The goal is to automatically monitor cybersecurity information relevant to IoT, and filter and present it in an organized and structured framework based on users’ needs. The taxonomies specific to IoT security are implemented via a domain ontology, which is later used to process natural language. Relevant tokens are marked as entities and the relations between them identified. The text analysis solution is connected to a gateway which scans the environment and identifies the main IoT devices and communication technologies. The strength of the approach proposed within this research is that the designed semantic gateway is using context-aware searches in the modeled IoT security database and can identify possible vulnerabilities before they can be exploited.

Download Full-text

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00008 ◽

2019 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Tomasz Oliwa ◽

Steven B. Maron ◽

Leah M. Chase ◽

Samantha Lomnicki ◽

Daniel V.T. Catenacci ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Classification Model ◽

Supervised Machine Learning ◽

Named Entity ◽

Pathology Reports

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.

Download Full-text

Probing Patient Messages Enhanced by Natural Language Processing: A Top-Down Message Corpus Analysis

Health Data Science ◽

10.34133/2021/1504854 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

George Mastorakos ◽

Aditya Khurana ◽

Ming Huang ◽

Sunyang Fu ◽

Ahmad P. Tafti ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Corpus Analysis ◽

Entity Recognition ◽

Message Content ◽

Named Entity ◽

Medical Concepts ◽

Insight Into

Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of portal messages generated by patients using NLP methods. We aim to demonstrate descriptive analyses of patient text that can contribute to the development of future sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as either Active Symptom, Logistical, Prescription, or Update, we used NER (named entity recognition) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and keywords therewithin. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The “Findings” medical concept had the largest number of keywords across all groupings of content types and departments. “Anatomical Sites” and “Disorders” keywords were more prevalent in Active Symptom messages, while “Drugs” keywords were most prevalent in Prescription messages. Logistical messages tended to have the lower proportions of “Anatomical Sites,”, “Disorders,”, “Drugs,”, and “Findings” keywords when compared to other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. The insight into the content and differences among message themes can inform the development of more robust NLP models.

Download Full-text

Advances in Computational Linguistics and Text Processing Frameworks

Advances in Computer and Electrical Engineering - Handbook of Research on Engineering Innovations and Technology Management in Organizations ◽

10.4018/978-1-7998-2772-6.ch012 ◽

2020 ◽

pp. 217-244

Author(s):

Ayush Srivastav ◽

Hera Khan ◽

Amit Kumar Mishra

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Text Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. The most popular models that have been used over time for the task of Natural Language Processing have been discussed along with their applications in their specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It provides an insight to text preprocessing and its methodologies such as Stemming and Lemmatization, Stop Word Removal, followed by Part-of-Speech tagging and Named Entity Recognition. Further, this chapter elaborates the concept of Word Embedding, its various types, and some common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing is provided next, followed by Neural Networks and its advanced forms such as Recursive Neural Networks and Seq2seq models that are used in Computational Linguistics. A brief description of chatbots and Memory Networks concludes the chapter.

Download Full-text

Named Entity Recognition in Natural Language Texts obtained through Audio Interfaces

Proceedings of the IV International research conference "Information technologies in Science, Management, Social sphere and Medicine"(ITSMSSM 2017) ◽

10.2991/itsmssm-17.2017.68 ◽

2017 ◽

Author(s):

Nail Gafurov ◽

Alexey Platonov

Keyword(s):

Natural Language ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Audio Interfaces

Download Full-text

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

Genomics & Informatics ◽

10.5808/gi.21015 ◽

2021 ◽

Vol 19 (3) ◽

pp. e27

Author(s):

Pierre Larmande ◽

Yusha Liu ◽

Xinzhi Yao ◽

Jingbo Xia

Keyword(s):

Natural Language ◽

Language Processing ◽

Rapid Evolution ◽

Named Entity Recognition ◽

Entity Recognition ◽

Biological Databases ◽

Named Entity ◽

Biological Domain ◽

Or Gene ◽

Protein Dataset

Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pre-trained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.

Download Full-text

DeNERT-KG: Named Entity and Relation Extraction Model Using DQN, Knowledge Graph, and BERT

Applied Sciences ◽

10.3390/app10186429 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6429

Author(s):

SungMin Yang ◽

SoYeop Yoo ◽

OkRan Jeong

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Language Model ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Knowledge Graph ◽

Named Entity ◽

Artificial Intelligence Technology

Along with studies on artificial intelligence technology, research is also being carried out actively in the field of natural language processing to understand and process people’s language, in other words, natural language. For computers to learn on their own, the skill of understanding natural language is very important. There are a wide variety of tasks involved in the field of natural language processing, but we would like to focus on the named entity registration and relation extraction task, which is considered to be the most important in understanding sentences. We propose DeNERT-KG, a model that can extract subject, object, and relationships, to grasp the meaning inherent in a sentence. Based on the BERT language model and Deep Q-Network, the named entity recognition (NER) model for extracting subject and object is established, and a knowledge graph is applied for relation extraction. Using the DeNERT-KG model, it is possible to extract the subject, type of subject, object, type of object, and relationship from a sentence, and verify this model through experiments.

Download Full-text