A Novel Metric to Quantify the Effect of Pathway Enrichment Evaluation With Respect to Biomedical Text-Mined Terms: Development and Feasibility Study (Preprint)

2021
Author(s):
Xuan Qin
Xinzhi Yao
Jingbo Xia

BACKGROUND Natural language processing has long been applied in various applications for biomedical knowledge inference and discovery. Enrichment analysis based on named entity recognition is a classic application for inferring enriched associations in terms of specific biomedical entities such as genes, chemicals, and mutations. OBJECTIVE The aim of this study was to investigate the effect of pathway enrichment evaluation with respect to biomedical text-mining results and to develop a novel metric to quantify this effect. METHODS Four biomedical text mining methods were selected to represent natural language processing approaches to drug-related gene mining. A pathway enrichment experiment was then performed using the mined genes, and a series of inverse pathway frequency (IPF) metrics was proposed to evaluate the effect of pathway enrichment. Thereafter, 7 IPF metrics and the traditional P value metric were compared in simulation experiments to test the robustness of the proposed metrics. RESULTS The IPF metrics were evaluated in a case study of a rapamycin-related gene set. By applying the best IPF metric in a pathway enrichment simulation test, the discovery of rapamycin's efficacy against breast cancer was replicated using only data from before the year 2000. Our findings show the effectiveness of the best IPF metric in supporting knowledge discovery of new drug uses. Further, the mechanism underlying the drug-disease association was visualized with Cytoscape. CONCLUSIONS The results of this study suggest the effectiveness of the proposed IPF metrics in pathway enrichment evaluation, as well as their application to the discovery of new drug uses.
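To illustrate the idea behind such a metric, here is a minimal Python sketch of an IDF-style "inverse pathway frequency" weight applied to enrichment scoring. The formulation, the `inverse_pathway_frequency` and `score_enrichment` helpers, and the toy gene sets are assumptions for exposition only; the paper's seven actual IPF variants are not reproduced here.

```python
import math

def inverse_pathway_frequency(pathway_gene_sets, total_genes):
    """IDF-style weight per pathway: large, generic pathways are
    down-weighted; small, specific pathways are up-weighted.
    An illustrative formulation, not the paper's exact metric."""
    return {pathway: math.log(total_genes / len(genes))
            for pathway, genes in pathway_gene_sets.items() if genes}

def score_enrichment(mined_genes, pathway_gene_sets, total_genes):
    """Weight each pathway's overlap with the text-mined gene set by its
    IPF, so hits on rare pathways count more than hits on ubiquitous ones."""
    ipf = inverse_pathway_frequency(pathway_gene_sets, total_genes)
    mined = set(mined_genes)
    return {pathway: len(mined & set(genes)) * ipf[pathway]
            for pathway, genes in pathway_gene_sets.items() if genes}

# Toy example: genes mined from rapamycin literature vs. two pathways.
pathways = {
    "mTOR signaling": ["MTOR", "RPTOR", "AKT1"],
    "Metabolic pathways": ["MTOR", "AKT1"] + [f"G{i}" for i in range(200)],
}
print(score_enrichment(["MTOR", "RPTOR"], pathways, total_genes=20000))
```

The intuition mirrors inverse document frequency in information retrieval: overlap with a huge, generic pathway counts for less than overlap with a small, specific one.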

2022
pp. 682-693
Author(s):  
Eslam Amer

In this article, a new approach is introduced that makes use of the valuable information that can be extracted from a patient's electronic healthcare records (EHRs). The approach employs natural language processing and biomedical text mining to handle patient data. It extracts relevant medical entities and builds relations between symptoms and other clinical signature modifiers; the extracted features are treated as evaluation features and used to decide whether an applicant qualifies for disability benefits. Evaluations showed that the proposed approach accurately extracts symptoms and other laboratory markers with high F-measures (93.5%-95.6%). Results also showed that the approach makes sound assessments when approving or rejecting an applicant's case for disability benefits.
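For context on the reported F-measures, the following is a minimal sketch of span-level extraction scoring, assuming predicted and gold entities are compared as (start, end, label) tuples; the `evaluate_extraction` helper and the toy spans are illustrative, not the author's evaluation code.

```python
def evaluate_extraction(predicted, gold):
    """Span-level precision, recall, and F1 for extracted entities.
    `predicted` and `gold` are sets of (start, end, label) tuples."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: two correct spans plus one spurious prediction.
gold = {(0, 7, "SYMPTOM"), (15, 25, "LAB")}
pred = {(0, 7, "SYMPTOM"), (15, 25, "LAB"), (30, 34, "SYMPTOM")}
print(evaluate_extraction(pred, gold))  # (0.666..., 1.0, 0.8)
```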



2021
pp. 1-13
Author(s):
Lamiae Benhayoun
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the gaps between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: AI training content was extracted from the schools' websites, and a job advertisement website was scraped; the resulting texts were analyzed with a text mining approach implemented in Python for Natural Language Processing. RESULTS: Occupations related to AI were categorized, and three classes of skills for the AI market were characterized: technical, soft, and interdisciplinary. The skills gaps concern certain professional certifications, the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: This analysis, based on Natural Language Processing algorithms, provides a better understanding of the components of AI capability at the individual and organizational levels and can help shape educational programs that respond to AI market requirements.
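A minimal sketch of how such a skills-gap comparison could look in Python, assuming TF-IDF term ranking over the two corpora; the `top_terms` helper and the toy documents are hypothetical, and the authors' actual code is not shown in the abstract.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_terms(docs, n=15):
    """Rank the terms of a corpus by mean TF-IDF weight."""
    vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vec.fit_transform(docs)
    means = np.asarray(X.mean(axis=0)).ravel()
    terms = np.array(vec.get_feature_names_out())
    return set(terms[means.argsort()[::-1][:n]])

# Toy stand-ins for scraped curricula and job advertisements.
curricula = ["machine learning, statistics, python, research methods"]
job_ads = ["python, tensorflow, docker, aws certification, gdpr compliance"]

# Terms prominent in job ads but absent from training content ~ skill gaps.
print(top_terms(job_ads) - top_terms(curricula))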


2019
pp. 1-8
Author(s):
Tomasz Oliwa
Steven B. Maron
Leah M. Chase
Samantha Lomnicki
Daniel V.T. Catenacci
...  

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
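As a sketch of the named-entity recognition step, the following Python uses regular expressions to pull specimen labels (accession numbers), sublabels (block identifiers), and dates from a note. The accession and block patterns are hypothetical placeholders, since real formats vary by institution; the paper's actual models and its classification and proofreading steps are not reproduced here.

```python
import re

# Hypothetical patterns; real accession formats vary by institution.
ACCESSION = re.compile(r"\b[A-Z]{1,3}\d{2}-\d{3,6}\b")    # e.g., S18-12345
BLOCK = re.compile(r"\bblock\s+([A-Z]\d{0,2})\b", re.I)   # e.g., "block A1"
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")         # e.g., 03/14/2018

def extract_specimen_entities(note):
    """Pull label (accession), sublabels (blocks), and dates from a note."""
    return {"labels": ACCESSION.findall(note),
            "blocks": BLOCK.findall(note),
            "dates": DATE.findall(note)}

note = "Received consult S18-12345, block A1, collected 03/14/2018."
print(extract_specimen_entities(note))
```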


2021
Vol 2021
pp. 1-10
Author(s):
George Mastorakos
Aditya Khurana
Ming Huang
Sunyang Fu
Ahmad P. Tafti
...  

Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of portal messages generated by patients using NLP methods, aiming to demonstrate descriptive analyses of patient text that can contribute to the development of future sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as Active Symptom, Logistical, Prescription, or Update, we used named entity recognition (NER) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and the keywords therein. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The "Findings" medical concept had the largest number of keywords across all groupings of content types and departments. "Anatomical Sites" and "Disorders" keywords were more prevalent in Active Symptom messages, while "Drugs" keywords were most prevalent in Prescription messages. Logistical messages tended to have lower proportions of "Anatomical Sites," "Disorders," "Drugs," and "Findings" keywords than other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. Insight into the content and differences among message themes can inform the development of more robust NLP models.
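A minimal sketch of the kind of per-message-type concept profiling described, assuming a tiny hand-made lexicon standing in for UMLS semantic groups; `CONCEPT_GROUPS` and `concept_profile` are illustrative names, not the study's pipeline.

```python
from collections import Counter, defaultdict

# Tiny illustrative lexicon standing in for UMLS semantic groups.
CONCEPT_GROUPS = {
    "rash": "Disorders", "chest": "Anatomical Sites",
    "lisinopril": "Drugs", "refill": "Drugs", "elevated": "Findings",
}

def concept_profile(messages):
    """Count concept-group keywords per labeled message type."""
    profile = defaultdict(Counter)
    for label, text in messages:
        for token in text.lower().split():
            group = CONCEPT_GROUPS.get(token.strip(".,"))
            if group:
                profile[label][group] += 1
    return profile

msgs = [("Prescription", "Need a refill of lisinopril."),
        ("Active Symptom", "New rash on my chest, elevated temperature.")]
for label, counts in concept_profile(msgs).items():
    print(label, dict(counts))
```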


2019
Author(s):
Auss Abbood
Alexander Ullrich
Rüdiger Busche
Stéphane Ghozzi

According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of epidemiologists sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, and to gain insights into what makes an article and the event it describes relevant, we developed a natural language processing framework for automated information extraction and relevance scoring. First, we scraped sources relevant for EBS as done at RKI (WHO Disease Outbreak News and ProMED) and automatically extracted each article's key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps: EpiTator, an open-source epidemiological annotation tool, suggested many candidates for each key datum, and a naive Bayes classifier trained with RKI's EBS database as labels selected the single most likely one. Then, for relevance scoring, we defined two classes to which any article might belong: an article is relevant if it is in the EBS database and irrelevant otherwise. We compared the performance of different classifiers using document and word embeddings. Two of the tested algorithms stood out: the multilayer perceptron performed best overall, with a precision of 0.19, recall of 0.50, specificity of 0.89, F1 of 0.28, and the highest tested index balanced accuracy of 0.46; the support-vector machine, on the other hand, had the highest recall (0.88), which can be of greater interest to epidemiologists. Finally, we integrated these functionalities into a web application called EventEpi, in which relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, which will be analyzed in the same way and added to the database. Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, already works well and can be used in production, promising improvements in EBS. The source code is publicly available at https://github.com/aauss/EventEpi.
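A minimal sketch of the relevance-scoring comparison, assuming TF-IDF features as a stand-in for the document and word embeddings used in the paper; the toy corpus and labels are fabricated for illustration, and the classifier settings are defaults rather than the tuned models reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Toy stand-in for the EBS corpus: article text + relevant(1)/irrelevant(0).
texts = ["cholera outbreak reported in region X", "annual health budget meeting",
         "confirmed measles cases rising", "new hospital wing opened"] * 25
labels = [1, 0, 1, 0] * 25

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

# Compare the two classifiers highlighted in the abstract.
for name, clf in [("MLP", MLPClassifier(max_iter=500, random_state=0)),
                  ("SVM", SVC(class_weight="balanced", random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te)))
```

In practice the relevant class is heavily outnumbered, which is why imbalance-aware measures such as index balanced accuracy, and recall-oriented model choices, matter for epidemiologists.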

