natural language processing algorithm
Recently Published Documents


TOTAL DOCUMENTS: 37 (24 in the last five years)

H-INDEX: 5 (2 in the last five years)

Author(s):  
Jeffrey P. Yaeger ◽  
Jiahao Lu ◽  
Jeremiah Jones ◽  
Ashkan Ertefaie ◽  
Kevin Fiscella ◽  
...  

2021 ◽  
Author(s):  
Jacob Johnson ◽  
Kaneel Senevirathne ◽  
Lawrence Ngo

Here, we developed and validated a highly generalizable natural language processing algorithm based on deep learning. The algorithm was trained and tested on a highly diverse dataset spanning over 2,000 hospital sites and 500 radiologists. It achieved an AUROC of 0.96 for the presence or absence of liver lesions, with a specificity of 0.99 and a sensitivity of 0.60.
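The reported metrics are straightforward to reproduce for any scored test set. A minimal pure-Python sketch on toy labels and scores (illustrative data, not the study's):

```python
def auroc(labels, scores):
    """AUROC = probability a random positive outscores a random negative
    (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity at a fixed decision threshold."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: four reports with model scores and true lesion labels.
labels = [1, 1, 0, 0]          # 1 = lesion present
scores = [0.9, 0.4, 0.3, 0.1]  # model probabilities
```

At a strict threshold the tradeoff the authors report appears naturally: specificity stays high while sensitivity drops, even when the ranking itself (the AUROC) is strong.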


2021 ◽  
Author(s):  
Christopher McMaster ◽  
Julia Chan ◽  
David FL Liew ◽  
Elizabeth Su ◽  
Albert G Frauman ◽  
...  

The detection of adverse drug reactions (ADRs) is critical to our understanding of the safety and risk-benefit profile of medications. With an incidence that has not changed over the last 30 years, ADRs are a significant source of patient morbidity, responsible for 5-10% of acute care hospital admissions worldwide. Spontaneous reporting has long been the standard method of capturing ADRs; however, this approach is known to have high rates of under-reporting, a problem that limits pharmacovigilance efforts. Automated ADR reporting presents an alternative pathway to increase reporting rates, although it may be confounded by over-reporting of other drug-related adverse events. We developed a deep learning natural language processing algorithm to identify ADRs in discharge summaries at a single academic hospital centre. Our model was developed in two stages: first, a pre-trained model (DeBERTa) was further pre-trained on 150,000 unlabelled discharge summaries; second, this model was fine-tuned to detect ADR mentions in a corpus of 861 annotated discharge summaries. To ensure that our algorithm could differentiate ADRs from other drug-related adverse events, the annotated corpus was enriched for both validated ADR reports and confounding drug-related adverse events. The final model demonstrated good performance, with a ROC-AUC of 0.934 (95% CI 0.931-0.955) for the task of identifying discharge summaries containing ADR mentions.
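The two-stage recipe (continue pre-training on unlabelled domain text, then fine-tune on the small annotated corpus) can be illustrated with a toy bag-of-words stand-in. This is a hypothetical sketch of the training pattern only, not the authors' DeBERTa pipeline:

```python
import math
from collections import Counter

def pretrain(unlabelled_texts):
    """Stage 1 stand-in: learn domain word statistics from unlabelled
    discharge summaries (the real model continues masked-LM training)."""
    counts = Counter(w for t in unlabelled_texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def finetune(domain_freqs, labelled):
    """Stage 2 stand-in: per-word ADR log-odds from the annotated corpus,
    smoothed toward the stage-1 domain statistics."""
    adr, other = Counter(), Counter()
    for text, has_adr in labelled:
        (adr if has_adr else other).update(text.lower().split())
    vocab = set(domain_freqs) | set(adr) | set(other)
    return {w: math.log((adr[w] + domain_freqs.get(w, 0) + 1e-6) /
                        (other[w] + domain_freqs.get(w, 0) + 1e-6))
            for w in vocab}

def score(weights, text):
    """Sum word weights; a positive total suggests an ADR mention."""
    return sum(weights.get(w, 0.0) for w in text.lower().split())
```

The point of the pattern is that stage 1 needs only unlabelled text, so the scarce annotated corpus is reserved for the final supervised step.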


2021 ◽  
Author(s):  
Tharunya Danabal ◽  
Neethi Sarah John ◽  
Abhijeet Pramod Ghawade ◽  
Pranjal Padharinath Ahire

Abstract: The focus of this work is on developing a cognitive tool that predicts the most frequent HSE hazards with the highest potential severity levels. The tool identifies these risks by applying a natural language processing algorithm to HSE leading- and lagging-indicator reports submitted to an oilfield services company's global HSE reporting system. The purpose of the tool is to prioritize proactive actions and provide focus to raise workforce awareness. A natural language processing algorithm was developed to identify priority HSE risks based on potential severity levels and frequency of occurrence. The algorithm uses vectorization, compression, and clustering methods to categorize the risks by potential severity and frequency using a formulated risk-index methodology. In the pilot study, a user interface was developed to configure the frequency and number of prioritized HSE risks communicated by the tool to employees who opted to receive the information in a given location. From this pilot study, using data reported in the company's online HSE reporting system, the algorithm successfully identified five priority HSE risks across different hazard categories based on the risk index. Drawing on a high volume of reporting data, the risk index factored in multiple coefficients, such as severity level, frequency, and cluster tightness, to prioritize the HSE risks.
The observations at each stage of the developed algorithm are as follows:
- In the data cleaning stage, all stop words (such as "a," "and," "the") were removed, followed by tokenization to divide the text of the HSE reports into tokens and remove punctuation.
- In the vectorization stage, vectors were formed using the Term Frequency-Inverse Document Frequency (TF-IDF) method.
- In the compression stage, an autoencoder removed noise from the input data.
- In the agglomerative clustering stage, HSE reports with similar words were grouped into clusters; the number of clusters generated per category was in the range of three to five.
The novelty of this approach is its ability to prioritize a location's HSE risks using an algorithm built on natural language processing techniques. This cognitive tool treats reported HSE information as data, identifying and flagging priority HSE risks based on the frequency of similar reports and their associated severity levels. The proof of concept has demonstrated the potential of the tool; the next stage would be to test its predictive capabilities for injury prevention.
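The cleaning and vectorization stages described above can be sketched in a few lines of pure Python. This is a minimal illustration with an abridged stop-word list and toy reports, not the tool's actual implementation:

```python
import math
from collections import Counter

# Abridged stop-word list for illustration; a real pipeline would use a
# fuller set (e.g. from NLTK or spaCy).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "was", "to"}

def clean(report):
    """Data-cleaning stage: lowercase, strip punctuation via tokenization,
    drop stop words."""
    tokens = "".join(c if c.isalnum() or c.isspace() else " "
                     for c in report.lower()).split()
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(reports):
    """Vectorization stage: sparse TF-IDF vector (dict) per report."""
    docs = [clean(r) for r in reports]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({w: (tf[w] / max(len(d), 1)) * math.log(n / df[w])
                        for w in tf})
    return vectors
```

The compression (autoencoder) and agglomerative clustering stages would then operate on these vectors; for the clustering step, an off-the-shelf option such as scikit-learn's AgglomerativeClustering could play the role described, though the abstract does not say which implementation was used.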

