Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gian Maria Zaccaria ◽  
Vito Colella ◽  
Simona Colucci ◽  
Felice Clemente ◽  
Fabio Pavone ◽  
...  

Abstract The unstructured nature of Real-World (RW) data from onco-hematological patients and the limited accessibility of integrated systems restrain the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology), to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented in REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. 326 (98.2%) reports were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) identification report number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (86.6% and 91.4% of A, 98.5% and 100.0% of P, 92.5% and 95.5% of F, and 87.2% and 91.4% of R for internal and external series, respectively), and (4) diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
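The field-capture step a tool like ARGO performs can be pictured as pattern matching over report text that feeds a flat eCRF record. A minimal Python sketch, assuming a hypothetical report layout and field patterns (the abstract does not disclose ARGO's actual extraction rules):

```python
import re

# Hypothetical field patterns: ARGO's real rules are not given in the
# abstract, so these regexes are illustrative only.
FIELD_PATTERNS = {
    "report_id": re.compile(r"Report\s*(?:No\.?|number)[:\s]*([A-Z0-9-]+)", re.I),
    "biopsy_date": re.compile(r"Biopsy\s*date[:\s]*(\d{2}/\d{2}/\d{4})", re.I),
    "specimen_type": re.compile(r"Specimen[:\s]*([^\n]+)", re.I),
    "diagnosis": re.compile(r"Diagnosis[:\s]*([^\n]+)", re.I),
}

def report_to_ecrf(text: str) -> dict:
    """Map one free-text report to a flat eCRF record; missing fields stay None."""
    record = {field: None for field in FIELD_PATTERNS}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[field] = match.group(1).strip()
    return record

sample = """Report No: HP-2021-0042
Biopsy date: 03/15/2021
Specimen: lymph node, excisional
Diagnosis: diffuse large B-cell lymphoma"""
print(report_to_ecrf(sample))
```

A record shaped like this could then be pushed to an eCRF system such as REDCap through its import API; fields the patterns fail to capture stay empty for manual review.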

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mateusz Szczepański ◽  
Marek Pawlicki ◽  
Rafał Kozik ◽  
Michał Choraś

Abstract The ubiquity of social media and their deep integration into contemporary society have granted new ways to interact, exchange information, form groups, or earn money, all on a scale never seen before. Those possibilities, paired with widespread popularity, contribute to the level of impact that social media have. Unfortunately, the benefits they bring come at a cost. Social media can be employed by various entities to spread disinformation, so-called 'Fake News', either to make a profit or to influence the behaviour of society. To reduce the impact and spread of Fake News, a diverse array of countermeasures has been devised. These include linguistic-based approaches, which often utilise Natural Language Processing (NLP) and Deep Learning (DL). However, as the latest advancements in the Artificial Intelligence (AI) domain show, a model's high performance is no longer enough. The explainability of the system's decision is equally crucial in real-life scenarios. Therefore, the objective of this paper is to present a novel explainability approach for BERT-based fake news detectors. The approach does not require extensive changes to the system and can be attached as an extension to operating detectors. For this purpose, two Explainable Artificial Intelligence (xAI) techniques, Local Interpretable Model-Agnostic Explanations (LIME) and Anchors, are used and evaluated on fake news data, i.e., short pieces of text forming tweets or headlines. The focus of this paper is on the explainability approach; the detectors themselves were presented in the authors' previous works.
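LIME's core mechanism, perturbing the input and fitting a weighted local linear surrogate, can be sketched without the lime library itself. Everything below is illustrative, not the paper's implementation: the cue-word scorer stands in for a BERT-based detector's predict-probability function, and the kernel width is an arbitrary choice.

```python
import numpy as np

# Toy stand-in classifier: a real deployment would wrap the BERT-based
# detector's probability output instead.
CUE_WORDS = {"shocking", "miracle", "secret"}

def fake_prob(text: str) -> float:
    words = text.lower().split()
    return min(1.0, sum(w in CUE_WORDS for w in words) / 2.0)

def lime_explain(text, predict, num_samples=500, seed=0):
    """LIME-style local explanation: mask random subsets of words, query
    the model on each perturbed text, and fit a weighted linear surrogate
    whose coefficients score each word's contribution."""
    rng = np.random.default_rng(seed)
    words = text.split()
    masks = rng.integers(0, 2, size=(num_samples, len(words)))
    masks[0] = 1  # keep the unperturbed instance in the sample
    preds = np.array([
        predict(" ".join(w for w, keep in zip(words, m) if keep))
        for m in masks
    ])
    # Exponential kernel: perturbations closer to the full sentence weigh more
    weights = np.exp(-(1 - masks.mean(axis=1)) ** 2 / 0.25)
    X = np.hstack([masks, np.ones((num_samples, 1))])  # add intercept column
    W = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(X * W, preds * W[:, 0], rcond=None)
    return sorted(zip(words, coef[:-1]), key=lambda p: -abs(p[1]))

headline = "Shocking miracle cure doctors hide"
for word, score in lime_explain(headline, fake_prob)[:3]:
    print(f"{word:10s} {score:+.3f}")
```

Anchors differs in that it searches for a minimal set of words whose presence alone keeps the prediction stable, yielding an if-then rule rather than per-word weights.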


Cureus ◽  
2017 ◽  
Author(s):  
Farhan Mohammad ◽  
Gwenalyn Garcia ◽  
Shiksha Kedia ◽  
Juan Ding ◽  
Matthew Hurford ◽  
...  

2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
N Cruz ◽  
M Serrano ◽  
A Lopez ◽  
I H Medrano ◽  
J Lozano ◽  
...  

Abstract Background Research efforts to develop strategies that effectively identify patients and reduce the burden of cardiovascular diseases are essential for the future of the health system. Most research studies have used only the coded parts of electronic health records (EHRs) for case detection, resulting in missed cases, reduced study quality and, in some cases, biased findings. Incorporating information from free text into case detection through Big Data and Artificial Intelligence techniques improves research quality. Savana has developed EHRead, a technology that applies Natural Language Processing, Machine Learning and Deep Learning to analyse and automatically extract highly valuable medical information from the unstructured free text contained in EHRs, to support research and practice. Purpose We aimed to validate the linguistic accuracy of Savana, in terms of Precision (P), Recall (R) and overall performance (F-score), in the cardiovascular domain, since this is one of the most prevalent disease areas in the general population. This means validating the extent to which the Savana system identifies mentions of atherosclerotic/cardiovascular clinical phenotypes in EHRs. Methods The project was conducted at 3 Spanish sites, and the system was validated using a corpus of 739 EHRs, including emergency, medical and discharge records written in free text. These EHRs were randomly selected from the clinical documents generated during 2012–2017 and were fully anonymized to comply with legal and ethical requirements. Two physicians per site reviewed randomly selected records and annotated all direct references to atherosclerotic/cardiovascular clinical phenotypes, following previously developed annotation guidelines. A third physician adjudicated discordant annotations. Savana's performance was automatically calculated against the gold standard created by the experts.
Results Savana achieved good performance in identifying mentions of atherosclerotic/cardiovascular clinical phenotypes, yielding an overall P, R, and F-score of 0.97, 0.92, and 0.94, respectively. We also found that, in reviewing all the EHRs and identifying mentions of atherosclerotic/cardiovascular clinical phenotypes, the experts spent ∼60 h while Savana took ∼36 min. Conclusion(s) Innovative techniques to identify atherosclerotic/cardiovascular clinical phenotypes could be used to support real-world data research and clinical practice. Overall, Savana showed high performance, comparable with that of an expert physician annotator performing the same task, and the automatic information extraction system achieved a significant reduction in time.
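The P, R, and F-score used above follow the standard definitions over gold-standard versus system-extracted mentions. A minimal sketch, with hypothetical phenotype mentions for a single record:

```python
def precision_recall_f1(gold: set, predicted: set):
    """Score extracted mentions against a gold standard.
    P = TP/(TP+FP), R = TP/(TP+FN), F = harmonic mean of P and R."""
    tp = len(gold & predicted)  # true positives: mentions found in both
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical mentions of cardiovascular phenotypes in one record
gold = {"atherosclerosis", "myocardial infarction", "angina"}
pred = {"atherosclerosis", "myocardial infarction", "stroke"}
print(precision_recall_f1(gold, pred))  # each metric is 2/3 here
```

In the validation described above, the gold set comes from the adjudicated physician annotations and the predicted set from the system's output, aggregated across the 739-record corpus.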


2013 ◽  
Vol 2 (1) ◽  
Author(s):  
Long Yang ◽  
Beth Harrison ◽  
Mary Ann Perle ◽  
Mandeep Singh ◽  
Zhiheng Pei ◽  
...  

2012 ◽  
Vol 138 (suppl 2) ◽  
pp. A225-A225
Author(s):  
Long Yang ◽  
Beth Harrison ◽  
Mary Ann Perle ◽  
Mandeep Singh ◽  
Zhiheng Pei ◽  
...  

2019 ◽  
Author(s):  
Hannah L. Weeks ◽  
Cole Beck ◽  
Elizabeth McNeer ◽  
Cosmin A. Bejan ◽  
Joshua C. Denny ◽  
...  

ABSTRACT Objective We developed medExtractR, a natural language processing system to extract medication dose and timing information from clinical notes. Our system facilitates creation of medication-specific research datasets from electronic health records. Materials and Methods Written in the R programming language, medExtractR combines lexicon dictionaries and regular expression patterns to identify relevant medication information ('drug entities'). The system is designed to extract particular medications of interest, rather than all possible medications mentioned in a clinical note. MedExtractR was developed on notes from Vanderbilt University's Synthetic Derivative, using two medications (tacrolimus and lamotrigine) prescribed with varying complexity, with a third drug (allopurinol) used to test generalizability of results. We evaluated medExtractR and compared it to three existing systems: MedEx, MedXN, and CLAMP. Results On 50 test notes for each development drug and 110 test notes for the additional drug, medExtractR achieved high overall performance (F-measures > 0.95). This exceeded the performance of the three existing systems across all drugs, with the exception of a few specific entity-level evaluations, including dose amount for lamotrigine and allopurinol. Discussion MedExtractR successfully extracted medication entities for medications of interest. High performance in entity-level extraction tasks provides a strong foundation for developing robust research datasets for pharmacological research. However, its targeted approach gives it a narrower scope than existing systems. Conclusion MedExtractR (available as an R package) achieved high performance in extracting specific medications from clinical text, leading to higher-quality research datasets for drug-related studies than some existing general-purpose medication extraction tools.


2020 ◽  
Vol 27 (3) ◽  
pp. 407-418 ◽  
Author(s):  
Hannah L Weeks ◽  
Cole Beck ◽  
Elizabeth McNeer ◽  
Michael L Williams ◽  
Cosmin A Bejan ◽  
...  

Abstract Objective We developed medExtractR, a natural language processing system to extract medication information from clinical notes. Using a targeted approach, medExtractR focuses on individual drugs to facilitate creation of medication-specific research datasets from electronic health records. Materials and Methods Written using the R programming language, medExtractR combines lexicon dictionaries and regular expressions to identify relevant medication entities (eg, drug name, strength, frequency). MedExtractR was developed on notes from Vanderbilt University Medical Center, using medications prescribed with varying complexity. We evaluated medExtractR and compared it with 3 existing systems: MedEx, MedXN, and CLAMP (Clinical Language Annotation, Modeling, and Processing). We also demonstrated how medExtractR can be easily tuned for better performance on an outside dataset using the MIMIC-III (Medical Information Mart for Intensive Care III) database. Results On 50 test notes per development drug and 110 test notes for an additional drug, medExtractR achieved high overall performance (F-measures >0.95), exceeding performance of the 3 existing systems across all drugs. MedExtractR achieved the highest F-measure for each individual entity, except drug name and dose amount for allopurinol. With tuning and customization, medExtractR achieved F-measures >0.90 in the MIMIC-III dataset. Discussion The medExtractR system successfully extracted entities for medications of interest. High performance in entity-level extraction provides a strong foundation for developing robust research datasets for pharmacological research. When working with new datasets, medExtractR should be tuned on a small sample of notes before being broadly applied. 
Conclusions The medExtractR system achieved high performance extracting specific medications from clinical text, leading to higher-quality research datasets for drug-related studies than some existing general-purpose medication extraction tools.
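The targeted lexicon-plus-regex strategy these abstracts describe can be illustrated in Python (medExtractR itself is an R package; the entity patterns, frequency lexicon, and search-window size below are assumptions for illustration, not the package's actual rules):

```python
import re

# Hypothetical lexicons: a drug list and common frequency expressions.
DRUGS = ["tacrolimus", "lamotrigine", "allopurinol"]
FREQ_LEXICON = r"(?:once|twice|three times)\s+daily|bid|tid|qd|q\d+h"

def extract_drug_entities(note: str, drug: str):
    """Find mentions of one target drug and, in a short window after each
    mention, its strength and dosing frequency (None if absent)."""
    entities = []
    for m in re.finditer(rf"\b{drug}\b", note, re.I):
        window = note[m.end():m.end() + 60]
        strength = re.search(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g)", window, re.I)
        freq = re.search(FREQ_LEXICON, window, re.I)
        entities.append({
            "drug": m.group(0),
            "strength": strength.group(0) if strength else None,
            "frequency": freq.group(0) if freq else None,
        })
    return entities

note = "Continue tacrolimus 1.5 mg twice daily; hold allopurinol 100 mg."
print(extract_drug_entities(note, "tacrolimus"))
```

Searching per target drug rather than for all medications is what makes the approach "targeted": patterns can be tuned to one drug's prescribing conventions, which is also why tuning on a small sample is recommended before applying the system to a new dataset such as MIMIC-III.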


2019 ◽  
pp. 1-8
Author(s):  
Enrico Santus ◽  
Clara Li ◽  
Adam Yala ◽  
Donald Peck ◽  
Rufina Soomro ◽  
...  

PURPOSE Natural language processing (NLP) techniques have been adopted to reduce the curation costs of electronic health records. However, studies have questioned whether such techniques can be applied to data from previously unseen institutions. We investigated the performance of a common neural NLP algorithm on data from both known and heldout (ie, institutions whose data were withheld from the training set and only used for testing) hospitals. We also explored how diversity in the training data affects the system’s generalization ability. METHODS We collected 24,881 breast pathology reports from seven hospitals and manually annotated them with nine key attributes that describe types of atypia and cancer. We trained a convolutional neural network (CNN) on annotations from either only one (CNN1), only two (CNN2), or only four (CNN4) hospitals. The trained systems were tested on data from five organizations, including both known and heldout ones. For every setting, we provide the accuracy scores as well as the learning curves that show how much data are necessary to achieve good performance and generalizability. RESULTS The system achieved a cross-institutional accuracy of 93.87% when trained on reports from only one hospital (CNN1). Performance improved to 95.7% and 96%, respectively, when the system was trained on reports from two (CNN2) and four (CNN4) hospitals. The introduction of diversity during training did not lead to improvements on the known institutions, but it boosted performance on the heldout institutions. When tested on reports from heldout hospitals, CNN4 outperformed CNN1 and CNN2 by 2.13% and 0.3%, respectively. CONCLUSION Real-world scenarios require that neural NLP approaches scale to data from previously unseen institutions. We show that a common neural NLP algorithm for information extraction can achieve this goal, especially when diverse data are used during training.


Skull Base ◽  
2009 ◽  
Vol 19 (03) ◽  
Author(s):  
Gopi Shah ◽  
Marc Rosen ◽  
James Evans

2006 ◽  
Vol preprint (2007) ◽  
pp. 1
Author(s):  
Kristi Smock ◽  
Hassan Yaish ◽  
Mitchell Cairo ◽  
Mark Lones ◽  
Carlynn Willmore-Payne ◽  
...  
