Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale

JAMIA Open ◽  
2020 ◽  
Vol 3 (2) ◽  
pp. 185-189
Author(s):  
Timothy A Miller ◽  
Paul Avillach ◽  
Kenneth D Mandl

Abstract Objective To develop scalable natural language processing (NLP) infrastructure for processing the free text in electronic health records (EHRs). Materials and Methods We extend the open-source Apache cTAKES NLP software with several standard technologies for scalability. We remove processing bottlenecks by monitoring component queue size. We process EHR free text for patients in the PrecisionLink Biobank at Boston Children’s Hospital. The extracted concepts are made searchable via a web-based portal. Results We processed over 1.2 million notes for over 8000 patients, extracting 154 million concepts. Our largest tested configuration processes over 1 million notes per day. Discussion The unique information represented by extracted NLP concepts has great potential to provide a more complete picture of patient status. Conclusion NLP of large EHR document collections can be done efficiently, in service of high-throughput phenotyping.
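The abstract mentions removing bottlenecks by monitoring component queue size but gives no implementation detail. As a minimal sketch of that idea, assuming each pipeline stage has an input queue whose depth is sampled periodically, the stage with the deepest average backlog can be flagged as the likely bottleneck. The function name `find_bottleneck`, the stage names, and the sample values are all hypothetical and are not taken from the paper or from Apache cTAKES.

```python
from statistics import mean


def find_bottleneck(queue_samples):
    """Return (stage, averages) where stage has the deepest average input queue.

    queue_samples maps a stage name to a list of sampled queue sizes.
    """
    averages = {stage: mean(sizes) for stage, sizes in queue_samples.items()}
    # The stage whose input queue stays fullest is consuming work the slowest.
    slowest = max(averages, key=averages.get)
    return slowest, averages


# Hypothetical samples for illustration only, not measurements from the paper.
samples = {
    "reader": [2, 3, 1, 2],
    "annotator": [40, 55, 61, 72],
    "writer": [0, 1, 0, 2],
}

stage, averages = find_bottleneck(samples)
print(stage, averages[stage])
```

In a setup like this, a persistently deep queue in front of one stage would suggest adding workers for that stage rather than scaling the whole pipeline uniformly.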

2021 ◽  
Vol 89 (9) ◽  
pp. S155
Author(s):  
Nicolas Nunez ◽  
Joanna M. Biernacka ◽  
Manuel Gardea-Resendez ◽  
Bhavani Singh Agnikula Kshatriya ◽  
Euijung Ryu ◽  
...  

2018 ◽  
Author(s):  
Lili Chan ◽  
Kelly Beers ◽  
Kinsuk Chauhan ◽  
Neha Debnath ◽  
Aparna Saha ◽  
...  

Abstract Background Identification of symptoms is challenging with surveys, which are time-intensive and low-throughput. Natural language processing (NLP) could be utilized to identify symptoms from narrative documentation in the electronic health record (EHR). Methods We utilized NLP to parse notes for maintenance hemodialysis (HD) patients from two EHR databases (BioMe and MIMIC-III) to identify fatigue, nausea/vomiting, anxiety, depression, cramping, itching, and pain. We compared NLP performance with International Classification of Diseases (ICD) codes and validated the performance of both NLP and codes against manual chart review in a representative subset. Results We identified 1034 and 929 HD patients from BioMe and MIMIC-III respectively. The most frequently identified symptoms by NLP from both cohorts were fatigue, pain, and nausea and/or vomiting. NLP was significantly more sensitive than ICD codes for nearly all symptoms. In the BioMe dataset, sensitivity for NLP ranged from 0.85-0.99 vs. 0.09-0.59 for ICD codes. In the MIMIC-III dataset, NLP sensitivity was 0.8-0.98 vs. 0.02-0.53 for ICD. ICD codes were significantly more specific for nausea and/or vomiting (NLP 0.57 vs. ICD 0.97, P=0.03) in BioMe and for depression (NLP 0.67 vs. ICD 0.99, P=0.002) in MIMIC-III. A majority of patients in both cohorts had ≥4 symptoms. The more encounters available for a patient, the more likely NLP was to identify a symptom. Conclusions NLP outperformed ICD codes for identification of symptoms on several test parameters, including sensitivity for a majority of symptoms. NLP may be useful for the high-throughput identification of patient-centered outcomes from EHR. Significance Statement Patients on maintenance hemodialysis experience a high frequency of symptoms. However, symptoms have been measured utilizing time-intensive surveys. 
This paper compares natural language processing (NLP) to administrative codes for the identification of seven key symptoms from two cohorts with electronic health records and validation through manual chart review. NLP identified high rates of symptoms; the most common were fatigue, pain, and nausea and/or vomiting. A majority of patients had ≥4 symptoms. NLP was significantly more sensitive at identifying symptoms compared to administrative codes for nearly all symptoms but specificity was not significantly different compared to codes. This paper demonstrates utility of a high throughput method of identifying symptoms from EHR which may advance the field of patient centered research in nephrology.
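The comparison above rests on sensitivity and specificity measured against manual chart review. The following is a minimal sketch of how those two figures can be computed for one symptom, assuming each patient has a boolean chart-review label and a boolean prediction from either NLP or ICD codes. The function name and the toy labels are hypothetical and do not reproduce the study's data.

```python
def sensitivity_specificity(predicted, reference):
    """Compute (sensitivity, specificity) of predictions against a reference.

    Both arguments are equal-length sequences of booleans, one per patient.
    """
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and pred:
            fp += 1
        else:
            tn += 1
    # Sensitivity: share of true cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of non-cases correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for illustration only: 4 patients with the symptom, 6 without.
chart_review = [True, True, True, True, False, False, False, False, False, False]
nlp_flags = [True, True, True, False, True, False, False, False, False, False]
icd_flags = [True, False, False, False, False, False, False, False, False, False]

nlp_sens, nlp_spec = sensitivity_specificity(nlp_flags, chart_review)
icd_sens, icd_spec = sensitivity_specificity(icd_flags, chart_review)
print(nlp_sens, nlp_spec, icd_sens, icd_spec)
```

The toy data mirrors the general pattern reported above, where a text-based method finds more true cases at the cost of some false positives, while codes miss many cases but rarely flag a patient incorrectly.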


2019 ◽  
Vol 26 (4) ◽  
pp. 364-379 ◽  
Author(s):  
Theresa A Koleck ◽  
Caitlin Dreisbach ◽  
Philip E Bourne ◽  
Suzanne Bakken

Abstract Objective Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. Materials and Methods Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study. Results Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics. Discussion NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves. Conclusion Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.


2018 ◽  
Vol 1 (1) ◽  
pp. 003-004
Author(s):  
Man Liu

Cancer is among the leading causes of death. In 2018, an estimated 1,735,350 new cases of cancer and 609,640 cancer deaths were projected for the United States. A wealth of cancer-relevant information is stored in various types of healthcare records, for example, electronic health records (EHRs). However, part of the critical information is recorded as free narrative text, which makes it difficult for machines to interpret the underlying information. The development of artificial intelligence provides a variety of solutions to this problem. For example, natural language processing (NLP) has emerged to bridge the gap between free text and structured representations of cancer information. Recently, several researchers have published work on extracting cancer-related information from EHRs using NLP. Beyond traditional NLP methods, the development of deep learning helps EHR mining go further.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Alberto Ortiz ◽  
Jose' M Portoles ◽  
M A Dolores Del Pino Y Pino ◽  
Jesus Barea ◽  
Carlos Del Rio ◽  
...  

Abstract Background and Aims Secondary hyperparathyroidism (SHPT) is common in patients suffering from chronic kidney disease (CKD) and worsens over time in patients undergoing haemodialysis (HD). The clinical management of SHPT in HD patients is challenging. In this context, the SENEFRO-BD-SHPT study aims to access and analyse the free-text narratives in the electronic health records (EHRs) of patients with SHPT undergoing HD to characterize their demographic and clinical characteristics, including comorbidities, medication use, and control of relevant biochemical parameters. Method SENEFRO-BD-SHPT was an observational, retrospective, and multicentre study based on the secondary analysis of EHRs from 8 hospitals from the Spanish National Healthcare Network (Figure 1A). The unstructured clinical data in patients’ EHRs between January 1st 2014 and December 31st 2018 were analysed using the EHRead® technology, based on Natural Language Processing (NLP) and machine learning. We conducted a cross-sectional analysis of all patients at the time of inclusion, hereafter referred to as index date (Figure 1A). For HD patients, the index date was defined as the timepoint when diagnostic criteria for either HD or SHPT were first met, namely a) PTH > 300 pg/ml and/or b) documented use of drugs for the management of SHPT such as calcimimetics, vitamin D or vitamin-D analogues. Follow-up analyses were performed at 6 and 12 months following the index date. Crucially, to guarantee the homogeneity and quality of the data, we only considered HD patients with SHPT with available PTH values at baseline and at least at one time point during follow-up. Results From a source population of 3,290,365 EHRs in the hospitals’ catchment area, a total of 623 patients with SHPT undergoing HD were found. Of these, 282 patients had available PTH data (Figure 1A). Regarding demographic characteristics, 68.4% of patients (n = 193) were male, with a mean (±SD) age of 67.1 (±15.4) years. 
The most common causes of CKD were diabetic nephropathy (29%; n = 81), hypertensive/renal vascular disease (24.4%; n = 68), tubulointerstitial disease (19.3%; n = 54), and glomerular disease (12.5%; n = 35) (Figure 1B). The most frequent comorbidities in patients’ EHRs at index date were hypertension (83.7%, n = 236), type 2 diabetes (43.6%; n = 123), and heart failure (34.8%; n = 98) (Figure 1B). The percentage of patients with controlled PTH, calcium (Ca), or phosphorus (P) values at index date and during follow up is shown in Figure 1C; overall, these values remained stable across the study period. Finally, Figure 1D displays the use of selected SHPT-related medications at index date, namely vitamin-D and analogues (63.1%; n = 178), phosphate binders (46.8%; n = 132), and calcimimetics (9.6%; n = 27). Conclusion SENEFRO-BD-SHPT represents the first attempt to use clinical NLP and big data analytics to offer an updated picture of patients with SHPT undergoing HD in Spain based on unstructured clinical data. NLP holds great potential to characterize the epidemiology and clinical management of CKD using real-world evidence in EHRs. However, free-text narratives in the EHRs may be suboptimal to study analytical variables. Funding: Unrestricted grant from Amgen to SENEFRO


2019 ◽  
Author(s):  
J. Harry Caufield ◽  
Yichao Zhou ◽  
Yunsheng Bai ◽  
David A. Liem ◽  
Anders O. Garlid ◽  
...  

Abstract We have developed ACROBAT (Annotation for Case Reports using Open Biomedical Annotation Terms), a typing system for detailed information extraction from clinical text. This resource supports detailed identification and categorization of entities, events, and relations within clinical text documents, including clinical case reports (CCRs) and the free-text components of electronic health records. Using ACROBAT and the text of 200 CCRs, we annotated a wide variety of real-world clinical disease presentations. The resulting dataset, MACCROBAT2018, is a rich collection of annotated clinical language appropriate for training biomedical natural language processing systems.


2011 ◽  
Vol 7 (4) ◽  
pp. e15-e19 ◽  
Author(s):  
Jeremy L. Warner ◽  
Peter Anick ◽  
Pengyu Hong ◽  
Nianwen Xue

The widespread adoption of electronic health records within the oncology community is creating rich databases that contain details of the cancer care continuum. Large portions of this information are locked up in free text, but several efforts are underway to address this.


2018 ◽  
Vol 23 (3) ◽  
pp. 175-191
Author(s):  
Anneke Annassia Putri Siswadi ◽  
Avinanta Tarigan

To meet prospective students' need for information about student admission, Gunadarma University already offers many services, such as a website, a book, a registration office, the Media Information Center, and a question-answering website (UG-Pedia), but these are time-limited. Prospective students need a service that is available anytime and anywhere. Therefore, this research develops UGLeo, a web-based intelligent QA chatbot application for Gunadarma University's student admission portal. UGLeo is built in the MegaHal style, which implements the Markov chain method. In this research, the MegaHal style is modified in two respects: the structure of the natural language processing and the structure of the database. The accuracy of UGLeo's replies is 65%. To increase this accuracy, improvements are needed in both the natural language processing and the MegaHal style of the UGLeo system.
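To illustrate the Markov chain idea behind this kind of chatbot, here is a deliberately simplified sketch: a first-order word chain that learns which word follows which and then samples a reply. It is not the UGLeo implementation or the full MegaHal algorithm; the function names, the start and end markers, and the two-sentence corpus are hypothetical.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def build_chain(sentences):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=20):
    """Walk the chain from START, sampling one successor at each step."""
    word, output = START, []
    for _ in range(max_words):
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)


# Hypothetical training sentences, for illustration only.
corpus = [
    "registration opens in june",
    "registration closes in august",
]
chain = build_chain(corpus)
reply = generate(chain, random.Random(0))
print(reply)
```

Because successors are stored with repetition, frequent transitions are sampled more often, which is the property a Markov chain reply generator relies on.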


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. We present our work applying Natural Language Processing techniques to analyze the sentiment of users who answered two questions from the CSQ-8 questionnaires in raw Spanish free text. Their responses relate to mindfulness, a novel technique used to control stress and anxiety caused by different factors in daily life. We proposed an online course in which this method was applied to improve the quality of life of health care professionals during the COVID-19 pandemic. We also evaluated the satisfaction level of the participants, with a view to establishing strategies to improve future experiences. To perform this task automatically, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning to classify the inputs into three categories: negative, neutral, and positive. Because the amount of data available was limited (86 records for the first question and 68 for the second), transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using a graphical text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that applying NLP techniques to small amounts of data using transfer learning can achieve sufficient accuracy in the sentiment analysis and text classification stages.


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion NLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.
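The performance metrics named in this review can be made concrete with a small worked example. The sketch below computes precision, recall and the balanced F-measure (F1) for one class from paired label lists. The function name, the labels, and the toy predictions are hypothetical and are not drawn from any of the reviewed studies.

```python
def precision_recall_f1(predicted, actual, positive):
    """Compute (precision, recall, F1) for the class named by `positive`."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    # Precision: of the comments flagged positive, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive comments, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy sentiment labels for six feedback comments, for illustration only.
actual = ["pos", "pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "pos", "neg", "pos", "neg", "pos"]

precision, recall, f1 = precision_recall_f1(predicted, actual, positive="pos")
print(precision, recall, f1)
```

Reporting all three together matters for patient feedback data, where one sentiment class is often much rarer than the others and accuracy alone can look misleadingly high.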

