Clinical Text Mining of Electronic Health Records to Classify Leprosy Patient Cases

Leprosy is a major public health problem and is listed among the neglected tropical diseases in India. Also called Hansen's Disease (HD), it is a long-term infection caused by the bacteria Mycobacterium leprae or Mycobacterium lepromatosis. Untreated, leprosy can cause progressive and permanent damage to the skin, nerves, limbs, and eyes. This paper aims to describe the classification of leprosy cases from the first indication of symptoms. Electronic Health Records (EHRs) of leprosy patients have been generated from verified sources. The clinical notes included in the EHRs have been processed with natural language processing tools. To predict the type of leprosy, a rule-based classification method is proposed in this paper. Our approach is further compared with machine learning (ML) algorithms such as Support Vector Machine (SVM) and Logistic Regression (LR), and performance metrics are compared.
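A rule-based classifier of this kind maps symptom features extracted from clinical notes to a leprosy type. The sketch below is purely illustrative: the thresholds loosely follow WHO lesion-count criteria (paucibacillary vs. multibacillary), and the feature names are hypothetical, not the authors' actual rule set.

```python
# Minimal sketch of a rule-based leprosy-type classifier over features
# an NLP pipeline might extract from a clinical note. Feature names and
# thresholds are illustrative assumptions, not the paper's actual rules.

def classify_leprosy(note_features):
    """Classify a case as multibacillary (MB), paucibacillary (PB),
    or unknown from hypothetical extracted features."""
    lesions = note_features.get("skin_lesion_count", 0)
    nerves = note_features.get("nerves_involved", 0)
    if lesions > 5 or nerves > 1:
        return "MB"   # many lesions or multiple nerves involved
    if lesions >= 1:
        return "PB"   # few lesions, limited nerve involvement
    return "unknown"
```

Extracted features would be filled in by the NLP step, after which each note passes through rules like these to yield a predicted type.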

2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND Smoking is a major risk factor and an important variable in clinical research, but few studies address the automatic extraction of smoking status from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP). METHODS Using acronym replacement and the Python package Soynlp, we normalized 4,711 bilingual clinical notes. Each EHR note was classified into one of 4 categories: current smoker, past smoker, never smoker, and unknown. Subsequently, Shifted Positive Pointwise Mutual Information (SPPMI) was used to vectorize the words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status were identified. RESULTS Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. The extracted keywords were used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% over unigram and bigram Bag of Words features. CONCLUSIONS Our study shows the potential of SPPMI for classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be readily acquired and used for clinical practice and research.
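The keyword-expansion step can be sketched as follows: build SPPMI word vectors from co-occurrence counts, then rank candidate words by cosine similarity to a seed term. The counts, vocabulary, and shift value below are invented for illustration; the study's actual vocabulary and corpus statistics are not reproduced here.

```python
import math

def sppmi(cooc, counts, total, k=5):
    """Shifted Positive PMI: max(PMI(w, c) - log k, 0) per word pair.
    cooc maps (word, context) pairs to co-occurrence counts; counts
    maps words to unigram counts; total is the corpus size."""
    vecs = {w: {} for w in counts}
    for (w, c), n in cooc.items():
        pmi = math.log((n / total) / ((counts[w] / total) * (counts[c] / total)))
        val = max(pmi - math.log(k), 0.0)
        if val > 0:
            vecs[w][c] = val
            vecs[c][w] = val
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    if not u or not v:
        return 0.0
    dot = sum(x * v[key] for key, x in u.items() if key in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)
```

Keywords for a given smoking status would then be the words whose SPPMI vectors have the highest cosine similarity to seed terms such as "smoker".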


2021 ◽  
Author(s):  
Marika Cusick ◽  
Sumithra Velupillai ◽  
Johnny Downs ◽  
Thomas Campion ◽  
Rina Dutta ◽  
...  

Abstract In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. In this study, we developed a process to share NLP approaches that were individually developed at King’s College London (KCL), UK, and Weill Cornell Medicine (WCM), US, two academic medical centers based in different countries with vastly different healthcare systems. After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68; WCM approach: F1-score 0.87 vs. 0.72). Shared use of these NLP approaches is a critical step forward towards improving data-driven algorithms for early suicide risk identification and timely prevention.


2019 ◽  
Vol 28 (8) ◽  
pp. 1143-1151 ◽  
Author(s):  
Brian Hazlehurst ◽  
Carla A. Green ◽  
Nancy A. Perrin ◽  
John Brandes ◽  
David S. Carrell ◽  
...  

2020 ◽  
Vol 23 (1) ◽  
pp. 21-26 ◽  
Author(s):  
Nemanja Vaci ◽  
Qiang Liu ◽  
Andrey Kormilitzin ◽  
Franco De Crescenzo ◽  
Ayse Kurtulmus ◽  
...  

Background Utilisation of routinely collected electronic health records from secondary care offers unprecedented possibilities for medical science research but can also present difficulties. One key issue is that medical information is presented as free-form text and therefore requires a time commitment from clinicians to manually extract salient information. Natural language processing (NLP) methods can be used to extract clinically relevant information automatically. Objective Our aim is to use NLP to capture real-world data on individuals with depression from Clinical Record Interactive Search (CRIS) clinical text, to foster the use of electronic healthcare data in mental health research. Methods We used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and built the training and testing corpora for the statistical models. Second, we built and fine-tuned the statistical models using active learning procedures. Findings The results show a high degree of accuracy in the extraction of drug-related information. By contrast, a much lower degree of accuracy is demonstrated for auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably. Conclusions This study illustrates the feasibility of using NLP models and proposes a research pipeline for accurately extracting information from electronic health records. Clinical implications Real-world, individual patient data are an invaluable source of information which can be used to better personalise treatment.
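The active learning procedure in the Methods can be sketched as a pool-based uncertainty-sampling loop: train on the labelled notes, score the unlabelled pool, and send the least-confident examples to a clinical annotator. The `train`, `predict_proba`, and `annotate` callables below are placeholders for illustration, not the study's actual models or annotation tooling.

```python
def active_learning(labelled, pool, train, predict_proba, annotate,
                    rounds=5, batch=10):
    """Pool-based uncertainty sampling sketch: repeatedly label the
    examples the current model is least sure about, then retrain.

    labelled: list of (example, label) pairs; pool: unlabelled examples.
    predict_proba(model, x) returns class probabilities for x."""
    for _ in range(rounds):
        if not pool:
            break
        model = train(labelled)
        # Least confident first: lowest maximum class probability.
        scored = sorted(pool, key=lambda x: max(predict_proba(model, x)))
        for x in scored[:batch]:
            labelled.append((x, annotate(x)))   # query the expert
            pool.remove(x)
    return train(labelled)
```

Each round thus spends annotation effort where the model is most uncertain, which is what drives the considerable performance gains reported in the Findings.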


2020 ◽  
Author(s):  
Nicholas B. Link ◽  
Selena Huang ◽  
Tianrun Cai ◽  
Zeling He ◽  
Jiehuan Sun ◽  
...  

ABSTRACT Objective The use of electronic health record (EHR) systems has grown over the past decade, and with it the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce an unsupervised method for acronym disambiguation, the task of classifying the correct sense of acronyms in clinical EHR notes. Methods We developed an unsupervised ensemble machine learning algorithm (CASEml) to automatically classify acronyms by leveraging semantic embeddings, visit-level text, and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard unsupervised method and a baseline metric that selects the most frequent acronym sense. We additionally evaluated the effects of RA disambiguation on NLP-driven phenotyping of rheumatoid arthritis. Results CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than the standard baseline metric and (on average) higher than a state-of-the-art unsupervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. Conclusion CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and unsupervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
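The embedding-based part of this idea can be sketched simply: score each candidate sense of an acronym by how similar the note's context words are to words associated with that sense. The tiny hand-made vectors and sense lexicons below are illustrative assumptions; CASEml itself is an ensemble that also uses visit-level text and billing codes, which this sketch omits.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(context_words, sense_words, emb):
    """Pick the sense whose associated words best match the context."""
    def score(sense):
        sims = [cosine(emb[w], emb[c])
                for w in sense_words[sense] if w in emb
                for c in context_words if c in emb]
        return sum(sims) / len(sims) if sims else 0.0
    return max(sense_words, key=score)

# Hypothetical 2-d embeddings and sense lexicons for the acronym "RA".
emb = {
    "joint":     [1.0, 0.1],
    "swelling":  [0.9, 0.2],
    "arthritis": [1.0, 0.0],
    "atrium":    [0.1, 1.0],
}
senses = {"rheumatoid arthritis": ["arthritis"],
          "right atrium": ["atrium"]}
```

Here a note mentioning "joint swelling" would resolve RA to "rheumatoid arthritis", because the context words sit closer to the arthritis embedding than to the atrium one.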


Author(s):  
Jonah Kenei ◽  
Elisha Opiyo ◽  
Robert Oboko

The increasing use of Electronic Health Records (EHRs) in healthcare delivery settings has led to increased availability of electronic clinical data. EHRs generate a great deal of patient clinical data each day, which physicians must review to find clinically relevant information during care episodes. The availability of electronically collected healthcare data has created a need for computational tools to analyze it. One type of data that doctors have access to is the clinical notes that reside in electronic health records. These notes are useful because they provide comprehensive information about patients' health histories, with many practical uses. For example, doctors routinely review these notes during care episodes to apprise themselves of a patient's health history. These reviews are currently manual: a doctor reads a patient's chart while looking for specific clinical information. Without proper support, this manual process leads to information overload and increases physician cognitive workload. Current EHRs do not provide support to help physicians reduce cognitive workload when completing clinical tasks. This is especially true for long clinical documents that require quick review at the point of care. The growing amount of clinical documentation available in EHRs has given rise to the need for tools that support the synthesis of information in EHRs. The use of visual analytics to explore healthcare data is one research direction that addresses this problem. However, existing visualization techniques are mainly based on structured electronic health record data and rarely support therapeutic activities. Therefore, visualization of unstructured clinical records to support clinical practice is required. In this paper we propose a unique approach for graphically representing and visualizing the semantic structure of a clinical text document to aid doctors in reviewing electronic clinical notes.
A user evaluation demonstrates that the proposed method for visualizing and navigating a document's semantic structure facilitates users' exploration of document information.


Author(s):  
Rebecka Weegar ◽  
Alicia Pérez ◽  
Arantza Casillas ◽  
Maite Oronoz

Abstract Background Text mining and natural language processing of clinical text, such as notes from electronic health records, require specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain-specific challenges such as limited access to in-domain tools and data sets. Methods A bi-directional Long Short-Term Memory network is applied to clinical notes in Spanish and Swedish for the task of medical named entity recognition. Several types of embeddings, generated from both in-domain and out-of-domain text corpora, along with a number of generation and combination strategies for embeddings, were evaluated in order to investigate different input representations and the influence of domain on the final results. Results For Spanish, a micro-averaged F1-score of 75.25 was obtained, and for Swedish the corresponding score was 76.04. The best results for both languages were achieved using embeddings generated from in-domain corpora extracted from electronic health records, but embeddings generated from related domains were also found to be beneficial. Conclusions A recurrent neural network with in-domain embeddings improved medical named entity recognition compared to shallow learning methods, showing this combination to be suitable for entity recognition in clinical text in both languages.
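Two of the ingredients above can be sketched concretely: combining in-domain and out-of-domain embeddings into one input representation per token, and scoring predictions with a micro-averaged F1. The vectors, vocabularies, and the token-level F1 simplification below are illustrative assumptions; the paper evaluates several combination strategies and scores at the entity level.

```python
def combine(token, in_domain, out_domain, dim_in=2, dim_out=2):
    """One combination strategy: concatenate the in-domain and
    out-of-domain vectors for a token, zero-padding when the token is
    missing from a vocabulary."""
    v1 = in_domain.get(token, [0.0] * dim_in)
    v2 = out_domain.get(token, [0.0] * dim_out)
    return v1 + v2   # alternatives: element-wise sum or average

def micro_f1(gold, pred):
    """Micro-averaged F1 over aligned label sequences (token-level
    sketch; 'O' marks tokens outside any entity)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    fp = sum(1 for g, p in zip(gold, pred) if p != "O" and g != p)
    fn = sum(1 for g, p in zip(gold, pred) if g != "O" and g != p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

The concatenated vectors would form the per-token input to the bi-directional LSTM, which then emits one entity label per token.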


2015 ◽  
Vol 22 (6) ◽  
pp. 1220-1230 ◽  
Author(s):  
Huan Mo ◽  
William K Thompson ◽  
Luke V Rasmussen ◽  
Jennifer A Pacheco ◽  
Guoqian Jiang ◽  
...  

Abstract Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
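Desiderata (4) and (5), set operations and structured rules over queryable clinical data, can be illustrated with a toy phenotype. The disease, codes, drug names, and lab threshold below are hypothetical examples chosen for the sketch, not part of the PheRM proposal itself.

```python
def phenotype_t2dm(diagnoses, medications, labs):
    """Toy 'type 2 diabetes' phenotype expressed as set algebra:
    (diagnosis code OR abnormal lab) AND relevant medication.
    Inputs are (patient_id, ...) tuples; returns qualifying IDs."""
    dx = {pid for pid, code in diagnoses if code.startswith("E11")}
    rx = {pid for pid, drug in medications
          if drug in {"metformin", "insulin"}}
    lab = {pid for pid, test, value in labs
           if test == "HbA1c" and value >= 6.5}
    return (dx | lab) & rx   # union, then intersection
```

Representing each criterion as a queryable set makes the algorithm both human-readable and computable, which is the portability the desiderata aim for.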


2019 ◽  
Author(s):  
Philip Held ◽  
Randy A Boley ◽  
Walter G Faig ◽  
John A O'Toole ◽  
Imran Desai ◽  
...  

UNSTRUCTURED Electronic health records (EHRs) offer opportunities for research and improvements in patient care. However, challenges exist in using data from EHRs due to the volume of information contained in clinical notes, which can be labor-intensive and costly to transform into usable data with existing strategies. This case report details the collaborative development and implementation of the postencounter form (PEF) system in the EHR at the Road Home Program at Rush University Medical Center in Chicago, IL, to address these concerns with limited burden on clinical workflows. The PEF system proved to be an effective tool, with over 98% of all clinical encounters including a completed PEF within 5 months of implementation. In addition, the system has generated over 325,188 unique, readily accessible data points in under 4 years of use. The PEF system has since been deployed to other settings, demonstrating that it may have broader clinical utility.

