An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records

2021 ◽  
Author(s):  
Varvara Koshman ◽  
Anastasia Funkner ◽  
Sergey Kovalchuk

Electronic medical records (EMRs) contain a lot of valuable data about patients, which is, however, unstructured. There is a lack of labeled medical text data in Russian, and there are no tools for automatic annotation. We present an unsupervised approach to medical data annotation. Morphological and syntactic analyses of the initial sentences produce syntactic trees, from which similar subtrees are grouped using Word2Vec and labeled using dictionaries and Wikidata categories. This method can be used to label EMRs in Russian automatically, and the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.

2022 ◽  
Vol 12 (1) ◽  
pp. 25
Author(s):  
Varvara Koshman ◽  
Anastasia Funkner ◽  
Sergey Kovalchuk

Electronic medical records (EMRs) include a great deal of valuable data about patients, which is, however, unstructured. There is also a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is currently hardly feasible for researchers to use EMR text data to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the initial sentences using morphological and syntactic analyses. In the retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. Using Wikidata categories increased the fraction of labeled sentences 5.5 times compared to labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; when extended with timestamp and event labels, coverage reached 97% of sentences. The resulting method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
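The grouping-and-labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy vectors stand in for Word2Vec embeddings of subtree phrases, the greedy threshold grouping is an assumed simplification, and `EMBEDDINGS`, `VOCABULARY`, `group_phrases`, and `label_group` are hypothetical names introduced here for illustration.

```python
import math

# Hypothetical toy vectors standing in for Word2Vec embeddings of subtree phrases.
EMBEDDINGS = {
    "high blood pressure": [0.9, 0.1, 0.0],
    "elevated blood pressure": [0.85, 0.15, 0.05],
    "took aspirin": [0.1, 0.9, 0.1],
    "received aspirin": [0.15, 0.85, 0.1],
}

# Hypothetical domain vocabulary mapping key terms to labels.
VOCABULARY = {"pressure": "symptom", "aspirin": "medication"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_phrases(embeddings, threshold=0.95):
    """Greedy grouping: a phrase joins the first group whose seed vector is similar enough."""
    groups = []  # list of (seed_vector, member_phrases)
    for phrase, vec in embeddings.items():
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(phrase)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((vec, [phrase]))
    return [members for _, members in groups]


def label_group(members, vocabulary):
    """Return the label of the first vocabulary term found in any member, else None."""
    for phrase in members:
        for word in phrase.split():
            if word in vocabulary:
                return vocabulary[word]
    return None


groups = group_phrases(EMBEDDINGS)
labels = [label_group(g, VOCABULARY) for g in groups]
```

A group that matches no vocabulary term stays unlabeled (`None`) in this sketch; the abstract reports that falling back on Wikidata categories is what raised the labeled fraction in the actual study.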


Author(s):  
Karen Tu ◽  
Julie Klein-Geltink ◽  
Tezeta F Mitiku ◽  
Chiriac Mihai ◽  
Joel Martin

2019 ◽  
Author(s):  
Hsien-Liang Huang ◽  
Yun-Cheng Tsai ◽  
Shi-Hao Hong ◽  
Ya-Mei Hsueh

BACKGROUND Smoking is a complex behavior associated with multiple factors such as personality, environment, genetics, and emotions. Text data is a rich source of information. However, pure text data requires substantial human resources and time to extract and apply the information, resulting in many details not being discovered and used. OBJECTIVE This study proposes a novel approach that explores a text mining flow to capture the behavior of smokers quitting tobacco from their free-text medical records. More importantly, the paper explores the impact of these changes on smokers. The goal is to help smokers quit smoking. Therefore, the paper develops an algorithm for analyzing smoking cessation treatment plans documented in free-text medical records. METHODS The approach involves the development of an information extraction flow that uses a combination of data mining techniques, including text mining. It can be used not only to help others quit smoking but also for other medical records with similar data elements. RESULTS In the paper, the most visible areas for the medical application of text mining are the integration and transfer of advances made in basic sciences, as well as a better understanding of the processes involved in smoking cessation. CONCLUSIONS Text mining may also be useful for supporting decision-making processes associated with smoking cessation.


2021 ◽  
Author(s):  
Youcheng Pan ◽  
Chenghao Wang ◽  
Baotian Hu ◽  
Yang Xiang ◽  
Xiaolong Wang ◽  
...  

BACKGROUND Electronic medical records (EMRs) are usually stored in relational databases that require structured query language (SQL) queries to retrieve information of interest. Composing such queries is usually challenging for medical experts who lack the necessary technical expertise. However, existing text-to-SQL generation studies have not been fully embraced in the medical domain. OBJECTIVE The objective of this study was to propose a neural generation model that jointly considers the characteristics of medical text and the SQL structure to automatically transform medical texts into SQL queries for EMRs. METHODS In contrast to regarding the SQL query as an ordinary word sequence, the syntax tree, introduced as an intermediate representation, is more in line with the tree-structured nature of SQL and can also effectively reduce the search space during generation. We proposed a medical text-to-SQL model (MedTS), which employed a pre-trained BERT as the encoder and leveraged a grammar-based LSTM as the decoder to predict the tree-structured intermediate representation, which can be easily transformed into the final SQL query. Experiments were conducted on the MIMICSQL dataset, and five competitor methods were compared. RESULTS Experimental results demonstrated that MedTS achieved accuracies of 0.770 and 0.888 on the test set in terms of logic form and execution, respectively, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance on each component of the generated SQL was relatively balanced and substantially improved. CONCLUSIONS The proposed MedTS was effective and robust for improving the performance of medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
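To make the idea of a tree-structured intermediate representation concrete, the sketch below renders a small query tree into SQL text. It is an assumed illustration, not the MedTS grammar or decoder: the `QueryTree` and `Condition` classes, the `to_sql` function, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    column: str
    op: str
    value: str


@dataclass
class QueryTree:
    agg: str      # aggregation such as "COUNT", or "" for none
    column: str   # selected column
    table: str    # source table
    conditions: List[Condition] = field(default_factory=list)


def to_sql(tree: QueryTree) -> str:
    """Render the tree as SQL text (illustration only: values are not escaped)."""
    target = f"{tree.agg}({tree.column})" if tree.agg else tree.column
    sql = f"SELECT {target} FROM {tree.table}"
    if tree.conditions:
        where = " AND ".join(
            f"{c.column} {c.op} '{c.value}'" for c in tree.conditions
        )
        sql += f" WHERE {where}"
    return sql


# Hypothetical table and column names, not taken from MIMICSQL.
tree = QueryTree(
    agg="COUNT",
    column="patient_id",
    table="admissions",
    conditions=[Condition("diagnosis", "=", "sepsis")],
)
sql = to_sql(tree)
```

Because a decoder that emits such a tree only chooses among grammar-valid node types at each step, it cannot produce structurally malformed SQL, which is one way to read the abstract's remark about a reduced search space.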


2019 ◽  
Vol 10 (S1) ◽  
Author(s):  
Hegler Tissot ◽  
Richard Dobson

Abstract Background There is an increasing amount of unstructured medical data that can be analysed for different purposes. However, information extraction from free-text data may be particularly inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, coupled with a supporting dictionary. However, they are not rich enough to encode both typing and phonetic misspellings. Results Experimental results showed that a joint string and language-dependent phonetic similarity is more accurate than traditional string distance metrics when identifying misspelt names of drugs in a set of medical records written in Portuguese. Conclusion We present a hybrid approach to efficiently perform similarity matching that overcomes the loss of information inherent in using either exact-match search or string-based similarity search methods.
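A joint string and phonetic similarity of the kind this abstract describes can be sketched as follows. This is an assumed simplification rather than the authors' method: the phonetic rules in `PHONETIC_RULES` are a hypothetical, heavily reduced stand-in for a language-dependent phonetic encoding, and the equal weighting in `joint_similarity` is an arbitrary choice.

```python
def levenshtein(a, b):
    """Classic edit distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical, heavily simplified phonetic rewrite rules (not a real Portuguese encoder).
PHONETIC_RULES = [("ph", "f"), ("ss", "s"), ("ç", "s"), ("z", "s"), ("y", "i")]


def phonetic_key(word):
    """Map a word to a crude phonetic key by applying the rewrite rules in order."""
    key = word.lower()
    for src, dst in PHONETIC_RULES:
        key = key.replace(src, dst)
    return key


def joint_similarity(a, b, weight=0.5):
    """Weighted mix of plain string similarity and phonetic-key similarity."""
    string_sim = similarity(a.lower(), b.lower())
    phonetic_sim = similarity(phonetic_key(a), phonetic_key(b))
    return weight * string_sim + (1 - weight) * phonetic_sim


def best_match(word, dictionary):
    """Return the dictionary entry with the highest joint similarity to the word."""
    return max(dictionary, key=lambda entry: joint_similarity(word, entry))


drugs = ["dipirona", "paracetamol", "amoxicilina"]
match = best_match("dypirona", drugs)
```

In this toy case the misspelling "dypirona" differs from "dipirona" by one character but maps to the same phonetic key, so the joint score is higher than the plain string score.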


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0247404
Author(s):  
Akshaya V. Annapragada ◽  
Marcella M. Donaruma-Kwoh ◽  
Ananth V. Annapragada ◽  
Zbigniew A. Starosolski

Child physical abuse is a leading cause of traumatic injury and death in children. In 2017, child abuse was responsible for 1688 fatalities in the United States, of 3.5 million children referred to Child Protection Services and 674,000 substantiated victims. While large referral hospitals maintain teams trained in Child Abuse Pediatrics, smaller community hospitals often do not have such dedicated resources to evaluate patients for potential abuse. Moreover, identification of abuse has a low margin of error, as false positive identifications lead to unwarranted separations, while false negatives allow dangerous situations to continue. This context makes the consistent detection of and response to abuse difficult, particularly given subtle signs in young, non-verbal patients. Here, we describe the development of artificial intelligence algorithms that use unstructured free text in the electronic medical record (including notes from physicians, nurses, and social workers) to identify children who are suspected victims of physical abuse. Importantly, only the notes from the time of first encounter (e.g., birth, routine visit, sickness) to the last record before child protection team involvement were used. This allowed us to develop an algorithm using only information available prior to referral to the specialized child protection team. The study was performed in a multi-center referral pediatric hospital on patients screened for abuse within five different locations between 2015 and 2019. Of 1123 patients, 867 records were available after data cleaning and processing, and 55% were abuse-positive as determined by a multi-disciplinary team of clinical professionals. These electronic medical records were encoded with three natural language processing (NLP) algorithms, Bag of Words (BOW), Word Embeddings (WE), and Rules-Based (RB), and used to train multiple neural network architectures. The BOW and WE encodings utilize the full free text, while RB selects crucial phrases as identified by physicians. The best architecture was selected by average classification accuracy for the best performing model from each train-test split of a cross-validation experiment. Natural language processing coupled with neural networks detected cases of likely child abuse using only information available to clinicians prior to child protection team referral with average accuracy of 0.90±0.02 and average area under the receiver operating characteristic curve (ROC-AUC) of 0.93±0.02 for the best performing Bag of Words models. The best performing rules-based models achieved average accuracy of 0.77±0.04 and average ROC-AUC of 0.81±0.05, while a Word Embeddings strategy was severely limited by lack of representative embeddings. Importantly, the best performing model had a false positive rate of 8%, as compared to rates of 20% or higher in previously reported studies. This artificial intelligence approach can help screen patients for whom an abuse concern exists and streamline the identification of patients who may benefit from referral to a child protection team. Furthermore, this approach could be applied to develop computer-aided-diagnosis platforms for the challenging and often intractable problem of reliably identifying pediatric patients suffering from physical abuse.
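As a rough illustration of two ingredients named in this abstract, the sketch below shows a Bag of Words encoding of free-text notes and the computation of accuracy and false positive rate from binary predictions. It is an assumed minimal example, not the study's pipeline: the notes, labels, and the function names `build_vocabulary`, `encode`, and `accuracy_and_fpr` are hypothetical, and no neural network is trained here.

```python
from collections import Counter


def build_vocabulary(notes):
    """Assign an index to every distinct lowercase word, in sorted order."""
    words = sorted({w for note in notes for w in note.lower().split()})
    return {w: i for i, w in enumerate(words)}


def encode(note, vocab):
    """Bag of Words vector: one count per vocabulary word; unknown words are ignored."""
    counts = Counter(note.lower().split())
    # The dict preserves insertion order, which matches the index order above.
    return [counts.get(w, 0) for w in vocab]


def accuracy_and_fpr(y_true, y_pred):
    """Accuracy and false positive rate for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical notes and labels, invented purely for illustration.
notes = ["bruise on left arm", "routine visit no bruise"]
vocab = build_vocabulary(notes)
vector = encode("bruise bruise arm", vocab)

accuracy, fpr = accuracy_and_fpr([1, 0, 0, 1, 0], [1, 1, 0, 1, 0])
```

The false positive rate divides false positives by all true negatives plus false positives, so it is the quantity the abstract contrasts (8% versus 20% or higher), independent of how many positive cases the dataset contains.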

