GLOBAL RULE INDUCTION FOR INFORMATION EXTRACTION

2004 ◽  
Vol 13 (04) ◽  
pp. 813-828 ◽  
Author(s):  
JING XIAO ◽  
TAT-SENG CHUA ◽  
JIMIN LIU

The ability to extract desired pieces of information from natural language texts is an important task with a growing number of potential applications. This paper presents a novel pattern rule induction learning system, GRID, which emphasizes the use of global feature distribution in all of the training instances in order to make better decision on rule induction. GRID uses chunks as contextual units instead of tokens, and incorporates features at lexical, syntactical and semantic levels simultaneously. The features chosen in GRID are general and they were applied successfully to both semi-structured text and free text. Our experimental results on some publicly available webpage corpora and MUC-4 test set indicate that our approach is effective.

JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Craig H Ganoe ◽  
Weiyi Wu ◽  
Paul J Barr ◽  
William Haslett ◽  
Michelle D Dannenberg ◽  
...  

Abstract Objectives The objective of this study is to build and evaluate a natural language processing approach to identify medication mentions in primary care visit conversations between patients and physicians. Materials and Methods Eight clinicians contributed to a data set of 85 clinic visit transcripts, and 10 transcripts were randomly selected from this data set as a development set. Our approach utilizes Apache cTAKES and Unified Medical Language System controlled vocabulary to generate a list of medication candidates in the transcribed text and then performs multiple customized filters to exclude common false positives from this list while including some additional common mentions of the supplements and immunizations. Results Sixty-five transcripts with 1121 medication mentions were randomly selected as an evaluation set. Our proposed method achieved an F-score of 85.0% for identifying the medication mentions in the test set, significantly outperforming existing medication information extraction systems for medical records with F-scores ranging from 42.9% to 68.9% on the same test set. Discussion Our medication information extraction approach for primary care visit conversations showed promising results, extracting about 27% more medication mentions from our evaluation set while eliminating many false positives in comparison to existing baseline systems. We made our approach publicly available on the web as an open-source software. Conclusion Integration of our annotation system with clinical recording applications has the potential to improve patients’ understanding and recall of key information from their clinic visits, and, in turn, to positively impact health outcomes.


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Zhongyu Anna Liu ◽  
Muhammad Mamdani ◽  
Richard Aviv ◽  
Chloe Pou-Prom ◽  
Amy Yu

Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research but converting a large amount of free-text data into structured data with manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, to extract relevant stroke-related attributes from full reports of computed tomograms (CT), CT angiograms (CTA), and CT perfusion (CTP) performed at a tertiary stroke centre. Methods: We manually extracted data from full reports of 1,320 consecutive CT/CTA/CTP performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, haemorrhage, the laterality of these lesions, and ASPECT scores, all of which were used as a reference standard. Reports were then randomly split into a training set (n= 921) and validation set (n= 399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined and created through an iterative process in the training sample and then validated in the validation set. Results: The prevalence of anterior proximal occlusion was 12.3% in the dataset (n=86 left, n=72 right, and n=4 bilateral). In the training sample, CHARTextract identified this attribute with an overall accuracy of 97.3% (PPV 84.1% and NPV 99.4%, sensitivity 95.5% and specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3% and NPV 98.5%, sensitivity 90.0% and specificity 96.0%). Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate the process of data collection for stroke research. We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.


Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 107 ◽  
Author(s):  
Teruaki Hayashi ◽  
Yukio Ohsawa

This article addresses the task of inferring elements in the attributes of data. Extracting data related to our interests is a challenging task. Although data on the web can be accessed through free text queries, it is difficult to obtain results that accurately correspond to user intentions because users might not express their objects of interest using exact terms (variables, outlines of data, etc.) found in the data. In other words, users do not always have sufficient knowledge of the data to formulate an effective query. Hence, we propose a method that enables the type, format, and variable elements to be inferred as attributes of data when a natural language summary of the data is provided as a free text query. To evaluate the proposed method, we used the Data Jacket’s datasets whose metadata is written in natural language. The experimental results indicate that our method outperforms those obtained from string matching and word embedding. Applications based on this study can support users who wish to retrieve or acquire new data.


2020 ◽  
Author(s):  
Maciej Rybinski ◽  
Xiang Dai ◽  
Sonit Singh ◽  
Sarvnaz Karimi ◽  
Anthony Nguyen

BACKGROUND The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. OBJECTIVE The aim of this study is to develop automated methods that enable access to FH data through natural language processing. METHODS We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. RESULTS Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (<i>P</i><.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. CONCLUSIONS Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.


1977 ◽  
Vol 16 (03) ◽  
pp. 144-153 ◽  
Author(s):  
E. Vaccari ◽  
W. Delaney ◽  
A. Chiesa

A software system for the automatic free-text analysis and retrieval of radiological reports is presented. Such software involves: (1) automatic translation of the specific natural language in a formalized metalanguage in order to transform the radiological report in a »normalized report« analyzable by computer; (2) content processing of the normalized report to select desired information. The approach used to accomplish point (1) is described in detail referring to a specific application.


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. In this sense, we present our work related to the application of Natural Language Processing techniques, as a tool to analyze the sentiment perception of users who answered two questions from the CSQ-8 questionnaires with raw Spanish free-text. Their responses are related to mindfulness, which is a novel technique used to control stress and anxiety caused by different factors in daily life. As such, we proposed an online course where this method was applied in order to improve the quality of life of health care professionals in COVID 19 pandemic times. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To automatically perform this task, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning, so as to classify the inputs into the following three categories: negative, neutral, and positive. Due to the limited amount of data available—86 registers for the first and 68 for the second—transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using computer graphic text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that the application of NLP techniques in small amounts of data using transfer learning is able to obtain enough accuracy in sentiment analysis and text classification stages.


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 183-183
Author(s):  
Javad Razjouyan ◽  
Jennifer Freytag ◽  
Edward Odom ◽  
Lilian Dindo ◽  
Aanand Naik

Abstract Patient Priorities Care (PPC) is a model of care that aligns health care recommendations with priorities of older adults with multiple chronic conditions. Social workers (SW), after online training, document PPC in the patient’s electronic health record (EHR). Our goal is to identify free-text notes with PPC language using a natural language processing (NLP) model and to measure PPC adoption and effect on long term services and support (LTSS) use. Free-text notes from the EHR produced by trained SWs passed through a hybrid NLP model that utilized rule-based and statistical machine learning. NLP accuracy was validated against chart review. Patients who received PPC were propensity matched with patients not receiving PPC (control) on age, gender, BMI, Charlson comorbidity index, facility and SW. The change in LTSS utilization 6-month intervals were compared by groups with univariate analysis. Chart review indicated that 491 notes out of 689 had PPC language and the NLP model reached to precision of 0.85, a recall of 0.90, an F1 of 0.87, and an accuracy of 0.91. Within group analysis shows that intervention group used LTSS 1.8 times more in the 6 months after the encounter compared to 6 months prior. Between group analysis shows that intervention group has significant higher number of LTSS utilization (p=0.012). An automated NLP model can be used to reliably measure the adaptation of PPC by SW. PPC seems to encourage use of LTSS that may delay time to long term care placement.


Sign in / Sign up

Export Citation Format

Share Document