scholarly journals CardioNet: a manually curated database for artificial intelligence-based research on cardiovascular diseases

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Imjin Ahn ◽  
Wonjun Na ◽  
Osung Kwon ◽  
Dong Hyun Yang ◽  
Gyung-Min Park ◽  
...  

Abstract Background Cardiovascular diseases (CVDs) are difficult to diagnose early and have risk factors that are easy to overlook. Early prediction and personalization of treatment through the use of artificial intelligence (AI) may help clinicians and patients manage CVDs more effectively. However, to apply AI approaches to CVDs data, it is necessary to establish and curate a specialized database based on electronic health records (EHRs) and include pre-processed unstructured data. Methods To build a suitable database (CardioNet) for CVDs that can utilize AI technology, contributing to the overall care of patients with CVDs. First, we collected the anonymized records of 748,474 patients who had visited the Asan Medical Center (AMC) or Ulsan University Hospital (UUH) because of CVDs. Second, we set clinically plausible criteria to remove errors and duplication. Third, we integrated unstructured data such as readings of medical examinations with structured data sourced from EHRs to create the CardioNet. We subsequently performed natural language processing to structuralize the significant variables associated with CVDs because most results of the principal CVD-related medical examinations are free-text readings. Additionally, to ensure interoperability for convergent multi-center research, we standardized the data using several codes that correspond to the common data model. Finally, we created the descriptive table (i.e., dictionary of the CardioNet) to simplify access and utilization of data for clinicians and engineers and continuously validated the data to ensure reliability. Results CardioNet is a comprehensive database that can serve as a training set for AI models and assist in all aspects of clinical management of CVDs. It comprises information extracted from EHRs and results of readings of CVD-related digital tests. It consists of 27 tables, a code-master table, and a descriptive table. Conclusions CardioNet database specialized in CVDs was established, with continuing data collection. We are actively supporting multi-center research, which may require further data processing, depending on the subject of the study. CardioNet will serve as the fundamental database for future CVD-related research projects.

2021 ◽  
pp. 1063293X2098297
Author(s):  
Ivar Örn Arnarsson ◽  
Otto Frost ◽  
Emil Gustavsson ◽  
Mats Jirstrand ◽  
Johan Malmqvist

Product development companies collect data in form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research affirms that product developers find that current IT systems lack capabilities to accurately retrieve relevant documents with unstructured data. In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents from databases containing Engineering Change Request documents. The aim is to radically decrease the time needed to effectively search for related engineering documents, organize search results, and create labeled clusters from these documents by utilizing Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied managed to find relevant document clusters given the queries tested.


2017 ◽  
Vol 9 (1) ◽  
Author(s):  
Dino P. Rumoro ◽  
Shital C. Shah ◽  
Gillian S. Gibbs ◽  
Marilyn M. Hallock ◽  
Gordon M. Trenholme ◽  
...  

ObjectiveTo explain the utility of using an automated syndromic surveillanceprogram with advanced natural language processing (NLP) to improveclinical quality measures reporting for influenza immunization.IntroductionClinical quality measures (CQMs) are tools that help measure andtrack the quality of health care services. Measuring and reportingCQMs helps to ensure that our health care system is deliveringeffective, safe, efficient, patient-centered, equitable, and timely care.The CQM for influenza immunization measures the percentage ofpatients aged 6 months and older seen for a visit between October1 and March 31 who received (or reports previous receipt of) aninfluenza immunization. Centers for Disease Control and Preventionrecommends that everyone 6 months of age and older receive aninfluenza immunization every season, which can reduce influenza-related morbidity and mortality and hospitalizations.MethodsPatients at a large academic medical center who had a visit toan affiliated outpatient clinic during June 1 - 8, 2016 were initiallyidentified using their electronic medical record (EMR). The 2,543patients who were selected did not have documentation of influenzaimmunization in a discrete field of the EMR. All free text notes forthese patients between August 1, 2015 and March 31, 2016 wereretrieved and analyzed using the sophisticated NLP built withinGeographic Utilization of Artificial Intelligence in Real-Timefor Disease Identification and Alert Notification (GUARDIAN)– a syndromic surveillance program – to identify any mention ofinfluenza immunization. The goal was to identify additional cases thatmet the CQM measure for influenza immunization and to distinguishdocumented exceptions. The patients with influenza immunizationmentioned were further categorized by GUARDIAN NLP intoReceived, Recommended, Refused, Allergic, and Unavailable.If more than one category was applicable for a patient, they wereindependently counted in their respective categories. A descriptiveanalysis was conducted, along with manual review of a sample ofcases per each category.ResultsFor the 2,543 patients who did not have influenza immunizationdocumentation in a discrete field of the EMR, a total of 78,642 freetext notes were processed using GUARDIAN. Four hundred fiftythree (17.8%) patients had some mention of influenza immunizationwithin the notes, which could potentially be utilized to meet the CQMinfluenza immunization requirement. Twenty two percent (n=101)of patients mentioned already having received the immunizationwhile 34.7% (n=157) patients refused it during the study time frame.There were 27 patients with the mention of influenza immunization,who could not be differentiated into a specific category. The numberof patients placed into a single category of influenza immunizationwas 351 (77.5%), while 75 (16.6%) were classified into more thanone category. See Table 1.ConclusionsUsing GUARDIAN’s NLP can identify additional patients whomay meet the CQM measure for influenza immunization or whomay be exempt. This tool can be used to improve CQM reportingand improve overall influenza immunization coverage by using it toalert providers. Next steps involve further refinement of influenzaimmunization categories, automating the process of using the NLPto identify and report additional cases, as well as using the NLP forother CQMs.Table 1. Categorization of influenza immunization documentation within freetext notes of 453 patients using NLP


2019 ◽  
Vol 33 (1) ◽  
pp. 34-38 ◽  
Author(s):  
Noah H. Crampton

Studies show that clinicians are increasingly burning out in large part from the clerical burden associated with using Electronic Medical Record (EMR) systems. At the same time, recently developed health data analytic algorithms struggle with poor quality free-text entered data in these systems. We developed AutoScribe using artificial intelligence–based natural language processing tools to automate these clerical tasks and to output high-quality EMR data. In this article, we describe the benefits and drawbacks of our technology. Furthermore, we describe how we are positioning our company’s culture within the existing healthcare system and suggest steps leaders of the system should consider in order to ensure that potentially transformative artificial intelligence–based technologies like ours are optimally adopted.


Author(s):  
Carlos Del Rio-Bermudez ◽  
Ignacio H. Medrano ◽  
Laura Yebes ◽  
Jose Luis Poveda

Abstract The digitalization of health and medicine and the growing availability of electronic health records (EHRs) has encouraged healthcare professionals and clinical researchers to adopt cutting-edge methodologies in the realms of artificial intelligence (AI) and big data analytics to exploit existing large medical databases. In Hospital and Health System pharmacies, the application of natural language processing (NLP) and machine learning to access and analyze the unstructured, free-text information captured in millions of EHRs (e.g., medication safety, patients’ medication history, adverse drug reactions, interactions, medication errors, therapeutic outcomes, and pharmacokinetic consultations) may become an essential tool to improve patient care and perform real-time evaluations of the efficacy, safety, and comparative effectiveness of available drugs. This approach has an enormous potential to support share-risk agreements and guide decision-making in pharmacy and therapeutics (P&T) Committees.


2021 ◽  
Vol 8 ◽  
Author(s):  
Han Qian ◽  
Bin Dong ◽  
Jia-jun Yuan ◽  
Fan Yin ◽  
Zhao Wang ◽  
...  

Artificial intelligence (AI) has been deeply applied in the medical field and has shown broad application prospects. Pre-consultation system is an important supplement to the traditional face-to-face consultation. The combination of the AI and the pre-consultation system can help to raise the efficiency of the clinical work. However, it is still challenging for the AI to analyze and process the complicated electronic health record (EHR) data. Our pre-consultation system uses an automated natural language processing (NLP) system to communicate with the patients through the mobile terminals, applying the deep learning (DL) techniques to extract the symptomatic information, and finally outputs the structured electronic medical records. From November 2019 to May 2020, a total of 2,648 pediatric patients used our model to provide their medical history and get the primary diagnosis before visiting the physicians in the outpatient department of the Shanghai Children's Medical Center. Our task is to evaluate the ability of the AI and doctors to obtain the primary diagnosis and to analyze the effect of the consistency between the medical history described by our model and the physicians on the diagnostic performance. The results showed that if we do not consider whether the medical history recorded by the AI and doctors was consistent or not, our model performed worse compared to the physicians and had a lower average F1 score (0.825 vs. 0.912). However, when the chief complaint or the history of present illness described by the AI and doctors was consistent, our model had a higher average F1 score and was closer to the doctors. Finally, when the AI had the same diagnostic conditions with doctors, our model achieved a higher average F1 score (0.931) compared to the physicians (0.92). This study demonstrated that our model could obtain a more structured medical history and had a good diagnostic logic, which would help to improve the diagnostic accuracy of the outpatient doctors and reduce the misdiagnosis and missed diagnosis. But, our model still needs a good deal of training to obtain more accurate symptomatic information.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Alistair E. W. Johnson ◽  
Tom J. Pollard ◽  
Seth J. Berkowitz ◽  
Nathaniel R. Greenbaum ◽  
Matthew P. Lungren ◽  
...  

AbstractChest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient’s chest, but requires specialized training for proper interpretation. With the advent of high performance general purpose computer vision algorithms, the accurate automated analysis of chest radiographs is becoming increasingly of interest to researchers. Here we describe MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011–2016. Each imaging study can contain one or more images, usually a frontal view and a lateral view. A total of 377,110 images are available in the dataset. Studies are made available with a semi-structured free-text radiology report that describes the radiological findings of the images, written by a practicing radiologist contemporaneously during routine clinical care. All images and reports have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.


10.2196/22806 ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. e22806
Author(s):  
Zhijun Yin ◽  
Yongtai Liu ◽  
Allison B McCoy ◽  
Bradley A Malin ◽  
Patricia R Sengstack

Background Documentation burden is a common problem with modern electronic health record (EHR) systems. To reduce this burden, various recording methods (eg, voice recorders or motion sensors) have been proposed. However, these solutions are in an early prototype phase and are unlikely to transition into practice in the near future. A more pragmatic alternative is to directly modify the implementation of the existing functionalities of an EHR system. Objective This study aims to assess the nature of free-text comments entered into EHR flowsheets that supplement quantitative vital sign values and examine opportunities to simplify functionality and reduce documentation burden. Methods We evaluated 209,055 vital sign comments in flowsheets that were generated in the Epic EHR system at the Vanderbilt University Medical Center in 2018. We applied topic modeling, as well as the natural language processing Clinical Language Annotation, Modeling, and Processing software system, to extract generally discussed topics and detailed medical terms (expressed as probability distribution) to investigate the stories communicated in these comments. Results Our analysis showed that 63.33% (6053/9557) of the users who entered vital signs made at least one free-text comment in vital sign flowsheet entries. The user roles that were most likely to compose comments were registered nurse, technician, and licensed nurse. The most frequently identified topics were the notification of a result to health care providers (0.347), the context of a measurement (0.307), and an inability to obtain a vital sign (0.224). There were 4187 unique medical terms that were extracted from 46,029 (0.220) comments, including many symptom-related terms such as “pain,” “upset,” “dizziness,” “coughing,” “anxiety,” “distress,” and “fever” and drug-related terms such as “tylenol,” “anesthesia,” “cannula,” “oxygen,” “motrin,” “rituxan,” and “labetalol.” Conclusions Considering that flowsheet comments are generally not displayed or automatically pulled into any clinical notes, our findings suggest that the flowsheet comment functionality can be simplified (eg, via structured response fields instead of a text input dialog) to reduce health care provider effort. Moreover, rich and clinically important medical terms such as medications and symptoms should be explicitly recorded in clinical notes for better visibility.


Author(s):  
Arron S Lacey ◽  
Beata Fonferko-Shadrach ◽  
Ronan A Lyons ◽  
Mike P Kerr ◽  
David V Ford ◽  
...  

ABSTRACT BackgroundFree text documents in healthcare settings contain a wealth of information not captured in electronic healthcare records (EHRs). Epilepsy clinic letters are an example of an unstructured data source containing a large amount of intricate disease information. Extracting meaningful and contextually correct clinical information from free text sources, to enhance EHRs, remains a significant challenge. SCANR (Swansea University Collaborative in the Analysis of NLP Research) was set up to use natural language processing (NLP) technology to extract structured data from unstructured sources. IBM Watson Content Analytics software (ICA) uses NLP technology. It enables users to define annotations based on dictionaries and language characteristics to create parsing rules that highlight relevant items. These include clinical details such as symptoms and diagnoses, medication and test results, as well as personal identifiers.   ApproachTo use ICA to build a pipeline to accurately extract detailed epilepsy information from clinic letters. MethodsWe used ICA to retrieve important epilepsy information from 41 pseudo-anonymized unstructured epilepsy clinic letters. The 41 letters consisted of 13 ‘new’ and 28 ‘follow-up’ letters (for 15 different patients) written by 12 different doctors in different styles. We designed dictionaries and annotators to enable ICA to extract epilepsy type (focal, generalized or unclassified), epilepsy cause, age of onset, investigation results (EEG, CT and MRI), medication, and clinic date. Epilepsy clinicians assessed the accuracy of the pipeline. ResultsThe accuracy (sensitivity, specificity) of each concept was: epilepsy diagnosis 98% (97%, 100%), focal epilepsy 100%, generalized epilepsy 98% (93%, 100%), medication 95% (93%, 100%), age of onset 100% and clinic date 95% (95%, 100%). Precision and recall for each concept were respectively, 98% and 97% for epilepsy diagnosis, 100% each for focal epilepsy, 100% and 93% for generalized epilepsy, 100% each for age of onset, 100% and 93% for medication, 100% and 96% for EEG results, 100% and 83% for MRI scan results, and 100% and 95% for clinic date. Conclusions ICA is capable of extracting detailed, structured epilepsy information from unstructured clinic letters to a high degree of accuracy. This data can be used to populate relational databases and be linked to EHRs. Researchers can build in custom rules to identify concepts of interest from letters and produce structured information. We plan to extend our work to hundreds and then thousands of clinic letters, to provide phenotypically rich epilepsy data to link with other anonymised, routinely collected data.


Author(s):  
Marja Härkänen ◽  
Kaisa Haatainen ◽  
Katri Vehviläinen-Julkunen ◽  
Merja Miettinen

The purpose of this study was to describe incident reporters’ views identified by artificial intelligence concerning the prevention of medication incidents that were assessed, causing serious or moderate harm to patients. The information identified the most important risk management areas in these medication incidents. This was a retrospective record review using medication-related incident reports from one university hospital in Finland between January 2017 and December 2019 (n = 3496). Of these, incidents that caused serious or moderate harm to patients (n = 137) were analysed using artificial intelligence. Artificial intelligence classified reporters’ views on preventing incidents under the following main categories: (1) treatment, (2) working, (3) practices, and (4) setting and multiple sub-categories. The following risk management areas were identified: (1) verification, documentation and up-to-date drug doses, drug lists and other medication information, (2) carefulness and accuracy in managing medications, (3) ensuring the flow of information and communication regarding medication information and safeguarding continuity of patient care, (4) availability, update and compliance with instructions and guidelines, (5) multi-professional cooperation, and (6) adequate human resources, competence and suitable workload. Artificial intelligence was found to be useful and effective to classifying text-based data, such as the free text of incident reports.


Author(s):  
Ivar Örn Arnarsson ◽  
Otto Frost ◽  
Emil Gustavsson ◽  
Daniel Stenholm ◽  
Mats Jirstrand ◽  
...  

AbstractProduct development companies are collecting data in form of Engineering Change Requests for logged design issues and Design Guidelines to accumulate best practices. These documents are rich in unstructured data (e.g., free text) and previous research has pointed out that product developers find current it systems lacking capabilities to accurately retrieve relevant documents with unstructured data. In this research we compare the performance of Search Engine & Natural Language Processing algorithms in order to find fast related documents from two databases with Engineering Change Request and Design Guideline documents. The aim is to turn hours of manual documents searching into seconds by utilizing such algorithms to effectively search for related engineering documents and rank them in order of significance. Domain knowledge experts evaluated the results and it shows that the models applied managed to find relevant documents with up to 90% accuracy of the cases tested. But accuracy varies based on selected algorithm and length of query.


Sign in / Sign up

Export Citation Format

Share Document