Clinical Validation of the UK MS Register Minimal Dataset utilising Natural Language Processing

Author(s):  
Rod Middleton ◽  
Ashley Akbari ◽  
Hazel Lockhart-Jones ◽  
Jemma Jones ◽  
Charlotte Owen ◽  
...  

ABSTRACT Objectives: The UK MS Register is a research project that aims to capture real-world data about living with Multiple Sclerosis (MS) in the UK. At its launch in 2011, the identified data sources were: directly from People with MS (PwMS) via the internet, from NHS treatment centres via ‘traditional’ database capture, and by linkage to routine datasets from the SAIL databank. Data received from the NHS, though ‘gold standard’ in terms of diagnosis, depend on clinical staff finding both the time and the information to enter into a clinical system. System implementations across the NHS are variable, as is clinical time. Therefore, we looked to other complementary methodologies. Approach: The Clix enrich natural language processing (NLP) software, which matches clinical phrases against SNOMED-CT, was chosen to see if it could capture a portion of the MS Register minimum clinical dataset. 40 letters from 2 NHS Trusts, covering 28 patients, were loaded. The letters covered a mix of MS patients with differing disease subtypes and were dictated by neurologists, specialist general practitioners and MS specialist nurses. 20 of the letters were in docx format and 20 were PDFs. The letters were first parsed by a domain expert for clinical content and scored by data item for sensitivity and specificity. Next, the output from the software was scored by another researcher to determine whether the 12 relevant clinical concepts from the Register dataset had been elicited. Lastly, a ruleset was created to look for particular clinical concepts and scored in the same way. Results: Of the 40 letters, one failed to load; the rest were analysed for the specific data items. Date-related items were clearly challenging, with only 7% of appointment dates being matched and 22% of dates of diagnosis. MS type (93.3%) and EDSS score (93.75%) were well recognised. Additionally, symptoms of MS that would be poorly reported in traditional databases were recognised, with fatigue being well highlighted (78.5%), as were gait and walking issues (68.7%). Of concern were a number of false positives for DMTs, with 15% of patients being identified as being on a DMT when one was merely being ‘considered’. Conclusion: The NLP pathway could be extremely useful for obtaining hard-to-capture clinical data for the Register. Further work is needed to reduce errors, but even with the current minimal configuration it is possible to ascertain MS type, a functional score of MS, current medication and potentially disabling symptomology within the condition.
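The scoring step described here, comparing software output against expert-parsed letters per data item, can be illustrated with a short sketch. Everything below (record layout, item names) is an assumption for illustration; the abstract does not describe the Clix enrich output format.

```python
# Hypothetical sketch: per-data-item sensitivity of NLP output against an
# expert gold standard. Record layouts and item names are invented.
from collections import defaultdict

def per_item_sensitivity(gold, extracted):
    """gold / extracted: {letter_id: set of (item, value) pairs}."""
    tp = defaultdict(int)  # expert found it and the software matched it
    fn = defaultdict(int)  # expert found it but the software missed it
    for letter_id, gold_pairs in gold.items():
        found = extracted.get(letter_id, set())
        for item, value in gold_pairs:
            if (item, value) in found:
                tp[item] += 1
            else:
                fn[item] += 1
    return {item: tp[item] / (tp[item] + fn[item])
            for item in set(tp) | set(fn)}

gold = {"letter_01": {("ms_type", "RRMS"), ("edss", "6.0")}}
out = {"letter_01": {("ms_type", "RRMS")}}
print(per_item_sensitivity(gold, out))  # ms_type -> 1.0, edss -> 0.0
```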

Heart ◽  
2021 ◽  
pp. heartjnl-2021-319769
Author(s):  
Meghan Reading Turchioe ◽  
Alexander Volodarskiy ◽  
Jyotishman Pathak ◽  
Drew N Wright ◽  
James Enlou Tcheng ◽  
...  

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies that were not published in English, lacked a description of NLP methods, were not focused on cardiac disease, or were duplicates were excluded. Two independent reviewers extracted general study information, clinical details and NLP details, and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.
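Most of the reviewed studies flagged patients with a given diagnosis using rule-based methods; a minimal, purely illustrative version of that idea (our own regex rules, not taken from any reviewed study) is sketched below, including the naive negation handling such rules typically need.

```python
import re

# Illustrative rule-based case finding: flag notes mentioning heart failure
# unless the nearest preceding context negates it. Rules are invented for
# illustration and are far simpler than production negation handling.
DIAGNOSIS = re.compile(r"\b(heart failure|hfref|hfpef|chf)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]{0,40}$",
                      re.IGNORECASE)

def mentions_diagnosis(note: str) -> bool:
    for match in DIAGNOSIS.finditer(note):
        if not NEGATION.search(note[:match.start()]):
            return True  # at least one non-negated mention
    return False

print(mentions_diagnosis("Admitted with acute CHF exacerbation."))      # True
print(mentions_diagnosis("Patient denies symptoms of heart failure."))  # False
```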


Author(s):  
Kanza Noor Syeda ◽  
Syed Noorulhassan Shirazi ◽  
Syed Asad Ali Naqvi ◽  
Howard J Parkinson ◽  
Gary Bamford

Due to modern powerful computing and the explosion in data availability and advanced analytics, there should be opportunities to use a Big Data approach to proactively identify high-risk scenarios on the railway. In this chapter, we examine the need for developing machine intelligence to identify heightened risk on the railway. In doing so, we explain the potential of a new data-driven approach for the railway; we then focus the rest of the chapter on Natural Language Processing (NLP) and its potential for analysing accident data. We review and analyse investigation reports of railway accidents in the UK, published by the Rail Accident Investigation Branch (RAIB), aiming to reveal the presence of entities that are informative of causes and failures, such as human, technical and external factors. We give an overview of a framework based on NLP and machine learning to analyse the raw text of RAIB reports, which would assist risk and incident analysis experts in studying the causal relationships between causes and failures, towards improving overall safety in the rail industry.
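One hedged sketch of the cause/failure entity spotting the chapter describes, using a keyword gazetteer rather than the authors' actual NLP-and-machine-learning framework, and with invented keyword lists:

```python
# Illustrative gazetteer tagger for RAIB-style report text. The keyword
# lists are invented for illustration, not the chapter's actual lexicon.
CAUSE_LEXICON = {
    "human": ["driver error", "fatigue", "miscommunication",
              "signal passed at danger"],
    "technical": ["points failure", "brake fault", "track defect", "axle"],
    "external": ["flooding", "landslip", "trespass", "bridge strike"],
}

def tag_causes(report_text: str) -> dict:
    text = report_text.lower()
    return {
        category: [term for term in terms if term in text]
        for category, terms in CAUSE_LEXICON.items()
    }

report = "The investigation found a points failure compounded by driver fatigue."
print(tag_causes(report))
# {'human': ['fatigue'], 'technical': ['points failure'], 'external': []}
```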


Author(s):  
S. Jeffrey ◽  
J. Richards ◽  
F. Ciravegna ◽  
S. Waller ◽  
S. Chapman ◽  
...  

This paper describes ‘Archaeotools’, a major e-Science project in archaeology. The aim of the project is to use faceted classification and natural language processing to create an advanced infrastructure for archaeological research. The project aims to integrate over 1×10⁶ structured database records referring to archaeological sites and monuments in the UK, with information extracted from semi-structured grey literature reports, and unstructured antiquarian journal accounts, in a single faceted browser interface. The project has illuminated the variable level of vocabulary control and standardization that currently exists within national and local monument inventories. Nonetheless, it has demonstrated that the relatively well-defined ontologies and thesauri that exist in archaeology mean that a high level of success can be achieved using information extraction techniques. This has great potential for unlocking and making accessible the information held in grey literature and antiquarian accounts, and has lessons for allied disciplines.
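The extraction step relies on archaeology's relatively well-controlled thesauri; below is a toy illustration of thesaurus-based term matching, where the tiny vocabulary is invented and stands in for the real monument-type thesauri.

```python
# Toy thesaurus lookup: map free-text mentions in a grey-literature report
# onto preferred terms, including known synonyms. Entries are invented.
THESAURUS = {
    "round barrow": "ROUND BARROW",
    "bowl barrow": "ROUND BARROW",   # synonym -> preferred term
    "tumulus": "ROUND BARROW",
    "hill fort": "HILLFORT",
    "hillfort": "HILLFORT",
}

def extract_monument_terms(text: str) -> set:
    text = text.lower()
    return {preferred for surface, preferred in THESAURUS.items()
            if surface in text}

report = "Excavation revealed a ploughed-out tumulus near the hillfort rampart."
print(extract_monument_terms(report))  # {'ROUND BARROW', 'HILLFORT'}
```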


2021 ◽  
Author(s):  
Andrew L Walker ◽  
Cheri Watson ◽  
Ryan Butcher ◽  
Mark Yandell ◽  
...  

Background: Real-world evidence derived from the electronic medical record (EMR) is increasingly prevalent. How best to ascertain cardiovascular outcomes from EMRs is unknown. We sought to validate a commercially available natural language processing (NLP) software to extract bleeding events. Methods: We included patients with atrial fibrillation and cancer seen at our cancer center from 1/1/2016 to 12/31/2019. A query set based on SNOMED CT expressions was created to represent bleeding from 11 different organ systems. We ran the query against the clinical notes and randomly selected a sample of notes for physician validation. The primary outcome was the positive predictive value (PPV) of the software to identify bleeding events, stratified by organ system. Results: We included 1370 patients with a mean age of 72 years (SD 1.5); 35% were female. We processed 66,130 notes; the NLP software identified 6522 notes, covering 654 unique patients, with possible bleeding events. Among 1269 randomly selected notes, the PPV of the software ranged from 0.921 for neurologic bleeds to 0.571 for OB/GYN bleeds. Patterns related to false positive bleeding events identified by the software included historic bleeds, hypothetical bleeds, missed negatives, and word errors. Conclusions: NLP may provide an alternative for population-level screening for bleeding outcomes in cardiovascular studies. Human validation is still needed, but an NLP-driven screening approach may improve efficiency.
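The primary outcome here is positive predictive value, PPV = TP / (TP + FP), computed within each organ system over the physician-validated notes. A minimal sketch with an invented validation sample follows; the 93/101 split is chosen only to reproduce the reported 0.921 for neurologic bleeds.

```python
# Sketch: PPV per organ system from physician-validated notes. Each record
# is (organ_system, physician_confirmed) for a note the NLP software
# flagged as containing a bleeding event. The sample data are invented.
from collections import defaultdict

def ppv_by_system(validated_notes):
    flagged = defaultdict(int)    # notes flagged by NLP (TP + FP)
    confirmed = defaultdict(int)  # flagged notes confirmed by physician (TP)
    for organ_system, physician_confirmed in validated_notes:
        flagged[organ_system] += 1
        if physician_confirmed:
            confirmed[organ_system] += 1
    return {s: confirmed[s] / flagged[s] for s in flagged}

sample = [("neurologic", True)] * 93 + [("neurologic", False)] * 8
print(ppv_by_system(sample))  # {'neurologic': 0.9207920792079208} ~ 0.921
```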


2021 ◽  
Vol 1 ◽  
pp. 110
Author(s):  
Irene Buselli ◽  
Luca Oneto ◽  
Carlo Dambra ◽  
Christian Verdonk Gallego ◽  
Miguel García Martínez ◽  
...  

Background: The air traffic management (ATM) system has historically coped with a global increase in traffic demand ultimately leading to increased operational complexity. When dealing with the impact of this increasing complexity on system safety, it is crucial to automatically analyse the loss of separation (LoS) using tools able to extract meaningful and actionable information from safety reports. Current research in this field mainly exploits natural language processing (NLP) to categorise the reports, with the limitations that the considered categories need to be manually annotated by experts and that general taxonomies are seldom exploited. Methods: To address the current gaps, the authors propose to perform exploratory data analysis on safety reports combining state-of-the-art techniques like topic modelling and clustering, and then to develop an algorithm able to extract the Toolkit for ATM Occurrence Investigation (TOKAI) taxonomy factors from the free-text safety reports based on syntactic analysis. TOKAI is a general taxonomy developed by EUROCONTROL and intended to become a standard and harmonised approach to future investigations. Results: Leveraging the LoS events reported in the public databases of the Comisión de Estudio y Análisis de Notificaciones de Incidentes de Tránsito Aéreo and the United Kingdom Airprox Board, the authors show how their proposal is able to automatically extract meaningful and actionable information from safety reports and to classify them according to the TOKAI taxonomy. The quality of the approach is also indirectly validated by checking the connection between the identified factors and the main contributor of the incidents. Conclusions: The authors' results are a promising first step toward the full automation of a general analysis of LoS reports, supported by results on real-world data coming from two different sources. In the future, the authors' proposal could be extended to other taxonomies or tailored to identify factors to be included in the safety taxonomies.
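As a rough sketch of the exploratory topic-modelling step (bag-of-words counts plus LDA via scikit-learn; this is not the authors' TOKAI extraction algorithm, and the four report snippets are invented stand-ins for CEANITA and UK Airprox Board free text):

```python
# Minimal topic-modelling sketch over toy safety-report snippets using
# scikit-learn (pip install scikit-learn). Real reports are far longer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    "loss of separation after late descent clearance in terminal airspace",
    "pilot did not follow atc instruction, separation minima infringed",
    "airprox between glider and commercial traffic outside controlled airspace",
    "controller workload high, coordination between sectors delayed",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(reports)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the five highest-weighted terms for each discovered topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```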


2021 ◽  
pp. 193229682110008
Author(s):  
Alexander Turchin ◽  
Luisa F. Florez Builes

Background: Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are “locked” in narrative format. Natural language processing (NLP) technology offers a solution for analysis of narrative electronic data. Methods: We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included. Results: We included 38 studies in the analysis. The majority (24; 63.2%) described only development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of studies focused on identification of patients with diabetes; the rest covered a broad range of subjects that included hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy and others. The mean F1 score for all studies where it was available was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving delivery of diabetes care. Conclusion: Research in NLP technology to study diabetes is growing quickly, although challenges (e.g. in analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and improving quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between developers and end-users of natural language processing tools as well as by broader sharing of the tools themselves and related resources.
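For reference, the F1 score summarised here is the harmonic mean of precision and recall; the arithmetic is shown below with invented counts chosen to land on the reported mean of 0.882.

```python
# F1 = 2 * precision * recall / (precision + recall); counts are invented.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 150 true positives, 20 false positives, 20 false negatives:
print(round(f1_score(150, 20, 20), 3))  # 0.882, the mean reported above
```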


2019 ◽  
Vol 26 (1) ◽  
pp. e100009 ◽  
Author(s):  
Christopher Pearce ◽  
Adam McLeod ◽  
Jon Patrick ◽  
Jason Ferrigi ◽  
Michael Bainbridge ◽  
...  

Background: Data, particularly ‘big’ data, are increasingly being used for research in health. Using data from electronic medical records optimally requires coded data, but not all systems produce coded data. Objective: To design a suitable, accurate method for converting large volumes of narrative diagnoses from Australian general practice records to codify them into SNOMED CT-AU. Such codification will make them clinically useful for aggregation for population health and research purposes. Method: The developed method consisted of using natural language processing to automatically code the texts, followed by a manual process to correct codes and subsequent natural language processing re-computation. These steps were repeated for four iterations until 95% of the records were coded. The coded data were then aggregated into classes considered to be useful for population health analytics. Results: Coding the data effectively covered 95% of the corpus. Problems with the use of SNOMED CT-AU were identified and protocols for creating consistent coding were created. These protocols can be used to guide further development of SNOMED CT-AU (SCT). The coded values will be immensely useful for the development of population health analytics for Australia, and the lessons learnt are applicable elsewhere.
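The iterate-code-correct-recompute loop might be sketched as below. The function names and the toy 70%-per-pass coder are hypothetical; the abstract does not name the NLP engine or its interface.

```python
# Hypothetical sketch of the iterate-until-95%-coded workflow. nlp_code()
# and manual_review() stand in for the NLP engine and human correction.
TARGET_COVERAGE = 0.95

def codify(records, nlp_code, manual_review, max_iterations=4):
    coded = {}
    for iteration in range(max_iterations):
        uncoded = [r for r in records if r not in coded]
        coded.update(nlp_code(uncoded))  # automatic coding pass
        coded = manual_review(coded)     # correct or reject bad codes
        coverage = len(coded) / len(records)
        print(f"iteration {iteration + 1}: {coverage:.1%} coded")
        if coverage >= TARGET_COVERAGE:
            break
    return coded

records = [f"rec{i}" for i in range(100)]
demo_nlp = lambda rs: {r: "SCT-0000" for r in rs[: int(len(rs) * 0.7)]}
demo_review = lambda coded: coded  # toy review that accepts everything
codify(records, demo_nlp, demo_review)
# iteration 1: 70.0% coded / iteration 2: 91.0% / iteration 3: 97.0%
```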


2021 ◽  
Author(s):  
Melissa P. Resnick ◽  
Frank LeHouillier ◽  
Steven H. Brown ◽  
Keith E. Campbell ◽  
Diane Montella ◽  
...  

Objective: One important concept in informatics is data that meet the principles of Findability, Accessibility, Interoperability and Reusability (FAIR). Standards, such as terminologies (findability), assist with important tasks like interoperability, Natural Language Processing (NLP) (accessibility) and decision support (reusability). One terminology, Solor, integrates SNOMED CT, LOINC and RxNorm. We describe Solor, HL7 Analysis Normal Form (ANF), and their use with the high definition natural language processing (HD-NLP) program. Methods: We used HD-NLP to process 694 clinical narratives previously modeled by human experts into Solor and ANF. We compared HD-NLP output to the expert gold standard for 20% of the sample. Each clinical statement was judged “correct” if the HD-NLP output matched the ANF structure and Solor concepts, or “incorrect” if any ANF structure or Solor concepts were missing or incorrect. Judgements were summed to give totals for “correct” and “incorrect”. Results: 113 statements (80.7%) were judged correct, 26 (18.6%) incorrect, and 1 produced an error. Inter-rater reliability was 97.5%, with a Cohen’s kappa of 0.948. Conclusion: The HD-NLP software provides useable complex standards-based representations for important clinical statements designed to drive clinical decision support (CDS).
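Inter-rater agreement is summarised here with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement; a minimal two-rater computation over invented judgement labels is sketched below.

```python
# Cohen's kappa for two raters; the label sequences below are invented.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c]
                     for c in set(freq_a) | set(freq_b)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

a = ["correct", "correct", "incorrect", "correct", "incorrect"]
b = ["correct", "correct", "incorrect", "incorrect", "incorrect"]
print(round(cohens_kappa(a, b), 3))  # 0.615
```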

