Clinical Text De-identification and Other Large Scale Processing Tasks in Resource Constrained Environments

Author(s):  
Richard Jackson ◽  
Richard Dobson ◽  
Robert Stewart

ABSTRACT Objectives: Clinical text de-identification is a common requirement of the ‘enclave’ governance model of ethical EHR research. However, little consideration is often given to the engineering task required to scale these approaches across the hundreds of millions of clinical documents containing personal identifiers that reside in the data repositories of a typical NHS Trust. Similarly, natural language processing is an increasingly important field of clinical data science, yet it requires fault-tolerant approaches to data processing. This work concerns the development of “turbo-laser”, a distributed document processing architecture based on the popular, ‘battle hardened’ Spring Batch framework, an industry standard for large scale processing tasks. Approach: Using Spring Batch, we developed a highly scalable unstructured data processing framework built on the concept of remote partitioning. Remote partitioning allows processing tasks to be offloaded to any and all computers in a network. With this approach, it is possible to harness the entire compute capacity of an organisation, whether it is an office of 15 desktop PCs that go unused overnight or a compute cluster of a thousand processors. This method is especially valuable in the NHS, where the provision of sufficient compute for large scale analytics is often hindered by a lack of available hardware, or by difficulties in navigating technical governance policies ill-equipped for the demands of modern data science. Results: Turbo-laser was developed with the processing challenges common in the NHS in mind. Currently, four types of ‘job’ are available: de-identification using the Cognition algorithm; generic GATE output; text extraction from binary files such as MS Office, PDF and scanned documents; and a document re-compiler to deal with EHR legacy issues. Examples of turbo-laser usage include processing 9 million binary documents on modest hardware within 48 hours.
Conclusion: Turbo-laser is an enterprise-grade processing tool, in keeping with the software engineering pattern of ‘batch processing’ that has long been at the forefront of the informatics movement. As an open source project, it is hoped that others will contribute to and extend its principles, lowering the barrier to large scale data processing throughout the NHS.
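The remote-partitioning idea described above can be illustrated in miniature. The sketch below is not turbo-laser's actual Spring Batch implementation (which is Java and delegates partitions to remote workers over a message channel); it is a minimal Python analogue, with hypothetical function names, showing how a document ID range is cut into partitions that independent workers can process in parallel:

```python
# Illustrative sketch only - not the Spring Batch implementation.
# A column-range partitioner splits [min_id, max_id] into grid_size
# contiguous ID ranges; each range becomes an independent work unit.
from concurrent.futures import ProcessPoolExecutor


def make_partitions(min_id, max_id, grid_size):
    """Return (start, end) ID ranges that tile [min_id, max_id]."""
    span = (max_id - min_id + grid_size) // grid_size  # ceiling division
    parts = []
    start = min_id
    while start <= max_id:
        end = min(start + span - 1, max_id)
        parts.append((start, end))
        start = end + 1
    return parts


def process_partition(part):
    """Stand-in for a worker step (e.g. de-identifying each document
    whose ID falls in the range); returns the count handled."""
    start, end = part
    return end - start + 1


if __name__ == "__main__":
    partitions = make_partitions(1, 9_000_000, grid_size=8)
    with ProcessPoolExecutor(max_workers=8) as pool:
        processed = sum(pool.map(process_partition, partitions))
    print(processed)  # every document covered exactly once
```

In the real framework the pool of local processes is replaced by remote machines, which is what lets idle desktop PCs or a cluster pick up partitions of the same job.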

2021 ◽  
Author(s):  
Min Chen

Abstract Deep learning (DL) techniques, more specifically Convolutional Neural Networks (CNNs), have become increasingly popular in advancing the field of data science and have had great success in a wide array of applications, including computer vision, speech recognition and natural language processing. However, training CNNs is computationally intensive and costly, especially when the dataset is huge. To overcome these obstacles, this paper takes advantage of distributed frameworks and cloud computing to develop a parallel CNN algorithm. MapReduce is a scalable and fault-tolerant data processing tool developed to provide significant improvements in large-scale data-intensive applications in clusters. A MapReduce-based CNN (MCNN) is developed in this work to tackle the task of image classification. In addition, the proposed MCNN adds dropout layers to the networks to tackle the overfitting problem. The implementation of MCNN, as well as how the proposed algorithm accelerates learning, is examined and demonstrated through experiments. Results reveal high classification accuracy and significant improvements in speedup, scaleup and sizeup compared to the standard algorithms.
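The MapReduce pattern underlying MCNN can be sketched with a toy task (the names and the task below are ours for illustration, not taken from the MCNN implementation): mappers each see one data shard and emit a partial aggregate, and a reducer merges the partials, just as mappers would combine partial gradients in data-parallel CNN training:

```python
# Conceptual MapReduce sketch (illustrative, not the MCNN code):
# mappers emit partial (sum, count) statistics per shard; the reducer
# merges them, so the result equals a single-machine computation.
from functools import reduce


def map_shard(shard):
    # Each mapper sees only its own shard of the data.
    return (sum(shard), len(shard))


def reduce_pair(a, b):
    # Merging partial aggregates is associative, so reduction
    # order (and hence worker scheduling) does not matter.
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(shards):
    total, count = reduce(reduce_pair, map(map_shard, shards))
    return total / count


shards = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
print(distributed_mean(shards))  # 3.5, identical to the single-machine mean
```

Averaging per-shard gradients in mini-batch training has exactly this shape, which is why the pattern parallelises CNN training without changing the result of each update.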


2008 ◽  
Vol 25 (5) ◽  
pp. 287-300 ◽  
Author(s):  
B. Martin ◽  
A. Al‐Shabibi ◽  
S.M. Batraneanu ◽  
Ciobotaru ◽  
G.L. Darlea ◽  
...  

2014 ◽  
Vol 26 (6) ◽  
pp. 1316-1331 ◽  
Author(s):  
Gang Chen ◽  
Tianlei Hu ◽  
Dawei Jiang ◽  
Peng Lu ◽  
Kian-Lee Tan ◽  
...  

2018 ◽  
Vol 7 (2.31) ◽  
pp. 240
Author(s):  
S Sujeetha ◽  
Veneesa Ja ◽  
K Vinitha ◽  
R Suvedha

In the existing scenario, a patient has to go to the hospital to take the necessary tests, consult a doctor and buy prescribed medicines, or use specified healthcare applications. Hence, time is wasted at hospitals and in medical shops, and in the case of healthcare applications, face-to-face interaction with the doctor is not available. The downsides of the existing scenario can be addressed by Medimate: an ailment diffusion control system with real-time large scale data processing. The purpose of Medimate is to establish a teleconference medical system that can be used in remote areas. Medimate is configured for better diagnosis and medical treatment for rural people. The system is fitted with a heart beat sensor, temperature sensor, ultrasonic sensor and load cell to monitor the patient’s health parameters, and voice instructions are provided for easier access. An application enabling video and voice communication with the doctor through a camera and headphones is installed at both ends. The doctor examines the patient and prescribes the medicines, and the medical dispenser delivers medicine to the patient as per the prescription. A QR code is generated for each prescription by Medimate, and that QR code can be reused for repeat medical conditions in the future. Medical details are updated on the server periodically.
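The abstract does not specify how Medimate encodes a prescription into its QR code, so the following is a hypothetical sketch of one workable scheme: serialise the prescription deterministically and derive a short code that could be embedded in the QR image, so the same prescription is recognised when the code is scanned again for a repeat condition. All names here (`prescription_code`, the field names) are ours, not Medimate's:

```python
# Hypothetical prescription-encoding sketch - the real Medimate
# QR payload format is not described in the source.
import hashlib
import json


def prescription_code(prescription: dict) -> str:
    """Deterministic short code for a prescription: canonical JSON
    (sorted keys) hashed with SHA-256, truncated for QR compactness."""
    payload = json.dumps(prescription, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


rx = {"patient_id": "P-1024", "drug": "paracetamol",
      "dose_mg": 500, "times_per_day": 3}

# Key order does not affect the code, so a prescription re-entered
# on a repeat visit maps to the same QR payload as the original.
assert prescription_code(rx) == prescription_code(dict(sorted(rx.items())))
```

A dispenser could then look the code up against stored prescriptions before releasing medicine, rather than trusting the scanned payload directly.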


2019 ◽  
Vol 12 (12) ◽  
pp. 2290-2299
Author(s):  
Azza Abouzied ◽  
Daniel J. Abadi ◽  
Kamil Bajda-Pawlikowski ◽  
Avi Silberschatz

2016 ◽  
Vol 34 (7_suppl) ◽  
pp. 196-196
Author(s):  
Kathryn S. Egan ◽  
Gary H. Lyman ◽  
Karma L. Kreizenbeck ◽  
Catherine R. Fedorenko ◽  
April Alfiler ◽  
...  

Background: Natural language processing (NLP) has the potential to significantly ease the burden of manually abstracting unstructured electronic text when measuring adherence to national guidelines. We incorporated NLP into standard data processing techniques such as manual abstraction and database queries in order to more efficiently evaluate a regional oncology clinic’s adherence to ASCO’s Choosing Wisely colony stimulating factor (CSF) recommendation, using clinical, billing, and cancer registry data. Methods: Database queries on the clinic’s cancer registry yielded the study population of patients with stage II-IV breast, non-small cell lung, and colorectal cancer. We manually abstracted chemotherapy regimens from paper prescription records. CSF orders were collected through queries on the clinic’s facility billing data, when available, and otherwise through a custom NLP program and manual abstraction of the electronic medical record. The NLP program was designed to identify clinical note text containing CSF information, which was then manually abstracted. Results: Of 31,725 clinical notes for the eligible population, the NLP program identified 1,487 clinical notes with CSF-related language, effectively reducing the number of notes requiring abstraction by up to 95%. Between 1/1/2012 and 12/31/2014, adherence to the ASCO Choosing Wisely CSF recommendation at the regional oncology clinic was 89% for a population of 322 patients. Conclusions: NLP significantly reduced the burden of manual abstraction by singling out relevant clinical text for abstractors. Abstraction is often necessary due to the complexity of data collection tasks or the use of paper records, but NLP is a valuable addition to the suite of data processing techniques traditionally used to measure adherence to national guidelines.
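The filtering step that cut 31,725 notes down to 1,487 can be sketched as follows. The study does not describe its NLP program's internals, so this is an assumed, minimal keyword-matching version (the term list and function names are ours): flag only notes whose text mentions CSF terminology, so that only those reach manual abstractors:

```python
# Hypothetical sketch of the note-filtering idea - the study's actual
# NLP program is not described in detail in the source.
import re

# Assumed CSF vocabulary: the recommendation's wording plus common
# CSF agent names; a real system would use a curated clinical lexicon.
CSF_PATTERN = re.compile(
    r"\b(colony[- ]stimulating factor|g-?csf|filgrastim|pegfilgrastim)\b",
    re.IGNORECASE,
)


def needs_abstraction(note_text: str) -> bool:
    """True if the note mentions CSF language and so needs human review."""
    return bool(CSF_PATTERN.search(note_text))


notes = [
    "Patient tolerated cycle 2 well; no fever reported.",
    "Pegfilgrastim 6 mg administered on day 2 of chemotherapy.",
    "Plan: continue G-CSF support next cycle.",
]
flagged = [n for n in notes if needs_abstraction(n)]
print(f"{len(flagged)} of {len(notes)} notes sent to abstractors")
```

Even this crude filter shows where the 95% reduction comes from: abstractors read only the flagged subset, while the unflagged majority is set aside.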

