A Framework for Criteria-Based Selection and Processing of Fast Healthcare Interoperability Resources (FHIR) Data for Statistical Analysis: Design and Implementation Study

10.2196/25645 ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. e25645
Author(s):  
Julian Gruendner ◽  
Christian Gulden ◽  
Marvin Kampf ◽  
Sebastian Mate ◽  
Hans-Ulrich Prokosch ◽  
...  

Background The harmonization and standardization of digital medical information for research purposes is a challenging and ongoing collaborative effort. Current research data repositories typically require extensive efforts to harmonize and transform the original clinical data. The Fast Healthcare Interoperability Resources (FHIR) format was designed primarily to represent clinical processes; therefore, it closely resembles the clinical data model and is more widely available across modern electronic health records. However, no common standardized data format is directly suitable for statistical analyses, so data must be preprocessed before statistical analysis. Objective This study aimed to elucidate how FHIR data can be queried directly with a preprocessing service and used for statistical analyses. Methods We propose that the binary JavaScript Object Notation format of the PostgreSQL (PSQL) open source database is suitable not only for storing FHIR data but also for extension with preprocessing and filtering services that directly transform data stored in FHIR format into prepared data subsets for statistical analysis. We specified an interface for this preprocessor, implemented and deployed it at University Hospital Erlangen-Nürnberg, generated 3 sample data sets, and analyzed the available data. Results We imported real-world patient data from 2016 to 2018 into a standard PSQL database, generating a dataset of approximately 35.5 million FHIR resources, including “Patient,” “Encounter,” “Condition” (diagnoses specified using International Classification of Diseases codes), “Procedure,” and “Observation” (laboratory test results). We then integrated the developed preprocessing service with the PSQL database and the locally installed web-based KETOS analysis platform. 
Advanced statistical analyses proved feasible with the developed framework in 3 clinically relevant scenarios (data-driven establishment of hemoglobin reference intervals, assessment of anemia prevalence in patients with cancer, and investigation of the adverse effects of drugs). Conclusions This study shows how the standard open source database PSQL can be used to store FHIR data and be integrated with a specifically developed preprocessing and analysis framework. This enables dataset generation with advanced medical criteria as well as subsequent statistical analysis. The web-based preprocessing service can be deployed locally at the hospital level, protecting patients’ privacy while integrating with existing open source data analysis tools currently being developed across Germany.
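The preprocessing step the abstract describes, selecting FHIR resources by medical criteria and flattening them into analysis-ready value lists, can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: the resource shapes follow the FHIR R4 Observation structure, the LOINC code and values are invented examples, and in the actual framework the equivalent filtering runs as queries over PostgreSQL's binary JSON (JSONB) storage.

```python
import json

# Toy FHIR resources, as they might be stored as JSONB rows.
# Shapes follow FHIR R4; the codes and values here are invented examples.
resources = [
    json.dumps({"resourceType": "Observation",
                "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
                "valueQuantity": {"value": 13.8, "unit": "g/dL"}}),
    json.dumps({"resourceType": "Observation",
                "code": {"coding": [{"system": "http://loinc.org", "code": "2345-7"}]},
                "valueQuantity": {"value": 5.2, "unit": "mmol/L"}}),
    json.dumps({"resourceType": "Patient", "id": "p1"}),
]

def filter_observations(rows, loinc_code):
    """Select Observation resources carrying a given LOINC code and
    return (value, unit) pairs ready for statistical analysis."""
    out = []
    for row in rows:
        r = json.loads(row)
        if r.get("resourceType") != "Observation":
            continue
        codes = [c.get("code") for c in r.get("code", {}).get("coding", [])]
        if loinc_code in codes:
            q = r.get("valueQuantity", {})
            out.append((q.get("value"), q.get("unit")))
    return out

# Extract only the observations matching one laboratory code.
print(filter_observations(resources, "718-7"))
```

Inside PSQL itself, a comparable filter could be expressed with JSONB operators (e.g. `WHERE resource->>'resourceType' = 'Observation'`), letting the database do the selection before any data leave it.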



10.2196/18735 ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. e18735
Author(s):  
Yun William Yu ◽  
Griffin M Weber

Background Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks. Objective This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed. Methods We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques. Results In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. 
This was orders of magnitude better than other approaches that guarantee the exact answer. Conclusions Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
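The core idea above, that merging per-site HyperLogLog sketches deduplicates patients who appear at multiple hospitals, can be illustrated with a minimal sketch. This is a bare-bones HyperLogLog in Python for illustration only: the obfuscation layer (hashing, masking, homomorphic encryption) described in the abstract is omitted, and the register count and patient identifiers are invented.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch with 2^b registers."""
    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash: low b bits pick a register, the rest feed the rank.
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = h & (self.m - 1)
        w = h >> self.b
        rank = (64 - self.b) - w.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Register-wise max: merging site sketches automatically avoids
        double counting patients seen at more than one site."""
        for i in range(self.m):
            self.registers[i] = max(self.registers[i], other.registers[i])

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        e = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:  # small-range (linear counting) correction
            e = self.m * math.log(self.m / zeros)
        return e

# Two hospitals with 1000 shared patients: true union is 5000, naive sum is 6000.
site_a, site_b = HyperLogLog(), HyperLogLog()
for i in range(3000):
    site_a.add(f"patient-{i}")
for i in range(2000, 5000):
    site_b.add(f"patient-{i}")
site_a.merge(site_b)
print(round(site_a.estimate()))  # close to the true union of 5000
```

Each site only ships its small register array (kilobytes, matching the scale reported above) to the hub, which merges and estimates; the paper's protocol additionally obfuscates those registers before they leave the site.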


2012 ◽  
Vol 23 (1) ◽  
pp. 191-198 ◽  
Author(s):  
Dal-Ho Kim ◽  
Im-Hee Shin ◽  
Jung-Youn Choe ◽  
Sang-Gyung Kim ◽  
Chun-Woo Park ◽  
...  

2019 ◽  
pp. 1-16 ◽  
Author(s):  
Veli-Matti Isoviita ◽  
Liina Salminen ◽  
Jimmy Azar ◽  
Rainer Lehtonen ◽  
Pia Roering ◽  
...  

PURPOSE We have created a cloud-based machine learning system (CLOBNET) that is an open-source, lean infrastructure for electronic health record (EHR) data integration and is capable of extract, transform, and load (ETL) processing. CLOBNET enables comprehensive analysis and visualization of structured EHR data. We demonstrate the utility of CLOBNET by predicting primary therapy outcomes of patients with high-grade serous ovarian cancer (HGSOC) on the basis of EHR data. MATERIALS AND METHODS CLOBNET is built using open-source software to make data preprocessing, analysis, and model training user friendly. The source code of CLOBNET is available on GitHub. The HGSOC data set was based on a prospective cohort of 208 patients with HGSOC who were treated at Turku University Hospital, Finland, from 2009 to 2019 and for whom comprehensive clinical and EHR data were available. RESULTS We trained machine learning (ML) models using clinical data, including a herein developed dissemination score that quantifies the disease burden at the time of diagnosis, to identify patients with progressive disease (PD) or a complete response (CR) on the basis of RECIST (version 1.1). The best performance was achieved with a logistic regression model, which resulted in an area under the receiver operating characteristic curve (AUROC) of 0.86, with a specificity of 73% and a sensitivity of 89%, when distinguishing patients who experienced PD from those with a CR. CONCLUSION We have developed an open-source computational infrastructure, CLOBNET, that enables effective and rapid analysis of EHR and other clinical data. Our results demonstrate that CLOBNET allows predictions to be made on the basis of EHR data to address clinically relevant questions.
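The reported AUROC of 0.86 summarizes how well the model's scores rank PD patients above CR patients. A minimal sketch of computing AUROC from labels and predicted scores via the Mann-Whitney rank identity follows; the data points are invented, and score ties are ignored for brevity.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U identity: the probability that a
    randomly chosen positive case outranks a randomly chosen negative."""
    pairs = sorted(zip(scores, labels))        # rank all cases by score
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of 1-based ranks of the positive cases.
    rank_sum = sum(rank for rank, (_, y) in enumerate(pairs, start=1) if y == 1)
    u = rank_sum - n_pos * (n_pos + 1) / 2     # Mann-Whitney U statistic
    return u / (n_pos * n_neg)

# Invented example: two positives, two negatives.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 would mean the scores rank patients no better than chance; 1.0 would mean every PD patient scores above every CR patient.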




1993 ◽  
Vol 32 (05) ◽  
pp. 365-372 ◽  
Author(s):  
T. Timmeis ◽  
J. H. van Bemmel ◽  
E. M. van Mulligen

Abstract Results are presented of the user evaluation of an integrated medical workstation for support of clinical research. Twenty-seven users were recruited from the medical and scientific staff of the University Hospital Dijkzigt, the Faculty of Medicine of the Erasmus University Rotterdam, and other Dutch medical institutions; all were given a written, self-contained tutorial. Subsequently, an experiment was done in which six clinical data analysis problems had to be solved and an evaluation form was filled out. The aim of this user evaluation was to obtain insight into the benefits of integration for support of clinical data analysis for clinicians and biomedical researchers. The problems were divided into two sets, with gradually more complex problems. In the first set, users were guided in a stepwise fashion to solve the problems. In the second set, each stepwise problem had an open counterpart. During the evaluation, the workstation continuously recorded the user’s actions. From these results, significant differences became apparent between clinicians and non-clinicians for correctness (means 54% and 81%, respectively, p = 0.04), completeness (means 64% and 88%, respectively, p = 0.01), and number of problems solved (means 67% and 90%, respectively, p = 0.02). These differences were absent for the stepwise problems. Physicians tended to skip more problems than biomedical researchers. No statistically significant differences were found between users with and without clinical data analysis experience for correctness (means 74% and 72%, respectively, p = 0.95) and completeness (means 82% and 79%, respectively, p = 0.40). It appeared that various clinical research problems can be solved easily with support of the workstation; the results of this experiment can be used as guidance for the development of the successor of this prototype workstation and serve as a reference for the assessment of next versions.

