Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation

DOI: 10.2196/18735
2020, Vol 22 (11), pp. e18735
Author(s): Yun William Yu, Griffin M Weber

Background: Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks.

Objective: This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed.

Methods: We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques.

Results: In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. This was orders of magnitude better than other approaches, which guaranteed the exact answer.

Conclusions: Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
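To make the counting idea concrete, the following is a minimal sketch of plain HyperLogLog applied to a federated distinct count. It is not the authors' implementation: it omits the masking and homomorphic encryption steps entirely, and the function names (`build_sketch`, `merge`, `estimate`), the SHA-256 hash, and the register count are all assumptions chosen for illustration. Each site builds a small array of registers from hashed patient identifiers, and the hub merges the arrays by taking the register-wise maximum, so a patient seen at several sites is counted only once.

```python
import hashlib
import math


def _hash64(patient_id: str) -> int:
    """Map an identifier to a 64-bit integer (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(patient_ids, p=12):
    """Build HyperLogLog registers (m = 2**p of them) at a single site."""
    m = 1 << p
    registers = [0] * m
    for pid in patient_ids:
        h = _hash64(pid)
        idx = h >> (64 - p)                      # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(sketches):
    """Combine per-site sketches at the hub with a register-wise maximum."""
    return [max(column) for column in zip(*sketches)]


def estimate(registers):
    """Standard HyperLogLog estimate with the small-range correction."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)             # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)           # linear counting for small counts
    return raw


# Hypothetical example: three sites whose patient populations overlap.
site_a = [f"patient-{i}" for i in range(0, 6000)]
site_b = [f"patient-{i}" for i in range(4000, 10000)]
site_c = [f"patient-{i}" for i in range(8000, 14000)]

naive_sum = len(site_a) + len(site_b) + len(site_c)   # double counts overlaps
merged = merge([build_sketch(s) for s in (site_a, site_b, site_c)])
print("sum of site counts:", naive_sum)
print("HyperLogLog estimate of distinct patients:", round(estimate(merged)))
```

Because merging uses only the maximum of each register, the merged sketch is identical to the sketch that would have been built from the union of all patient lists, which is why the hub never needs the identifiers themselves.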










2019
Author(s): Yun William Yu, Griffin M Weber

Abstract: Researchers use large federated clinical data networks that connect dozens of healthcare organizations to access data on millions of patients. However, because patients often receive care from multiple sites in the network, queries frequently double-count patients. Using the probabilistic streaming algorithm HyperLogLog and adding obfuscation, we developed a scalable method for estimating the number of distinct lives that match a query, which balances accuracy and privacy in a “tunable” way.
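One well-known property of HyperLogLog may help explain what "tunable" can mean here. The abstract does not specify the mechanism, so treating the number of registers m as the tuning knob is an assumption. For standard HyperLogLog, the relative standard error of the estimate depends only on m:

```latex
\[
  \frac{\sigma_{\hat{n}}}{n} \approx \frac{1.04}{\sqrt{m}},
  \qquad
  m = 1024 \;\Rightarrow\; \approx 3.25\%,
  \qquad
  m = 4096 \;\Rightarrow\; \approx 1.6\%
\]
```

Under that assumption, using more registers tightens the estimate, while using fewer registers means each register summarizes more patients.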



DOI: 10.2196/25645
2021, Vol 9 (4), pp. e25645
Author(s): Julian Gruendner, Christian Gulden, Marvin Kampf, Sebastian Mate, Hans-Ulrich Prokosch, ...

Background: The harmonization and standardization of digital medical information for research purposes is a challenging and ongoing collaborative effort. Current research data repositories typically require extensive efforts in harmonizing and transforming original clinical data. The Fast Healthcare Interoperability Resources (FHIR) format was designed primarily to represent clinical processes; therefore, it closely resembles the clinical data model and is more widely available across modern electronic health records. However, no common standardized data format is directly suitable for statistical analyses, and data need to be preprocessed before statistical analysis.

Objective: This study aimed to elucidate how FHIR data can be queried directly with a preprocessing service and be used for statistical analyses.

Methods: We propose that the binary JavaScript Object Notation format of the PostgreSQL (PSQL) open source database is suitable for not only storing FHIR data, but also extending it with preprocessing and filtering services, which directly transform data stored in FHIR format into prepared data subsets for statistical analysis. We specified an interface for this preprocessor, implemented and deployed it at University Hospital Erlangen-Nürnberg, generated 3 sample data sets, and analyzed the available data.

Results: We imported real-world patient data from 2016 to 2018 into a standard PSQL database, generating a dataset of approximately 35.5 million FHIR resources, including “Patient,” “Encounter,” “Condition” (diagnoses specified using International Classification of Diseases codes), “Procedure,” and “Observation” (laboratory test results). We then integrated the developed preprocessing service with the PSQL database and the locally installed web-based KETOS analysis platform. Advanced statistical analyses were feasible with the developed framework in 3 clinically relevant scenarios (data-driven establishment of hemoglobin reference intervals, assessment of anemia prevalence in patients with cancer, and investigation of the adverse effects of drugs).

Conclusions: This study shows how the standard open source database PSQL can be used to store FHIR data and be integrated with a specifically developed preprocessing and analysis framework. This enables dataset generation with advanced medical criteria and the integration of subsequent statistical analysis. The web-based preprocessing service can be deployed locally at the hospital level, protecting patients’ privacy while being integrated with existing open source data analysis tools currently being developed across Germany.
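The kind of transformation such a preprocessing step performs can be sketched in a few lines. The example below is an in-memory illustration only and is not the authors' service or interface: the function name `flatten_observations`, the output column names, and the sample resources are all made up for this sketch, and it works on Python dictionaries rather than on PostgreSQL binary JSON columns. It filters FHIR "Observation" resources by a LOINC code (718-7, hemoglobin mass per volume in blood) and flattens the nested JSON into one row per measurement, which is the shape most statistical tools expect.

```python
def flatten_observations(resources, loinc_code):
    """Turn nested FHIR Observation resources into flat rows for analysis.

    Resources of other types, observations with a different code, and
    observations without a numeric valueQuantity are skipped.
    """
    rows = []
    for res in resources:
        if res.get("resourceType") != "Observation":
            continue
        codings = res.get("code", {}).get("coding", [])
        if not any(c.get("code") == loinc_code for c in codings):
            continue
        quantity = res.get("valueQuantity")
        if quantity is None:
            continue
        rows.append({
            "patient": res.get("subject", {}).get("reference"),
            "date": res.get("effectiveDateTime"),
            "value": quantity.get("value"),
            "unit": quantity.get("unit"),
        })
    return rows


# Made-up sample resources, for illustration only.
sample = [
    {
        "resourceType": "Observation",
        "subject": {"reference": "Patient/1"},
        "effectiveDateTime": "2017-03-01",
        "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
        "valueQuantity": {"value": 13.2, "unit": "g/dL"},
    },
    {
        "resourceType": "Observation",
        "subject": {"reference": "Patient/2"},
        "effectiveDateTime": "2017-03-02",
        "code": {"coding": [{"system": "http://loinc.org", "code": "2345-7"}]},
        "valueQuantity": {"value": 95, "unit": "mg/dL"},
    },
    {"resourceType": "Patient", "id": "1"},
]

hemoglobin_rows = flatten_observations(sample, "718-7")
print(hemoglobin_rows)
```

Keeping the filter criteria (resource type and code) as parameters mirrors the idea of generating different prepared subsets from the same stored FHIR data.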



Author(s): Tak-Ming Chan, Yuxi Li, Choo-Chiap Chiau, Jane Zhu, Jie Jiang, ...


2019, Vol 98 (5), pp. 494-497
Author(s): M. A. Yahyaev, Shamil K. Salikhov, S. O. Abdulkadyrova, A. Sh. Aselderova, Z. Z. Surkhayeva, ...

Introduction. This study examines the relationship between magnesium content in biosphere objects (soil, natural water, plants) and the incidence of arterial hypertension (AH) among the population living on the Dagestan plain (the Babayurtovsky, Kizlyar, Tarumovsky, and Nogai districts of the Republic).

Material and Methods. Data on the incidence of hypertension were obtained from the medical information center of the Ministry of Health of Dagestan. Samples of soil, water, and plants were analyzed for magnesium content by the photometric method in the biogeochemical laboratory of the Prikaspiyskiy Institute of Biology Resources of Daghestan Scientific Centre of the Russian Academy of Sciences. The samples were collected in the summer months. Pearson correlation coefficients were calculated.

Results. Comparing the AH incidence rates for 2005-2007 with the magnesium content in soils, natural waters, and plants revealed a moderate negative correlation between magnesium concentration at the study sites and the incidence of AH in the population of the study area. The number of patients varied over the studied years, but the dependence of AH incidence on the magnesium content of biosphere objects persisted.

Conclusions. One of the factors in the occurrence and course of hypertension is the magnesium status of the population, which depends on the geochemical features of the territory. The study found that the higher the magnesium content in biosphere objects, the lower the incidence of AH in the population. To reduce AH incidence in the population, magnesium deficiency in the human body should be corrected with magnesium supplements, which contribute to the regulation of blood pressure and reduce the risk of cardiovascular diseases. Given the possibility of subclinical magnesium deficiency, information on the magnesium content of environmental objects is an important indicator of possible hypertension morbidity.
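For readers unfamiliar with the statistic used, the Pearson correlation coefficient can be computed as in the minimal sketch below. The function name `pearson_r` and the numbers in the usage example are made up purely to illustrate a negative relationship; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


# Illustrative, made-up values: as x rises, y falls in a straight line.
magnesium = [1, 2, 3, 4]
incidence = [8, 6, 4, 2]
print(pearson_r(magnesium, incidence))
```

A coefficient near -1 indicates a strong inverse linear relationship, values near 0 indicate no linear relationship, and values in between correspond to weaker associations such as the moderate one reported here.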



Author(s): Anna Babu, Sonal Ayyappan

Health care institutions need to exchange medical images of many patients to seek opinions from different experts. Crypto-watermarking techniques are adopted to reduce storage and to transmit these images securely. Such a system combines encryption with watermarking or steganography to transfer medical images safely, optionally embedding additional medical information. Digital watermarking is the process of embedding data into multimedia content. The embedding can be done in the spatial domain or the frequency domain of the cover image to be transmitted. Embedding the encrypted data in the transform domain ensures robustness against attacks; the encrypted data can be a secret key for content recovery, a patient record, or the image itself. This chapter presents the basic aspects of crypto-watermarking as an application. It gives a detailed assessment of different approaches to crypto-watermarking for secure transmission of medical images and elaborates on a case study.
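The encrypt-then-embed idea can be illustrated with a toy spatial-domain example. This is a sketch under strong simplifying assumptions and is not a technique taken from the chapter: the repeating-key XOR stands in for a real cipher and is not secure, the image is a flat list of 8-bit grayscale values, and all function names are hypothetical. Least-significant-bit embedding in the spatial domain is also fragile, unlike the transform-domain embedding the text credits with robustness.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy 'encryption': repeating-key XOR. A placeholder only, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def embed_lsb(pixels, payload: bytes):
    """Hide payload bits in the least significant bit of successive pixels."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image is too small for the payload")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # clear the lowest bit, then set it
    return out


def extract_lsb(pixels, n_bytes: int) -> bytes:
    """Read n_bytes back from the least significant bits of the pixels."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (pixels[b * 8 + i] & 1)
        out.append(byte)
    return bytes(out)


# Hypothetical cover image (flat grayscale values) and patient note.
cover = [(37 * i) % 256 for i in range(256)]
note = b"ID:0042"
key = b"demo-key"

marked = embed_lsb(cover, xor_bytes(note, key))
recovered = xor_bytes(extract_lsb(marked, len(note)), key)
print(recovered)
```

Each marked pixel differs from the original by at most one gray level, which is why this style of embedding is visually imperceptible but easily destroyed by compression or filtering.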



2014, Vol 3 (3), pp. 52
Author(s): Thomas Lanni, Gail Elliott Patricolo

The number of patients seeking complementary and alternative medicines combined with conventional treatments has grown considerably over the past decade. To meet the growing demand, a dedicated oncology integrative medicine program was initiated in the Beaumont Health System to address the needs of this patient population. Due to its resounding success and patient satisfaction, as evidenced by patient utilization and testimonials and physician referrals, the program was expanded across the healthcare system to every medical specialty. This study outlines how the program was implemented and its business model. A number of methods were used to evaluate the feasibility of starting the program and determine the services required. Financial analyses were developed to understand the costs associated with starting the program without financial assistance. In 2006, an Integrative Medicine program was launched in the Beaumont Cancer Institute (Royal Oak, MI). The initial offering for patients was clinical massage; however, the program rapidly expanded. Currently, services include clinical massage, a clinical massage training program, Reiki, guided imagery, acupuncture, and naturopathic medicine. Patients and physicians expressed satisfaction with the increasing number of complementary services offered at the institution, and the services are heavily utilized. In 2012, the program had more than 18,000 patient visits, of which, 10,191 were for clinical massage, 6,515 for acupuncture, and 1,030 for naturopathic medicine. In this study of developing and implementing an Integrative Medicine program in a large healthcare system, it is shown that a successful program could be initiated with the appropriate planning and support from administration. 
The program is shown to be financially viable, as the Integrative Medicine (IM) department has become self-sufficient and no longer requires financial support from other hospital departments, and the numerous testimonials indicate that the program has been rewarding for practitioners, staff, and patients. 


