Fold-stratified cross-validation for unbiased and privacy-preserving federated learning

2020 ◽  
Vol 27 (8) ◽  
pp. 1244-1251
Author(s):  
Romain Bey ◽  
Romain Goussault ◽  
François Grolleau ◽  
Mehdi Benchoufi ◽  
Raphaël Porcher

Abstract Objective We introduce fold-stratified cross-validation, a validation methodology that is compatible with privacy-preserving federated learning and that prevents data leakage caused by duplicates of electronic health records (EHRs). Materials and Methods Fold-stratified cross-validation complements cross-validation with an initial stratification of EHRs in folds containing patients with similar characteristics, thus ensuring that duplicates of a record are jointly present either in training or in validation folds. Monte Carlo simulations are performed to investigate the properties of fold-stratified cross-validation in the case of a model data analysis using both synthetic data and MIMIC-III (Medical Information Mart for Intensive Care-III) medical records. Results In situations in which duplicated EHRs could induce overoptimistic estimations of accuracy, applying fold-stratified cross-validation prevented this bias, while not requiring full deduplication. However, a pessimistic bias might appear if the covariate used for the stratification was strongly associated with the outcome. Discussion Although fold-stratified cross-validation presents low computational overhead, to be efficient it requires the preliminary identification of a covariate that is both shared by duplicated records and weakly associated with the outcome. When available, the hash of a personal identifier or a patient’s date of birth provides such a covariate. On the contrary, pseudonymization interferes with fold-stratified cross-validation, as it may break the equality of the stratifying covariate among duplicates. Conclusion Fold-stratified cross-validation is an easy-to-implement methodology that prevents data leakage when a model is trained on distributed EHRs that contain duplicates, while preserving privacy.
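
The fold assignment described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the stratifying covariate is a date of birth stored as a string, that folds are assigned by taking a SHA-256 hash modulo the number of folds, and that records are plain dictionaries. The names `assign_fold`, `fold_stratified_splits`, and the sample records are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_fold(stratifying_value: str, n_folds: int) -> int:
    """Map a stratifying covariate (here, a date of birth) to a fold index.

    The mapping is deterministic, so every site computes the same fold for
    the same value without exchanging any record.
    """
    digest = hashlib.sha256(stratifying_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def fold_stratified_splits(records, key, n_folds):
    """Yield one (train, validation) pair of record lists per fold."""
    folds = defaultdict(list)
    for record in records:
        folds[assign_fold(record[key], n_folds)].append(record)
    for held_out in range(n_folds):
        validation = folds.get(held_out, [])
        train = [r for f, rs in folds.items() if f != held_out for r in rs]
        yield train, validation


# Hypothetical records: the second and third are duplicates held at two sites.
records = [
    {"site": "A", "birth_date": "1950-03-14", "outcome": 1},
    {"site": "A", "birth_date": "1962-11-02", "outcome": 0},
    {"site": "B", "birth_date": "1962-11-02", "outcome": 0},
    {"site": "B", "birth_date": "1978-07-30", "outcome": 1},
]
splits = list(fold_stratified_splits(records, "birth_date", n_folds=3))
```

Because duplicates share the stratifying value, they always land in the same fold, so a record and its copy are never split between training and validation. As the abstract notes, this only works when the covariate is identical across duplicates and only weakly associated with the outcome.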

Blood ◽  
2009 ◽  
Vol 114 (22) ◽  
pp. 1407-1407
Author(s):  
Nikita E Shklovskiy-Kordi ◽  
Boris V Zingerman ◽  
Lyuba Varticovski ◽  
Alexandra Kremenetskya ◽  
Andrei Vorobjov

Abstract Abstract 1407 Poster Board I-429 Purpose. To compare the requirements of physicians and patients for an interactive Internet service that allows patients to manage their own medical records and communicate with physicians via the Internet. Background. The US federal rule defining “the Meaningful Use of Electronic Health Records” is similar to the National Standard of the Russian Federation “The Electronic Case History (EHR)”, in operation since 2008. This National Standard was developed based on experience with the EHR system at the National Center for Hematology in Moscow (NCH). In 2009, we started the Personal Health Records service (PHR service), which allows patients to manage their own medical records and communicate with physicians over the Internet. The simple interface for patients, which blocks the full capacity of the PHR service, is similar to that of the EHR system of NCH. It permits integrated data presentations on a uniform time axis and access to additional information (reported to ASH in 2001). The PHR service raises the question of “meaningful use” requirements not only for the EHR provider organization, but also for the service users: patients and doctors. Methods. Using questionnaires and interviews, we compared expectations and acceptance of the PHR service by doctors and their patients. Results and Discussion. Preliminary results indicate that doctors are more likely to use the PHR service than the EHR system. Although the entire format of the PHR service is familiar to physicians at NCH, they mostly use its information capabilities (viewing the results of analyses, making appointments for research, and planning patients' visits). The patients use the PHR service with great enthusiasm (increasing with younger age and higher level of education). The complexity of the integration interfaces, which we leave for the patients in the second term, inspires them more than it does the physicians. 
However, few patients take seriously the responsibilities that exist in relation to the accurate maintenance of their records. Conclusion. PHR can be widely used if integration of sources for medical information and unification format can simplify the “manual” work of PHR management. Key Words: Telemedicine, PHR, EHR Disclosures: No relevant conflicts of interest to declare.


2014 ◽  
Vol 989-994 ◽  
pp. 5524-5527
Author(s):  
Ning Liu

Residents' health records differ from general hospital medical records: they are not just records of the medical services people receive, but continuous, sustained, long-term, comprehensive, and more extensive information about health. Based on the actual situation, this article analyzes the database design of a residents' electronic health records system and gives a specific description of the design of 6 basic data tables: the residents' information table, the doctor information table, the medical information table, the health file information table, the travel information table, and the announcement information table.


2020 ◽  
Vol 27 (3) ◽  
pp. 407-418 ◽  
Author(s):  
Hannah L Weeks ◽  
Cole Beck ◽  
Elizabeth McNeer ◽  
Michael L Williams ◽  
Cosmin A Bejan ◽  
...  

Abstract Objective We developed medExtractR, a natural language processing system to extract medication information from clinical notes. Using a targeted approach, medExtractR focuses on individual drugs to facilitate creation of medication-specific research datasets from electronic health records. Materials and Methods Written using the R programming language, medExtractR combines lexicon dictionaries and regular expressions to identify relevant medication entities (eg, drug name, strength, frequency). MedExtractR was developed on notes from Vanderbilt University Medical Center, using medications prescribed with varying complexity. We evaluated medExtractR and compared it with 3 existing systems: MedEx, MedXN, and CLAMP (Clinical Language Annotation, Modeling, and Processing). We also demonstrated how medExtractR can be easily tuned for better performance on an outside dataset using the MIMIC-III (Medical Information Mart for Intensive Care III) database. Results On 50 test notes per development drug and 110 test notes for an additional drug, medExtractR achieved high overall performance (F-measures >0.95), exceeding performance of the 3 existing systems across all drugs. MedExtractR achieved the highest F-measure for each individual entity, except drug name and dose amount for allopurinol. With tuning and customization, medExtractR achieved F-measures >0.90 in the MIMIC-III dataset. Discussion The medExtractR system successfully extracted entities for medications of interest. High performance in entity-level extraction provides a strong foundation for developing robust research datasets for pharmacological research. When working with new datasets, medExtractR should be tuned on a small sample of notes before being broadly applied. 
Conclusions The medExtractR system achieved high performance extracting specific medications from clinical text, leading to higher-quality research datasets for drug-related studies than some existing general-purpose medication extraction tools.
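
The combination of lexicon dictionaries and regular expressions mentioned in this abstract can be illustrated with a short sketch. medExtractR itself is written in R; the Python below is not its code or API. The tiny lexicons, the strength pattern, the fixed search window after each drug mention, and the function name `extract_entities` are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicons; a real system would use far larger dictionaries.
DRUG_NAMES = ["allopurinol"]
FREQUENCY_TERMS = ["twice daily", "once daily", "daily", "bid"]

# Assumed pattern for strength: a number followed by a unit.
STRENGTH_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g)\b", re.IGNORECASE)


def lexicon_pattern(terms):
    """Build one case-insensitive regex from a list of dictionary terms."""
    # Longest terms first, so "once daily" is preferred over "daily".
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


DRUG_PATTERN = lexicon_pattern(DRUG_NAMES)
FREQUENCY_PATTERN = lexicon_pattern(FREQUENCY_TERMS)


def extract_entities(note, window=60):
    """Return (entity_type, text, start_offset) tuples for each drug mention."""
    entities = []
    for drug in DRUG_PATTERN.finditer(note):
        entities.append(("drug_name", drug.group(0), drug.start()))
        # Look for related entities only in a fixed window after the drug name.
        segment_start = drug.end()
        segment = note[segment_start:segment_start + window]
        for label, pattern in (("strength", STRENGTH_PATTERN),
                               ("frequency", FREQUENCY_PATTERN)):
            match = pattern.search(segment)
            if match:
                start = segment_start + match.start()
                entities.append((label, match.group(0), start))
    return entities


note = "Patient continues allopurinol 300 mg once daily; no changes today."
entities = extract_entities(note)
```

Restricting the search to a window around a drug of interest is one simple way to realize a targeted, drug-specific approach; the window length here is arbitrary and would need tuning on sample notes, in the spirit of the tuning step the abstract recommends.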


Author(s):  
Aashish Bhardwaj ◽  
Vikas Kumar

Patient data is very valuable and must be protected from misuse by third parties. Patients' rights, such as privacy, confidentiality of medical information, information about the possible risks of medical treatment, and the right to consent to or refuse a treatment, are also very important. Individuals should have the right to access their health records and to have them deleted from hospital records after completing treatment. Traditional paper-based health records are being replaced by electronic health records, as these increase the portability and accessibility of medical records. Governments and hospitals across the world are putting huge efforts into implementing electronic health records. The present work explores the different aspects of health privacy and health records. The most important stakeholders and the technological and legal aspects are presented from both the Indian and international perspectives. A comparative analysis of the available EHR standards is presented, with a focus on their roles and implementation challenges.


2021 ◽  
Vol 12 ◽  
pp. 31
Author(s):  
Masahito Katsuki ◽  
Norio Narita ◽  
Naoya Ishida ◽  
Ohmi Watanabe ◽  
Siqi Cai ◽  
...  

Background: Chronological meteorological and calendar factors are risk factors for stroke occurrence. However, predicting stroke occurrence from meteorological and calendar factors alone is difficult. We tried to build prediction models for stroke occurrence from those variables using deep learning (DL) software, Prediction One (Sony Network Communications Inc., Tokyo, Japan). Methods: We retrospectively investigated daily stroke occurrences between 2017 and 2019. We used Prediction One software to build prediction models for daily stroke occurrence (present or absent) using 221 chronological meteorological and calendar factors. We built the prediction models from the 3-year dataset and evaluated their accuracy using internal cross-validation. Areas under the curves (AUCs) of receiver operating characteristic curves were used as the accuracy measure. Results: A total of 371 cerebral infarction (CI), 184 intracerebral hemorrhage (ICH), and 53 subarachnoid hemorrhage patients were included in the study. The AUCs of the several DL-based prediction models for all stroke occurrences were 0.532–0.757. Those for CI were 0.600–0.782. Those for ICH were 0.714–0.988. Conclusion: Our preliminary results suggest that DL-based prediction of stroke occurrence from meteorological and calendar factors alone may be possible. In the future, by synchronizing a variety of medical information among electronic medical records and personal smartphones, and by integrating physical activity or meteorological conditions in real time, stroke occurrence could be predicted with high accuracy, saving medical resources, helping patients care for themselves, and making medicine more efficient.


Author(s):  
Raghavendra Ganiga ◽  
Radhika M. Pai ◽  
Manohara Pai M. M. ◽  
Rajesh Kumar Sinha

Health records are an integral aspect of any Hospital Management System. With newer innovations in technology, there has been a shift in the way of recording health information. Medical records which used to be managed using various paper charts have now become easier to organize and maintain, thereby increasing the efficiency of medical staff. The Electronic Health Records (EHR) System is becoming a high-tech medical management technology developed for the economic or emerging economic countries like India. In a national health system, the EHR integrates the Electronic Medical Records (EMR) in all collaborating hospitals through different networks. EHR gives healthcare professionals a way to share and manage patient data quickly and effectively. Due to the mass storage of confidential patient data, healthcare organizations are considered as one of the most targeted sectors by intruders. This paper proposes a security framework for EHR system, which takes into consideration the integrity, availability, and confidentiality of health records. The threats posed to the EHR system are modeled by STRIDE modeling tool, and the amount of risk is calculated using DREAD. The paper also suggests the security mechanism and countermeasures based on security standards, which can be utilized in an EHR environment. The paper shows that the utilization of the proposed methods effectively addresses security concerns such as breach of sensitive medical information.
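
The DREAD risk calculation mentioned in this abstract can be sketched as follows. This is a minimal illustration of the conventional DREAD scheme, in which a threat's risk is the mean of five ratings (damage, reproducibility, exploitability, affected users, discoverability). The 1 to 10 rating scale, the example threats, and their ratings are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DreadRating:
    """Five DREAD ratings for one threat, each on an assumed 1-10 scale."""
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        """Return the mean of the five ratings as the threat's risk score."""
        values = (self.damage, self.reproducibility, self.exploitability,
                  self.affected_users, self.discoverability)
        if any(not 1 <= value <= 10 for value in values):
            raise ValueError("each DREAD rating must be between 1 and 10")
        return sum(values) / len(values)


# Hypothetical threats, labelled with the STRIDE category they fall under.
threats = {
    "Information disclosure: record database breach": DreadRating(9, 6, 5, 9, 6),
    "Denial of service: flooding the EHR portal": DreadRating(4, 8, 7, 8, 7),
    "Spoofing: stolen clinician credentials": DreadRating(7, 7, 6, 5, 5),
}

# Rank threats from highest to lowest risk to prioritise countermeasures.
ranked = sorted(threats, key=lambda name: threats[name].score(), reverse=True)
```

With these made-up ratings the database breach scores 7.0 and ranks first. In practice the ratings come from an assessment of the actual system, and STRIDE is used beforehand to enumerate the threats that get rated.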


1970 ◽  
Vol 09 (03) ◽  
pp. 149-160 ◽  
Author(s):  
E. Van Brunt ◽  
L. S. Davis ◽  
J. F. Terdiman ◽  
S. Singer ◽  
E. Besag ◽  
...  

A pilot medical information system is being implemented and currently is providing services for limited categories of patient data. In one year, physicians’ diagnoses for 500,000 office visits, 300,000 drug prescriptions for outpatients, one million clinical laboratory tests, and 60,000 multiphasic screening examinations are being stored in and retrieved from integrated, direct access, patient computer medical records. This medical information system is a part of a long-term research and development program. Its major objective is the development of a multifacility computer-based system which will support eventually the medical data requirements of a population of one million persons and one thousand physicians. The strategy employed provides for modular development. The central system, the computer-stored medical records which are therein maintained, and a satellite pilot medical data system in one medical facility are described.


1967 ◽  
Vol 06 (01) ◽  
pp. 1-6
Author(s):  
P. Hall ◽  
Ch. Mellner ◽  
T. Danielsson

A system for medical information has been developed. The system is a general and flexible one which without reprogramming or new programs can accept any alphabetic and/or numeric information. Coded concepts and natural language can be read, stored, decoded and written out. Medical records or parts of records (diagnosis, operations, therapy, laboratory tests, symptoms etc.) can be retrieved and selected. The system can process simple statistics and can even perform linear pattern recognition analysis. The system described has been used for in-patients, outpatients and individuals in health examinations. The use of computers in hospitals, health examinations or health care systems is a problem of storing information in a general and flexible form. This problem has been solved, and now it is possible to add new routines like booking and follow-up systems.


2021 ◽  
Vol 13 (4) ◽  
pp. 94
Author(s):  
Haokun Fang ◽  
Quan Qian

Privacy protection has become an important concern with the great success of machine learning. This paper proposes a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients encrypted with homomorphic encryption. In experiments, the model trained by PFMLP has almost the same accuracy, with a deviation of less than 1%. Considering the computational overhead of homomorphic encryption, we use an improved Paillier algorithm, which can speed up training by 25–28%. Moreover, comparisons of encryption key length, learning network structure, number of learning clients, etc. are also discussed in detail in the paper.
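
The idea of transmitting only homomorphically encrypted gradients can be illustrated with a toy version of the textbook Paillier scheme. This sketch is not PFMLP and not the improved Paillier algorithm the abstract refers to. It assumes that the clients share one key pair, that the aggregating server sees only ciphertexts, and that gradients are encoded as fixed-point integers; the key is far too small to be secure. All names are hypothetical, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets

# Toy key built from two Mersenne primes; far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM modulo N
SCALE = 10**6                  # assumed fixed-point precision for gradients


def encrypt(value: float) -> int:
    """Encrypt one gradient component with generator g = N + 1."""
    message = round(value * SCALE) % N   # negatives wrap around modulo N
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + message * N) * pow(r, N, N_SQ) % N_SQ


def decrypt(cipher: int) -> float:
    """Decrypt a ciphertext and undo the fixed-point encoding."""
    message = (pow(cipher, LAM, N_SQ) - 1) // N * MU % N
    if message > N // 2:                 # map the upper half back to negatives
        message -= N
    return message / SCALE


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N_SQ


# Two hypothetical clients encrypt their local gradients before sending them.
client_gradients = [[0.12, -0.5], [0.03, 0.25]]
encrypted = [[encrypt(g) for g in grads] for grads in client_gradients]

# The server combines ciphertexts component-wise without decrypting them.
encrypted_sum = [add_encrypted(a, b) for a, b in zip(*encrypted)]

# A key holder decrypts only the aggregate.
aggregated = [decrypt(c) for c in encrypted_sum]
```

The aggregate decrypts to the component-wise sum of the two gradient vectors, while the server never handles an individual plaintext gradient. The modular exponentiations in `encrypt` and `decrypt` are the main cost, which is the kind of overhead the abstract's speed-up targets.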

