Logistic regression model training based on the approximate homomorphic encryption

Background Data sharing in multicenter medical research can improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources. Despite these benefits, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Great progress has been made in recent years in the development of secure machine learning frameworks based on homomorphic encryption; however, nearly all such frameworks use a single secret key and lack a description of how to securely evaluate the trained model, which makes them impractical for multicenter medical applications. Objective The aim of this study is to provide a privacy-preserving machine learning protocol (eg, for logistic regression) for multiple data providers and researchers. This protocol allows researchers to train models and then evaluate them on medical data from multiple sources while providing privacy protection for both the sensitive data and the learned model. Methods We adapted a novel threshold homomorphic encryption scheme to guarantee privacy requirements. We devised new relinearization key generation techniques for greater scalability and multiplicative depth and new model training strategies for simultaneously training multiple models through x-fold cross-validation. Results Using a client-server architecture, we evaluated the performance of our protocol. The experimental results demonstrated that, with 10-fold cross-validation, our privacy-preserving logistic regression model training and evaluation over 10 attributes in a data set of 49,152 samples took approximately 7 minutes and 20 minutes, respectively. Conclusions We present the first privacy-preserving multiparty logistic regression model training and evaluation protocol based on threshold homomorphic encryption. Our protocol is practical for real-world use and may promote multicenter medical research to some extent.
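
The idea of training several cross-validation models at once can be illustrated in plaintext. The following is a minimal sketch, not the authors' protocol: it uses no encryption, and the function names (`train_folds_jointly`, `held_out_accuracy`), the learning rate, and the epoch count are assumptions chosen for illustration. It keeps one weight vector per fold and updates all of them in a single batched gradient step, masking out each fold's held-out samples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_folds_jointly(X, y, k=10, lr=0.5, epochs=200, seed=0):
    """Train k logistic regression models, one per fold, in one batched loop."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n) % k  # fold index assigned to each sample
    # train_mask[j, i] is 1.0 when sample i belongs to the training set of fold j
    train_mask = (fold_of[None, :] != np.arange(k)[:, None]).astype(float)
    W = np.zeros((k, d))  # one weight vector per fold
    for _ in range(epochs):
        P = sigmoid(X @ W.T)                   # (n, k) predictions of all models
        R = (P - y[:, None]) * train_mask.T    # residuals, held-out samples zeroed
        grad = (R.T @ X) / train_mask.sum(axis=1, keepdims=True)
        W -= lr * grad
    return W, fold_of

def held_out_accuracy(X, y, W, fold_of):
    """Accuracy of each fold's model on the samples it never trained on."""
    accs = []
    for j in range(W.shape[0]):
        test = fold_of == j
        pred = sigmoid(X[test] @ W[j]) >= 0.5
        accs.append(np.mean(pred == (y[test] == 1)))
    return np.array(accs)
```

Calling `train_folds_jointly(X, y, k=10)` on a feature matrix `X` and a 0/1 label vector `y` returns a `(10, d)` weight matrix. Batching the folds this way loosely mirrors how packed ciphertext slots can process several models in parallel, although the encrypted protocol described in the abstract involves considerably more machinery.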

Web-Based Privacy-Preserving Multicenter Medical Data Analysis Tools Via Threshold Homomorphic Encryption: Design and Development Study (Preprint)

10.2196/preprints.22555 ◽ 2020

Author(s): Yao Lu ◽ Tianshu Zhou ◽ Yu Tian ◽ Shiqiang Zhu ◽ Jingsong Li

Keyword(s): Machine Learning ◽ Logistic Regression ◽ Logistic Regression Model ◽ Cross Validation ◽ Homomorphic Encryption ◽ Privacy Preserving ◽ Medical Data ◽ Multiple Sources ◽ Model Training ◽ Fold Cross Validation

BACKGROUND Data sharing in multicenter medical research can improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources. Despite these benefits, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Great progress has been made in recent years in the development of secure machine learning frameworks based on homomorphic encryption; however, nearly all such frameworks use a single secret key and lack a description of how to securely evaluate the trained model, which makes them impractical for multicenter medical applications. OBJECTIVE The aim of this study is to provide a privacy-preserving machine learning protocol (eg, for logistic regression) for multiple data providers and researchers. This protocol allows researchers to train models and then evaluate them on medical data from multiple sources while providing privacy protection for both the sensitive data and the learned model. METHODS We adapted a novel threshold homomorphic encryption scheme to guarantee privacy requirements. We devised new relinearization key generation techniques for greater scalability and multiplicative depth and new model training strategies for simultaneously training multiple models through x-fold cross-validation. RESULTS Using a client-server architecture, we evaluated the performance of our protocol. The experimental results demonstrated that, with 10-fold cross-validation, our privacy-preserving logistic regression model training and evaluation over 10 attributes in a data set of 49,152 samples took approximately 7 minutes and 20 minutes, respectively. CONCLUSIONS We present the first privacy-preserving multiparty logistic regression model training and evaluation protocol based on threshold homomorphic encryption. Our protocol is practical for real-world use and may promote multicenter medical research to some extent.

Multi-Party Privacy-Preserving Logistic Regression with Poor Quality Data Filtering for IoT Contributors

Electronics ◽ 10.3390/electronics10172049 ◽ 2021 ◽ Vol 10 (17) ◽ pp. 2049

Author(s): Kennedy Edemacu ◽ Jong Wook Kim

Keyword(s): Logistic Regression ◽ Regression Model ◽ Data Quality ◽ Logistic Regression Model ◽ Homomorphic Encryption ◽ Poor Quality ◽ Privacy Preserving ◽ Quality Data ◽ Data Filtering ◽ Poor Quality Data

Nowadays, the internet of things (IoT) is used to generate data in several application domains. Logistic regression, a standard machine learning algorithm with a wide application range, is often built on such data. Nevertheless, building a powerful and effective logistic regression model requires large amounts of data. Thus, collaboration between multiple IoT participants has often been the go-to approach. However, privacy concerns and poor data quality are two challenges that threaten the success of such a setting. Several studies have proposed different methods to address the privacy concern, but to the best of our knowledge, little attention has been paid to the poor data quality problem in multi-party logistic regression. Thus, in this study, we propose a multi-party privacy-preserving logistic regression framework with poor quality data filtering for IoT data contributors to address both problems. Specifically, we propose a new metric, gradient similarity, in a distributed setting, which we employ to filter out parameters from data contributors with poor quality data. To solve the privacy challenge, we employ homomorphic encryption. Theoretical analysis and experimental evaluations using real-world datasets demonstrate that our proposed framework is privacy-preserving and robust against poor quality data.
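
The abstract does not define the gradient similarity metric, so the following is only a hedged plaintext sketch of the general filtering idea. It assumes, purely for illustration, that each contributor's gradient is compared by cosine similarity against the coordinate-wise median of all submitted gradients, and that contributors at or below a threshold are dropped before averaging. The names `cosine_similarity` and `filter_and_aggregate` and the `threshold` parameter are hypothetical, and no encryption is involved.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    # eps guards against division by zero for all-zero vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def filter_and_aggregate(gradients, threshold=0.0):
    """Average only the gradients that point roughly the same way as the reference."""
    G = np.asarray(gradients, dtype=float)   # shape (contributors, parameters)
    reference = np.median(G, axis=0)         # assumed reference: coordinate-wise median
    sims = np.array([cosine_similarity(g, reference) for g in G])
    keep = sims > threshold                  # contributors that pass the filter
    if not keep.any():
        raise ValueError("no contributor passed the similarity filter")
    return G[keep].mean(axis=0), keep, sims
```

In this sketch, a contributor whose gradient points away from the majority (for example, because of mislabeled data) receives a low or negative similarity and is excluded from the aggregate. The median reference is one design choice among several; the paper's actual metric and reference may differ.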

Independent predictors for functional outcome after drainage of chronic subdural hematoma identified using a logistic regression model

Journal of Neurosurgical Sciences ◽ 10.23736/s0390-5616.17.04056-5 ◽ 2020 ◽ Vol 64 (2) ◽ Cited By ~ 2

Author(s): Sotirios Katsigiannis ◽ Christina Hamisch ◽ Boris Krischek ◽ Marco Timmer ◽ Anastasios Mpotsaris ◽ ...

Keyword(s): Logistic Regression ◽ Regression Model ◽ Functional Outcome ◽ Subdural Hematoma ◽ Logistic Regression Model ◽ Chronic Subdural Hematoma

Survey on turnover intention of scientific and technological workers based on the binary logistic regression model—a case study of XPCC

Information Management and Management Engineering ◽ 10.2495/imme140591 ◽ 2014

Author(s): Zhui Liu ◽ Honglu Gou ◽ Lingying Kong

Keyword(s): Logistic Regression ◽ Regression Model ◽ Turnover Intention ◽ Logistic Regression Model ◽ Binary Logistic Regression ◽ Binary Logistic Regression Model

The Effects of Major Stakeholders on SMME's Performance Turnaround -- Empirical Analysis Based on the Ordinal Logistic Regression Model

SSRN Electronic Journal ◽ 10.2139/ssrn.2282214 ◽ 2013

Author(s): Li Qi

Keyword(s): Logistic Regression ◽ Regression Model ◽ Empirical Analysis ◽ Logistic Regression Model ◽ Ordinal Logistic Regression ◽ Ordinal Logistic Regression Model

Logistic Regression Model for Business Failures Prediction of Technology Industry in Thailand

SSRN Electronic Journal ◽ 10.2139/ssrn.2932026 ◽ 2012

Author(s): Sittichai Puagwatana

Keyword(s): Logistic Regression ◽ Regression Model ◽ Logistic Regression Model ◽ Technology Industry ◽ Business Failures

Logistic Regression Model of Relationship between Breast Cancer Pathology Diagnosis with Metastasis

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1752/1/012026 ◽ 2021 ◽ Vol 1752 (1) ◽ pp. 012026

Author(s): M N Bustan ◽ B Poerwanto

Keyword(s): Breast Cancer ◽ Logistic Regression ◽ Regression Model ◽ Logistic Regression Model ◽ Cancer Pathology ◽ Breast Cancer Pathology

Work absence and multimorbidity in Portugal: results from the 1st National Health Examination Survey

European Journal of Public Health ◽ 10.1093/eurpub/ckaa166.1390 ◽ 2020 ◽ Vol 30 (Supplement_5)

Author(s): J Matos ◽ C Matias Dias ◽ A Félix

Keyword(s): Logistic Regression ◽ Regression Model ◽ Chronic Diseases ◽ National Health ◽ Logistic Regression Model ◽ Health Examination ◽ Work Absence ◽ Health Examination Survey ◽ Absence From Work ◽ The Impact

Background Studies on the impact of multimorbidity on work absence indicate that the number and type of chronic diseases may increase absenteeism and that the risk of absence from work is higher in people with two or more chronic diseases. This study analyzed the association between multimorbidity and greater frequency and duration of work absence in the Portuguese population between the ages of 25 and 65 during 2015. Methods This is an epidemiological, observational, cross-sectional study with an analytical component, based on data from the 1st National Health Examination Survey. Univariate, bivariate, and multivariate analyses of the study variables were performed, and a multivariate logistic regression model was constructed. Results The prevalence of absenteeism was 55.1%. Education (p = 0.0157) and professional activity (p = 0.0086) were associated with absence from work. No association was found between absence from work and the presence of chronic diseases (p = 0.9358) or of multimorbidity (p = 0.4309). The prevalence of multimorbidity was 31.8%. Multimorbidity was associated with age (p < 0.0001), education (p < 0.001), and yield (p = 0.0009). The number of days of absence from work did not increase with the number of chronic diseases. In the optimized logistic regression model, the only variables associated with work absence were age (p = 0.0391) and education (p = 0.0089). Conclusions The scientific evidence generated will contribute to the current discussion on the need for the health and social security system to develop policies for patients with multimorbidity. Key messages The prevalence of absenteeism and multimorbidity in Portugal was 55.1% and 31.8%, respectively. In the optimized model, age and education were associated with work absence.

Logistic regression model with TreeNet and association rules analysis: applications with medical datasets

Communications in Statistics - Simulation and Computation ◽ 10.1080/03610918.2021.1912764 ◽ 2021 ◽ pp. 1-25

Author(s): Pannapa Changpetch

Keyword(s): Logistic Regression ◽ Regression Model ◽ Association Rules ◽ Logistic Regression Model ◽ Association Rules Analysis
