Differentially Private Confidence Intervals for Empirical Risk Minimization

The process of data mining with differential privacy produces results that are affected by two types of noise: sampling noise due to data collection and privacy noise that is designed to prevent the reconstruction of sensitive information. In this paper, we consider the problem of designing confidence intervals for the parameters of a variety of differentially private machine learning models. The algorithms can provide confidence intervals that satisfy differential privacy (as well as the more recently proposed concentrated differential privacy) and can be used with existing differentially private mechanisms that train models using objective perturbation and output perturbation.

Download Full-text

A Logical Approach for Empirical Risk Minimization in Machine Learning for Data Stratification

Research Journal of Mathematics and Computer Science ◽

10.28933/rjmcs-2017-10-3002 ◽

2017 ◽

Author(s):

Taiwo, O. O. ◽

◽

Awodele O. ◽

Kuyoro, S. O. ◽

◽

...

Keyword(s):

Machine Learning ◽

Empirical Risk Minimization ◽

Risk Minimization ◽

Logical Approach ◽

Empirical Risk

Download Full-text

A Survey of Predicting Heart Disease

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.4.2.365 ◽

2020 ◽

Vol 4 (2) ◽

Author(s):

M Preethi ◽

J Selvakumar

Keyword(s):

Machine Learning ◽

Data Mining ◽

Cardiovascular Disease ◽

Heart Disease ◽

Medical System ◽

Learning Models ◽

Clinical Area ◽

Survey Paper ◽

Critical Challenge ◽

Machine Learning Models

This paper describes various methods of data mining, big data and machine learning models for predicting the heart disease. Data mining and machine learning plays an important role in building an important model for medical system to predict heart disease or cardiovascular disease. Medical experts can help the patients by detecting the cardiovascular disease before occurring. Now-a-days heart disease is one of the most significant causes of fatality. The prediction of heart disease is a critical challenge in the clinical area. But time to time, several techniques are discovered to predict the heart disease in data mining. In this survey paper, many techniques were described for predicting the heart disease.

Download Full-text

Extracting Knowledge in Large Synthetic Datasets Using Educational Data Mining and Machine Learning Models

Soft Computing for Intelligent Systems - Algorithms for Intelligent Systems ◽

10.1007/978-981-16-1048-6_13 ◽

2021 ◽

pp. 167-175

Author(s):

Jaikumar M. Patil ◽

Sunil R. Gupta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Educational Data Mining ◽

Learning Models ◽

Synthetic Datasets ◽

Machine Learning Models

Download Full-text

Meta-Feature Based Data Mining Service Selection and Recommendation Using Machine Learning Models

2018 IEEE 15th International Conference on e-Business Engineering (ICEBE) ◽

10.1109/icebe.2018.00014 ◽

2018 ◽

Cited By ~ 1

Author(s):

Bayan Alghofaily ◽

Chen Ding

Keyword(s):

Machine Learning ◽

Data Mining ◽

Service Selection ◽

Learning Models ◽

Feature Based ◽

Machine Learning Models

Download Full-text

Target classification using machine learning approaches with applications to clinical studies

Biometrics & Biostatistics International Journal ◽

10.15406/bbij.2020.09.00305 ◽

2020 ◽

Vol 9 (3) ◽

pp. 91-95

Author(s):

Chen Qian ◽

Jayesh P. Rai ◽

Jianmin Pan ◽

Aruni Bhatnagar ◽

Craig J. McClain ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Framingham Heart Study ◽

Parametric Method ◽

Group Classification ◽

Learning Models ◽

Visual Data ◽

Visual Data Mining ◽

Heart Study ◽

Machine Learning Models

Machine learning has been a trending topic for which almost every research area would like to incorporate some of the technique in their studies. In this paper, we demonstrate several machine learning models using two different data sets. One data set is the thermograms time series data on a cancer study that was conducted at the University of Louisville Hospital, and the other set is from the world-renowned Framingham Heart Study. Thermograms can be used to determine a patient’s health status, yet the difficulty of analyzing such a high-dimensional dataset makes it rarely applied, especially in cancer research. Previously, Rai et al.1 proposed an approach for data reduction along with comparison between parametric method, non-parametric method (KNN), and semiparametric method (DTW-KNN) for group classification. They concluded that the performance of two-group classification is better than the three-group classification. In addition, the classifications between types of cancer are somewhat challenging. The Framingham Heart Study is a famous longitudinal dataset which includes risk factors that could potentially lead to the heart disease. Previously, Weng et al.2 and Alaa et al.3 concluded that machine learning could significantly improve the accuracy of cardiovascular risk prediction. Since the original Framingham data have been thoroughly analyzed, it would be interesting to see how machine learning models could improve prediction. In this manuscript, we further analyze both the thermogram and the Framingham Heart Study datasets with several learning models such as gradient boosting, neural network, and random forest by using SAS Visual Data Mining and Machine Learning on SAS Viya. Each method is briefly discussed along with a model comparison. Based on the Youden’s index and misclassification rate, we select the best learning model. For big data inference, SAS Visual Data Mining and Machine Learning on SAS Viya, a cloud computing and structured statistical solution, may become a choice of computing.

Download Full-text

Between Pure and Approximate Differential Privacy

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v7i2.648 ◽

2017 ◽

Vol 7 (2) ◽

Cited By ~ 6

Author(s):

Thomas Steinke ◽

Jonathan Ullman

Keyword(s):

Lower Bound ◽

Differential Privacy ◽

High Dimensional ◽

Empirical Risk Minimization ◽

Sample Complexity ◽

Risk Minimization ◽

Worst Case ◽

Logarithmic Factor ◽

Empirical Risk ◽

Statistical Queries

We show a new lower bound on the sample complexity of (ε,δ)-differentially private algorithms that accurately answer statistical queries on high-dimensional databases. The novelty of our bound is that it depends optimally on the parameter δ, which loosely corresponds to the probability that the algorithm fails to be private, and is the first to smoothly interpolate between approximate differential privacy (δ >0) and pure differential privacy (δ= 0). Specifically, we consider a database D ∈{±1}n×d and its one-way marginals, which are the d queries of the form “What fraction of individual records have the i-th bit set to +1?” We show that in order to answer all of these queries to within error ±α (on average) while satisfying (ε,δ)-differential privacy for some function δ such that δ≥2−o(n) and δ≤1/n1+Ω(1), it is necessary that \[n≥Ω (\frac{√dlog(1/δ)}{αε}).\] This bound is optimal up to constant factors. This lower bound implies similar new bounds for problems like private empirical risk minimization and private PCA. To prove our lower bound, we build on the connection between fingerprinting codes and lower bounds in differential privacy (Bun, Ullman, and Vadhan, STOC’14). In addition to our lower bound, we give new purely and approximately differentially private algorithms for answering arbitrary statistical queries that improve on the sample complexity of the standard Laplace and Gaussian mechanisms for achieving worst-case accuracy guarantees by a logarithmic factor.

Download Full-text

Shear stress distribution prediction in symmetric compound channels using data mining and machine learning models

Frontiers of Structural and Civil Engineering ◽

10.1007/s11709-020-0634-3 ◽

2020 ◽

Vol 14 (5) ◽

pp. 1097-1109

Author(s):

Zohreh Sheikh Khozani ◽

Khabat Khosravi ◽

Mohammadamin Torabi ◽

Amir Mosavi ◽

Bahram Rezaei ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Shear Stress ◽

Stress Distribution ◽

Shear Stress Distribution ◽

Learning Models ◽

Compound Channels ◽

Using Data ◽

Machine Learning Models

Download Full-text

Federated personalized random forest for human activity recognition

Mathematical Biosciences and Engineering ◽

10.3934/mbe.2022044 ◽

2021 ◽

Vol 19 (1) ◽

pp. 953-971

Author(s):

Songfeng Liu ◽

◽

Jinyan Wang ◽

Wenliang Zhang ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Activity Recognition ◽

Human Activity ◽

Differential Privacy ◽

Recognition Task ◽

Human Activity Recognition ◽

Locality Sensitive Hashing ◽

Learning Models ◽

Machine Learning Models

<abstract><p>User data usually exists in the organization or own local equipment in the form of data island. It is difficult to collect these data to train better machine learning models because of the General Data Protection Regulation (GDPR) and other laws. The emergence of federated learning enables users to jointly train machine learning models without exposing the original data. Due to the fast training speed and high accuracy of random forest, it has been applied to federated learning among several data institutions. However, for human activity recognition task scenarios, the unified model cannot provide users with personalized services. In this paper, we propose a privacy-protected federated personalized random forest framework, which considers to solve the personalized application of federated random forest in the activity recognition task. According to the characteristics of the activity recognition data, the locality sensitive hashing is used to calculate the similarity of users. Users only train with similar users instead of all users and the model is incrementally selected using the characteristics of ensemble learning, so as to train the model in a personalized way. At the same time, user privacy is protected through differential privacy during the training stage. We conduct experiments on commonly used human activity recognition datasets to analyze the effectiveness of our model.</p></abstract>

Download Full-text