Study of a Privacy Preserving Logistic Regression Algorithm (PPLRA) For Data Privacy in the Context of Big Data

Qiang Chen; Meiling Deng

doi:10.1088/1742-6596/2083/3/032059

Journal of Physics Conference Series ◽

10.1088/1742-6596/2083/3/032059 ◽

2021 ◽

Vol 2083 (3) ◽

pp. 032059

Author(s):

Qiang Chen ◽

Meiling Deng

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Privacy Protection ◽

Data Privacy ◽

Absolute Error ◽

Average Absolute Error ◽

Regression Algorithms ◽

Hadoop Platform ◽

Logistic Regression Algorithm ◽

Computing Speed

Regression algorithms are commonly used in machine learning. Building on encryption and privacy-protection methods, this paper studies regression algorithms, currently a key technology, together with the same encryption technology. It proposes an algorithm based on PPLRA. The correlation between data items is obtained from the logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve the cluster's computing speed while preserving the algorithm's average absolute error.
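
As a rough illustration of how logistic regression training can be split across partitions and recombined, the sketch below computes a partial gradient per data partition (a "map" step) and sums the partial gradients (a "reduce" step). It is not the authors' PPLRA: it includes no encryption, it runs in a single process rather than on Hadoop, and the names `partial_gradient`, `add_vectors`, and `train`, as well as the toy data, are assumptions made for this example.

```python
import math
from functools import reduce


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(weights, partition):
    """'Map' step: log-loss gradient summed over one data partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad


def add_vectors(a, b):
    """'Reduce' step: element-wise sum of two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def train(partitions, n_features, lr=0.5, epochs=1000):
    """Batch gradient descent; each epoch is one map pass plus one reduce."""
    weights = [0.0] * n_features
    n_rows = sum(len(p) for p in partitions)
    for _ in range(epochs):
        grads = [partial_gradient(weights, p) for p in partitions]  # map
        total = reduce(add_vectors, grads)                          # reduce
        weights = [w - lr * g / n_rows for w, g in zip(weights, total)]
    return weights


# Toy data (hypothetical): the first feature is a constant 1.0 used as bias.
partitions = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 0)],
    [([1.0, 3.0], 1), ([1.0, 4.0], 1)],
]
weights = train(partitions, n_features=2)
predictions = [
    round(sigmoid(sum(w * x for w, x in zip(weights, features))))
    for part in partitions
    for features, _ in part
]
```

Because the gradient of the log-loss is a sum over rows, the partial sums can be computed independently on each partition and added in any order, which is what makes this kind of training a natural fit for a map and reduce style of parallelism.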

Using a Machine Learning Logistic Regression Algorithm to Classify Nanomedicine Clinical Trials in a Known Repository

Communications in Computer and Information Science - Computer and Communication Engineering ◽

10.1007/978-3-030-12018-4_8 ◽

2019 ◽

pp. 98-110

Author(s):

Charles M. Pérez-Espinoza ◽

Nuvia Beltran-Robayo ◽

Teresa Samaniego-Cobos ◽

Abel Alarcón-Salvatierra ◽

Ana Rodriguez-Mendez ◽

...

Keyword(s):

Machine Learning ◽

Clinical Trials ◽

Logistic Regression ◽

Logistic Regression Algorithm

Deep Learning Technique to Predict Heart Disease using IoT Based ECG Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7166.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2559-2562

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Rule Based ◽

Learning Techniques ◽

Learning Technique ◽

Logistic Regression Algorithm ◽

Target Output

Scientific knowledge and electronic devices are growing day by day, and many expert systems in the healthcare industry now use machine learning algorithms. Deep neural networks outperform conventional machine learning techniques and often take raw, unrefined data to calculate the target output. Deep learning, or feature learning, focuses on the features that matter most and gives a complete understanding of the generated model. The existing methodology used a data mining technique (a rule-based classification algorithm) and a machine learning algorithm (a hybrid logistic regression algorithm) to preprocess data and extract meaningful insights from it. That approach, however, relies on supervised (labelled) data. The proposed work is based on unsupervised data, that is, data with no labels, and deep neural techniques are deployed to obtain the target output. Machine learning algorithms are compared with the proposed deep learning techniques, implemented with TensorFlow and Keras, in terms of accuracy. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology is tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed deep learning approach offered better performance, improving on the results of state-of-the-art machine learning approaches.

When Machine Learning Meets Privacy

ACM Computing Surveys ◽

10.1145/3436755 ◽

2021 ◽

Vol 54 (2) ◽

pp. 1-36

Author(s):

Bo Liu ◽

Ming Ding ◽

Sina Shaham ◽

Wenny Rahayu ◽

Farhad Farokhi ◽

...

Keyword(s):

Machine Learning ◽

Privacy Protection ◽

Data Privacy ◽

Privacy Preservation ◽

Research Progress ◽

Future Research ◽

Surveillance Systems ◽

Smart Healthcare ◽

Wide Range ◽

Depth Analysis

Newly emerged machine learning methods (e.g., deep learning) have become a strong driving force that is revolutionizing a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a major concern in this era of machine learning-based artificial intelligence. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, work on privacy preservation and machine learning is still in its infancy, as most existing solutions focus only on privacy problems during the machine learning process. Therefore, a comprehensive study of privacy preservation problems and machine learning is required. This article surveys the state of the art in privacy issues and solutions for machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning-aided privacy protection, and (iii) machine learning-based privacy attacks and the corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on our in-depth analysis of the area of privacy and machine learning, we point out future research directions in this field.

Regressor Fitting Of Feature Importance For Customer Segment Prediction With Ensembling Schemes Using Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8255.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 952-956 ◽

Cited By ~ 2

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Manufacturing Companies ◽

Data Set ◽

Feature Importance ◽

Customer Segment ◽

Feature Scaling

Predicting client behavior and feedback remains a challenging task for manufacturing companies today. Companies struggle to increase their profit and annual turnover because they lack accurate predictions of customer likes and dislikes. This has led to the adoption of machine learning algorithms for predicting customer demand. This paper attempts to identify the important features of the wine data set from the UCI Machine Learning Repository for predicting the customer segment. Feature importances are extracted with several ensembling methods: AdaBoost regressor, AdaBoost classifier, random forest regressor, extra trees regressor, and gradient boosting regressor. The feature importance extracted by each ensembling method is then fitted with logistic regression to analyze performance. The same feature importances are also subjected to feature scaling and then fitted with logistic regression. Performance is measured with Mean Squared Error (MSE), Mean Absolute Error (MAE), R2 score, Explained Variance Score (EVS), and Mean Squared Log Error (MSLE). Experimental results show that, after feature scaling, the feature importance extracted by the extra trees regressor is the most effective, with an MSE of 0.04, MAE of 0.03, R2 score of 94%, EVS of 0.9, and MSLE of 0.01, compared with the other ensembling methods.
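
The five metrics named in this abstract can be written out directly from their textbook definitions. The sketch below is only an illustration on made-up labels; the function name `regression_metrics` and the sample values are assumptions, not the paper's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R2, EVS and MSLE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MSLE compares log(1 + value), so it requires non-negative inputs.
    msle = sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / n

    mean_true = sum(y_true) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n

    r2 = 1.0 - mse / var_true       # 1 - SS_res / SS_tot
    evs = 1.0 - var_err / var_true  # 1 - Var(y - y_hat) / Var(y)
    return {"mse": mse, "mae": mae, "r2": r2, "evs": evs, "msle": msle}


# Hypothetical customer-segment labels and predictions.
metrics = regression_metrics([1, 2, 3, 2], [1, 2, 2, 2])
```

R2 and EVS differ only in whether the mean of the errors is subtracted first, so they coincide when the predictions are unbiased on average.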

Machine Learning Algorithm’s Measurement and Analytical Visualization of User’s Reviews for Google Play Store

10.20944/preprints202003.0249.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Abdul Karim ◽

Azhari Azhari ◽

Samir Brahim Belhaouri ◽

Ali Adil Qureshi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Naïve Bayes ◽

Android Apps ◽

Online Marketplace ◽

Document Frequency ◽

Logistic Regression Algorithm ◽

Google Play

It is evident that almost everybody around the world uses Android apps. Half of the planet's population is engaged with messaging, social media, gaming, and browsers. This online marketplace provides free and paid access to users. On the Google Play store, users are encouraged to download countless applications belonging to predefined categories. In this research paper, we scraped thousands of user reviews and app ratings, covering the reviews of 148 apps from 14 categories. We collected 506259 reviews from the Google Play store and then checked the semantics of users' reviews of some applications to determine whether they are positive, negative, or neutral. We evaluated the results using different machine learning algorithms: Naïve Bayes, Random Forest, and Logistic Regression. We calculated Term Frequency (TF) and Inverse Document Frequency (IDF), measured accuracy, precision, recall, and F1, and compared the statistical results of these algorithms. We visualized these results as a bar chart. In this paper, each algorithm is analyzed in turn and the results are compared. We found that Logistic Regression is the best algorithm for review analysis of the Google Play store: our results indicate that it achieves the best precision, accuracy, recall, and F1 both after preprocessing and after data collection of this dataset.
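
To make the TF and IDF terms concrete, here is a minimal sketch of one common textbook variant: term frequency as a share of the document's tokens, and inverse document frequency as the natural log of (number of documents / documents containing the term). The abstract does not say which variant the authors used, and libraries often apply smoothed formulas, so the function `tf_idf` and the sample reviews are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (whitespace tokenizer)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical reviews, not taken from the paper's dataset.
reviews = ["great app love it", "bad app crashes", "love this great game"]
scores = tf_idf(reviews)
```

In this toy corpus "app" appears in two of three reviews and so scores lower than "it" or "crashes", which each appear in only one. Vectors like these are what a classifier such as logistic regression would then be trained on.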

Application of the Preoperative Assistant System Based on Machine Learning in Hepatocellular Carcinoma Resection

Journal of Healthcare Engineering ◽

10.1155/2021/4757668 ◽

2021 ◽

Vol 2021 ◽

pp. 1-6

Author(s):

Shouyun Lv ◽

Shizong Li ◽

Zhiwei Yu ◽

Kaiqiong Wang ◽

Xin Qiao ◽

...

Keyword(s):

Machine Learning ◽

Hepatocellular Carcinoma ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Liver Volume ◽

Surgical Margin ◽

Extensive Hepatectomy ◽

Significant Difference ◽

Logistic Regression Algorithm

To conduct better research in hepatocellular carcinoma resection, this paper used 3D machine learning and a logistic regression algorithm to study preoperative assistance for patients undergoing hepatectomy. The logistic regression model was analyzed to find the factors influencing patient survival and recurrence. The clinical data of 50 HCC patients who underwent extensive hepatectomy (≥4 segments of the liver) and were admitted to our hospital from June 2020 to December 2020 were selected to calculate the liver volume, simulated surgical resection volume, residual liver volume, surgical margin, etc. The results showed that the simulated liver volume of the 50 patients was 845.2 ± 285.5 mL and the actual liver volume was 826.3 ± 268.1 mL, with no significant difference between the two groups (t = 0.425; P > 0.05). Compared with the logistic regression model, the machine learning method has a better prediction effect, but the logistic regression model has better interpretability. Analyzing the relationship between the liver tumour and the hepatic vessels in practical problems has specific clinical application value for accurately evaluating the volume of liver resection and the surgical margin.

Evaluating Resilience of Infrastructures Towards Endogenous Events by Non-Destructive High-Performance Techniques and Machine Learning Regression Algorithms

10.5194/egusphere-egu2020-21183 ◽

2020 ◽

Author(s):

Nicholas Fiorentini ◽

Pietro Leandri ◽

Massimo Losa

Keyword(s):

Machine Learning ◽

High Performance ◽

Goodness Of Fit ◽

Performance Metrics ◽

Predictive Performance ◽

Absolute Error ◽

Road Maintenance ◽

Endogenous Factors ◽

Regression Algorithms ◽

Non Destructive

In order to plan infrastructure maintenance strategies, Non-Destructive Techniques (NDT) have been widely employed in recent years, achieving outstanding results in the identification of infrastructural deficiencies. Nevertheless, the extensive combination of different NDT that can cover the various factors affecting infrastructure durability has not yet been thoroughly investigated. This paper proposes a methodology for evaluating the resilience of infrastructures towards endogenous factors by combining different NDT outcomes. Machine Learning (ML) regression algorithms have been used to predict the pavement surface roughness connected to a set of potential endogenous conditioning factors. Two regression algorithms, Regression Tree (RT) and Random Forest (RF), have been developed, applied, and compared. The study area involves 4 testing sites, in both rural and urban contexts, for a total length of 11400 m. In addition to the International Roughness Index (IRI) calculated from profilometric measurements, a set of endogenous features of the infrastructure was collected using NDT such as Falling Weight Deflectometer (FWD) and Ground Penetrating Radar (GPR). Moreover, topographical data of roadside areas, information on the properties of the materials composing the subgrade and the pavement structure, traffic flow, rainfall, temperature, and age of infrastructure were gathered. The database was randomly split into a Training set (70%) and a Test set (30%). With the Training set, the models have been trained and validated through a 10-Fold Cross-Validation (CV). A set of three performance metrics, namely Correlation Coefficient (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), has been used for the Goodness-of-Fit (GoF) assessment. Also, with the Test set, the Predictive Performance (PP) of the models has been evaluated. Results indicate that the suggested methodology is satisfactory for supporting road maintenance planning by National Road Authorities (NRA) and allows decision-makers to pursue better solutions.
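
The evaluation protocol described here (a random 70/30 split, then 10-fold cross-validation inside the training portion, scored with RMSE and MAE) can be sketched with index lists alone. The helper names `train_test_split` and `k_fold`, the seed, and the row count of 100 are assumptions for this example; the abstract does not describe the authors' tooling.

```python
import math
import random


def train_test_split(n_rows, train_fraction=0.7, seed=0):
    """Shuffle row indices and cut them into training and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_rows * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold(indices, k=10):
    """Yield (training, validation) lists; each index is validated once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield training, validation


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical dataset of 100 rows.
train_idx, test_idx = train_test_split(100)
folds = list(k_fold(train_idx, k=10))
```

The test indices are set aside before cross-validation starts, so the folds used for model selection never see the rows later used to report predictive performance.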

Logistic Regression Prediction Model for Cardiovascular Disease

International Journal of New Media Technology ◽

10.31937/ijnmt.v7i1.1340 ◽

2020 ◽

Vol 7 (1) ◽

pp. 33-38

Author(s):

Tania Ciu ◽

Raymond Sunardi Oetama

Keyword(s):

Cardiovascular Disease ◽

Logistic Regression ◽

Analysis Data ◽

Research Area ◽

Unhealthy Lifestyle ◽

Patient Medical Record ◽

Regression Algorithms ◽

Main Factors ◽

Class Labels ◽

Logistic Regression Algorithm

It is undeniable that cardiovascular disease is the number one cause of death in the world. Various factors such as age, cholesterol level, and an unhealthy lifestyle can trigger cardiovascular disease, and its symptoms are challenging to identify. Doing so takes careful understanding and analysis of patient medical record data and identification of the parameters that cause the disease. This study was conducted to predict the main factors causing cardiovascular disease. A dataset consisting of 14 attributes with class labels was used as the basis for identifying the links between the factors that cause cardiovascular disease. The research area is data analysis, where the analyzed data concern factors that influence the presence of cardiovascular disease in Cleveland. To predict cardiovascular disease, a logistic regression algorithm is used to examine the relationship between the dependent variable and the independent variables involved. This research is expected to increase readers' knowledge and insight into how to analyze cardiovascular disease using logistic regression algorithms and into the main factors that cause it.

Application of Logistic Regression Methods to Retinal Damage Detection on Digital Fundus Images

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206217 ◽

2020 ◽

pp. 103-109

Author(s):

Umniy Salamah

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Damage Detection ◽

Service Providers ◽

Learning Algorithm ◽

Care Service ◽

Diabetic Patients ◽

Number Of Patients ◽

Classification Evaluation ◽

Logistic Regression Algorithm

The number of people with diabetes is predicted to increase, which upsets the balance between the quality of eye care services and the number of patients. One way to address this problem is to provide an early-detection service for the current condition of eye health in diabetic patients. Retinal damage can be detected with the help of a machine learning algorithm, logistic regression. The logistic regression (LR) algorithm was selected for retinal damage detection in this research because it has been widely used in a variety of machine learning problems where LR can describe the response variable well with one or more predictor variables. The research methodology comprised five phases (preparation, feature extraction, normalization, classification, and evaluation) for processing a dataset of digital fundus images provided by EyePACS, using scikit-learn as the machine learning library and Python as the programming language. As a result, we found that the accuracy of retinal damage detection using logistic regression is 0.7392, followed by an F1-score of 0.6317, recall of 0.7392, precision of 0.6043, and kappa of 0.0051.
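
The reported recall equals the reported accuracy, and kappa is close to zero. Support-weighted recall always equals accuracy, so one possible reading (an assumption, since the abstract does not state the averaging used) is that the scores are weighted averages on an imbalanced dataset. The sketch below computes the same set of metrics from scratch on made-up labels; `classification_report` here is a hypothetical helper written for this example, not the scikit-learn function of the same name.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Accuracy, support-weighted precision/recall/F1, and Cohen's kappa."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label in labels:
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support[label] / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    # Agreement expected by chance, from the class marginals.
    expected = sum(support[c] * predicted[c] for c in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical imbalanced labels; the classifier always predicts class 0.
report = classification_report([0] * 8 + [1] * 2, [0] * 10)
```

On this toy input, accuracy and weighted recall are both 0.8 while kappa is 0, which shows how a high accuracy can coexist with agreement no better than chance when one class dominates.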

Machine Learning Based Prediction and Prevention of Malicious Inventory Occupied Orders

International Journal of Mobile Computing and Multimedia Communications ◽

10.4018/ijmcmc.2014100104 ◽

2014 ◽

Vol 6 (4) ◽

pp. 56-72

Author(s):

Qinghong Yang ◽

Xiangquan Hu ◽

Zhichao Cheng ◽

Kang Miao

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Case Studies ◽

Best Practice ◽

Learning Technology ◽

Prediction And Prevention ◽

Sample Data ◽

Logistic Regression Algorithm ◽

Technical Solutions

MIOOs are orders created temporarily for the purpose of occupying sellers' inventories. They disrupt normal business activities and harm both sellers and consumers. This study aims to determine the best practice and model for technical solutions that can effectively and systematically limit malicious inventory occupied orders (MIOOs), using analytical mining and case studies. This work makes three contributions. First, it addresses the MIOO problem using machine learning technology; the results indicate that 93% of the MIOOs in the sample data are predictable and preventable. Second, it presents a methodology for solving the MIOO problem that other companies can apply. The methodology consists of four major steps: compiling statistics on MIOOs, using a logistic regression algorithm to train a model, optimizing the model, and applying the model. Finally, this work identifies unique features of MIOOs, which help to better understand the logic behind MIOO producers.