Study of a Privacy Preserving Logistic Regression Algorithm (PPLRA) For Data Privacy in the Context of Big Data

2021 ◽  
Vol 2083 (3) ◽  
pp. 032059
Author(s):  
Qiang Chen ◽  
Meiling Deng

Regression algorithms are commonly used in machine learning. Drawing on encryption and privacy protection methods, this paper studies regression algorithms, a current key technology, together with the same encryption technology. It proposes a PPLRA-based algorithm. The correlation between data items is obtained from the logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve the computing speed of the cluster while maintaining the algorithm's mean absolute error.
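As an illustration of the logistic regression formula and the absolute-error measure mentioned in this abstract, here is a minimal single-machine sketch. It is not the PPLRA algorithm: it involves no encryption and no Hadoop parallelization, and the function names, learning rate, epoch count, and toy data are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    # Logistic regression formula: p = sigmoid(w . x + b).
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(X, y, lr=0.5, epochs=2000):
    # Plain batch gradient descent on the log-loss (hypothetical settings).
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def mean_absolute_error(y_true, y_pred):
    # Average of |true - predicted| over all samples.
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy data (made up): one feature, label flips between x = 1 and x = 2.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train(X, y)
probs = [predict_proba(weights, bias, xi) for xi in X]
mae = mean_absolute_error(y, probs)
```

In a distributed setting, the per-sample gradient sums in `train` are the part that would typically be computed in parallel and then combined, though how PPLRA does this is not described in the abstract.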

Author(s):  
Charles M. Pérez-Espinoza ◽  
Nuvia Beltran-Robayo ◽  
Teresa Samaniego-Cobos ◽  
Abel Alarcón-Salvatierra ◽  
Ana Rodriguez-Mendez ◽  
...  

Scientific knowledge and electronic devices are growing day by day. Against this background, many expert systems in the healthcare industry use machine learning algorithms. Deep neural networks outperform conventional machine learning techniques and often take raw, unrefined data as input to compute the target output. Deep learning, or feature learning, focuses on the most important features and gives a fuller understanding of the generated model. Existing methodology used data mining techniques such as a rule-based classification algorithm and machine learning algorithms such as a hybrid logistic regression algorithm to preprocess data and extract meaningful insights. These, however, rely on supervised (labelled) data. The proposed work is based on unsupervised data, that is, data with no labels, and deep neural techniques are deployed to obtain the target output. Machine learning algorithms are compared with the proposed deep learning techniques, implemented with TensorFlow and Keras, in terms of accuracy. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology is tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed deep learning approach offered better performance, improving the results when compared to state-of-the-art machine learning approaches.


2021 ◽  
Vol 54 (2) ◽  
pp. 1-36
Author(s):  
Bo Liu ◽  
Ming Ding ◽  
Sina Shaham ◽  
Wenny Rahayu ◽  
Farhad Farokhi ◽  
...  

The newly emerged machine learning (e.g., deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, the work on the preservation of privacy and machine learning are still in an infancy stage, as most existing solutions only focus on privacy problems during the machine learning process. Therefore, a comprehensive study on the privacy preservation problems and machine learning is required. This article surveys the state of the art in privacy issues and solutions for machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning-aided privacy protection, and (iii) machine learning-based privacy attack and corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on our in-depth analysis of the area of privacy and machine learning, we point out future research directions in this field.


Predicting client behavior and feedback remains a challenging task for manufacturing companies. Companies struggle to increase their profit and annual turnover because they lack accurate predictions of customer likes and dislikes. This motivates the use of machine learning algorithms for predicting customer demand. This paper attempts to identify the important features of the wine data set, taken from the UCI Machine Learning Repository, for predicting the customer segment. The important features are extracted with various ensemble methods: AdaBoost regressor, AdaBoost classifier, Random Forest regressor, Extra Trees regressor, and Gradient Boosting regressor. The feature importances extracted by each ensemble method are then fitted with logistic regression to analyze performance. The same extracted feature importances are also subjected to feature scaling and then fitted with logistic regression to analyze performance. Performance is analyzed with the metrics Mean Squared Error (MSE), Mean Absolute Error (MAE), R2 Score, Explained Variance Score (EVS), and Mean Squared Log Error (MSLE). Experimental results show that, after applying feature scaling, the feature importance extracted from the Extra Trees regressor is the most effective, with an MSE of 0.04, MAE of 0.03, R2 Score of 94%, EVS of 0.9, and MSLE of 0.01, compared with the other ensemble methods.
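The five performance metrics named in this abstract have standard definitions, sketched below in plain Python. The function names and the toy values are assumptions made for illustration; they are not the study's data or code.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def mse(y_true, y_pred):
    # Mean Squared Error: average squared residual.
    return _mean([(a - b) ** 2 for a, b in zip(y_true, y_pred)])

def mae(y_true, y_pred):
    # Mean Absolute Error: average absolute residual.
    return _mean([abs(a - b) for a, b in zip(y_true, y_pred)])

def r2_score(y_true, y_pred):
    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = _mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    # EVS = 1 - Var(residuals) / Var(y_true); unlike R2 it ignores a constant bias.
    residuals = [a - b for a, b in zip(y_true, y_pred)]
    mean_res = _mean(residuals)
    mean_true = _mean(y_true)
    var_res = _mean([(r - mean_res) ** 2 for r in residuals])
    var_true = _mean([(a - mean_true) ** 2 for a in y_true])
    return 1.0 - var_res / var_true

def msle(y_true, y_pred):
    # Mean Squared Log Error: MSE computed on log(1 + value); needs values > -1.
    return _mean([(math.log1p(a) - math.log1p(b)) ** 2
                  for a, b in zip(y_true, y_pred)])

# Toy values (made up) to exercise the functions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
scores = {
    "MSE": mse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "R2": r2_score(y_true, y_pred),
    "EVS": explained_variance(y_true, y_pred),
    "MSLE": msle(y_true, y_pred),
}
```

Note that R2 and EVS are fractions with a maximum of 1, so an R2 reported as 94% corresponds to 0.94 on the same scale as the EVS of 0.9.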


Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi

It is evident that almost everybody around the world uses Android apps. Half of the population of the planet is associated with messaging, social media, gaming, and browsers. This online marketplace provides free and paid access to users. On the Google Play store, users are encouraged to download countless applications belonging to predefined categories. In this research paper, we have scraped thousands of user reviews and app ratings. We have scraped the reviews of 148 apps from 14 categories. We have collected 506,259 reviews from the Google Play store and subsequently checked the semantics of user reviews of some applications to determine whether the reviews are positive, negative, or neutral. We have evaluated the results using different machine learning algorithms: Naïve Bayes, Random Forest, and Logistic Regression. We have calculated Term Frequency (TF) and Inverse Document Frequency (IDF) and compared the statistical results of these algorithms in terms of accuracy, precision, recall, and F1. We have visualized these statistical results in the form of a bar chart. In this paper, each algorithm is analyzed one by one, and the results are compared. We found that Logistic Regression is the best algorithm for review analysis across the Google Play store: it achieves the best precision, accuracy, recall, and F1 on this dataset both after data collection and after preprocessing.
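The Term Frequency and Inverse Document Frequency weighting mentioned in this abstract can be sketched as follows. This uses one common textbook variant (relative term frequency and an unsmoothed logarithmic IDF); the abstract does not say which variant the authors used, and the function names and sample reviews are made up for illustration.

```python
import math
from collections import Counter

def term_frequency(tokens):
    # TF: share of the document's tokens taken up by each term.
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def inverse_document_frequency(docs):
    # IDF: log(N / df), where df is the number of documents containing the term.
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(docs):
    # One sparse vector (dict of term -> weight) per document.
    idf = inverse_document_frequency(docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(doc).items()}
        for doc in docs
    ]

# Hypothetical, already tokenized reviews.
reviews = [["good", "app"], ["bad", "app"], ["good", "game"]]
vectors = tf_idf(reviews)
```

Terms that are rare across reviews, such as "bad" here, receive a higher weight than terms shared by several reviews; the resulting vectors are the kind of input a classifier such as logistic regression would be trained on.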


2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Shouyun Lv ◽  
Shizong Li ◽  
Zhiwei Yu ◽  
Kaiqiong Wang ◽  
Xin Qiao ◽  
...  

To conduct better research in hepatocellular carcinoma resection, this paper used 3D machine learning and a logistic regression algorithm to study the preoperative assistance of patients undergoing hepatectomy. In this study, the logistic regression model was analyzed to find the influencing factors for the survival and recurrence of patients. The clinical data of 50 HCC patients who underwent extensive hepatectomy (≥4 segments of the liver) and were admitted to our hospital from June 2020 to December 2020 were selected to calculate the liver volume, simulated surgical resection volume, residual liver volume, surgical margin, etc. The results showed that the simulated liver volume of the 50 patients was 845.2 ± 285.5 mL and the actual liver volume was 826.3 ± 268.1 mL, with no significant difference between the two groups (t = 0.425; P > 0.05). Compared with the logistic regression model, the machine learning method has a better prediction effect, but the logistic regression model has better interpretability. The analysis of the relationship between the liver tumour and hepatic vessels in practical problems has specific clinical application value for accurately evaluating the volume of liver resection and the surgical margin.


2020 ◽  
Author(s):  
Nicholas Fiorentini ◽  
Pietro Leandri ◽  
Massimo Losa

In order to plan infrastructure maintenance strategies, Non-Destructive Techniques (NDT) have been largely employed in recent years, achieving outstanding results in the identification of infrastructural deficiencies. Nevertheless, the extensive combination of different NDT that can cover various factors affecting infrastructure durability has not yet been thoroughly investigated.
This paper proposes a methodology for evaluating the resilience of infrastructures towards endogenous factors by combining different NDT outcomes. Machine Learning (ML) regression algorithms have been used to predict the pavement surface roughness connected to a set of potential endogenous conditioning factors. Two different regression algorithms, specifically Regression Tree (RT) and Random Forest (RF), have been developed, applied, and compared.
The study area involves 4 testing sites, in both rural and urban contexts, for a total length of 11400 m. In addition to the International Roughness Index (IRI) calculated from profilometric measurements, a set of endogenous features of the infrastructure was collected using NDT such as the Falling Weight Deflectometer (FWD) and Ground Penetrating Radar (GPR). Moreover, a set of topographical data of roadside areas, information on the properties of the materials composing the subgrade and the pavement structure, traffic flow, rainfall, temperature, and age of the infrastructure were gathered.
The database was randomly split into a Training set (70%) and a Test set (30%). With the Training set, the models have been trained and validated through a 10-Fold Cross-Validation (CV). A set of three performance metrics, namely Correlation Coefficient (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), has been used for the Goodness-of-Fit (GoF) assessment. Also, with the Test set, the Predictive Performance (PP) of the models has been evaluated.
Results indicate that the suggested methodology is satisfactory for supporting road maintenance planning by National Road Authorities (NRA) and allows decision-makers to pursue better solutions.
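The 70/30 random split followed by 10-fold cross-validation on the training portion can be sketched with index lists, as below. The helper names, the seed, and the sample count of 100 are assumptions for illustration; the abstract does not state the database size or the tooling used.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    # Shuffle sample indices reproducibly, then cut off the test share.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k=10):
    # Yield (train, validation) index lists; each fold validates exactly once.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# Hypothetical database of 100 samples.
train_idx, test_idx = train_test_split_indices(100)
splits = list(k_fold(train_idx, k=10))
```

The test indices are never passed to `k_fold`, which mirrors the separation in the abstract: cross-validation on the training set for goodness-of-fit, and the held-out test set for predictive performance.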


2020 ◽  
Vol 7 (1) ◽  
pp. 33-38
Author(s):  
Tania Ciu ◽  
Raymond Sunardi Oetama

It is undeniable that cardiovascular disease is the number one cause of death in the world. Various factors such as age, cholesterol level, and an unhealthy lifestyle can trigger cardiovascular disease. The symptoms of cardiovascular disease are also challenging to identify. It takes careful understanding and analysis of patient medical record data and identification of the parameters that cause this disease. This study was conducted to predict the main factors causing cardiovascular disease. In this study, a dataset consisting of 14 attributes with class labels was used as the basis for identifying the link between factors that cause cardiovascular disease. The research area is data analysis, where the analyzed data concern factors that influence the presence of cardiovascular disease in Cleveland. In predicting cardiovascular disease, a logistic regression algorithm is used to examine the interrelation between the dependent variable and the independent variables involved. This research is expected to increase readers' knowledge and insight into how to analyze cardiovascular disease using logistic regression algorithms and into the main factors that cause cardiovascular disease.


Author(s):  
Umniy Salamah

The number of people with diabetes is predicted to increase, which worsens the ratio between eye care service providers and patients. One way to address this problem is to provide early detection of the current condition of eye health in diabetic patients. Retinal damage can be detected with the help of the logistic regression machine learning algorithm. Logistic regression (LR) was selected for retinal damage detection in this research because it has been widely used in a variety of machine learning problems and can describe a response variable well with one or more predictor variables. The research methodology comprises five phases: preparation, feature extraction, normalization, classification, and evaluation. It processes a dataset of digital fundus images provided by EyePACS, using scikit-learn as the machine learning library and Python as the programming language. As a result, the accuracy of retinal damage detection using logistic regression is 0.7392, with an F1-score of 0.6317, recall of 0.7392, precision of 0.6043, and kappa of 0.0051.
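Accuracy and Cohen's kappa measure different things, which helps when reading results like those above. The sketch below uses made-up labels, not the EyePACS data, and hypothetical function names; it shows how a classifier that always predicts the majority class can score a high accuracy while kappa stays at zero. Whether that explains the figures reported in this study cannot be determined from the abstract.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    # Share of samples where the prediction equals the true label.
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    # kappa = (p_o - p_e) / (1 - p_e): agreement beyond what chance would give.
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_expected = sum((true_counts[c] / n) * (pred_counts[c] / n) for c in labels)
    if p_expected == 1.0:
        return 0.0  # Degenerate case: only one label appears on both sides.
    return (p_observed - p_expected) / (1.0 - p_expected)

# Made-up imbalanced labels: 7 healthy (0), 3 damaged (1).
y_true = [0] * 7 + [1] * 3
majority_pred = [0] * 10          # always predicts the majority class
acc = accuracy(y_true, majority_pred)
kappa = cohen_kappa(y_true, majority_pred)
```

On imbalanced data it is therefore worth reporting kappa or per-class recall alongside accuracy, as this study does.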


Author(s):  
Qinghong Yang ◽  
Xiangquan Hu ◽  
Zhichao Cheng ◽  
Kang Miao

Malicious inventory occupied orders (MIOOs) are orders created temporarily for the purpose of occupying the inventories of sellers. MIOOs disrupt normal business activities and harm both sellers and consumers. This study aims to determine the best practice and model for technical solutions that can effectively and systematically limit MIOOs, using the methods of analytical mining and case studies. This work makes three contributions. Firstly, it addresses the MIOO problem using machine learning technology. The results indicate that 93% of MIOOs in the sample data are predictable and preventable. Secondly, it presents a methodology for solving the MIOO problem that can be applied by other companies. The methodology consists of four major steps: compiling statistics on MIOOs, using a logistic regression algorithm to train a model, optimizing the model, and applying the model. Finally, this work identifies unique features of MIOOs, which help in better understanding the logic behind MIOO producers.

