SoK: Efficient Privacy-preserving Clustering

Clustering is a popular unsupervised machine learning technique that groups similar input elements into clusters. It is used in many areas ranging from business analysis to health care. Many of these applications cluster sensitive information that must not be leaked. Moreover, it is now often necessary to combine data from multiple sources to increase the quality of the analysis, and to outsource complex computation to powerful cloud servers. This calls for efficient privacy-preserving clustering. In this work, we systematically analyze the state of the art in privacy-preserving clustering. We implement and benchmark today’s four most efficient fully private clustering protocols, by Cheon et al. (SAC’19), Meng et al. (ArXiv’19), Mohassel et al. (PETS’20), and Bozdemir et al. (ASIACCS’21), with respect to communication, computation, and clustering quality. We compare them, assess their limitations for practical use in real-world applications, and conclude with open challenges.

Assessing the soil quality of Bansloi river basin, eastern India using soil-quality indices (SQIs) and Random Forest machine learning technique

Ecological Indicators ◽

10.1016/j.ecolind.2020.106804 ◽

2020 ◽

Vol 118 ◽

pp. 106804

Author(s):

Gopal Chandra Paul ◽

Sunil Saha ◽

Krishna Gopal Ghosh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Soil Quality ◽

River Basin ◽

Eastern India ◽

Quality Indices ◽

Machine Learning Technique ◽

Learning Technique

Steel Quality Prediction using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35407 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 1535-1538

Author(s):

Bhavesh Chaudhari

Keyword(s):

Machine Learning ◽

3D Printing ◽

19th Century ◽

Quality Prediction ◽

Machine Learning Technique ◽

Nano Technology ◽

Learning Technique ◽

Conventional Methods ◽

Steel Industries

These days, like other industries, the mechanical industries are shifting towards automation using techniques such as machine learning, nanotechnology, and 3D printing. Since the 19th century, steel has been widely used for construction, especially as TMT (thermo-mechanically treated) rods. Steel industries have relied on conventional methods to predict the quality of steel. These conventional methods are not very accurate, sometimes fail to identify errors, and consume a large amount of time. We propose a machine learning technique that compares steel microstructures across a dataset of images to find their differences; from these differences, the component with the fewest defects can be identified.

Comparing Transport Quality Perception among Different Travellers in European Cities through Co-Cluster Analysis

Sustainability ◽

10.3390/su11247159 ◽

2019 ◽

Vol 11 (24) ◽

pp. 7159 ◽

Cited By ~ 1

Author(s):

Miriam Pirra ◽

Ruggero G. Pensa

Keyword(s):

Performance Monitoring ◽

Clustering Algorithm ◽

Support Service ◽

Transport Service ◽

Local Authorities ◽

Machine Learning Technique ◽

Common View ◽

Learning Technique ◽

The Common

The quality of the transport system offered at city level constitutes an important and challenging goal for society, for local authorities, and for transport operators. Therefore, appropriate evaluation of travellers’ satisfaction is required to support service performance monitoring, benchmarking, and market analysis. This implies collecting satisfaction levels for different passenger groups, as these could suggest priority areas of action. To this end, this paper presents an original study aimed at understanding the main aspects affecting the common view of satisfaction among different kinds of travellers at European level. A specific survey investigating how travellers perceive the quality of their journey is administered to people living in cities of different sizes. Data are then analysed through a multi-view co-clustering algorithm, an innovative machine learning technique that highlights clusters of respondents grouped according to various categories of features. Such results could be used by local authorities and transport providers to identify the specific actions needed to improve the quality of the transport service offered, from a market segmentation perspective.

Multi-Party Verifiable Privacy-Preserving Federated k-Means Clustering in Outsourced Environment

Security and Communication Networks ◽

10.1155/2021/3630312 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Ruiqi Hou ◽

Fei Tang ◽

Shikai Liang ◽

Guowei Ling

Keyword(s):

Hash Function ◽

Clustering Algorithm ◽

High Accuracy ◽

Privacy Preserving ◽

Sensitive Information ◽

Encrypted Data ◽

Multiple Data ◽

Private Data ◽

Data Volume ◽

Cloud Servers

As a commonly used algorithm in data mining, clustering has been widely applied in many fields, such as machine learning, information retrieval, and pattern recognition. In reality, the data to be analyzed are often distributed across multiple parties. Moreover, the rapidly increasing data volume puts heavy computing pressure on data owners. Thus, data owners tend to outsource their data to cloud servers and obtain data analysis results for the federated data. However, the existing privacy-preserving outsourced k-means schemes cannot verify whether participants share consistent data. Considering scenarios with multiple data owners and sensitive information security in an outsourced environment, we propose a verifiable privacy-preserving federated k-means clustering scheme. In this article, cloud servers and participants perform the k-means clustering algorithm over encrypted data without exposing private data or intermediate results in each iteration. In particular, our scheme can verify the shares from participants when updating the cluster centers, based on secret sharing, a hash function, and blockchain, so that it can resist inconsistent-share attacks by malicious participants. Finally, security and experimental analyses are carried out to show that our scheme can protect private data and obtain high-accuracy clustering results.
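
The center-update step described above can be illustrated with a minimal Python sketch. It is not the authors' scheme: it assumes plain additive secret sharing over a prime field, a SHA-256 hash as the commitment, and an in-memory list standing in for the blockchain ledger. The names `PRIME`, `SCALE`, `share`, `commit`, `reconstruct`, and `aggregate_center` are all hypothetical.

```python
import hashlib
import secrets

PRIME = 2**61 - 1  # share modulus (assumption: any large prime works here)
SCALE = 1000       # fixed-point scale for coordinates (assumption)

def share(value, n_servers):
    """Split an integer into n additive shares modulo PRIME."""
    parts = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    parts.append((value - sum(parts)) % PRIME)
    return parts

def commit(share_value):
    """Hash commitment to a share (stand-in for an on-chain record)."""
    return hashlib.sha256(str(share_value).encode()).hexdigest()

def reconstruct(parts):
    """Recombine shares; the upper half of the field encodes negatives."""
    total = sum(parts) % PRIME
    return total - PRIME if total > PRIME // 2 else total

def aggregate_center(local_sums, local_counts, n_servers=2):
    """Update one cluster center from per-owner (sum vector, count) pairs."""
    dim = len(local_sums[0])
    # server_totals[s][j] accumulates the shares of value j held by server s
    server_totals = [[0] * (dim + 1) for _ in range(n_servers)]
    for vec, count in zip(local_sums, local_counts):
        values = [round(x * SCALE) for x in vec] + [count]
        for j, v in enumerate(values):
            parts = share(v, n_servers)
            ledger = [commit(p) for p in parts]  # published by the owner
            for s, p in enumerate(parts):
                # Server-side consistency check. In this single-process
                # simulation it always passes; it would fail if an owner
                # sent a share that differs from the committed one.
                if commit(p) != ledger[s]:
                    raise ValueError("inconsistent share")
                server_totals[s][j] = (server_totals[s][j] + p) % PRIME
    totals = [
        reconstruct([server_totals[s][j] for s in range(n_servers)])
        for j in range(dim + 1)
    ]
    count = totals[-1]
    return [t / SCALE / count for t in totals[:-1]]

# Two owners report the local sum and point count for the same cluster.
center = aggregate_center([[-1.5, 0.5], [0.5, 1.5]], [2, 2])
```

No single server sees an owner's sum or count, since each share on its own is uniformly random; only the recombined totals reveal the new center. An unsalted hash is a simplification that is tolerable here only because the shares themselves are high-entropy.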

Mapping MacNew Heart Disease Quality of Life Questionnaire onto country-specific EQ-5D-5L utility scores: a comparison of traditional regression models with a machine learning technique

The European Journal of Health Economics ◽

10.1007/s10198-020-01259-9 ◽

2021 ◽

Vol 22 (2) ◽

pp. 341-350

Author(s):

Lan Gao ◽

Wei Luo ◽

Utsana Tonmukayakul ◽

Marj Moodie ◽

Gang Chen

Keyword(s):

Quality Of Life ◽

Machine Learning ◽

Regression Models ◽

Life Questionnaire ◽

Machine Learning Technique ◽

Quality Of Life Questionnaire ◽

Learning Technique ◽

Country Specific ◽

Utility Scores

Privacy-Preserving Federated Learning Using Homomorphic Encryption

Applied Sciences ◽

10.3390/app12020734 ◽

2022 ◽

Vol 12 (2) ◽

pp. 734

Author(s):

Jaehyoung Park ◽

Hyuk Lim

Keyword(s):

Private Information ◽

Privacy Preservation ◽

Homomorphic Encryption ◽

Privacy Preserving ◽

Local Model ◽

Model Parameters ◽

Local Data ◽

Machine Learning Technique ◽

Analysis And Evaluation ◽

Learning Technique

Federated learning (FL) is a machine learning technique that enables distributed devices to train a learning model collaboratively without sharing their local data. FL-based systems can achieve much stronger privacy preservation, since the distributed devices deliver only local model parameters trained with local data to a centralized server. However, a centralized server or an attacker may still infer or extract sensitive private information from the structure and parameters of local learning models. We propose employing a homomorphic encryption (HE) scheme, which can perform arithmetic operations directly on ciphertexts without decryption, to protect the model parameters. Using the HE scheme, the proposed privacy-preserving federated learning (PPFL) algorithm enables the centralized server to aggregate encrypted local model parameters without decryption. Furthermore, the proposed algorithm allows each node to use a different HE private key in the same FL-based system by means of a distributed cryptosystem. The performance of the proposed PPFL algorithm is analyzed and evaluated in various cloud computing-based FL service scenarios.
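
Aggregating encrypted parameters without decryption can be illustrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier encryption, which the abstract does not name, with a single key pair rather than the per-node keys of the proposed distributed cryptosystem. The primes are far too small for real security, and all identifiers (`encrypt`, `decrypt`, `aggregate`, `SCALE`) are hypothetical.

```python
import math
import secrets

# Toy key material: two well-known primes, insecure at this size.
P, Q = 1_000_000_007, 998_244_353
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)   # valid because the generator is g = N + 1
SCALE = 10**6          # fixed-point scale for model weights (assumption)

def encrypt(m):
    """Paillier encryption of an integer m (negatives wrap modulo N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2

def decrypt(c):
    """Paillier decryption; the upper half of Z_N encodes negatives."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

def encrypt_weights(weights):
    """Client side: encode floats as fixed-point integers, then encrypt."""
    return [encrypt(round(w * SCALE)) for w in weights]

def aggregate(encrypted_updates):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    agg = [1] * len(encrypted_updates[0])  # 1 is a ciphertext of 0
    for update in encrypted_updates:
        agg = [a * c % N2 for a, c in zip(agg, update)]
    return agg

def decrypt_average(agg, n_clients):
    """Key holder: decrypt the summed weights and average them."""
    return [decrypt(c) / SCALE / n_clients for c in agg]

# Two clients submit encrypted local weights; the server never decrypts.
updates = [encrypt_weights([0.5, -1.25]), encrypt_weights([1.5, 0.25])]
averaged = decrypt_average(aggregate(updates), n_clients=2)
```

The server only ever multiplies ciphertexts, so it learns nothing about individual updates under the scheme's assumptions. A real deployment would use a vetted HE library and much larger keys.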

The Importance of Syntactic Parsing and Inference in Semantic Role Labeling

Computational Linguistics ◽

10.1162/coli.2008.34.2.257 ◽

2008 ◽

Vol 34 (2) ◽

pp. 257-287 ◽

Cited By ~ 87

Author(s):

Vasin Punyakanok ◽

Dan Roth ◽

Wen-tau Yih

Keyword(s):

Structural Constraints ◽

Semantic Role ◽

Syntactic Parsing ◽

Semantic Role Labeling ◽

Inference Procedure ◽

Machine Learning Technique ◽

Joint Inference ◽

Learning Technique

We present a general framework for semantic role labeling. The framework combines a machine-learning technique with an integer linear programming-based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the role of syntactic parsing information in semantic role labeling. We show that full syntactic parsing information is, by far, most relevant in identifying the argument, especially, in the very first stage—the pruning stage. Surprisingly, the quality of the pruning stage cannot be solely determined based on its recall and precision. Instead, it depends on the characteristics of the output candidates that determine the difficulty of the downstream problems. Motivated by this observation, we propose an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves its performance. Our system has been evaluated in the CoNLL-2005 shared task on semantic role labeling, and achieves the highest F1 score among 19 participants.

A systematic review on privacy-preserving distributed data mining

Data Science ◽

10.3233/ds-210036 ◽

2021 ◽

Vol 4 (2) ◽

pp. 121-150

Author(s):

Chang Sun ◽

Lianne Ippel ◽

Andre Dekker ◽

Michel Dumontier ◽

Johan van Soest

Keyword(s):

Systematic Review ◽

Data Mining ◽

Real Life ◽

Past Research ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Sensitive Information ◽

Distributed Data ◽

Multiple Sources ◽

Privacy And Security

Combining and analysing sensitive data from multiple sources offers considerable potential for knowledge discovery. However, there are a number of issues that pose problems for such analyses, including technical barriers, privacy restrictions, security concerns, and trust issues. Privacy-preserving distributed data mining techniques (PPDDM) aim to overcome these challenges by extracting knowledge from partitioned data while minimizing the release of sensitive information. This paper reports the results and findings of a systematic review of PPDDM techniques from 231 scientific articles published in the past 20 years. We summarize the state of the art, compare the problems they address, and identify the outstanding challenges in the field. This review identifies the consequence of the lack of standard criteria to evaluate new PPDDM methods and proposes comprehensive evaluation criteria with 10 key factors. We discuss the ambiguous definitions of privacy and confusion between privacy and security in the field, and provide suggestions of how to make a clear and applicable privacy description for new PPDDM techniques. The findings from our review enhance the understanding of the challenges of applying theoretical PPDDM methods to real-life use cases, and the importance of involving legal-ethical and social experts in implementing PPDDM methods. This comprehensive review will serve as a helpful guide to past research and future opportunities in the area of PPDDM.

What Should Investors Care About? Mutual Fund Ratings by Analysts vs. Machine Learning Technique

SSRN Electronic Journal ◽

10.2139/ssrn.3702749 ◽

2020 ◽

Author(s):

Si Cheng ◽

Ruichang Lu ◽

Xiaojun Zhang

Keyword(s):

Machine Learning ◽

Mutual Fund ◽

Machine Learning Technique ◽

Learning Technique

The Development of a Quantitative Precipitation Forecast Correction Technique Based on Machine Learning for Hydrological Applications

Atmosphere ◽

10.3390/atmos11010111 ◽

2020 ◽

Vol 11 (1) ◽

pp. 111 ◽

Cited By ~ 2

Author(s):

Chul-Min Ko ◽

Yeong Yun Jeong ◽

Young-Mi Lee ◽

Byung-Sik Kim

Keyword(s):

Machine Learning ◽

Heavy Rainfall ◽

Extreme Rainfall ◽

Machine Learning Techniques ◽

Precipitation Forecast ◽

Machine Learning Technique ◽

Rainfall Forecast ◽

Quantitative Precipitation Forecast ◽

Correction Technique ◽

Learning Technique

This study aimed to enhance the accuracy of extreme rainfall forecast, using a machine learning technique for forecasting hydrological impact. In this study, machine learning with XGBoost technique was applied for correcting the quantitative precipitation forecast (QPF) provided by the Korea Meteorological Administration (KMA) to develop a hydrological quantitative precipitation forecast (HQPF) for flood inundation modeling. The performance of machine learning techniques for HQPF production was evaluated with a focus on two cases: one for heavy rainfall events in Seoul and the other for heavy rainfall accompanied by Typhoon Kong-rey (1825). This study calculated the well-known statistical metrics to compare the error derived from QPF-based rainfall and HQPF-based rainfall against the observational data from the four sites. For the heavy rainfall case in Seoul, the mean absolute errors (MAE) of the four sites, i.e., Nowon, Jungnang, Dobong, and Gangnam, were 18.6 mm/3 h, 19.4 mm/3 h, 48.7 mm/3 h, and 19.1 mm/3 h for QPF and 13.6 mm/3 h, 14.2 mm/3 h, 33.3 mm/3 h, and 12.0 mm/3 h for HQPF, respectively. These results clearly indicate that the machine learning technique is able to improve the forecasting performance for localized rainfall. In addition, the HQPF-based rainfall shows better performance in capturing the peak rainfall amount and spatial pattern. Therefore, it is considered that the HQPF can be helpful to improve the accuracy of intense rainfall forecast, which is subsequently beneficial for forecasting floods and their hydrological impacts.
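
To make the reported error figures easier to compare, the following Python sketch defines the mean absolute error and computes the relative MAE reduction per site from the values quoted above. The helper names (`mean_absolute_error`, `relative_reduction`) are illustrative only, and the relative-reduction measure is a convenience added here, not a metric reported in the abstract.

```python
def mean_absolute_error(forecast, observed):
    """MAE: average absolute difference between forecast and observation."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)

def relative_reduction(before, after):
    """Percentage by which the error dropped from `before` to `after`."""
    return 100.0 * (before - after) / before

# MAE values in mm/3 h for the Seoul heavy-rainfall case, as quoted above.
sites = ["Nowon", "Jungnang", "Dobong", "Gangnam"]
mae_qpf = [18.6, 19.4, 48.7, 19.1]
mae_hqpf = [13.6, 14.2, 33.3, 12.0]

reductions = {
    site: relative_reduction(q, h)
    for site, q, h in zip(sites, mae_qpf, mae_hqpf)
}

for site, pct in reductions.items():
    print(f"{site}: MAE reduced by {pct:.1f}%")
```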
