SoK: Efficient Privacy-preserving Clustering

2021 ◽  
Vol 2021 (4) ◽  
pp. 225-248
Author(s):  
Aditya Hegde ◽  
Helen Möllering ◽  
Thomas Schneider ◽  
Hossein Yalame

Clustering is a popular unsupervised machine learning technique that groups similar input elements into clusters. It is used in many areas ranging from business analysis to health care. In many of these applications, sensitive information is clustered that should not be leaked. Moreover, nowadays it is often required to combine data from multiple sources to increase the quality of the analysis as well as to outsource complex computation to powerful cloud servers. This calls for efficient privacy-preserving clustering. In this work, we systematically analyze the state of the art in privacy-preserving clustering. We implement and benchmark today’s four most efficient fully private clustering protocols by Cheon et al. (SAC’19), Meng et al. (ArXiv’19), Mohassel et al. (PETS’20), and Bozdemir et al. (ASIACCS’21) with respect to communication, computation, and clustering quality. We compare them, assess their limitations for practical use in real-world applications, and conclude with open challenges.

Author(s):  
Bhavesh Chaudhari

These days, like other industries, the mechanical industries are shifting towards automation through techniques such as machine learning, nanotechnology, and 3D printing. Since the 19th century, steel has been widely used for construction, especially TMT (thermo-mechanically treated) rods. In the steel industry, conventional methods are widely used to predict the quality of steel. These methods are not very accurate, sometimes fail to identify errors, and consume a large amount of time. We propose a machine learning technique that compares the microstructures of steel across any dataset of images in order to find the differences between them; from these differences, the component with the fewest defects can be identified.


2019 ◽  
Vol 11 (24) ◽  
pp. 7159 ◽  
Author(s):  
Miriam Pirra ◽  
Ruggero G. Pensa

The quality of the transport system offered at city level constitutes an important and challenging goal for society, for local authorities, and transport operators. Therefore, appropriate evaluation of travellers’ satisfaction is required to support service performance monitoring, benchmarking, and market analysis. This aspect implies the collection of satisfaction levels for different passengers’ groups, as it could provide interesting suggestions for identifying priority areas of action. To this end, an original study aimed at understanding the main aspects affecting the common view of satisfaction among different kinds of travellers at European level is presented in this paper. A specific survey investigating how travellers perceive the quality of their journey is proposed to people living in cities characterised by different sizes. Data are then analysed through a multi-view co-clustering algorithm, an innovative machine learning technique that highlights clusters of respondents grouped according to various categories of features. Such results could be used by local authorities and transport providers to understand the specific actions to be operated to improve the quality of transport service offered in a market segmentation dimension.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Ruiqi Hou ◽  
Fei Tang ◽  
Shikai Liang ◽  
Guowei Ling

As a commonly used algorithm in data mining, clustering has been widely applied in many fields, such as machine learning, information retrieval, and pattern recognition. In reality, data to be analyzed are often distributed across multiple parties. Moreover, the rapidly increasing data volume puts heavy computing pressure on data owners. Thus, data owners tend to outsource their own data to cloud servers and obtain data analysis results for the federated data. However, the existing privacy-preserving outsourced k-means schemes cannot verify whether participants share consistent data. Considering scenarios with multiple data owners and sensitive information security in an outsourced environment, we propose a verifiable privacy-preserving federated k-means clustering scheme. In this article, cloud servers and participants perform the k-means clustering algorithm over encrypted data without exposing private data or intermediate results in each iteration. In particular, our scheme can verify the shares from participants when updating the cluster centers, based on secret sharing, a hash function, and blockchain, so that it can resist inconsistent-share attacks by malicious participants. Finally, security and experimental analyses are carried out to show that our scheme can protect private data and obtain high-accuracy clustering results.
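The secret-sharing step behind such a cluster-center update can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes plain additive secret sharing over a prime field, one-dimensional non-negative integer data, and two non-colluding servers, and it omits the hash- and blockchain-based share verification entirely. Unlike the scheme described above, it also reveals the aggregate sum and count when reconstructing. The names `PRIME`, `share`, `reconstruct`, and `secure_centroid` are hypothetical.

```python
import secrets

# Field modulus chosen for illustration only (an assumption, not from the paper).
PRIME = 2**61 - 1


def share(value, n_shares):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_centroid(owner_sums, owner_counts, n_servers=2):
    """Compute one cluster center from per-owner local sums and counts.

    Each data owner shares its local (sum, count) for the cluster; each
    server only ever sees one random-looking share per owner.
    """
    sum_shares = [share(s, n_servers) for s in owner_sums]
    count_shares = [share(c, n_servers) for c in owner_counts]

    # Server j adds up the j-th share received from every owner.
    server_sum = [sum(sh[j] for sh in sum_shares) % PRIME for j in range(n_servers)]
    server_cnt = [sum(sh[j] for sh in count_shares) % PRIME for j in range(n_servers)]

    # Only the aggregated totals are reconstructed, never an owner's inputs.
    total = reconstruct(server_sum)
    count = reconstruct(server_cnt)
    return total / count


if __name__ == "__main__":
    # Three owners hold points assigned to the same cluster.
    print(secure_centroid(owner_sums=[10, 20, 30], owner_counts=[2, 3, 5]))
```

Because the shares are additive, the servers can sum them locally without interaction; a verifiable scheme would additionally have to check that every owner sent consistent shares to all servers.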


2022 ◽  
Vol 12 (2) ◽  
pp. 734
Author(s):  
Jaehyoung Park ◽  
Hyuk Lim

Federated learning (FL) is a machine learning technique that enables distributed devices to train a learning model collaboratively without sharing their local data. FL-based systems can achieve much stronger privacy preservation since the distributed devices deliver only local model parameters trained with local data to a centralized server. However, a centralized server or attackers could still infer or extract sensitive private information from the structure and parameters of local learning models. To protect the model parameters, we propose employing a homomorphic encryption (HE) scheme, which can perform arithmetic operations directly on ciphertexts without decryption. Using the HE scheme, the proposed privacy-preserving federated learning (PPFL) algorithm enables the centralized server to aggregate encrypted local model parameters without decryption. Furthermore, the proposed algorithm allows each node to use a different HE private key in the same FL-based system using a distributed cryptosystem. The performance analysis and evaluation of the proposed PPFL algorithm are conducted in various cloud computing-based FL service scenarios.
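How a server can aggregate encrypted parameters without decrypting them can be sketched with a toy additively homomorphic scheme. The abstract does not name the HE scheme used; the sketch below swaps in textbook Paillier as an assumption, with tiny hard-coded primes that are insecure and serve only as illustration. It uses a single key pair rather than the per-node keys of the proposed distributed cryptosystem, and it treats model parameters as small non-negative integers (real parameters would need fixed-point encoding). All function names are hypothetical.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, using g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n


def aggregate(n, ciphertexts):
    """Server side: multiplying ciphertexts adds the underlying plaintexts."""
    n2 = n * n
    acc = 1
    for c in ciphertexts:
        acc = acc * c % n2
    return acc


if __name__ == "__main__":
    n, priv = keygen()
    local_updates = [12, 30, 7]  # quantized parameters from three devices
    encrypted = [encrypt(n, u) for u in local_updates]
    total = decrypt(n, priv, aggregate(n, encrypted))
    print(total, total / len(local_updates))
```

The server in this sketch only ever calls `aggregate`, so it handles ciphertexts exclusively; decryption happens wherever the private key lives, which is the part a distributed cryptosystem would split across nodes.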


2008 ◽  
Vol 34 (2) ◽  
pp. 257-287 ◽  
Author(s):  
Vasin Punyakanok ◽  
Dan Roth ◽  
Wen-tau Yih

We present a general framework for semantic role labeling. The framework combines a machine-learning technique with an integer linear programming-based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the role of syntactic parsing information in semantic role labeling. We show that full syntactic parsing information is, by far, most relevant in identifying the argument, especially, in the very first stage—the pruning stage. Surprisingly, the quality of the pruning stage cannot be solely determined based on its recall and precision. Instead, it depends on the characteristics of the output candidates that determine the difficulty of the downstream problems. Motivated by this observation, we propose an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves its performance. Our system has been evaluated in the CoNLL-2005 shared task on semantic role labeling, and achieves the highest F1 score among 19 participants.


Data Science ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 121-150
Author(s):  
Chang Sun ◽  
Lianne Ippel ◽  
Andre Dekker ◽  
Michel Dumontier ◽  
Johan van Soest

Combining and analysing sensitive data from multiple sources offers considerable potential for knowledge discovery. However, there are a number of issues that pose problems for such analyses, including technical barriers, privacy restrictions, security concerns, and trust issues. Privacy-preserving distributed data mining techniques (PPDDM) aim to overcome these challenges by extracting knowledge from partitioned data while minimizing the release of sensitive information. This paper reports the results and findings of a systematic review of PPDDM techniques from 231 scientific articles published in the past 20 years. We summarize the state of the art, compare the problems they address, and identify the outstanding challenges in the field. This review identifies the consequence of the lack of standard criteria to evaluate new PPDDM methods and proposes comprehensive evaluation criteria with 10 key factors. We discuss the ambiguous definitions of privacy and confusion between privacy and security in the field, and provide suggestions of how to make a clear and applicable privacy description for new PPDDM techniques. The findings from our review enhance the understanding of the challenges of applying theoretical PPDDM methods to real-life use cases, and the importance of involving legal-ethical and social experts in implementing PPDDM methods. This comprehensive review will serve as a helpful guide to past research and future opportunities in the area of PPDDM.


Atmosphere ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 111 ◽  
Author(s):  
Chul-Min Ko ◽  
Yeong Yun Jeong ◽  
Young-Mi Lee ◽  
Byung-Sik Kim

This study aimed to enhance the accuracy of extreme rainfall forecast, using a machine learning technique for forecasting hydrological impact. In this study, machine learning with XGBoost technique was applied for correcting the quantitative precipitation forecast (QPF) provided by the Korea Meteorological Administration (KMA) to develop a hydrological quantitative precipitation forecast (HQPF) for flood inundation modeling. The performance of machine learning techniques for HQPF production was evaluated with a focus on two cases: one for heavy rainfall events in Seoul and the other for heavy rainfall accompanied by Typhoon Kong-rey (1825). This study calculated the well-known statistical metrics to compare the error derived from QPF-based rainfall and HQPF-based rainfall against the observational data from the four sites. For the heavy rainfall case in Seoul, the mean absolute errors (MAE) of the four sites, i.e., Nowon, Jungnang, Dobong, and Gangnam, were 18.6 mm/3 h, 19.4 mm/3 h, 48.7 mm/3 h, and 19.1 mm/3 h for QPF and 13.6 mm/3 h, 14.2 mm/3 h, 33.3 mm/3 h, and 12.0 mm/3 h for HQPF, respectively. These results clearly indicate that the machine learning technique is able to improve the forecasting performance for localized rainfall. In addition, the HQPF-based rainfall shows better performance in capturing the peak rainfall amount and spatial pattern. Therefore, it is considered that the HQPF can be helpful to improve the accuracy of intense rainfall forecast, which is subsequently beneficial for forecasting floods and their hydrological impacts.
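The size of the improvement can be worked out directly from the MAE values quoted above. The short sketch below restates the standard MAE definition and computes the relative error reduction per site; the helper names `mae` and `relative_reduction` are hypothetical, and the only data used are the eight figures reported in the abstract.

```python
# MAE values (mm/3 h) for the Seoul heavy rainfall case, as quoted in the abstract.
qpf = {"Nowon": 18.6, "Jungnang": 19.4, "Dobong": 48.7, "Gangnam": 19.1}
hqpf = {"Nowon": 13.6, "Jungnang": 14.2, "Dobong": 33.3, "Gangnam": 12.0}


def mae(forecast, observed):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


def relative_reduction(before, after):
    """Percentage by which the error dropped going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Per-site reduction in MAE achieved by the corrected forecast (HQPF).
reductions = {site: relative_reduction(qpf[site], hqpf[site]) for site in qpf}

if __name__ == "__main__":
    for site, pct in reductions.items():
        print(f"{site}: {qpf[site]} -> {hqpf[site]} mm/3 h ({pct:.1f}% lower)")
```

On these figures the reduction ranges from roughly a quarter of the original error at Nowon and Jungnang to more than a third at Gangnam.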

