Secure Building Blocks for Data Privacy

Author(s):  
Shuguo Han

Rapid advances in automated data collection tools and data storage technology have led to the wide availability of huge amounts of data. Data mining can extract useful and interesting rules or knowledge for decision making from large amounts of data. In the modern world of business competition, collaboration between industries or companies is one form of alliance to maintain overall competitiveness. Two industries or companies may find it beneficial to collaborate in order to discover more useful and interesting patterns, rules, or knowledge from their joint data collection, which they would not be able to derive otherwise. Because of privacy concerns, the parties cannot share their private data with one another unless the data mining algorithms are secure. Therefore, privacy-preserving data mining (PPDM) was proposed to resolve the data privacy concerns while retaining the utility of distributed data sets (Agrawal & Srikant, 2000; Lindell & Pinkas, 2000). Conventional PPDM makes use of Secure Multi-party Computation (Yao, 1986) or randomization techniques to allow the participating parties to preserve their data privacy during the mining process. It has been widely acknowledged that algorithms based on secure multi-party computation are able to achieve complete accuracy, albeit at the expense of efficiency.
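
As an illustration of the kind of building block that secure multi-party computation provides, the following is a minimal sketch of a secure sum based on additive secret sharing. It is not taken from the chapter: the function names `make_shares` and `secure_sum`, the public modulus, and the single-process simulation of all parties are assumptions made for the example, and it assumes non-negative inputs whose total is below the modulus.

```python
import secrets

# Assumed public modulus; it must exceed any possible total of the inputs.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate every party in one process (hypothetical setup).

    Each party splits its private value into shares, one per party.
    Each party then adds up the shares it received and publishes only
    that partial sum. Combining the partial sums gives the total, while
    no single share reveals anything about an individual input.
    """
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Three parties with hypothetical private counts.
    print(secure_sum([120, 45, 310]))  # prints 475
```

The result is exact, which matches the accuracy claim above; the cost is the extra communication of one share per pair of parties.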

Author(s):  
Philipp Sprengholz ◽  
Cornelia Betsch

Because of the increasing popularity of voice-controlled virtual assistants, such as Amazon’s Alexa and Google Assistant, they should be considered a new medium for psychological and behavioral research. We developed Survey Mate, an extension of Google Assistant, and conducted two studies to analyze the reliability and validity of data collected through this medium. In the first study, we assessed validated procrastination and shyness scales as well as social desirability indicators for both the virtual assistant and an online questionnaire. The results revealed comparable internal consistency and construct and criterion validity. In the second study, five social psychological experiments, which have been successfully replicated by the Many Labs projects, were successfully reproduced using a virtual assistant for data collection. Comparable effects were observed for users of both smartphones and smart speakers. Our findings point to the applicability of virtual assistants in data collection independent of the device used. While we identify some limitations, including data privacy concerns and a tendency toward more socially desirable responses, we found that virtual assistants could allow the recruitment of participants who are hard to reach with established data collection techniques, such as people with visual impairment, dyslexia, or lower education. This new medium could also be suitable for recruiting samples from non-Western countries because of its wide availability and easily adaptable language settings. It could also support an increase in the generalizability of theories in the future.


Author(s):  
Qin Ding

With the growing usage of XML data for data storage and exchange, there is an imminent need to develop efficient algorithms to perform data mining on semistructured XML data. Mining XML data is much more difficult than mining relational data because of the complexity of structure in XML data. A naïve approach to mining XML data is to first convert the XML data into relational format. However, the structure information may be lost during the conversion. It is desirable to develop efficient and effective data mining algorithms that can be applied directly to XML data.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Ivan Kholod ◽  
Ilya Petukhov ◽  
Andrey Shorov

This paper describes the construction of a Cloud for Distributed Data Analysis (CDDA) based on the actor model. The design uses an approach that maps data mining algorithms onto decomposed functional blocks, which are assigned to actors. Using actors allows users to move the computation closer to the stored data. The process does not require loading data sets into the cloud and allows users to analyze confidential information locally. The results of experiments show that the proposed approach is more efficient than established solutions.


Author(s):  
Balazs Feil ◽  
Janos Abonyi

This chapter aims to give a comprehensive view of the links between fuzzy logic and data mining. It will be shown that knowledge extracted from simple data sets or huge databases can be represented by fuzzy rule-based expert systems. It is highlighted that both model performance and interpretability of the mined fuzzy models are of major importance, and effort is required to keep the resulting rule bases small and comprehensible. Therefore, in recent years, soft-computing-based data mining algorithms have been developed for feature selection, feature extraction, model optimization, and model reduction (rule based simplification). Application of these techniques is illustrated using the wine data classification problem. The results illustrate that fuzzy tools can be applied in a synergistic manner through the nine steps of knowledge discovery.


2021 ◽  
Author(s):  
Rohit Ravindra Nikam ◽  
Rekha Shahapurkar

Data mining is a technique for extracting the necessary data from large data sets. Privacy protection in data mining is about hiding sensitive information or identities without breaching security or losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments, who must agree before their private data is shared or published. Preserving privacy in data mining has become a critical research area. Various evaluation metrics, such as performance in terms of time efficiency, data utility, and degree of complexity or resistance to data mining techniques, are used to assess the privacy preservation of data mining techniques. Social media and smartphones produce tons of data every minute. The voluminous data produced from these different sources can be processed and analyzed to support decision making, but data analytics is vulnerable to breaches of privacy. One data analytics framework is the recommendation system, commonly used by e-commerce sites such as Amazon and Flipkart to recommend items to customers based on their purchasing habits, which leads to customers being characterized. This paper presents various privacy preservation techniques used by existing researchers, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gap between various processes and privacy preservation methods and illustrate how to overcome such issues with new innovative methods. Finally, our research presents a summary of the outcomes of the entire literature.


passer ◽  
2019 ◽  
Vol 3 (1) ◽  
pp. 174-179
Author(s):  
Noor Bahjat ◽  
Snwr Jamak

Cancer is a common disease that threatens the life of one in every three people. This dangerous disease urgently requires early detection and diagnosis. Recent progress in data mining methods, such as classification, has demonstrated the need to apply machine learning algorithms to large datasets. This paper mainly aims to utilise data mining techniques to classify cancer data sets into blood cancer and non-blood cancer based on pre-defined information and post-defined information obtained after blood tests and CT scan tests. The research was conducted using the WEKA data mining tool with 10-fold cross-validation to evaluate and compare different classification algorithms, extract meaningful information from the dataset, and accurately identify the most suitable and predictive model. The paper shows that the most suitable classifier, with the best ability to predict the cancerous dataset, is the Multilayer Perceptron, with an accuracy of 99.3967%.
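
For readers unfamiliar with 10-fold cross-validation, the sketch below shows how a data set can be partitioned so that every record is used for testing exactly once. It is only an illustration in Python and is not WEKA's implementation: the function name `k_fold_indices` is hypothetical, and the sketch neither shuffles nor stratifies the records.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical data set of 25 records split into 10 folds.
    for fold, (train, test) in enumerate(k_fold_indices(25, k=10)):
        print(fold, len(train), test)
```

A classifier would be trained on each `train` split and scored on the matching `test` split, and the ten scores averaged to give the reported accuracy.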


Author(s):  
Trisna Yuniarti ◽  
Dahliyah Hayati

The oil palm is the most productive plantation product in Indonesia. Government strategies and policies related to oil palm plantations continue to be carried out, considering that the plantation area is increasing every year. Segmentation of oil palm plantations based on area, production, and productivity aims to identify groups of potential oil palm plantations in the territory of Indonesia. This segmentation can inform the strategies and policies that the government formulates. The segmentation method for grouping oil palm plantations uses the K-Means Clustering data mining technique with 3 clusters specified. The data mining stages run from data collection through representation; of the 34 data sets collected, only 25 could be processed further. The grouping yielded three plantation segments: 72% of plantations fall in the low-potential group, 20% in the medium-potential group, and 8% in the high-potential group.
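
The K-Means procedure with three clusters can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names, the deterministic choice of initial centroids, and the nine normalized (area, production, productivity) records are all hypothetical and do not come from the study.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k=3, max_iterations=100):
    """Plain Lloyd's algorithm; requires len(points) >= k."""
    # Assumed deterministic start: k points spread across the sorted input.
    ordered = sorted(points)
    step = len(ordered) // k
    centroids = [ordered[i * step + step // 2] for i in range(k)]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the per-feature mean of the cluster members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


# Hypothetical normalized records: (area, production, productivity).
plantations = [
    (0.10, 0.20, 0.15), (0.15, 0.25, 0.20), (0.20, 0.10, 0.10),
    (0.50, 0.50, 0.55), (0.55, 0.45, 0.50), (0.45, 0.55, 0.45),
    (0.90, 0.85, 0.90), (0.85, 0.90, 0.95), (0.95, 0.95, 0.85),
]

if __name__ == "__main__":
    labels, centroids = kmeans(plantations, k=3)
    print(labels)
```

Because the three features are on different scales in practice, the sketch assumes they have been normalized beforehand so that no single feature dominates the distance.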


Author(s):  
Diana Luck

In recent times, customer relationship management (CRM) has been defined as relating to sales, marketing, and even services automation. Additionally, the concept is increasingly associated with cost savings and streamlined processes as well as with the engendering, nurturing, and tracking of relationships with customers. Far fewer associations appear to be made with the creation, storage, and mining of data. Although successful CRM is evidently based on a triad of technology, people, and processes, the importance of data is unquestionable. Accordingly, this chapter seeks to illustrate how, although the product and service elements as well as organizational structure and strategies are central to CRM, data is the pivotal dimension around which the concept revolves in contemporary terms. Consequently, this chapter seeks to illustrate how the processes associated with data management, namely data collection, data collation, data storage, and data mining, are essential components of CRM in both theoretical and practical terms.


Author(s):  
Ersin Dincelli ◽  
Xin Zhou ◽  
Alper Yayla ◽  
Haadi Jafarian

Wearable devices have evolved over the years and shown a significant increase in popularity. With advances in sensor technologies, data collection capabilities, and data analytics, wearable devices now enable seamless interaction among users, devices, and their environment. The multifunctional nature of this technology enables users to track their daily physical activities, engage with other users through social networking capabilities, and log their lifestyle habits. In this chapter, the authors discuss the types of sensor technologies embedded in wearable devices and how the data collected through such devices can be further interpreted by data analytics. Given the abundance of personal data that can be collected via wearable devices, they also discuss issues related to data privacy and offer suggestions for users, developers, and policymakers on how to protect data privacy.


2010 ◽  
Vol 1 (1) ◽  
pp. 60-92 ◽  
Author(s):  
Joaquín Derrac ◽  
Salvador García ◽  
Francisco Herrera

The use of Evolutionary Algorithms to perform data reduction tasks has become an effective approach to improving the performance of data mining algorithms. Many proposals in the literature have shown that Evolutionary Algorithms obtain excellent results when applied as Instance Selection and Instance Generation procedures. The purpose of this paper is to present a survey on the application of Evolutionary Algorithms to Instance Selection and Generation processes. It covers approaches applied to the enhancement of the nearest neighbor rule, as well as other approaches focused on improving the models extracted by some well-known data mining algorithms. Furthermore, some proposals developed to tackle two emerging problems in data mining, Scaling Up and Imbalanced Data Sets, are also reviewed.

