Modern Privacy Threats and Privacy Preservation Techniques in Data Analytics

2021 ◽  
Author(s):  
Ram Mohan Rao P ◽  
S Murali Krishna ◽  
AP Siva Kumar

Today we are living in a digitally rich, technology-driven world where extremely large amounts of data, including personal data, are generated every hour in the public domain. Applications like social media, e-commerce, and smartphone apps collect a lot of personal data that can harm individual privacy if leaked; hence an ethical code of conduct is required to ensure data privacy. Privacy threats include digital profiling, cyberstalking, and recommendation systems, which can lead to the disclosure of sensitive data and the sharing of data without the consent of the data owner. Data privacy has gained significant importance in recent times, as is evident from the privacy legislation passed in more than 100 countries. Firms dealing with data-sensitive applications need to abide by the privacy legislation of their respective territorial regions. To address these privacy challenges, we have designed guidelines for application development that incorporate the key features of privacy regulations, together with implementation strategies, to help developers build data-sensitive applications that offer strong and coherent privacy protection of personal data.

Author(s):  
Nancy Victor ◽  
Daphne Lopez

Data privacy plays a noteworthy part in today's digital world, where information is gathered at exceptional rates from different sources. Privacy-preserving data publishing refers to the process of publishing personal data without compromising the privacy of individuals in any manner. A variety of approaches have been devised to protect consumer privacy by applying traditional anonymization mechanisms, but these mechanisms are not well suited for Big Data, as the data generated nowadays is not just structured in nature. The data generated at very high velocities from various sources includes unstructured and semi-structured information, which is very difficult to process using traditional mechanisms. This chapter focuses on the various challenges of Big Data, on privacy-preserving data mining (PPDM) and privacy-preserving data publishing (PPDP) techniques for Big Data, and on how well these can be scaled to process both historical and real-time data together using the Lambda architecture. A distributed framework for privacy preservation in Big Data that incorporates natural language processing techniques is also proposed in this chapter.
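
As a point of reference for the traditional anonymization mechanisms the chapter contrasts with Big Data scenarios, here is a minimal sketch of generalization-based k-anonymity over structured records; the attribute names and generalization rules are illustrative assumptions, not taken from the chapter.

```python
# Minimal sketch of generalization-based k-anonymity for structured records.
# Attribute names ('age', 'zip') and the coarsening rules are assumptions.
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: bucket age into decades, truncate the ZIP code."""
    decade = (record["age"] // 10) * 10
    return {"age": f"{decade}-{decade + 9}", "zip": record["zip"][:3] + "**"}

def is_k_anonymous(records, k):
    """Every combination of generalized quasi-identifier values must occur >= k times."""
    groups = Counter(tuple(sorted(generalize(r).items())) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age": 23, "zip": "12345"}, {"age": 27, "zip": "12388"},
    {"age": 31, "zip": "54321"}, {"age": 36, "zip": "54399"},
]
print(is_k_anonymous(records, k=2))  # True: each generalized group holds 2 rows
```

Such checks work well for tabular data; the chapter's point is that unstructured and semi-structured Big Data resists exactly this kind of column-wise generalization.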


2021 ◽  
Author(s):  
Rohit Ravindra Nikam ◽  
Rekha Shahapurkar

Data mining is a technique for extracting useful information from large data sets. Privacy-preserving data mining is about hiding sensitive information or identities from security breaches without losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments, who must give consent before their private data is shared or published. Preserving privacy in data mining has therefore become a critical research area. Various evaluation metrics, such as time efficiency, data utility, and degree of complexity or resistance to data mining techniques, are used to estimate how well a technique preserves privacy. Social media and smartphones produce tons of data every minute, and the voluminous data produced from these different sources can be processed and analyzed to support decision making. But data analytics is vulnerable to privacy breaches. One such analytics framework is the recommendation system, commonly used by e-commerce sites such as Amazon and Flipkart to recommend items to customers based on their purchasing habits, which can lead to customers being profiled. This paper presents the privacy-preservation techniques that existing researchers use, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gaps between various processes and privacy-preservation methods and illustrate how such issues can be overcome with new, innovative methods. Finally, we summarize the outcomes of the entire literature.
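
As a hedged illustration of two of the surveyed techniques, data randomization and data permutation, the following sketch perturbs numeric values with additive noise and shuffles a sensitive column across rows; the noise scale and column names are illustrative assumptions.

```python
# Sketch of two surveyed privacy-preservation techniques: additive-noise
# randomization and column-wise permutation. Parameters are illustrative.
import random

def randomize(values, noise_scale=5.0):
    """Perturb numeric values with uniform noise so exact figures are hidden
    while aggregate statistics (e.g. the mean) stay approximately intact."""
    return [v + random.uniform(-noise_scale, noise_scale) for v in values]

def permute_column(rows, column):
    """Shuffle one sensitive column across rows, breaking the link between
    an individual's identifying attributes and their sensitive value."""
    shuffled = [row[column] for row in rows]
    random.shuffle(shuffled)
    return [{**row, column: v} for row, v in zip(rows, shuffled)]

salaries = [52000, 61000, 47000, 75000]
print(randomize(salaries))  # perturbed values with a similar mean

rows = [{"name": "A", "diagnosis": "flu"}, {"name": "B", "diagnosis": "asthma"}]
print(permute_column(rows, "diagnosis"))  # diagnoses detached from names
```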


2018 ◽  
Vol 42 (3) ◽  
pp. 290-303 ◽  
Author(s):  
Montserrat Batet ◽  
David Sánchez

Purpose: To overcome the limitations of purely statistical approaches to data protection, the purpose of this paper is to propose Semantic Disclosure Control (SeDC): an inherently semantic privacy protection paradigm that, by relying on state-of-the-art semantic technologies, rethinks privacy and data protection in terms of the meaning of the data.

Design/methodology/approach: The need for data protection mechanisms able to manage data from a semantic perspective is discussed and the limitations of statistical approaches are highlighted. Then, SeDC is presented by detailing how it can be enforced to detect and protect sensitive data.

Findings: So far, data privacy has been tackled from a statistical perspective; that is, available solutions focus only on the distribution of the data values. This contrasts with the semantic way in which humans understand and manage (sensitive) data. As a result, current solutions present limitations both in preventing disclosure risks and in preserving the semantics (utility) of the protected data.

Practical implications: SeDC captures more general, realistic and intuitive notions of privacy and information disclosure than purely statistical methods. As a result, it is better suited to protecting heterogeneous and unstructured data, which are the most common in current data release scenarios. Moreover, SeDC preserves the semantics of the protected data better than statistical approaches, which is crucial when using protected data for research.

Social implications: Individuals are increasingly aware of the privacy threats that the uncontrolled collection and exploitation of their personal data may produce. In this respect, SeDC offers an intuitive notion of privacy protection that users can easily understand. It also naturally captures the (non-quantitative) privacy notions stated in current legislation on personal data protection.

Originality/value: In contrast to statistical approaches to data protection, SeDC assesses disclosure risks and enforces data protection from a semantic perspective. As a result, it offers more general, intuitive, robust and utility-preserving protection of data, regardless of their type and structure.
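
A minimal sketch of the semantic idea behind SeDC: instead of reasoning about value distributions, a sensitive value is replaced by a more general concept from a taxonomy. The tiny hand-built taxonomy below is an illustrative stand-in for the ontologies and semantic technologies the paper relies on, not the paper's actual machinery.

```python
# Illustrative semantic generalization: climb a concept taxonomy to trade
# specificity (disclosure risk) for generality while preserving meaning.
# The PARENT map is a toy stand-in for a real ontology.
PARENT = {  # child concept -> more general concept
    "HIV": "infectious disease",
    "tuberculosis": "infectious disease",
    "infectious disease": "disease",
    "disease": "medical condition",
}

def semantic_generalize(value, levels=1):
    """Replace a sensitive value with an ancestor concept, `levels` steps up."""
    for _ in range(levels):
        value = PARENT.get(value, value)  # stop at the taxonomy root
    return value

print(semantic_generalize("HIV"))            # 'infectious disease'
print(semantic_generalize("HIV", levels=2))  # 'disease'
```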


Author(s):  
Tore Hoel ◽  
Weiqin Chen ◽  
Jan M. Pawlowski

Abstract There is a gap between people's online sharing of personal data and their concerns about privacy. Until now, this gap has been addressed by attempting to match individual privacy preferences with service providers' options for data handling. This approach has ignored the role that different contexts play in data sharing. This paper aims to give privacy engineering a new direction, putting context centre stage and exploiting the affordances of machine learning in handling contexts and negotiating data sharing policies. The research is explorative and conceptual, representing the first development cycle of a design science research project in privacy engineering. The paper offers a concise understanding of data privacy as a foundation for design, extending the seminal contextual integrity theory of Helen Nissenbaum. This theory started out as a normative theory describing the moral appropriateness of data transfers. In our work, the contextual integrity model is extended to a socio-technical theory that could have practical impact in the era of artificial intelligence. New conceptual constructs such as 'context trigger', 'data sharing policy' and 'data sharing smart contract' are defined, and their application is discussed at an organisational and technical level. The constructs and design are validated through expert interviews; contributions to design science research are discussed, and the paper concludes by presenting a framework for further privacy engineering development cycles.
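
As an illustration only, the paper's conceptual constructs 'context trigger' and 'data sharing policy' might be rendered as plain data structures like the following; the field names and the permits check are assumptions, since the paper defines these constructs conceptually rather than as code.

```python
# Hypothetical rendering of two of the paper's conceptual constructs.
# Field names and the permits() logic are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class ContextTrigger:
    """An observable cue (e.g. location, app in use) signalling which
    social context a prospective data flow belongs to."""
    signal: str
    context: str  # e.g. 'healthcare', 'workplace', 'social'

@dataclass
class DataSharingPolicy:
    """Which attributes may flow to which recipients within a context,
    echoing contextual integrity's context-relative information norms."""
    context: str
    allowed_attributes: set
    allowed_recipients: set

    def permits(self, attribute, recipient):
        return (attribute in self.allowed_attributes
                and recipient in self.allowed_recipients)

trigger = ContextTrigger(signal="gps:hospital", context="healthcare")
policy = DataSharingPolicy("healthcare", {"heart_rate"}, {"physician"})
if trigger.context == policy.context:
    print(policy.permits("heart_rate", "physician"))   # True
    print(policy.permits("heart_rate", "advertiser"))  # False
```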


Data mining is the process of deriving useful and actionable information from enormous quantities of data. During mining procedures, handling sensitive data carefully has become important in order to protect it against illegal attacks and malicious access, whether in transit or at rest. Association rule mining is one of the rule extraction techniques; the rules determined are either transferred over public networks or stored for further use. The main objective of Field Level Security of the Sensitive Data in Large Datasets is to extract the strong association rules from large data sets, with the outcomes crafted to conceal the sensitive data. The datasets and the association rules involving attributes with relationships and dependencies are modified through several approaches, so that no sensitive association rule can be derived from them [1]. Privacy preservation of sensitive association rules in large datasets provides secrecy for the sensitive data. It has become quite important to safeguard the privacy of users' personal data from unauthorized persons, and the use of association rules in voluminous datasets has proved advantageous to organizations [2]. In this paper, we present a novel approach for hiding sensitive association rules that applies compression and encryption techniques to the original dataset, giving the dataset better immunity.
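
The paper's own contribution combines compression and encryption; shown below instead is a hedged sketch of the classic support-reduction strategy that sensitive-rule hiding generally builds on: deleting an item from just enough transactions that a sensitive itemset falls below the mining threshold. The toy transactions and threshold are illustrative assumptions.

```python
# Classic sensitive-itemset hiding by support reduction (not the paper's
# compression/encryption method). Transactions and threshold are toy values.
def support_count(transactions, itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def hide_itemset(transactions, sensitive, min_support_count):
    """Delete one item of the sensitive itemset from supporting transactions
    until its support drops below the mining threshold, so no strong rule
    over the itemset can be derived from the sanitized data."""
    victim = next(iter(sensitive))  # item chosen for deletion
    for t in transactions:
        if support_count(transactions, sensitive) < min_support_count:
            break
        if sensitive <= t:
            t.discard(victim)
    return transactions

transactions = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "eggs"}]
hide_itemset(transactions, sensitive={"bread", "milk"}, min_support_count=2)
print(support_count(transactions, {"bread", "milk"}))  # 1, below the threshold
```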


2019 ◽  
Vol 28 (2) ◽  
pp. 183-197 ◽  
Author(s):  
Paola Mavriki ◽  
Maria Karyda

Purpose: User profiling with big data raises significant privacy issues. Privacy studies typically focus on individual privacy; however, in the era of big data analytics, users are also targeted as members of specific groups, thus challenging their collective privacy with as yet unidentified implications. Overall, this paper argues that in the age of big data there is a need to consider the collective aspects of privacy as well, and to develop new ways of calculating privacy risks and identifying the privacy threats that emerge.

Design/methodology/approach: Focusing on the collective level, the authors conducted an extensive literature review related to information privacy and concepts of social identity. They also examined numerous automated data-driven profiling techniques, analyzing at the same time the privacy issues these raise for groups.

Findings: This paper identifies privacy threats for collective entities that stem from data-driven profiling, and it argues that privacy-preserving mechanisms are required to protect the privacy interests of groups as entities, independently of the interests of their individual members. Moreover, this paper concludes that collective privacy threats may differ from the threats individuals face when they are not members of a group.

Originality/value: Although research evidence indicates that privacy as a collective issue is becoming increasingly important in the age of big data, the pluralist character of privacy has not yet been adequately explored. This paper contributes to filling this gap and provides new insights with regard to threats to group privacy and their impact on collective entities and society.


2017 ◽  
Vol 41 (3) ◽  
pp. 298-310 ◽  
Author(s):  
David Sánchez ◽  
Alexandre Viejo

Purpose: The purpose of this paper is to propose a privacy-preserving paradigm for open data sharing based on the following foundations: subjects have unique privacy requirements; personal data are usually published incrementally in different sources; and privacy has a time-dependent element.

Design/methodology/approach: This study first discusses the privacy threats related to open data sharing. Next, these threats are tackled by proposing a new privacy-preserving paradigm. The main challenges related to enforcing the paradigm are discussed, and some suitable solutions are identified.

Findings: Classic privacy-preserving mechanisms are ineffective against observers constantly monitoring and aggregating pieces of personal data released through the internet. Moreover, these methods do not consider individual privacy needs.

Research limitations/implications: This study characterizes the challenges to be tackled by the new paradigm and identifies some promising works, but further research proposing specific technical solutions is suggested.

Practical implications: This work provides a natural solution to dynamic and heterogeneous open data sharing scenarios that require user-controlled, personalized privacy protection.

Social implications: There is an increasing social understanding of the privacy threats that the uncontrolled collection and exploitation of personal data may produce. The new paradigm allows subjects to be aware of the risks inherent to their data and to control their release.

Originality/value: In contrast to classic data protection mechanisms, the new proposal centers privacy protection on the individual and considers privacy risks through the whole life cycle of the data release.


2018 ◽  
Vol 6 (3) ◽  
pp. 1-8
Author(s):  
Lars Magnusson ◽  
Patrik Elm ◽  
Anita Mirijamdotter

Today, ICT governance is still regarded as a departmental concern rather than an overall organizational concern. History has shown that implementation strategies based on departments result in fractional implementations, leading to ad hoc solutions with no central control and stagnation of the in-house ICT strategy. This has recently created an opinion trend: many talk about the ICT department as redundant, a dying breed that should be replaced by on-demand, specialized external services. Clearly, the ever-changing surroundings force organizations to accelerate the pace of adaptation within their ICT plans beyond what most organizations are currently able to achieve. As a result, ICT departments tend to be reactive rather than acting proactively and taking the lead in the increased transformation pace in which organizations find themselves. Simultaneously, the monolithic systems of the 1980s and 1990s often still dominate an organization and consume too much of the yearly IT budget, leaving healthy system development behind. These systems were designed before data became an all-encompassing organizational resource; they were designed more or less in isolation from the surrounding environment. Such solutions make data sharing costly and far from optimal. Additionally, in striving to adapt to the organization's evolution, the initial architecture has become disrupted and fragmented.

Adding to this, on May 25, 2018, the upgraded EU privacy regulation, the General Data Protection Regulation (GDPR), will come into force. It substantially strengthens the data privacy rules of the 1995 directive and will profoundly affect EU organizations. The regulation will, among other things, limit the right to collect and process personal data and will give the data subject full rights to his/her data sets, independent of where this data is or has been collected and by whom. Such regulation forces data-collecting and data-processing organizations to have total control over any personal data collected and processed. This includes a detailed understanding of data flows, including who did what, when, and under whose authorization, and how data is transported and stored. Maps of data/information flows become a mandatory part of the system documentation, encompassing all systems, including outsourced ones such as cloud services. Hence, individual departments can no longer claim that they "own" data.

Further, since the mid-2000s we have seen global inter-organizational data integration, independent of whether organizations are public or private. If this integration ceases to exist, the result will be a threat to the survival of the organization. Additionally, if the organization fails to provide transparent documentation in accordance with the GDPR, a substantial economic risk is at stake. The discussion about the ICT department's demise is therefore inapt. Any organizational change will require costly and time-consuming ICT development efforts to adapt to today's legislation. And since data is nowadays interconnected and transformed at all levels, interacting at multiple intersections all over the organization and becoming the unified base of all operative decisions, an organization-wide ICT governance model is required.


Author(s):  
G. Murugaboopathi ◽  
V. Gowthami

Privacy preservation in data publishing is a major research topic in the field of data security. Privacy-preserving data publication provides methodologies for publishing useful information while simultaneously preserving the privacy of the sensitive data. This work can handle any number of sensitive attributes. The major security breaches are membership, identity and attribute disclosure. In this paper, a novel approach based on slicing that adheres to the principles of k-anonymity and l-diversity is introduced. The proposed work withstands these privacy threats by incorporating the k-means and cuckoo search algorithms. The experimental results with respect to suppression ratio, execution time and information loss are satisfactory when compared with the existing approaches.
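
As a small illustration of the l-diversity principle that the sliced buckets must satisfy, the sketch below checks that every bucket contains at least l distinct sensitive values; the bucketing itself (done with k-means and cuckoo search in the paper) is replaced by a fixed partition for brevity.

```python
# l-diversity check over pre-formed buckets; the paper's k-means/cuckoo-search
# bucketing is replaced here by a hand-picked partition for illustration.
def is_l_diverse(buckets, sensitive_key, l):
    """Each bucket must contain at least l distinct values of the
    sensitive attribute, limiting attribute-disclosure risk."""
    return all(len({r[sensitive_key] for r in bucket}) >= l for bucket in buckets)

buckets = [
    [{"age": 25, "disease": "flu"}, {"age": 29, "disease": "asthma"}],
    [{"age": 41, "disease": "diabetes"}, {"age": 45, "disease": "flu"}],
]
print(is_l_diverse(buckets, "disease", l=2))  # True: 2 distinct values per bucket
```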

