Data Privacy Preservation and Security Approaches for Sensitive Data in Big Data

2021 ◽  
Author(s):  
Rohit Ravindra Nikam ◽  
Rekha Shahapurkar

Data mining is a technique for extracting useful information from large data sets. Privacy-preserving data mining is concerned with hiding sensitive information or identities to prevent security breaches, without losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments, which must not be shared or published without their consent. Preserving privacy in data mining has become a critical research area. Privacy-preserving data mining techniques are evaluated with metrics such as time efficiency, data utility, degree of complexity, and resistance to data mining techniques. Social media and smartphones produce enormous volumes of data every minute, and this data, drawn from many different sources, can be processed and analyzed to support decision making. However, data analytics is vulnerable to privacy breaches. Recommendation systems are one such analytics framework: e-commerce sites such as Amazon and Flipkart use them to recommend items to customers based on their purchasing habits, which can lead to customers being profiled. This paper surveys privacy-preservation techniques used by existing researchers, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gaps between various processes and privacy-preservation methods and illustrate how new, innovative methods can overcome these issues. Finally, we summarize the outcomes of the literature reviewed.
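The sketch below illustrates one of the listed techniques, generalization, together with a k-anonymity check. It coarsens quasi-identifiers (age into 10-year bands, ZIP code masked to a 3-digit prefix). It then tests whether every combination of quasi-identifiers occurs at least k times. The records, field layout, band width, and prefix length are hypothetical choices made for illustration and do not come from the paper.

```python
from collections import Counter

# Hypothetical records: (age, zip_code, diagnosis). Age and ZIP code are
# quasi-identifiers; diagnosis is the sensitive attribute.
RECORDS = [
    (34, "47677", "flu"),
    (36, "47602", "asthma"),
    (38, "47678", "flu"),
    (52, "47905", "diabetes"),
    (57, "47909", "flu"),
    (55, "47906", "asthma"),
]


def generalize(age: int, zip_code: str, band: int = 10, prefix: int = 3) -> tuple[str, str]:
    """Replace an exact age with a band and a ZIP code with a masked prefix."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}", zip_code[:prefix] + "*" * (len(zip_code) - prefix)


def is_k_anonymous(rows: list[tuple[str, str]], k: int) -> bool:
    """True if every quasi-identifier combination appears at least k times."""
    counts = Counter(rows)
    return all(c >= k for c in counts.values())


raw_qi = [(str(age), z) for age, z, _ in RECORDS]
generalized = [generalize(age, z) for age, z, _ in RECORDS]
print(is_k_anonymous(raw_qi, k=3))       # exact values: every row is unique
print(is_k_anonymous(generalized, k=3))  # after generalization
for qi, (_, _, diagnosis) in zip(generalized, RECORDS):
    print(qi, diagnosis)
```

Generalization trades precision for privacy. A wider age band or a shorter ZIP prefix produces larger groups, which are more anonymous but less useful. This is the data-utility trade-off that the evaluation metrics above are meant to capture.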

Author(s):  
Scott Nicholson ◽  
Jeffrey Stanton

Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a campus. These notions greatly oversimplify the world of libraries, however. Most large commercial organizations have dedicated in-house library operations, as do schools, non-governmental organizations, and local, state, and federal governments. With the increasing use of the Internet and the World Wide Web, digital libraries have burgeoned, and these serve a huge variety of user audiences. With this expanded view of libraries, two key insights arise. First, libraries are typically embedded within larger institutions. Corporate libraries serve their corporations, academic libraries serve their universities, and public libraries serve taxpaying communities who elect overseeing representatives. Second, libraries play a pivotal role within their institutions as repositories and providers of information resources. In the provider role, libraries represent in microcosm the intellectual and learning activities of the people who comprise the institution. This fact provides the basis for the strategic importance of library data mining, which might aptly be termed bibliomining: by ascertaining what users are seeking, bibliomining can reveal insights that have meaning in the context of the library's host institution. With widespread adoption of computerized catalogs and search facilities over the past quarter century, library and information scientists have often used bibliometric methods (e.g., the discovery of patterns in authorship and citation within a field) to explore patterns in bibliographic information. During the same period, various researchers have developed and tested data mining techniques, advanced statistical and visualization methods to locate non-trivial patterns in large data sets.
Bibliomining refers to the use of these bibliometric and data mining techniques to explore the enormous quantities of data generated by the typical automated library.
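The sketch below makes bibliomining concrete. It mines a hypothetical circulation log for pairs of items borrowed by the same patron, a simple form of association analysis. The log format, item identifiers, and minimum-count threshold are illustrative assumptions. Patron identifiers are assumed to have been pseudonymized before mining.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical circulation log: (pseudonymous patron id, item id).
LOG = [
    ("p1", "stats-101"), ("p1", "r-handbook"), ("p1", "python-intro"),
    ("p2", "stats-101"), ("p2", "r-handbook"),
    ("p3", "stats-101"), ("p3", "python-intro"),
    ("p4", "poetry-anthology"),
]


def co_borrowed_pairs(log, min_count=2):
    """Count item pairs borrowed by the same patron; keep pairs seen >= min_count times."""
    items_by_patron = defaultdict(set)
    for patron, item in log:
        items_by_patron[patron].add(item)
    pair_counts = Counter()
    for items in items_by_patron.values():
        # Sorting ensures each unordered pair is counted under a single key.
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


print(co_borrowed_pairs(LOG))
```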


Author(s):  
Mafruz Ashrafi ◽  
David Taniar ◽  
Kate Smith

With the advancement of storage, retrieval, and network technologies, the amount of information available to each organization is literally exploding. It is widely recognized, however, that data, although an organizational asset, often becomes a liability because the cost of acquiring and managing it far exceeds the value derived from it. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful, actionable knowledge from it. To explore and analyze large data repositories and discover such knowledge, modern organizations use a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in it. However, discovered patterns carry statistical meaning and may often disclose sensitive information. As a result, privacy has become one of the prime concerns in the data mining research community. Because distributed data mining discovers rules by combining local models from various distributed sites, privacy breaches occur more often than in centralized environments.
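Distributed association-rule mining combines local statistics from each site. One classic way to do this without revealing any single site's count is the secure-sum protocol. The sketch below applies it to a global itemset support count, under the semi-honest assumption that sites follow the protocol and do not collude. The site counts and the modulus are hypothetical. The abstract above does not prescribe this particular protocol.

```python
import secrets

MODULUS = 2**61 - 1  # Any modulus larger than the largest possible total works.


def secure_sum(local_counts: list[int], modulus: int = MODULUS) -> int:
    """Ring-based secure sum: site 0 masks its value with a random R,
    each later site adds its own count to the running masked total,
    and site 0 finally removes R. Each intermediate total is uniformly
    distributed, so it reveals nothing about individual counts."""
    mask = secrets.randbelow(modulus)
    running = (mask + local_counts[0]) % modulus  # site 0 starts the ring
    for count in local_counts[1:]:
        running = (running + count) % modulus      # site i sees only a masked total
    return (running - mask) % modulus              # site 0 unmasks the global sum


# Hypothetical local support counts of the itemset {bread, milk} at three sites,
# and the number of transactions held by each site.
local_support = [120, 75, 230]
local_sizes = [1000, 600, 1400]
global_support = secure_sum(local_support)
total_transactions = secure_sum(local_sizes)
print(global_support, global_support / total_transactions)
```

Each site learns only the global totals, from which global support and rule confidence follow. The protocol has a well-known weakness: the two ring neighbours of a site can collude to recover that site's count.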


2020 ◽  
Vol 1 (1) ◽  
pp. 31-40
Author(s):  
Hina Afzal ◽  
Arisha Kamran ◽  
Asifa Noreen

Owing to rapid technological change, today's job market requires a high level of interaction between educators and the graduates entering the market. Demand for IT-related jobs is higher than in any other field. In this paper, we present a survival analysis of two programming languages, Python and R, side by side in the job market. Because data sets are growing large and traditional methods cannot handle them adequately, we applied recent data mining techniques using Python and R. Gathering this amount of data and processing it with these techniques took several months of effort, and the results show that both languages have grown at the same rate over the past years.


Author(s):  
Ratchakoon Pruengkarn ◽  
◽  
Kok Wai Wong ◽  
Chun Che Fung

Data mining is the analytics and knowledge discovery process of analyzing large volumes of data from various sources and transforming the data into useful information. Various disciplines have contributed to its development, and it is becoming increasingly important in the scientific and industrial world. This article presents a review of data mining techniques and applications from 1996 to 2016. Techniques are divided into two main categories: predictive methods and descriptive methods. Because of the huge number of publications on this topic, only a selection is used in this review to highlight the developments of the past 20 years. Applications are included to provide some insight into how each data mining technique has evolved over the last two decades. Recent research trends focus more on large data sets and big data. There have also recently been more applications in the area of health informatics with the advent of newer algorithms.
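The split between predictive and descriptive methods can be shown with two deliberately tiny examples on hypothetical one-dimensional data. A nearest-neighbour classifier is predictive: it learns from labelled examples to label new ones. k-means clustering is descriptive: it finds structure in unlabelled data. Both are standard textbook algorithms chosen for illustration, not methods singled out by the review.

```python
def nearest_neighbour(train: list[tuple[float, str]], x: float) -> str:
    """Predictive: return the label of the closest labelled example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def k_means_1d(points: list[float], centroids: list[float], iterations: int = 10) -> list[float]:
    """Descriptive: Lloyd's algorithm in one dimension, from given starting centroids."""
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


labelled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
print(nearest_neighbour(labelled, 2.0))
print(k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centroids=[0.0, 5.0]))
```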


Privacy-preserving data mining (PPDM) has been one of the most important research areas in data mining in recent years; it ensures that sensitive information and rules are not revealed. Several methods and techniques have been proposed to hide sensitive information and rules in databases. Earlier work developed perturbation-based PPDM to preserve privacy before data use, and performed secure mining of association rules over horizontally distributed databases. This paper presents an integrated model that addresses multiple objectives, data hiding and rule hiding, through reinforcement and discrete optimization for data publishing. It is denoted the integrated Reinforced Social Ant and Discrete Swarm Optimization (RSA-DSO) model. In the RSA-DSO model, Reinforced Social Ant and Discrete Swarm Optimization operate on the same particles. First, sensitive data items are hidden using the Reinforced Social Ant model. Next, sensitive rules are identified and hidden for data publishing using the Discrete Swarm Optimization model. The RSA-DSO model was evaluated on a benchmark dataset. The results show that it improves privacy-preservation accuracy with minimal time for optimal hiding, and also optimizes the generation of sensitive rules.
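The abstract does not describe the RSA-DSO model in enough detail to reproduce it. As a point of reference, the sketch below shows the kind of simple heuristic such optimizers aim to improve on. It computes a sensitive rule's support and confidence. It then deletes the rule's consequent item from supporting transactions until confidence drops below a threshold. The transactions, the rule {bread} → {milk}, and the threshold are hypothetical, and this greedy deletion is a swapped-in baseline, not the paper's method.

```python
from fractions import Fraction


def support(db: list[set[str]], itemset: set[str]) -> Fraction:
    """Fraction of transactions that contain every item in itemset."""
    return Fraction(sum(itemset <= t for t in db), len(db))


def confidence(db: list[set[str]], antecedent: set[str], consequent: set[str]) -> Fraction:
    """conf(X -> Y) = supp(X ∪ Y) / supp(X); 0 if X never occurs."""
    base = support(db, antecedent)
    return support(db, antecedent | consequent) / base if base else Fraction(0)


def hide_rule(db: list[set[str]], antecedent: set[str], consequent: set[str],
              max_conf: Fraction) -> list[set[str]]:
    """Greedy sanitization: remove the consequent from supporting transactions,
    one at a time, until the rule's confidence is below max_conf.
    Returns a sanitized copy; the input database is not modified."""
    sanitized = [set(t) for t in db]
    rule_items = antecedent | consequent
    for t in sanitized:
        if confidence(sanitized, antecedent, consequent) < max_conf:
            break
        if rule_items <= t:
            t -= consequent  # lowers supp(X ∪ Y) while leaving supp(X) unchanged
    return sanitized


db = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "milk"}, {"bread"}, {"milk", "eggs"}]
rule = ({"bread"}, {"milk"})
print(confidence(db, *rule))
clean = hide_rule(db, *rule, max_conf=Fraction(1, 2))
print(confidence(clean, *rule))
```

Exact fractions avoid floating-point ties at the threshold. Each deleted item is a side effect on the published data, and optimization-based approaches such as the one above try to minimize these side effects.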


The compilation and analysis of health records on a big data scale is becoming an essential approach to understanding problematic diseases. To gain new insights, researchers must be able to cooperate: they need to access each other's data and contribute to the data sets. In many cases, such health records involve privacy-sensitive data about patients. Patients should be able to count on the preservation of their privacy and on the secure storage of their data. Polymorphic encryption and pseudonymisation form a novel approach to the management of sensitive information, especially in health care. Conventional encryption is rather rigid: once data is encrypted, only one key can be used to decrypt it. This rigidity is becoming an ever greater problem in big data analytics, where different parties who wish to investigate part of an encrypted data set all need the one key for decryption. Polymorphic encryption is a new cryptographic technique that solves these problems. Together with the related technique of polymorphic pseudonymisation, it provides new security and privacy guarantees that are essential in areas such as (personalised) healthcare, medical data collection via self-measurement apps, and, more generally, privacy-friendly identity management and data analytics. Encryption, pseudonymization, and anonymization are important techniques that help users secure sensitive data and ensure compliance with data protection regulations and other information security regulations, such as the Health Insurance Portability and Accountability Act (HIPAA).
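ElGamal re-keying can illustrate the core idea behind polymorphic encryption: a ciphertext produced once can later be transformed so that a different key decrypts it. The sketch below uses a deliberately tiny group, the safe prime p = 2039 with subgroup order q = 1019 and generator g = 4. These parameters are insecure and were chosen only for readability. The function names are hypothetical, and this is not the published PEP implementation, which is considerably more involved.

```python
import secrets

# Toy parameters: p = 2q + 1 is a safe prime and g = 4 generates the subgroup
# of prime order q. Far too small to be secure; for illustration only.
P, Q, G = 2039, 1019, 4


def keygen() -> tuple[int, int]:
    """Return (secret x, public y = g^x mod p)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(m: int, y: int) -> tuple[int, int]:
    """ElGamal: (b, c) = (g^r, m * y^r mod p), for a message m in 1..p-1."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (m * pow(y, r, P)) % P


def decrypt(ct: tuple[int, int], x: int) -> int:
    """Recover m = c / b^x mod p."""
    b, c = ct
    return (c * pow(pow(b, x, P), -1, P)) % P


def rekey(ct: tuple[int, int], k: int) -> tuple[int, int]:
    """Without decrypting, turn a ciphertext for key x into one for key
    k*x mod q by raising b to k^-1 mod q."""
    b, c = ct
    return pow(b, pow(k, -1, Q), P), c


def rerandomize(ct: tuple[int, int], y: int) -> tuple[int, int]:
    """Produce a fresh-looking ciphertext of the same message under the same key."""
    b, c = ct
    s = secrets.randbelow(Q - 1) + 1
    return (b * pow(G, s, P)) % P, (c * pow(y, s, P)) % P


# Data is encrypted once under a master public key ...
master_x, master_y = keygen()
ct = encrypt(42, master_y)
# ... and a transformer holding only the factor k tailors it for a researcher
# whose key is k * master_x mod q. (Simplified: a real deployment would not
# let any single party hold master_x.)
k = secrets.randbelow(Q - 1) + 1
researcher_x = (k * master_x) % Q
print(decrypt(rekey(ct, k), researcher_x))
```

The transformer never sees the plaintext. Each researcher's key opens only ciphertexts that were re-keyed for that researcher, which removes the "one key for everyone" rigidity described above.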


Sensitive information is increasingly outsourced to cloud computing and processing services to reduce costs, which raises concerns about data privacy. Encryption is an effective way to keep outsourced sensitive data secure, but it makes efficient use of the data very difficult. In this paper, we focus on the problem of private matching in an identity-based cryptosystem over outsourced encrypted data sets, which can simplify certificate management. To solve this problem, we are proposing a private matching scheme based on identity


2019 ◽  
Vol 492 (1) ◽  
pp. 1370-1384 ◽  
Author(s):  
Nicholas W Borsato ◽  
Sarah L Martell ◽  
Jeffrey D Simpson

ABSTRACT Streams of stars from captured dwarf galaxies and dissolved globular clusters are identifiable through the similarity of their orbital parameters, a fact that remains true long after the streams have dispersed spatially. We calculate the integrals of motion for 31 234 stars, to a distance of 4 kpc from the Sun, which have full and accurate 6D phase space positions in the Gaia DR2 catalogue. We then apply a novel combination of data mining, numerical, and statistical techniques to search for stellar streams. This process returns five high confidence streams (including one which was previously undiscovered), all of which display tight clustering in the integral of motion space. Colour–magnitude diagrams indicate that these streams are relatively simple, old, metal-poor populations. One of these resolved streams shares very similar kinematics and metallicity characteristics with the Gaia-Enceladus dwarf galaxy remnant, but with a slightly younger age. The success of this project demonstrates the usefulness of data mining techniques in exploring large data sets.
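The abstract does not spell out its "novel combination" of techniques. The sketch below is a hedged illustration of the general idea that stars from a common progenitor cluster in integral-of-motion space. It computes the angular-momentum components L_z and L_perp from Galactocentric Cartesian positions (kpc) and velocities (km/s). It then groups stars with a simple friends-of-friends linking rule. Orbital energy is omitted because it requires a Galactic potential model. The star data and the linking length are hypothetical.

```python
import math


def angular_momentum(pos, vel):
    """Return (L_z, L_perp) in kpc km/s, from L = r x v."""
    x, y, z = pos
    vx, vy, vz = vel
    lx = y * vz - z * vy
    ly = z * vx - x * vz
    lz = x * vy - y * vx
    return lz, math.hypot(lx, ly)


def friends_of_friends(points, linking_length):
    """Assign a group id to each point: two points share a group if a chain
    of neighbours, each closer than linking_length, connects them."""
    n = len(points)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and math.dist(points[i], points[j]) < linking_length:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Hypothetical stars: (position in kpc, velocity in km/s).
stars = [
    ((8.0, 0.0, 0.0), (0.0, 220.0, 0.0)),
    ((8.0, 0.0, 0.1), (0.0, 221.0, 0.0)),
    ((8.0, 0.0, 0.0), (0.0, -50.0, 100.0)),
    ((8.0, 0.0, 0.0), (0.0, -52.0, 98.0)),
]
iom = [angular_momentum(p, v) for p, v in stars]
print(friends_of_friends(iom, linking_length=50.0))
```

Real analyses would usually rescale each integral before clustering, because L_z, L_perp, and energy have very different ranges. For tens of thousands of stars, the quadratic neighbour search here would also be replaced by a tree-based search.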


Author(s):  
Usman Ahmed ◽  
Jerry Chun-Wei Lin ◽  
Gautam Srivastava ◽  
Hsing-Chung Chen

Finding frequent patterns identifies the most important patterns in data sets. Because transactional data is huge and high-dimensional, classical pattern mining techniques suffer from limitations of dimensionality and data annotation. Privacy-preserving data mining (PPDM) has become an important research area in recent decades, since information privacy is a trade-off that must be considered whenever data is used. For many years, PPDM relied mostly on heuristic methods, using item deletion to hide sensitive information. In this study, we use deep active learning to hide sensitive items and protect private information. This paper combines entropy-based active learning with an attention-based approach to effectively detect sensitive patterns. The constructed models are then validated on high-dimensional transactional data with attention-based and active learning methods in a reinforcement environment. The results show that the proposed model can support and improve the decision boundaries by increasing the number of training instances through a pooling technique and an entropy uncertainty measure. The proposed paradigm sanitizes the data by hiding sensitive items while avoiding changes to non-sensitive items. The model outperforms greedy, genetic, and particle swarm optimization approaches.
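An entropy uncertainty measure, as commonly used in active learning, selects for labelling the instances whose predicted class distribution has the highest Shannon entropy. The sketch below shows only that selection step. It does not reproduce the model that produces the probabilities, the attention mechanism, or the reinforcement environment. The probability values and transaction ids are hypothetical.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_uncertain(pool: dict[str, list[float]], budget: int) -> list[str]:
    """Return the ids of the `budget` pool instances with the highest entropy."""
    ranked = sorted(pool, key=lambda item: entropy(pool[item]), reverse=True)
    return ranked[:budget]


# Hypothetical predicted probabilities [P(non-sensitive), P(sensitive)] per transaction.
pool = {
    "t1": [0.98, 0.02],
    "t2": [0.55, 0.45],
    "t3": [0.50, 0.50],
    "t4": [0.90, 0.10],
}
print(select_most_uncertain(pool, budget=2))
```

In a full active-learning loop, the selected instances would be labelled and added to the training set, and the model retrained. Repeating this loop is how the number of training instances grows around the decision boundary.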


Alpesh Vaghela et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(5), September–October 2021, pp. 2930-2935

Academics and industry researchers alike find privacy preservation in big data a very intriguing field of study. Data collection, storage, and processing are the three stages of the big data life cycle, and different privacy and security solutions are used at each stage. Many health-care stakeholders are working together to develop a new model for safeguarding people from unknown diseases while also promoting economic prosperity. Big data processing and analytics methods will be employed to discover new patterns of economic growth. Because current data anonymization methods can still lead to data breaches, researchers need a new approach to big data mining, or knowledge discovery in databases (KDD), in which numerous parties share their data to identify new patterns. This study introduces a novel approach to privacy-preserving data mining (PPDM) based on Blockchain and the InterPlanetary File System (IPFS). The authors propose leveraging Blockchain and IPFS to create the ChainPPDM approach for preserving big data privacy. Data saved on the blockchain is immutable, transparent, and secure, and the approach allows for decentralized storage. IPFS is a distributed file system that stores data in a decentralized manner.
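The approach relies on two properties: content-addressed storage (IPFS) and a tamper-evident, append-only ledger (blockchain). Both can be sketched with SHA-256 alone. In the sketch below, a dictionary keyed by digest stands in for IPFS; real IPFS content identifiers use a multihash encoding, not a bare hex digest. A list of hash-linked blocks stands in for the blockchain. There is no consensus, networking, or encryption, and the class and method names are hypothetical rather than taken from ChainPPDM.

```python
import hashlib
import json


class ContentStore:
    """Stand-in for IPFS: data is addressed by the SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Content addressing makes corruption detectable on read.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


class Ledger:
    """Stand-in for a blockchain: each block stores the hash of the previous block."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    @staticmethod
    def _hash(block: dict) -> str:
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def append(self, record: dict) -> None:
        prev = self._hash(self.blocks[-1]) if self.blocks else "0" * 64
        self.blocks.append({"prev": prev, "record": record})

    def verify(self) -> bool:
        """True if every block still links to the hash of its predecessor."""
        return all(
            self.blocks[i]["prev"] == self._hash(self.blocks[i - 1])
            for i in range(1, len(self.blocks))
        )


store, ledger = ContentStore(), Ledger()
ledger.append({"party": "A", "address": store.put(b"anonymized dataset from party A")})
ledger.append({"party": "B", "address": store.put(b"anonymized dataset from party B")})
print(ledger.verify())
ledger.blocks[0]["record"]["party"] = "X"  # tamper with history
print(ledger.verify())
```

In this simplified chain, editing an earlier block breaks the link stored in the block after it. The newest block has no successor, however, so a real blockchain relies on replication and consensus among many nodes to protect it.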

