Privacy Preservation and Analytical Utility of E-Learning Data Mashups in the Web of Data

2021 ◽  
Vol 11 (18) ◽  
pp. 8506
Author(s):  
Mercedes Rodriguez-Garcia ◽  
Antonio Balderas ◽  
Juan Manuel Dodero

Virtual learning environments contain valuable data about students that can be correlated and analyzed to optimize learning. Modern learning environments based on data mashups that collect and integrate data from multiple sources are relevant for learning analytics systems because they provide insights into students’ learning. However, the data sets involved in mashups may contain personal information of a sensitive nature that raises legitimate privacy concerns. Typical privacy preservation methods are based on preemptive approaches that limit the data published in a mashup through access control and authentication schemes. Such limitations may reduce the analytical utility of the exposed data for gaining insight into students’ learning. In order to reconcile the utility and privacy preservation of published data, this research proposes a new data mashup protocol capable of merging and k-anonymizing data sets in cloud-based learning environments without jeopardizing the analytical utility of the information. The implementation of the protocol is based on linked data, so that the data sets involved in the mashups are semantically described, thereby enabling their combination with relevant educational data sources. The k-anonymized data sets returned by the protocol still retain the essential information for supporting general data exploration and statistical analysis tasks. The analytical and empirical evaluation shows that the proposed protocol prevents the re-identification of individuals’ sensitive information.
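The protocol's two core operations, merging data sets and then k-anonymizing the result, can be illustrated with a minimal sketch. The field names, join logic, and generalisation scheme below are illustrative assumptions, not the paper's actual linked-data protocol:

```python
from collections import Counter

def merge(records_a, records_b, key):
    """Join two data sources on a shared key (a toy stand-in for a linked-data mashup)."""
    index = {r[key]: r for r in records_b}
    return [{**a, **index[a[key]]} for a in records_a if a[key] in index]

def generalize_age(age):
    """Coarsen an exact age into a ten-year band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def k_anonymize(records, quasi_ids, k):
    """Suppress any record whose quasi-identifier combination occurs fewer than k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [r for r in records if groups[tuple(r[q] for q in quasi_ids)] >= k]

# Merge grade data with profile data, generalize the quasi-identifier, enforce k = 2.
profiles = [{"sid": 1, "age": 21}, {"sid": 2, "age": 23}, {"sid": 3, "age": 35}]
grades = [{"sid": 1, "grade": "A"}, {"sid": 2, "grade": "B"}, {"sid": 3, "grade": "A"}]
merged = [{**r, "age": generalize_age(r["age"])} for r in merge(profiles, grades, "sid")]
published = k_anonymize(merged, ["age"], k=2)
```

With k = 2, the lone record in the 30-39 band is suppressed, while both 20-29 records survive with their grades intact, which is the utility/privacy trade-off the abstract describes.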

2021 ◽  
Author(s):  
Rohit Ravindra Nikam ◽  
Rekha Shahapurkar

Data mining is a technique for extracting useful knowledge from large data sets. Privacy-preserving data mining is about hiding sensitive information or identities from security breaches without losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments, who must consent before their private data is shared or published. Preserving privacy in data mining has therefore become a critical research area. Various evaluation metrics, such as time efficiency, data utility, and degree of complexity or resistance to data mining attacks, are used to estimate how well a technique preserves privacy. Social media and smartphones produce tons of data every minute, and the voluminous data produced from different sources can be processed and analyzed to support decision making. But data analytics is vulnerable to breaches of privacy. One data analytics framework is the recommendation system, commonly used by e-commerce sites such as Amazon and Flipkart to recommend items to customers based on their purchasing habits, which leads to profiling. This paper presents the various privacy preservation techniques used by existing researchers, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gap between various processes and privacy preservation methods and illustrate how to overcome such issues with new, innovative methods. Finally, our research summarizes the outcomes of the entire literature.
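Among the techniques surveyed, data randomization is the simplest to sketch: numeric values are perturbed with zero-mean noise so that individual records are masked while aggregates stay approximately correct. This is a generic illustration, not any specific author's method:

```python
import random

def randomize(values, scale=5.0, seed=0):
    """Mask individual values with zero-mean Gaussian noise; aggregates survive."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]

# A sensitive salary column: no single noisy value is trustworthy,
# but the mean of the perturbed column stays close to the true mean.
salaries = [50_000.0] * 1_000
noisy = randomize(salaries, scale=500.0)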


Author(s):  
Kalpana Chavhan ◽  
Dr. Praveen S. Challagidad

Any data that a user creates or owns is known as the user's data (for example: name, USN, phone number, address, email ID). As the number of users in social networks increases day by day, the data generated by those users also increases. Network providers publish the data to others for analysis in the hope that mining will provide additional functionality to their users or produce useful results that they can share with others. The analysis of social networks is used in modern sociology, geography, economics, and information science, as well as in various other fields. Publishing the original data of social networks for analysis raises issues of confidentiality; an adversary can mount documented threats such as identity theft, digital harassment, and personalized spam. The published data may contain sensitive information about individuals that must not be disclosed, so social network data must be anonymized before it is published. Anonymization techniques should be applied in a manner that preserves the privacy of the users whose records are being published while keeping the published data set rich enough to allow the exploration of the data. In order to address the issue of privacy protection, we first describe the concept of k-anonymity and illustrate different approaches for its enforcement. We then discuss how the privacy requirements characterized by k-anonymity can be violated in data mining and introduce possible approaches to ensure the satisfaction of k-anonymity in data mining; several attacks on data sets are also discussed.
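The k-anonymity requirement described above can be stated compactly: every combination of quasi-identifier values must occur at least k times in the published table. A minimal checker, with invented example fields:

```python
from collections import Counter

def is_k_anonymous(table, quasi_ids, k):
    """True iff every combination of quasi-identifier values occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in table)
    return all(c >= k for c in counts.values())

# Exact zip code and age make each row unique; generalization repairs this.
raw = [{"zip": "47677", "age": 29}, {"zip": "47602", "age": 22}]
generalized = [{"zip": "476**", "age": "20-29"}, {"zip": "476**", "age": "20-29"}]
```

`is_k_anonymous(raw, ["zip", "age"], 2)` is false because each raw row is unique, while the generalized table satisfies k = 2.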


Author(s):  
Darshana H. Patel ◽  
Saurabh Shah ◽  
Avani Vasant

With the advent of various technologies and digitization, the popularity of data mining for analysis and growth has increased in several fields. However, such pattern discovery by data mining also discloses personal information about an individual or organization. In today's world, people are very concerned about sensitive information that they do not want to share, so it is essential to protect private data. This paper focuses on preserving sensitive information while maintaining the efficiency that privacy preservation degrades. Privacy is preserved by anonymization, and efficiency is improved by optimization techniques, as nowadays several advanced optimization techniques are used to solve problems in different areas. Furthermore, privacy-preserving association classification has been implemented on various data sets with accuracy as the evaluation parameter, and it has been concluded that as privacy increases, accuracy degrades due to the data transformation. Hence, optimization techniques are applied to improve the accuracy. In addition, the proposed approach, a genetic algorithm for optimizing association rules, has been compared with existing optimization techniques, namely particle swarm optimization, cuckoo search, and animal migration optimization. It has been concluded that the proposed approach requires more execution time, about 20-80 milliseconds depending on the data set, but at the same time improves accuracy by 5-6% compared to the existing approaches.
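The genetic-algorithm machinery referred to here follows the textbook loop of selection, crossover, and mutation. The sketch below maximises a toy bit-counting fitness and makes no attempt to reproduce the paper's association-rule encoding:

```python
import random

def genetic_algorithm(fitness, length=20, pop_size=30, generations=60, seed=1):
    """Maximise `fitness` over fixed-length bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # truncation selection keeps the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(length)] ^= 1   # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = genetic_algorithm(sum)  # maximise the number of 1-bits
```

Swapping `sum` for a rule-quality measure, such as classification accuracy over the transformed data, turns the same loop into the kind of rule optimiser the abstract describes.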


The compilation and analysis of health records on a big-data scale is becoming an essential approach to understanding problematic diseases. In order to gain new insights, it is important that researchers can cooperate: they will have to access each other's data and contribute to the data sets. In many cases, such health records involve privacy-sensitive data about patients, who should be able to count on the preservation of their privacy and on the secure storage of their data. Polymorphic encryption and pseudonymisation form a novel approach for the management of sensitive information, especially in health care. The conventional encryption system is rather inflexible: once encrypted, only one key can be used to decrypt the information. This inflexibility is becoming an ever greater problem in the context of big data analytics, where multiple parties who wish to investigate part of an encrypted data set all need the one key for decryption. Polymorphic encryption is a new cryptographic technique that solves these problems. Together with the associated technique of polymorphic pseudonymisation, new security and privacy guarantees can be given, which are essential in areas such as (personalised) health care, medical data collection via self-measurement apps, and, more generally, privacy-friendly identity management and data analytics. Encryption, pseudonymisation, and anonymisation are some of the important techniques that help users secure sensitive data and ensure compliance with both data-protection regulation and other information security acts such as the Health Insurance Portability and Accountability Act (HIPAA).
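Polymorphic encryption itself relies on re-keyable (ElGamal-style) ciphertexts, which are beyond a short sketch, but the companion idea of pseudonymisation is easy to illustrate: identifiers are replaced by keyed, domain-specific pseudonyms so that records about the same patient link up within one research context but not across contexts. This is a simplified HMAC-based stand-in, not the actual polymorphic pseudonymisation construction:

```python
import hashlib
import hmac

def pseudonym(patient_id: str, domain_key: bytes) -> str:
    """Derive a stable pseudonym for one research domain via HMAC-SHA256."""
    return hmac.new(domain_key, patient_id.encode(), hashlib.sha256).hexdigest()[:16]
```

The same patient yields the same pseudonym within a study, so records can be linked for analysis, yet pseudonyms from different studies cannot be correlated without the domain keys.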


2021 ◽  
Author(s):  
Wen-Yang Lin ◽  
Jie-Teng Wang

BACKGROUND Increasingly, spontaneous reporting systems (SRS) have been established to collect adverse drug events and thereby foster research on ADR detection and analysis. SRS data contains personal information, so its publication requires data anonymization to prevent the disclosure of individual privacy. We previously proposed a privacy model called MS(k, θ*)-bounding and the associated MS-Anonymization algorithm to fulfill the anonymization of SRS data. In the real world, SRS data is usually released periodically, e.g., FAERS, to accommodate newly collected adverse drug events, and the different anonymized releases of SRS data available to an attacker may thwart our single-release-focused method, i.e., MS(k, θ*)-bounding. OBJECTIVE We investigate the privacy threat caused by periodical releases of SRS data and propose anonymization methods to prevent the disclosure of personal privacy information while maintaining the utility of the published data. METHODS We identify some potential attacks on periodical releases of SRS data, namely BFL-attacks, that are mainly caused by follow-up cases. We present a new privacy model called PPMS(k, θ*)-bounding and propose the associated PPMS-Anonymization algorithm along with two improvements, PPMS+-Anonymization and PPMS++-Anonymization. Empirical evaluations were performed using 32 selected FAERS quarterly data sets, from 2004Q1 to 2011Q4. The performance of the three proposed versions of PPMS-Anonymization was inspected against MS-Anonymization from several aspects, including data distortion, measured by Normalized Information Loss (NIS); privacy risk of the anonymized data, measured by Dangerous Identity Ratio (DIR) and Dangerous Sensitivity Ratio (DSR); and data utility, measured by the bias of signal counting and strength (PRR). RESULTS The results show that our new method can prevent privacy disclosure for periodical releases of SRS data with a reasonable sacrifice of data utility and acceptable deviation of the strength of ADR signals.
The best version of PPMS-Anonymization, PPMS++-Anonymization, achieves nearly the same quality as MS-Anonymization in both privacy protection and data utility. CONCLUSIONS The proposed PPMS(k, θ*)-bounding model and PPMS-Anonymization algorithm are effective in anonymizing SRS data sets in the periodical data publishing scenario, preventing the disclosure of personal sensitive information via BFL-attacks across the series of releases while maintaining the data utility for ADR signal detection.
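The ADR signal strength used in the utility evaluation, the proportional reporting ratio (PRR), is a standard measure computed from a 2x2 contingency table: with a reports of the event for the drug of interest, b reports of other events for that drug, c reports of the event for all other drugs, and d reports of other events for other drugs, PRR = (a/(a+b)) / (c/(c+d)). A direct sketch (the example counts are invented):

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 drug/event contingency table."""
    return (a / (a + b)) / (c / (c + d))

# 20 of 100 reports for the drug mention the event, vs 10 of 900 for other drugs.
signal = prr(20, 80, 10, 890)  # 0.2 / (10/900) = 18.0
```

A PRR well above 1 indicates the event is reported disproportionately often for the drug, which is why anonymization-induced bias in these counts directly affects signal detection.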


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Dan Yin ◽  
Qing Yang

With the development of mobile devices and GPS, plenty of location-based services (LBSs) have emerged in recent years. LBSs can be applied in a variety of contexts, such as health, entertainment, and personal life. The location-based data that they collect, which contains significant personal information, is released for analysis and mining, and users' private information can be attacked from the published data. In this paper, we investigate the problem of privacy preservation of density distributions in mobility data. Rather than adding noise to the original data for privacy protection, we devise Generative Adversarial Networks (GANs), training the generator and discriminator to generate privacy-preserved data. We conduct extensive experiments on two real-world mobility data sets and demonstrate that our method outperforms the differential privacy approach in both data utility and attack error.
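The differential-privacy baseline that the GAN approach is compared against typically releases counts (e.g., of users per map cell) through the Laplace mechanism, which adds noise of scale Δf/ε, where Δf is the query's sensitivity. A generic sketch of that standard baseline, not the paper's exact experimental setup:

```python
import math
import random

def laplace_mechanism(true_count, sensitivity, epsilon, seed=None):
    """Release a count with Laplace(sensitivity/epsilon) noise, giving epsilon-DP."""
    rng = random.Random(seed)
    u = rng.random() - 0.5                  # uniform on [-0.5, 0.5)
    scale = sensitivity / epsilon
    # inverse-CDF sample of the Laplace distribution
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Each released count is individually noisy, but the noise is zero-mean, so density estimates averaged over many cells remain usable; the paper's point is that a trained generator can achieve a better utility/privacy trade-off than this noise addition.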


2019 ◽  
Vol 53 (1) ◽  
pp. 42-43
Author(s):  
Graham McDonald

More than a hundred countries implement freedom of information laws. In the UK, the Freedom of Information Act 2000 [1] (FOIA) states that the government's documents must be made freely available, or opened, to the public. Moreover, all central UK government departments' documents that have a historic value must be transferred to The National Archives (TNA) within twenty years of the document's creation. However, government documents can contain sensitive information, such as personal information or information that would likely damage international relations if it were opened. Therefore, all government documents that are to be publicly archived must be sensitivity reviewed to identify and redact the sensitive information. However, the lack of structure in digital document collections and the volume of digital documents that are to be sensitivity reviewed mean that the traditional manual sensitivity review process is not practical for digital sensitivity review. In this thesis, we argue that sensitivity classification can be deployed to assist government departments and human reviewers in sensitivity reviewing born-digital government documents. However, classifying sensitive information is a complex task, since sensitivity is context-dependent and can require a human to judge the likely effect of releasing the information into the public domain. Moreover, sensitivity is not necessarily topic-oriented, i.e., it is usually dependent on a combination of what is being said and about whom. Through a thorough empirical evaluation, we show that a text classification approach is effective for sensitivity classification and can be improved by identifying the vocabulary, syntactic, and semantic document features that are reliable indicators of sensitive or non-sensitive text [2].
Furthermore, we propose to reduce the number of documents that have to be reviewed to learn an effective sensitivity classifier through an active learning strategy in which a sensitivity reviewer redacts any sensitive text in a document as they review it, to construct a representation of the sensitivities in a collection [3]. With this in mind, we propose a novel framework for technology-assisted sensitivity review that can prioritise the most appropriate documents to be reviewed at specific stages of the sensitivity review process. Furthermore, our framework can provide the reviewers with useful information to assist them in making their reviewing decisions. We conduct two user studies to evaluate the effectiveness of our proposed framework for assisting with two distinct digital sensitivity review scenarios, or user models. Firstly, in the limited review user model, which addresses a scenario in which there are insufficient reviewing resources available to sensitivity review all of the documents in a collection, we show that our proposed framework can increase the number of documents that can be reviewed and released to the public with the available reviewing resources [4]. Secondly, in the exhaustive review user model, which addresses a scenario in which all of the documents in a collection will be manually sensitivity reviewed, we show that providing the reviewers with useful information about the documents that contain sensitive information can increase the reviewers' accuracy, reviewing speed and agreement [5]. This is the first thesis to investigate automatically classifying FOIA sensitive information to assist digital sensitivity review. The central contributions are our proposed framework for technology-assisted sensitivity review and our sensitivity classification approaches. 
Our contributions are validated using a collection of government documents that are sensitivity reviewed by expert sensitivity reviewers to identify two FOIA sensitivities, namely international relations and personal information. Our results demonstrate that our proposed framework is a viable technology for assisting digital sensitivity review.
Supervisors: Prof. Iadh Ounis (University of Glasgow), Dr. Craig Macdonald (University of Glasgow).
Available from: http://theses.gla.ac.uk/41076
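At its base, sensitivity classification is text classification; a minimal bag-of-words naive Bayes classifier over hand-labelled examples illustrates the pipeline. The toy documents and labels below are invented, and the thesis's classifiers use far richer vocabulary, syntactic, and semantic features:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        def log_score(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            prior = math.log(self.label_counts[label] / sum(self.label_counts.values()))
            return prior + sum(
                math.log((counts[w] + 1) / (total + len(self.vocab)))
                for w in doc.lower().split())
        return max(self.label_counts, key=log_score)

docs = ["embassy cable names a confidential informant",
        "meeting minutes routine agenda",
        "personal medical details of a named official",
        "routine budget agenda and minutes"]
labels = ["sensitive", "not-sensitive", "sensitive", "not-sensitive"]
clf = NaiveBayes().fit(docs, labels)
```

Even this toy model captures the thesis's starting observation: sensitivity correlates with vocabulary, yet a bag of words alone cannot model the who-and-what context that makes real sensitivity judgments hard.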


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1855-1861 ◽  
Author(s):  
Montgomery Slatkin ◽  
Bruce Rannala

Abstract A theory is developed that provides the sampling distribution of low-frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The number of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from the theory of birth-death processes show that the distribution of the number of copies of each allele is logarithmic and that the joint distribution of the numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under the different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early-onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.
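The Ewens sampling distribution invoked here has a closed form: for a sample of n gene copies in which a_j distinct alleles appear exactly j times, the configuration probability is n!/θ_(n) · ∏_j θ^{a_j} / (j^{a_j} a_j!), where θ_(n) = θ(θ+1)⋯(θ+n−1) is a rising factorial. A direct numerical sketch of the null model only; the paper's likelihood analysis goes beyond this neutral, constant-size case:

```python
import math

def ewens_probability(config, theta):
    """Ewens sampling formula.

    config maps copy-number j to a_j, the count of distinct alleles
    observed exactly j times in the sample; theta is the scaled
    mutation rate.
    """
    n = sum(j * a_j for j, a_j in config.items())
    rising = math.prod(theta + i for i in range(n))   # theta * (theta+1) * ... * (theta+n-1)
    p = math.factorial(n) / rising
    for j, a_j in config.items():
        p *= theta ** a_j / (j ** a_j * math.factorial(a_j))
    return p
```

As a sanity check, the three possible allele configurations of a sample of n = 3 (one allele in 3 copies; one singleton plus one doubleton; three singletons) have probabilities summing to one for any θ.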


2011 ◽  
Vol 61 (2) ◽  
pp. 225-238 ◽  
Author(s):  
Wen Bo Liao ◽  
Zhi Ping Mi ◽  
Cai Quan Zhou ◽  
Ling Jin ◽  
Xian Han ◽  
...  

Abstract Comparative studies of relative testis size in animals show that promiscuous species have relatively larger testes than monogamous species: sperm competition favours the evolution of larger ejaculates in many animals, and hence bigger testes. In this view, we present data on relative testis mass for 17 Chinese species, including 3 polyandrous species. We analyzed relative testis mass within the Chinese data set and by combining those data with published data sets on Japanese and African frogs. We found that polyandrous foam-nesting species have relatively large testes, suggesting that sperm competition was an important factor affecting the evolution of relative testis size. For the 4 polyandrous species, testis mass is positively correlated with the intensity (males/mating) but not with the risk (frequency of polyandrous matings) of sperm competition.

