L-Diversity for Data Analysis: Data Swapping with Customized Clustering

2021 ◽  
Vol 2089 (1) ◽  
pp. 012050
Author(s):  
Thirupathi Lingala ◽  
C Kishor Kumar Reddy ◽  
B V Ramana Murthy ◽  
Rajashekar Shastry ◽  
YVSS Pragathi

Data anonymization should support the analysts who intend to use the anonymized data. Releasing datasets that contain personal information requires anonymization that balances privacy concerns against preserving the utility of the data. This work shows how choosing anonymization techniques with the data analyst's requirements in mind improves effectiveness both quantitatively, by minimizing the discrepancy between querying the original data and querying the anonymized result, and qualitatively, by simplifying the workflow for querying the data.
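As a concrete illustration of the privacy model named in the title, the sketch below checks the l-diversity property of an anonymized table: every group of records sharing the same quasi-identifier values must contain at least l distinct sensitive values. The DataFrame, column names, and helper function are hypothetical and are not the paper's implementation.

```python
# Illustrative l-diversity check (not the authors' implementation).
import pandas as pd

def is_l_diverse(df: pd.DataFrame, quasi_identifiers: list[str],
                 sensitive: str, l: int) -> bool:
    """True if every equivalence class (rows sharing the same quasi-identifier
    values) contains at least l distinct sensitive values."""
    groups = df.groupby(quasi_identifiers)[sensitive].nunique()
    return bool((groups >= l).all())

# Toy release with ZIP prefix and age band as quasi-identifiers.
release = pd.DataFrame({
    "zip_prefix": ["130**", "130**", "148**", "148**"],
    "age_band":   ["20-30", "20-30", "30-40", "30-40"],
    "diagnosis":  ["flu", "asthma", "flu", "diabetes"],
})
print(is_l_diverse(release, ["zip_prefix", "age_band"], "diagnosis", l=2))  # True
```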

2018 ◽  
Vol 6 (10) ◽  
pp. 193
Author(s):  
Abdurrahman Kirtepe

In this study, the risk assessment levels of athletes in different branches were examined in terms of various variables. A descriptive survey model was used, and the study was completed with a sample of 105 people. A questionnaire served as the data collection tool; it consists of questions about personal information and the Risk Assessment scale for athletes and coaches. Data analysis was performed in the SPSS 21 package program, using descriptive statistics such as frequency, percentage, mean, standard deviation, minimum and maximum. As a result of the research, it was determined that the risk assessment perceptions of athletes did not differ according to their age, branch, educational status or income status.


2021 ◽  
Vol 72 ◽  
pp. 1163-1214
Author(s):  
Konstantinos Nikolaidis ◽  
Stein Kristiansen ◽  
Thomas Plagemann ◽  
Vera Goebel ◽  
Knut Liestøl ◽  
...  

Good training data is a prerequisite for developing useful machine learning applications. However, in many domains existing data sets cannot be shared due to privacy regulations (e.g., from medical studies). This work investigates a simple yet unconventional approach to anonymized data synthesis that enables third parties to benefit from such data. We explore the feasibility of learning implicitly from visually unrealistic, task-relevant stimuli, which are synthesized by exciting the neurons of a trained deep neural network; the resulting stimuli are used to train new classification models. Furthermore, we extend this framework to inhibit representations associated with specific individuals. We use sleep monitoring data from both an open and a large closed clinical study, together with electroencephalogram sleep stage classification data, to evaluate whether (1) end-users can create and successfully use customized classification models, and (2) the identity of participants in the study is protected. Extensive comparative empirical investigation shows that different algorithms trained on the stimuli are able to generalize successfully on the same task as the original model. Architectural and algorithmic similarity between the new and original models plays an important role in performance. For similar architectures, the performance is close to that obtained with the original data (e.g., an accuracy difference of 0.56%-3.82% and a Kappa coefficient difference of 0.02-0.08). Further experiments show that the stimuli provide state-of-the-art resilience against adversarial association and membership inference attacks.
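The abstract does not spell out the synthesis procedure; the sketch below shows a generic activation-maximization loop in PyTorch, i.e., optimizing an input so that it strongly excites a chosen output neuron of a trained network. The model, its dimensions, and the optimization settings are placeholders rather than the authors' setup.

```python
# Minimal activation-maximization sketch (illustrative only, not the authors' code):
# synthesize an input that strongly excites one output neuron of a trained net.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 5))  # stand-in for a trained model
model.eval()

def synthesize_stimulus(model, target_class: int, steps: int = 200, lr: float = 0.1):
    x = torch.randn(1, 64, requires_grad=True)   # start from random noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        loss = -logits[0, target_class]           # maximize the target neuron's activation
        loss.backward()
        opt.step()
    return x.detach()

stimulus = synthesize_stimulus(model, target_class=2)
print(stimulus.shape)  # torch.Size([1, 64]); such stimuli can then train a new classifier
```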


Author(s):  
Yifat Nahmias

Nudges comprise a key component of the regulatory toolbox. Both the public and private sectors use nudges extensively in various domains, ranging from environmental regulation to health, food and financial regulation. This article focuses on a particular type of nudge: social norm nudges. It discusses, for the first time, the privacy risks of such nudges. Social norm nudges induce behavioral change by capitalizing on people's desire to fit in with others, on their predisposition to social conformity, and on their susceptibility to the way information is framed. In order to design effective social norm nudges, personal information about individuals and their behavior must be collected, processed, and later disseminated (usually in some aggregated form). Thus, the use of social norm nudges opens up the possibility of privacy threats. Despite the significant privacy concerns raised by social norm nudges, research on the topic has been scarce. This article makes two contributions to the understanding of the privacy risks underlying the use of social norm nudges. The first contribution is analytic: it demonstrates that using social norm nudges can pose a threat to individuals' privacy through re-identification of anonymized data, a risk already demonstrated in other contexts (e.g., the Netflix recommendation contest). The second contribution is policy oriented: it argues that the strategy of differential privacy can be used to mitigate these privacy risks and offers a way to employ social norm nudges while protecting individuals' privacy.
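For readers unfamiliar with the mitigation the article points to, the following is a minimal sketch of the standard Laplace mechanism for a counting query, the textbook construction behind epsilon-differential privacy. The energy-consumption example and function names are invented for illustration and are not drawn from the article.

```python
# Standard Laplace mechanism for a counting query (illustrative sketch,
# not a description of the article's proposal).
import numpy as np

def dp_count(values, predicate, epsilon: float) -> float:
    """Release a noisy count satisfying epsilon-differential privacy.
    The sensitivity of a count is 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: "how many households consumed less energy than the neighborhood average?"
consumption = [310, 290, 410, 275, 330, 295]
avg = sum(consumption) / len(consumption)
print(dp_count(consumption, lambda kwh: kwh < avg, epsilon=0.5))
```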


Author(s):  
Rohit Rastogi ◽  
Devendra K. Chaturvedi ◽  
Parul Singhal ◽  
Mayank Gupta

The healthcare systems of Delhi and the NCR are rapidly registering electronic health records, making diagnostic information available electronically. Clinical analytics is advancing just as quickly: large quantities of information are examined and new insights are drawn from them, an experience now described as big data. Big data provides tools for storing, managing, studying, and assimilating the large amounts of robust, structured, and unstructured data generated by existing medical organizations, and such analytics has recently been used to help deliver care and diagnose disease. Present-day systems also rely on connected devices, people, times, places, and networks that are fully integrated through the Internet of Things (IoT), which has become a new basis for developing health monitoring systems. Diabetes is a group of metabolic disorders affecting human health worldwide, and extensive research on it (diagnosis, pathophysiology, treatment, etc.) produces a great deal of data on all its aspects. The main purpose of this chapter is to provide a detailed analysis of healthcare using big data and analytics. A random sample of 30 subjects suffering from diabetes was collected from hospitals of Delhi and the NCR through their health insurance providers, without disclosing any personal information (PI) or sensitive personal information (SPI), as required by law. The study analyses diabetes with the latest IoT and big data analysis techniques and examines its correlation with stress (TTH) on human health, taking age, gender, and insulin into account. Overall, TTH cases increase with age in males and do not follow the pattern of diabetes variation with age, while in females the TTH pattern varies, showing an increasing trend up to the age of 60 and then decreasing.
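As a rough illustration of the kind of correlation analysis described (age, gender, and insulin against diabetes measures and stress/TTH), the sketch below computes Pearson correlations on an invented table. All column names and values are hypothetical and are not the chapter's data.

```python
# Hypothetical sketch of the correlation analysis described above
# (invented columns and values, not the chapter's data set).
import pandas as pd

subjects = pd.DataFrame({
    "age":       [34, 45, 52, 61, 67, 72],
    "gender":    [0, 1, 0, 1, 0, 1],          # 0 = male, 1 = female (illustrative encoding)
    "insulin":   [15.2, 22.4, 18.1, 30.5, 27.9, 33.0],
    "glucose":   [110, 145, 130, 170, 160, 180],
    "tth_score": [2, 3, 4, 5, 4, 3],           # stress (TTH) rating on an invented scale
})

# Pearson correlations between diabetes-related measures and the TTH score.
print(subjects[["age", "insulin", "glucose", "tth_score"]].corr())
```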


2019 ◽  
Author(s):  
Italo Santos ◽  
Emanuel Coutinho ◽  
Leonardo Moreira

Privacy is a concept directly related to people's interest in maintaining personal space without the interference of others. In this paper, we focus on studying the k-anonymity technique, since many generalization algorithms are based on this privacy model. We develop a proof of concept that uses k-anonymity to anonymize raw data and generate a new file with anonymized data. We present the system architecture and detail an experiment using the Adult data set, which contains sensitive information, where each record corresponds to the personal information of a person. Finally, we summarize our work and discuss future work.
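A minimal proof of concept in the same spirit could look like the sketch below: generalize the quasi-identifiers, verify k-anonymity, and write the anonymized file. The generalization rules and column names are illustrative and are not taken from the paper's system.

```python
# Illustrative k-anonymity sketch (not the paper's system): generalize
# quasi-identifiers and check that every equivalence class has at least k rows.
import pandas as pd

def generalize(df: pd.DataFrame) -> pd.DataFrame:
    """Coarsen quasi-identifiers: bucket age into decades, truncate ZIP codes."""
    out = df.copy()
    out["age"] = (out["age"] // 10 * 10).astype(str) + "s"
    out["zip"] = out["zip"].str[:3] + "**"
    return out

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list[str], k: int) -> bool:
    return bool(df.groupby(quasi_identifiers).size().min() >= k)

raw = pd.DataFrame({
    "age": [23, 27, 45, 47],
    "zip": ["13053", "13068", "14850", "14853"],
    "income": [">50K", "<=50K", ">50K", "<=50K"],   # sensitive attribute, kept as-is
})
anonymized = generalize(raw)
print(is_k_anonymous(anonymized, ["age", "zip"], k=2))  # True
anonymized.to_csv("anonymized.csv", index=False)         # write the anonymized release
```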


Author(s):  
Ambar Widianingrum ◽  
Joko Sulianto ◽  
Rahmat Rais

The purpose of this study was to describe the feasibility of teaching materials based on an open-ended approach for improving the reasoning abilities of fourth-grade elementary school students. This is a research and development (R&D) study, and its subjects were 3 classroom teachers. The data analysis techniques used were descriptive qualitative analysis (data reduction, data presentation and conclusion drawing) and descriptive quantitative analysis. Media validation obtained 84.8% at stage 1 and 94.8% at stage 2, while material validation obtained 84.6% at stage 1 and 93.3% at stage 2. The initial field trials obtained 93.7% for the media and 92.3% for the material. This shows that the teaching material is valid and suitable for use. Based on these results, teaching materials based on an open-ended approach can be recommended as a teaching tool and learning resource for students.


Author(s):  
Ying Wang ◽  
Yiding Liu ◽  
Minna Xia

Big data is characterized by multiple sources and heterogeneity. Based on a Hadoop and Spark big data platform, a hybrid forest fire analysis system is built in this study. The platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, Hadoop's HDFS is used to store all kinds of data, the Spark module provides various big data analysis methods, and visualization tools such as ECharts, ArcGIS and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform and to provide guidance for follow-up research and for building a forest fire monitoring and visualized early-warning big data platform. Two shortcomings remain: more data types should be selected, and compatibility would be better if the original data could be converted to XML format. These problems are expected to be solved in follow-up work.
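A pipeline of the kind described might be sketched as follows in PySpark: read sensor records from HDFS, apply a simple rule to flag candidate fire points, and write the result back to HDFS. All paths, column names, and thresholds here are hypothetical and are not taken from the paper.

```python
# Illustrative PySpark sketch of the pipeline described above; paths, columns
# and thresholds are hypothetical, not the paper's configuration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("forest-fire-detection").getOrCreate()

readings = spark.read.csv("hdfs:///forest/sensor_readings.csv",
                          header=True, inferSchema=True)

# A simple rule: high temperature combined with low humidity marks a candidate fire point.
candidates = (readings
              .filter((F.col("temperature_c") > 45) & (F.col("humidity_pct") < 20))
              .select("station_id", "latitude", "longitude", "timestamp"))

candidates.write.mode("overwrite").json("hdfs:///forest/fire_candidates")
spark.stop()
```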


Author(s):  
Joachim Wagner

This paper contributes to the literature on the use of anonymized firm-level data by reporting results from a replication study. To test the practical usefulness of anonymized data, I selected two of my published papers based on different cross-sections of firm data. The data used there were anonymized by micro-aggregation. I replicated the analyses reported in the papers with the anonymized data and then compared the results to those produced with the original data. Frequently, the reported levels of statistical significance differ. Furthermore, statistically significant coefficients sometimes differ by an order of magnitude. Therefore, at least for the moderate sample sizes used here, micro-aggregated firm data should not be considered a tool for empirical research.
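For context, micro-aggregation replaces each small group of similar records by a group average so that no individual firm's value is released. A minimal univariate sketch with fixed group size k is shown below; the data provider's actual procedure is more elaborate and is not reproduced here.

```python
# Minimal univariate micro-aggregation sketch (illustrative, not the data
# provider's procedure): sort records by a variable, form groups of k, and
# replace each value by its group mean.
import numpy as np

def microaggregate(values: np.ndarray, k: int = 3) -> np.ndarray:
    order = np.argsort(values)
    aggregated = values.astype(float)
    for start in range(0, len(values), k):
        idx = order[start:start + k]
        aggregated[idx] = values[idx].mean()
    return aggregated

turnover = np.array([12.0, 95.0, 14.0, 110.0, 13.5, 99.0, 400.0, 15.0, 101.0])
print(microaggregate(turnover, k=3))  # each value replaced by its group-of-3 mean
```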


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1771
Author(s):  
Ferdinando Di Martino ◽  
Irina Perfilieva ◽  
Salvatore Sessa

The fuzzy transform is a technique for approximating a function of one or more variables that researchers have applied in various image and data analysis tasks. In this work we present a summary of the fuzzy transform methods proposed in recent years in different data mining disciplines, such as the detection of relationships between features and the extraction of association rules, time series analysis, and data classification. After giving the definition of the fuzzy transform in one or more dimensions, in which the constraint of sufficient data density with respect to the fuzzy partitions is also explored, we analyze the data analysis approaches recently proposed in the literature that are based on the fuzzy transform. In particular, we examine the strategies these approaches adopt to manage the constraint of sufficient data density and the performance results obtained, compared with those measured for other methods in the literature. The last section is dedicated to final considerations and future scenarios for using the fuzzy transform in the analysis of massive and high-dimensional data.
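For reference, the discrete one-dimensional fuzzy transform pairs a fuzzy partition A_1, ..., A_n of the domain with direct components F_k (weighted means of the data under each basic function) and an inverse transform that recombines them; sufficient data density means every basic function covers at least one data point, so the component denominators are nonzero. The sketch below implements this standard definition with a uniform triangular partition; it is illustrative and not code from the surveyed papers.

```python
# Minimal one-dimensional discrete fuzzy transform sketch (standard definition,
# illustrative; not code from the surveyed papers).
import numpy as np

def triangular_partition(a: float, b: float, n: int, x: np.ndarray) -> np.ndarray:
    """Uniform fuzzy partition of [a, b] with n triangular basic functions.
    Returns an (n, len(x)) membership matrix."""
    nodes = np.linspace(a, b, n)
    h = nodes[1] - nodes[0]
    return np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)

def direct_ftransform(y: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Components F_k = sum_j y_j A_k(x_j) / sum_j A_k(x_j).
    Sufficient data density ensures every row sum of A is positive."""
    return (A @ y) / A.sum(axis=1)

def inverse_ftransform(F: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Approximation of y at the data points: sum_k F_k A_k(x_j)."""
    return F @ A

x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
A = triangular_partition(0, 2 * np.pi, n=12, x=x)
F = direct_ftransform(y, A)
print(np.max(np.abs(y - inverse_ftransform(F, A))))  # approximation error shrinks as n grows
```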

