Differential Privacy Approach for Big Data Privacy in Healthcare

Web Services ◽
2019 ◽
pp. 1623-1645
Author(s):
Marmar Moussa ◽
Steven A. Demurjian

This chapter presents a survey of the most important security and privacy issues related to large-scale data sharing and mining in big data, with a focus on differential privacy as a promising approach for achieving privacy, especially in the statistical databases often used in healthcare. A case study applying differential privacy in the healthcare domain is presented, in which the chapter analyzes and compares the major differentially private data release strategies and noise mechanisms, such as the Laplace and exponential mechanisms. The background section discusses several security and privacy approaches in big data, including authentication and encryption protocols, and privacy-preserving techniques such as k-anonymity. Next, the chapter introduces the differential privacy concepts used in the interactive and non-interactive data sharing models and the various noise mechanisms used. An instrumental case study is then presented to examine the effect of applying differential privacy in analytics. The chapter then explores future trends and concludes.
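As an illustration of the noise mechanisms the chapter compares, the following is a minimal sketch of the Laplace mechanism applied to a count query, which has sensitivity 1. The epsilon value, query, and count are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return a differentially private answer by adding Laplace noise
    with scale sensitivity/epsilon to the true query result."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical count query over a statistical healthcare database.
# A count has sensitivity 1: adding or removing one record changes it by at most 1.
patient_count = 1342   # illustrative true answer
epsilon = 0.5          # privacy budget (assumption)
noisy_count = laplace_mechanism(patient_count, sensitivity=1, epsilon=epsilon)
print(f"noisy count: {noisy_count:.1f}")
```

Smaller epsilon means more noise and stronger privacy; the analyst sees only the noisy answer.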


Author(s):  
Shailesh Pancham Khapre ◽  
Chandramohan Dhasarathan ◽  
Puviyarasi T. ◽  
Sam Goundar

In the internet era, an immense volume of data is generated every day. In the process of data sharing, complex issues such as data privacy and ownership are emerging. Blockchain is a decentralized, distributed data storage technology. Introducing blockchain can eliminate the disadvantages of a centralized data market, but distributed data markets raise security and privacy issues of their own. This chapter summarizes the industry status and research progress of domestic and foreign big data trading markets and distills the properties of a blockchain-based big data sharing and circulation platform. Based on these properties, a blockchain-based data market (BCBDM) framework is proposed, and the security and privacy issues in this framework, along with corresponding solutions, are analyzed and discussed. Based on this framework, a data market testing system was implemented, and the feasibility and security of the framework were confirmed.
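The BCBDM framework itself is not specified in the abstract; as a loose, hypothetical illustration of the underlying idea of recording dataset ownership and integrity on an append-only ledger, here is a minimal hash-chained ledger sketch (all names and fields are assumptions).

```python
import hashlib
import json
import time

def record_listing(ledger, owner, dataset_digest):
    """Append a data-market listing to a hash-chained ledger.
    Each block commits to the previous block's hash, so tampering
    with an earlier listing invalidates every later block."""
    prev_hash = ledger[-1]["block_hash"] if ledger else "0" * 64
    block = {
        "owner": owner,
        "dataset_digest": dataset_digest,  # hash of the data, not the data itself
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    block["block_hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    ledger.append(block)
    return block

ledger = []
digest = hashlib.sha256(b"raw dataset bytes").hexdigest()
record_listing(ledger, owner="seller-42", dataset_digest=digest)
```

Storing only the dataset's digest on the ledger lets buyers verify integrity and provenance without the market operator holding the data centrally.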


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Phongphun Kijsanayothin ◽  
Gantaphon Chalumporn ◽  
Rattikorn Hewett

Abstract

Introduction: Many data analytics algorithms are originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. Advances in many Big Data analytics algorithms are driven by MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and one should find a fundamentally different approach to a new MapReduce-based solution.

Case description: This paper investigates a case study of a scaling problem of "Big algorithms" for a popular association rule-mining algorithm, particularly the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation: Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with a 7% average increase in performance on transaction sets ranging from 10,000 to 120,000 transactions.

Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".
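The paper's specific algorithm is not reproduced here; as a minimal sketch of the map/reduce counting pattern such approaches build on, the following simulates a single MapReduce pass that counts all itemsets up to a fixed size, avoiding Apriori's level-by-level dependent iterations (the transactions, size cap, and support threshold are illustrative).

```python
from collections import Counter
from itertools import combinations

def map_phase(transaction, max_k=2):
    """Mapper: emit (itemset, 1) for every itemset up to size max_k."""
    items = sorted(set(transaction))
    for k in range(1, max_k + 1):
        for itemset in combinations(items, k):
            yield itemset, 1

def reduce_phase(pairs):
    """Reducer: sum counts per itemset (the shuffle/sum step in real MapReduce)."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts

transactions = [["bread", "milk"], ["bread", "eggs"], ["milk", "eggs", "bread"]]
pairs = (p for t in transactions for p in map_phase(t))
counts = reduce_phase(pairs)
min_support = 2
frequent = {s: c for s, c in counts.items() if c >= min_support}
print(frequent)
```

Because every itemset up to `max_k` is emitted in one pass, no round depends on the frequent itemsets of the previous round, which is the iteration dependence the paper identifies as the bottleneck.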


2020 ◽  
Vol 2020 ◽  
pp. 1-29 ◽  
Author(s):  
Xingxing Xiong ◽  
Shubo Liu ◽  
Dan Li ◽  
Zhaohui Cai ◽  
Xiaoguang Niu

With the advent of the big data era, privacy issues have become a hot topic for the public. Local differential privacy (LDP) is a state-of-the-art privacy preservation technique that makes it possible to perform big data analysis (e.g., statistical estimation, statistical learning, and data mining) while guaranteeing each individual participant's privacy. In this paper, we present a comprehensive survey of LDP. We first give an overview of the fundamental concepts of LDP and its frameworks. We then introduce the mainstream privatization mechanisms and methods in detail from the perspective of the frequency oracle and give insights into recent studies on basic private statistical estimation (e.g., frequency estimation and mean estimation) and complex statistical estimation (e.g., multivariate distribution estimation and private estimation over complex data) under LDP. Furthermore, we present the current state of research on LDP, including private statistical learning and inference, private statistical data analysis, privacy amplification techniques for LDP, and several application fields of LDP. Finally, we identify future research directions and open challenges for LDP. This survey can serve as a reference for research on LDP across the privacy-related scenarios encountered in practice.
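As a concrete instance of an LDP frequency oracle, here is a minimal sketch of classic randomized response for a binary attribute: each user perturbs their own bit locally, and the aggregator debiases the observed proportion. The epsilon value and population are illustrative assumptions.

```python
import math
import random

def perturb(bit, epsilon):
    """Each user locally reports the true bit with probability
    p = e^eps / (e^eps + 1), otherwise the flipped bit (eps-LDP)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if random.random() < p else 1 - bit

def estimate_frequency(reports, epsilon):
    """Aggregator debiases the observed proportion of 1-reports:
    observed = p*f + (1-p)*(1-f)  =>  f = (observed - (1-p)) / (2p - 1)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

epsilon = 1.0                      # privacy budget (assumption)
true_bits = [1] * 300 + [0] * 700  # illustrative population, true frequency 0.3
reports = [perturb(b, epsilon) for b in true_bits]
print(f"estimated frequency: {estimate_frequency(reports, epsilon):.3f}")
```

The server never sees any individual's true bit, yet the debiased aggregate converges to the population frequency as the number of reports grows.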


2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract

Classification is one of the most important and widely used problems in machine learning; its purpose is to create a rule for assigning data to pre-existing categories based on a set of training samples. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has a high time complexity: increasing the number of training samples intensifies the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods can be employed to classify big data, and they suggest research areas for future studies. Given its advantages, the SVM can be among the first options for adaptation to big data and big data classification. For this purpose, appropriate techniques should be developed for data preprocessing to convert data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, able to handle big data.
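As a minimal sketch of one such adaptation, the following trains a linear SVM with stochastic gradient descent on the regularized hinge loss, so the model can be updated one sample at a time instead of batch-solving the full problem. All parameters and the synthetic stream are illustrative assumptions, not the paper's method.

```python
import numpy as np

class OnlineLinearSVM:
    """Linear SVM trained incrementally with SGD on the regularized
    hinge loss, so samples can arrive as a stream (online learning)."""

    def __init__(self, n_features, lam=0.01, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lam = lam  # L2 regularization strength
        self.lr = lr    # learning rate

    def partial_fit(self, x, y):
        """One SGD step; y must be +1 or -1."""
        margin = y * (self.w @ x + self.b)
        self.w *= (1 - self.lr * self.lam)  # shrink from the regularizer
        if margin < 1:                      # sample inside margin: hinge loss active
            self.w += self.lr * y * x
            self.b += self.lr * y

    def predict(self, x):
        return 1 if self.w @ x + self.b >= 0 else -1

# Illustrative stream: two Gaussian blobs labeled +1 / -1.
rng = np.random.default_rng(0)
svm = OnlineLinearSVM(n_features=2)
for _ in range(2000):
    y = rng.choice([-1, 1])
    x = rng.normal(loc=2.0 * y, scale=1.0, size=2)
    svm.partial_fit(x, y)
print(svm.predict(np.array([1.5, 1.5])))  # expected: 1
```

Because each update touches one sample, memory stays constant regardless of stream length, which is the property that makes SVMs viable for big data settings.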


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.
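The report's six workflow phases are not reproduced in this abstract; as a small sketch of one integrity-protection step that large-dataset transfer workflows typically include, the following computes a streaming SHA-256 checksum that sender and receiver can compare. The file path is hypothetical.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a large file through SHA-256 in 1 MiB chunks so the
    checksum can be computed without loading the file into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Sender and receiver each compute the digest; a mismatch means the
# transfer corrupted the file and it should be resent.
# print(sha256_of_file("/data/incoming/dataset.tar"))
```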


Big data is large-scale data collected for knowledge discovery, and it has been widely used in various applications. Big data often includes image data from these applications, which requires effective techniques to process. In this paper, a survey of big image data research is conducted to analyze the performance of existing methods. Deep learning techniques provide better performance than other methods, including wavelet-based methods. Deep learning techniques, however, require more computational time; this can be mitigated by lightweight methods.
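The survey's specific lightweight methods are not named in this abstract; one widely used example of such a method is the depthwise separable convolution, sketched below in PyTorch (a framework assumption), which replaces a standard convolution with a cheaper depthwise-plus-pointwise pair.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A lightweight replacement for a standard Conv2d: a per-channel
    (depthwise) convolution followed by a 1x1 (pointwise) convolution,
    cutting parameters and FLOPs roughly by a factor of the kernel area."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 64, 64)          # illustrative image batch
y = DepthwiseSeparableConv(32, 64)(x)
print(y.shape)                           # torch.Size([1, 64, 64, 64])
```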


2021 ◽  
Vol 48 (1) ◽  
pp. 55-71
Author(s):  
Xiao-Bo Tang ◽  
Wei-Gang Fu ◽  
Yan Liu

The scale of knowledge is growing rapidly in the big data environment, and traditional knowledge organization and services face the dilemma of semantic inaccuracy and untimeliness. From a knowledge fusion perspective, combining the precise semantic superiority of traditional ontologies with the large-scale graph processing power and predicate-attribute expressiveness of property graphs, this paper presents an ontology and property graph fusion framework (OPGFF). The fusion process is divided into content layer fusion and constraint layer fusion. The result of the fusion, that is, the knowledge representation model, is called a knowledge big graph. In addition, this paper applies the knowledge big graph model to the ownership network in China's financial sector and builds a financial ownership knowledge big graph. Furthermore, this paper designs and implements six consistency inference algorithms for finding contradictory data and filling in missing data in the financial ownership knowledge big graph, five of which are completely domain agnostic. The correctness and validity of the algorithms have been experimentally verified with actual data. The OPGFF fusion framework and the implementation method of the knowledge big graph could provide a technical reference for big data knowledge organization and services.
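The six consistency inference algorithms are not specified in the abstract; as a hypothetical illustration of the kind of check such algorithms might perform on an ownership graph, the following flags companies whose incoming ownership stakes sum above 100% (the edge list and the rule are assumptions).

```python
from collections import defaultdict

# Hypothetical ownership edges: (shareholder, company, percent_owned).
edges = [
    ("FundA", "BankX", 60.0),
    ("FundB", "BankX", 55.0),   # contradictory: BankX would be 115% owned
    ("FundA", "InsurerY", 40.0),
]

def find_overowned(edges):
    """Flag companies whose incoming ownership stakes sum above 100%,
    a simple consistency rule over a property-graph-style edge list."""
    totals = defaultdict(float)
    for _, company, pct in edges:
        totals[company] += pct
    return {c: t for c, t in totals.items() if t > 100.0}

print(find_overowned(edges))   # {'BankX': 115.0}
```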


Web Services ◽  
2019 ◽  
pp. 953-978
Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st century afford massive data generation across sectors of our economy, including the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education are varied, comprising voluminous text, images, and graphs. Applying Big Data science techniques (e.g., functional algorithms) to extract intelligence affords decision makers quick responses to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter employs data science for potential solutions to Big Data applications in the sectors of agriculture and manufacturing and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.
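As a small hypothetical sketch of the kind of query such tools support, the following uses pymongo to aggregate average crop yield per region from a MongoDB collection; the connection string, database, collection, and field names are all assumptions for illustration.

```python
from pymongo import MongoClient

# Hypothetical connection and collection; adjust to your deployment.
client = MongoClient("mongodb://localhost:27017")
yields = client["agri"]["crop_yields"]

# Average yield per region, highest first -- the kind of quick
# aggregate that supports productivity decisions.
pipeline = [
    {"$group": {"_id": "$region", "avg_yield": {"$avg": "$yield_tons"}}},
    {"$sort": {"avg_yield": -1}},
]
for doc in yields.aggregate(pipeline):
    print(doc["_id"], round(doc["avg_yield"], 2))
```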

