Ameliorating the Privacy on Large Scale Aviation Dataset by Implementing MapReduce Multidimensional Hybrid k-Anonymization

Author(s):  
Stephen Dass A. ◽  
Prabhu J.

In this fast growing data universe, data generation and data storage are moving into the next-generation process by generating petabytes and gigabytes in an hour. This leads to data accumulation where privacy and preservation are certainly misplaced. This data contains some sensitive and high privacy data which is to be hidden or removed using hashing or anonymization algorithms. In this article, the authors propose a hybrid k anonymity algorithm to handle large scale aircraft datasets with combined concepts of Big Data analytics and privacy preservation of storing the dataset with the help of MapReduce. This published anonymized data are moved by MapReduce to the Hive database for data storage. The authors propose a multi-dimensional hybrid k-anonymity technique to solve the privacy issue and compare the proposed system with other two anonymization methods such as BUG and TDS. Three experiments were performed for evaluating classifier error, calculating disruption value and p% hybrid anonymity and estimation of processing time.

2019 ◽  
Vol 11 (2) ◽  
pp. 14-40
Author(s):  
Stephen Dass A. ◽  
Prabhu J.

In this fast growing data universe, data generation and data storage are moving into the next-generation process by generating petabytes and gigabytes in an hour. This leads to data accumulation where privacy and preservation are certainly misplaced. This data contains some sensitive and high privacy data which is to be hidden or removed using hashing or anonymization algorithms. In this article, the authors propose a hybrid k anonymity algorithm to handle large scale aircraft datasets with combined concepts of Big Data analytics and privacy preservation of storing the dataset with the help of MapReduce. This published anonymized data are moved by MapReduce to the Hive database for data storage. The authors propose a multi-dimensional hybrid k-anonymity technique to solve the privacy issue and compare the proposed system with other two anonymization methods such as BUG and TDS. Three experiments were performed for evaluating classifier error, calculating disruption value and p% hybrid anonymity and estimation of processing time.


Big Data is the era of data processing. Big Data is the Collate’s observer data sets that are complicated that traditional data-processing abilities.There are the various challenges include data analysis, capture the data, curation, search, sharing, stowage, transmission, visualization, and privacy violations. A large collections of petabytes of data is engendered day by day from the up-to-date information systems and digital era such as Internet of Things and cloud computing. Big data environs is used to attain, organize and analyse the numerous types of data. A large scale distributed file system which should be a fault tolerant, flexible and scalable. The term big data comes with the new challenges to input, process and output the data The technologies used by big data application to handle the massive data are Hadoop, Map Reduce, Pig, Apache Hive, No SQL and Spark. Initially, we extant the definition of big data and discuss big data challenges. Succeeding, The Propionate Paramour of Big Data Systems Models in the Into Prolonging Seam, Namely data Generation, data Assange, data Storage, and data Analytics. These four modules form a big data value chain. In accumulation, we present the prevalent Hadoop framework for addressing big data.


Author(s):  
Pijush Kanti Dutta Pramanik ◽  
Saurabh Pal ◽  
Moutan Mukhopadhyay

Like other fields, the healthcare sector has also been greatly impacted by big data. A huge volume of healthcare data and other related data are being continually generated from diverse sources. Tapping and analysing these data, suitably, would open up new avenues and opportunities for healthcare services. In view of that, this paper aims to present a systematic overview of big data and big data analytics, applicable to modern-day healthcare. Acknowledging the massive upsurge in healthcare data generation, various ‘V's, specific to healthcare big data, are identified. Different types of data analytics, applicable to healthcare, are discussed. Along with presenting the technological backbone of healthcare big data and analytics, the advantages and challenges of healthcare big data are meticulously explained. A brief report on the present and future market of healthcare big data and analytics is also presented. Besides, several applications and use cases are discussed with sufficient details.


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly getting refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the dreamt smarter planet.


Author(s):  
Ramgopal Kashyap ◽  
Albert D. Piersson

The motivation behind this chapter is to highlight the qualities, security issue, advantages, and disadvantages of big data. In the recent researches, the issue and challenges are due to the exponential growth of social media data and other images and videos. Big data security threats are rising, which is affecting the data heterogeneity adaptability and privacy preservation analytics. Big data analytics helps cyber security, but no new application can be envisioned without delivering new types of information, working on data-driven calculations and expending determined measure of information. This chapter demonstrates how innate attributes of big data are protected.


Web Services ◽  
2019 ◽  
pp. 953-978
Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st Century afford massive data generation in sectors of our economy to include the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data, using modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education are varied such as voluminous text, images, and graphs. Applying Big data science techniques (e.g., functional algorithms) for extracting intelligence data affords decision markers quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter serves to employ data science for potential solutions to Big Data applications in the sectors of agriculture, manufacturing and education to a lesser extent, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


Biotechnology ◽  
2019 ◽  
pp. 1967-1984
Author(s):  
Dharmendra Trikamlal Patel

Voluminous data are being generated by various means. The Internet of Things (IoT) has emerged recently to group all manmade artificial things around us. Due to intelligent devices, the annual growth of data generation has increased rapidly, and it is expected that by 2020, it will reach more than 40 trillion GB. Data generated through devices are in unstructured form. Traditional techniques of descriptive and predictive analysis are not enough for that. Big Data Analytics have emerged to perform descriptive and predictive analysis on such voluminous data. This chapter first deals with the introduction to Big Data Analytics. Big Data Analytics is very essential in Bioinformatics field as the size of human genome sometimes reaches 200 GB. The chapter next deals with different types of big data in Bioinformatics. The chapter describes several problems and challenges based on big data in Bioinformatics. Finally, the chapter deals with techniques of Big Data Analytics in the Bioinformatics field.


Author(s):  
Marcus Tanque ◽  
Harry J Foxwell

Big data and cloud computing are transforming information technology. These comparable technologies are the result of dramatic developments in computational power, virtualization, network bandwidth, availability, storage capability, and cyber-physical systems. The crossroads of these two areas, involves the use of cloud computing services and infrastructure, to support large-scale data analytics research, providing relevant solutions or future possibilities for supply chain management. This chapter broadens the current posture of cloud computing and big data, as associate with the supply chain solutions. This chapter focuses on areas of significant technology and scientific advancements, which are likely to enhance supply chain systems. This evaluation emphasizes the security challenges and mega-trends affecting cloud computing and big data analytics pertaining to supply chain management.


Sign in / Sign up

Export Citation Format

Share Document