Ameliorating the Privacy on Large Scale Aviation Dataset by Implementing MapReduce Multidimensional Hybrid k-Anonymization

2019
Vol 11 (2)
pp. 14-40
Author(s):
Stephen Dass A.
Prabhu J.

In this fast-growing data universe, data generation and data storage are moving into a next-generation process, producing gigabytes and petabytes of data every hour. This accumulation makes privacy preservation increasingly hard to guarantee. The data contain sensitive, high-privacy fields that must be hidden or removed using hashing or anonymization algorithms. In this article, the authors propose a hybrid k-anonymity algorithm for handling large-scale aircraft datasets, combining Big Data analytics with privacy preservation by storing the dataset with the help of MapReduce. The published anonymized data are moved by MapReduce into a Hive database for storage. The authors propose a multi-dimensional hybrid k-anonymity technique to solve the privacy issue and compare the proposed system with two other anonymization methods, Bottom-Up Generalization (BUG) and Top-Down Specialization (TDS). Three experiments were performed: evaluating classifier error, calculating disruption value and p% hybrid anonymity, and estimating processing time.
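The k-anonymity property the article's hybrid algorithm enforces can be illustrated with a minimal single-machine check: a table is k-anonymous when every combination of quasi-identifier values occurs at least k times. The field names and sample values below are hypothetical, and the authors' multidimensional MapReduce algorithm is considerably more involved; this is only a sketch of the underlying property.

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical passenger records with already-generalized quasi-identifiers.
records = [
    {"age": "30-40", "zip": "632**", "route": "MAA-DXB"},
    {"age": "30-40", "zip": "632**", "route": "MAA-DXB"},
    {"age": "30-40", "zip": "632**", "route": "MAA-DXB"},
    {"age": "20-30", "zip": "600**", "route": "BLR-SIN"},
]
print(is_k_anonymous(records, ["age", "zip"], 3))  # False: one group has size 1
```

Generalization (e.g. replacing an exact age with a range, or masking ZIP digits) is the usual way groups smaller than k are merged until a check like this passes.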


Big data is the era of data processing. Big data refers to collections of data sets so large and complex that they exceed traditional data-processing capabilities. The challenges include data analysis, capture, curation, search, sharing, storage, transmission, visualization, and privacy violations. Petabytes of data are generated day by day by up-to-date information systems and digital technologies such as the Internet of Things and cloud computing. A big data environment is used to acquire, organize, and analyse these numerous types of data, and it requires a large-scale distributed file system that is fault tolerant, flexible, and scalable. The term big data also brings new challenges in inputting, processing, and outputting data. The technologies used by big data applications to handle massive data include Hadoop, MapReduce, Pig, Apache Hive, NoSQL, and Spark. Initially, we present the definition of big data and discuss big data challenges. Next, we present a model of big data systems decomposed into four sequential modules, namely data generation, data acquisition, data storage, and data analytics; these four modules form a big data value chain. In addition, we present the prevalent Hadoop framework for addressing big data.
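The MapReduce model listed among those technologies can be sketched in plain Python as a toy single-machine word count. This mimics the map, shuffle, and reduce phases of the programming model but is not the Hadoop API.

```python
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Sum the counts for one key, as a Hadoop reducer would.
    return key, sum(values)

def mapreduce(records):
    shuffled = defaultdict(list)          # the shuffle/sort step: group by key
    for record in records:
        for key, value in map_phase(record):
            shuffled[key].append(value)
    return dict(reduce_phase(k, v) for k, v in shuffled.items())

print(mapreduce(["big data", "big data analytics"]))
# {'big': 2, 'data': 2, 'analytics': 1}
```

In a real Hadoop job the mappers and reducers run on different nodes of the cluster and the shuffle moves data over the network; the per-key grouping contract is the same.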


Author(s):
Pijush Kanti Dutta Pramanik
Saurabh Pal
Moutan Mukhopadhyay

Like other fields, the healthcare sector has also been greatly impacted by big data. A huge volume of healthcare data and other related data are being continually generated from diverse sources. Tapping and analysing these data, suitably, would open up new avenues and opportunities for healthcare services. In view of that, this paper aims to present a systematic overview of big data and big data analytics, applicable to modern-day healthcare. Acknowledging the massive upsurge in healthcare data generation, various ‘V's, specific to healthcare big data, are identified. Different types of data analytics, applicable to healthcare, are discussed. Along with presenting the technological backbone of healthcare big data and analytics, the advantages and challenges of healthcare big data are meticulously explained. A brief report on the present and future market of healthcare big data and analytics is also presented. Besides, several applications and use cases are discussed with sufficient details.


2019
Vol 5 (2)
pp. 76-82
Author(s):
Cornelius Mellino Sarungu
Liliana Liliana

Project management practice uses many tools to support recording and tracking the data generated over the course of a project. Project analytics provides deeper insights to be used in decision making. To conduct project analytics, one should explore the required tools and techniques. The most common tool is Microsoft Excel; its simplicity and flexibility let project managers and team members use it for almost any kind of activity. We combine MS Excel with R Studio to bring data analytics into the project management process. While data input still follows the old way the project manager is already familiar with, the analytic engine can extract data from it and create visualizations of the needed parameters in a single output report file. This approach delivers a low-cost project analytics solution for the organization: it can be implemented with relatively low-cost technology, some of it free, while maintaining a simple data generation process. This solution can also be proposed to raise project management process maturity to the next stage, such as CMMI level 4, which promotes project analytics. Index Terms: project management, project analytics, data analytics.
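The kind of lightweight analysis the abstract describes, reading tabular project data and deriving decision-support metrics from it, can be sketched with Python's standard library. The authors use R Studio, and the column names and figures below are hypothetical; this only illustrates the extract-and-summarize step.

```python
import csv
import io

# Hypothetical task-tracking data, as it might be exported from an Excel sheet.
SHEET = (
    "task,planned_days,actual_days\n"
    "Design,10,12\n"
    "Build,20,18\n"
    "Test,5,9\n"
)

def schedule_variance(csv_text):
    """Return (task, actual - planned) pairs; positive means the task ran late."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["task"], int(row["actual_days"]) - int(row["planned_days"]))
            for row in reader]

for task, days in schedule_variance(SHEET):
    print(f"{task}: {days:+d} days")
```

A report generator would render such metrics as charts in a single output file, which is essentially what the R Studio engine does against the Excel workbook.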


2021
Author(s):
R. Salter
Quyen Dong
Cody Coleman
Maria Seale
Alicia Ruvinsky
...

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.
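The workflow itself is organizational, but the data-integrity goal the report mentions is commonly met by checksum verification: the sender publishes a digest of each dataset and the receiver recomputes it after transfer. The following is a generic sketch of that step, not the ERDC-ITL procedure.

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in chunks so very large datasets
    never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Sender records file_digest(path) before transfer; the receiver recomputes
# it afterwards, and any mismatch signals corruption in transit.
```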


Author(s):  
Pethuru Raj

The implications of the digitization process, among a bevy of trends, are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space: how the tremendous amount of data produced and processed all over the world impacts the IT and business domains; how next-generation IT infrastructures are accordingly being refactored, remedied, and readied for the impending big data-induced challenges; how the big data analytics discipline is moving towards fulfilling the digital universe's requirement of extracting and extrapolating actionable insights for the knowledge-parched; and, finally, the establishment and sustenance of the dreamt-of smarter planet.


Biotechnology
2019
pp. 1967-1984
Author(s):
Dharmendra Trikamlal Patel

Voluminous data are being generated by various means. The Internet of Things (IoT) has emerged recently to group all manmade artificial things around us. Due to intelligent devices, the annual growth of data generation has increased rapidly, and it is expected that by 2020, it will reach more than 40 trillion GB. Data generated through devices are in unstructured form. Traditional techniques of descriptive and predictive analysis are not enough for that. Big Data Analytics have emerged to perform descriptive and predictive analysis on such voluminous data. This chapter first deals with the introduction to Big Data Analytics. Big Data Analytics is very essential in Bioinformatics field as the size of human genome sometimes reaches 200 GB. The chapter next deals with different types of big data in Bioinformatics. The chapter describes several problems and challenges based on big data in Bioinformatics. Finally, the chapter deals with techniques of Big Data Analytics in the Bioinformatics field.


Author(s):
Marcus Tanque
Harry J Foxwell

Big data and cloud computing are transforming information technology. These complementary technologies are the result of dramatic developments in computational power, virtualization, network bandwidth, availability, storage capability, and cyber-physical systems. The crossroads of these two areas involves the use of cloud computing services and infrastructure to support large-scale data analytics research, providing relevant solutions and future possibilities for supply chain management. This chapter broadens the current posture of cloud computing and big data as associated with supply chain solutions. It focuses on areas of significant technological and scientific advancement that are likely to enhance supply chain systems, and emphasizes the security challenges and mega-trends affecting cloud computing and big data analytics pertaining to supply chain management.


Big Data
2016
pp. 1555-1581
Author(s):
Gueyoung Jung
Tridib Mukherjee

In the modern information era, the amount of data has exploded. Current trends further indicate exponential growth of data in the future. This prevalent humungous amount of data—referred to as big data—has given rise to the problem of finding the “needle in the haystack” (i.e., extracting meaningful information from big data). Many researchers and practitioners are focusing on big data analytics to address the problem. One of the major issues in this regard is the computation requirement of big data analytics. In recent years, the proliferation of many loosely coupled distributed computing infrastructures (e.g., modern public, private, and hybrid clouds, high performance computing clusters, and grids) have enabled high computing capability to be offered for large-scale computation. This has allowed the execution of the big data analytics to gather pace in recent years across organizations and enterprises. However, even with the high computing capability, it is a big challenge to efficiently extract valuable information from vast astronomical data. Hence, we require unforeseen scalability of performance to deal with the execution of big data analytics. A big question in this regard is how to maximally leverage the high computing capabilities from the aforementioned loosely coupled distributed infrastructure to ensure fast and accurate execution of big data analytics. In this regard, this chapter focuses on synchronous parallelization of big data analytics over a distributed system environment to optimize performance.
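The synchronous-parallelization pattern the chapter studies can be miniaturized on a single machine: data partitions are analyzed in parallel, and the final aggregation waits for all of them at a synchronization barrier. The toy sketch below uses a thread pool; real deployments would run the same pattern over a distributed framework, and the partition function here is only a hypothetical stand-in for an analytics kernel.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_partition(partition):
    # Stand-in for an analytics kernel applied to one data partition.
    return sum(x * x for x in partition)

def parallel_analytics(partitions, workers=4):
    # executor.map is synchronous from the caller's perspective: the final
    # aggregation below cannot start until every partition has finished,
    # which is the barrier that synchronous parallelization relies on.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(analyze_partition, partitions)
        return sum(partials)

print(parallel_analytics([[1, 2, 3], [4, 5], [6]]))  # 1+4+9+16+25+36 = 91
```

The trade-off the chapter optimizes is visible even here: the barrier makes the result deterministic and simple to combine, but total latency is set by the slowest partition.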

