Addressing big data problem using Hadoop and Map Reduce

Big data privacy preservation is one of the most disturbed issues in current industry. Sometimes the data privacy problems never identified when input data is published on cloud environment. Data privacy preservation in hadoop deals in hiding and publishing input dataset to the distributed environment. In this paper investigate the problem of big data anonymization for privacy preservation from the perspectives of scalability and time factor etc. At present, many cloud applications with big data anonymization faces the same kind of problems. For recovering this kind of problems, here introduced a data anonymization algorithm called Two Phase Top-Down Specialization (TPTDS) algorithm that is implemented in hadoop. For the data anonymization-45,222 records of adults information with 15 attribute values was taken as the input big data. With the help of multidimensional anonymization in map reduce framework, here implemented proposed Two-Phase Top-Down Specialization anonymization algorithm in hadoop and it will increases the efficiency on the big data processing system. By conducting experiment in both one dimensional and multidimensional map reduce framework with Two Phase Top-Down Specialization algorithm on hadoop, the better result shown in multidimensional anonymization on input adult dataset. Data sets is generalized in a top-down manner and the better result was shown in multidimensional map reduce framework by the better IGPL values generated by the algorithm. The anonymization was performed with specialization operation on taxonomy tree. The experiment shows that the solutions improves the IGPL values, anonymity parameter and decreases the execution time of big data privacy preservation by compared to the existing algorithm. This experimental result will leads to great application to the distributed environment.

Download Full-text

Kringing Regressive Map reduce Entropy Feature Extraction based Rocchio Adaptive Boost Ensemble Classifier for Early Disease Diagnosis with Big Data

Dynamic Systems and Applications ◽

10.46719/dsa20213064 ◽

2021 ◽

Vol 30 (6) ◽

Author(s):

A Kaliappan ◽

D Chitra

Keyword(s):

Feature Extraction ◽

Big Data ◽

Disease Diagnosis ◽

Ensemble Classifier ◽

Map Reduce ◽

Early Disease

Download Full-text

Analysing Distributed Big Data through Hadoop Map Reduce

International Journal of Computer Applications ◽

10.5120/ijca2015907156 ◽

2015 ◽

Vol 129 (15) ◽

pp. 26-31 ◽

Cited By ~ 1

Author(s):

Arpit Gupta ◽

Rajiv Pandey ◽

Komal Verma

Keyword(s):

Big Data ◽

Map Reduce

Download Full-text

A FEASIBILITY STUDY ON BIG DATA INTEGRATION AND ITS METHODOLOGIES FOR HADOOP TECHNIQUES USING MAP REDUCE MODEL

International Journal of Modern Trends in Engineering & Research ◽

10.21884/ijmter.2016.3072.qxemv ◽

2016 ◽

Vol 3 (9) ◽

pp. 230-238

Keyword(s):

Big Data ◽

Data Integration ◽

Feasibility Study ◽

Map Reduce

Download Full-text

Leveraging big-data for business process analytics

The Learning Organization ◽

10.1108/tlo-05-2014-0023 ◽

2015 ◽

Vol 22 (4) ◽

pp. 215-228 ◽

Cited By ~ 21

Author(s):

Alejandro Vera-Baquero ◽

Ricardo Colomo Palacios ◽

Vladimir Stantchev ◽

Owen Molloy

Keyword(s):

Big Data ◽

Process Improvement ◽

Business Process ◽

Business Performance ◽

Business Processes ◽

Heterogeneous Systems ◽

Map Reduce ◽

Heterogeneous Environments ◽

Business Process Improvement ◽

Content Type

Purpose – This paper aims to present a solution that enables organizations to monitor and analyse the performance of their business processes by means of Big Data technology. Business process improvement can drastically influence in the profit of corporations and helps them to remain viable. However, the use of traditional Business Intelligence systems is not sufficient to meet today ' s business needs. They normally are business domain-specific and have not been sufficiently process-aware to support the needs of process improvement-type activities, especially on large and complex supply chains, where it entails integrating, monitoring and analysing a vast amount of dispersed event logs, with no structure, and produced on a variety of heterogeneous environments. This paper tackles this variability by devising different Big-Data-based approaches that aim to gain visibility into process performance. Design/methodology/approach – Authors present a cloud-based solution that leverages (BD) technology to provide essential insights into business process improvement. The proposed solution is aimed at measuring and improving overall business performance, especially in very large and complex cross-organisational business processes, where this type of visibility is hard to achieve across heterogeneous systems. Findings – Three different (BD) approaches have been undertaken based on Hadoop and HBase. We introduced first, a map-reduce approach that it is suitable for batch processing and presents a very high scalability. Secondly, we have described an alternative solution by integrating the proposed system with Impala. This approach has significant improvements in respect with map reduce as it is focused on performing real-time queries over HBase. Finally, the use of secondary indexes has been also proposed with the aim of enabling immediate access to event instances for correlation in detriment of high duplication storage and synchronization issues. This approach has produced remarkable results in two real functional environments presented in the paper. Originality/value – The value of the contribution relies on the comparison and integration of software packages towards an integrated solution that is aimed to be adopted by industry. Apart from that, in this paper, authors illustrate the deployment of the architecture in two different settings.

Download Full-text

Application of Big Data Problem-Solving Framework in Healthcare Sector—Recent Advancement

Smart Innovation, Systems and Technologies - Intelligent and Cloud Computing ◽

10.1007/978-981-15-5971-6_88 ◽

2020 ◽

pp. 819-826

Author(s):

Sushreeta Tripathy ◽

Tripti Swarnkar

Keyword(s):

Big Data ◽

Problem Solving ◽

Healthcare Sector ◽

Recent Advancement ◽

Data Problem

Download Full-text

An Ideal Big Data Architectural Analysis for Medical Image Data Classification or Clustering Using the Map-Reduce Frame Work

Lecture Notes in Electrical Engineering - ICCCE 2020 ◽

10.1007/978-981-15-7961-5_134 ◽

2020 ◽

pp. 1481-1494

Author(s):

Hemanth Kumar Vasireddi ◽

K. Suganya Devi

Keyword(s):

Big Data ◽

Medical Image ◽

Image Data ◽

Data Classification ◽

Map Reduce ◽

Architectural Analysis ◽

Frame Work ◽

Medical Image Data ◽

Image Data Classification

Download Full-text

Big Data

Security, Privacy, and Forensics Issues in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-9742-1.ch002 ◽

2020 ◽

pp. 24-65

Author(s):

P. Lalitha Surya Kumari

Keyword(s):

Big Data ◽

Life Cycle ◽

Data Collection ◽

Operating Systems ◽

Data Security ◽

Map Reduce ◽

Security Issues ◽

Big Data Applications

This chapter gives information about the most important aspects in how computing infrastructures should be configured and intelligently managed to fulfill the most notably security aspects required by big data applications. Big data is one area where we can store, extract, and process a large amount of data. All these data are very often unstructured. Using big data, security functions are required to work over the heterogeneous composition of diverse hardware, operating systems, and network domains. A clearly defined security boundary like firewalls and demilitarized zones (DMZs), conventional security solutions, are not effective for big data as it expands with the help of public clouds. This chapter discusses the different concepts like characteristics, risks, life cycle, and data collection of big data, map reduce components, issues and challenges in big data, cloud secure alliance, approaches to solve security issues, introduction of cybercrime, YARN, and Hadoop components.

Download Full-text

BigGIS With Hadoop in MapReduce Environment

Handbook of Research on Digital Research Methods and Architectural Tools in Urban Planning and Design - Advances in Civil and Industrial Engineering ◽

10.4018/978-1-5225-9238-9.ch002 ◽

2019 ◽

pp. 25-32

Author(s):

Nada M. Alhakkak

Keyword(s):

Big Data ◽

Scheduling Algorithm ◽

Real Data ◽

Map Reduce ◽

Data Types ◽

Simulated Environment ◽

Merge Sort ◽

Data Source ◽

Sort Algorithm ◽

And Storage

BigGIS is a new product that resulted from developing GIS in the “Big Data” area, which is used in storing and processing big geographical data and helps in solving its issues. This chapter describes an optimized Big GIS framework in Map Reduce Environment M2BG. The suggested framework has been integrated into Map Reduce Environment in order to solve the storage issues and get the benefit of the Hadoop environment. M2BG include two steps: Big GIS warehouse and Big GIS Map Reduce. The first step contains three main layers: Data Source and Storage Layer (DSSL), Data Processing Layer (DPL), and Data Analysis Layer (DAL). The second layer is responsible for clustering using swarms as inputs for the Hadoop phase. Then it is scheduled in the mapping part with the use of a preempted priority scheduling algorithm; some data types are classified as critical and some others are ordinary data type; the reduce part used, merge sort algorithm M2BG, should solve security and be implemented with real data in the simulated environment and later in the real world.

Download Full-text

Intelligent Big Data Analytics

Advances in Business Information Systems and Analytics - Maximizing Business Performance and Efficiency Through Intelligent Systems ◽

10.4018/978-1-5225-2234-8.ch003 ◽

2017 ◽

pp. 50-72 ◽

Cited By ~ 2

Author(s):

Dheeraj Malhotra ◽

Neha Verma ◽

Om Prakash Rishi ◽

Jatinder Singh

Keyword(s):

Big Data ◽

System Design ◽

Research Work ◽

Big Data Analytics ◽

Map Reduce ◽

Time Intervals ◽

Online Commerce ◽

Online Purchase ◽

Online Retailers ◽

Novel Approaches

With the explosive increase in regular E Commerce users, online commerce companies must have more customer friendly websites to satisfy the personalized requirements of online customer to progress their market share over competition; Different individuals have different purchase requirements at different time intervals and hence novel approaches are often required to be deployed by online retailers in order to identify the latest purchase requirements of customer. This research work proposes a novel MR apriori algorithm and system design of a tool called IMSS-SE, which can be used to blend benefits of Apriori-based Map Reduce framework with Intelligent technologies for B2C E-commerce in order to assist the online user to easily search and rank various E Commerce websites which can satisfy his personalized online purchase requirement. An extensive experimental evaluation shows that proposed system can better satisfy the personalized search requirements of E Commerce users than generic search engines.

Download Full-text