A COMPARATIVE ANALYSIS OF CONVENTIONAL HADOOP WITH PROPOSED CLOUD ENABLED HADOOP FRAMEWORK FOR SPATIAL BIG DATA PROCESSING

<p><strong>Abstract.</strong> The emergence of new tools and technologies to gather the information generate the problem of processing spatial big data. The solution of this problem requires new research, techniques, innovation and development. Spatial big data is categorized by the five V’s: volume, velocity, veracity, variety and value. Hadoop is a most widely used framework which address these problems. But it requires high performance computing resources to store and process such huge data. The emergence of cloud computing has provided, on demand, elastic, scalable and payment based computing resources to users to develop their own computing environment. The main objective of this paper is to develop a cloud enabled hadoop framework which combines cloud technology and high computing resources with the conventional hadoop framework to support the spatial big data solutions. The paper also compares the conventional hadoop framework and proposed cloud enabled hadoop framework. It is observed that the propose cloud enabled hadoop framework is much efficient to spatial big data processing than the current available solutions.</p>

Download Full-text

High-Performance Computing for Big Data Processing

Future Generation Computer Systems ◽

10.1016/j.future.2018.07.054 ◽

2018 ◽

Vol 88 ◽

pp. 693-695 ◽

Cited By ~ 1

Author(s):

Yulei Wu ◽

Yang Xiang ◽

Jingguo Ge ◽

Peter Muller

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Big Data Processing ◽

Performance Computing

Download Full-text

An Overview on the Convergence of High Performance Computing and Big Data Processing

2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS) ◽

10.1109/padsw.2018.8644997 ◽

2018 ◽

Cited By ~ 1

Author(s):

Songzhu Mei ◽

Hongtao Guan ◽

Qinglin Wang

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Big Data Processing ◽

Performance Computing

Download Full-text

◾ Approaches for High-Performance Big Data Processing: Applications and Challenges

Big Data ◽

10.1201/b18050-10 ◽

2015 ◽

pp. 128-141

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance ◽

Big Data Processing

Download Full-text

NoSQL Databases

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch008 ◽

2014 ◽

pp. 186-215 ◽

Cited By ~ 2

Author(s):

Ganesh Chandra Deka

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Data Storage ◽

Big Data Processing ◽

Nosql Databases ◽

Data Intensive ◽

Huge Data ◽

Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases have lots of advanced features in addition to the conventional RDBMS features. Hence, the “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases having different features to deal with exponentially growing data-intensive applications are available with open source and proprietary option. This chapter discusses some of the popular NoSQL databases and their features on the light of CAP theorem.

Download Full-text

Improving the performance of GIS polygon overlay computation with MapReduce for spatial big data processing

Cluster Computing ◽

10.1007/s10586-015-0428-x ◽

2015 ◽

Vol 18 (2) ◽

pp. 507-516 ◽

Cited By ~ 39

Author(s):

Yong Wang ◽

Zhenling Liu ◽

Hongyan Liao ◽

Chengjun Li

Keyword(s):

Big Data ◽

Data Processing ◽

Big Data Processing ◽

Spatial Big Data

Download Full-text

Big Data Processing on Cloud Computing Using Hadoop Mapreduce and Apache Spark

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises ◽

10.4018/978-1-5225-3038-1.ch009 ◽

2018 ◽

pp. 224-250

Author(s):

Yassir Samadi ◽

Mostapha Zbakh ◽

Amine Haouari

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Apache Spark ◽

Big Data Processing ◽

Data Intensive ◽

Hadoop Mapreduce ◽

Huge Data ◽

Increasing Demand ◽

Exponential Rates

Size of the data used by enterprises has been growing at exponential rates since last few years; handling such huge data from various sources is a challenge for Businesses. In addition, Big Data becomes one of the major areas of research for Cloud Service providers due to a large amount of data produced every day, and the inefficiency of traditional algorithms and technologies to handle these large amounts of data. In order to resolve the aforementioned problems and to meet the increasing demand for high-speed and data-intensive computing, several solutions have been developed by researches and developers. Among these solutions, there are Cloud Computing tools such as Hadoop MapReduce and Apache Spark, which work on the principles of parallel computing. This chapter focuses on how big data processing challenges can be handled by using Cloud Computing frameworks and the importance of using Cloud Computing by businesses

Download Full-text

High-Performance Geospatial Big Data Processing System Based on MapReduce

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7100399 ◽

2018 ◽

Vol 7 (10) ◽

pp. 399 ◽

Cited By ~ 10

Author(s):

Junghee Jo ◽

Kang-Woo Lee

Keyword(s):

Big Data ◽

Spatial Analysis ◽

Data Processing ◽

Spatial Data ◽

High Performance ◽

Rapid Development ◽

Processing System ◽

Data Processing System ◽

Seamless Integration ◽

Big Data Processing

With the rapid development of Internet of Things (IoT) technologies, the increasing volume and diversity of sources of geospatial big data have created challenges in storing, managing, and processing data. In addition to the general characteristics of big data, the unique properties of spatial data make the handling of geospatial big data even more complicated. To facilitate users implementing geospatial big data applications in a MapReduce framework, several big data processing systems have extended the original Hadoop to support spatial properties. Most of those platforms, however, have included spatial functionalities by embedding them as a form of plug-in. Although offering a convenient way to add new features to an existing system, the plug-in has several limitations. In particular, while executing spatial and nonspatial operations by alternating between the existing system and the plug-in, additional read and write overheads have to be added to the workflow, significantly reducing performance efficiency. To address this issue, we have developed Marmot, a high-performance, geospatial big data processing system based on MapReduce. Marmot extends Hadoop at a low level to support seamless integration between spatial and nonspatial operations of a solid framework, allowing improved performance of geoprocessing workflow. This paper explains the overall architecture and data model of Marmot as well as the main algorithm for automatic construction of MapReduce jobs from a given spatial analysis task. To illustrate how Marmot transforms a sequence of operators for spatial analysis to map and reduce functions in a way to achieve better performance, this paper presents an example of spatial analysis retrieving the number of subway stations per city in Korea. This paper also experimentally demonstrates that Marmot generally outperforms SpatialHadoop, one of the top plug-in based spatial big data frameworks, particularly in dealing with complex and time-intensive queries involving spatial index.

Download Full-text

Upgrading a high performance computing environment for massive data processing

Journal of Internet Services and Applications ◽

10.1186/s13174-019-0118-7 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Lucas M. Ponce ◽

Walter dos Santos ◽

Wagner Meira ◽

Dorgival Guedes ◽

Daniele Lezzi ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Data Access ◽

Massive Data ◽

Analysis Tool ◽

Data Framework ◽

Performance Computing ◽

Massive Data Processing

Abstract High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that integretes (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates COMPSs’s use and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain that want to develop applications with a higher level abstraction.

Download Full-text