Architecture for Big Data Storage in Different Cloud Deployment Models

Author(s):  
Chandu Thota ◽  
Gunasekaran Manogaran ◽  
Daphne Lopez ◽  
Revathi Sundarasekar

Cloud computing is a computing model that distributes computation across a shared resource pool. The need for a scalable database capable of expanding to accommodate growth has increased with the rapid growth of data on the web. Major cloud computing vendors such as Amazon Web Services, Microsoft, Google, IBM, and Rackspace offer cloud-based Hadoop and NoSQL database platforms for processing Big Data applications. A variety of services run on top of these cloud platforms, freeing users from the need to deploy their own systems. Integrating Big Data with the various cloud deployment models is a major concern for Internet companies, especially software and data-services vendors that are just getting started. This chapter proposes an efficient integration architecture with comprehensive capabilities, including real-time and bulk data movement, bi-directional replication, metadata management, high-performance transformation, data services, and data quality for the customer and product domains.
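A minimal sketch can make these capabilities concrete. The Python snippet below illustrates bi-directional replication with a last-write-wins rule keyed on modification timestamps; it is a toy model, not the chapter's architecture, and every class, method, and key name in it is invented for the example.

```python
import time

class Store:
    """A toy key-value store standing in for an on-premises or cloud database."""
    def __init__(self, name):
        self.name = name
        self.rows = {}  # key -> (value, last_modified)

    def put(self, key, value):
        self.rows[key] = (value, time.time())

    def changes_since(self, ts):
        """Return rows modified after timestamp ts (a stand-in for CDC logs)."""
        return {k: v for k, v in self.rows.items() if v[1] > ts}

def replicate(src, dst, since):
    """One direction of replication: copy newer rows from src to dst."""
    for key, (value, modified) in src.changes_since(since).items():
        current = dst.rows.get(key)
        if current is None or current[1] < modified:  # last-write-wins
            dst.rows[key] = (value, modified)

def sync_bidirectional(a, b, since):
    """One bi-directional replication pass between two stores."""
    replicate(a, b, since)
    replicate(b, a, since)

on_prem, cloud = Store("on_prem"), Store("cloud")
on_prem.put("customer:42", {"name": "Ada"})
cloud.put("product:7", {"sku": "X-7"})
sync_bidirectional(on_prem, cloud, since=0.0)
print(sorted(cloud.rows))  # both records now present in each store
```

A real deployment would replace the timestamp scan with the source database's change-data-capture log and add conflict handling beyond last-write-wins, but the data-flow shape is the same.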


Author(s):  
Muhammad Fajrul Falah ◽  
Yohanes Yohanie Fridelin Panduman ◽  
Sritrusta Sukaridhoto ◽  
Arther Wilem Cornelius Tirie ◽  
M. Cahyo Kriswantoro ◽  
...  

Advances in big data and Internet of Things (IoT) technology have multiplied smart city and Industry 4.0 applications, and with them the need for high-performance cloud computing. However, the growing number of cloud computing service providers makes it difficult to choose among them. The purpose of this study is therefore to compare providers and determine criteria for selecting cloud computing services that match the system architecture and services needed to develop IoT and big data applications. We analyzed several parameters, such as technology specifications, service models, data center location, big data services, Internet of Things support, microservices architecture, cloud computing management, and machine learning, and used them to compare several cloud computing service providers. The results show that these parameters can serve as a reference for choosing a cloud computing provider for implementing IoT and big data technology.
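To illustrate how such parameters might be turned into a selection procedure, the sketch below computes a weighted score per provider. The weights, provider names, and scores are hypothetical placeholders, not the study's data; only the criteria names come from the abstract.

```python
# Criteria order: technology specs, service model, data center location,
# big data service, IoT service, microservices support, management tools,
# machine learning. All weights and scores below are hypothetical.
weights = [0.15, 0.10, 0.10, 0.20, 0.20, 0.10, 0.10, 0.05]

providers = {
    "provider_a": [4, 5, 3, 5, 4, 4, 3, 5],  # scores on a 1-5 scale
    "provider_b": [5, 4, 4, 3, 5, 3, 4, 3],
}

def weighted_score(scores, weights):
    """Sum of score x weight across all criteria."""
    return sum(s * w for s, w in zip(scores, weights))

for name, scores in sorted(providers.items()):
    print(f"{name}: {weighted_score(scores, weights):.2f}")
```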


2018 ◽  
Vol 14 (2) ◽  
pp. 127-138
Author(s):  
Asif Banka ◽  
Roohie Mir

Advances in modern computing and architectures focus on harnessing parallelism to achieve high-performance computing, which in turn generates massive amounts of data. The information produced needs to be represented and analyzed to address various challenges in technology and business domains. The radical expansion and integration of digital devices, networking, data storage, and computation systems are generating more data than ever. Because these data sets are massive and complex, traditional learning methods fall short, which has driven the adoption of machine learning techniques to mine the information hidden in the data. Deep learning, in particular, finds a natural place in big data applications; one of its major advantages is that its features are learned rather than human-engineered. In this paper, we survey machine learning algorithms that have already been applied to big data problems with promising results, and we examine deep learning as a solution to big data issues that traditional methods do not address efficiently. Deep learning is finding its place in most applications dominated by the critical 5Vs of big data and is expected to perform better there.
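As a minimal illustration of this contrast between a classical learner, whose performance hinges on the features it is given, and a neural network that learns intermediate representations itself, the following scikit-learn sketch trains both on the same synthetic data. The dataset, model sizes, and hyperparameters are arbitrary choices for the example, not anything from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a real data set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classical baseline: works directly on the hand-provided features.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Small neural network: hidden layers learn features instead of a human.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(X_train, y_train)

print("logistic regression:", baseline.score(X_test, y_test))
print("neural network:     ", net.score(X_test, y_test))
```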


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originate in storage systems; to process them, application servers must fetch them from storage devices, which imposes a data-movement cost that grows with the distance between the processing engines and the data. This is the key motivation for distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for processing data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications, so a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modifications to the underlying distributed processing framework. As a proof of concept, we built a fully functional Catalina prototype and a platform equipped with 16 Catalina CSDs, and ran the Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption when running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
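The “move process to data” idea can be sketched in a few lines: each storage node runs the map step locally and ships only compact partial results to the host, which merely reduces them. The simulation below is a conceptual toy, not the Catalina software stack; the node names and contents are invented.

```python
from collections import Counter

# Pretend each entry is raw text resident on one computational storage device.
storage_nodes = {
    "csd0": "big data needs processing near the data",
    "csd1": "processing near the data saves moving data",
}

def in_storage_map(text):
    """Word count that would run *inside* a storage device in the CSD model."""
    return Counter(text.split())

# Only the compact per-node counts cross the interconnect...
partials = [in_storage_map(blob) for blob in storage_nodes.values()]

# ...and the host merely reduces them, never touching the raw data.
totals = sum(partials, Counter())
print(totals.most_common(3))
```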


2018 ◽  
Vol 6 (4) ◽  
pp. 39-47 ◽  
Author(s):  
Reuben Ng

Cloud computing adoption enables big data applications in governance and policy. Singapore’s adoption of cloud computing is propelled by five key drivers: (1) public demand for, and satisfaction with, e-government services; (2) a focus on whole-of-government policies and practices; (3) restructuring of technology agencies to integrate strategy and implementation; (4) building the Smart Nation Platform; (5) purpose-driven cloud applications, especially in healthcare. This commentary also provides recommendations to propel big data applications in public policy and management: (a) technologically, embrace cloud analytics and explore “fog computing”, an emerging technology that enables on-site data sense-making before transmission to the cloud; (b) promote regulatory sandboxes to experiment with policies that proactively manage novel technologies and business models that may radically change society; (c) on the collaboration front, establish unconventional partnerships to co-innovate on challenges like the skills gap, an example being the unprecedented partnership led by the Lee Kuan Yew School of Public Policy with the government, private sector, and unions.


2019 ◽  
Vol 3 (2) ◽  
pp. 152
Author(s):  
Xianglan Wu

Today, the rise and rapid development of the Internet produce a huge amount of data every day, and traditional data processing and storage models cannot fully analyze and mine these data. More and more new information technologies (such as cloud computing, virtualization, and big data) have emerged and been applied, networks have evolved from informatization toward intelligence, and campus construction has entered the smart campus stage. Smart campus construction draws on big data and cloud computing technology to improve the quality of information services in colleges and universities by integrating, storing, and mining huge volumes of data.


Author(s):  
Javier Conejero ◽  
Sandra Corella ◽  
Rosa M Badia ◽  
Jesus Labarta

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have demonstrated this fact and have promoted the acceptance of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging big data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds) and is a good alternative task-based programming model for big data applications. This article describes why we consider task-based programming models a good approach for big data applications. It includes a comparison of Spark and COMPSs in terms of architecture, programming model, and performance, focusing on the structural differences between the two frameworks, their programmability interfaces, and their efficiency on three widely known benchmarking kernels: Wordcount, Kmeans, and Terasort. These kernels exercise the most important functionalities of both programming models under different workflows and conditions. The main results of this comparison are that (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, whereas Spark requires existing algorithms to be adapted and rewritten by explicitly using its predefined functions; (2) COMPSs offers better performance than Spark; and (3) COMPSs has shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make each unique, thereby helping readers choose the right framework for each particular objective.
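The programmability difference the authors highlight can be sketched with Wordcount in both styles. The Spark half uses standard PySpark transformations; the COMPSs half is written in the PyCOMPSs style, where ordinary functions are annotated with @task and the runtime extracts the parallelism. Input paths, block contents, and cluster setup are assumptions for the sketch, which is not the article's benchmark code.

```python
# --- Spark: the pipeline must be expressed via predefined transformations ---
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")
counts = (sc.textFile("hdfs:///data/text")        # assumed input path
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs:///data/counts")

# --- PyCOMPSs: sequential-looking code; the runtime schedules the tasks ----
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on
from collections import Counter

@task(returns=dict)
def count_block(text_block):
    """Each call becomes an asynchronous task on the cluster."""
    return dict(Counter(text_block.split()))

blocks = ["big data on tasks", "tasks on big clusters"]  # stand-in blocks
partials = [count_block(b) for b in blocks]              # tasks launch here
partials = compss_wait_on(partials)                      # synchronize results

result = {}
for p in partials:
    for word, n in p.items():
        result[word] = result.get(word, 0) + n
print(result)
```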


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. Because they offer many advanced features beyond those of a conventional RDBMS, “NoSQL” databases are popularly read as “Not only SQL” databases. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available in both open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.
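The CAP trade-off the chapter refers to often surfaces in practice as a per-query consistency dial. The sketch below shows the idea with the DataStax Python driver for Cassandra, where a statement's consistency level chooses between answering fast from one replica and requiring a majority to agree; the contact point, keyspace, and table are assumptions for the example.

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])    # assumed contact point
session = cluster.connect("shop")   # assumed keyspace with a 'products' table

# Favor availability: one replica's answer suffices (may return stale data).
fast = SimpleStatement("SELECT * FROM products WHERE id = 7",
                       consistency_level=ConsistencyLevel.ONE)

# Favor consistency: a majority of replicas must agree (may fail under
# a network partition, trading availability for consistency).
safe = SimpleStatement("SELECT * FROM products WHERE id = 7",
                       consistency_level=ConsistencyLevel.QUORUM)

print(session.execute(fast).one())
print(session.execute(safe).one())
```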

