Performance Evaluation of Data Intensive Computing In the Cloud

2015 ◽  
pp. 1901-1914 ◽  
Author(s):  
Sanjay P. Ahuja ◽  
Bhagavathi Kaza

Big data is a topic of active research in the cloud community. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications involve high CPU usage for processing large volumes of data on the scale of terabytes or petabytes. While some research exists on the performance of data-intensive applications in the cloud, none of it compares the Amazon Elastic Compute Cloud (Amazon EC2) and Google Compute Engine (GCE) clouds using multiple benchmarks. This study benchmarks the Amazon EC2 and GCE clouds using the TeraSort, MalStone, and CreditStone benchmarks on the Hadoop and Sector data layers, measuring performance as the number of nodes is varied. The results show that GCE is more efficient than Amazon EC2 for data-intensive applications.
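
The abstract does not give the exact harness used, but a minimal sketch of timing a TeraSort run might look as follows, assuming a configured Hadoop cluster with the stock hadoop-mapreduce-examples jar on the node; the jar path and row count below are hypothetical placeholders.

```python
# Hedged sketch: time a TeraSort run on an existing Hadoop cluster.
# Assumes `hadoop` is on PATH and the standard examples jar is installed;
# the jar path and row count are hypothetical, not the study's setup.
import subprocess
import time

EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"
ROWS = 10_000_000  # 100-byte rows => ~1 GB; scale up for terabyte runs

def run(*args):
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, *args], check=True)

def terasort_elapsed():
    run("teragen", str(ROWS), "/bench/input")        # generate sort input
    start = time.monotonic()
    run("terasort", "/bench/input", "/bench/output")  # the timed phase
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"TeraSort took {terasort_elapsed():.1f} s")
```

Repeating such a run while varying the node count yields the scaling curves the study compares across the two clouds.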


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases offer many advanced features beyond those of conventional RDBMSs; hence, they are popularly known as "Not only SQL" databases. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available in both open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.
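
The CAP trade-off the chapter analyzes is often surfaced to users through replica quorum settings. A minimal sketch of the standard quorum rule, generic rather than tied to any particular database:

```python
# Toy model of the quorum rule behind many NoSQL consistency settings:
# with N replicas, a read of R replicas is guaranteed to overlap the
# latest write of W replicas (strong consistency) iff R + W > N.
def strongly_consistent(n_replicas: int, r: int, w: int) -> bool:
    return r + w > n_replicas

# N=3: quorum reads/writes (R=W=2) favour consistency;
# R=W=1 favours availability and latency but may return stale data.
assert strongly_consistent(3, 2, 2)
assert not strongly_consistent(3, 1, 1)
```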


2013 ◽  
Vol 3 (1) ◽  
pp. 13-26 ◽  
Author(s):  
Sanjay P. Ahuja ◽  
Sindhu Mani

High Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon's EC2 platform across some HPC benchmarks, an extensive comparison between Amazon's EC2 and Microsoft's Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance is largely missing. The purpose of this paper is to use existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure, spanning both Infrastructure-as-a-Service and Platform-as-a-Service offerings. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR), and NAS Parallel (NPB) benchmarks on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was included in the analysis. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.
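
STREAM measures sustainable memory bandwidth with simple vector kernels. A hedged NumPy approximation of its "triad" kernel, for illustration only; the study itself ran the compiled MPI versions, not Python:

```python
# Hedged sketch: a NumPy approximation of the STREAM "triad" kernel
# (a = b + scalar*c), showing what the benchmark actually measures.
import time
import numpy as np

N = 50_000_000                 # array length; sized to exceed CPU caches
b = np.random.rand(N)
c = np.random.rand(N)
scalar = 3.0

start = time.monotonic()
a = b + scalar * c             # triad: two reads + one write per element
elapsed = time.monotonic() - start

bytes_moved = 3 * N * 8        # three arrays of 8-byte doubles touched
print(f"~{bytes_moved / elapsed / 1e9:.1f} GB/s effective bandwidth")
```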


Author(s):  
Mainak Adhikari ◽  
Sukhendu Kar

A NoSQL database provides a mechanism for storing and accessing data across multiple storage clusters. NoSQL databases are seeing significant and growing industry adoption to meet the huge data storage requirements of big data, real-time applications, and cloud computing. NoSQL databases have many advantages over conventional RDBMSs. NoSQL systems are also referred to as "Not only SQL" to emphasize that they may in fact support structured query languages like SQL while additionally handling semi-structured and unstructured data. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available in both open-source and proprietary options, mostly promoted and used by social networking sites. This chapter discusses some features and challenges of NoSQL databases and some of the popular NoSQL databases with their features in light of the CAP theorem.
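
A minimal sketch of the schema flexibility described above, using a toy in-memory document store rather than any particular NoSQL product's API: structured, semi-structured, and unstructured records coexist without a fixed column set.

```python
# Hedged sketch of "not only SQL" schema flexibility: a toy document
# store accepting records with differing shapes, unlike an RDBMS table.
store: dict[str, dict] = {}

def put(doc_id: str, doc: dict) -> None:
    store[doc_id] = doc

put("u1", {"name": "Ada", "email": "ada@example.org"})           # structured
put("u2", {"name": "Lin", "tags": ["admin"], "prefs": {"theme": "dark"}})
put("p1", {"raw_text": "unstructured blob of log output ..."})   # unstructured

# Queries must tolerate missing fields:
admins = [d for d in store.values() if "admin" in d.get("tags", [])]
```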


2021 ◽  
Vol 22 (4) ◽  
pp. 401-412
Author(s):  
Hrachya Astsatryan ◽  
Arthur Lalayan ◽  
Aram Kocharyan ◽  
Daniel Hagimont

The MapReduce framework manages big data sets by splitting large datasets into distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in big data processing to reduce resource-intensive I/O operations and correspondingly improve I/O rates. The article presents a performance-efficient, modular, and configurable robust decision-making service built on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules: it predicts the execution time of a given job from its metrics and recommends the configuration parameters that best improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, were evaluated to demonstrate the performance improvements.
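
The abstract does not describe the Recommendation module's internal models, so the following is only an illustrative rule-based stand-in for the kind of decision it makes; the metric names and thresholds are hypothetical.

```python
# Hedged sketch of a rule-of-thumb recommender in the spirit of the
# service above; names and thresholds are hypothetical, not the
# paper's actual Recommendation module.
from dataclasses import dataclass

@dataclass
class JobMetrics:
    input_gb: float
    cpu_bound: bool          # CPU-intensive (e.g. K-Means) vs I/O-intensive
    memory_headroom_gb: float

def recommend(m: JobMetrics) -> dict:
    # I/O-bound jobs gain most from a fast codec; CPU-bound jobs may
    # prefer no compression to avoid stealing cycles from computation.
    codec = "none" if m.cpu_bound else "snappy"
    # Cache the input in memory (e.g. an in-memory file system) only
    # when it actually fits within the cluster's headroom.
    in_memory = m.input_gb <= m.memory_headroom_gb
    return {"compression_codec": codec, "cache_input_in_memory": in_memory}

print(recommend(JobMetrics(input_gb=40, cpu_bound=False, memory_headroom_gb=64)))
```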


Author(s):  
Song Kunfang ◽  
Hongwei Lu

MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach to exploiting data parallelism in XML processing using MapReduce on Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, an advanced two-phase MapReduce solution is designed that efficiently addresses labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML processing approach on Hadoop.
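
The abstract does not detail the SDN labeling scheme itself, so as a generic stand-in, the sketch below uses Dewey-style prefix labels, a common basis for distributed XML indexing: prefix labels make ancestor/descendant tests trivial, which is what an index over labeled nodes exploits.

```python
# Hedged sketch: Dewey-style prefix labeling of XML nodes (a generic
# stand-in for the paper's SDN labeling, which is not detailed here).
import xml.etree.ElementTree as ET

def label(elem: ET.Element, prefix="1", out=None):
    out = {} if out is None else out
    out[prefix] = elem.tag
    for i, child in enumerate(elem, start=1):
        label(child, f"{prefix}.{i}", out)
    return out

def is_ancestor(a: str, d: str) -> bool:
    # Node a is an ancestor of node d iff a's label prefixes d's.
    return d.startswith(a + ".")

doc = ET.fromstring("<a><b><c/></b><b/></a>")
labels = label(doc)    # {'1': 'a', '1.1': 'b', '1.1.1': 'c', '1.2': 'b'}
assert is_ancestor("1.1", "1.1.1")
```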


Science ◽  
2019 ◽  
Vol 366 (6462) ◽  
pp. 210-215 ◽  
Author(s):  
Keyuan Ding ◽  
Jiangjing Wang ◽  
Yuxing Zhou ◽  
He Tian ◽  
Lu Lu ◽  
...  

Artificial intelligence and other data-intensive applications have escalated the demand for data storage and processing. New computing devices, such as phase-change random access memory (PCRAM)–based neuro-inspired devices, are promising options for breaking the von Neumann barrier by unifying storage with computing in memory cells. However, current PCRAM devices have considerable noise and drift in electrical resistance that erode the precision and consistency of these devices. We designed a phase-change heterostructure (PCH) that consists of alternately stacked phase-change and confinement nanolayers to suppress the noise and drift, allowing reliable iterative RESET and cumulative SET operations for high-performance neuro-inspired computing. Our PCH architecture is amenable to industrial production as an intrinsic materials solution, without complex manufacturing procedures or much-increased fabrication cost.
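
A toy numerical model of the cumulative-SET behavior that neuro-inspired PCM computing relies on, purely for intuition: each partial SET pulse nudges a cell's conductance (the "synaptic weight") upward, and RESET restores the low-conductance state. All constants are illustrative, not the paper's measured device parameters.

```python
# Hedged toy model of cumulative SET / iterative RESET on a PCM cell.
# Constants are illustrative only, not measured device behaviour.
G_MIN, G_MAX = 0.1, 1.0          # arbitrary conductance units

def set_pulse(g: float, step: float = 0.09) -> float:
    """One partial-SET pulse: saturating conductance increase."""
    return min(G_MAX, g + step * (G_MAX - g))

def reset(_: float) -> float:
    """Full RESET: return the cell to its low-conductance state."""
    return G_MIN

g = G_MIN
for _ in range(10):              # ten cumulative SET pulses
    g = set_pulse(g)
print(f"weight after 10 pulses: {g:.3f}")
g = reset(g)
```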


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 246-261
Author(s):  
K.R. Remesh Babu ◽  
K.P. Madhu

The management of big data has become more important due to the widespread adoption of the Internet of Things in various fields. Developments in technology, science, human habits, etc., generate massive amounts of data, so it is increasingly important to store and protect these data from attacks. Big data analytics is now a hot topic. The data storage facilities provided by cloud computing enable business organizations to overcome the burden of huge data storage and maintenance, and several distributed cloud applications support them in analyzing this data to make appropriate decisions. The dynamic growth of data and data-intensive applications demands an efficient, intelligent storage mechanism for big data. The proposed system analyzes IP packets for vulnerabilities and classifies data nodes as reliable or unreliable for efficient data storage. The proposed Apriori-algorithm-based method automatically classifies the nodes, providing an intelligent, secure storage mechanism for distributed big data storage.
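
The abstract does not specify the packet features used, so the sketch below is only an illustration of the general Apriori-based idea: mine frequent itemsets of per-node packet indicators, then mark a node unreliable when its frequent patterns include suspicious indicators. The feature names and support threshold are hypothetical.

```python
# Hedged sketch of Apriori-style node classification; indicator names
# and the support threshold are hypothetical, not the paper's features.
from itertools import combinations

# One transaction per observed packet/flow, grouped by node (toy data).
observations = {
    "node1": [{"syn_flood", "bad_checksum"}, {"syn_flood", "bad_checksum"}],
    "node2": [{"normal"}, {"normal"}, {"syn_flood"}],
}
MIN_SUPPORT = 0.6
SUSPICIOUS = {"syn_flood", "bad_checksum"}

def frequent_itemsets(txs, min_support):
    """Minimal Apriori: grow itemsets level by level, pruning by support."""
    candidates = {frozenset([i]) for t in txs for i in t}
    frequent, k = set(), 1
    while candidates:
        level = {c for c in candidates
                 if sum(c <= t for t in txs) / len(txs) >= min_support}
        frequent |= level
        k += 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
    return frequent

for node, txs in observations.items():
    unreliable = any(p & SUSPICIOUS for p in frequent_itemsets(txs, MIN_SUPPORT))
    print(node, "unreliable" if unreliable else "reliable")
```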


MapReduce is a prevalent model for data-intensive applications. It hides the difficulties of parallel programming and provides an abstract environment. Hadoop is a benchmark for big data storage, providing load-balanced, scalable, and fault-tolerant operation. Hadoop's performance depends heavily on its scheduler. Various scheduling algorithms [6-10] have been suggested for various types of environments, applications, and workloads. In this work, a new task selection method is developed to assist the scheduler when a node has several local tasks. Experimental results show an improvement of 20% with respect to locality and fairness.
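
The abstract does not spell out the selection rule, so the following is only an illustrative locality-aware heuristic of the kind such work considers: when a node has several node-local tasks, prefer the task that the fewest other nodes could also run locally, leaving widely replicated blocks for nodes with fewer options.

```python
# Hedged sketch of a locality-aware task selection rule; illustrative
# only, not the paper's actual method.
def pick_task(node: str, pending: dict[str, set[str]]) -> str | None:
    """pending maps task id -> set of nodes holding its input block."""
    local = [t for t, hosts in pending.items() if node in hosts]
    if not local:
        return None                       # fall back to rack/remote scheduling
    # Prefer the task runnable locally on the fewest OTHER nodes,
    # preserving locality options for the rest of the cluster.
    return min(local, key=lambda t: len(pending[t] - {node}))

pending = {"t1": {"n1", "n2", "n3"}, "t2": {"n1"}, "t3": {"n2"}}
assert pick_task("n1", pending) == "t2"   # t2 is local only to n1
```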

