Performance Evaluation of Data Intensive Computing In the Cloud

2015 ◽  
pp. 1901-1914 ◽  
Author(s):  
Sanjay P. Ahuja ◽  
Bhagavathi Kaza

Big data is a topic of active research in the cloud community. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications involve high CPU usage for processing large volumes of data on the scale of terabytes or petabytes. While some research exists on the performance of data-intensive applications in the cloud, none of it compares the Amazon Elastic Compute Cloud (Amazon EC2) and Google Compute Engine (GCE) clouds using multiple benchmarks. This study benchmarks the Amazon EC2 and GCE clouds using the TeraSort, MalStone, and CreditStone benchmarks on the Hadoop and Sector data layers, measuring performance as the number of nodes is varied. The results show that GCE is more efficient than Amazon EC2 for data-intensive applications.
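
The abstract does not give the exact harness used, but a minimal sketch of timing a TeraSort run might look as follows, assuming a configured Hadoop cluster with the stock hadoop-mapreduce-examples jar on the node; the jar path and row count below are hypothetical placeholders.

```python
# Hedged sketch: time a TeraSort run on an existing Hadoop cluster.
# Assumes `hadoop` is on PATH and the standard examples jar is installed;
# the jar path and row count are hypothetical, not the study's setup.
import subprocess
import time

EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"
ROWS = 10_000_000  # 100-byte rows => ~1 GB; scale up for terabyte runs

def run(*args):
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, *args], check=True)

def terasort_elapsed():
    run("teragen", str(ROWS), "/bench/input")        # generate sort input
    start = time.monotonic()
    run("terasort", "/bench/input", "/bench/output")  # the timed phase
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"TeraSort took {terasort_elapsed():.1f} s")
```

Repeating such a run while varying the node count yields the scaling curves the study compares across the two clouds.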


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases offer many advanced features beyond those of conventional RDBMSs; hence, they are popularly known as "Not only SQL" databases. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available in both open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.
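
The CAP trade-off the chapter analyzes is often surfaced to users through replica quorum settings. A minimal sketch of the standard quorum rule, generic rather than tied to any particular database:

```python
# Toy model of the quorum rule behind many NoSQL consistency settings:
# with N replicas, a read of R replicas is guaranteed to overlap the
# latest write of W replicas (strong consistency) iff R + W > N.
def strongly_consistent(n_replicas: int, r: int, w: int) -> bool:
    return r + w > n_replicas

# N=3: quorum reads/writes (R=W=2) favour consistency;
# R=W=1 favours availability and latency but may return stale data.
assert strongly_consistent(3, 2, 2)
assert not strongly_consistent(3, 1, 1)
```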


2013 ◽  
Vol 3 (1) ◽  
pp. 13-26 ◽  
Author(s):  
Sanjay P. Ahuja ◽  
Sindhu Mani

High Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon's EC2 platform across some HPC benchmarks, an extensive comparison between Amazon's EC2 and Microsoft's Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance is largely missing. The purpose of this paper is to use existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure, spanning both Infrastructure-as-a-Service and Platform-as-a-Service offerings. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR), and NAS Parallel (NPB) benchmarks on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was included in the analysis. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.
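
STREAM measures sustainable memory bandwidth with simple vector kernels. A hedged NumPy approximation of its "triad" kernel, for illustration only; the study itself ran the compiled MPI versions, not Python:

```python
# Hedged sketch: a NumPy approximation of the STREAM "triad" kernel
# (a = b + scalar*c), showing what the benchmark actually measures.
import time
import numpy as np

N = 50_000_000                 # array length; sized to exceed CPU caches
b = np.random.rand(N)
c = np.random.rand(N)
scalar = 3.0

start = time.monotonic()
a = b + scalar * c             # triad: two reads + one write per element
elapsed = time.monotonic() - start

bytes_moved = 3 * N * 8        # three arrays of 8-byte doubles touched
print(f"~{bytes_moved / elapsed / 1e9:.1f} GB/s effective bandwidth")
```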


Author(s):  
Mainak Adhikari ◽  
Sukhendu Kar

A NoSQL database provides a mechanism for storing and accessing data across multiple storage clusters. NoSQL databases are seeing significant and growing industry adoption to meet the huge data storage requirements of big data, real-time applications, and cloud computing. NoSQL databases have many advantages over conventional RDBMSs. NoSQL systems are also referred to as "Not only SQL" to emphasize that they may in fact support structured query languages like SQL while additionally handling semi-structured and unstructured data. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available in both open-source and proprietary options, mostly promoted and used by social networking sites. This chapter discusses some features and challenges of NoSQL databases and some of the popular NoSQL databases with their features in light of the CAP theorem.
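
A minimal sketch of the schema flexibility described above, using a toy in-memory document store rather than any particular NoSQL product's API: structured, semi-structured, and unstructured records coexist without a fixed column set.

```python
# Hedged sketch of "not only SQL" schema flexibility: a toy document
# store accepting records with differing shapes, unlike an RDBMS table.
store: dict[str, dict] = {}

def put(doc_id: str, doc: dict) -> None:
    store[doc_id] = doc

put("u1", {"name": "Ada", "email": "ada@example.org"})           # structured
put("u2", {"name": "Lin", "tags": ["admin"], "prefs": {"theme": "dark"}})
put("p1", {"raw_text": "unstructured blob of log output ..."})   # unstructured

# Queries must tolerate missing fields:
admins = [d for d in store.values() if "admin" in d.get("tags", [])]
```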


2021 ◽  
Vol 22 (4) ◽  
pp. 401-412
Author(s):  
Hrachya Astsatryan ◽  
Arthur Lalayan ◽  
Aram Kocharyan ◽  
Daniel Hagimont

The MapReduce framework manages big data sets by splitting large datasets into distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in big data processing to reduce resource-intensive I/O operations and correspondingly improve I/O rates. The article presents a performance-efficient, modular, and configurable robust decision-making service built on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules: it predicts the execution time of a given job from its metrics and recommends the configuration parameters that best improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, were evaluated to demonstrate the performance improvements.
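
The abstract does not describe the Recommendation module's internal models, so the following is only an illustrative rule-based stand-in for the kind of decision it makes; the metric names and thresholds are hypothetical.

```python
# Hedged sketch of a rule-of-thumb recommender in the spirit of the
# service above; names and thresholds are hypothetical, not the
# paper's actual Recommendation module.
from dataclasses import dataclass

@dataclass
class JobMetrics:
    input_gb: float
    cpu_bound: bool          # CPU-intensive (e.g. K-Means) vs I/O-intensive
    memory_headroom_gb: float

def recommend(m: JobMetrics) -> dict:
    # I/O-bound jobs gain most from a fast codec; CPU-bound jobs may
    # prefer no compression to avoid stealing cycles from computation.
    codec = "none" if m.cpu_bound else "snappy"
    # Cache the input in memory (e.g. an in-memory file system) only
    # when it actually fits within the cluster's headroom.
    in_memory = m.input_gb <= m.memory_headroom_gb
    return {"compression_codec": codec, "cache_input_in_memory": in_memory}

print(recommend(JobMetrics(input_gb=40, cpu_bound=False, memory_headroom_gb=64)))
```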


Author(s):  
Song Kunfang ◽  
Hongwei Lu

MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach to exploiting data parallelism in XML processing using MapReduce on Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, an advanced two-phase MapReduce solution is designed that efficiently addresses labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML processing approach on Hadoop.
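
The abstract does not detail the SDN labeling scheme itself, so as a generic stand-in, the sketch below uses Dewey-style prefix labels, a common basis for distributed XML indexing: prefix labels make ancestor/descendant tests trivial, which is what an index over labeled nodes exploits.

```python
# Hedged sketch: Dewey-style prefix labeling of XML nodes (a generic
# stand-in for the paper's SDN labeling, which is not detailed here).
import xml.etree.ElementTree as ET

def label(elem: ET.Element, prefix="1", out=None):
    out = {} if out is None else out
    out[prefix] = elem.tag
    for i, child in enumerate(elem, start=1):
        label(child, f"{prefix}.{i}", out)
    return out

def is_ancestor(a: str, d: str) -> bool:
    # Node a is an ancestor of node d iff a's label prefixes d's.
    return d.startswith(a + ".")

doc = ET.fromstring("<a><b><c/></b><b/></a>")
labels = label(doc)    # {'1': 'a', '1.1': 'b', '1.1.1': 'c', '1.2': 'b'}
assert is_ancestor("1.1", "1.1.1")
```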


Science ◽  
2019 ◽  
Vol 366 (6462) ◽  
pp. 210-215 ◽  
Author(s):  
Keyuan Ding ◽  
Jiangjing Wang ◽  
Yuxing Zhou ◽  
He Tian ◽  
Lu Lu ◽  
...  

Artificial intelligence and other data-intensive applications have escalated the demand for data storage and processing. New computing devices, such as phase-change random access memory (PCRAM)–based neuro-inspired devices, are promising options for breaking the von Neumann barrier by unifying storage with computing in memory cells. However, current PCRAM devices have considerable noise and drift in electrical resistance that erode the precision and consistency of these devices. We designed a phase-change heterostructure (PCH) that consists of alternately stacked phase-change and confinement nanolayers to suppress the noise and drift, allowing reliable iterative RESET and cumulative SET operations for high-performance neuro-inspired computing. Our PCH architecture is amenable to industrial production as an intrinsic materials solution, without complex manufacturing procedures or much-increased fabrication cost.
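
A toy numerical model of the cumulative-SET behavior that neuro-inspired PCM computing relies on, purely for intuition: each partial SET pulse nudges a cell's conductance (the "synaptic weight") upward, and RESET restores the low-conductance state. All constants are illustrative, not the paper's measured device parameters.

```python
# Hedged toy model of cumulative SET / iterative RESET on a PCM cell.
# Constants are illustrative only, not measured device behaviour.
G_MIN, G_MAX = 0.1, 1.0          # arbitrary conductance units

def set_pulse(g: float, step: float = 0.09) -> float:
    """One partial-SET pulse: saturating conductance increase."""
    return min(G_MAX, g + step * (G_MAX - g))

def reset(_: float) -> float:
    """Full RESET: return the cell to its low-conductance state."""
    return G_MIN

g = G_MIN
for _ in range(10):              # ten cumulative SET pulses
    g = set_pulse(g)
print(f"weight after 10 pulses: {g:.3f}")
g = reset(g)
```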


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 246-261
Author(s):  
K.R. Remesh Babu ◽  
K.P. Madhu

The management of big data has become more important due to the widespread adoption of the Internet of Things in various fields. Developments in technology, science, human habits, etc., generate massive amounts of data, so it is increasingly important to store and protect these data from attacks. Big data analytics is now a hot topic. The data storage facilities provided by cloud computing enable business organizations to overcome the burden of huge data storage and maintenance, and several distributed cloud applications support them in analyzing this data to make appropriate decisions. The dynamic growth of data and data-intensive applications demands an efficient, intelligent storage mechanism for big data. The proposed system analyzes IP packets for vulnerabilities and classifies data nodes as reliable or unreliable for efficient data storage. The proposed Apriori-algorithm-based method automatically classifies the nodes, providing an intelligent, secure storage mechanism for distributed big data storage.
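
The abstract does not specify the packet features used, so the sketch below is only an illustration of the general Apriori-based idea: mine frequent itemsets of per-node packet indicators, then mark a node unreliable when its frequent patterns include suspicious indicators. The feature names and support threshold are hypothetical.

```python
# Hedged sketch of Apriori-style node classification; indicator names
# and the support threshold are hypothetical, not the paper's features.
from itertools import combinations

# One transaction per observed packet/flow, grouped by node (toy data).
observations = {
    "node1": [{"syn_flood", "bad_checksum"}, {"syn_flood", "bad_checksum"}],
    "node2": [{"normal"}, {"normal"}, {"syn_flood"}],
}
MIN_SUPPORT = 0.6
SUSPICIOUS = {"syn_flood", "bad_checksum"}

def frequent_itemsets(txs, min_support):
    """Minimal Apriori: grow itemsets level by level, pruning by support."""
    candidates = {frozenset([i]) for t in txs for i in t}
    frequent, k = set(), 1
    while candidates:
        level = {c for c in candidates
                 if sum(c <= t for t in txs) / len(txs) >= min_support}
        frequent |= level
        k += 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
    return frequent

for node, txs in observations.items():
    unreliable = any(p & SUSPICIOUS for p in frequent_itemsets(txs, MIN_SUPPORT))
    print(node, "unreliable" if unreliable else "reliable")
```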


MapReduce is a prevalent model for data-intensive applications. It hides the difficulties of parallel programming and provides an abstract environment. Hadoop is a benchmark for big data storage, providing load-balanced, scalable, and fault-tolerant operation. Hadoop's performance depends heavily on its scheduler. Various scheduling algorithms [6-10] have been suggested for various types of environments, applications, and workloads. In this work, a new task selection method is developed to assist the scheduler when a node has several local tasks. Experimental results show an improvement of 20% with respect to locality and fairness.
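
The abstract does not spell out the selection rule, so the following is only an illustrative locality-aware heuristic of the kind such work considers: when a node has several node-local tasks, prefer the task that the fewest other nodes could also run locally, leaving widely replicated blocks for nodes with fewer options.

```python
# Hedged sketch of a locality-aware task selection rule; illustrative
# only, not the paper's actual method.
def pick_task(node: str, pending: dict[str, set[str]]) -> str | None:
    """pending maps task id -> set of nodes holding its input block."""
    local = [t for t, hosts in pending.items() if node in hosts]
    if not local:
        return None                       # fall back to rack/remote scheduling
    # Prefer the task runnable locally on the fewest OTHER nodes,
    # preserving locality options for the rest of the cluster.
    return min(local, key=lambda t: len(pending[t] - {node}))

pending = {"t1": {"n1", "n2", "n3"}, "t2": {"n1"}, "t3": {"n2"}}
assert pick_task("n1", pending) == "t2"   # t2 is local only to n1
```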

