Performance Evaluation of Map Reduce vs. Spark framework on Amazon Machine Image for TeraSort Algorithm

TeraSort is one of Hadoop’s most widely used benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. We compare the TeraSort algorithm on different distributed platforms under different resource configurations. We measure efficiency in terms of compute time, data read, data written, and speedup. We conducted experiments using Hadoop MapReduce and Spark (Java). We empirically evaluate the performance of the TeraSort algorithm on Amazon EC2 machine images and demonstrate that it achieves a 3.95× to 2.4× speedup, compared with TeraSort, for typical settings of interest.
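As a rough illustration of the sorting idea behind TeraSort, the sketch below samples the keys, picks split points, routes each key to a range partition (the "map" side), and sorts each partition independently (the "reduce" side), so that concatenating the partitions yields a globally sorted result. It is a single-process Python simplification, not the Hadoop or Spark code evaluated in the paper; the function names, sample size, and partition count are assumptions made for this example.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size=100, seed=0):
    """Pick num_partitions - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]


def range_partition_sort(keys, num_partitions=4):
    """Sort keys by range partitioning, then sorting each partition locally."""
    if not keys:
        return []
    splits = choose_split_points(keys, num_partitions)

    # "Map" side: send each key to the partition that covers its range.
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[bisect.bisect_right(splits, key)].append(key)

    # "Reduce" side: each partition is sorted on its own. Because the
    # partitions cover disjoint, ordered key ranges, concatenating them
    # gives a globally sorted list.
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result


# Hypothetical input, for illustration only.
data = [random.Random(1).randint(0, 10_000) for _ in range(1)]
data = random.Random(1).sample(range(10_000), 500)
sorted_data = range_partition_sort(data)
```

In a real cluster each partition would be handled by a separate reducer or executor, which is where the choice of split points starts to matter for load balance.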

Performance evaluation of Amazon EC2 for NASA HPC applications

Proceedings of the 3rd workshop on Scientific Cloud Computing Date - ScienceCloud '12 ◽

10.1145/2287036.2287045 ◽

2012 ◽

Cited By ~ 42

Author(s):

Piyush Mehrotra ◽

Jahed Djomehri ◽

Steve Heistand ◽

Robert Hood ◽

Haoqiang Jin ◽

...

Keyword(s):

Performance Evaluation ◽

Amazon Ec2

Comparison of the effectiveness of the Map-Reduce approach and the actor model in solving problems with high connectivity of input data on the example of the optimization problem for a swarm of particles

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2021.01.049 ◽

2021 ◽

pp. 049-055

Author(s):

Larin V.O. ◽

Provotar O.I. ◽

Keyword(s):

Optimization Problem ◽

Map Reduce ◽

Actor Model ◽

Swarm Optimization ◽

Memory Efficiency ◽

Order Of Magnitude ◽

High Connectivity ◽

Memory Support ◽

Bounded Input ◽

Spark Framework

The paper defines the notion of distributed problems with bounded input components. The Particle Swarm Optimization problem is shown to be an example of this class. Two implementations of the problem are provided: one based on the Map-Reduce model (implemented on the Spark framework) and one based on an actor model with shared memory support (implemented on Strumok DSL). The performance of both versions is assessed. The hybrid actor model is shown to be an order of magnitude more efficient in time and memory than the Map-Reduce implementation. An additional optimization for the hybrid actor model solution is proposed. The prospects of using the hybrid actor model for other similar problems are discussed.
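To make the Map-Reduce formulation of Particle Swarm Optimization more concrete, here is a minimal sketch in plain Python standing in for Spark: each iteration maps an update over all particles and then reduces the swarm to a single global best, which every particle needs in the next iteration. This is not the authors' implementation; the objective function (`sphere`), the coefficients `w`, `c1`, `c2`, and all function names are assumptions chosen for illustration.

```python
import random
from functools import reduce


def sphere(x):
    """Toy objective to minimize: the sum of squares."""
    return sum(v * v for v in x)


def update_particle(particle, global_best, rng, w=0.7, c1=1.5, c2=1.5):
    """Map step: move one particle and refresh its personal best."""
    pos, vel, best_pos, best_val = particle
    new_vel = [
        w * v + c1 * rng.random() * (bp - x) + c2 * rng.random() * (gb - x)
        for x, v, bp, gb in zip(pos, vel, best_pos, global_best)
    ]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    val = sphere(new_pos)
    if val < best_val:
        best_pos, best_val = new_pos, val
    return (new_pos, new_vel, best_pos, best_val)


def better(a, b):
    """Reduce step: keep the particle with the lower personal best value."""
    return a if a[3] <= b[3] else b


def pso(dim=2, n_particles=20, iterations=50, seed=0):
    """Run the swarm; return (initial best value, final best value)."""
    rng = random.Random(seed)
    swarm = []
    for _ in range(n_particles):
        pos = [rng.uniform(-5, 5) for _ in range(dim)]
        swarm.append((pos, [0.0] * dim, pos, sphere(pos)))

    best = reduce(better, swarm)
    initial_best = best[3]
    for _ in range(iterations):
        # Every particle reads the same global best position, so this value
        # has to be redistributed to all workers on every iteration.
        swarm = [update_particle(p, best[2], rng) for p in swarm]
        best = reduce(better, swarm)
    return initial_best, best[3]


initial_value, final_value = pso()
```

The per-iteration reduce followed by a redistribution of the global best is the kind of tight coupling between particles that a shared-memory design can avoid.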

A Web Based Data Processing Concept for Building Diagnostics and Performance Evaluation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.887.641 ◽

2019 ◽

Vol 887 ◽

pp. 641-649

Author(s):

Matthias Schuss ◽

Farhang Tahmasebi ◽

Ardeshir Mahdavi

Keyword(s):

Performance Evaluation ◽

Web Services ◽

Data Processing ◽

Web Service ◽

Data Access ◽

Data Logging ◽

Time Data ◽

Web Based ◽

Related Service ◽

The Impact

Buildings are responsible for a major share of annual energy consumption. A detailed recording and evaluation of building data could provide a deeper understanding of building operation schemes and the corresponding performance. This could help building owners and operators to evaluate and better understand the actual situation. Based on this (real-time) data, an optimized operation scheme can be designed and implemented for future time steps. Additionally, a more detailed understanding of the impact of previous building systems interactions will be possible. The building automation industry and the related service provider sector currently provide proprietary solutions for data logging, visualization and energy optimization. Such solutions are typically integrated into the vendors' own proprietary building management software. As an alternative, we suggest a concept, inspired by the Internet of Things (IoT) and web services, for implementing a generic web service for building diagnostics. Our suggestion encompasses a holistic performance evaluation that considers both energy consumption and the delivered building service. In this contribution, a general design of a web service based solution is presented and the future possibilities for data access from various sources are discussed. Furthermore, details of software components for data preprocessing that have been developed and implemented as a demonstration are presented. Data processing examples for different types of data are included and highlight the potential of such web-based approaches. Moreover, possibilities for improved building control through the use of web services for operation schedule generation or model predictive control are illustrated and critically debated.

Performance evaluation of USP's cloud platform and its comparison with Amazon EC2

2015 2nd Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE) ◽

10.1109/apwccse.2015.7476128 ◽

2015 ◽

Cited By ~ 2

Author(s):

Ravishel Naicker ◽

Nikhil Chand ◽

Asneel Raj ◽

Ashma Khanam ◽

Sunil Lal

Keyword(s):

Performance Evaluation ◽

Cloud Platform ◽

Amazon Ec2

Performance Evaluation of Reliable Real-Time Data Distribution for UAV-aided Tactical Networks

Communications in Computer and Information Science - Control and Automation, and Energy System Engineering ◽

10.1007/978-3-642-26010-0_21 ◽

2011 ◽

pp. 176-182

Author(s):

Jongryool Kim ◽

KeyongTae Kim ◽

Soo-Hyung Lee ◽

JongWon Kim

Keyword(s):

Performance Evaluation ◽

Real Time ◽

Data Distribution ◽

Time Data ◽

Real Time Data ◽

Tactical Networks

Performance evaluation of transport layer protocols for transmitting real-time data over DiffServ networks

10.1117/12.434363 ◽

2001 ◽

Author(s):

Yoko Noda ◽

Tatsuhiko Sakai ◽

Hiroshi Shigeno ◽

Yutaka Matsushita

Keyword(s):

Performance Evaluation ◽

Real Time ◽

Transport Layer ◽

Time Data ◽

Real Time Data ◽

Transport Layer Protocols

Build and Evaluate a Free Virtual Cluster on Amazon Elastic Compute Cloud for Scientific Computing

International Journal of Online Engineering (iJOE) ◽

10.3991/ijoe.v13i08.7373 ◽

2017 ◽

Vol 13 (08) ◽

pp. 121 ◽

Cited By ~ 1

Author(s):

Jie Xiong ◽

Shen-Han Shi ◽

Song Zhang

Keyword(s):

Cloud Computing ◽

Performance Evaluation ◽

Scientific Computing ◽

Prototype System ◽

Virtual Cluster ◽

Huge Amount ◽

Computing Systems ◽

Comprehensive Performance ◽

Amazon Ec2

Scientific computing requires a huge amount of computing resources, but not all scientific researchers have access to sufficient high-end computing systems. Currently, Amazon provides a free tier account for cloud computing which could be used to build a virtual cluster. To investigate whether it is suitable for scientific computing, we first describe how to build a free virtual cluster using StarCluster on Amazon Elastic Compute Cloud (EC2). Then, we perform a comprehensive performance evaluation of that virtual cluster. The results show that a free virtual cluster is easily built on Amazon EC2 and that it is suitable for basic scientific computing. It is especially valuable for scientific researchers who do not have any HPC system or cluster: they can develop and test their prototype scientific computing systems without paying anything, and move them to a higher-performance virtual cluster when necessary by choosing more powerful instances on Amazon EC2.

Real World Performance Evaluation of FF-MAC Protocol for Real-Time Data Forwarding in WSN

Wireless Personal Communications ◽

10.1007/s11277-015-2329-y ◽

2015 ◽

Vol 82 (4) ◽

pp. 2017-2031

Author(s):

Khalid El Gholami ◽

Kun Mean Hou ◽

Najib Elkamoun ◽

Hong Ling Shi ◽

Xing Liu

Keyword(s):

Performance Evaluation ◽

Real Time ◽

Real World ◽

Mac Protocol ◽

Data Forwarding ◽

Time Data ◽

Real Time Data

Big Data Classification Using Distributed Optimized Hoeffding Trees

Journal of Machine Intelligence ◽

10.21174/jomi.v2i1.101 ◽

2017 ◽

Vol 2 (1) ◽

pp. 14-20

Author(s):

Sharmishta Suhas Desai ◽

S. T. Patil

Keyword(s):

Data Stream ◽

Three Dimensions ◽

Map Reduce ◽

Time Data ◽

Data Set ◽

Local Optima ◽

Hoeffding Tree ◽

Leaf Level ◽

Global Optima ◽

Big Data Classification

Heavy use of social media, online shopping and online transactions generates voluminous data. Visual representation and analysis of this large amount of data is one of the major research topics today. As this data changes over time, we need an approach that takes care of the velocity of data as well as its volume and variety. In this paper, the authors propose a distributed method which handles these three dimensions of data and gives good results compared to other methods. Traditional algorithms are based on global optima and are basically memory-resident programs. Our approach, which is based on an optimized Hoeffding bound, uses a local optima method and a distributed map-reduce architecture. It does not require copying the whole data set into memory. As the model is frequently updated on multiple nodes concurrently, it is more suitable for time-varying data. The Hoeffding bound is basically suitable for real-time data streams. We propose a very efficient distributed map-reduce architecture to implement the Hoeffding tree. We use deep learning at the leaf level to optimize the Hoeffding tree. Drift detection is handled by the architecture itself; no separate provision is required for it. The experimental results in this paper show that our method takes less learning time and achieves higher accuracy. A distributed algorithm for Hoeffding tree implementation is also proposed.
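For readers unfamiliar with the Hoeffding bound mentioned above, the sketch below computes the standard form used in Hoeffding trees, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split metric, delta is the allowed error probability, and n is the number of observations seen at a leaf. It shows the textbook bound and split test only; the "optimized" variant described in the abstract is not specified there and is not reproduced here. The function and parameter names are assumptions for this example.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean
    lies within epsilon of the mean observed over n samples."""
    if value_range <= 0 or n <= 0 or not 0 < delta < 1:
        raise ValueError("need value_range > 0, n > 0 and 0 < delta < 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Split a leaf once the best attribute beats the runner-up by more
    than epsilon, so the choice is unlikely to change with more data."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_best_gain) > epsilon


# Hypothetical numbers: the bound tightens as more examples arrive.
eps_small_n = hoeffding_bound(1.0, 0.05, 100)
eps_large_n = hoeffding_bound(1.0, 0.05, 10_000)
```

Because epsilon shrinks with n, a leaf only needs to keep sufficient statistics rather than the raw examples, which is why this style of tree suits streaming data.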

Comparative analysis of various distributed file systems & performance evaluation using map reduce implementation

2016 International Conference on Recent Advances and Innovations in Engineering (ICRAIE) ◽

10.1109/icraie.2016.7939473 ◽

2016 ◽

Cited By ~ 1

Author(s):

Madhavi Vaidya ◽

Shrinivas Deshpande

Keyword(s):

Performance Evaluation ◽

Comparative Analysis ◽

File Systems ◽

Map Reduce ◽

Distributed File Systems