Performance Evaluation of Map Reduce vs. Spark framework on Amazon Machine Image for TeraSort Algorithm

Author(s):  
Gangadhara Rao Kommu

TeraSort is one of Hadoop’s most widely used benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. We compare the TeraSort algorithm across different distributed platforms with different resource configurations. The efficiency measures considered are Compute Time, Data Read, Data Write, and Speedup. We conducted experiments using Hadoop MapReduce and Spark (Java). We empirically evaluate the performance of the TeraSort algorithm on Amazon EC2 Machine Images and demonstrate that it achieves a 3.95× to 2.4× speedup compared with TeraSort for typical settings of interest.
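The sorting idea behind TeraSort can be illustrated with a small single-process sketch: sample the keys, derive ordered split points, route each key to its range, sort each range independently, and concatenate. This is only an illustration under simplifying assumptions, not the Hadoop or Spark code evaluated in the paper; the function names `sample_split_points` and `range_partition_sort` and all parameter defaults are hypothetical.

```python
import bisect
import random


def sample_split_points(keys, num_partitions, sample_size=100, seed=0):
    """Pick num_partitions - 1 ordered split points from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_partition_sort(keys, num_partitions=4):
    """Route keys to ordered ranges, sort each range, then concatenate."""
    if not keys:
        return []
    splits = sample_split_points(keys, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        # bisect_right gives the index of the range this key belongs to.
        partitions[bisect.bisect_right(splits, key)].append(key)
    # In a distributed run, each partition would be sorted by its own task.
    return [key for part in partitions for key in sorted(part)]
```

Because every key in partition i is smaller than every key in partition i + 1, the concatenated output is globally sorted without a final merge step.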

Author(s):  
Piyush Mehrotra ◽  
Jahed Djomehri ◽  
Steve Heistand ◽  
Robert Hood ◽  
Haoqiang Jin ◽  
...  

2021 ◽  
pp. 049-055
Author(s):  
Larin V.O. ◽  
Provotar O.I.

The paper defines the notion of distributed problems with bounded input components. The Particle Swarm Optimization problem is shown to be an example of this class. Two implementations of the problem are provided: one based on the Map-Reduce model (implemented on the Spark framework) and one based on an actor model with shared memory support (implemented in the Strumok DSL). The performance of both versions is assessed. The hybrid actor model is shown to be an order of magnitude more efficient in time and memory than the Map-Reduce implementation. An additional optimization for the hybrid actor model solution is proposed, and the prospects of using the hybrid actor model for other similar problems are discussed.
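To make the map-reduce structure of Particle Swarm Optimization concrete, here is a minimal single-process sketch: the map step updates each particle from its own state and the shared global best, and the reduce step selects the new global best. It uses the standard velocity update and is not the Spark or Strumok implementation from the paper; the objective `sphere`, the coefficients `w`, `c1`, `c2`, and all function names are assumptions chosen for illustration.

```python
import random
from functools import reduce


def sphere(x):
    """Hypothetical objective to minimise: sum of squares."""
    return sum(v * v for v in x)


def update_particle(particle, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """Map step: update one particle using only its own state and gbest."""
    pos, vel, pbest = particle
    new_vel = [
        w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        for x, v, pb, gb in zip(pos, vel, pbest, gbest)
    ]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    new_pbest = new_pos if sphere(new_pos) < sphere(pbest) else pbest
    return new_pos, new_vel, new_pbest


def pso(dim=2, n_particles=20, iterations=50, seed=0):
    """Return (best position found, objective value of the initial best)."""
    rng = random.Random(seed)
    swarm = []
    for _ in range(n_particles):
        pos = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        swarm.append((pos, [0.0] * dim, pos))
    gbest = min((p[2] for p in swarm), key=sphere)
    initial = sphere(gbest)
    for _ in range(iterations):
        # Map: each particle is updated independently of the others.
        swarm = [update_particle(p, gbest, rng) for p in swarm]
        # Reduce: fold the personal bests into a single global best.
        gbest = reduce(
            lambda a, b: a if sphere(a) <= sphere(b) else b,
            (p[2] for p in swarm),
            gbest,
        )
    return gbest, initial
```

Each particle's input is bounded (its own state plus one global best vector), which is the property the abstract refers to as bounded input components.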


2019 ◽  
Vol 887 ◽  
pp. 641-649
Author(s):  
Matthias Schuss ◽  
Farhang Tahmasebi ◽  
Ardeshir Mahdavi

Buildings are responsible for a major share of annual energy consumption. Detailed recording and evaluation of building data could provide a deeper understanding of building operation schemes and the corresponding performance. This could help building owners and operators to evaluate and better understand the actual situation. Based on this (real-time) data, an optimized operation scheme can be designed and implemented for future time steps. Additionally, a more detailed understanding of the impact of previous building system interactions becomes possible. The building automation industry and the related service provider sector currently provide proprietary solutions for data logging, visualization, and energy optimization. Such solutions are regularly integrated into the vendor-specific software of the proprietary building management solutions in use. As an alternative, we suggest a concept, inspired by the Internet of Things (IoT) and web services, for implementing a generic web service for building diagnostics. Our suggestion encompasses a holistic performance evaluation that considers both the energy consumption and the delivered building service. In this contribution, a general design of a web-service-based solution is presented and the future possibilities for data access from various sources are discussed. Furthermore, details of software components for data preprocessing that have been developed and implemented as demonstrations are presented. Data processing examples for different types of data are included and highlight the potential of such web-based approaches. Moreover, possibilities for improved building control through the use of web services for operation schedule generation or model predictive control are illustrated and critically debated.


2017 ◽  
Vol 13 (08) ◽  
pp. 121 ◽  
Author(s):  
Jie Xiong ◽  
Shen-Han Shi ◽  
Song Zhang

Scientific computing requires a huge amount of computing resources, but not all scientific researchers have access to sufficient high-end computing systems. Currently, Amazon provides a free tier account for cloud computing that could be used to build a virtual cluster. To investigate whether it is suitable for scientific computing, we first describe how to build a free virtual cluster using StarCluster on Amazon Elastic Compute Cloud (EC2). Then, we perform a comprehensive performance evaluation of this virtual cluster. The results show that a free virtual cluster is easily built on Amazon EC2 and that it is suitable for basic scientific computing. It is especially valuable for scientific researchers who do not have an HPC system or cluster: they can develop and test a prototype scientific computing system at no cost and, when necessary, move it to a higher-performance virtual cluster by choosing a more powerful instance on Amazon EC2.


2015 ◽  
Vol 82 (4) ◽  
pp. 2017-2031
Author(s):  
Khalid El Gholami ◽  
Kun Mean Hou ◽  
Najib Elkamoun ◽  
Hong Ling Shi ◽  
Xing Liu

2017 ◽  
Vol 2 (1) ◽  
pp. 14-20
Author(s):  
Sharmishta Suhas Desai ◽  
S. T. Patil

Heavy use of social media, online shopping, and online transactions generates voluminous data. Visual representation and analysis of this large amount of data is one of today's major research topics. Because the data changes over time, an approach is needed that handles the velocity of the data as well as its volume and variety. In this paper, the authors propose a distributed method that handles all three dimensions of data and gives good results compared with other methods. Traditional algorithms are based on global optima and are essentially memory-resident programs. Our approach, which is based on an optimized Hoeffding bound, uses a local optima method and a distributed map-reduce architecture. It does not require copying the whole data set into memory. Because the model is updated frequently on multiple nodes concurrently, it is better suited to time-varying data. The Hoeffding bound is well suited to real-time data streams. We propose an efficient distributed map-reduce architecture for implementing the Hoeffding tree and use deep learning at the leaf level to optimize the tree. Drift detection is handled by the architecture itself; no separate provision is required. Experimental results show that our method takes less learning time and achieves higher accuracy. A distributed algorithm for Hoeffding tree implementation is also proposed.
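For readers unfamiliar with the Hoeffding bound, the sketch below shows the standard form used in Hoeffding trees: after n observations of a quantity with range R, the observed mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability 1 − δ. This is the textbook bound, not the optimized variant the abstract refers to, and the function names and example values are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound epsilon for n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf once the gain difference exceeds the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because ε shrinks as n grows, a leaf that cannot yet separate its two best attributes simply waits for more stream examples, which is why the full data set never needs to be held in memory.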

