Performance Evaluation of a Big Data Application on Apache Spark

10.32920/ryerson.14651544 ◽

2021 ◽

Author(s):

Jeanne Alcantara

Keyword(s):

Big Data ◽

Performance Evaluation ◽

Execution Time ◽

Apache Spark ◽

Massive Data ◽

Application Performance ◽

Data Application ◽

Size Number ◽

Big Data Application ◽

The Impact

Apache Spark enables a big data application (one that takes massive data as input and may produce massive data during its execution) to run in parallel on multiple nodes. Performance is therefore a vital issue for such an application. This project analyzes a WordCount application on Apache Spark and assesses how executor configuration affects execution time and average node utilization. To facilitate this assessment, the number of executor cores and the size of executor memory are varied across different sizes of input data and different numbers of nodes in the cluster that the application runs on. It is concluded that different pairs (data size, number of nodes in the cluster) require different numbers of executor cores and different sizes of executor memory to obtain optimum results for execution time and average node utilization.
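The parameter sweep described in this abstract can be sketched as a small script that enumerates one `spark-submit` invocation per configuration. This is only an illustration: the core counts, memory sizes, input file names, and the `wordcount.py` application name are assumptions, not the settings used in the study, and the number of cluster nodes would be varied outside the script by resizing the cluster.

```python
from itertools import product

# All values below are illustrative assumptions, not the study's settings.
EXECUTOR_CORES = [1, 2, 4]
EXECUTOR_MEMORY = ["1g", "2g", "4g"]
DATA_FILES = ["input_1gb.txt", "input_10gb.txt"]  # hypothetical input paths


def spark_submit_commands(app="wordcount.py"):
    """Build one spark-submit argument list per (cores, memory, input) combination."""
    commands = []
    for cores, memory, data in product(EXECUTOR_CORES, EXECUTOR_MEMORY, DATA_FILES):
        commands.append([
            "spark-submit",
            "--executor-cores", str(cores),
            "--executor-memory", memory,
            app,   # hypothetical WordCount application file
            data,  # input path passed to the application
        ])
    return commands


if __name__ == "__main__":
    # Print the commands; a real experiment would run and time each one.
    for cmd in spark_submit_commands():
        print(" ".join(cmd))
```

Each generated command could be run and timed in turn, with the measured execution time recorded against its (cores, memory, data size) combination.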


ECL-watch: A big data application performance tuning tool in the HPCC systems platform

2017 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2017.8258263 ◽

2017 ◽

Cited By ~ 2

Author(s):

Lili Xu ◽

Edin Muharemagic ◽

Flavio Villanustre ◽

Amy Apon

Keyword(s):

Big Data ◽

Performance Tuning ◽

Application Performance ◽

Data Application ◽

Big Data Application ◽

Hpcc Systems


Performance Analysis of NoSQL and Relational Databases with CouchDB and MySQL for Application’s Data Storage

Applied Sciences ◽

10.3390/app10238524 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8524

Author(s):

Cornelia A. Győrödi ◽

Diana V. Dumşe-Burescu ◽

Doina R. Zmaranda ◽

Robert Ş. Győrödi ◽

Gianina A. Gabor ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

Relational Databases ◽

Database Systems ◽

Application Performance ◽

Big Data Applications ◽

Database Technology ◽

Important Challenge ◽

Big Data Application ◽

The Impact

With several types of database systems (relational and non-relational) now available, choosing the type of database system for storing large amounts of data in today’s big data applications has become an important challenge. In this paper, we aimed to provide a comparative evaluation of two popular open-source database management systems (DBMSs): MySQL as a relational DBMS and, more recently, as a non-relational DBMS, and CouchDB as a non-relational DBMS. The comparison was based on a performance evaluation of CRUD (CREATE, READ, UPDATE, DELETE) operations for different amounts of data, to show how these two databases could be modeled and used in an application and to highlight the differences in response time and complexity. The main objective of the paper was to analyze the impact that each DBMS has on application performance when carrying out CRUD requests. To perform the analysis and to ensure the consistency of the tests, two similar applications were developed in Java, one using MySQL and the other using CouchDB; these applications were then used to evaluate the response times of each database technology for the same CRUD operations. Finally, the results of the analysis are discussed and several conclusions are drawn. Advantages and drawbacks of each DBMS are outlined to support the choice of a specific type of DBMS for a big data application.
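The kind of measurement this abstract describes, timing each CRUD operation over a given number of records, can be sketched as follows. This is not the authors' Java benchmark: it uses Python's standard-library `sqlite3` with an in-memory database purely as a stand-in, because MySQL and CouchDB both need a running server. The table name `items`, its columns, and the helper `time_crud` are hypothetical.

```python
import sqlite3
import time


def time_crud(n_rows):
    """Return elapsed seconds for each CRUD operation over n_rows records."""
    conn = sqlite3.connect(":memory:")  # stand-in store, not MySQL or CouchDB
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    timings = {}

    def timed(label, action):
        # Measure the operation together with its commit.
        start = time.perf_counter()
        action()
        conn.commit()
        timings[label] = time.perf_counter() - start

    rows = [(i, f"item-{i}") for i in range(n_rows)]
    timed("create", lambda: conn.executemany("INSERT INTO items VALUES (?, ?)", rows))
    timed("read", lambda: conn.execute("SELECT * FROM items").fetchall())
    timed("update", lambda: conn.execute("UPDATE items SET name = name || '-v2'"))
    timed("delete", lambda: conn.execute("DELETE FROM items"))
    conn.close()
    return timings


if __name__ == "__main__":
    # Repeat the measurement for growing data volumes (sizes are arbitrary).
    for n in (1_000, 10_000):
        print(n, time_crud(n))
```

Running the same harness against two different stores with identical operations and record counts is what makes the resulting response times comparable.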


Improving Big Data Application Performance in Edge-Cloud Systems

2019 IEEE 12th International Conference on Cloud Computing (CLOUD) ◽

10.1109/cloud.2019.00039 ◽

2019 ◽

Cited By ~ 2

Author(s):

David Haja ◽

Balazs Vass ◽

Laszlo Toka

Keyword(s):

Big Data ◽

Application Performance ◽

Cloud Systems ◽

Data Application ◽

Big Data Application


Tutorial on Challenges for Big Data Application Performance Tuning and Prediction

Companion Publication for ACM/SPEC on International Conference on Performance Engineering - ICPE '16 Companion ◽

10.1145/2859889.2883587 ◽

2016 ◽

Author(s):

Rekha Singhal

Keyword(s):

Big Data ◽

Performance Tuning ◽

Application Performance ◽

Data Application ◽

Big Data Application


PNNCP- Parallel Nearest Neighbor Classification and Prediction for Big Data Application Based on Apache Spark and Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1382.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2358-2365

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Streams ◽

Nearest Neighbor ◽

Apache Spark ◽

Course Of Action ◽

Huge Data ◽

Data Application ◽

Big Data Application ◽

Neighbor Classification

Big data applications such as social networking, healthcare, agriculture, banking, the stock market, education, and Facebook now generate data at very high speed. The volume and velocity of big data play an essential part in the performance of big data applications, and that performance can be affected by several parameters. Search speed, efficiency, and accuracy are among the dominant parameters that affect the overall performance of any big data application. Because of the direct and indirect involvement of the 7 Vs of big data, every big data application expects high performance, and achieving it is the most prominent challenge in the present changing environment. In this paper we propose a parallel approach to speed up the search for the nearest neighbor node. The k-NN classifier is one of the most basic and widely used methods for classification. We apply parallelism to k-NN for finding the next nearest neighbor. The neighbor found is then used to fill in missing values in, and to process, incomplete data streams. The classifier also updates and handles outdated data streams. We use Apache Spark and distributed computation for faster evaluation.
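The general idea of parallelizing a nearest-neighbor search can be sketched as a partition-and-merge scheme: each partition reports its own k closest points, and the partial results are merged into a global top k. This is a generic illustration, not the PNNCP method itself. It uses a thread pool from Python's standard library as a stand-in for Spark partitions, so it shows the structure of the computation rather than any real speedup, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_top_k(partition, query, k):
    """Return the k nearest (distance, label) pairs within one partition."""
    scored = ((squared_distance(point, query), label) for point, label in partition)
    return heapq.nsmallest(k, scored)


def parallel_knn(partitions, query, k):
    """Classify query by majority vote over the k nearest points across all partitions."""
    with ThreadPoolExecutor() as pool:
        # Each partition is searched independently (the "map" step).
        candidates = pool.map(lambda part: local_top_k(part, query, k), partitions)
        # Merge the per-partition candidates into a global top k (the "reduce" step).
        merged = heapq.nsmallest(k, (pair for local in candidates for pair in local))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    partitions = [
        [((0, 0), "a"), ((1, 0), "a")],
        [((5, 5), "b"), ((6, 5), "b")],
    ]
    print(parallel_knn(partitions, (0.5, 0), k=3))
```

Keeping only k candidates per partition bounds the data that has to be merged, which is what makes this scheme attractive on a distributed engine.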
