A comparative between Hadoop MapReduce and Apache Spark on HDFS

2020 ◽

pp. 27-36

Author(s):

O. Dmytriieva ◽

D. Nikulin

Keyword(s):

Distributed Processing ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Transactional Data

This work addresses distributed transaction processing in the analysis of large volumes of data for the purpose of finding association rules. Based on AIS and Apriori, the well-known data mining algorithms for finding frequent itemsets, possible parallelization options were identified that avoid iterative scanning of the database and high memory consumption. The possibility of porting the computations to different platforms that support parallel data processing was investigated. The computing platforms chosen were MapReduce, a powerful framework for processing large, distributed data sets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data. A comparative analysis of the performance of the considered methods was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of the association rule mining algorithms were proposed. The main tasks carried out in the work are: studying modern tools for distributed processing of structured and unstructured data; deploying a test cluster in a cloud service; developing scripts to automate cluster deployment; modifying distributed algorithms to adapt them to the required distributed computing frameworks; obtaining data processing performance figures in sequential and distributed modes using Hadoop MapReduce and Apache Spark; carrying out a comparative analysis of the test performance measurements; obtaining and justifying the dependence between the amount of data processed and the time spent on processing; optimizing distributed association rule mining algorithms for large volumes of transactional data; and obtaining performance figures for distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark

Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification

10.1145/3481646.3481649 ◽

2021 ◽

Author(s):

Taha Tekdogan ◽

Ali Cakmak

Keyword(s):

Big Data ◽

Data Classification ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Big Data Classification

Big Data Processing on Cloud Computing Using Hadoop Mapreduce and Apache Spark

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises ◽

10.4018/978-1-5225-3038-1.ch009 ◽

2018 ◽

pp. 224-250

Author(s):

Yassir Samadi ◽

Mostapha Zbakh ◽

Amine Haouari

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Apache Spark ◽

Big Data Processing ◽

Data Intensive ◽

Hadoop Mapreduce ◽

Huge Data ◽

Increasing Demand ◽

Exponential Rates

The size of the data used by enterprises has been growing at exponential rates over the last few years; handling such huge data from various sources is a challenge for businesses. In addition, Big Data has become one of the major areas of research for cloud service providers because of the large amount of data produced every day and the inability of traditional algorithms and technologies to handle it. To resolve these problems and to meet the increasing demand for high-speed, data-intensive computing, researchers and developers have developed several solutions. Among them are Cloud Computing tools such as Hadoop MapReduce and Apache Spark, which work on the principles of parallel computing. This chapter focuses on how big data processing challenges can be handled using Cloud Computing frameworks, and on the importance of Cloud Computing for businesses.

On the usability of Hadoop MapReduce, Apache Spark & Apache Flink for data science

2017 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2017.8257938 ◽

2017 ◽

Cited By ~ 4

Author(s):

Bilal Akil ◽

Ying Zhou ◽

Uwe Rohm

Keyword(s):

Data Science ◽

Apache Spark ◽

Hadoop Mapreduce

Comparative Analysis of Apache Spark and Hadoop MapReduce Using Various Parameters and Execution Time

Intelligent Computing and Communication - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-1084-7_70 ◽

2020 ◽

pp. 719-725

Author(s):

Bhagavathula Meena ◽

I. S. L. Sarwani ◽

M. Archana ◽

P. Supriya

Keyword(s):

Comparative Analysis ◽

Execution Time ◽

Apache Spark ◽

Hadoop Mapreduce

Big Data Processing on Cloud Computing Using Hadoop Mapreduce and Apache Spark

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽

10.4018/978-1-7998-5339-8.ch039 ◽

2021 ◽

pp. 824-845

Author(s):

Yassir Samadi ◽

Mostapha Zbakh ◽

Amine Haouari

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Apache Spark ◽

Big Data Processing ◽

Data Intensive ◽

Hadoop Mapreduce ◽

Huge Data ◽

Increasing Demand ◽

Exponential Rates

The size of the data used by enterprises has been growing at exponential rates over the last few years; handling such huge data from various sources is a challenge for businesses. In addition, Big Data has become one of the major areas of research for cloud service providers because of the large amount of data produced every day and the inability of traditional algorithms and technologies to handle it. To resolve these problems and to meet the increasing demand for high-speed, data-intensive computing, researchers and developers have developed several solutions. Among them are Cloud Computing tools such as Hadoop MapReduce and Apache Spark, which work on the principles of parallel computing. This chapter focuses on how big data processing challenges can be handled using Cloud Computing frameworks, and on the importance of Cloud Computing for businesses.
