Parallel Reachability Testing Based on Hadoop MapReduce

DISTRIBUTED PROCESSING OF LARGE VOLUMES OF TRANSACTIONAL DATA

Naukovyi visnyk Donetskoho natsionalnoho tekhnichnoho universytetu ◽

10.31474/2415-7902-2020-1(4)-2(5)-27-36 ◽

2020 ◽

pp. 27-36

Author(s):

O. Dmytriieva ◽

D. Nikulin

Keyword(s):

Distributed Processing ◽

Apache Spark ◽

Hadoop MapReduce ◽

Transactional Data

This work addresses distributed transaction processing in the analysis of large volumes of data for the purpose of finding association rules. Starting from the well-known data mining algorithms for frequent itemset discovery, AIS and Apriori, possible parallelization options were identified that avoid iterative scanning of the database and high memory consumption. The feasibility of moving the computations to different platforms that support parallel data processing was investigated. The computing platforms chosen were MapReduce, a powerful framework for processing large distributed datasets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data. A comparative performance analysis of the methods considered was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of association rule mining algorithms were proposed. The main tasks addressed in the work are: studying modern tools for distributed processing of structured and unstructured data; deploying a test cluster in a cloud service; developing scripts to automate cluster deployment; modifying distributed algorithms to adapt them to the required distributed computing frameworks; measuring data processing performance in sequential and distributed modes using Hadoop MapReduce and Apache Spark; comparing the results of the test performance measurements; obtaining and substantiating the dependence between the amount of data processed and the time spent on processing; optimizing distributed association rule mining algorithms for large volumes of transactional data; and measuring the performance of distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark
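To illustrate how one Apriori-style counting pass can be expressed as map and reduce steps, here is a minimal single-process sketch. It is not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the sample transactions, and the support threshold are all hypothetical, and a real Hadoop or Spark job would distribute the two phases across cluster nodes.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so equal itemsets match
    return [(subset, 1) for subset in combinations(items, k)]


def reduce_phase(pairs):
    """Sum the counts emitted for each itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Keep the k-itemsets that occur in at least min_support transactions."""
    emitted = [pair for t in transactions for pair in map_phase(t, k)]
    counts = reduce_phase(emitted)
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical transactional data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, k=2, min_support=2)
```

Because each transaction is mapped independently and the reduce step only sums counts per key, the pass needs a single scan of the data, which is the property that makes this kind of counting easy to distribute.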

An Efficient Hybrid Approach for Brain Tumor Detection in MR Images using Hadoop-MapReduce

2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics) ◽

10.1109/ithings-greencom-cpscom-smartdata-cybermatics50389.2020.00144 ◽

2020 ◽

Author(s):

Prabhjot Kaur Chahal ◽

Shreelekha Pandey

Keyword(s):

Brain Tumor ◽

Hybrid Approach ◽

Tumor Detection ◽

MR Images ◽

Hadoop MapReduce

Budget Constraint Scheduler for Big Data Using Hadoop MapReduce

SN Computer Science ◽

10.1007/s42979-021-00638-0 ◽

2021 ◽

Vol 2 (4) ◽

Author(s):

D. C. Vinutha ◽

G. T. Raju

Keyword(s):

Big Data ◽

Budget Constraint ◽

Hadoop MapReduce

Computational storage: an efficient and scalable platform for big data and HPC applications

Journal Of Big Data ◽

10.1186/s40537-019-0265-5 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Mahdi Torabzadehkashi ◽

Siavash Rezaei ◽

Ali HeydariGorji ◽

Hosein Bobarshad ◽

Vladimir Alves ◽

...

Keyword(s):

Big Data ◽

High Performance ◽

Distributed Processing ◽

Data Access ◽

Distributed Applications ◽

Process Data ◽

Storage Devices ◽

Hadoop MapReduce ◽

Big Data Applications ◽

Application Processor

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost is directly related to the distance between the processing engines and the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its limit by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved by up to 5.4× and 8.9×, respectively.

Estimating runtime of a job in Hadoop MapReduce

Journal Of Big Data ◽

10.1186/s40537-020-00319-4 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Narges Peyravi ◽

Ali Moeini

Keyword(s):

Hadoop MapReduce

Best Trade-Off Point Method for Efficient Resource Provisioning in Spark

Algorithms ◽

10.3390/a11120190 ◽

2018 ◽

Vol 11 (12) ◽

pp. 190

Author(s):

Peter Nghiem

Keyword(s):

Resource Allocation ◽

Cluster Computing ◽

High Energy ◽

Optimal Number ◽

Resource Provisioning ◽

Trade Off ◽

Energy Efficient Computing ◽

Hadoop MapReduce ◽

Efficient Resource ◽

Mathematical Formulas

Considering the recent exponential growth in the amount of information processed in Big Data, the high energy consumed by data processing engines in datacenters has become a major issue, underlining the need for efficient resource allocation for more energy-efficient computing. We previously proposed the Best Trade-off Point (BToP) method, which provides a general approach and techniques based on an algorithm with mathematical formulas to find the best trade-off point on an elbow curve of performance vs. resources for efficient resource provisioning in Hadoop MapReduce. The BToP method is expected to work for any application or system which relies on a trade-off elbow curve, non-inverted or inverted, for making good decisions. In this paper, we apply the BToP method to the emerging cluster computing framework, Apache Spark, and show that its performance and energy consumption are better than Spark with its built-in dynamic resource allocation enabled. Our Spark-Bench tests confirm the effectiveness of using the BToP method with Spark to determine the optimal number of executors for any workload in production environments where job profiling for behavioral replication will lead to the most efficient resource provisioning.
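The idea of picking a trade-off point on an elbow curve can be illustrated with a generic heuristic. The sketch below is not the BToP algorithm or its formulas, which the abstract does not spell out; it substitutes a common alternative that selects the point farthest from the straight line joining the curve's endpoints. The function name `elbow_index` and the executor and runtime figures are made up for illustration.

```python
def elbow_index(xs, ys):
    """Return the index of the point farthest from the chord joining
    the first and last points of the curve (a generic elbow heuristic)."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Perpendicular distance from (x, y) to the chord.
        d = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


# Hypothetical measurements: runtime in seconds for each executor count.
executors = [1, 2, 4, 8, 16, 32]
runtime_s = [1000.0, 520.0, 280.0, 150.0, 140.0, 130.0]
best = executors[elbow_index(executors, runtime_s)]
```

With these invented numbers, adding executors beyond the selected point yields only small runtime gains, which is the kind of diminishing return an elbow-based provisioning rule is meant to detect.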

Model Checking via Reachability Testing for Timed Automata

BRICS Report Series ◽

10.7146/brics.v4i29.18955 ◽

1997 ◽

Vol 4 (29) ◽

Cited By ~ 3

Author(s):

Luca Aceto ◽

Augusto Burgueno ◽

Kim G. Larsen

Keyword(s):

Model Checking ◽

Timed Automata ◽

Reachability Problem ◽

Liveness Properties ◽

Time Logic ◽

Definition Of ◽

Reachability Testing ◽

Time Systems ◽

Timed Automaton ◽

Theoretical Side

In this paper we develop an approach to model-checking for timed automata via reachability testing. As our specification formalism, we consider a dense-time logic with clocks. This logic may be used to express safety and bounded liveness properties of real-time systems. We show how to automatically synthesize, for every logical formula phi, a so-called test automaton T_phi in such a way that checking whether a system S satisfies the property phi can be reduced to a reachability question over the system obtained by making T_phi interact with S. The testable logic we consider is of both practical and theoretical interest. On the practical side, we have used the logic, and the associated approach to model-checking via reachability testing it supports, in the specification and verification in Uppaal of a collision avoidance protocol. On the theoretical side, we show that the logic is powerful enough to permit the definition of characteristic properties, with respect to a timed version of the ready simulation preorder, for nodes of deterministic, tau-free timed automata. This allows one to compute behavioural relations via our model-checking technique, therefore effectively reducing the problem of checking the existence of a behavioural relation among states of a timed automaton to a reachability problem.

Weighted Finite Automata Based Image Compression on Hadoop MapReduce Framework

2015 IEEE International Congress on Big Data ◽

10.1109/bigdatacongress.2015.101 ◽

2015 ◽

Cited By ~ 1

Author(s):

U.S.N. Raju ◽

Irlanki Sandeep ◽

Nattam Sai Karthik ◽

Rayapudi Siva Praveen ◽

Mayank Singh Sachan

Keyword(s):

Image Compression ◽

Finite Automata ◽

MapReduce Framework ◽

Hadoop MapReduce ◽

Weighted Finite Automata

REACHABILITY TESTING: AN APPROACH TO TESTING CONCURRENT SOFTWARE

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194095000241 ◽

1995 ◽

Vol 05 (04) ◽

pp. 493-510 ◽

Cited By ~ 38

Author(s):

GWAN-HWAN HWANG ◽

KUO-CHUNG TAI ◽

TING-LU HUANG

Keyword(s):

Empirical Studies ◽

Concurrent Programs ◽

Concurrent Software ◽

Sequential Programs ◽

Concurrent Program ◽

Testing Approach ◽

Reachability Testing ◽

Deterministic Behavior

Concurrent programs are more difficult to test than sequential programs because of non-deterministic behavior. An execution of a concurrent program non-deterministically exercises a sequence of synchronization events called a synchronization sequence (or SYN-sequence). Non-deterministic testing of a concurrent program P is to execute P with a given input many times in order to exercise distinct SYN-sequences. In this paper, we present a new testing approach called reachability testing. If every execution of P with input X terminates, reachability testing of P with input X derives and executes all possible SYN-sequences of P with input X. We show how to perform reachability testing of concurrent programs using read and write operations. Also, we present results of empirical studies comparing reachability and non-deterministic testing. Our results indicate that reachability testing has advantages over non-deterministic testing.
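The notion of exercising every SYN-sequence can be made concrete with a small brute-force sketch. This is not the paper's algorithm, which derives new sequences from observed executions; it simply enumerates every interleaving of two hypothetical threads that each read a shared counter and then write back the value read plus one. The names `interleavings` and `run` and the thread definitions are assumptions made for illustration.

```python
def interleavings(threads):
    """Yield every ordering of operations that preserves each thread's own order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            # Take the next operation of thread i, then interleave the rest.
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail


def run(sequence):
    """Replay one sequence of (thread_id, op) events on a shared counter."""
    shared = 0
    local = {}
    for thread_id, op in sequence:
        if op == "read":
            local[thread_id] = shared      # read the shared variable
        else:
            shared = local[thread_id] + 1  # write back the incremented value
    return shared


# Two hypothetical threads, each performing read then write on the shared counter.
threads = [
    [("T1", "read"), ("T1", "write")],
    [("T2", "read"), ("T2", "write")],
]
sequences = list(interleavings(threads))
outcomes = sorted({run(s) for s in sequences})
```

Replaying all six sequences shows two distinct final values, 1 and 2. Repeated non-deterministic runs might never hit the interleavings that produce the lost update, whereas exhaustive exploration is guaranteed to exercise them.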

TaskTracker Aware Scheduling for Hadoop MapReduce

2013 Third International Conference on Advances in Computing and Communications ◽

10.1109/icacc.2013.103 ◽

2013 ◽

Cited By ~ 8

Author(s):

Jisha S. Manjaly ◽

Varghese S. Chooralil

Keyword(s):

Hadoop MapReduce
