Comparison of Hadoop and Spark Computing Performance for Weather Prediction (Case Study: Storm Event Database)

Repositor ◽  
2020 ◽  
Vol 2 (4) ◽  
pp. 463
Author(s):  
Rendiyono Wahyu Saputro ◽  
Aminuddin Aminuddin ◽  
Yuda Munarko

Abstract — Technological progress has led to data growing larger and faster all the time, driven by the many data sources now available: search engines, RFID, digital transaction records, video and photo archives, user-generated content, the Internet of Things, and scientific research in fields such as genomics, meteorology, astronomy, and physics. Because these data also differ markedly in their characteristics, they cannot be processed with conventional database technology. Distributed computing frameworks such as Apache Hadoop and Apache Spark were therefore developed to process data in a distributed fashion across a computer cluster. With several such frameworks available, a test is needed to compare their computational performance. Testing was carried out by processing datasets of various sizes on clusters with different numbers of nodes. In all test runs, Apache Hadoop required less time than Apache Spark; this is because Hadoop's throughput and throughput per node were higher than Spark's.
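The two quantities the conclusion rests on can be illustrated with a short sketch. The dataset size, elapsed times, and node count below are placeholders, not the paper's measurements.

```python
# Throughput and throughput-per-node, the metrics on which the Hadoop/Spark
# comparison rests. All figures here are illustrative placeholders.

def throughput(dataset_mb, elapsed_s):
    """MB of input processed per second of wall-clock time."""
    return dataset_mb / elapsed_s

def throughput_per_node(dataset_mb, elapsed_s, nodes):
    """Throughput normalized by cluster size."""
    return throughput(dataset_mb, elapsed_s) / nodes

# Same 1024 MB dataset on the same 4-node cluster, different elapsed times:
hadoop_tp = throughput(1024, 80)    # higher throughput -> less time overall
spark_tp = throughput(1024, 100)
```

The framework with the higher throughput on the same dataset is, by construction, the one that finishes sooner; normalizing by node count lets runs on differently sized clusters be compared.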

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Alexander Döschl ◽  
Max-Emanuel Keller ◽  
Peter Mandl

Purpose This paper aims to evaluate different approaches to the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions based on the MapReduce (Apache Hadoop) and Resilient Distributed Dataset (RDD, Apache Spark) paradigms, and a graphics processing unit (GPU) approach with Numba for the compute unified device architecture (CUDA). Design/methodology/approach The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions by brute-force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with the MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms, and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances for performance and scalability measurements. Findings The comparison of the Apache Hadoop and Apache Spark solutions under Amazon EMR showed that the processing time, measured in CPU minutes, was up to 30% lower with Spark, whose performance particularly benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared with the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study. Originality/value Numerous studies have examined the performance of parallelization approaches, but most deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies in their ability to implement computationally intensive distributed algorithms.
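The brute-force search being benchmarked can be sketched at a much smaller scale. The abstract does not state the puzzle's actual rules, so the rule below (each element may differ from its position by at most 2) is a stand-in, and n = 6 replaces the study's 15; the paper's implementations are in Java, CUDA, and cluster frameworks, not plain Python.

```python
# Scaled-down sketch of the benchmarked workload: enumerate all n! orderings
# and keep those satisfying the puzzle's rules. The rule here is invented for
# illustration; the study tested 15! permutations against its own rules.
from itertools import permutations

def satisfies_rules(perm):
    # Stand-in rule: every value stays within 2 places of its index.
    return all(abs(v - i) <= 2 for i, v in enumerate(perm))

def solve(n):
    return [p for p in permutations(range(n)) if satisfies_rules(p)]

solutions = solve(6)
```

In the MapReduce, RDD, and GPU variants the same enumeration is simply partitioned: each worker tests a disjoint range of the permutation space, which is why the workload parallelizes so cleanly.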


2019 ◽  
Vol 6 (5) ◽  
pp. 519
Author(s):  
Aminudin Aminudin ◽  
Eko Budi Cahyono

<p class="Judul2"><em><strong>Abstract</strong></em></p><p><em>Apache Spark is a platform for processing relatively large data sets (big data), with the ability to divide the data across a predetermined set of cluster nodes; this concept is called parallel computing. Apache Spark has an advantage over similar frameworks such as Apache Hadoop: it can process data as a stream, meaning that data entering the Spark environment can be processed immediately, without waiting for other data to accumulate. To enable machine learning within Apache Spark, this paper conducts an experiment that integrates Apache Spark, acting as a large-scale, parallel data processing environment, with the H2O library, which handles data processing with machine learning algorithms. Based on tests in a cloud computing environment, Apache Spark was able to process weather data obtained from the largest weather data archive, the NCDC data set, at sizes up to 6 GB. The data were processed with one machine learning model, deep learning, distributed over the nodes formed in the cloud computing environment using the H2O library. This success can be seen in the tested parameters, comprising running time, throughput, average memory, and average CPU, obtained from the HiBench benchmark; all of these values are influenced by the amount of data and the number of nodes.</em></p>
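The four reported metrics can be illustrated by aggregating per-node monitoring samples; the paper takes them from the HiBench benchmark, and the sample numbers below are invented.

```python
# Minimal sketch of deriving the four reported metrics from one test run.
# Inputs and values are illustrative, not the paper's measurements.

def summarize(run):
    """run: dict with total 'bytes', elapsed 'seconds', and per-node
    'cpu' (percent) and 'mem' (MB) samples."""
    return {
        "running_time_s": run["seconds"],
        "throughput_mb_s": run["bytes"] / 2**20 / run["seconds"],
        "avg_cpu_pct": sum(run["cpu"]) / len(run["cpu"]),
        "avg_mem_mb": sum(run["mem"]) / len(run["mem"]),
    }

report = summarize({
    "bytes": 6 * 2**30,          # a 6 GB input, as in the NCDC test
    "seconds": 1200,             # hypothetical elapsed time
    "cpu": [62.0, 58.0, 60.0],   # one sample per cluster node
    "mem": [2048, 1900, 2100],
})
```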


2014 ◽  
Vol 138 (12) ◽  
pp. 1564-1577 ◽  
Author(s):  
Fan Lin ◽  
Zongming Chen

Context Immunohistochemistry has become an indispensable ancillary technique in anatomic pathology laboratories. Standardization of every step in preanalytic, analytic, and postanalytic phases is crucial to achieve reproducible and reliable immunohistochemistry test results. Objective To standardize immunohistochemistry tests from preanalytic, analytic, to postanalytic phases. Data Sources Literature review and Geisinger (Geisinger Medical Center, Danville, Pennsylvania) experience. Conclusions This review article delineates some critical points in preanalytic, analytic, and postanalytic phases; reiterates some important questions, which may or may not have a consensus at this time; and updates the newly proposed guidelines on antibody validation from the College of American Pathologists Pathology and Laboratory Quality Center. Additionally, the article intends to share Geisinger's experience with (1) testing/optimizing a new antibody and troubleshooting; (2) interpreting and reporting immunohistochemistry assay results; (3) improving and implementing a total immunohistochemistry quality management program; and (4) developing best practices in immunohistochemistry.


Author(s):  
Yuqiao YANG ◽  
Kanhua YU

Internet of Things (IoT) technology and its industry will trigger a new round of information technology and industrial revolution; they are the commanding heights of future competition in the information industry and a core driving force of industrial upgrading. This paper reviews the current state of distance teaching for the Internet of Things and architecture specialties, then designs and implements a distance teaching experiment platform for the architecture specialty based on the Internet of Things. The system builds on ZigBee/GPRS wireless network technology, sensor technology, embedded technology, distributed Web software technology, and database technology. It adopts three interlinked networks and achieves efficient connection of multiple experiment terminals, servers, and clients, with fast information exchange, making it convenient for practical distance teaching. Teaching experiments show that Internet of Things technology can improve students' academic performance and teachers' teaching effectiveness; as a hot spot in modern teaching technology, it deserves attention.


2014 ◽  
Vol 7 (2) ◽  
Author(s):  
Theo Kanter ◽  
Rahim Rahmani ◽  
Jamie Walters ◽  
Willmar Sauter

This article investigates new forms of creating and enabling massive, scalable, participatory immersive experiences in live cultural events, characterized by processes involving pervasive objects, places, and people. The multi-disciplinary research outlines a new paradigm for collaborative creation and participation aimed at technological and social innovation, tapping into crowd-sensing. The approach promotes user-driven content creation and offsets economic models, thereby rewarding creators and performers. In response to these challenges, we propose a framework for bringing about massive, real-time presence and awareness on the Internet through an Internet-of-Things infrastructure that connects artifacts, performers, participants, and places. Equally importantly, we enable the in-situ creation of collaborative experiences that build on relevant existing and stored content, based on decisions leveraging multi-criteria clustering and the proximity of pervasive information, objects, people, and places. Finally, we investigate new ways of enabling immersive experiences via distributed computing, while pointing to the further work needed on collaborative creation.


2021 ◽  
Vol 324 ◽  
pp. 01011
Author(s):  
Eko Prayetno ◽  
Tonny Suhendra ◽  
Jeremya Lukmanto Saputra

Fish is a high-protein food that supports the development of the human brain, so it is necessary to keep fish fresh for consumption. Currently, fishers and fishmongers preserve freshness by packing ice into the fish storage, but this is ineffective when the ice is changed at the wrong time. Monitoring the storage temperature is therefore important: it helps find the right time to replace the ice and so ensures fish quality. The device developed here uses an Arduino ESP32, a DHT21 sensor, a micro SD module, and an Internet of Things system, with monitoring through the Blynk application and notifications through the Telegram app. DHT21 sensor tests yielded an error level of 2% against the reference. In the fish storage room, the lowest recorded temperature was 10.50 °C, with the ice in storage at 0 °C. The target condition for keeping fish fresh is an ice temperature of 0 °C to 2 °C (against the 11.50 °C obtained in testing), and the time before the ice needs replacing is about 10 hours.
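The device's two checks described above can be sketched in a few lines: computing the sensor's error level against a reference thermometer, and flagging when the temperature leaves the 0–2 °C target band. The thresholds follow the abstract; the function names and sample readings are hypothetical, not taken from the device firmware.

```python
# Hedged sketch of the monitoring logic: sensor error level and the
# ice-replacement threshold. Names and sample readings are invented.

def error_level(measured_c, reference_c):
    """Relative deviation of the DHT21 reading from the reference, as a fraction."""
    return abs(measured_c - reference_c) / reference_c

def ice_needs_replacing(storage_temp_c, upper_limit_c=2.0):
    """True once the storage temperature rises above the 0-2 degC target band."""
    return storage_temp_c > upper_limit_c

# A reading of 25.5 degC against a 25.0 degC reference is a 2% error level,
# matching the conformity level reported for the DHT21.
err = error_level(25.5, 25.0)
```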


2020 ◽  
Vol 5 (1) ◽  
pp. 84
Author(s):  
Gine Das Prena ◽  
Reynaldi Mulyana Kusmawan

This study aims to determine whether an understanding of risk-based internal audit, the whistleblowing system, anti-fraud awareness, and the application of the principles of good corporate governance affect fraud prevention in Rural Credit Banks in Bali Province. The study uses primary data collected with questionnaires. The population comprised internal auditors and boards of directors of 134 Rural Credit Banks; the sample, selected by purposive sampling, comprised internal auditors from 57 Rural Credit Banks. The analytical method is quantitative: multiple linear regression analysis using SPSS. The t-test results show that risk-based internal audit, the whistleblowing system, anti-fraud awareness, and the application of the principles of good corporate governance each have a positive effect on fraud prevention.
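The multiple-linear-analysis step can be sketched without SPSS as an ordinary-least-squares fit via the normal equations. The predictor and response values below are made-up placeholders, not the questionnaire data, and two predictors stand in for the study's four.

```python
# Minimal ordinary least squares for y = b0 + b1*x1 + b2*x2 + ...,
# the kind of multiple linear regression the study ran in SPSS.
# Data below are invented placeholders.

def ols(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination."""
    X = [[1.0] + list(row) for row in X]   # prepend an intercept column
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]                 # X^T X
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]  # X^T y
    for col in range(k):                    # elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k                        # back substitution
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c]
                              for c in range(r + 1, k))) / A[r][r]
    return coef

# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (so the fit should
# recover those coefficients):
coef = ols([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]], [9, 8, 19, 18, 29])
```

In the study, a positive fitted coefficient with a significant t-statistic is what supports each "positive effect" conclusion.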


Author(s):  
Aayush Jain

In recent years, the Edge Computing paradigm has gained considerable popularity in academic and industrial circles. It serves as a key enabler for many future technologies such as 5G, the Internet of Things (IoT), and augmented reality by connecting cloud computing facilities and services to the end users. The Edge Computing paradigm provides low-latency, mobility, and location-awareness support to delay-sensitive applications. Edge computing can address the concerns of response time requirements, bandwidth cost saving, and data safety and privacy. In this paper, we present the definition of edge computing, followed by several case studies, ranging from cloud offloading to smart homes and cities.


Author(s):  
Saravanan K ◽  
P. Srinivasan

Cloud IoT has evolved from the convergence of Cloud computing with Internet of Things (IoT). The networked devices in the IoT world grow exponentially in the distributed computing paradigm and thus require the power of the Cloud to access and share computing and storage for these devices. Cloud offers scalable on-demand services to the IoT devices for effective communication and knowledge sharing. It alleviates the computational load of IoT, which makes the devices smarter. This chapter explores the different IoT services offered by the Cloud as well as application domains that are benefited by the Cloud IoT. The challenges on offloading the IoT computation into the Cloud are also discussed.
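The offloading trade-off the chapter closes on can be illustrated with a simple cost model: run a task on the IoT device unless shipping it to the Cloud is estimated to be faster. The model, names, and numbers below are hypothetical, not from the chapter.

```python
# Illustrative offloading decision: compare estimated local execution time
# against estimated offload time (transfer + cloud compute + round trip).
# All parameters are hypothetical.

def local_time_s(cycles, device_hz):
    """Time to run the task on the constrained IoT device."""
    return cycles / device_hz

def offload_time_s(payload_bits, bandwidth_bps, cycles, cloud_hz, rtt_s):
    """Time to upload the input, run in the Cloud, and get the result back."""
    return payload_bits / bandwidth_bps + cycles / cloud_hz + rtt_s

def should_offload(cycles, payload_bits, device_hz, cloud_hz,
                   bandwidth_bps, rtt_s):
    return offload_time_s(payload_bits, bandwidth_bps,
                          cycles, cloud_hz, rtt_s) < local_time_s(cycles, device_hz)

# Heavy computation, small payload: offloading wins.
heavy = should_offload(cycles=1e9, payload_bits=1e6, device_hz=1e8,
                       cloud_hz=1e10, bandwidth_bps=1e7, rtt_s=0.05)
# Light computation: the transfer and round trip dominate, so stay local.
light = should_offload(cycles=1e6, payload_bits=1e6, device_hz=1e8,
                       cloud_hz=1e10, bandwidth_bps=1e7, rtt_s=0.05)
```

This captures, in miniature, why the Cloud "alleviates the computational load of IoT" only when the task is compute-heavy relative to its data transfer cost.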

