Comparison of Hadoop and Spark Computing Performance for Weather Prediction (Case Study: Storm Event Database)

Repositor ◽  
2020 ◽  
Vol 2 (4) ◽  
pp. 463
Author(s):  
Rendiyono Wahyu Saputro ◽  
Aminuddin Aminuddin ◽  
Yuda Munarko

Abstract — Technological progress has led to data growing larger and faster all the time, driven by the many data sources now available: search engines, RFID, digital transaction records, video and photo archives, user-generated content, the Internet of Things, and scientific research in fields such as genomics, meteorology, astronomy, and physics. Because these data also differ markedly in their characteristics, they cannot be processed with conventional database technology. Distributed computing frameworks such as Apache Hadoop and Apache Spark were therefore developed to process data in a distributed fashion across a computer cluster. With several such frameworks available, a test is needed to compare their computational performance. Testing was carried out by processing datasets of various sizes on clusters with different numbers of nodes. In all test runs, Apache Hadoop required less time than Apache Spark; this is because Hadoop's throughput and throughput per node were higher than Spark's.
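The two quantities the conclusion rests on can be illustrated with a short sketch. The dataset size, elapsed times, and node count below are placeholders, not the paper's measurements.

```python
# Throughput and throughput-per-node, the metrics on which the Hadoop/Spark
# comparison rests. All figures here are illustrative placeholders.

def throughput(dataset_mb, elapsed_s):
    """MB of input processed per second of wall-clock time."""
    return dataset_mb / elapsed_s

def throughput_per_node(dataset_mb, elapsed_s, nodes):
    """Throughput normalized by cluster size."""
    return throughput(dataset_mb, elapsed_s) / nodes

# Same 1024 MB dataset on the same 4-node cluster, different elapsed times:
hadoop_tp = throughput(1024, 80)    # higher throughput -> less time overall
spark_tp = throughput(1024, 100)
```

The framework with the higher throughput on the same dataset is, by construction, the one that finishes sooner; normalizing by node count lets runs on differently sized clusters be compared.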

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Alexander Döschl ◽  
Max-Emanuel Keller ◽  
Peter Mandl

Purpose This paper aims to evaluate different approaches to the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions based on the MapReduce (Apache Hadoop) and Resilient Distributed Dataset (RDD, Apache Spark) paradigms, and a graphics processing unit (GPU) approach with Numba for the compute unified device architecture (CUDA). Design/methodology/approach The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions by brute-force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with the MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms, and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances for performance and scalability measurements. Findings The comparison of the Apache Hadoop and Apache Spark solutions under Amazon EMR showed that the processing time, measured in CPU minutes, was up to 30% lower with Spark, whose performance particularly benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared with the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study. Originality/value Numerous studies have examined the performance of parallelization approaches, but most deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies in their ability to implement computationally intensive distributed algorithms.
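The brute-force search being benchmarked can be sketched at a much smaller scale. The abstract does not state the puzzle's actual rules, so the rule below (each element may differ from its position by at most 2) is a stand-in, and n = 6 replaces the study's 15; the paper's implementations are in Java, CUDA, and cluster frameworks, not plain Python.

```python
# Scaled-down sketch of the benchmarked workload: enumerate all n! orderings
# and keep those satisfying the puzzle's rules. The rule here is invented for
# illustration; the study tested 15! permutations against its own rules.
from itertools import permutations

def satisfies_rules(perm):
    # Stand-in rule: every value stays within 2 places of its index.
    return all(abs(v - i) <= 2 for i, v in enumerate(perm))

def solve(n):
    return [p for p in permutations(range(n)) if satisfies_rules(p)]

solutions = solve(6)
```

In the MapReduce, RDD, and GPU variants the same enumeration is simply partitioned: each worker tests a disjoint range of the permutation space, which is why the workload parallelizes so cleanly.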


2019 ◽  
Vol 6 (5) ◽  
pp. 519
Author(s):  
Aminudin Aminudin ◽  
Eko Budi Cahyono

<p class="Judul2"><em><strong>Abstract</strong></em></p><p><em>Apache Spark is a platform for processing relatively large data sets (big data), with the ability to divide the data across a predetermined set of cluster nodes; this concept is called parallel computing. Apache Spark has an advantage over similar frameworks such as Apache Hadoop: it can process data as a stream, meaning that data entering the Spark environment can be processed immediately, without waiting for other data to accumulate. To enable machine learning within Apache Spark, this paper conducts an experiment that integrates Apache Spark, acting as a large-scale, parallel data processing environment, with the H2O library, which handles data processing with machine learning algorithms. Based on tests in a cloud computing environment, Apache Spark was able to process weather data obtained from the largest weather data archive, the NCDC data set, at sizes up to 6 GB. The data were processed with one machine learning model, deep learning, distributed over the nodes formed in the cloud computing environment using the H2O library. This success can be seen in the tested parameters, comprising running time, throughput, average memory, and average CPU, obtained from the HiBench benchmark; all of these values are influenced by the amount of data and the number of nodes.</em></p>
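The four reported metrics can be illustrated by aggregating per-node monitoring samples; the paper takes them from the HiBench benchmark, and the sample numbers below are invented.

```python
# Minimal sketch of deriving the four reported metrics from one test run.
# Inputs and values are illustrative, not the paper's measurements.

def summarize(run):
    """run: dict with total 'bytes', elapsed 'seconds', and per-node
    'cpu' (percent) and 'mem' (MB) samples."""
    return {
        "running_time_s": run["seconds"],
        "throughput_mb_s": run["bytes"] / 2**20 / run["seconds"],
        "avg_cpu_pct": sum(run["cpu"]) / len(run["cpu"]),
        "avg_mem_mb": sum(run["mem"]) / len(run["mem"]),
    }

report = summarize({
    "bytes": 6 * 2**30,          # a 6 GB input, as in the NCDC test
    "seconds": 1200,             # hypothetical elapsed time
    "cpu": [62.0, 58.0, 60.0],   # one sample per cluster node
    "mem": [2048, 1900, 2100],
})
```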


2014 ◽  
Vol 138 (12) ◽  
pp. 1564-1577 ◽  
Author(s):  
Fan Lin ◽  
Zongming Chen

Context Immunohistochemistry has become an indispensable ancillary technique in anatomic pathology laboratories. Standardization of every step in preanalytic, analytic, and postanalytic phases is crucial to achieve reproducible and reliable immunohistochemistry test results. Objective To standardize immunohistochemistry tests from preanalytic, analytic, to postanalytic phases. Data Sources Literature review and Geisinger (Geisinger Medical Center, Danville, Pennsylvania) experience. Conclusions This review article delineates some critical points in preanalytic, analytic, and postanalytic phases; reiterates some important questions, which may or may not have a consensus at this time; and updates the newly proposed guidelines on antibody validation from the College of American Pathologists Pathology and Laboratory Quality Center. Additionally, the article intends to share Geisinger's experience with (1) testing/optimizing a new antibody and troubleshooting; (2) interpreting and reporting immunohistochemistry assay results; (3) improving and implementing a total immunohistochemistry quality management program; and (4) developing best practices in immunohistochemistry.


Author(s):  
Yuqiao YANG ◽  
Kanhua YU

Internet of Things (IoT) technology and its industry will trigger a new round of information technology and industrial revolution; they are the commanding heights of future competition in the information industry and a core driving force of industrial upgrading. This paper reviews the current state of distance teaching for the Internet of Things and architecture specialties, then designs and implements a distance teaching experiment platform for the architecture specialty based on the Internet of Things. The system builds on ZigBee/GPRS wireless network technology, sensor technology, embedded technology, distributed Web software technology, and database technology. It adopts three interlinked networks and achieves efficient connection of multiple experiment terminals, servers, and clients, with fast information exchange, making it convenient for practical distance teaching. Teaching experiments show that Internet of Things technology can improve students' academic performance and teachers' teaching effectiveness; as a hot spot in modern teaching technology, it deserves attention.


2014 ◽  
Vol 7 (2) ◽  
Author(s):  
Theo Kanter ◽  
Rahim Rahmani ◽  
Jamie Walters ◽  
Willmar Sauter

This article investigates new forms of creating and enabling massive, scalable, participatory immersive experiences in live cultural events, characterized by processes involving pervasive objects, places, and people. The multi-disciplinary research outlines a new paradigm for collaborative creation and participation aimed at technological and social innovation, tapping into crowd-sensing. The approach promotes user-driven content creation and offsets economic models, thereby rewarding creators and performers. In response to these challenges, we propose a framework for bringing about massive, real-time presence and awareness on the Internet through an Internet-of-Things infrastructure that connects artifacts, performers, participants, and places. Equally importantly, we enable the in-situ creation of collaborative experiences that build on relevant existing and stored content, based on decisions leveraging multi-criteria clustering and the proximity of pervasive information, objects, people, and places. Finally, we investigate new ways of enabling immersive experiences via distributed computing, while pointing to the further work needed on collaborative creation.


2021 ◽  
Vol 324 ◽  
pp. 01011
Author(s):  
Eko Prayetno ◽  
Tonny Suhendra ◽  
Jeremya Lukmanto Saputra

Fish is a high-protein food that supports the development of the human brain, so it is necessary to keep fish fresh for consumption. Currently, fishers and fishmongers preserve freshness by packing ice into the fish storage, but this is ineffective when the ice is changed at the wrong time. Monitoring the storage temperature is therefore important: it helps find the right time to replace the ice and so ensures fish quality. The device developed here uses an Arduino ESP32, a DHT21 sensor, a micro SD module, and an Internet of Things system, with monitoring through the Blynk application and notifications through the Telegram app. DHT21 sensor tests yielded an error level of 2% against the reference. In the fish storage room, the lowest recorded temperature was 10.50 °C, with the ice in storage at 0 °C. The target condition for keeping fish fresh is an ice temperature of 0 °C to 2 °C (against the 11.50 °C obtained in testing), and the time before the ice needs replacing is about 10 hours.
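The device's two checks described above can be sketched in a few lines: computing the sensor's error level against a reference thermometer, and flagging when the temperature leaves the 0–2 °C target band. The thresholds follow the abstract; the function names and sample readings are hypothetical, not taken from the device firmware.

```python
# Hedged sketch of the monitoring logic: sensor error level and the
# ice-replacement threshold. Names and sample readings are invented.

def error_level(measured_c, reference_c):
    """Relative deviation of the DHT21 reading from the reference, as a fraction."""
    return abs(measured_c - reference_c) / reference_c

def ice_needs_replacing(storage_temp_c, upper_limit_c=2.0):
    """True once the storage temperature rises above the 0-2 degC target band."""
    return storage_temp_c > upper_limit_c

# A reading of 25.5 degC against a 25.0 degC reference is a 2% error level,
# matching the conformity level reported for the DHT21.
err = error_level(25.5, 25.0)
```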


2020 ◽  
Vol 5 (1) ◽  
pp. 84
Author(s):  
Gine Das Prena ◽  
Reynaldi Mulyana Kusmawan

This study aims to determine whether an understanding of risk-based internal audit, the whistleblowing system, anti-fraud awareness, and the application of the principles of good corporate governance affect fraud prevention in Rural Credit Banks in Bali Province. The study uses primary data collected with questionnaires. The population comprised internal auditors and boards of directors of 134 Rural Credit Banks; the sample, selected by purposive sampling, comprised internal auditors from 57 Rural Credit Banks. The analytical method is quantitative: multiple linear regression analysis using SPSS. The t-test results show that risk-based internal audit, the whistleblowing system, anti-fraud awareness, and the application of the principles of good corporate governance each have a positive effect on fraud prevention.
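The multiple-linear-analysis step can be sketched without SPSS as an ordinary-least-squares fit via the normal equations. The predictor and response values below are made-up placeholders, not the questionnaire data, and two predictors stand in for the study's four.

```python
# Minimal ordinary least squares for y = b0 + b1*x1 + b2*x2 + ...,
# the kind of multiple linear regression the study ran in SPSS.
# Data below are invented placeholders.

def ols(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination."""
    X = [[1.0] + list(row) for row in X]   # prepend an intercept column
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]                 # X^T X
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]  # X^T y
    for col in range(k):                    # elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k                        # back substitution
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c]
                              for c in range(r + 1, k))) / A[r][r]
    return coef

# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (so the fit should
# recover those coefficients):
coef = ols([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]], [9, 8, 19, 18, 29])
```

In the study, a positive fitted coefficient with a significant t-statistic is what supports each "positive effect" conclusion.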


Author(s):  
Aayush Jain

In recent years, the Edge Computing paradigm has gained considerable popularity in academic and industrial circles. It serves as a key enabler for many future technologies such as 5G, the Internet of Things (IoT), and augmented reality by connecting cloud computing facilities and services to the end users. The Edge Computing paradigm provides low-latency, mobility, and location-awareness support to delay-sensitive applications. Edge computing can address the concerns of response time requirements, bandwidth cost saving, and data safety and privacy. In this paper, we present the definition of edge computing, followed by several case studies, ranging from cloud offloading to smart homes and cities.


Author(s):  
Saravanan K ◽  
P. Srinivasan

Cloud IoT has evolved from the convergence of Cloud computing with Internet of Things (IoT). The networked devices in the IoT world grow exponentially in the distributed computing paradigm and thus require the power of the Cloud to access and share computing and storage for these devices. Cloud offers scalable on-demand services to the IoT devices for effective communication and knowledge sharing. It alleviates the computational load of IoT, which makes the devices smarter. This chapter explores the different IoT services offered by the Cloud as well as application domains that are benefited by the Cloud IoT. The challenges on offloading the IoT computation into the Cloud are also discussed.
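The offloading trade-off the chapter closes on can be illustrated with a simple cost model: run a task on the IoT device unless shipping it to the Cloud is estimated to be faster. The model, names, and numbers below are hypothetical, not from the chapter.

```python
# Illustrative offloading decision: compare estimated local execution time
# against estimated offload time (transfer + cloud compute + round trip).
# All parameters are hypothetical.

def local_time_s(cycles, device_hz):
    """Time to run the task on the constrained IoT device."""
    return cycles / device_hz

def offload_time_s(payload_bits, bandwidth_bps, cycles, cloud_hz, rtt_s):
    """Time to upload the input, run in the Cloud, and get the result back."""
    return payload_bits / bandwidth_bps + cycles / cloud_hz + rtt_s

def should_offload(cycles, payload_bits, device_hz, cloud_hz,
                   bandwidth_bps, rtt_s):
    return offload_time_s(payload_bits, bandwidth_bps,
                          cycles, cloud_hz, rtt_s) < local_time_s(cycles, device_hz)

# Heavy computation, small payload: offloading wins.
heavy = should_offload(cycles=1e9, payload_bits=1e6, device_hz=1e8,
                       cloud_hz=1e10, bandwidth_bps=1e7, rtt_s=0.05)
# Light computation: the transfer and round trip dominate, so stay local.
light = should_offload(cycles=1e6, payload_bits=1e6, device_hz=1e8,
                       cloud_hz=1e10, bandwidth_bps=1e7, rtt_s=0.05)
```

This captures, in miniature, why the Cloud "alleviates the computational load of IoT" only when the task is compute-heavy relative to its data transfer cost.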

