Treatment and Research of Massive Data Mining Based on Cloud Computing

This paper introduces a SPRINT algorithm optimized for the Hadoop core framework. Following the data mining process, we study cloud computing under the MapReduce programming model, improve and optimize the SPRINT algorithm in conjunction with that model, and port the optimized algorithm to the Hadoop platform for distributed data processing.
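
The abstract does not describe the paper's optimizations, so the following is only a minimal sketch of the standard SPRINT split search (a Gini-index scan over a sorted attribute list), arranged in a map/reduce shape. The function names `map_best_split` and `reduce_best_split`, the record layout, and the sample data are assumptions made for illustration; they are plain Python, not Hadoop APIs and not the authors' code.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def map_best_split(attribute, records):
    """Map step (hypothetical): find the best threshold for one attribute.

    Builds a SPRINT-style attribute list sorted by value, then scans it once
    while maintaining class histograms below and above the candidate split.
    Returns (attribute, threshold, weighted_gini).
    """
    attr_list = sorted((rec[attribute], rec["label"]) for rec in records)
    n = len(attr_list)
    below = Counter()
    above = Counter(label for _, label in attr_list)
    best = (attribute, None, float("inf"))
    for i in range(n - 1):
        value, label = attr_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no valid threshold between equal values
        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below / n) * gini(below, n_below) \
            + (n_above / n) * gini(above, n_above)
        if score < best[2]:
            best = (attribute, (value + next_value) / 2, score)
    return best


def reduce_best_split(candidates):
    """Reduce step (hypothetical): keep the lowest weighted Gini overall."""
    return min(candidates, key=lambda c: c[2])


# Made-up sample records, for illustration only.
records = [
    {"age": 23, "income": 40, "label": "high"},
    {"age": 17, "income": 60, "label": "high"},
    {"age": 43, "income": 55, "label": "high"},
    {"age": 68, "income": 30, "label": "low"},
    {"age": 32, "income": 80, "label": "low"},
    {"age": 20, "income": 25, "label": "high"},
]

candidates = [map_best_split(a, records) for a in ("age", "income")]
best = reduce_best_split(candidates)
print(best)
```

Each attribute list can be scanned independently of the others, which is what makes the per-attribute search a natural fit for the map phase in this sketch.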

Apache Nemo: A Framework for Optimizing Distributed Data Processing

ACM Transactions on Computer Systems ◽

10.1145/3468144 ◽

2020 ◽

Vol 38 (3-4) ◽

pp. 1-31

Author(s):

Won Wook Song ◽

Youngseok Yang ◽

Jeongyoon Eo ◽

Jangho Seo ◽

Joo Yeon Kim ◽

...

Keyword(s):

Data Processing ◽

High Performance ◽

Programming Model ◽

Compiler Optimization ◽

Ease Of Use ◽

Distributed Data ◽

Performance Improvements ◽

Distributed Data Processing ◽

Fine Control ◽

High Level

Optimizing scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces to apply the optimizations, but do not ensure the maintenance of correct application semantics and thus often require significant effort to use. Second, policy interfaces that extend a high-level application programming model ensure correctness, but do not provide sufficient fine control. We describe Apache Nemo, an optimization framework for distributed dataflow processing that provides fine control for high performance and also ensures correctness for ease of use. We combine several techniques to achieve this, including an intermediate representation of dataflow, compiler optimization passes, and runtime extensions. Our evaluation results show that Nemo enables composable and reusable optimizations that bring performance improvements on par with existing specialized runtimes tailored for a specific deployment scenario. Apache Nemo is open-sourced at https://nemo.apache.org as an Apache incubator project.
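
To make the idea of an intermediate representation plus optimization passes concrete, here is a toy sketch. It is not Nemo's API: the classes `Edge` and `DataflowIR`, the property key `data_store`, and the pass `shuffle_to_disk_pass` are all hypothetical names chosen for illustration. The sketch assumes one simple way to keep application semantics intact, namely letting a pass change only annotations on the dataflow graph and rejecting any pass that alters the graph itself.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    src: str
    dst: str
    comm: str  # communication pattern: "one_to_one" or "shuffle"
    props: dict = field(default_factory=dict)  # execution annotations


@dataclass
class DataflowIR:
    vertices: list
    edges: list


def shuffle_to_disk_pass(ir):
    """Hypothetical annotating pass: spill shuffle data to local disk."""
    for edge in ir.edges:
        if edge.comm == "shuffle":
            edge.props["data_store"] = "local_disk"
        else:
            edge.props.setdefault("data_store", "memory")
    return ir


def topology(ir):
    """The part of the IR that defines what the application computes."""
    return (tuple(ir.vertices),
            tuple((e.src, e.dst, e.comm) for e in ir.edges))


def run_passes(ir, passes):
    """Apply passes in order, rejecting any that change the topology."""
    for optimization_pass in passes:
        before = topology(ir)
        ir = optimization_pass(ir)
        if topology(ir) != before:
            raise ValueError("pass changed the dataflow, not just annotations")
    return ir


ir = DataflowIR(
    vertices=["read", "map", "reduce"],
    edges=[Edge("read", "map", "one_to_one"), Edge("map", "reduce", "shuffle")],
)
optimized = run_passes(ir, [shuffle_to_disk_pass])
for edge in optimized.edges:
    print(edge.src, "->", edge.dst, edge.props)
```

Because passes are ordinary functions over the same IR, several of them can be listed together in `run_passes`, which is one way to read the abstract's claim of composable and reusable optimizations.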

Uma análise do consumo de energia de ambientes de processamento de dados massivos em nuvem [An analysis of the energy consumption of massive data processing environments in the cloud]

10.5753/wperformance.2018.3338 ◽

2018 ◽

Author(s):

Nestor D. O. Volpini ◽

Vinicius S. Conceição ◽

Raphael L. Pontes ◽

Dorgival Guedes

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Performance Metrics ◽

Massive Data ◽

Distributed Environments ◽

Big Data Applications ◽

Massive Data Processing ◽

Data Mining Application

Fields related to massive data processing (big data) and cloud computing have grown together. As a result, data processing is among the largest resource consumers in datacenters, consuming around 2% of global energy. Understanding how factors such as virtualized environments and an application's degree of parallelization affect this consumption is therefore an urgent need. This article relies on a monitoring solution that provides performance metrics, data mining application logs, and data produced in distributed environments to assess how the power consumption of virtualized big-data applications varies with the allocated resources.

Study on Distributed Data Processing System for Decentralized Spherical Multi-robot based on Edge Computing and Blockchain

2020 IEEE International Conference on Mechatronics and Automation (ICMA) ◽

10.1109/icma49215.2020.9233789 ◽

2020 ◽

Author(s):

Shuxiang Guo ◽

Sheng Cao ◽

Jian Guo ◽

Jigang Xu

Keyword(s):

Data Processing ◽

Processing System ◽

Edge Computing ◽

Distributed Data ◽

Data Processing System ◽

Distributed Data Processing ◽

Multi Robot

Efficient Geo-distributed Data Processing with Rout

2013 IEEE 33rd International Conference on Distributed Computing Systems ◽

10.1109/icdcs.2013.23 ◽

2013 ◽

Cited By ~ 4

Author(s):

Chamikara Jayalath ◽

Patrick Eugster

Keyword(s):

Data Processing ◽

Distributed Data ◽

Distributed Data Processing

Distributed Data Processing for Large-Scale Simulations on Cloud

10.1109/emc/si/pi/emceurope52599.2021.9559316 ◽

2021 ◽

Author(s):

Tianjian Lu ◽

Stephan Hoyer ◽

Qing Wang ◽

Lily Hu ◽

Yi-Fan Chen

Keyword(s):

Data Processing ◽

Large Scale ◽

Distributed Data ◽

Distributed Data Processing ◽

Large Scale Simulations

Predictive Repair and Support of Engineering Systems Based on Distributed Data Processing Model within an IoT Concept

2018 6th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW) ◽

10.1109/w-ficloud.2018.00019 ◽

2018 ◽

Cited By ~ 1

Author(s):

Vasiliy S. Kireev ◽

Stanislav A. Filippov ◽

Anna I. Guseva ◽

Pyotr V. Bochkaryov ◽

Igor A. Kuznetsov ◽

...

Keyword(s):

Data Processing ◽

Distributed Data ◽

Engineering Systems ◽

Distributed Data Processing

DolphinNext: a distributed data processing platform for high throughput genomics

BMC Genomics ◽

10.1186/s12864-020-6714-x ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 8

Author(s):

Onur Yukselen ◽

Osman Turkyilmaz ◽

Ahmet Rasit Ozturk ◽

Manuel Garber ◽

Alper Kucukural

Keyword(s):

Data Processing ◽

High Throughput ◽

Distributed Data ◽

Distributed Data Processing ◽

Processing Platform

Distributed data processing and analysis environment for neutron scattering experiments at CSNS

Nuclear Instruments and Methods in Physics Research Section A Accelerators Spectrometers Detectors and Associated Equipment ◽

10.1016/j.nima.2016.07.043 ◽

2016 ◽

Vol 834 ◽

pp. 24-29 ◽

Cited By ~ 6

Author(s):

H.L. Tian ◽

J.R. Zhang ◽

L.L. Yan ◽

M. Tang ◽

L. Hu ◽

...

Keyword(s):

Data Processing ◽

Neutron Scattering ◽

Distributed Data ◽

Distributed Data Processing ◽

Analysis Environment

Was ist Computerleistung am Arbeitsplatz? [What Is Computer Power at the Workplace?]

it - Information Technology ◽

10.1515/itit-1979-0207 ◽

1979 ◽

Vol 21 (2) ◽

Author(s):

L. J. Heinrich

Keyword(s):

Data Processing ◽

Distributed Data ◽

Distributed Data Processing

The article explains the subjective understanding of the term "computer power at the workplace" as a catchword for a progressive design philosophy for computer-supported information systems. It implies both the application of modern hardware and software technologies, as will be decisive for the 1980s, and the increasingly prominent consideration of user needs, both those determined by the work task and subjective ones. It thereby links "Distributed Data Processing" as a technological concept with "user orientation". The design areas of user orientation (work equipment and work environment, the human-computer interaction interface, and work organization) are explained. Design measures are given by way of example, and reference is made to further literature, with the focus on the book "Computerleistung am Arbeitsplatz - benutzerorientiertes Distributed Data Processing", published by Oldenbourg-Verlag.

Design and Implementation of Internet Ticketing System Based on Distributed Data Processing Platform

10.1109/icmsse53595.2021.00067 ◽

2021 ◽

Author(s):

Xiaomei Pei ◽

Hailin Tang

Keyword(s):

Data Processing ◽

Distributed Data ◽

Design And Implementation ◽

Distributed Data Processing ◽

Processing Platform
