An algebra for distributed Big Data analytics

Author(s):  
Leonidas Fegaras

Abstract: We present an algebra for data-intensive scalable computing based on monoid homomorphisms, consisting of a small set of operations that capture most features supported by current domain-specific languages for data-centric distributed computing. This algebra is being used as the formal basis of MRQL, a query processing and optimization system for large-scale distributed data analysis. The MRQL semantics is given in terms of monoid comprehensions, which support group-by and order-by syntax and can work on heterogeneous collections without requiring any extension to the monoid algebra. We present the syntax and semantics of monoid comprehensions and provide rules to translate them to the monoid algebra. We give evidence of the effectiveness of our algebra by presenting some important optimization rules, such as converting nested queries to joins.
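To make the central idea concrete, the following is a minimal sketch in Scala of a monoid and a monoid homomorphism. The trait and signatures are simplified assumptions for illustration; MRQL's actual algebra is considerably richer. The point it shows is why such computations parallelize: because the merge is associative with an identity, partial results from different data partitions can be combined in any grouping.

```scala
// A minimal sketch of monoid homomorphisms (hypothetical, simplified
// signatures; not the MRQL algebra itself).
object MonoidSketch {
  // A monoid: an associative merge operation with an identity element.
  trait Monoid[M] {
    def zero: M
    def merge(x: M, y: M): M
  }

  val intSum: Monoid[Int] = new Monoid[Int] {
    val zero = 0
    def merge(x: Int, y: Int): Int = x + y
  }

  // A monoid homomorphism over a collection: map each element into the
  // target monoid, then reduce with its merge. Associativity is what
  // lets the reduction be split across distributed partitions.
  def hom[A, M](m: Monoid[M])(f: A => M)(xs: List[A]): M =
    xs.map(f).foldLeft(m.zero)(m.merge)

  def main(args: Array[String]): Unit = {
    val orders = List(("alice", 10), ("bob", 5), ("alice", 7))
    // The comprehension "sum{ price | (cust, price) <- orders }"
    // expressed as a homomorphism into the integer-sum monoid:
    val total = hom(intSum)((o: (String, Int)) => o._2)(orders)
    println(total) // 22
  }
}
```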

2021, Vol. 2021, pp. 1-11
Author(s):  
Zhen Zhang ◽  
Bing Guo ◽  
Yan Shen ◽  
Chengjie Li ◽  
Xinhua Suo ◽  
...  

Bitcoin mining consumes tremendous amounts of electricity to solve the hash problem. At the same time, large-scale applications of artificial intelligence (AI) require efficient and secure computing. Many computing devices are in use, with highly heterogeneous hardware resources, so a mechanism is needed to coordinate cooperation among them, and a sound computation structure is required when data are dispersed. In this paper, we propose an architecture in which devices (also called nodes) can reach a consensus on task results using off-chain smart contracts and private data. The proposed distributed computing architecture can accelerate computing-intensive and data-intensive supervised classification algorithms with limited resources. It can significantly strengthen privacy protection and prevent leakage of distributed data, and it supports heterogeneous data, making computing on each device more efficient. We prove the correctness and robustness of our system mathematically and derive the condition under which a given task stops. In the experiments, we recast Bitcoin hash collision as a distributed computation across several nodes and evaluated training and prediction accuracy on handwritten digit images (MNIST). The experimental results demonstrate the effectiveness of the proposed method.
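As an illustration of the core trick in the experimental setup, here is a hedged sketch of splitting a hash-collision search into disjoint nonce ranges, one per node. The names, the worker count, and the difficulty parameter are illustrative assumptions; the paper's actual consensus protocol and smart-contract layer are not modeled here.

```scala
// A hypothetical sketch: a proof-of-work-style hash search partitioned
// across nodes (simulated with futures), not the paper's protocol.
import java.security.MessageDigest
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object HashSearchSketch {
  def sha256Hex(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  // Each node scans a disjoint nonce range; a hit on any node completes
  // the overall task, so each node's work can be checked independently.
  def searchRange(payload: String, from: Long, to: Long,
                  zeroHexDigits: Int): Option[Long] = {
    val prefix = "0" * zeroHexDigits
    (from until to).find(n => sha256Hex(payload + n).startsWith(prefix))
  }

  def main(args: Array[String]): Unit = {
    val nodes = 4
    val perNode = 100000L
    // Each future stands in for one worker node holding its own slice
    // of the nonce space.
    val workers = (0 until nodes).map { i =>
      Future(searchRange("block-data", i * perNode, (i + 1) * perNode, 4))
    }
    val hits = Await.result(Future.sequence(workers), Duration.Inf).flatten
    println(s"nonces found: $hits")
  }
}
```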


Information, 2019, Vol. 10 (12), p. 360
Author(s):  
Nikos Kefalakis ◽  
Aikaterini Roukounaki ◽  
John Soldatos

One of the main challenges in modern Internet of Things (IoT) systems is the efficient collection, routing, and management of data streams from heterogeneous sources, including sources with high ingestion rates. Despite the existence of various IoT data streaming frameworks, there is still no easy way to collect and route IoT streams efficiently and configurably that is also easy to implement and deploy in realistic environments. In this paper, we introduce a programmable engine for Distributed Data Analytics (DDA), which eases the task of collecting IoT streams from different sources and routing them to the appropriate consumers. The engine also provides the means for preprocessing and analysis of data streams, two of the most important tasks in Big Data analytics applications. At the heart of the engine lies a Domain Specific Language (DSL) that enables the zero-programming definition of data routing and preprocessing tasks. This DSL is outlined in the paper, along with the middleware that supports its runtime execution. We also present the architecture of the engine and the digital models it uses to represent data streams in the digital world, and we discuss the validation of the DDA in several data-intensive IoT use cases in industrial environments, including pilot production lines and several real-life manufacturing settings. The latter demonstrate the configurability, programmability, and flexibility of the DDA engine, as well as its ability to support practical applications.
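The abstract outlines the DSL rather than publishing its grammar, so the following is a hypothetical embedded-DSL sketch in Scala showing only the general shape of such routing-and-preprocessing rules: each rule pairs a predicate over incoming records with a transformation and a named target consumer. All names here (Record, Route, the topic strings) are illustrative, not the DDA engine's API.

```scala
// A hypothetical embedded-DSL sketch of stream routing and
// preprocessing rules; the actual DDA DSL is not shown here.
object RoutingDslSketch {
  final case class Record(source: String, fields: Map[String, Double])

  // A route: a predicate selecting records, a preprocessing step, and
  // the named consumer the result is delivered to.
  final case class Route(when: Record => Boolean,
                         transform: Record => Record,
                         to: String)

  // Dispatch fans one record out to every route whose predicate matches.
  def dispatch(routes: Seq[Route])(r: Record): Seq[(String, Record)] =
    routes.collect { case rt if rt.when(r) => (rt.to, rt.transform(r)) }

  def main(args: Array[String]): Unit = {
    val routes = Seq(
      Route(
        when = _.source == "vibration-sensor",
        // Preprocessing: keep only the amplitude field, rescaled.
        transform = r => r.copy(fields = r.fields.view
          .filterKeys(_ == "amp").mapValues(_ * 9.81).toMap),
        to = "predictive-maintenance"
      ),
      Route(when = _ => true, transform = identity, to = "archive")
    )
    val rec = Record("vibration-sensor", Map("amp" -> 0.3, "tmp" -> 41.0))
    dispatch(routes)(rec).foreach(println)
  }
}
```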


Author(s):  
Sathishkumar S. ◽  
Devi Priya R. ◽  
Karthika K.

Big data computing in clouds is a new paradigm for next-generation analytics development. It enables organizations to share and explore large quantities of ever-increasing data types using cloud computing technology as a back-end. Knowledge exploration and decision-making over this rapidly increasing volume of data demand data organization, access, and timely processing, an evolving trend known as big data computing. This modern paradigm incorporates large-scale computing, new data-intensive techniques, and mathematical models to create data analytics for intrinsic information extraction. Cloud computing emerged as a service-oriented computing model that delivers infrastructure, platforms, and applications as services from providers to consumers while meeting QoS parameters, enabling large volumes of rapidly growing data to be archived and processed faster and more economically.


Author(s):  
Lichao Xu ◽  
Szu-Yun Lin ◽  
Andrew W. Hlynka ◽  
Hao Lu ◽  
Vineet R. Kamat ◽  
...  

Abstract: There has been a strong need for simulation environments capable of modeling the deep interdependencies between complex systems encountered during natural hazards, such as the interactions and coupled effects between the response of civil infrastructure systems, human behavior, and social policies, in order to improve community resilience. Coupling such complex components in an integrated simulation requires continuous data exchange between the different simulators running their separate models throughout the simulation. This can be implemented by means of distributed simulation platforms or data passing tools. To provide a systematic reference for choosing simulation tools and to facilitate the development of compatible distributed simulators for studying deep interdependencies in the context of natural hazards, this article focuses on generic tools suitable for integrating simulators from different fields, rather than platforms used mainly within specific fields. With this aim, the article provides a comprehensive review of the most commonly used generic distributed simulation platforms (Distributed Interactive Simulation (DIS), High Level Architecture (HLA), Test and Training Enabling Architecture (TENA), and the Data Distribution Service (DDS)) and data passing tools (the Robot Operating System (ROS) and Lightweight Communications and Marshalling (LCM)) and compares their advantages and disadvantages. Three specific limitations of existing platforms are identified from the perspective of natural hazard simulation. To mitigate them, two platform design recommendations are provided, namely message exchange wrappers and hybrid communication, to help improve data passing capabilities in existing solutions and to guide the design of a new domain-specific distributed simulation framework.
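To illustrate the first recommendation, here is a minimal sketch of a message exchange wrapper: a common publish/subscribe facade behind which a DDS, ROS, or LCM binding could sit. The trait and the in-memory binding are illustrative assumptions, not existing APIs of those platforms.

```scala
// A minimal sketch of the "message exchange wrapper" idea: simulators
// talk to one facade, and middleware bindings are swapped underneath.
import scala.collection.mutable

object WrapperSketch {
  trait MessageBus {
    def publish(topic: String, payload: Array[Byte]): Unit
    def subscribe(topic: String)(handler: Array[Byte] => Unit): Unit
  }

  // In-memory stand-in; a real wrapper would delegate these two calls
  // to the underlying middleware's client library.
  final class LocalBus extends MessageBus {
    private val subs = mutable.Map.empty[String, List[Array[Byte] => Unit]]
    def publish(topic: String, payload: Array[Byte]): Unit =
      subs.getOrElse(topic, Nil).foreach(_(payload))
    def subscribe(topic: String)(handler: Array[Byte] => Unit): Unit =
      subs.update(topic, handler :: subs.getOrElse(topic, Nil))
  }

  def main(args: Array[String]): Unit = {
    val bus: MessageBus = new LocalBus
    // A structural simulator and a human-behavior simulator exchange
    // state through the same facade regardless of the middleware chosen.
    bus.subscribe("bridge/displacement") { b =>
      println(s"behavior model received: ${new String(b, "UTF-8")}")
    }
    bus.publish("bridge/displacement", "0.12m".getBytes("UTF-8"))
  }
}
```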

