Distributed Analytics: Latest Research Papers

Abstract Temporal property graphs are graphs whose structure and properties change over time. Temporal graph datasets tend to be large due to stored historical information, calling for scalable analysis capabilities. We give a complete overview of Gradoop, a graph dataflow system for scalable, distributed analytics of temporal property graphs which has been continuously developed since 2005. Its graph model TPGM allows bitemporal modeling not only of vertices and edges but also of graph collections. A declarative analytical language called GrALa allows analysts to flexibly define analytical graph workflows by composing different operators that support temporal graph analysis. Built on a distributed dataflow system, large temporal graphs can be processed on a shared-nothing cluster. We present the system architecture of Gradoop, its data model TPGM with composable temporal graph operators, like snapshot, difference, pattern matching, graph grouping and several implementation details. We evaluate the performance and scalability of selected operators and a composed workflow for synthetic and real-world temporal graphs with up to 283 M vertices and 1.8 B edges, and a graph lifetime of about 8 years with up to 20 M new edges per year. We also reflect on lessons learned from the Gradoop effort.
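To make the snapshot and difference operators named in the abstract more concrete, the following is a minimal single-machine sketch in Python. It is not Gradoop's API, TPGM, or GrALa: the `TemporalEdge` class, the `snapshot` and `difference` functions, the half-open validity interval, and the sample data are all assumptions chosen for illustration, and only one time dimension is modeled rather than the bitemporal model the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class TemporalEdge:
    """Hypothetical edge with a validity interval [valid_from, valid_to)."""
    source: str
    target: str
    valid_from: int  # inclusive start of validity (here: a year)
    valid_to: int    # exclusive end of validity


def snapshot(edges: List[TemporalEdge], as_of: int) -> List[TemporalEdge]:
    """Keep only the edges whose validity interval contains `as_of`."""
    return [e for e in edges if e.valid_from <= as_of < e.valid_to]


def difference(
    edges: List[TemporalEdge], t_old: int, t_new: int
) -> Tuple[Set[TemporalEdge], Set[TemporalEdge]]:
    """Compare two snapshots and return (added, removed) edge sets."""
    old = set(snapshot(edges, t_old))
    new = set(snapshot(edges, t_new))
    return new - old, old - new


# Illustrative data only; not taken from the paper's evaluation.
edges = [
    TemporalEdge("a", "b", 2015, 2019),
    TemporalEdge("b", "c", 2017, 2023),
    TemporalEdge("a", "c", 2020, 2023),
]

added, removed = difference(edges, 2018, 2021)
print("valid in 2018:", snapshot(edges, 2018))
print("added by 2021:", added)
print("removed by 2021:", removed)
```

Because both functions take and return plain edge collections, they can be chained, which loosely mirrors the idea of composing operators into a workflow. A distributed system would partition the edges across workers instead of filtering one in-memory list.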

DAMS: A Distributed Analytics Metadata Schema

Data Intelligence, DOI 10.1162/dint_a_00100, 2021, pp. 1-17

Author(s): Sascha Welten, Laurenz Neumann, Yeliz Ucer Yediel, Luiz Olavo Bonino da Silva Santos, Stefan Decker, ...

Keyword(s): Complex Analysis, Expert Knowledge, Data Access, Semantic Content, Scientific Data, Sensitive Data, Metadata Schema, Distributed Components, Fair Principles, Distributed Analytics

Abstract In recent years, implementations enabling Distributed Analytics (DA) have gained considerable attention due to their ability to perform complex analysis tasks on decentralised data by bringing the analysis to the data. These concepts propose privacy-enhancing alternatives to data centralisation approaches, which have restricted applicability in case of sensitive data due to ethical, legal or social aspects. Nevertheless, the immanent problem of DA-enabling architectures is the black-box-alike behaviour of the highly distributed components originating from the lack of semantically enriched descriptions, particularly the absence of basic metadata for datasets or analysis tasks. To approach the mentioned problems, we propose a metadata schema for DA infrastructures, which provides a vocabulary to enrich the involved entities with descriptive semantics. We initially perform a requirement analysis with domain experts to reveal necessary metadata items, which represents the foundation of our schema. Afterwards, we transform the obtained domain expert knowledge into user stories and derive the most significant semantic content. In the final step, we enable machine-readability via RDF(S) and SHACL serialisations. We deploy our schema in a proof-of-concept monitoring dashboard to validate its contribution to the transparency of DA architectures. Additionally, we evaluate the schema’s compliance with the FAIR principles. The evaluation shows that the schema succeeds in increasing transparency while being compliant with most of the FAIR principles. Because a common metadata model is critical for enhancing the compatibility between multiple DA infrastructures, our work lowers data access and analysis barriers. It represents an initial and infrastructure-independent foundation for the FAIRification of DA and the underlying scientific data management.

Reinforcement and transfer learning for distributed analytics in fragmented software defined coalitions

Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, DOI 10.1117/12.2587874, 2021

Author(s): Kin K. Leung, Ziyao (Spike) Zhang, Anand Mudgerikar, Ankush Singla, Elisa Bertino, ...

Keyword(s): Transfer Learning, Distributed Analytics

An Architecture for the Development of Distributed Analytics Based on Polystore Events

Heterogeneous Data Management, Polystores, and Analytics for Healthcare - Lecture Notes in Computer Science, DOI 10.1007/978-3-030-71055-2_5, 2021, pp. 54-65

Author(s): Athanasios Zolotas, Konstantinos Barmpis, Fady Medhat, Patrick Neubauer, Dimitris Kolovos, ...

Keyword(s): Distributed Analytics

Privacy-Preserving Distributed Analytics in Fog-Enabled IoT Systems

Sensors, DOI 10.3390/s20216153, 2020, Vol 20 (21), pp. 6153

Author(s): Liang Zhao

Keyword(s): Private Information, Data Analytics, Seismic Imaging, Security Analysis, Optimal Solution, Privacy Preserving, Raw Data, Data Movement, Secure Protocol, Distributed Analytics

The Internet of Things (IoT) has evolved significantly with advances in gathering data that can be extracted to provide knowledge and facilitate decision-making processes. Currently, IoT data analytics faces challenges such as the growing data volumes collected by IoT devices and the fast response requirements of time-sensitive applications, which traditional cloud-based solutions are unable to meet due to limited bandwidth and high latency. In this paper, we develop a distributed analytics framework for fog-enabled IoT systems that aims to avoid raw data movement and reduce latency. The distributed framework leverages the computational capacities of all participants, such as edge devices and fog nodes, and allows them to obtain the global optimal solution locally. To further enhance the privacy of data holders in the system, a privacy-preserving protocol is proposed using cryptographic schemes. A security analysis verified that exact private information about any edge device’s raw data cannot be inferred by an honest-but-curious neighbor under the proposed secure protocol. In addition, the accuracy of the solution under the secure protocol is unaffected compared to the proposed distributed algorithm without encryption. We further conducted experiments on three case studies: seismic imaging, diabetes progression prediction, and Enron email classification. On the seismic imaging problem, the proposed algorithm can be up to one order of magnitude faster than the benchmarks in reaching the optimal solution. The evaluation results validate the effectiveness of the proposed methodology and demonstrate its potential as a solution for data analytics in fog-enabled IoT systems.
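The idea that every participant can reach a global result using only neighbor-to-neighbor exchange can be illustrated with a small sketch. The code below uses plain average consensus, a textbook technique swapped in for illustration. It is not the paper's algorithm and includes none of its cryptographic protocol. The function name `consensus_average`, the ring topology, and the sample values are assumptions. Each value stands for a locally computed summary rather than raw data.

```python
from typing import Dict, List


def consensus_average(
    local_estimates: List[float],
    neighbors: Dict[int, List[int]],
    rounds: int = 200,
) -> List[float]:
    """Each node repeatedly averages its estimate with its neighbors'.

    Assumption: the graph is connected and every node has the same number
    of neighbors, so equal-weight averaging preserves the global mean and
    all nodes converge to it.
    """
    x = list(local_estimates)
    for _ in range(rounds):
        nxt = []
        for i, xi in enumerate(x):
            # A node only sees its own value and its direct neighbors' values.
            group = [xi] + [x[j] for j in neighbors[i]]
            nxt.append(sum(group) / len(group))
        x = nxt  # synchronous update: all nodes move to the next round together
    return x


# Hypothetical ring of four nodes, each starting from its own local summary.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
start = [1.0, 2.0, 3.0, 6.0]
result = consensus_average(start, ring)
print(result)  # every entry approaches the global mean of the start values
```

No node ever receives the full list of values, yet each ends up holding approximately the same global average. A privacy-preserving variant would additionally protect the exchanged values, which this sketch does not attempt.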
