Author(s):  
Rosa Filgueira ◽
Amrey Krause ◽  
Malcolm Atkinson ◽  
Iraklis Klampanos ◽  
Alexander Moreno

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, enable rapid prototyping and support application to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: (a) smooth transitions from local development on a laptop to scalable execution for production work, and (b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported, and measurements on multiple infrastructures show the optimisations achieved; these domains have provided demanding real applications and helped us develop effective training. dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.
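To make the stream-based workflow idea concrete, below is a minimal, self-contained Python sketch in the spirit of dispel4py's processing elements and workflow composition. The names here (StreamPE, run_pipeline) are illustrative assumptions for this sketch, not dispel4py's actual API, and the sequential driver stands in for only one of the four mappings described above:

# Hypothetical sketch of stream-based workflow composition (not dispel4py's
# actual classes): each processing element (PE) consumes an input stream
# lazily and yields an output stream, so PEs can be chained and later mapped
# onto different execution backends.
from typing import Callable, Iterable, Iterator

class StreamPE:
    """A processing element wrapping a function applied to each stream item."""
    def __init__(self, fn: Callable):
        self.fn = fn

    def process(self, stream: Iterable) -> Iterator:
        for item in stream:
            result = self.fn(item)
            if result is not None:  # None means "filtered out"
                yield result

def run_pipeline(source: Iterable, pes: list) -> Iterator:
    """Sequential enactment: chain the PEs over the source stream."""
    stream = iter(source)
    for pe in pes:
        stream = pe.process(stream)
    return stream

if __name__ == "__main__":
    # A toy workflow: scale readings, then drop those below a threshold.
    scale = StreamPE(lambda x: x * 2.0)
    threshold = StreamPE(lambda x: x if x >= 4.0 else None)
    for value in run_pipeline([1.0, 2.5, 3.0], [scale, threshold]):
        print(value)  # prints 5.0 and 6.0

Because each PE only reads from an iterator and yields results, the same graph could in principle be enacted by threads or MPI ranks exchanging messages instead of this sequential loop, which is the portability the paper exploits.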


Author(s):  
J. Geetha ◽  
D. S. Jayalakshmi ◽  
Riya R. Ganiga ◽  
Shaguftha Zuveria Kottur ◽  
Tallapalli Surabhi

2018 ◽  
Vol 19 (3) ◽  
pp. 223-244
Author(s):  
Sonia Ikken ◽  
Eric Renault ◽  
Abdelkamel Tari ◽  
Tahar Kechadi

Several big data-driven applications are currently carried out collaboratively on distributed infrastructures. These applications usually deal with experiments at massive scale. The data generated by such experiments are huge and are stored at multiple geographic locations for reuse. Workflow systems, composed of jobs using collaborative task-based models, present new dependency and data-exchange needs. This gives rise to new issues when selecting distributed data and storage resources, so that applications execute on time and resource usage is cost-efficient. In this paper, we present an efficient data placement approach to improve the performance of workflow processing in distributed data centres. The proposed approach handles two types of intermediate data: splittable and unsplittable. Moreover, we place intermediate data by considering not only their source location but also their dependencies. The main objective is to minimise the total storage cost, including the effort for transferring, storing, and moving that data according to the applications' needs. We first propose an exact algorithm which takes into account the intra-job dependencies, and we show that the optimal fractional intermediate data placement problem is NP-hard. To solve the problem of unsplittable intermediate data placement, we propose a greedy heuristic algorithm based on a network flow optimisation framework. The experimental results show that the performance of our approach is very promising. We also show that, even under divergent conditions, the cost ratio of the heuristic approach is close to the optimal solution.
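As an illustration of the greedy idea for the unsplittable case, the Python sketch below assigns each unsplittable intermediate dataset to the data centre with the lowest combined storage-plus-transfer cost, subject to capacity. The cost model and all names here are simplifying assumptions for illustration; the paper's heuristic is built on a network-flow optimisation framework rather than this direct scan:

# Hypothetical greedy heuristic for unsplittable intermediate data placement:
# place each dataset at the feasible data centre minimising storage cost plus
# transfer cost from its source site. A simplification of the paper's
# network-flow-based heuristic, for illustration only.

def greedy_placement(datasets, centres, storage_cost, transfer_cost):
    """
    datasets: list of (name, size_gb, source_site)
    centres: dict centre -> remaining capacity in GB
    storage_cost: dict centre -> cost per GB stored
    transfer_cost: dict (source_site, centre) -> cost per GB moved
    Returns dict name -> chosen centre.
    """
    placement = {}
    # Placing large datasets first tends to avoid capacity dead-ends.
    for name, size, source in sorted(datasets, key=lambda d: -d[1]):
        feasible = [c for c, cap in centres.items() if cap >= size]
        if not feasible:
            raise RuntimeError(f"no centre can hold {name}")
        best = min(
            feasible,
            key=lambda c: size * (storage_cost[c] + transfer_cost[(source, c)]),
        )
        centres[best] -= size
        placement[name] = best
    return placement

if __name__ == "__main__":
    datasets = [("d1", 40, "s1"), ("d2", 10, "s2")]
    centres = {"dcA": 50, "dcB": 30}
    storage = {"dcA": 0.02, "dcB": 0.01}
    transfer = {("s1", "dcA"): 0.05, ("s1", "dcB"): 0.10,
                ("s2", "dcA"): 0.08, ("s2", "dcB"): 0.02}
    print(greedy_placement(datasets, centres, storage, transfer))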


2016 ◽  
Vol 12 (11) ◽  
pp. 22
Author(s):  
Yue-jie Li

The sensor data in wireless sensor networks arrive continuously in multiple, rapid, time-varying, possibly unpredictable and unbounded streams, and no record of historical information is kept. These characteristics make conventional Database Management Systems and their evolutions unsuitable for streams. There is therefore a need for a complete Data Streaming Management System (DSMS) that can process streams and perform dynamic continuous query processing. In this paper, a framework for an Adaptive Distributed Data Streaming Management System (ADDSMS) is presented, which operates as a stream-control interface between arrays of distributed data stream sources and the end-user clients who access and analyse these streams. Simulation results show that the proposed method can improve overall system performance substantially.
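To illustrate the kind of dynamic continuous query a DSMS evaluates over unbounded streams, here is a small generic Python sketch of a sliding-window aggregate with an anomaly test applied as each tuple arrives; it illustrates continuous query processing in general, not the ADDSMS interface:

# Generic illustration of continuous query processing over an unbounded
# stream (not the ADDSMS interface): maintain a sliding window of the most
# recent readings and emit an aggregate as each new tuple arrives.
from collections import deque
from typing import Iterable, Iterator, Tuple

def sliding_avg(stream: Iterable[float], window: int = 5) -> Iterator[Tuple[float, float]]:
    """Continuous query: for each reading, yield (reading, window average)."""
    buf = deque(maxlen=window)  # old tuples expire automatically
    for reading in stream:
        buf.append(reading)
        yield reading, sum(buf) / len(buf)

if __name__ == "__main__":
    sensor = [21.0, 21.5, 22.0, 35.0, 22.5]  # stand-in for a live feed
    for reading, avg in sliding_avg(sensor, window=3):
        if reading > 1.3 * avg:              # flag readings far above trend
            print(f"anomaly: {reading} (window avg {avg:.2f})")

Because the window expires old tuples as new ones arrive, the query runs in bounded memory no matter how long the stream is, which is exactly the property that makes conventional store-then-query DBMS processing unsuitable here.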


Author(s):  
Yashvi Barot

Abstract: The fundamental goal of this thesis is to present models for single as well as multiple query processing in distributed database systems which result in lower query-processing cost. One of the major issues in the design and implementation of Distributed Database Management Systems (DDBMS) is efficient query processing. The objective of distributed query optimisation reduces to minimising the amount of data to be transmitted among sites for processing a given query. The problem of query processing in DDBS (1 1) has been studied extensively in the literature. In most algorithms, the qualification of the query contains a sequence of operations. In such cases, when operations are executed from right to left, in the order they appear in the sequence, the result of one operation may be an operand of the next. Because the operations depend on one another, only one operation at one site executes at any instant, even though the environment is distributed; the systems at all other sites remain idle for that query. A new model, the Completely Reducible Relation Model (CRK Model), which permits parallelism and processes multiple operations simultaneously at all relevant sites, is presented. The operations are assumed to be in the form of conjunctions, so each operation can be processed independently. In this model, at any instant, the relations at all relevant sites are completely reduced by the corresponding sets of all applicable operations (selections, semijoins and joins) simultaneously. Hence, each relation needs to be scanned only once to process all applicable operations, reducing I/O cost.
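Since the model reduces relations using selections, semijoins and joins, the following Python sketch shows the classical semijoin reduction that such models apply at each site: only the join-attribute values travel between sites, and the remote relation is reduced to the tuples that can possibly join. This is a textbook illustration, not the CRK Model's implementation:

# Classical semijoin reduction (textbook illustration, not the CRK Model's
# implementation): site A sends only the join-attribute values of R to site B,
# and B keeps only the tuples of S that can possibly join, cutting both the
# data shipped between sites and the I/O needed for the final join.

def semijoin(s_tuples, r_join_values, attr):
    """Return the tuples of S whose value on `attr` appears in R's values."""
    return [t for t in s_tuples if t[attr] in r_join_values]

if __name__ == "__main__":
    # R at site A; only its projection on the join attribute travels.
    r = [{"id": 1, "x": "a"}, {"id": 3, "x": "b"}]
    r_ids = {t["id"] for t in r}          # small message: {1, 3}

    # S at site B is reduced before any full tuples are shipped.
    s = [{"id": 1, "y": 10}, {"id": 2, "y": 20}, {"id": 3, "y": 30}]
    print(semijoin(s, r_ids, "id"))       # keeps the tuples with id 1 and 3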

