Online Scheduling Algorithm for Heterogeneous Distributed Machine Learning Jobs

Rachid Guerraoui was the first keynote speaker, and he got things off to a great start by discussing the broad relevance of our community's research to both industry and academia. He first argued that, in some sense, the very pervasiveness of distributed computing today could end up stifling progress in our community by inducing people to work on marginal problems and to become isolated. His first suggestion was to try to understand new ideas coming from applied fields and incorporate them into our research, an approach he argued has historically been very successful. He illustrated this point with the distributed payment problem, which arises in the context of blockchains, in particular Bitcoin, but turned out to be very interesting theoretically; moreover, the theoretical understanding of the problem inspired new practical protocols. He then discussed new directions in distributed computing, such as the COVID tracing problem, and new challenges in Byzantine-resilient distributed machine learning. Another source of innovation Rachid pointed to was hardware, which he illustrated with work studying the impact of RDMA-based primitives on fundamental problems in distributed computing. The talk concluded with a very lively discussion.


MODES: model-based optimization on distributed embedded systems

Machine Learning, 2021. DOI: 10.1007/s10994-021-06014-6

Author(s): Junjie Shi, Jiang Bian, Jakob Richter, Kuan-Hsun Chen, Jörg Rahnenführer, ...

Keyword(s): Machine Learning; Embedded Systems; Learning Model; Black Box; Distributed Embedded Systems; Data Set; Individual Model; Model Based; Machine Learning Model; Distributed Machine Learning

Abstract: The predictive performance of a machine learning model depends heavily on its hyper-parameter setting, so hyper-parameter tuning is often indispensable. Normally, such tuning requires the machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. In a distributed machine learning scenario, however, it is not always possible to collect all the data from all nodes, due to privacy concerns or storage limitations. Moreover, if data has to be transferred over low-bandwidth connections, the time available for tuning is reduced. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes MODES, a framework that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model on its local data, and the goal is to optimize the combined prediction accuracy. The framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of all individual models jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to parallelize the optimization efficiently in a distributed setting. We evaluate MODES by optimizing the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that MODES outperforms the baseline, i.e., tuning with MBO on each node individually using its local sub-data set, with improvements in mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability (both modes).
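The abstract describes MBO applied to the whole ensemble as one black box. The sketch below illustrates a generic MBO loop under that joint view. It is not the authors' implementation, and several pieces are stand-ins. The objective is a toy: `toy_node_accuracy` and `ensemble_objective` are hypothetical, and the mean node score stands in for real training and for the accuracy of the combined prediction. The surrogate is Nadaraya-Watson kernel regression, used in place of the Gaussian-process or random-forest surrogates common in MBO. The acquisition function is a simple distance-based exploration bonus rather than, for example, expected improvement.

```python
import math
import random


def toy_node_accuracy(h, node_optimum):
    """Hypothetical stand-in for training and validating one node's local model:
    the score peaks at 0.9 when hyper-parameter h equals the node's optimum."""
    return 0.9 * math.exp(-((h - node_optimum) ** 2))


def ensemble_objective(hparams, node_optima):
    """Joint black box (MODES-B-like view): one hyper-parameter per node goes in,
    one combined score comes out. The mean node score is only a toy proxy
    for the accuracy of the combined prediction."""
    scores = [toy_node_accuracy(h, o) for h, o in zip(hparams, node_optima)]
    return sum(scores) / len(scores)


def surrogate_predict(x, X, y, bandwidth=0.5):
    """Nadaraya-Watson kernel regression over the configurations evaluated so far."""
    weights = [math.exp(-math.dist(x, xi) ** 2 / (2 * bandwidth ** 2)) for xi in X]
    total = sum(weights)
    if total < 1e-12:  # far from every observation: fall back to the mean score
        return sum(y) / len(y)
    return sum(w * yi for w, yi in zip(weights, y)) / total


def acquisition(x, X, y, kappa=0.1):
    """Predicted score plus an exploration bonus that grows with the distance
    to the nearest configuration already evaluated."""
    nearest = min(math.dist(x, xi) for xi in X)
    return surrogate_predict(x, X, y) + kappa * nearest


def mbo(objective, dim, bounds, n_init=5, n_iter=30, n_candidates=300, seed=0):
    """Basic MBO loop: evaluate an initial design, then repeatedly query the
    surrogate, pick the candidate maximizing the acquisition, and evaluate it."""
    rng = random.Random(seed)
    lo, hi = bounds

    def sample():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    X = [sample() for _ in range(n_init)]  # random initial design
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        candidates = [sample() for _ in range(n_candidates)]
        # Optimize the cheap acquisition function by random search.
        x_next = max(candidates, key=lambda c: acquisition(c, X, y))
        X.append(x_next)
        y.append(objective(x_next))  # one expensive black-box evaluation
    best = max(range(len(y)), key=lambda i: y[i])
    return X[best], y[best]


if __name__ == "__main__":
    node_optima = [1.0, -0.5, 2.0]  # hypothetical per-node optima, unknown to the optimizer
    best_h, best_score = mbo(lambda hp: ensemble_objective(hp, node_optima),
                             dim=3, bounds=(-3.0, 3.0))
    print("best hyper-parameters:", [round(h, 2) for h in best_h])
    print("best combined score:", round(best_score, 3))
```

The sketch covers only the joint, MODES-B-like view, in which the search space has one dimension per node. MODES-I, as described above, instead treats every node's model as a copy of a single black box so that evaluations can be run on the nodes in parallel. This sketch does not attempt to reproduce that mode.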


Minimizing Training Time of Distributed Machine Learning by Reducing Data Communication

IEEE Transactions on Network Science and Engineering, 2021, pp. 1-1. DOI: 10.1109/tnse.2021.3073897

Author(s): Yubin Duan, Ning Wang, Jie Wu

Keyword(s): Machine Learning; Data Communication; Training Time; Distributed Machine Learning


SNAP: A Communication Efficient Distributed Machine Learning Framework for Edge Computing

2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), 2020. DOI: 10.1109/icdcs47774.2020.00072

Author(s): Yangming Zhao, Jingyuan Fan, Lu Su, Tongyu Song, Sheng Wang, ...

Keyword(s): Machine Learning; Edge Computing; Learning Framework; Distributed Machine Learning


A Scalable Smartwatch-Based Medication Intake Detection System Using Distributed Machine Learning

Journal of Medical Systems, Vol 44 (4), 2020. DOI: 10.1007/s10916-019-1518-8

Cited by ~6

Author(s): Donya Fozoonmayeh, Hai Vu Le, Ekaterina Wittfoth, Chong Geng, Natalie Ha, ...

Keyword(s): Machine Learning; Detection System; Medication Intake; Distributed Machine Learning


Insider Collusion Attack on Distributed Machine Learning System and its Solutions - A Case of SVM

Proceedings of the 7th ACM Workshop on ASIA Public-Key Cryptography, 2020. DOI: 10.1145/3384940.3390638

Author(s): Peter Shaojui Wang

Keyword(s): Machine Learning; Learning System; Collusion Attack; Distributed Machine Learning


A simulation-driven online scheduling algorithm for the maintenance and operation of wind farm systems

SIMULATION, 2021, pp. 003754972110286. DOI: 10.1177/00375497211028605

Author(s): Eduardo Pérez

Keyword(s): Wind Turbines; Wind Farm; Computational Study; Scheduling Algorithm; Discrete Event; Wind Farms; Online Scheduling; Maintenance Scheduling; Lead Times; System Specification

Wind turbines experience stochastic loading due to seasonal variations in wind speed and direction. These harsh operating conditions lead to wind turbine failures that are difficult to predict, which makes it challenging to schedule maintenance actions that avoid failures. This article derives a simulation-driven online maintenance scheduling algorithm for wind farm operational planning. Online scheduling is a suitable framework for this problem because it integrates data that evolve over time into the maintenance scheduling decisions. The computational study presented in this article compares the performance of the simulation-driven online scheduling algorithm against two benchmark algorithms commonly used in practice: scheduled maintenance and condition-based monitoring maintenance. An existing Discrete Event System Specification (DEVS) simulation model was used to test the proposed algorithm and study its benefits. The computational study demonstrates the importance of avoiding over-simplistic assumptions when making maintenance decisions for wind farms. For instance, most of the literature assumes that maintenance lead times are constant; the computational results show that allowing lead times to be adjusted in an online fashion improves wind farm performance in terms of the number of turbine failures, availability capacity, and power generation.


Online Scheduling Algorithm for Heterogeneous Distributed Machine Learning Jobs

Toward Efficient Online Scheduling for Distributed Machine Learning Systems

Online scheduling of heterogeneous distributed machine learning jobs

Joint Data Collection and Resource Allocation for Distributed Machine Learning at the Edge

PODC 2020 Review

MODES: model-based optimization on distributed embedded systems

Minimizing Training Time of Distributed Machine Learning by Reducing Data Communication

SNAP: A Communication Efficient Distributed Machine Learning Framework for Edge Computing

A Scalable Smartwatch-Based Medication Intake Detection System Using Distributed Machine Learning

Insider Collusion Attack on Distributed Machine Learning System and its Solutions - A Case of SVM

A simulation-driven online scheduling algorithm for the maintenance and operation of wind farm systems
