A Multi-Agent Reinforcement Learning Framework for Lithium-ion Battery Scheduling Problems

This paper presents a reinforcement learning framework for solving battery scheduling problems in order to extend the lifetime of batteries used in electric vehicles (EVs), cellular phones, and embedded systems. Battery pack lifetime is often the limiting factor in today’s smart systems, from mobile devices and wireless sensor networks to EVs. Smart charge-discharge scheduling of battery packs is essential to obtain superlinear gains in overall system lifetime, owing to the recovery effect and the nonlinearity of battery characteristics. Smart scheduling has also been shown to help optimize the system’s thermal profile and reduce the chance of irreversible battery damage. A rapidly growing community and development infrastructure have recently added deep reinforcement learning (DRL) to the tools available for designing battery management systems. By leveraging the representational power of deep neural networks and the flexibility and versatility of reinforcement learning, DRL offers a powerful solution for both roofline analysis and real-world deployment in complicated use cases. This work presents a DRL-based framework for battery scheduling problems that is flexible enough to fit various battery models and application scenarios. In discussing the framework, the paper also compares conventional heuristics-based methods with DRL. The experiments demonstrate that the DRL-based scheduling framework achieves battery lifetime comparable to the best weighted-k round-robin (kRR) heuristic scheduling algorithm, while offering much greater flexibility in accommodating a wide range of battery models and use cases, including thermal control and imbalanced batteries.
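The recovery effect mentioned in this abstract can be illustrated with a small simulation. The sketch below assumes the two-well kinetic battery model (KiBaM), a common abstraction in which charge drawn from an "available" well is slowly replenished from a "bound" well. The abstract does not say which battery model the paper uses, so the model choice, the function names, and all parameter values here are illustrative assumptions.

```python
def kibam_step(y1, y2, current, dt, c=0.5, k=0.05):
    """Advance a two-well kinetic battery model by one explicit Euler step.

    y1 is the available charge, y2 the bound charge (same units).
    c is the fraction of capacity held in the available well and k the
    rate constant for flow between wells (both hypothetical values).
    """
    h1 = y1 / c            # "height" of the available well
    h2 = y2 / (1.0 - c)    # "height" of the bound well
    flow = k * (h2 - h1)   # charge moving from bound to available
    return y1 + (flow - current) * dt, y2 - flow * dt


def simulate(y1, y2, currents, dt=0.1):
    """Apply a sequence of load currents, one per time step."""
    for current in currents:
        y1, y2 = kibam_step(y1, y2, current, dt)
    return y1, y2


# Discharge at a constant load for 100 steps, then rest for 100 steps.
y1_loaded, y2_loaded = simulate(50.0, 50.0, [2.0] * 100)
y1_rested, y2_rested = simulate(y1_loaded, y2_loaded, [0.0] * 100)

print("available charge after load:", round(y1_loaded, 3))
print("available charge after rest:", round(y1_rested, 3))
```

During the rest period the total charge stays the same, but the available charge rises again. A scheduler that rotates load across cells can exploit this behavior by giving each cell idle time.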


A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

Computational Intelligence and Neuroscience ◽

10.1155/2011/869348 ◽

2011 ◽

Vol 2011 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Karim El-Laithy ◽

Martin Bogdan

Keyword(s):

Reinforcement Learning ◽

Spike Timing ◽

Neural Representation ◽

Model Parameters ◽

Learning Framework ◽

Reference Target ◽

Wide Range ◽

Spiking Network ◽

Dynamic Synapses ◽

Exclusive Or

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters that regulate the synaptic response, rather than the synaptic weights. The update uses both the value and the sign of the temporal difference in the reward signal after each trial. Using this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target spike train. Results show that the network is able to capture the required dynamics and that the proposed framework does indeed yield an integrated version of Hebbian learning and RL. The proposed framework is tractable and less computationally expensive. It is applicable to a wide class of synaptic models and is not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal processing.
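A distance-based reward of the kind described in this abstract might look like the sketch below. The abstract does not name the distance measure, so the sketch assumes a van Rossum-style distance, in which each spike train is smoothed with an exponential kernel and the squared difference is integrated. The function names, the kernel time constant `tau`, and the bin width `dt` are all hypothetical.

```python
import math


def filtered(spikes, t_end, dt=1.0, tau=10.0):
    """Smooth a spike train (list of spike times) with a causal exponential kernel."""
    counts = {}
    for t in spikes:
        b = int(t / dt)
        counts[b] = counts.get(b, 0) + 1
    decay = math.exp(-dt / tau)
    trace, level = [], 0.0
    for i in range(int(t_end / dt)):
        level = level * decay + counts.get(i, 0)
        trace.append(level)
    return trace


def spike_distance(a, b, t_end, dt=1.0, tau=10.0):
    """Van Rossum-style distance between two spike trains (assumed metric)."""
    fa = filtered(a, t_end, dt, tau)
    fb = filtered(b, t_end, dt, tau)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) * dt / tau)


def reward(output, target, t_end):
    """Higher reward for output trains closer to the reference target."""
    return -spike_distance(output, target, t_end)


def reward_change(r_now, r_prev):
    """Magnitude and sign of the trial-to-trial change in reward."""
    delta = r_now - r_prev
    return abs(delta), (delta > 0) - (delta < 0)


target = [20.0, 50.0]
r_prev = reward([40.0, 50.0], target, t_end=100.0)  # earlier, poorer trial
r_now = reward([22.0, 50.0], target, t_end=100.0)   # later, closer trial
magnitude, sign = reward_change(r_now, r_prev)
print(magnitude, sign)
```

The magnitude and sign returned by `reward_change` correspond to the two quantities the abstract says drive the parameter update. The update rule itself is not specified in the abstract and is not sketched here.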


Predicting Human Mobility with Reinforcement-Learning-Based Long-Term Periodicity Modeling

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3469860 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-23

Author(s):

Shuo Tao ◽

Jingang Jiang ◽

Defu Lian ◽

Kai Zheng ◽

Enhong Chen

Keyword(s):

Reinforcement Learning ◽

Human Mobility ◽

Recurrent Network ◽

Mobility Prediction ◽

Learning Framework ◽

Temporal Features ◽

Wide Range ◽

Spatio Temporal ◽

Historical Trajectory

Mobility prediction plays an important role in a wide range of location-based applications and services. However, the existing literature has three problems: (1) explicit high-order interactions of spatio-temporal features are not systematically modeled; (2) most existing algorithms place attention mechanisms on top of a recurrent network, so they cannot be fully parallelized and are inferior to self-attention at capturing long-range dependence; (3) most work does not make good use of long-term historical information and does not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model that predicts each user’s next destination from her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and the historical trajectory, MoveNet can capture the user’s long-term regularity more efficiently. To model long-term periodicity more effectively, we add a reinforcement learning layer to MoveNet and name the result RLMoveNet. RLMoveNet treats human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as a regularizer that drives the model to attend to behavior with periodic actions, which makes the algorithm more effective. We evaluate both models on three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in accuracy while achieving faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which indicates that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.
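The self-attention operation this abstract relies on can be sketched in a few lines. The example below is a generic scaled dot-product self-attention with identity projections (queries, keys, and values are all the raw inputs). It is not MoveNet's architecture, and the two-dimensional "visit embeddings" are made-up values for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of equal-length embedding vectors, one per visit.
    Returns the attended outputs and the attention weight matrix.
    """
    d = len(x[0])
    scores = [
        [sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in x]
        for qi in x
    ]
    weights = [softmax(row) for row in scores]
    out = [
        [sum(w * v[col] for w, v in zip(wrow, x)) for col in range(d)]
        for wrow in weights
    ]
    return out, weights


# Three hypothetical visit embeddings; the first and third are identical.
visits = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out, weights = self_attention(visits)
for row in weights:
    print([round(w, 3) for w in row])
```

Every position attends to every other position in a single step, with no recurrence. This is why such models can be parallelized across the sequence and can relate distant visits directly.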


IEEE 802.11-Enabled Wake-Up Radio: Use Cases and Applications

Sensors ◽

10.3390/s20010066 ◽

2019 ◽

Vol 20 (1) ◽

pp. 66 ◽

Cited By ~ 1

Author(s):

Elena Lopez-Aguilera ◽

Ilker Demirkol ◽

Eduard Garcia-Villegas ◽

Josep Paradells

Keyword(s):

Low Power ◽

IEEE 802.11 ◽

Energy Savings ◽

Operation Mode ◽

Data Communication ◽

Use Cases ◽

High Rate ◽

Handheld Devices ◽

Battery Lifetime ◽

Wide Range

IEEE 802.11 is one of the most commonly used radio access technologies, being present in almost all handheld devices with networking capabilities. However, its energy-hungry communication modes make it challenging to increase the battery lifetime of such devices and are an obstacle to its use in battery-constrained devices such as the ones defined by many Internet of Things applications. Wake-up Radio (WuR) systems have appeared as a solution for increasing the energy efficiency of communication technologies by employing a secondary low-power radio interface, which is always in the active state and switches the primary transceiver (used for main data communication) from the energy-saving to the active operation mode. The high market penetration of IEEE 802.11 technology, together with the benefits that WuR systems can bring to this widespread technology, motivates this article’s focus on IEEE 802.11-based WuR solutions. More specifically, we elaborate on the feasibility of such IEEE 802.11-based WuR solutions and introduce the latest standardization effort in this domain, IEEE 802.11ba, a forthcoming IEEE 802.11 amendment, discussing its main features and potential use cases. As a use case for a green Wi-Fi application, we provide a proof-of-concept smart plug system implemented by a WuR that is activated remotely using IEEE 802.11 devices, evaluate its monetary and energy savings, and compare it with commercially available smart plug solutions. Finally, we discuss novel applications beyond the wake-up functionality that IEEE 802.11-enabled WuR devices can offer using a secondary radio, as well as applications that have not yet been considered by IEEE 802.11ba. As a result, we argue that the IEEE 802.11-based WuR solution will support a wide range of devices and deployments, for both low-rate and low-power communications, as well as high-rate transmissions.


Shaking Lithium‐Ion Cells on a Rocker – Temperature‐Dependent Influence of Mechanical Movement on Lithium‐Ion Battery Lifetime

Chemie Ingenieur Technik ◽

10.1002/cite.201800218 ◽

2021 ◽

Author(s):

Elisabeth Maria Boerger ◽

Felix Gottschalk ◽

Laura Drescher ◽

Alexander Boerger

Keyword(s):

Lithium Ion Battery ◽

Lithium Ion ◽

Temperature Dependent ◽

Battery Lifetime ◽

Lithium Ion Cells ◽

Mechanical Movement


Hierarchical Reinforcement Learning Framework for Secure UAV Communication in the Presence of Multiple UAV Adaptive Eavesdroppers

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344970 ◽

2020 ◽

Author(s):

Liu Jue ◽

Yang Weiwei

Keyword(s):

Reinforcement Learning ◽

Hierarchical Reinforcement Learning ◽

Learning Framework


A multi-shop integrated scheduling algorithm with fixed output constraint

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189721 ◽

2021 ◽

pp. 1-9

Author(s):

Yingchun Xia ◽

Zhiqiang Xie ◽

Yu Xin ◽

Xiaowei Zhang

Keyword(s):

Constraint Programming ◽

Programming Model ◽

Scheduling Algorithm ◽

Processing Parameters ◽

Scheduling Problem ◽

Integrated Scheduling ◽

Scheduling Scheme ◽

Wide Range ◽

Constraint Programming Model ◽

Output Constraint

Customized products such as electromechanical prototypes have research and trial-manufacturing characteristics. Their BOM structures and processing parameters vary greatly, making it difficult for a single shop to meet such a wide range of processing parameters. Given the dynamic and fuzzy manufacturing characteristics of these products, scheduling must consider not only the coordinated transport time between multiple shops but also the fact that each product has a designated output shop. To solve this Multi-shop Integrated Scheduling Problem with Fixed Output Constraint (MISP-FOC), a constraint programming model is developed to minimize the total tardiness, and a Multi-shop Integrated Scheduling Algorithm (MISA) based on EGA (Enhanced Genetic Algorithm) and B&B (Branch and Bound) is proposed. MISA is a hybrid optimization method consisting of four parts. First, to deal with the dynamic and fuzzy manufacturing characteristics, the dynamic production process is transformed into a series of time-continuous static scheduling problems according to the proposed dynamic rescheduling mechanism. Second, a pre-scheduling scheme is generated by the EGA at each event moment. Third, the jobs in the pre-scheduling scheme are divided into three groups: dispatched jobs, jobs to be dispatched, and jobs available for rescheduling. Finally, the B&B method is used to optimize the jobs available for rescheduling during the period when the dispatched jobs are in execution. Google OR-Tools is used to verify the proposed constraint programming model, and the experimental results show that the proposed algorithm is effective and feasible.
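The total-tardiness objective named in this abstract can be made concrete with a toy example. The sketch below evaluates tardiness for a job sequence on a single machine and finds the best sequence by brute force. It deliberately ignores multiple shops, transport times, and the fixed output constraint, and it is not the MISA or the paper's constraint programming model. The job data are invented.

```python
from itertools import permutations


def total_tardiness(sequence, processing, due):
    """Sum of max(0, completion time - due date) over jobs run in order."""
    clock, total = 0, 0
    for job in sequence:
        clock += processing[job]
        total += max(0, clock - due[job])
    return total


def best_sequence(processing, due):
    """Exhaustively search all job orders (only viable for tiny instances)."""
    jobs = range(len(processing))
    return min(
        permutations(jobs),
        key=lambda seq: total_tardiness(seq, processing, due),
    )


# Hypothetical instance: processing times and due dates for three jobs.
processing = [3, 2, 4]
due = [4, 2, 10]

best = best_sequence(processing, due)
print(best, total_tardiness(best, processing, due))  # (1, 0, 2) 1
```

Exhaustive search grows factorially with the number of jobs, which is why larger instances call for the kind of metaheuristic and branch-and-bound combination the abstract describes.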


Goal-driven active learning

Autonomous Agents and Multi-Agent Systems ◽

10.1007/s10458-021-09527-5 ◽

2021 ◽

Vol 35 (2) ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Process ◽

Real World ◽

Imitation Learning ◽

Learning Approaches ◽

Wide Range ◽

Fixed Set ◽

Complex Decision Making ◽

Complex Decision

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch, that is, when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
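One simple way to prioritize goals by expert-policy disagreement is to sample each goal with probability proportional to how far the policy's action is from the expert's. The sketch below assumes that reading. The abstract does not give the actual prioritization rule, so the Euclidean disagreement measure, the proportional sampling, and all names and data are hypothetical.

```python
import random


def disagreement(expert_action, policy_action):
    """Euclidean distance between expert and policy actions (assumed measure)."""
    return sum((e - p) ** 2 for e, p in zip(expert_action, policy_action)) ** 0.5


def sample_goal(goals, expert_actions, policy_actions, rng):
    """Sample a goal with probability proportional to its disagreement."""
    weights = [
        disagreement(e, p) for e, p in zip(expert_actions, policy_actions)
    ]
    if sum(weights) == 0:
        # No disagreement anywhere: fall back to uniform sampling.
        return rng.choice(goals)
    return rng.choices(goals, weights=weights, k=1)[0]


# Hypothetical goals with one expert and one policy action each.
goals = ["g0", "g1", "g2"]
expert_actions = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
policy_actions = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

rng = random.Random(0)
samples = [sample_goal(goals, expert_actions, policy_actions, rng) for _ in range(20)]
print(set(samples))
```

In this toy setup the policy already matches the expert on `g0` and `g2`, so every sample lands on `g1`, the only goal where further demonstrations would be informative.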


Adaptive Reinforcement Learning Framework for NOMA-UAV Networks

IEEE Communications Letters ◽

10.1109/lcomm.2021.3093385 ◽

2021 ◽

pp. 1-1

Author(s):

Syed Khurram Mahmud ◽

Yuanwei Liu ◽

Yue Chen ◽

Kok Keong Chai

Keyword(s):

Reinforcement Learning ◽

Learning Framework


A Reinforcement Learning Iterated Local Search for Makespan Minimization in Additive Manufacturing Machine Scheduling Problems

Computers & Operations Research ◽

10.1016/j.cor.2021.105272 ◽

2021 ◽

pp. 105272

Author(s):

Mirko Alicastro ◽

Daniele Ferone ◽

Paola Festa ◽

Serena Fugaro ◽

Tommaso Pastore

Keyword(s):

Reinforcement Learning ◽

Additive Manufacturing ◽

Local Search ◽

Machine Scheduling ◽

Iterated Local Search ◽

Scheduling Problems ◽

Makespan Minimization ◽

Machine Scheduling Problems ◽

Manufacturing Machine


Prototype Design and Experimental Evaluation of Autonomous Collaborative Communication System for Emerging Maritime Use Cases

Sensors ◽

10.3390/s21113871 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3871

Author(s):

Jiri Pokorny ◽

Khanh Ma ◽

Salwa Saafi ◽

Jakub Frolka ◽

Jose Villa ◽

...

Keyword(s):

Communication System ◽

Autonomous Vehicles ◽

High Speed ◽

High Capacity ◽

Workplace Safety ◽

Unmanned Vehicles ◽

Use Cases ◽

Automated Systems ◽

Collaborative Communication ◽

Wide Range

Automated systems have been seamlessly integrated into several industries as part of their industrial automation processes. Employing automated systems, such as autonomous vehicles, allows industries to increase productivity, benefit from a wide range of technologies and capabilities, and improve workplace safety. So far, most existing systems use only one type of autonomous vehicle. In this work, we propose a collaboration of different types of unmanned vehicles in maritime offshore scenarios. By providing high capacity, extended coverage, and better quality of service, autonomous collaborative systems can enable emerging maritime use cases, such as remote monitoring and navigation assistance. Motivated by these potential benefits, we propose the deployment of an Unmanned Surface Vehicle (USV) and an Unmanned Aerial Vehicle (UAV) in an autonomous collaborative communication system. Specifically, we design high-speed, directional communication links between a terrestrial control station and the two unmanned vehicles. Using measurement and simulation results, we evaluate the performance of the designed links in different communication scenarios and show the benefits of employing multiple autonomous vehicles in the proposed communication system.
