A Multi-Agent Reinforcement Learning Framework for Lithium-ion Battery Scheduling Problems

Energies ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 1982
Author(s):  
Yu Sui ◽  
Shiming Song

This paper presents a reinforcement learning framework for solving battery scheduling problems in order to extend the lifetime of batteries used in electric vehicles (EVs), cellular phones, and embedded systems. Battery pack lifetime is often the limiting factor in many of today’s smart systems, from mobile devices and wireless sensor networks to EVs. Smart charge-discharge scheduling of battery packs is essential to obtain superlinear gains in overall system lifetime, due to the recovery effect and nonlinearity in the battery characteristics. Additionally, smart scheduling has been shown to be beneficial for optimizing the system’s thermal profile and minimizing the chance of irreversible battery damage. A rapidly growing community and development infrastructure have recently added deep reinforcement learning (DRL) to the available tools for designing battery management systems. By leveraging the representational power of deep neural networks and the flexibility and versatility of reinforcement learning, DRL offers a powerful solution for both roofline analysis and real-world deployment in complicated use cases. This work presents a DRL-based battery scheduling framework with high flexibility to fit various battery models and application scenarios. Through the discussion of this framework, conventional heuristics-based methods are also compared with DRL. The experiments demonstrate that the DRL-based scheduling framework achieves battery lifetime comparable to the best weighted-k round-robin (kRR) heuristic scheduling algorithm. At the same time, the framework offers much greater flexibility in accommodating a wide range of battery models and use cases, including thermal control and imbalanced batteries.
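The abstract does not define the kRR heuristic, so the following is only a minimal sketch of one plausible reading: at each time step the batteries that can still serve the load are ranked by remaining charge, and the scheduler cycles round-robin over the top k of them. The function name `krr_schedule`, the constant per-step `load`, and the linear discharge model are all assumptions made for illustration; the recovery effect and nonlinear battery behavior discussed in the abstract are deliberately left out.

```python
from typing import List


def krr_schedule(charges: List[float], load: float, k: int, steps: int) -> List[int]:
    """Return the index of the battery chosen at each step (hypothetical kRR variant)."""
    remaining = list(charges)  # work on a copy so the caller's list is untouched
    chosen: List[int] = []
    for t in range(steps):
        # Only batteries with enough charge to serve one load unit are eligible.
        alive = [i for i, c in enumerate(remaining) if c >= load]
        if not alive:
            break  # the pack is exhausted
        # Rank eligible batteries by remaining charge (highest first, ties by index)
        # and keep the best k of them.
        top_k = sorted(alive, key=lambda i: (-remaining[i], i))[:k]
        # Rotate over the top-k set in round-robin fashion.
        pick = top_k[t % len(top_k)]
        remaining[pick] -= load  # simplistic linear discharge (assumption)
        chosen.append(pick)
    return chosen


schedule = krr_schedule([3.0, 2.0, 1.0], load=1.0, k=2, steps=10)
print(schedule)
```

In this toy model every policy drains the same total charge, so lifetime differences between schedulers would only appear once a battery model with recovery or rate-dependent capacity is substituted for the linear discharge step.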

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed reveal an integrated version of Hebbian and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the used neural representation. This generality, along with the reported results, supports adopting the introduced approach to benefit from the biologically plausible synaptic models in a wide range of intuitive signal processing.
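As a rough illustration of the reward signal described above, the sketch below computes a reward from the distance between an output spike train and a target, then takes the trial-to-trial difference whose value and sign would drive the parameter update. The abstract does not name the distance measure, so the Hamming distance over binned binary spike trains is an assumption, as are the function names `spike_distance` and `reward_differences`.

```python
from typing import List


def spike_distance(output: List[int], target: List[int]) -> int:
    """Hamming distance between two binned binary spike trains (assumed metric)."""
    if len(output) != len(target):
        raise ValueError("spike trains must have the same number of bins")
    return sum(o != t for o, t in zip(output, target))


def reward_differences(trials: List[List[int]], target: List[int]) -> List[int]:
    """Reward per trial is the negative distance; return the change between trials."""
    rewards = [-spike_distance(output, target) for output in trials]
    # A positive difference means the latest trial moved closer to the target.
    return [later - earlier for earlier, later in zip(rewards, rewards[1:])]


target = [1, 0, 1, 0]
trials = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0]]
print(reward_differences(trials, target))
```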


2021 ◽  
Vol 12 (6) ◽  
pp. 1-23
Author(s):  
Shuo Tao ◽  
Jingang Jiang ◽  
Defu Lian ◽  
Kai Zheng ◽  
Enhong Chen

Mobility prediction plays an important role in a wide range of location-based applications and services. However, there are three problems in the existing literature: (1) explicit high-order interactions of spatio-temporal features are not systematically modeled; (2) most existing algorithms place attention mechanisms on top of recurrent networks, so they cannot allow for full parallelism and are inferior to self-attention for capturing long-range dependence; (3) most of the literature does not make good use of long-term historical information and does not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model, predicting each user’s next destination based on her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and historical trajectory, MoveNet can use an attention mechanism to capture the user’s long-term regularity in a more efficient way. Based on MoveNet, to model long-term periodicity more effectively, we add a reinforcement learning layer and name the resulting model RLMoveNet. RLMoveNet regards human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as a regularization component that drives the model to pay attention to behavior with periodic actions, which makes the algorithm more effective. We evaluate both models on three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, and simultaneously achieves faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which suggests that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.
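To make the contrast with recurrent models concrete, here is a minimal sketch of generic scaled dot-product self-attention over a sequence of visit embeddings. It is not MoveNet's architecture: the learned query, key, and value projections are replaced by the identity for brevity, and the function names and toy embeddings are assumptions. Each output position is computed independently of the others, which is what permits the full parallelism mentioned in the abstract.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with identity projections (simplification)."""
    dim = len(seq[0])
    outputs = []
    for query in seq:
        # Similarity of this position to every position in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, seq)) for j in range(dim)]
        )
    return outputs


visits = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy visit embeddings
attended = self_attention(visits)
print(attended)
```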


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 66 ◽  
Author(s):  
Elena Lopez-Aguilera ◽  
Ilker Demirkol ◽  
Eduard Garcia-Villegas ◽  
Josep Paradells

IEEE 802.11 is one of the most commonly used radio access technologies, being present in almost all handheld devices with networking capabilities. However, its energy-hungry communication modes make it difficult to increase the battery lifetime of such devices and are an obstacle to its use in battery-constrained devices such as the ones defined by many Internet of Things applications. Wake-up Radio (WuR) systems have appeared as a solution for increasing the energy efficiency of communication technologies by employing a secondary low-power radio interface, which is always in the active state and switches the primary transceiver (used for main data communication) from the energy-saving to the active operation mode. The high market penetration of IEEE 802.11 technology, together with the benefits that WuR systems can bring to this widespread technology, motivates this article’s focus on IEEE 802.11-based WuR solutions. More specifically, we elaborate on the feasibility of such IEEE 802.11-based WuR solutions and introduce the latest standardization effort in this domain, IEEE 802.11ba, a forthcoming IEEE 802.11 amendment, discussing its main features and potential use cases. As a green Wi-Fi use case, we provide a proof-of-concept smart plug system implemented with a WuR that is activated remotely using IEEE 802.11 devices, evaluate its monetary and energy savings, and compare it with commercially available smart plug solutions. Finally, we discuss novel applications beyond the wake-up functionality that IEEE 802.11-enabled WuR devices can offer using a secondary radio, as well as applications that have not yet been considered by IEEE 802.11ba. As a result, we argue that the IEEE 802.11-based WuR solution will support a wide range of devices and deployments, for both low-rate and low-power communications, as well as high-rate transmissions.


Author(s):  
Yingchun Xia ◽  
Zhiqiang Xie ◽  
Yu Xin ◽  
Xiaowei Zhang

Customized products such as electromechanical prototype products are a type of product with research and trial manufacturing characteristics. The BOM structures and processing parameters of these products vary greatly, making it difficult for a single shop to meet such a wide range of processing parameters. Given the dynamic and fuzzy manufacturing characteristics of the products, both the coordinated transport time between multiple shops and the fact that each product has a designated output shop should be considered. In order to solve this Multi-shop Integrated Scheduling Problem with Fixed Output Constraint (MISP-FOC), a constraint programming model is developed to minimize the total tardiness, and then a Multi-shop Integrated Scheduling Algorithm (MISA) based on EGA (Enhanced Genetic Algorithm) and B&B (Branch and Bound) is proposed. MISA is a hybrid optimization method and consists of four parts. Firstly, to deal with the dynamic and fuzzy manufacturing characteristics, the dynamic production process is transformed into a series of time-continuous static scheduling problems according to the proposed dynamic rescheduling mechanism. Secondly, the pre-scheduling scheme is generated by the EGA at each event moment. Thirdly, the jobs in the pre-scheduling scheme are divided into three parts, namely, dispatched jobs, jobs to be dispatched, and jobs available for rescheduling. Finally, the B&B method is used to optimize the jobs available for rescheduling by utilizing the period when the dispatched jobs are in execution. Google OR-Tools is used to verify the proposed constraint programming model, and the experimental results show that the proposed algorithm is effective and feasible.


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impractical due to lack of state coverage or distribution mismatch, which arises when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
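The goal-prioritization idea can be sketched as follows, under stated assumptions: disagreement is measured here as the mean squared difference between expert and policy actions on a shared set of states per goal, and goals are simply ranked by that score. The abstract does not specify the disagreement measure or the sampling rule, so `disagreement`, `prioritize_goals`, and the toy data are hypothetical.

```python
from typing import Dict, List


def disagreement(expert_actions: List[float], policy_actions: List[float]) -> float:
    """Mean squared difference between expert and policy actions (assumed measure)."""
    pairs = list(zip(expert_actions, policy_actions))
    return sum((e - p) ** 2 for e, p in pairs) / len(pairs)


def prioritize_goals(
    expert: Dict[str, List[float]], policy: Dict[str, List[float]]
) -> List[str]:
    """Order goals so that those with the largest expert-policy disagreement come first."""
    scores = {goal: disagreement(expert[goal], policy[goal]) for goal in expert}
    return sorted(scores, key=lambda goal: scores[goal], reverse=True)


# Toy scalar actions recorded on the same states for three hypothetical goals.
expert = {"a": [1.0, 1.0], "b": [0.0, 0.0], "c": [2.0, 2.0]}
policy = {"a": [1.0, 1.0], "b": [1.0, 1.0], "c": [0.0, 0.0]}
print(prioritize_goals(expert, policy))
```

A stochastic variant could sample goals with probability proportional to these scores instead of sorting them, which would keep some exploration on goals the policy already handles well.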


2021 ◽  
pp. 1-1
Author(s):  
Syed Khurram Mahmud ◽  
Yuanwei Liu ◽  
Yue Chen ◽  
Kok Keong Chai

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3871
Author(s):  
Jiri Pokorny ◽  
Khanh Ma ◽  
Salwa Saafi ◽  
Jakub Frolka ◽  
Jose Villa ◽  
...  

Automated systems have been seamlessly integrated into several industries as part of their industrial automation processes. Employing automated systems, such as autonomous vehicles, allows industries to increase productivity, benefit from a wide range of technologies and capabilities, and improve workplace safety. So far, most of the existing systems consider utilizing one type of autonomous vehicle. In this work, we propose a collaboration of different types of unmanned vehicles in maritime offshore scenarios. Providing high capacity, extended coverage, and better quality of services, autonomous collaborative systems can enable emerging maritime use cases, such as remote monitoring and navigation assistance. Motivated by these potential benefits, we propose the deployment of an Unmanned Surface Vehicle (USV) and an Unmanned Aerial Vehicle (UAV) in an autonomous collaborative communication system. Specifically, we design high-speed, directional communication links between a terrestrial control station and the two unmanned vehicles. Using measurement and simulation results, we evaluate the performance of the designed links in different communication scenarios and we show the benefits of employing multiple autonomous vehicles in the proposed communication system.

