learning agent
Recently Published Documents

TOTAL DOCUMENTS: 292 (five years: 146)
H-INDEX: 14 (five years: 4)

2021 · Vol 12 (1) · pp. 384
Author(s): Seolwon Koo, Yujin Lim

In the Industrial Internet of Things (IIoT), diverse tasks are created dynamically because of small-quantity batch production. Hence, it is difficult to execute tasks only with devices that have limited battery lives and computation capabilities. To solve this problem, we adopt the mobile edge computing (MEC) paradigm. However, when numerous tasks must be processed on the MEC server (MECS), it may not be feasible to handle all of them on the server within the delay constraint, owing to the server's limited computational capability and high network overhead. Therefore, among cooperative computing techniques, we focus on task offloading to nearby devices using device-to-device (D2D) communication. Consequently, we propose a method that determines the optimal offloading strategy in an MEC environment with D2D communication. We aim to minimize the energy consumption of the devices and the task execution delay under given delay constraints. To solve this problem, we adopt a Q-learning algorithm, a form of reinforcement learning (RL). However, if a single learning agent decides whether to offload tasks for all devices, its computational complexity grows tremendously. Thus, we cluster the nearby devices that comprise the job shop, and each cluster's head determines the optimal offloading strategy for the tasks that occur within its cluster. Simulation results show that the proposed algorithm outperforms the compared methods in terms of device energy consumption, task completion rate, task blocking rate, and throughput.
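The cluster-head decision described above can be sketched as a small tabular Q-learning loop. The state discretization (queue-length buckets), the three-action set (local / D2D / MECS), and the reward and transition models below are invented toy values, not the paper's formulation:

```python
import random

# Toy tabular Q-learning sketch for a cluster head's offloading decision.
# States: discretized task-queue length; actions: 0=local, 1=D2D neighbor, 2=MEC server.
N_STATES, N_ACTIONS = 5, 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

def reward(state, action):
    # Hypothetical cost: local execution drains the battery,
    # MEC offloading suffers more delay when the queue is long.
    energy = [2.0, 0.5, 0.3][action]
    delay = [0.5, 1.0, 2.0][action] * (1 + state)
    return -(energy + delay)

def step(state, action):
    # The queue shrinks slightly faster when work is offloaded.
    nxt = state + random.choice([-1, 0, 1]) - (action > 0)
    return reward(state, action), max(0, min(N_STATES - 1, nxt))

random.seed(0)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
state = 0
for _ in range(20000):
    if random.random() < EPS:                       # epsilon-greedy exploration
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    r, nxt = step(state, action)
    # Standard Q-learning update toward the bootstrapped target.
    Q[state][action] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[state][action])
    state = nxt

policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)
```

Each cluster head would run such a loop over its own task arrivals, which is what keeps the per-agent state-action table small.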


2021
Author(s): Gabriel Borrageiro, Nick Firoozye, Paolo Barucca

We explore online inductive transfer learning, with a feature representation transfer from a radial basis function network formed of Gaussian mixture model hidden processing units to a direct, recurrent reinforcement learning agent. This agent is put to work in an experiment, trading the major spot market currency pairs, where we accurately account for transaction and funding costs. These sources of profit and loss, including the price trends that occur in the currency markets, are made available to the agent via a quadratic utility, and the agent learns to target a position directly. We improve upon earlier work by learning to target a risk position in an online transfer learning context. Our agent achieves an annualised portfolio information ratio of 0.52 with a compound return of 9.3%, net of execution and funding cost, over a 7-year test set; this is despite forcing the model to trade at the close of the trading day at 5 pm EST, when trading costs are statistically the most expensive.
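The idea of targeting a position directly through a quadratic utility can be illustrated with a toy calculation. The utility form, the position clipping to [-1, 1], and the trade-only-if-it-pays rule below are a simplified sketch under invented parameters, not the authors' exact model:

```python
# Quadratic utility of a position p given expected return mu, variance sigma2,
# risk aversion lam, and a linear transaction cost c on the position change.
def utility(p, mu, sigma2, lam, c, p_prev):
    return mu * p - 0.5 * lam * sigma2 * p ** 2 - c * abs(p - p_prev)

def target_position(mu, sigma2, lam, c, p_prev):
    # Unconstrained maximizer of the cost-free utility, clipped to a unit position.
    p_star = max(-1.0, min(1.0, mu / (lam * sigma2)))
    # Trade only if the utility at the new position (net of cost) beats holding.
    if utility(p_star, mu, sigma2, lam, c, p_prev) > utility(p_prev, mu, sigma2, lam, 0.0, p_prev):
        return p_star
    return p_prev
```

With a positive expected return and cheap trading the rule moves to the risk-scaled optimum; with a large cost it holds the previous position, which is the behavior that makes the 5 pm EST cost regime matter.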


2021
Author(s): Shree Sowndarya S. V., Jeffrey Law, Charles Tripp, Dmitry Duplyakin, Erotokritos Skordilis, ...

Advances in the field of goal-directed molecular optimization offer the promise to find feasible candidates for even the most challenging molecular design applications. However, several obstacles remain in applying these tools to practical problems, including lengthy computational or experimental evaluation, synthesizability considerations, and a vast potential search space. As an example of a fundamental design challenge with industrial relevance, we search for novel stable radical scaffolds for an aqueous redox flow battery that simultaneously satisfy redox requirements at the anode and cathode. To meet this challenge, we develop a new open-source molecular optimization framework based on AlphaZero coupled with a fast, machine learning-derived surrogate objective trained with nearly 100,000 quantum chemistry simulations. The objective function comprises two graph neural networks: one that predicts adiabatic oxidation and reduction potentials and a second that predicts electron density and local 3D environment, previously shown to be correlated with radical persistence and stability. With no hand-coded knowledge of organic chemistry, the reinforcement learning agent finds molecule candidates that satisfy a precise combination of redox, stability, and synthesizability requirements defined at the quantum chemistry level, many of which have reasonable predicted retrosynthetic pathways. The optimized molecules show that alternative stable radical scaffolds may offer a unique profile of stability and redox potentials to enable low-cost symmetric aqueous redox flow batteries.
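The composite objective described, which requires redox potentials inside target windows at both electrodes together with a stability score, might be sketched schematically as a scalar reward; the reward shape, window values, and multiplicative weighting below are invented for illustration and stand in for the paper's GNN surrogates:

```python
# Schematic reward for the radical-scaffold search: the product of how well
# the predicted oxidation and reduction potentials sit inside their target
# windows and a predicted stability score. All windows/penalties are toy values.
def redox_reward(e_ox, e_red, stability, ox_window=(0.5, 1.0), red_window=(-1.5, -1.0)):
    def in_window(x, lo, hi):
        # 1.0 inside the window, linearly decaying penalty outside it.
        if lo <= x <= hi:
            return 1.0
        return max(0.0, 1.0 - min(abs(x - lo), abs(x - hi)))
    return in_window(e_ox, *ox_window) * in_window(e_red, *red_window) * stability
```

A tree-search agent in the AlphaZero style would maximize such a terminal reward over molecule-building actions; the multiplicative form encodes that all requirements must be met simultaneously.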


2021 · Vol 10 · pp. 41-57
Author(s): Valentyna Yunchyk, Natalia Kunanets, Volodymyr Pasichnyk, Anatolii Fedoniuk, ...

The key terms and basic concepts of the agent are analyzed. A structured general classification of agents is given according to the representation of the external-environment model, the type of information processing, and the functions performed. The classification of artificial agents (intellectual, reflex, impulsive, trophic) is also analyzed. The necessary conditions for an agent to implement a certain behavior are given, as well as the scheme of functioning of the intelligent agent. The levels of knowledge that play a key role in the architecture of the agent are indicated. The functional diagram of a learning agent that works relatively independently, demonstrating flexible behavior, is presented. It is shown that the functional scheme of the reactive agent determines its dependence on the environment. The properties of the intelligent agent are described in detail and its block diagram is given. Various agent architectures, in particular neural network agent architectures, are considered. The organization of level interaction in the multilevel agent architecture is proposed. Considerable attention is paid to the Will architecture and the InteRRaP architecture of agents. A multilevel architecture for an autonomous agent based on a Turing machine is considered.
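The functional scheme of a learning agent mentioned above follows the standard textbook decomposition into a performance element, a critic, and a learning element. A minimal sketch of that loop (the component behaviors and the percept/action vocabulary are invented for illustration, not taken from the paper's diagram):

```python
# Minimal learning-agent loop: the performance element chooses actions,
# the critic scores feedback, and the learning element updates the rules.
class LearningAgent:
    def __init__(self):
        self.rules = {}                       # percept -> preferred action

    def performance_element(self, percept):
        return self.rules.get(percept, "explore")

    def critic(self, percept, action, feedback):
        return feedback                       # here: the environment's scalar reward

    def learning_element(self, percept, action, score):
        if score > 0:                         # keep actions the critic approves of
            self.rules[percept] = action

    def step(self, percept, action, feedback):
        score = self.critic(percept, action, feedback)
        self.learning_element(percept, action, score)
        return self.performance_element(percept)

agent = LearningAgent()
agent.step("obstacle", "turn_left", 1)        # positive feedback: rule is kept
print(agent.performance_element("obstacle"))  # prints "turn_left"
```

The reactive agent discussed in the paper would be this same loop with the learning element removed, which is exactly why its behavior stays fixed by the environment.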


2021 · Vol 2021 · pp. 1-13
Author(s): Andrei Bratu, Gabriela Czibula

Data augmentation is a commonly used technique in data science for improving the robustness and performance of machine learning models. The purpose of this paper is to study the feasibility of generating synthetic data points of a temporal nature towards this end. A general approach named DAuGAN (Data Augmentation using Generative Adversarial Networks) is presented for identifying poorly represented sections of a time series, synthesizing and integrating new data points, and measuring the performance improvement on a benchmark machine learning model. The problem is studied and applied in the domain of algorithmic trading, whose constraints are presented and taken into consideration. The experimental results highlight an improvement in the performance of a benchmark reinforcement learning agent trained to trade a financial instrument on a dataset enhanced with DAuGAN.
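The pipeline is described as (1) locating poorly represented sections of the series and (2) synthesizing points for them. A schematic sketch of step (1), with a placeholder generator (a Gaussian bootstrap, not a GAN) standing in for step (2); the binning heuristic and all thresholds are assumptions, not DAuGAN's actual method:

```python
import random
import statistics

# Step 1: histogram the series and flag bins holding well below the mean count.
def underrepresented_bins(series, n_bins=5):
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for x in series:
        counts[min(n_bins - 1, int((x - lo) / width))] += 1
    mean_count = sum(counts) / n_bins
    return [i for i, c in enumerate(counts) if c < 0.5 * mean_count]

# Step 2 (placeholder): draw synthetic points near the sparse bin's centre.
def synthesize(series, bin_idx, n_bins=5, k=10):
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    centre = lo + (bin_idx + 0.5) * width
    sd = statistics.stdev(series) * 0.1
    return [random.gauss(centre, sd) for _ in range(k)]
```

In the full approach the generator would be a GAN trained on the sparse regime, so the synthetic points inherit the temporal structure rather than just the marginal distribution.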


2021 · Vol 2050 (1) · pp. 012012
Author(s): Yifei Shen, Tian Liu, Wenke Liu, Ruiqing Xu, Zhuo Li, ...

Recommending stocks is very important for investment companies and investors. However, without enough analysts, no stock selection strategy can capture the dynamics of all S&P 500 stocks. Most existing recommendation strategies are based on predictive models that buy and hold stocks with high return potential, but these strategies fail to recommend stocks from different industrial sectors to reduce risk. In this article, we propose a novel solution that recommends a stock portfolio from the S&P 500 index with reinforcement learning. Our basic idea is to construct a stock relation graph (RG), which provides rich relations among stocks and industrial sectors, to generate diversified recommendation results. To this end, we design a new method to explore high-quality stocks from the constructed relation graph with reinforcement learning. Specifically, the reinforcement learning agent jumps across industrial sectors, selecting stocks based on feedback signals from the market. Finally, we apply portfolio allocation methods (i.e., mean-variance and minimum-variance) to test the validity of the recommendations. The empirical results show that portfolio allocation based on the selected stocks outperforms the long-term strategy on the S&P 500 index in terms of cumulative returns.
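The final allocation step mentioned above, minimum-variance weighting of the selected stocks, has a simple closed form in the two-asset case (weights proportional to the row sums of the inverse covariance matrix). A toy sketch with invented covariance values, not the paper's data:

```python
# Two-asset minimum-variance weights: w proportional to (inverse covariance) * 1,
# then normalized so the weights sum to 1.
def min_variance_weights(var_a, var_b, cov_ab):
    inv_det = 1.0 / (var_a * var_b - cov_ab ** 2)
    wa = inv_det * (var_b - cov_ab)   # first row sum of the 2x2 inverse
    wb = inv_det * (var_a - cov_ab)   # second row sum
    total = wa + wb
    return wa / total, wb / total

print(min_variance_weights(0.04, 0.09, 0.01))
```

As expected, the lower-variance asset receives the larger weight; the sector-diverse selection from the relation graph is what keeps the covariances, and hence the portfolio variance, low.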


Author(s): Yusufu Gambo, Muhammad Zeeshan Shakir

The increasing development of smart and mobile technologies is transforming learning environments into smart learning environments. Students process information and learn in different ways, and this can affect the teaching and learning process. To provide a system capable of adapting learning content to a student's learning behavior, automated classification of learners' learning patterns offers teachers a concrete means to personalize students' learning. Previously, this research proposed a model of a self-regulated smart learning environment called the metacognitive smart learning environment model (MSLEM). The model identified five metacognitive skills that are critical for online learning success: goal setting (GS), help-seeking (HS), task strategies (TS), time management (TM), and self-evaluation (SE). Based on these skills, this paper develops a learning agent that classifies students' learning styles using artificial neural networks (ANNs), with the Felder-Silverman Learning Style Model (FSLSM) dimensions as the expected outputs. The receiver operating characteristic (ROC) curve was used to determine the consistency of the classification data, and positive results were obtained with an average accuracy of 93%. The student data were grouped into six training/testing splits, each with a different splitting ratio and different training accuracy values for the various percentages of the Felder-Silverman Learning Style dimensions.
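As an illustration of mapping the five metacognitive skill scores to one FSLSM dimension, a single logistic unit suffices for a sketch; the weights, bias, and dimension labels below are invented and do not reflect the paper's trained ANN:

```python
import math

# One logistic unit mapping the five skill scores (GS, HS, TS, TM, SE)
# to a single FSLSM dimension, e.g. active vs. reflective.
def predict_dimension(scores, weights, bias):
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    p = 1.0 / (1.0 + math.exp(-z))            # sigmoid activation
    return ("active" if p >= 0.5 else "reflective"), p
```

The paper's full network would stack such units per FSLSM dimension and fit the weights from the labeled student data rather than fixing them by hand.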


2021 · Vol 11 (1)
Author(s): Brydon Eastman, Michelle Przedborski, Mohammad Kohandel

The in-silico development of a chemotherapeutic dosing schedule for treating cancer relies upon a parameterization of a particular tumour growth model to describe the dynamics of the cancer in response to the drug dose. In practice, it is often prohibitively difficult to ensure the validity of patient-specific parameterizations of these models for any particular patient. As a result, sensitivity to these parameters can cause dosing schedules that are optimal in principle to perform poorly on particular patients. In this study, we demonstrate that chemotherapeutic dosing strategies learned via reinforcement learning methods are more robust to perturbations in patient-specific parameter values than those learned via classical optimal control methods. By training a reinforcement learning agent on mean-value parameters and allowing the agent periodic access to a more easily measurable metric, relative bone marrow density, for the purpose of optimizing the dose schedule while reducing drug toxicity, we are able to develop drug dosing schedules that outperform schedules learned via classical optimal control methods, even when those methods are allowed to leverage the same bone marrow measurements.


Energies · 2021 · Vol 14 (17) · pp. 5587
Author(s): Harri Aaltonen, Seppo Sierla, Rakshith Subramanya, Valeriy Vyatkin

Battery storages are an essential element of the emerging smart grid. Compared to other distributed intelligent energy resources, batteries have the advantage of being able to react rapidly to events such as renewable generation fluctuations or grid disturbances. There is a lack of research on ways to profitably exploit this ability. Any solution needs to consider rapid electrical phenomena as well as the much slower dynamics of the relevant electricity markets. Reinforcement learning is a branch of artificial intelligence that has shown promise in optimizing complex problems involving uncertainty. This article applies reinforcement learning to the problem of trading battery capacity. The problem involves two timescales, both of which are important for profitability. Firstly, trading the battery capacity must occur on the timescale of the chosen electricity markets. Secondly, the real-time operation of the battery must ensure that no financial penalties are incurred from failing to meet the technical specification. The trading decisions must be made under uncertainty, such as unknown future market prices and unpredictable power grid disturbances. In this article, a simulation model of a battery system is proposed as the environment in which to train a reinforcement learning agent to make such decisions. The system is demonstrated with an application of the battery to the Finnish primary frequency reserve markets.
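The two-timescale setting described, a market-level bid plus a penalty when the battery cannot physically deliver, can be sketched as a minimal gym-style environment; all dynamics, prices, and penalty values below are toy assumptions, not the authors' simulation model:

```python
import random

# Toy battery-trading environment: the agent bids capacity into a market with
# an unknown price, and pays a penalty if the state of charge cannot cover the bid.
class BatteryEnv:
    def __init__(self, capacity=10.0):
        self.capacity = capacity
        self.soc = capacity / 2          # state of charge, starts half full

    def step(self, bid):
        price = random.uniform(10, 50)   # unknown future market price
        delivered = min(bid, self.soc)   # real-time physics limits delivery
        # Simple recharge of 1 unit per step, capped at capacity.
        self.soc = min(self.capacity, self.soc - delivered + 1.0)
        penalty = 100.0 if delivered < bid else 0.0   # failed technical spec
        reward = price * delivered - penalty
        return self.soc, reward
```

A reinforcement learning agent trained against such an environment must learn to bid aggressively enough to earn market revenue while keeping enough charge in reserve to avoid the penalty, which is the core trade-off the article studies.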

