function learning
Recently Published Documents

TOTAL DOCUMENTS: 195 (FIVE YEARS: 51)
H-INDEX: 21 (FIVE YEARS: 5)

Automatica ◽  
2022 ◽  
Vol 137 ◽  
pp. 110060
Author(s):  
Milad Farjadnasab ◽  
Maryam Babazadeh

2021 ◽  
Author(s):  
Yiming Peng

Reinforcement Learning (RL) problems appear in diverse real-world applications and are gaining substantial attention in academia and industry. Policy Direct Search (PDS) is widely recognized as an effective approach to RL problems. However, existing PDS algorithms have some major limitations. First, many step-wise Policy Gradient Search (PGS) algorithms cannot effectively utilize informative historical gradients to accurately estimate policy gradients. Second, although evolutionary PDS algorithms do not rely on accurate policy gradient estimates and can explore learning environments effectively, they are not sample-efficient when learning policies in the form of deep neural networks. Third, existing PGS algorithms often diverge easily due to the lack of reliable and flexible techniques for value function learning. Fourth, existing PGS algorithms do not provide suitable mechanisms for learning proper state features automatically.

To address these limitations, the overall goal of this thesis is to develop effective policy direct search algorithms for tackling challenging RL problems through technical innovations in four key areas. First, the thesis aims to improve the accuracy of policy gradient estimation by utilizing historical gradients through a Primal-Dual Approximation technique. Second, the thesis aims to surpass state-of-the-art performance by properly balancing the exploration-exploitation trade-off via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Proximal Policy Optimization (PPO). Third, the thesis seeks to stabilize value function learning via a self-organized Sandpile Model (SM) while generalizing the compatible condition to support flexible value function learning. Fourth, the thesis develops innovative evolutionary feature learning techniques that can automatically extract useful state features to enhance various cutting-edge PGS algorithms.

In the thesis, we explore the four technical areas by studying policies of increasing complexity. We start with a simple linear policy representation and then proceed to a complex neural-network-based policy representation. Next, we consider a more complicated situation where policy learning is coupled with value function learning. Subsequently, we consider policies modeled as a concatenation of two interrelated networks, one for feature learning and one for action selection.

To achieve the first goal, this thesis proposes a new policy gradient learning framework in which a series of historical gradients is jointly exploited to obtain accurate policy gradient estimates via the Primal-Dual Approximation technique. Under this framework, three new PGS algorithms for step-wise policy training are derived from three widely used PGS algorithms, and their convergence properties are analyzed theoretically. Empirical results on several benchmark control problems further show that the newly proposed algorithms can significantly outperform their base algorithms.

To achieve the second goal, this thesis develops a new sample-efficient evolutionary deep policy optimization algorithm based on CMA-ES and PPO. The algorithm uses a layer-wise learning mechanism to improve computational efficiency compared with CMA-ES, and a surrogate model based on a performance lower bound for fitness evaluation, which significantly reduces the sample cost to the state-of-the-art level. More importantly, the best policy found by CMA-ES at every generation is further improved by PPO to properly balance exploration and exploitation. The experimental results confirm that the proposed algorithm outperforms various cutting-edge algorithms on many benchmark continuous control problems.

To achieve the third goal, this thesis develops new value function learning methods that are both reliable and flexible so as to further enhance the effectiveness of policy gradient search. Two Actor-Critic (AC) algorithms are developed from a commonly used PGS algorithm, Regular Actor-Critic (RAC). The first algorithm adopts the SM to stabilize value function learning, and the second generalizes the logarithm function used by the compatible condition to provide a flexible family of new compatible functions. The experimental results show that, with the help of reliable and flexible value function learning, the newly developed algorithms are more effective than RAC on several benchmark control problems.

To achieve the fourth goal, this thesis develops innovative NeuroEvolution algorithms for automated feature learning to enhance various cutting-edge PGS algorithms. The newly developed algorithms can not only extract useful state features but also learn good policies. The experimental analysis demonstrates that the newly proposed algorithms achieve better performance on large-scale RL problems than both well-known PGS algorithms and NeuroEvolution techniques. Our experiments also confirm that the state features learned by NeuroEvolution on one RL task can be easily transferred to boost learning performance on similar but different tasks.
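As an illustration of the hybrid scheme described in the second contribution, the snippet below is a minimal Python sketch of the overall loop, not the thesis implementation: a simplified evolution strategy with a fixed diagonal search distribution stands in for CMA-ES, a few gradient-ascent steps on a toy return function stand in for the PPO refinement, and the objective, population size, and step sizes are all illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the thesis code): an evolution
# strategy proposes policy parameters, and the best candidate of each
# generation is further refined by gradient steps standing in for PPO.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for the expected return of a policy with parameters theta.
    return -np.sum((theta - 1.0) ** 2)

def return_gradient(theta):
    # Analytic gradient of the toy return; in the thesis this role is played
    # by PPO's clipped surrogate gradient estimated from rollouts.
    return -2.0 * (theta - 1.0)

dim, popsize, generations = 8, 16, 50
mean, sigma = np.zeros(dim), 0.5   # fixed diagonal spread; CMA-ES would adapt
                                   # a full covariance matrix instead
for g in range(generations):
    candidates = mean + sigma * rng.standard_normal((popsize, dim))
    fitness = np.array([episode_return(c) for c in candidates])
    elite = candidates[np.argsort(fitness)[-popsize // 4:]]  # top quartile
    mean = elite.mean(axis=0)                                # ES-style update

    best = candidates[np.argmax(fitness)]
    for _ in range(5):                                       # "PPO" refinement
        best = best + 0.1 * return_gradient(best)
    if episode_return(best) > episode_return(mean):
        mean = best                                          # inject refined policy

print("final return:", episode_return(mean))
```

The design choice mirrored here is the division of labor the abstract describes: the population update is responsible for exploration, while gradient-based refinement of each generation's best candidate is responsible for exploitation.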


2021 ◽  
Vol 15 ◽  
Author(s):  
Jared J. Schwartzer ◽  
Dolores Garcia-Arocena ◽  
Amanda Jamal ◽  
Ali Izadi ◽  
Rob Willemsen ◽  
...  

Carriers of the fragile X premutation (PM) can develop a variety of early neurological symptoms, including depression, anxiety, and cognitive impairment, and are at risk of developing the late-onset fragile X-associated tremor/ataxia syndrome (FXTAS). The absence of effective treatments for FXTAS underscores the importance of developing efficacious therapies to reduce the neurological symptoms in elderly PM carriers and FXTAS patients. A recent preliminary study reported that weekly infusions of allopregnanolone (Allop) may improve deficits in executive function, learning, and memory in FXTAS patients. Based on this study, we examined whether Allop would improve neurological function in the aged CGG knock-in (CGG KI) Dutch mouse, B6.129P2(Cg)-Fmr1tm2Cgr/Cgr, which models much of the symptomatology of PM carriers and FXTAS patients. Wild-type and CGG KI mice received 10 weekly injections of Allop (10 mg/kg, s.c.), followed by a battery of behavioral tests of motor function, anxiety, and repetitive behavior, and 5-bromo-2′-deoxyuridine (BrdU) labeling to examine adult neurogenesis. The results provided evidence that Allop normalized motor performance and reduced thigmotaxis in the open field in CGG KI mice and normalized repetitive digging behavior in the marble burying test, but did not appear to increase adult neurogenesis in the hippocampus. Considered together, these results support further examination of Allop as a therapeutic strategy in patients with FXTAS.


2021 ◽  
Author(s):  
Amadeus Maes ◽  
Mauricio Barahona ◽  
Claudia Clopath

The statistical structure of the environment is often important when making decisions. There are multiple theories of how the brain represents statistical structure. One such theory states that neural activity spontaneously samples from probability distributions; in other words, the network spends more time in states that encode high-probability stimuli. Existing spiking network models that implement sampling lack the ability to learn the statistical structure from observed stimuli and instead often hard-code the dynamics. Here, we focus on how arbitrary prior knowledge about the external world can be both learned and spontaneously recollected. We present a model based on learning the inverse of the cumulative distribution function. Learning is entirely unsupervised, using biophysical neurons and biologically plausible learning rules. We show how this prior knowledge can then be accessed to compute expectations and signal surprise in downstream networks. Sensory history effects emerge from the model as a consequence of ongoing learning.
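The snippet below is a minimal Python sketch of the computational principle described above, not the spiking-network model itself: the inverse of the cumulative distribution function is estimated from observed stimuli, and spontaneous recollection is reproduced by pushing uniform noise through the learned quantile function (inverse-transform sampling). The bimodal stimulus distribution and the grid resolution are illustrative assumptions; the paper's biophysical neurons and plasticity rules are not modeled here.

```python
# Minimal sketch: learn an empirical inverse CDF, then sample from it.
import numpy as np

rng = np.random.default_rng(1)

# Observed stimuli drawn from a distribution unknown to the learner.
stimuli = np.concatenate([rng.normal(-2.0, 0.5, 5000),
                          rng.normal(3.0, 1.0, 5000)])

# "Learning": store an empirical estimate of the inverse CDF on a grid.
grid = np.linspace(0.0, 1.0, 201)
inverse_cdf = np.quantile(stimuli, grid)

# "Spontaneous recollection": uniform noise pushed through the learned
# inverse CDF reproduces the statistics of the training stimuli.
u = rng.uniform(0.0, 1.0, 5000)
recalled = np.interp(u, grid, inverse_cdf)

print("training mean/std: %.2f / %.2f" % (stimuli.mean(), stimuli.std()))
print("recalled mean/std: %.2f / %.2f" % (recalled.mean(), recalled.std()))
```

Because samples produced this way occur in proportion to their probability under the training distribution, high-probability stimuli are revisited more often, which is the sampling behavior the model aims to capture.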


Molecules ◽  
2021 ◽  
Vol 26 (17) ◽  
pp. 5338
Author(s):  
Hsin-Ping Liu ◽  
Yueh-Hsiung Kuo ◽  
Jack Cheng ◽  
Li-Zhong Chang ◽  
Meng-Shiun Chang ◽  
...  

Ergosta-7,9(11),22-trien-3β-ol (EK100) was isolated from the Taiwan-specific medicinal fungus Antrodia camphorata, which is known for its health-promoting and anti-aging effects in folk medicine. Alzheimer’s disease (AD) is a major aging-associated disease. We investigated the efficacy and potential mechanism of ergosta-7,9(11),22-trien-3β-ol against AD symptoms. Drosophila with pan-neuronal overexpression of human amyloid-β (Aβ) was used as the AD model. We compared the life span, motor function, learning, memory, oxidative stress, and biomarkers of microglia activation and inflammation of the ergosta-7,9(11),22-trien-3β-ol-treated group with those of the untreated control. Ergosta-7,9(11),22-trien-3β-ol treatment effectively improved the life span, motor function, learning, and memory of the AD model compared to the untreated control. Biomarkers of microglia activation and inflammation were reduced, while ubiquitous lipid peroxidation, catalase activity, and superoxide dismutase activity remained unchanged. In conclusion, ergosta-7,9(11),22-trien-3β-ol rescues AD deficits by modulating microglia activation but not oxidative stress.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jiahao Ren ◽  
Junfei Cai ◽  
Jinjin Li

State of health (SOH) prediction of supercapacitors aims to provide reliable lifetime control and avoid system failure. Gaussian process regression (GPR) has emerged for SOH prediction because of its capability to capture nonlinear relationships between features and track SOH attenuation effectively. However, traditional GPR methods based on explicit functions require multiple screenings of optimal mean and covariance functions, which results in data scarcity and increased time consumption. In this study, we propose GPR-implicit function learning, a prior-knowledge algorithm that calculates the mean and covariance functions from a preliminary data set instead of screening. After introducing the implicit function, the average root mean square error (Average RMSE) is 0.0056 F and the average mean absolute percent error (Average MAPE) is 0.6% when only the first 5% of the data are trained to predict the remaining 95% of the cycles, thereby decreasing the error more than threefold compared with previous studies. Furthermore, even when fewer cycles (i.e., 1%) are trained, the prediction errors remain low (Average RMSE of 0.0094 F and Average MAPE of 1.01%). This work highlights the strength of the GPR-implicit function model for SOH prediction of energy storage devices with high precision and limited property data.
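For orientation, the snippet below is a minimal sketch of the general workflow using a standard scikit-learn GPR, not the paper's GPR-implicit function method: an RBF-kernel GPR is fitted to the first 5% of a synthetic capacitance-fade curve and used to predict the remaining cycles. The synthetic data, kernel choice, and hyper-parameters are illustrative assumptions.

```python
# Minimal sketch: train a plain GPR on the first 5% of cycles, predict the rest.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)

# Synthetic capacitance-fade curve (in farads) over 2000 cycles.
cycles = np.arange(1, 2001, dtype=float)
capacitance = 1.0 - 2e-4 * cycles + 0.002 * rng.standard_normal(cycles.size)

n_train = int(0.05 * cycles.size)            # first 5% of the cycles
X_train, y_train = cycles[:n_train, None], capacitance[:n_train]

kernel = ConstantKernel(1.0) * RBF(length_scale=200.0)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-4, normalize_y=True)
gpr.fit(X_train, y_train)

y_pred, y_std = gpr.predict(cycles[:, None], return_std=True)
err = y_pred[n_train:] - capacitance[n_train:]
rmse = np.sqrt(np.mean(err ** 2))
mape = np.mean(np.abs(err / capacitance[n_train:])) * 100
print(f"RMSE = {rmse:.4f} F, MAPE = {mape:.2f} %")
```

A plain GPR with a constant mean tends to revert toward the training mean when extrapolating far beyond the observed cycles, which is precisely the limitation that motivates deriving informative mean and covariance functions from a preliminary data set, as the paper proposes.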

