function learning
Recently Published Documents

TOTAL DOCUMENTS: 195 (FIVE YEARS: 51)
H-INDEX: 21 (FIVE YEARS: 5)

Automatica ◽  
2022 ◽  
Vol 137 ◽  
pp. 110060
Author(s):  
Milad Farjadnasab ◽  
Maryam Babazadeh

2021 ◽  
Author(s):  
Yiming Peng

Reinforcement Learning (RL) problems appear in diverse real-world applications and are gaining substantial attention in academia and industry. Policy Direct Search (PDS) is widely recognized as an effective approach to RL problems. However, existing PDS algorithms have some major limitations. First, many step-wise Policy Gradient Search (PGS) algorithms cannot effectively utilize informative historical gradients to accurately estimate policy gradients. Second, although evolutionary PDS algorithms do not rely on accurate policy gradient estimates and can explore learning environments effectively, they are not sample-efficient when learning policies in the form of deep neural networks. Third, existing PGS algorithms often diverge easily due to the lack of reliable and flexible techniques for value function learning. Fourth, existing PGS algorithms do not provide suitable mechanisms for learning proper state features automatically.

To address these limitations, the overall goal of this thesis is to develop effective policy direct search algorithms for tackling challenging RL problems through technical innovations in four key areas. First, the thesis aims to improve the accuracy of policy gradient estimation by utilizing historical gradients through a Primal-Dual Approximation technique. Second, the thesis aims to surpass state-of-the-art performance by properly balancing the exploration-exploitation trade-off via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Proximal Policy Optimization (PPO). Third, the thesis seeks to stabilize value function learning via a self-organized Sandpile Model (SM) while generalizing the compatible condition to support flexible value function learning. Fourth, the thesis develops innovative evolutionary feature learning techniques that can automatically extract useful state features to enhance various cutting-edge PGS algorithms.

In the thesis, we explore the four technical areas by studying policies of increasing complexity. We start with a simple linear policy representation and then proceed to a complex neural-network-based policy representation. Next, we consider a more complicated situation where policy learning is coupled with value function learning. Subsequently, we consider policies modeled as a concatenation of two interrelated networks, one for feature learning and one for action selection.

To achieve the first goal, this thesis proposes a new policy gradient learning framework in which a series of historical gradients is jointly exploited to obtain accurate policy gradient estimates via the Primal-Dual Approximation technique. Under this framework, three new PGS algorithms for step-wise policy training are derived from three widely used PGS algorithms, and their convergence properties are analyzed theoretically. Empirical results on several benchmark control problems further show that the newly proposed algorithms can significantly outperform their base algorithms.

To achieve the second goal, this thesis develops a new sample-efficient evolutionary deep policy optimization algorithm based on CMA-ES and PPO. The algorithm uses a layer-wise learning mechanism to improve computational efficiency compared with CMA-ES, and a surrogate model based on a performance lower bound for fitness evaluation, which significantly reduces the sample cost to the state-of-the-art level. More importantly, the best policy found by CMA-ES at every generation is further improved by PPO to properly balance exploration and exploitation. The experimental results confirm that the proposed algorithm outperforms various cutting-edge algorithms on many benchmark continuous control problems.

To achieve the third goal, this thesis develops new value function learning methods that are both reliable and flexible so as to further enhance the effectiveness of policy gradient search. Two Actor-Critic (AC) algorithms are developed from a commonly used PGS algorithm, Regular Actor-Critic (RAC). The first algorithm adopts the SM to stabilize value function learning, and the second generalizes the logarithm function used by the compatible condition to provide a flexible family of new compatible functions. The experimental results show that, with the help of reliable and flexible value function learning, the newly developed algorithms are more effective than RAC on several benchmark control problems.

To achieve the fourth goal, this thesis develops innovative NeuroEvolution algorithms for automated feature learning to enhance various cutting-edge PGS algorithms. The newly developed algorithms can not only extract useful state features but also learn good policies. The experimental analysis demonstrates that the newly proposed algorithms achieve better performance on large-scale RL problems than both well-known PGS algorithms and NeuroEvolution techniques. Our experiments also confirm that the state features learned by NeuroEvolution on one RL task can be easily transferred to boost learning performance on similar but different tasks.
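As an illustration of the hybrid scheme described in the second contribution, the snippet below is a minimal Python sketch of the overall loop, not the thesis implementation: a simplified evolution strategy with a fixed diagonal search distribution stands in for CMA-ES, a few gradient-ascent steps on a toy return function stand in for the PPO refinement, and the objective, population size, and step sizes are all illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the thesis code): an evolution
# strategy proposes policy parameters, and the best candidate of each
# generation is further refined by gradient steps standing in for PPO.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for the expected return of a policy with parameters theta.
    return -np.sum((theta - 1.0) ** 2)

def return_gradient(theta):
    # Analytic gradient of the toy return; in the thesis this role is played
    # by PPO's clipped surrogate gradient estimated from rollouts.
    return -2.0 * (theta - 1.0)

dim, popsize, generations = 8, 16, 50
mean, sigma = np.zeros(dim), 0.5   # fixed diagonal spread; CMA-ES would adapt
                                   # a full covariance matrix instead
for g in range(generations):
    candidates = mean + sigma * rng.standard_normal((popsize, dim))
    fitness = np.array([episode_return(c) for c in candidates])
    elite = candidates[np.argsort(fitness)[-popsize // 4:]]  # top quartile
    mean = elite.mean(axis=0)                                # ES-style update

    best = candidates[np.argmax(fitness)]
    for _ in range(5):                                       # "PPO" refinement
        best = best + 0.1 * return_gradient(best)
    if episode_return(best) > episode_return(mean):
        mean = best                                          # inject refined policy

print("final return:", episode_return(mean))
```

The design choice mirrored here is the division of labor the abstract describes: the population update is responsible for exploration, while gradient-based refinement of each generation's best candidate is responsible for exploitation.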


2021 ◽  
Vol 15 ◽  
Author(s):  
Jared J. Schwartzer ◽  
Dolores Garcia-Arocena ◽  
Amanda Jamal ◽  
Ali Izadi ◽  
Rob Willemsen ◽  
...  

Carriers of the fragile X premutation (PM) can develop a variety of early neurological symptoms, including depression, anxiety, and cognitive impairment, and are at risk of developing the late-onset fragile X-associated tremor/ataxia syndrome (FXTAS). The absence of effective treatments for FXTAS underscores the importance of developing efficacious therapies to reduce the neurological symptoms in elderly PM carriers and FXTAS patients. A recent preliminary study reported that weekly infusions of allopregnanolone (Allop) may improve deficits in executive function, learning, and memory in FXTAS patients. Based on this study, we examined whether Allop would improve neurological function in the aged CGG knock-in (CGG KI) Dutch mouse, B6.129P2(Cg)-Fmr1tm2Cgr/Cgr, which models much of the symptomatology of PM carriers and FXTAS patients. Wild-type and CGG KI mice received 10 weekly injections of Allop (10 mg/kg, s.c.), followed by a battery of behavioral tests of motor function, anxiety, and repetitive behavior, and 5-bromo-2′-deoxyuridine (BrdU) labeling to examine adult neurogenesis. The results provided evidence that Allop normalized motor performance and reduced thigmotaxis in the open field in CGG KI mice and normalized repetitive digging behavior in the marble burying test, but did not appear to increase adult neurogenesis in the hippocampus. Considered together, these results support further examination of Allop as a therapeutic strategy in patients with FXTAS.


2021 ◽  
Author(s):  
Amadeus Maes ◽  
Mauricio Barahona ◽  
Claudia Clopath

The statistical structure of the environment is often important when making decisions. There are multiple theories of how the brain represents statistical structure. One such theory states that neural activity spontaneously samples from probability distributions; in other words, the network spends more time in states that encode high-probability stimuli. Existing spiking network models that implement sampling lack the ability to learn the statistical structure from observed stimuli and instead often hard-code the dynamics. Here, we focus on how arbitrary prior knowledge about the external world can be both learned and spontaneously recollected. We present a model based on learning the inverse of the cumulative distribution function. Learning is entirely unsupervised, using biophysical neurons and biologically plausible learning rules. We show how this prior knowledge can then be accessed to compute expectations and signal surprise in downstream networks. Sensory history effects emerge from the model as a consequence of ongoing learning.
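The snippet below is a minimal Python sketch of the computational principle described above, not the spiking-network model itself: the inverse of the cumulative distribution function is estimated from observed stimuli, and spontaneous recollection is reproduced by pushing uniform noise through the learned quantile function (inverse-transform sampling). The bimodal stimulus distribution and the grid resolution are illustrative assumptions; the paper's biophysical neurons and plasticity rules are not modeled here.

```python
# Minimal sketch: learn an empirical inverse CDF, then sample from it.
import numpy as np

rng = np.random.default_rng(1)

# Observed stimuli drawn from a distribution unknown to the learner.
stimuli = np.concatenate([rng.normal(-2.0, 0.5, 5000),
                          rng.normal(3.0, 1.0, 5000)])

# "Learning": store an empirical estimate of the inverse CDF on a grid.
grid = np.linspace(0.0, 1.0, 201)
inverse_cdf = np.quantile(stimuli, grid)

# "Spontaneous recollection": uniform noise pushed through the learned
# inverse CDF reproduces the statistics of the training stimuli.
u = rng.uniform(0.0, 1.0, 5000)
recalled = np.interp(u, grid, inverse_cdf)

print("training mean/std: %.2f / %.2f" % (stimuli.mean(), stimuli.std()))
print("recalled mean/std: %.2f / %.2f" % (recalled.mean(), recalled.std()))
```

Because samples produced this way occur in proportion to their probability under the training distribution, high-probability stimuli are revisited more often, which is the sampling behavior the model aims to capture.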


Molecules ◽  
2021 ◽  
Vol 26 (17) ◽  
pp. 5338
Author(s):  
Hsin-Ping Liu ◽  
Yueh-Hsiung Kuo ◽  
Jack Cheng ◽  
Li-Zhong Chang ◽  
Meng-Shiun Chang ◽  
...  

Ergosta-7,9(11),22-trien-3β-ol (EK100) was isolated from the Taiwan-specific medicinal fungus Antrodia camphorata, which is known for its health-promoting and anti-aging effects in folk medicine. Alzheimer’s disease (AD) is a major aging-associated disease. We investigated the efficacy and potential mechanism of ergosta-7,9(11),22-trien-3β-ol against AD symptoms. Drosophila with pan-neuronal overexpression of human amyloid-β (Aβ) was used as the AD model. We compared the life span, motor function, learning, memory, oxidative stress, and biomarkers of microglia activation and inflammation of the ergosta-7,9(11),22-trien-3β-ol-treated group with those of the untreated control. Ergosta-7,9(11),22-trien-3β-ol treatment effectively improved the life span, motor function, learning, and memory of the AD model compared to the untreated control. Biomarkers of microglia activation and inflammation were reduced, while ubiquitous lipid peroxidation, catalase activity, and superoxide dismutase activity remained unchanged. In conclusion, ergosta-7,9(11),22-trien-3β-ol rescues AD deficits by modulating microglia activation but not oxidative stress.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jiahao Ren ◽  
Junfei Cai ◽  
Jinjin Li

State of health (SOH) prediction of supercapacitors aims to provide reliable lifetime control and avoid system failure. Gaussian process regression (GPR) has emerged for SOH prediction because of its capability to capture nonlinear relationships between features and track SOH attenuation effectively. However, traditional GPR methods based on explicit functions require multiple screenings of optimal mean and covariance functions, which results in data scarcity and increased time consumption. In this study, we propose GPR-implicit function learning, a prior-knowledge algorithm that calculates the mean and covariance functions from a preliminary data set instead of screening. After introducing the implicit function, the average root mean square error (Average RMSE) is 0.0056 F and the average mean absolute percent error (Average MAPE) is 0.6% when only the first 5% of the data are trained to predict the remaining 95% of the cycles, thereby decreasing the error more than threefold compared with previous studies. Furthermore, even when fewer cycles (i.e., 1%) are trained, the prediction errors remain low (Average RMSE of 0.0094 F and Average MAPE of 1.01%). This work highlights the strength of the GPR-implicit function model for SOH prediction of energy storage devices with high precision and limited property data.
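For orientation, the snippet below is a minimal sketch of the general workflow using a standard scikit-learn GPR, not the paper's GPR-implicit function method: an RBF-kernel GPR is fitted to the first 5% of a synthetic capacitance-fade curve and used to predict the remaining cycles. The synthetic data, kernel choice, and hyper-parameters are illustrative assumptions.

```python
# Minimal sketch: train a plain GPR on the first 5% of cycles, predict the rest.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)

# Synthetic capacitance-fade curve (in farads) over 2000 cycles.
cycles = np.arange(1, 2001, dtype=float)
capacitance = 1.0 - 2e-4 * cycles + 0.002 * rng.standard_normal(cycles.size)

n_train = int(0.05 * cycles.size)            # first 5% of the cycles
X_train, y_train = cycles[:n_train, None], capacitance[:n_train]

kernel = ConstantKernel(1.0) * RBF(length_scale=200.0)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-4, normalize_y=True)
gpr.fit(X_train, y_train)

y_pred, y_std = gpr.predict(cycles[:, None], return_std=True)
err = y_pred[n_train:] - capacitance[n_train:]
rmse = np.sqrt(np.mean(err ** 2))
mape = np.mean(np.abs(err / capacitance[n_train:])) * 100
print(f"RMSE = {rmse:.4f} F, MAPE = {mape:.2f} %")
```

A plain GPR with a constant mean tends to revert toward the training mean when extrapolating far beyond the observed cycles, which is precisely the limitation that motivates deriving informative mean and covariance functions from a preliminary data set, as the paper proposes.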

