A NOVEL PARADIGM FOR DEEP REINFORCEMENT LEARNING OF BIOMIMETIC SYSTEMS

2021
Author(s): Raghu Sesha Iyengar, Kapardi Mallampalli, Mohan Raghavan

Mechanisms behind neural control of movement have been an active area of research. Goal-directed movement is a common experimental setup used to understand these mechanisms and the neural pathways involved. On the one hand, optimal feedback control theory is used to model and make quantitative predictions of the coordinated activations of effectors such as muscles, joints, or limbs. On the other hand, evidence shows that higher centres such as the basal ganglia and cerebellum are involved in activities such as reinforcement learning and error correction. In this paper, we provide a framework to build a digital twin of relevant sections of the human spinal cord using our NEUROiD platform. The digital twin is an anatomically and physiologically realistic model of the spinal cord at the cellular, spinal-network, and system levels. We then build a framework to learn the supraspinal activations necessary to perform a simple goal-directed movement of the upper limb. The NEUROiD model is interfaced to an OpenSim model for all musculoskeletal simulations. We use deep reinforcement learning to obtain the supraspinal activations necessary to perform the goal-directed movement. To the best of our knowledge, this is the first attempt to learn the stimulation pattern at the spinal cord level, particularly with the observation space limited to the afferent feedback received on the Ia, II, and Ib fibers. Such a setup results in a biologically realistic constrained environment for learning. Our results show that (1) the reinforcement learning algorithm converges naturally to the triphasic response observed during goal-directed movement, (2) gradually increasing the complexity of the goal was very important for accelerating learning, and (3) modulation of the afferent inputs was sufficient to execute tasks that were not explicitly learned but were closely related to the learnt task.
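
As a rough illustration of the constrained setup the abstract describes, the sketch below wires a gym-style loop in which the only observation is Ia/II/Ib afferent feedback and goals are presented as a curriculum. Everything here, from the environment dynamics to the random-search policy update, is a placeholder assumption; it is not the authors' NEUROiD/OpenSim code or their deep RL algorithm.

```python
import numpy as np

N_MUSCLES = 6  # assumed number of modeled upper-limb muscles

class AfferentOnlyEnv:
    """Toy stand-in for the NEUROiD/OpenSim loop: the observation is
    limited to Ia, II, and Ib afferent signals (three per muscle), and
    the action is a vector of supraspinal drives."""
    def __init__(self, goal):
        self.goal = np.asarray(goal, dtype=float)
        self.reset()

    def reset(self):
        self.hand = np.zeros(2)                     # simplified end-effector
        self.afferents = np.zeros(3 * N_MUSCLES)    # Ia, II, Ib per muscle
        return self.afferents.copy()

    def step(self, supraspinal_drive):
        # A real implementation would route the drive through the spinal
        # networks and OpenSim; this linear stand-in only fixes the shapes.
        drive = np.clip(supraspinal_drive, 0.0, 1.0)
        self.hand += 0.05 * (drive[:2] - drive[2:4])
        self.afferents = np.tile(drive, 3)          # placeholder feedback
        reward = -np.linalg.norm(self.hand - self.goal)
        return self.afferents.copy(), reward, reward > -0.02

def evaluate(w, env, steps=50):
    obs, total = env.reset(), 0.0
    for _ in range(steps):
        action = 1.0 / (1.0 + np.exp(-(w @ obs)))   # sigmoid linear policy
        obs, r, done = env.step(action)
        total += r
        if done:
            break
    return total

# Goal curriculum of increasing difficulty, echoing finding (2); the
# policy search here is plain random search, not the paper's deep RL.
rng = np.random.default_rng(0)
w = np.zeros((N_MUSCLES, 3 * N_MUSCLES))
for goal in ([0.1, 0.1], [0.2, 0.3], [0.3, 0.4]):
    env = AfferentOnlyEnv(goal)
    best = evaluate(w, env)
    for _ in range(200):
        cand = w + 0.1 * rng.standard_normal(w.shape)
        score = evaluate(cand, env)
        if score > best:
            w, best = cand, score
```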

2004
Vol 92 (2)
pp. 673-685
Author(s): Robert E. Steldt, Brian D. Schmit

Individuals with chronic spinal cord injury (SCI) often demonstrate multijoint reflex activity that is clinically classified as an extensor spasm. These responses are commonly observed in conjunction with an imposed extension movement of the hips, such as movement from a sitting to a supine position. Coincidentally, afferent feedback from hip proprioceptors has also been implicated in the control of locomotion in the spinalized cat. Because of this concurrence, we postulated that extensor spasms triggered by hip extension might involve activation of organized interneuronal circuits that also have a role in locomotion. If true, imposed oscillations of the hip would be expected to produce activity of the leg musculature in a locomotor pattern. Furthermore, this muscle activity would be entrained to the hip movement. The right hip joints of 10 individuals with chronic SCI, comprising both complete [American Spinal Injury Association (ASIA) A] and incomplete (ASIA B, C) injuries, were subjected to ramp-and-hold (10 s) movements at 60°/s and sinusoidal oscillations at 1.2, 1.88, and 2.2 rad/s over a range from 40 to –15° (±5°) using a custom servomotor system. Surface EMG from seven lower extremity muscles and sagittal-plane joint torques were recorded to characterize the response. Ramp-and-hold perturbations produced long-duration (5–10 s) coactivation at the hip, knee, and ankle joints. Sinusoidal perturbations yielded consistent muscle timing patterns that resulted in alternating flexor and extensor joint torques. EMG and joint torques were commonly entrained to the frequency of movement, with rectus femoris, vastus medialis, and soleus activity coinciding with hip extension and medial hamstrings activity occurring during hip flexion. Individual muscle timing patterns were consistent with hip position during normal gait, except for the vastus medialis. These results suggest that reflexes associated with extensor spasms may occur through organized interneuronal pathways, such as spinal centers for locomotion.
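
For concreteness, the following sketch reconstructs the two imposed hip trajectories from the parameters given in the abstract (a 60°/s ramp with a 10 s hold; sinusoids at 1.2, 1.88, and 2.2 rad/s spanning 40° to –15°). The mapping of the range to a center and amplitude, and the reading of the ±5° qualifier as endpoint tolerance, are our assumptions.

```python
import numpy as np

def ramp_and_hold(t, start_deg=-15.0, end_deg=40.0, rate_deg_s=60.0):
    """Ramp at 60 deg/s from start to end, then hold (10 s in the study)."""
    ramp_time = abs(end_deg - start_deg) / rate_deg_s
    slope = np.sign(end_deg - start_deg) * rate_deg_s
    return np.where(t < ramp_time, start_deg + slope * t, end_deg)

def sinusoid(t, omega_rad_s=1.88):
    """Oscillation spanning 40 to -15 deg: center 12.5 deg, amplitude
    27.5 deg. The abstract's +/-5 deg is read here as endpoint tolerance."""
    return 12.5 + 27.5 * np.sin(omega_rad_s * t)

t = np.linspace(0.0, 15.0, 3000)       # time base, s
hip_ramp = ramp_and_hold(t)            # ramp-and-hold trajectory, deg
hip_sine = sinusoid(t)                 # also run at 1.2 and 2.2 rad/s
```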


Symmetry
2021
Vol 13 (3)
pp. 471
Author(s): Jai Hoon Park, Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space, which involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared to evolving both structure and behavior simultaneously, the size of the design space is reduced significantly by evolving only the robotic structure and optimizing behavior with a separate training algorithm. Mutual dependence between evolution and learning is achieved by taking the mean cumulative reward a candidate structure earns during reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in the process of experimenting with an actual modular robotics kit.
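
The nested search the abstract describes can be summarized in a few lines. In this schematic sketch the bit-string encoding, the mutation and crossover operators, and the train_and_score stub (which stands in for a full reinforcement-learning run returning the mean cumulative reward) are all illustrative assumptions, not the authors' implementation.

```python
import random

def train_and_score(structure, episodes=20):
    """Inner-loop stand-in: pretend to train a controller on `structure`
    and return its mean cumulative reward. Here a synthetic score rewards
    structures with roughly six active modules."""
    n_modules = sum(structure)
    return -(n_modules - 6) ** 2 + random.gauss(0.0, 0.5)

def mutate(structure, p=0.2):
    # Flip each bit (module present/absent) with probability p.
    return [bit ^ (random.random() < p) for bit in structure]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

random.seed(0)
population = [[random.randint(0, 1) for _ in range(12)] for _ in range(16)]
for generation in range(10):
    # Fitness of a structure = mean cumulative reward from the inner loop.
    scored = sorted(population, key=train_and_score, reverse=True)
    elite = scored[:4]
    population = elite + [mutate(crossover(random.choice(elite),
                                           random.choice(elite)))
                          for _ in range(12)]
best = max(population, key=train_and_score)
```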


2021
Vol 6 (1)
Author(s): Peter Morales, Rajmonda Sulo Caceres, Tina Eliassi-Rad

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and a notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
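
To make the sequential decision-making formulation concrete, here is a compact actor-critic sketch for selective harvesting on a synthetic graph. The hand-rolled three-feature embedding stands in for NAC's learned task-specific embedding, and the one-step update omits the offline planning machinery; all names and dynamics are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic graph: symmetric adjacency plus a binary target attribute.
N = 60
adj = rng.random((N, N)) < 0.08
adj = np.triu(adj, 1)
adj = adj | adj.T
attr = rng.random(N) < 0.3                 # vertices worth harvesting

def embed(v, discovered):
    """Placeholder 3-d embedding: discovered degree, total degree, and
    the fraction of discovered neighbors carrying the attribute."""
    nbrs = np.flatnonzero(adj[v])
    disc = [u for u in nbrs if u in discovered]
    frac = float(np.mean([attr[u] for u in disc])) if disc else 0.0
    return np.array([len(disc), len(nbrs), frac], dtype=float)

theta = np.zeros(3)                        # actor weights
w = np.zeros(3)                            # critic weights
alpha = 0.05
discovered = {0}
frontier = set(np.flatnonzero(adj[0]).tolist())

for step in range(40):
    if not frontier:
        break
    cand = sorted(frontier)
    X = np.stack([embed(v, discovered) for v in cand])
    logits = X @ theta
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    i = rng.choice(len(cand), p=probs)
    v = cand[i]
    reward = float(attr[v])                # 1 if we harvested a target
    td = reward - X[i] @ w                 # critic error as a baseline
    w += alpha * td * X[i]
    theta += alpha * td * (X[i] - probs @ X)   # softmax policy gradient
    discovered.add(v)
    frontier |= set(np.flatnonzero(adj[v]).tolist()) - discovered
    frontier.discard(v)
```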

