Just Don’t Fall: An AI Agent’s Learning Journey Towards Posture Stabilisation

Learning to maintain postural balance while standing requires a significant fine coordination effort between the neuromuscular system and the sensory system. It is one of the key contributing factors towards fall prevention, especially in the older population. Using artificial intelligence (AI), we can similarly teach an agent to maintain a standing posture, and thus teach the agent not to fall. In this paper, we investigate the learning progress of an AI agent and how it maintains a stable standing posture through reinforcement learning. During training, the AI agent learnt three policies. First, it learnt to maintain the Centre-of-Gravity and Zero-Moment-Point in front of the body. Then, it learnt to shift the load of the entire body on one leg while using the other leg for fine tuning the balancing action. Finally, it started to learn the coordination between the two pre-trained policies. This study shows the potentials of using deep reinforcement learning in human movement studies. The learnt AI behaviour also exhibited attempts to achieve an unplanned goal because it correlated with the set goal (e.g. walking in order to prevent falling). The failed attempts to maintain a standing posture is an interesting by-product which can enrich the fall detection and prevention research efforts.

Download Full-text

Just Don’t Fall: An AI Agent’s Learning Journey Towards Posture Stabilisation

10.20944/preprints202006.0046.v1 ◽

2020 ◽

Author(s):

Mohammed Hossny ◽

Julie Iskander

Keyword(s):

Reinforcement Learning ◽

Sensory System ◽

Human Movement ◽

Fall Detection ◽

The Body ◽

Fine Tuning ◽

Entire Body ◽

Standing Posture ◽

Contributing Factors ◽

Learning Journey

Learning to maintain postural balance while standing requires a significant fine coordination effort between the neuromuscular system and the sensory system. It is one of the key contributing factors towards fall prevention, especially in the older population. Using artificial intelligence (AI), we can similarly teach an agent to maintain a standing posture, and thus teach the agent not to fall. In this paper, we investigate the learning progress of an AI agent and how it maintains a stable standing posture through reinforcement learning. During training, the AI agent learnt three policies. First, it learnt to maintain the Centre-of-Gravity and Zero-Moment-Point in front of the body. Then, it learnt to shift the load of the entire body on one leg while using the other leg for fine tuning the balancing action. Finally, it started to learn the coordination between the two pre-trained policies. This study shows the potentials of using deep reinforcement learning in human movement studies. The learnt AI behaviour also exhibited attempts to achieve an unplanned goal because it correlated with the set goal (e.g. walking in order to prevent falling). The failed attempts to maintain a standing posture is an interesting by-product which can enrich the fall detection and prevention research efforts.

Download Full-text

FALL DETECTION USING THREE WEARABLE TRIAXIAL ACCELEROMETERS AND A DECISION-TREE CLASSIFIER

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237214500598 ◽

2014 ◽

Vol 26 (05) ◽

pp. 1450059 ◽

Cited By ~ 3

Author(s):

Kan Luo ◽

Jianqing Li ◽

Jianfeng Wu ◽

Hua Yang ◽

Gaozhi Xu

Keyword(s):

Decision Tree ◽

Detection System ◽

Human Movement ◽

Fall Detection ◽

The Body ◽

Body Tilt ◽

Decision Tree Classifier ◽

Tree Model ◽

Tree Classifier ◽

Unintentional Falls

Unintentional falls cause serious health problem and high medical cost, particularly among the elders. Efficient fall detection can ensure fallen subjects with timely rescue, less pain and lower health-care expense. However, the accuracy of the present fall detection system with single accelerometer does not meet the requirement of practical application. In this paper, a fall detection method using three wearable triaxial accelerometers and a decision-tree classifier is proposed. The three triaxial accelerometers are, respectively mounted on the head, the waist and the ankle to capture the acceleration signals of human movement. A Kalman filter is adopted to estimate the body tilt angle. After the features are extracted, the trained decision-tree model is used to predict the fall. The efficiency improvement is evidenced by the scripted and unscripted lateral fall experiments, involving five young healthy volunteers (three males and two females; age: 23.3 ± 1 years). The classification of fall and activities of daily living (ADL) achieve recall, precision and F-value of 93.1%, 95.9%, and 94.5%, respectively, and the system detects all falls during the extended unscripted trials. The experimental results indicate that the complementary movement information coming from three accelerometers can enhance the performance of fall detection. The proposed method is efficient, and it has remarkable improvements in comparison to the method of using one or two accelerometers.

Download Full-text

Model-Free Attitude Control of Spacecraft Based on PID-Guide TD3 Algorithm

International Journal of Aerospace Engineering ◽

10.1155/2020/8874619 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

ZhiBin Zhang ◽

XinHong Li ◽

JiPing An ◽

WanXin Man ◽

GuoHui Zhang

Keyword(s):

Reinforcement Learning ◽

High Precision ◽

Attitude Control ◽

Response Speed ◽

Fast Response ◽

Fine Tuning ◽

Training Time ◽

Model Free ◽

Attitude Tracking ◽

Policy Gradient

This paper is devoted to model-free attitude control of rigid spacecraft in the presence of control torque saturation and external disturbances. Specifically, a model-free deep reinforcement learning (DRL) controller is proposed, which can learn continuously according to the feedback of the environment and realize the high-precision attitude control of spacecraft without repeatedly adjusting the controller parameters. Considering the continuity of state space and action space, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm based on actor-critic architecture is adopted. Compared with the Deep Deterministic Policy Gradient (DDPG) algorithm, TD3 has better performance. TD3 obtains the optimal policy by interacting with the environment without using any prior knowledge, so the learning process is time-consuming. Aiming at this problem, the PID-Guide TD3 algorithm is proposed, which can speed up the training speed and improve the convergence precision of the TD3 algorithm. Aiming at the problem that reinforcement learning (RL) is difficult to deploy in the actual environment, the pretraining/fine-tuning method is proposed for deployment, which can not only save training time and computing resources but also achieve good results quickly. The experimental results show that DRL controller can realize high-precision attitude stabilization and attitude tracking control, with fast response speed and small overshoot. The proposed PID-Guide TD3 algorithm has faster training speed and higher stability than the TD3 algorithm.

Download Full-text

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3452008 ◽

2021 ◽

Vol 12 (3) ◽

pp. 1-21

Author(s):

Shilei Li ◽

Meng Li ◽

Jiongming Su ◽

Shaofei Chen ◽

Zhimin Yuan ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Gradient Methods ◽

Action Space ◽

Fine Tuning ◽

Continuous Control ◽

Parametric Perturbation ◽

Gradient Information ◽

Policy Gradient ◽

Gradient Based

Efficient and stable exploration remains a key challenge for deep reinforcement learning (DRL) operating in high-dimensional action and state spaces. Recently, a more promising approach by combining the exploration in the action space with the exploration in the parameters space has been proposed to get the best of both methods. In this article, we propose a new iterative and close-loop framework by combining the evolutionary algorithm (EA), which does explorations in a gradient-free manner directly in the parameters space with an actor-critic, and the deep deterministic policy gradient (DDPG) reinforcement learning algorithm, which does explorations in a gradient-based manner in the action space to make these two methods cooperate in a more balanced and efficient way. In our framework, the policies represented by the EA population (the parametric perturbation part) can evolve in a guided manner by utilizing the gradient information provided by the DDPG and the policy gradient part (DDPG) is used only as a fine-tuning tool for the best individual in the EA population to improve the sample efficiency. In particular, we propose a criterion to determine the training steps required for the DDPG to ensure that useful gradient information can be generated from the EA generated samples and the DDPG and EA part can work together in a more balanced way during each generation. Furthermore, within the DDPG part, our algorithm can flexibly switch between fine-tuning the same previous RL-Actor and fine-tuning a new one generated by the EA according to different situations to further improve the efficiency. Experiments on a range of challenging continuous control benchmarks demonstrate that our algorithm outperforms related works and offers a satisfactory trade-off between stability and sample efficiency.

Download Full-text