KSM Based Machine Learning for Markerless Motion Capture

Author(s):  
Therdsak Tangkuampien ◽  
David Suter

A marker-less motion capture system, based on machine learning, is proposed and tested. Pose information is inferred from images captured from multiple (as few as two) synchronized cameras. The central concept, which we call Kernel Subspace Mapping (KSM), is as follows. The images-to-pose learning could be done with large numbers of images of a wide variety of people (with the ground-truth poses accurately known). Of course, obtaining the ground-truth poses could be problematic. Here we choose to use synthetic data (both for learning and for at least some of the testing). The system needs to generalize well to novel inputs: unseen poses (not in the training database) and unseen actors. For learning we use a generic, relatively low-fidelity computer graphics model, and for testing we sometimes use a more accurate model (made to resemble the first author). What makes machine learning viable for human motion capture is that a high percentage of human motion is coordinated. Indeed, it is now relatively well known that there is large redundancy in the set of possible images of a human (these images form some sort of relatively smooth low-dimensional manifold in the huge-dimensional space of all possible images) and in the set of pose angles (again, a low-dimensional, smooth sub-manifold of the moderately high-dimensional space of all possible joint angles). KSM is based on the Kernel PCA (KPCA) algorithm, which is costly. We show that the Greedy Kernel PCA (GKPCA) algorithm can be used to speed up KSM with relatively minor modifications. At the core, then, are two KPCAs (or two GKPCAs): one for learning the pose manifold and one for learning the image manifold. A modification of Locally Linear Embedding (LLE) then bridges the pose and image manifolds.
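The KPCA step at the core of KSM can be illustrated with a minimal sketch. This is not the authors' implementation (their KSM additionally learns two manifolds and bridges them with LLE); it only shows the standard KPCA computation, in plain NumPy, on toy data sampled from a smooth low-dimensional manifold:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kpca(X, n_components=2, gamma=1.0):
    """Project X onto its leading kernel principal components."""
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # double-centering in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    vals, vecs = vals[idx], vecs[:, idx]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # normalize expansion coefficients
    return Kc @ alphas

# Toy data: a noisy circle, i.e. a 1-D manifold embedded in 2-D.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.standard_normal((200, 2))
Z = kpca(X, n_components=2, gamma=2.0)
```

The cost the abstract alludes to is visible here: KPCA builds and eigendecomposes an n-by-n kernel matrix, which is what GKPCA's greedy subset selection is designed to avoid.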

Author(s):  
Diana Mateus ◽  
Christian Wachinger ◽  
Selen Atasoy ◽  
Loren Schwarz ◽  
Nassir Navab

Computer aided diagnosis is often confronted with processing and analyzing high dimensional data. One way to deal with such data is dimensionality reduction. This chapter focuses on manifold learning methods for creating low dimensional data representations adapted to a given application. From pairwise non-linear relations between neighboring data points, manifold learning algorithms first approximate, with a graph, the low dimensional manifold on which the data lives; then they find a non-linear map that embeds this graph into a low dimensional space. Since the explicit pairwise relations and the neighborhood system can be designed according to the application, manifold learning methods are very flexible and allow easy incorporation of domain knowledge. The authors describe different assumptions and design elements that are crucial to building successful low dimensional data representations with manifold learning for a variety of applications. In particular, they discuss examples in visualization, clustering, classification, registration, and human-motion modeling.
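As a concrete instance of the graph-then-embed recipe described above, here is a Laplacian-eigenmaps-style sketch. It is one of several manifold learning methods the chapter covers; the k-NN neighborhood system and binary edge weights are illustrative design choices, not the chapter's prescription:

```python
import numpy as np

def laplacian_eigenmaps(X, k=10, n_components=2):
    """k-NN graph -> graph Laplacian -> embed via its smallest nontrivial eigenvectors."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # nearest neighbors, skipping self
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                  # symmetrize the neighborhood graph
    D = np.diag(W.sum(axis=1))
    L = D - W                               # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:1 + n_components]      # drop the constant eigenvector

# Toy data: points along a noisy 1-D curve living in 3-D.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 3, 150))
X = np.c_[np.cos(2 * t), np.sin(2 * t), t] + 0.02 * rng.standard_normal((150, 3))
Y = laplacian_eigenmaps(X, k=8, n_components=2)
```

Domain knowledge enters exactly where the chapter says it does: in how the pairwise relations (here, Euclidean k-NN with binary weights) and the neighborhood size k are chosen.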


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1801 ◽  
Author(s):  
Haitao Guo ◽  
Yunsick Sung

The importance of estimating human movement has increased in the field of human motion capture. HTC VIVE is a popular device that provides a convenient way of capturing human motions using several sensors. Recently, only the motion of users’ hands has typically been captured, greatly reducing the range of motion captured. This paper proposes a framework to estimate single-arm orientations using soft sensors, mainly by combining a bidirectional long short-term memory (Bi-LSTM) network and a two-layer LSTM. The positions of the two hands are measured using an HTC VIVE set, and the orientations of a single arm, comprising its upper arm and forearm, are estimated by the proposed framework from the estimated hand positions. Given that the proposed framework is designed for a single arm, if the orientations of both arms are required, the estimation is performed twice. To obtain the ground truth of single-arm orientations, two Myo gesture-control sensory armbands are worn on the arm: one on the upper arm and the other on the forearm. The proposed framework analyzes the contextual features of consecutive arm movements, which provides an efficient way to improve the accuracy of arm movement estimation. Compared against the ground truth, the dynamic time warping distance of the proposed method’s estimates was on average 73.90% smaller than that of a conventional Bayesian framework. The distinct feature of our proposed framework is that the number of sensors attached to end-users is reduced. Additionally, with our framework, arm orientations can be estimated with any soft sensor while maintaining good estimation accuracy. Another contribution is the proposed combination of a Bi-LSTM and a two-layer LSTM.
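The dynamic time warping (DTW) distance used above as the evaluation metric can be sketched in a few lines. This is the textbook scalar-sequence form, not the paper's exact evaluation code:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-shifted copy of a signal stays close under DTW, because the warping
# path is free to realign the two sequences before accumulating cost.
t = np.linspace(0, 2 * np.pi, 60)
ref = np.sin(t)
warped = np.sin(t + 0.4)
```

Because the all-diagonal path reproduces the pointwise L1 distance, DTW is never larger than it, which is why it is a forgiving metric for comparing motion trajectories that differ mainly in timing.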


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6933
Author(s):  
Georgios Giarmatzis ◽  
Evangelia I. Zacharaki ◽  
Konstantinos Moustakas

Conventional biomechanical modelling approaches involve the solution of large systems of equations that encode the complex mathematical representation of human motion and skeletal structure. To improve stability and computational speed, a common bottleneck in current approaches, we apply machine learning to train surrogate models that predict, in near real-time, previously calculated medial and lateral knee contact forces (KCFs) of 54 young and elderly participants during treadmill walking at speeds of 3 to 7 km/h. Predictions are obtained by fusing optical motion capture and musculoskeletal modeling-derived kinematic and force variables into regression models using artificial neural networks (ANNs) and support vector regression (SVR). Training schemes included either data from all subjects (LeaveTrialsOut) or only from a portion of them (LeaveSubjectsOut), in combination with inclusion or exclusion of ground reaction forces (GRFs) from the dataset. Results identify ANNs as the best-performing predictor of KCFs, both in terms of Pearson R (0.89–0.98 for LeaveTrialsOut and 0.45–0.85 for LeaveSubjectsOut) and percentage normalized root mean square error (0.67–2.35 for LeaveTrialsOut and 1.6–5.39 for LeaveSubjectsOut). When GRFs were omitted from the dataset, no substantial decrease in the prediction power of either model was observed. Our findings showcase the strength of ANNs for simultaneously predicting multi-component KCFs during walking at different speeds, even in the absence of GRFs, which is particularly applicable in real-time applications that use knee loading conditions to guide and treat patients.
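The two reported metrics, Pearson R and percentage normalized RMSE, are straightforward to compute. A small sketch follows; note that normalizing the RMSE by the ground-truth range is an assumption here, as the paper may normalize differently (e.g., by body weight or mean value):

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between a predicted and a reference signal."""
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    return float((yt @ yp) / (np.linalg.norm(yt) * np.linalg.norm(yp)))

def nrmse_percent(y_true, y_pred):
    """RMSE normalized by the range of the ground truth, in percent."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(100.0 * rmse / (y_true.max() - y_true.min()))

# Stand-in data: a periodic "contact force" trace and a noisy surrogate output.
rng = np.random.default_rng(2)
truth = np.sin(np.linspace(0, 4 * np.pi, 500))
pred = truth + 0.05 * rng.standard_normal(truth.shape)
```

Reporting both metrics is complementary: Pearson R rewards matching the waveform shape, while nRMSE also penalizes amplitude and offset errors.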


2015 ◽  
Vol 2015 ◽  
pp. 1-21
Author(s):  
Wanyi Li ◽  
Jifeng Sun

This paper proposes a novel algorithm, called low dimensional space incremental learning (LDSIL), to estimate 3D human motion from silhouettes extracted from multiview images. The proposed algorithm takes advantage of stochastic extremum memory adaptive searching (SEMAS) and an incremental probabilistic dimension reduction model (IPDRM) to collect new high dimensional data samples. These samples can be selected to update the mapping from the low dimensional space to the high dimensional space, so that incremental learning can be achieved and human motion can be estimated from a small number of samples. Compared with three traditional algorithms, the proposed algorithm achieves good performance in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error.
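The idea of selecting new high dimensional samples to update a low-to-high dimensional mapping can be illustrated with a simple reconstruction-error criterion. This sketch substitutes plain PCA for the paper's IPDRM and a median threshold for its selection rule, so it is only an analogy to the actual algorithm:

```python
import numpy as np

def fit_pca(X, d):
    """Fit a d-dimensional linear mapping (mean + principal directions)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d]

def reconstruction_error(X, mu, W):
    """How poorly each sample round-trips through the low dimensional space."""
    Z = (X - mu) @ W.T        # high -> low
    Xr = Z @ W + mu           # low -> high
    return np.linalg.norm(X - Xr, axis=1)

rng = np.random.default_rng(3)
# Initial model trained on samples lying near a 2-D subspace of a 10-D space.
base = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 10))
mu, W = fit_pca(base, 2)

# New batch arrives: keep only the poorly reconstructed samples for the update,
# since the well-reconstructed ones add no information to the mapping.
new = rng.standard_normal((50, 10))
err = reconstruction_error(new, mu, W)
selected = new[err > np.median(err)]
mu, W = fit_pca(np.vstack([base, selected]), 2)
```

The point of such selection is the same as in the abstract: the model keeps improving from a small number of informative samples rather than retraining on everything.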


Author(s):  
MATHIAS FONTMARTY ◽  
PATRICK DANÈS ◽  
FRÉDÉRIC LERASLE

This paper presents a thorough study of several particle filter (PF) strategies dedicated to human motion capture from a trinocular vision surveillance setup. An experimental procedure is used, based on a commercial motion capture ring that provides ground truth. Metrics are proposed to assess performance in terms of accuracy and robustness, but also estimator dispersion, which is often neglected elsewhere. Relative performance is discussed through quantitative and qualitative evaluations on a video database. PF strategies based on quasi-Monte Carlo sampling, a scheme that is surprisingly seldom exploited in the vision community, provide an interesting avenue to explore. Future work is finally discussed.
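A basic sequential importance resampling (SIR) particle filter, the common baseline from which such strategy studies depart, can be sketched for a scalar random-walk model. The paper's filters track full-body pose and include quasi-Monte Carlo variants; the state model and noise levels here are purely illustrative:

```python
import numpy as np

def particle_filter(observations, n_particles=500, proc_std=0.1, obs_std=0.5, seed=0):
    """Basic SIR filter for a scalar random-walk state observed in Gaussian noise."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for z in observations:
        # Predict: propagate every particle through the motion model.
        particles = particles + rng.normal(0.0, proc_std, n_particles)
        # Update: weight particles by the observation likelihood.
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))
        # Resample: draw particles in proportion to their weights.
        idx = rng.choice(n_particles, n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)

rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(0, 0.1, 100))   # hidden trajectory
obs = truth + rng.normal(0, 0.5, 100)        # noisy measurements
est = particle_filter(obs)
```

The dispersion metric the paper emphasizes would be computed over repeated runs of exactly this kind of filter with different random seeds, since the estimate itself is a random quantity.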


2011 ◽  
Vol 201-203 ◽  
pp. 2517-2520
Author(s):  
Sen Xu ◽  
Tian Zhou ◽  
Hua Long Yu

Clustering combination has recently become a hot topic in machine learning; its central problem is how to combine multiple clusterings into a final, superior result. In this paper, a low dimensional embedding method is proposed. It first obtains low dimensional embeddings of the hyperedges by running spectral clustering algorithms, then obtains low dimensional embeddings of the objects indirectly by composing the mappings, and finally runs the K-means algorithm to cluster the objects according to their coordinates in the low dimensional space. Experiments show that the proposed method performs well.
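The spectral-embedding-then-K-means pipeline described above can be sketched as follows. A plain pairwise affinity matrix stands in for the paper's hypergraph/hyperedge construction, and the tiny K-means implementation is for illustration only:

```python
import numpy as np

def spectral_embed(W, d):
    """Embed graph nodes via the top eigenvectors of the normalized affinity."""
    Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    M = Dinv @ W @ Dinv
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argsort(vals)[::-1][:d]]

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm: assign to nearest center, recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Toy data: two well-separated blobs of objects.
rng = np.random.default_rng(4)
pts = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
d2 = ((pts[:, None] - pts[None]) ** 2).sum(-1)
W = np.exp(-d2)                      # Gaussian affinity matrix
Z = spectral_embed(W, 2)             # low dimensional coordinates of objects
labels = kmeans(Z, 2)                # final clustering in the embedded space
```

The key property exploited is that for a near-block-diagonal affinity matrix the leading eigenvectors collapse each block to a tight point cloud, so even a simple K-means separates them cleanly.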


2018 ◽  
Author(s):  
Jacqueline B. Hynes ◽  
David M. Brandman ◽  
Jonas B. Zimmerman ◽  
John P. Donoghue ◽  
Carlos E. Vargas-Irwin

AbstractRecent technological advances have made it possible to simultaneously record the activity of thousands of individual neurons in the cortex of awake behaving animals. However, the comparatively slower development of analytical tools capable of handling the scale and complexity of large-scale recordings is a growing problem for the field of neuroscience. We present the Similarity Networks (SIMNETS) algorithm: a computationally efficient and scalable method for identifying and visualizing sub-networks of functionally similar neurons within larger simultaneously recorded ensembles. While traditional approaches tend to group neurons according to the statistical similarities of inter-neuron spike patterns, our approach begins by mathematically capturing the intrinsic relationship between the spike train outputs of each neuron across experimental conditions, before any comparisons are made between neurons. This strategy estimates the intrinsic geometry of each neuron’s output space, allowing us to capture the information processing properties of each neuron in a common format that is easily compared between neurons. Dimensionality reduction tools are then used to map high-dimensional neuron similarity vectors into a low-dimensional space where functional groupings are identified using clustering and statistical techniques. SIMNETS makes minimal assumptions about single neuron encoding properties; is efficient enough to run on consumer-grade hardware (100 neurons < 4s run-time); and has a computational complexity that scales near-linearly with neuron number. These properties make SIMNETS well-suited for examining large networks of neurons during complex behaviors. 
We validate the ability of our approach to detect statistically and physiologically meaningful functional groupings in a population of synthetic neurons with known ground truth, as well as in three publicly available datasets of ensemble recordings from primate primary visual and motor cortex and the rat hippocampal CA1 region.
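The two-stage logic of SIMNETS, characterizing each neuron's output space first and comparing neurons only afterwards, can be caricatured in a few lines. This sketch uses binned firing rates, a condition-by-condition correlation matrix as each neuron's "intrinsic geometry", and PCA for the final embedding; the actual algorithm's spike-train metrics, similarity measures, and embedding choices differ:

```python
import numpy as np

rng = np.random.default_rng(5)
n_neurons, n_conditions, n_bins = 20, 8, 100
base = rng.standard_normal(n_bins)            # a shared temporal response shape

# Synthetic population: two functional groups with different condition tuning
# (uniform responses vs. responses alternating in sign across conditions).
rates = np.empty((n_neurons, n_conditions, n_bins))
for i in range(n_neurons):
    signs = np.ones(n_conditions) if i < 10 else (-1.0) ** np.arange(n_conditions)
    rates[i] = signs[:, None] * base + 0.3 * rng.standard_normal((n_conditions, n_bins))

# Stage 1: summarize each neuron by how ITS OWN responses relate across
# conditions, before any neuron-to-neuron comparison is made.
geoms = np.array([np.corrcoef(r) for r in rates])   # (neurons, cond, cond)
vectors = geoms.reshape(n_neurons, -1)              # one similarity vector per neuron

# Stage 2: embed the neuron similarity vectors into a low-dimensional space.
centered = vectors - vectors.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
embedding = centered @ Vt[:2].T
```

Note that the two groups end up separable even though no comparison of raw spike patterns between neurons was ever performed, which is the property the abstract emphasizes.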


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wanyi Li ◽  
Yuqi Zeng ◽  
Qian Zhang ◽  
Yilin Wu ◽  
Guoming Chen

Three-dimensional (3D) human motion capture is currently a hot research topic. As networks have advanced, 3D human motion has become indispensable in multimedia works such as images, videos, and games, and it plays an important role in the publication and expression of many kinds of media. How to capture 3D human motion is a key technology for multimedia products. Therefore, a new algorithm, called incremental dimension reduction and projection position optimization (IDRPPO), is proposed in this paper. The algorithm learns from sparse 3D human motion samples and generates new ones, thereby providing a technique for producing 3D character animation. By taking advantage of the Gaussian incremental dimension reduction model (GIDRM) and projection position optimization, the proposed algorithm learns from existing samples and establishes a mapping between the low dimensional (LD) and high dimensional (HD) data. Finally, missing frames of an input 3D human motion, as well as other types of 3D human motion, can be generated by the IDRPPO.
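The general pattern behind generating missing frames via a learned LD/HD mapping can be sketched with a linear stand-in. Here plain PCA replaces the GIDRM and straight-line interpolation in the latent space replaces projection position optimization, so this mirrors only the structure of the approach, not its details:

```python
import numpy as np

# Toy "motion": 60-D poses tracing a smooth closed trajectory over 100 frames.
t = np.linspace(0, 2 * np.pi, 100)
A = np.random.default_rng(6).standard_normal((2, 60))
motion = np.c_[np.cos(t), np.sin(t)] @ A          # (frames, 60)

# Learn a linear LD<->HD mapping (PCA stands in for the Gaussian model).
mu = motion.mean(axis=0)
_, _, Vt = np.linalg.svd(motion - mu, full_matrices=False)
W = Vt[:2]                                        # 2-D latent space
latent = (motion - mu) @ W.T

# "Missing" frames 40-49: interpolate between the surviving neighbors in the
# latent space, then map back through the learned LD -> HD direction.
filled_latent = latent.copy()
alpha = np.linspace(0, 1, 12)[1:-1]               # 10 interior interpolation weights
filled_latent[40:50] = (1 - alpha[:, None]) * latent[39] + alpha[:, None] * latent[50]
filled = filled_latent @ W + mu
err = np.linalg.norm(filled[40:50] - motion[40:50], axis=1)
```

Because the motion varies smoothly, interpolating in the compact latent space and projecting back yields plausible intermediate poses, which is the same intuition the IDRPPO exploits with far more sophisticated machinery.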


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253157
Author(s):  
Saeed Ghorbani ◽  
Kimia Mahdaviani ◽  
Anne Thaler ◽  
Konrad Kording ◽  
Douglas James Cook ◽  
...  

Large high-quality datasets of human body shape and kinematics lay the foundation for modelling and simulation approaches in computer vision, computer graphics, and biomechanics. Creating datasets that combine naturalistic recordings with high-accuracy data about ground truth body shape and pose is challenging because different motion recording systems are either optimized for one or the other. We address this issue in our dataset by using different hardware systems to record partially overlapping information and synchronized data that lend themselves to transfer learning. This multimodal dataset contains 9 hours of optical motion capture data, 17 hours of video data from 4 different points of view recorded by stationary and hand-held cameras, and 6.6 hours of inertial measurement units data recorded from 60 female and 30 male actors performing a collection of 21 everyday actions and sports movements. The processed motion capture data is also available as realistic 3D human meshes. We anticipate use of this dataset for research on human pose estimation, action recognition, motion modelling, gait analysis, and body shape reconstruction.

