Mixed matrix factorization: a novel algorithm for the extraction of kinematic-muscular synergies

2021 ◽  
Author(s):  
Alessandro Scano ◽  
Robert Mihai Mira ◽  
Andrea d'Avella

Synergistic models have been employed to investigate motor coordination separately in the muscular and kinematic domains. However, the relationship between muscle synergies, constrained to be non-negative, and kinematic synergies, whose elements can be positive and negative, has received limited attention. Existing algorithms for extracting synergies from combined kinematic and muscular data either do not enforce non-negativity constraints or separate non-negative variables into positive and negative components. We propose a mixed matrix factorization (MMF) algorithm based on a gradient descent update rule which overcomes these limitations. It allows direct assessment of the relationship between kinematic and muscle activity variables by enforcing the non-negativity constraint on a subset of variables. We validated the algorithm on simulated kinematic-muscular data generated from known spatial synergies and temporal coefficients, by evaluating the similarity between extracted and ground truth synergies and temporal coefficients when the data are corrupted by different noise levels. We also compared the performance of MMF to that of non-negative matrix factorization applied to separate positive and negative components (NMFpn). Finally, we factorized kinematic and EMG data collected during upper-limb movements to demonstrate the potential of the algorithm. MMF achieved almost perfect reconstruction on noiseless simulated data. It performed better than NMFpn in recovering the correct spatial synergies and temporal coefficients with noisy simulated data. It also allowed correct selection of the original number of ground truth synergies. We showed meaningful applicability to real data. MMF can also be applied to any multivariate data that contain both non-negative and unconstrained variables.
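The core of such a mixed factorization can be illustrated with a short sketch. The snippet below is not the authors' MMF implementation (the paper gives the exact update rule); it is a minimal projected-gradient-descent factorization in plain Python, assuming non-negativity on the EMG-like rows of the synergy matrix and on the temporal coefficients, with the kinematic rows left unconstrained:

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mmf(X, rank, nonneg_rows, steps=3000, lr=0.01, seed=0):
    """Factorize X (variables x samples) as W @ H by projected gradient
    descent on the squared reconstruction error. Rows of W listed in
    nonneg_rows (EMG-like variables) and all coefficients in H are clipped
    to zero from below; the remaining (kinematic) rows stay unconstrained."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(steps):
        WH = matmul(W, H)
        E = [[WH[i][j] - X[i][j] for j in range(m)] for i in range(n)]
        for i in range(n):                       # W step: grad = E @ H^T
            for r in range(rank):
                g = sum(E[i][j] * H[r][j] for j in range(m))
                W[i][r] -= lr * g
                if i in nonneg_rows:
                    W[i][r] = max(0.0, W[i][r])
        for r in range(rank):                    # H step: grad = W^T @ E
            for j in range(m):
                g = sum(W[i][r] * E[i][j] for i in range(n))
                H[r][j] = max(0.0, H[r][j] - lr * g)
    return W, H

# Ground truth: rows 0-1 are EMG-like (non-negative weights), rows 2-3 are
# kinematic-like (signed weights); temporal coefficients are non-negative.
W_true = [[1.0, 0.0], [0.0, 1.0], [0.8, -0.6], [-0.5, 1.0]]
H_true = [[0.2, 0.9, 0.4, 0.0, 0.7, 0.3],
          [0.5, 0.1, 0.8, 0.6, 0.0, 0.9]]
X = matmul(W_true, H_true)
W, H = mmf(X, rank=2, nonneg_rows={0, 1})
X_hat = matmul(W, H)
err = sum(abs(X[i][j] - X_hat[i][j]) for i in range(4) for j in range(6)) / 24
```

On this noiseless toy example the factorization reaches a near-exact reconstruction while respecting the sign constraints, mirroring the "almost perfect reconstruction on noiseless simulated data" reported above.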


2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Abstract
Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq (scRNA-seq) data, numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground truth.
Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis.
GENIE3 proved to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. To ensure the reproducibility and ease extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
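The reproducibility criterion used in this benchmark can be made concrete with a small sketch. The snippet below is only an illustration (GENIE3 itself scores edges with tree-based regression, which is not shown): assuming each inference run returns a dictionary of edge scores, the overlap between the top-k edges of two independent runs is measured with a Jaccard index:

```python
def top_edges(scores, k):
    """Keep the k strongest edges from a {(regulator, target): score} dict."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def reproducibility(scores_a, scores_b, k):
    """Jaccard overlap between the top-k edges inferred from two
    independent datasets of the same biological condition."""
    a, b = top_edges(scores_a, k), top_edges(scores_b, k)
    return len(a & b) / len(a | b)

# Toy edge scores from two hypothetical inference runs on independent datasets
run1 = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.7,
        ("TF2", "G3"): 0.6, ("TF2", "G1"): 0.1}
run2 = {("TF1", "G1"): 0.8, ("TF2", "G3"): 0.7,
        ("TF1", "G3"): 0.5, ("TF2", "G1"): 0.2}
score = reproducibility(run1, run2, k=3)
```

Here two of the four distinct top-3 edges agree across runs, giving a Jaccard overlap of 0.5; a perfectly reproducible method would score 1.0 at any threshold.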


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Francesca Pizzorni Ferrarese ◽  
Flavio Simonetti ◽  
Roberto Israel Foroni ◽  
Gloria Menegaz

Validation and accuracy assessment are the main bottlenecks preventing the adoption of image processing algorithms in clinical practice. In the classical approach, a posteriori analysis is performed through objective metrics. In this work, a different approach based on Petri nets is proposed. The basic idea consists in predicting the accuracy of a given pipeline based on the identification and characterization of the sources of inaccuracy. The concept is demonstrated on a case study: intrasubject rigid and affine registration of magnetic resonance images. Both synthetic and real data are considered. While synthetic data allow benchmarking of the performance with respect to the ground truth, real data enable assessment of the robustness of the methodology in real contexts as well as determination of the suitability of using synthetic data in the training phase. Results revealed a higher correlation and a lower dispersion among the metrics for simulated data, while the opposite trend was observed for pathologic data. Results show that the proposed model not only provides good prediction performance but also leads to the optimization of the end-to-end chain in terms of accuracy and robustness, setting the ground for its generalization to different and more complex scenarios.


Life ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 716
Author(s):  
Yunhe Liu ◽  
Aoshen Wu ◽  
Xueqing Peng ◽  
Xiaona Liu ◽  
Gang Liu ◽  
...  

Despite the scRNA-seq analytic algorithms developed, their performance for cell clustering cannot be quantified due to the unknown “true” clusters. Referencing the transcriptomic heterogeneity of cell clusters, a “true” mRNA number matrix of cell individuals was defined as ground truth. Based on the matrix and the actual data generation procedure, a simulation program (SSCRNA) for raw data was developed. Subsequently, the consistency between simulated data and real data was evaluated. Furthermore, the impact of sequencing depth and analysis algorithms on cluster accuracy was quantified. As a result, the simulation result was highly consistent with that of the actual data. Among the normalization algorithms, the Gaussian normalization method was the most recommended. As for the clustering algorithms, the K-means clustering method was more stable than K-means plus Louvain clustering. In conclusion, the scRNA simulation algorithm developed restores the actual data generation process, discovers the impact of parameters on classification, compares the normalization/clustering algorithms, and provides novel insight into scRNA analyses.
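When the “true” clusters are known, as with simulated data, cluster accuracy can be quantified directly. A common choice for this (shown here as an illustration, not necessarily the exact metric used in the paper) is the Adjusted Rand Index, which scores agreement between predicted and ground-truth labels, corrected for chance:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between ground-truth and predicted cluster labels:
    1.0 for identical partitions (up to label renaming), ~0 for random ones."""
    n = len(truth)
    contingency = Counter(zip(truth, pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)
```

A partition identical up to label renaming scores exactly 1.0, while one misassigned cell out of six drops the score to roughly 0.32, which is what makes the index useful for comparing clustering pipelines on simulated ground truth.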


2018 ◽  
Author(s):  
Yichen Li ◽  
Rebecca Saxe ◽  
Stefano Anzellotti

Abstract
Noise is a major challenge for the analysis of fMRI data in general and for connectivity analyses in particular. As researchers develop increasingly sophisticated tools to model statistical dependence between the fMRI signal in different brain regions, there is a risk that these models may increasingly capture artifactual relationships between regions that are the result of noise. Thus, choosing optimal denoising methods is a crucial step to maximize the accuracy and reproducibility of connectivity models. Most comparisons between denoising methods require knowledge of the ground truth: of what the ‘real signal’ is. For this reason, they are usually based on simulated fMRI data. However, simulated data may not match the statistical properties of real data, limiting the generalizability of the conclusions. In this article, we propose an approach to evaluate denoising methods using real (non-simulated) fMRI data. First, we introduce an intersubject version of multivariate pattern dependence (iMVPD) that computes the statistical dependence between a brain region in one participant and another brain region in a different participant. iMVPD has the following advantages: 1) it is multivariate, 2) it trains and tests models on independent folds of the real fMRI data, and 3) it generates predictions that are both between subjects and between regions. Since whole-brain sources of noise are more strongly correlated within subject than between subjects, we can use the difference between standard MVPD and iMVPD as a ‘discrepancy metric’ to evaluate denoising techniques (more effective techniques should yield smaller differences). As predicted, the difference is greatest in the absence of denoising methods. Furthermore, a combination of global signal removal and CompCorr optimizes denoising (among the set of denoising options tested).
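The logic of the discrepancy metric can be sketched with a drastically simplified, univariate stand-in for MVPD (the real method is multivariate and model-based, so everything below, including the signal model, is an illustrative assumption): within-subject dependence is inflated by shared subject-level noise, between-subject dependence is not, and the gap between the two is what denoising should shrink.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)
T = 500
signal = [rng.gauss(0, 1) for _ in range(T)]   # task-driven activity, shared
noise1 = [rng.gauss(0, 1) for _ in range(T)]   # subject 1 whole-brain noise
noise2 = [rng.gauss(0, 1) for _ in range(T)]   # subject 2 whole-brain noise

# Two regions in subject 1, one region in subject 2 (region-specific noise added)
regA_s1 = [s + n + rng.gauss(0, 0.5) for s, n in zip(signal, noise1)]
regB_s1 = [s + n + rng.gauss(0, 0.5) for s, n in zip(signal, noise1)]
regB_s2 = [s + n + rng.gauss(0, 0.5) for s, n in zip(signal, noise2)]

within = pearson(regA_s1, regB_s1)    # inflated by shared within-subject noise
between = pearson(regA_s1, regB_s2)   # noise is independent across subjects
discrepancy = within - between        # shrinks as denoising improves
```

Without any denoising the discrepancy is large, matching the abstract's prediction; regressing out the shared noise component before correlating would drive it toward zero.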


2018 ◽  
Author(s):  
Yu-Chuan Chang ◽  
June-Tai Wu ◽  
Ming-Yi Hong ◽  
Yi-An Tung ◽  
Ping-Han Hsieh ◽  
...  

Abstract
Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer’s disease (AD). In this regard, this study presents GenEpi, a computational package to uncover epistasis associated with phenotypes by the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. The simulated data showed that GenEpi outperforms other widely used methods in detecting ground-truth epistasis. As for real data, this study uses AD as an example to reveal the capability of GenEpi in finding disease-related variants and variant interactions that show both biological meaning and predictive power. Availability: GenEpi is an open-source Python package, available free of charge only for non-commercial users. The package can be downloaded from https://github.com/Chester75321/GenEpi, and has also been published on The Python Package Index.
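The two-element combinatorial encoding can be sketched as follows. The feature names and encoding details here are hypothetical simplifications of GenEpi's scheme: each SNP genotype is one-hot encoded, and pairwise AND-features are formed across SNPs, which the L1-regularized regression with stability selection would then prune (the regression step is not shown):

```python
from itertools import combinations

# One-hot encoding of the three genotypes at a biallelic SNP
GENOTYPE_CODES = {"AA": (1, 0, 0), "Aa": (0, 1, 0), "aa": (0, 0, 1)}

def encode_pairwise(snps):
    """One-hot encode each SNP genotype, then form two-element combinatorial
    features as the product (logical AND) of every pair of indicator bits
    taken from two different SNPs."""
    onehot = [GENOTYPE_CODES[g] for g in snps]
    features = {}
    for (i, gi), (j, gj) in combinations(enumerate(onehot), 2):
        for a, bit_a in enumerate(gi):
            for b, bit_b in enumerate(gj):
                features[f"snp{i}_{a}*snp{j}_{b}"] = bit_a * bit_b
    return features

# Two SNPs yield 3 x 3 = 9 pairwise features, exactly one of which is active
feats = encode_pairwise(["Aa", "aa"])
active = [name for name, v in feats.items() if v]
```

Because exactly one indicator fires per SNP, each sample activates a single feature per SNP pair, so a sparse regression over these features directly names the interacting genotype combination it selects.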


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiangnan Xu ◽  
Samantha M. Solon-Biet ◽  
Alistair Senior ◽  
David Raubenheimer ◽  
Stephen J. Simpson ◽  
...  

Abstract
Background: Nutrigenomics aims at understanding the interaction between nutrition and gene information. Due to the complex interactions of nutrients and genes, their relationship exhibits non-linearity. One of the most effective and efficient methods to explore their relationship is the nutritional geometry framework, which fits a response surface for the gene expression over two prespecified nutrition variables. However, when the number of nutrients involved is large, it is challenging to find combinations of informative nutrients with respect to a certain gene and to test whether the relationship is stronger than chance. Methods for identifying informative combinations are essential to understanding the relationship between nutrients and genes.
Results: We introduce Local Consistency Nutrition to Graphics (LC-N2G), a novel approach for ranking and identifying combinations of nutrients associated with gene expression. In LC-N2G, we first propose a model-free quantity called the Local Consistency statistic to measure whether there is a non-random relationship between combinations of nutrients and gene expression measurements, based on (1) the similarity between samples in the nutrient space and (2) their difference in gene expression. Then, combinations with small LC statistics are selected and a permutation test is performed to evaluate their significance. Finally, response surfaces are generated for the subset of significant relationships. Evaluation on simulated and real data shows that LC-N2G can accurately find combinations that are correlated with gene expression.
Conclusion: LC-N2G is practically powerful for identifying the informative nutrition variables correlated with gene expression. Therefore, LC-N2G is important in the area of nutrigenomics for understanding the relationship between nutrition and gene expression information.
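The idea behind the Local Consistency statistic and its permutation test can be sketched as follows. This is an illustrative reconstruction, not the paper's exact definition: expression differences between samples are weighted by a Gaussian kernel on their distance in nutrient space, so small values indicate that similar diets yield similar expression, and shuffling the expression labels calibrates what "small" means.

```python
import math
import random

def local_consistency(nutrients, expression):
    """LC-like statistic: squared expression differences between sample pairs,
    weighted by a Gaussian kernel on nutrient-space distance. Small values
    mean nearby diets give similar expression."""
    n = len(nutrients)
    lc, wsum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(nutrients[i], nutrients[j]))
            w = math.exp(-d2)
            lc += w * (expression[i] - expression[j]) ** 2
            wsum += w
    return lc / wsum

def permutation_p_value(nutrients, expression, n_perm=200, seed=0):
    """Fraction of label permutations with LC at least as small as observed."""
    rng = random.Random(seed)
    observed = local_consistency(nutrients, expression)
    shuffled = list(expression)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if local_consistency(nutrients, shuffled) <= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy data: expression responds smoothly to two nutrient levels (e.g. protein,
# carbohydrate), so the observed LC is far smaller than under permutation
diets = [(p, c) for p in range(5) for c in range(5)]
expr = [p + 2 * c for p, c in diets]
p_val = permutation_p_value(diets, expr)
```

For a smooth nutrient-expression relationship the permutation p-value is tiny, while shuffled (non-informative) nutrient combinations would yield p-values spread uniformly.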


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3632
Author(s):  
Alessandra Anzolin ◽  
Jlenia Toppi ◽  
Manuela Petti ◽  
Febo Cincotti ◽  
Laura Astolfi

EEG signals are widely used to estimate brain circuits associated with specific tasks and cognitive processes. The testing of connectivity estimators is still an open issue because of the lack of a ground truth in real data. Existing solutions, such as the generation of simulated data based on a manually imposed connectivity pattern or on mass oscillators, can model only a few real cases, with a limited number of signals and spectral properties that do not reflect those of real brain activity. Furthermore, the generation of time series reproducing non-ideal and non-stationary ground-truth models is still missing. In this work, we present the SEED-G toolbox for the generation of pseudo-EEG data with imposed connectivity patterns, overcoming the existing limitations and enabling control of several parameters for data simulation according to the user’s needs. We first described the toolbox, including guidelines for its correct use, and then we tested its performance, showing how, in a wide range of conditions, datasets composed of up to 60 time series were successfully generated in less than 5 s and with spectral features similar to real data. Then, SEED-G is employed to study the effect of inter-trial variability on Partial Directed Coherence (PDC) estimates, confirming its robustness.
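A minimal version of the imposed-connectivity idea can be sketched with a first-order multivariate autoregressive (MVAR) generator. SEED-G itself is far richer (realistic spectra, non-stationarity, trial-to-trial variability), so the model below is only an illustration: nonzero off-diagonal coefficients impose known directed links that a connectivity estimator such as PDC should then recover.

```python
import random

def generate_mvar(A, T, noise_std=1.0, seed=0):
    """Generate T samples of an order-1 MVAR process x[t] = A @ x[t-1] + noise.
    Nonzero off-diagonal entries of A impose directed connectivity."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    data = []
    for _ in range(T):
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0, noise_std)
             for i in range(n)]
        data.append(x)
    return data  # T x n

# Imposed ground-truth pattern: channel 0 drives channel 1, channel 1 drives
# channel 2; the matrix is stable (all eigenvalues inside the unit circle)
A = [[0.5, 0.0, 0.0],
     [0.4, 0.5, 0.0],
     [0.0, 0.4, 0.5]]
eeg = generate_mvar(A, T=1000)

# The imposed 0 -> 1 link shows up as positive lag-1 covariance
drive = sum(eeg[t - 1][0] * eeg[t][1] for t in range(1, len(eeg))) / (len(eeg) - 1)
```

Because the generating matrix is known exactly, any estimator run on `eeg` can be scored against the imposed pattern, which is the ground-truth role SEED-G plays at much larger scale.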


Author(s):  
Nipon Theera-Umpon ◽  
Udomsak Boonprasert ◽  

This paper demonstrates an application of support vector machines (SVM) to oceanic disaster search and rescue operations. Support vector regression (SVR) for system identification of a nonlinear black-box model is utilized in this research. The SVR-based ocean model helps the search and rescue unit by predicting the target’s position at any given time instant. The closer the predicted location is to the actual location, the shorter the search time and the smaller the loss. One of the most popular ocean models, namely the Princeton Ocean Model, is applied to provide the ground truth of the target leeway. In the experiments, the results on simulated data show that the proposed SVR-based ocean model provides good predictions compared to the Princeton Ocean Model. Moreover, the experimental results on real data collected by the Royal Thai Navy also show that the proposed model can be used as an auxiliary tool in search and rescue operations.


2021 ◽  
pp. 089443932110408
Author(s):  
Jose M. Pavía

Ecological inference models aim to infer individual-level relationships from aggregate data. They are routinely used to estimate voter transitions between elections, disclose split-ticket voting behaviors, or infer racial voting patterns in U.S. elections. A large number of procedures have been proposed in the literature to solve these problems; therefore, an assessment and comparison of them are overdue. The secret ballot, however, makes this a difficult endeavor, since real individual data are usually not accessible. The most recent work on ecological inference has assessed methods using a very small number of data sets with ground truth, combined with artificial, simulated data. This article dramatically increases the number of real instances by presenting a unique database (available in the R package ei.Datasets) composed of data from more than 550 elections where the true inner-cell values of the global cross-classification tables are known. The article describes how the data sets are organized, details the data curation and data wrangling processes performed, and analyses the main features characterizing the different data sets.
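The basic problem such methods solve can be illustrated with a deliberately naive 2x2 sketch (none of the procedures assessed in the literature works this simply): district-level vote shares from two elections constrain, but do not by themselves identify, the individual transition rates, which can be fitted here by a brute-force least-squares grid search.

```python
def estimate_transitions(x, y, step=0.01):
    """Grid-search the 2x2 voter transition rates (p_aa: share of party A
    voters staying with A; p_ba: share of party B voters switching to A)
    that minimize the squared error of the aggregate accounting identity
    y_i ~ p_aa * x_i + p_ba * (1 - x_i) across districts."""
    best, best_err = None, float("inf")
    n_steps = int(round(1 / step)) + 1
    for i in range(n_steps):
        p_aa = i * step
        for j in range(n_steps):
            p_ba = j * step
            err = sum((yi - (p_aa * xi + p_ba * (1 - xi))) ** 2
                      for xi, yi in zip(x, y))
            if err < best_err:
                best, best_err = (p_aa, p_ba), err
    return best

# Synthetic districts with a known ground truth: 80% of A voters stay with A,
# 30% of B voters switch to A; x and y are each district's share for A in
# the first and second election respectively
x = [0.2, 0.35, 0.5, 0.65, 0.8]
y = [0.8 * xi + 0.3 * (1 - xi) for xi in x]
p_aa, p_ba = estimate_transitions(x, y)
```

With real elections the inner-cell values are unknown, which is exactly why a database of more than 550 elections with known cross-classification tables makes systematic assessment of these estimators possible.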

