Mixed matrix factorization: a novel algorithm for the extraction of kinematic-muscular synergies

2021 ◽  
Author(s):  
Alessandro Scano ◽  
Robert Mihai Mira ◽  
Andrea d'Avella

Synergistic models have been employed to investigate motor coordination separately in the muscular and kinematic domains. However, the relationship between muscle synergies, constrained to be non-negative, and kinematic synergies, whose elements can be positive and negative, has received limited attention. Existing algorithms for extracting synergies from combined kinematic and muscular data either do not enforce non-negativity constraints or separate non-negative variables into positive and negative components. We propose a mixed matrix factorization (MMF) algorithm based on a gradient descent update rule which overcomes these limitations. It allows direct assessment of the relationship between kinematic and muscle activity variables by enforcing the non-negativity constraint on a subset of variables. We validated the algorithm on simulated kinematic-muscular data generated from known spatial synergies and temporal coefficients, by evaluating the similarity between extracted and ground truth synergies and temporal coefficients when the data are corrupted by different noise levels. We also compared the performance of MMF to that of non-negative matrix factorization applied to separate positive and negative components (NMFpn). Finally, we factorized kinematic and EMG data collected during upper-limb movements to demonstrate the potential of the algorithm. MMF achieved almost perfect reconstruction on noiseless simulated data. It performed better than NMFpn in recovering the correct spatial synergies and temporal coefficients with noisy simulated data. It also allowed correct selection of the original number of ground truth synergies. We showed meaningful applicability to real data. MMF can also be applied to any multivariate data that contain both non-negative and unconstrained variables.
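The core of such a mixed factorization can be illustrated with a short sketch. The snippet below is not the authors' MMF implementation (the paper gives the exact update rule); it is a minimal projected-gradient-descent factorization in plain Python, assuming non-negativity on the EMG-like rows of the synergy matrix and on the temporal coefficients, with the kinematic rows left unconstrained:

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mmf(X, rank, nonneg_rows, steps=3000, lr=0.01, seed=0):
    """Factorize X (variables x samples) as W @ H by projected gradient
    descent on the squared reconstruction error. Rows of W listed in
    nonneg_rows (EMG-like variables) and all coefficients in H are clipped
    to zero from below; the remaining (kinematic) rows stay unconstrained."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(steps):
        WH = matmul(W, H)
        E = [[WH[i][j] - X[i][j] for j in range(m)] for i in range(n)]
        for i in range(n):                       # W step: grad = E @ H^T
            for r in range(rank):
                g = sum(E[i][j] * H[r][j] for j in range(m))
                W[i][r] -= lr * g
                if i in nonneg_rows:
                    W[i][r] = max(0.0, W[i][r])
        for r in range(rank):                    # H step: grad = W^T @ E
            for j in range(m):
                g = sum(W[i][r] * E[i][j] for i in range(n))
                H[r][j] = max(0.0, H[r][j] - lr * g)
    return W, H

# Ground truth: rows 0-1 are EMG-like (non-negative weights), rows 2-3 are
# kinematic-like (signed weights); temporal coefficients are non-negative.
W_true = [[1.0, 0.0], [0.0, 1.0], [0.8, -0.6], [-0.5, 1.0]]
H_true = [[0.2, 0.9, 0.4, 0.0, 0.7, 0.3],
          [0.5, 0.1, 0.8, 0.6, 0.0, 0.9]]
X = matmul(W_true, H_true)
W, H = mmf(X, rank=2, nonneg_rows={0, 1})
X_hat = matmul(W, H)
err = sum(abs(X[i][j] - X_hat[i][j]) for i in range(4) for j in range(6)) / 24
```

On this noiseless toy example the factorization reaches a near-exact reconstruction while respecting the sign constraints, mirroring the "almost perfect reconstruction on noiseless simulated data" reported above.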


2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Abstract
Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq (scRNA-seq) data, numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground truth.
Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis.
GENIE3 proved to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. To ensure the reproducibility and ease extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
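The reproducibility criterion used in this benchmark can be made concrete with a small sketch. The snippet below is only an illustration (GENIE3 itself scores edges with tree-based regression, which is not shown): assuming each inference run returns a dictionary of edge scores, the overlap between the top-k edges of two independent runs is measured with a Jaccard index:

```python
def top_edges(scores, k):
    """Keep the k strongest edges from a {(regulator, target): score} dict."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def reproducibility(scores_a, scores_b, k):
    """Jaccard overlap between the top-k edges inferred from two
    independent datasets of the same biological condition."""
    a, b = top_edges(scores_a, k), top_edges(scores_b, k)
    return len(a & b) / len(a | b)

# Toy edge scores from two hypothetical inference runs on independent datasets
run1 = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.7,
        ("TF2", "G3"): 0.6, ("TF2", "G1"): 0.1}
run2 = {("TF1", "G1"): 0.8, ("TF2", "G3"): 0.7,
        ("TF1", "G3"): 0.5, ("TF2", "G1"): 0.2}
score = reproducibility(run1, run2, k=3)
```

Here two of the four distinct top-3 edges agree across runs, giving a Jaccard overlap of 0.5; a perfectly reproducible method would score 1.0 at any threshold.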


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Francesca Pizzorni Ferrarese ◽  
Flavio Simonetti ◽  
Roberto Israel Foroni ◽  
Gloria Menegaz

Validation and accuracy assessment are the main bottlenecks preventing the adoption of image processing algorithms in clinical practice. In the classical approach, a posteriori analysis is performed through objective metrics. In this work, a different approach based on Petri nets is proposed. The basic idea consists in predicting the accuracy of a given pipeline based on the identification and characterization of the sources of inaccuracy. The concept is demonstrated on a case study: intrasubject rigid and affine registration of magnetic resonance images. Both synthetic and real data are considered. While synthetic data allow benchmarking of the performance with respect to the ground truth, real data enable assessment of the robustness of the methodology in real contexts as well as determination of the suitability of using synthetic data in the training phase. Results revealed a higher correlation and a lower dispersion among the metrics for simulated data, while the opposite trend was observed for pathologic data. Results show that the proposed model not only provides good prediction performance but also leads to the optimization of the end-to-end chain in terms of accuracy and robustness, setting the ground for its generalization to different and more complex scenarios.


Life ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 716
Author(s):  
Yunhe Liu ◽  
Aoshen Wu ◽  
Xueqing Peng ◽  
Xiaona Liu ◽  
Gang Liu ◽  
...  

Despite the scRNA-seq analytic algorithms developed, their performance for cell clustering cannot be quantified due to the unknown “true” clusters. Referencing the transcriptomic heterogeneity of cell clusters, a “true” mRNA number matrix of cell individuals was defined as ground truth. Based on the matrix and the actual data generation procedure, a simulation program (SSCRNA) for raw data was developed. Subsequently, the consistency between simulated data and real data was evaluated. Furthermore, the impact of sequencing depth and analysis algorithms on cluster accuracy was quantified. As a result, the simulation result was highly consistent with that of the actual data. Among the normalization algorithms, the Gaussian normalization method was the most recommended. As for the clustering algorithms, the K-means clustering method was more stable than K-means plus Louvain clustering. In conclusion, the scRNA simulation algorithm developed restores the actual data generation process, discovers the impact of parameters on classification, compares the normalization/clustering algorithms, and provides novel insight into scRNA analyses.
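When the “true” clusters are known, as with simulated data, cluster accuracy can be quantified directly. A common choice for this (shown here as an illustration, not necessarily the exact metric used in the paper) is the Adjusted Rand Index, which scores agreement between predicted and ground-truth labels, corrected for chance:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between ground-truth and predicted cluster labels:
    1.0 for identical partitions (up to label renaming), ~0 for random ones."""
    n = len(truth)
    contingency = Counter(zip(truth, pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)
```

A partition identical up to label renaming scores exactly 1.0, while one misassigned cell out of six drops the score to roughly 0.32, which is what makes the index useful for comparing clustering pipelines on simulated ground truth.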


2018 ◽  
Author(s):  
Yichen Li ◽  
Rebecca Saxe ◽  
Stefano Anzellotti

Abstract
Noise is a major challenge for the analysis of fMRI data in general and for connectivity analyses in particular. As researchers develop increasingly sophisticated tools to model statistical dependence between the fMRI signal in different brain regions, there is a risk that these models may increasingly capture artifactual relationships between regions that are the result of noise. Thus, choosing optimal denoising methods is a crucial step to maximize the accuracy and reproducibility of connectivity models. Most comparisons between denoising methods require knowledge of the ground truth: of what the ‘real signal’ is. For this reason, they are usually based on simulated fMRI data. However, simulated data may not match the statistical properties of real data, limiting the generalizability of the conclusions. In this article, we propose an approach to evaluate denoising methods using real (non-simulated) fMRI data. First, we introduce an intersubject version of multivariate pattern dependence (iMVPD) that computes the statistical dependence between a brain region in one participant and another brain region in a different participant. iMVPD has the following advantages: 1) it is multivariate, 2) it trains and tests models on independent folds of the real fMRI data, and 3) it generates predictions that are both between subjects and between regions. Since whole-brain sources of noise are more strongly correlated within subject than between subjects, we can use the difference between standard MVPD and iMVPD as a ‘discrepancy metric’ to evaluate denoising techniques (more effective techniques should yield smaller differences). As predicted, the difference is greatest in the absence of denoising methods. Furthermore, a combination of global signal removal and CompCorr optimizes denoising (among the set of denoising options tested).
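The logic of the discrepancy metric can be sketched with a drastically simplified, univariate stand-in for MVPD (the real method is multivariate and model-based, so everything below, including the signal model, is an illustrative assumption): within-subject dependence is inflated by shared subject-level noise, between-subject dependence is not, and the gap between the two is what denoising should shrink.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)
T = 500
signal = [rng.gauss(0, 1) for _ in range(T)]   # task-driven activity, shared
noise1 = [rng.gauss(0, 1) for _ in range(T)]   # subject 1 whole-brain noise
noise2 = [rng.gauss(0, 1) for _ in range(T)]   # subject 2 whole-brain noise

# Two regions in subject 1, one region in subject 2 (region-specific noise added)
regA_s1 = [s + n + rng.gauss(0, 0.5) for s, n in zip(signal, noise1)]
regB_s1 = [s + n + rng.gauss(0, 0.5) for s, n in zip(signal, noise1)]
regB_s2 = [s + n + rng.gauss(0, 0.5) for s, n in zip(signal, noise2)]

within = pearson(regA_s1, regB_s1)    # inflated by shared within-subject noise
between = pearson(regA_s1, regB_s2)   # noise is independent across subjects
discrepancy = within - between        # shrinks as denoising improves
```

Without any denoising the discrepancy is large, matching the abstract's prediction; regressing out the shared noise component before correlating would drive it toward zero.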


2018 ◽  
Author(s):  
Yu-Chuan Chang ◽  
June-Tai Wu ◽  
Ming-Yi Hong ◽  
Yi-An Tung ◽  
Ping-Han Hsieh ◽  
...  

Abstract
Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer’s disease (AD). In this regard, this study presents GenEpi, a computational package to uncover epistasis associated with phenotypes by the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. The simulated data showed that GenEpi outperforms other widely used methods in detecting ground-truth epistasis. As for real data, this study uses AD as an example to reveal the capability of GenEpi in finding disease-related variants and variant interactions that show both biological meaning and predictive power. Availability: GenEpi is an open-source Python package, available free of charge only for non-commercial users. The package can be downloaded from https://github.com/Chester75321/GenEpi, and has also been published on The Python Package Index.
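The two-element combinatorial encoding can be sketched as follows. The feature names and encoding details here are hypothetical simplifications of GenEpi's scheme: each SNP genotype is one-hot encoded, and pairwise AND-features are formed across SNPs, which the L1-regularized regression with stability selection would then prune (the regression step is not shown):

```python
from itertools import combinations

# One-hot encoding of the three genotypes at a biallelic SNP
GENOTYPE_CODES = {"AA": (1, 0, 0), "Aa": (0, 1, 0), "aa": (0, 0, 1)}

def encode_pairwise(snps):
    """One-hot encode each SNP genotype, then form two-element combinatorial
    features as the product (logical AND) of every pair of indicator bits
    taken from two different SNPs."""
    onehot = [GENOTYPE_CODES[g] for g in snps]
    features = {}
    for (i, gi), (j, gj) in combinations(enumerate(onehot), 2):
        for a, bit_a in enumerate(gi):
            for b, bit_b in enumerate(gj):
                features[f"snp{i}_{a}*snp{j}_{b}"] = bit_a * bit_b
    return features

# Two SNPs yield 3 x 3 = 9 pairwise features, exactly one of which is active
feats = encode_pairwise(["Aa", "aa"])
active = [name for name, v in feats.items() if v]
```

Because exactly one indicator fires per SNP, each sample activates a single feature per SNP pair, so a sparse regression over these features directly names the interacting genotype combination it selects.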


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiangnan Xu ◽  
Samantha M. Solon-Biet ◽  
Alistair Senior ◽  
David Raubenheimer ◽  
Stephen J. Simpson ◽  
...  

Abstract
Background: Nutrigenomics aims at understanding the interaction between nutrition and gene information. Due to the complex interactions of nutrients and genes, their relationship exhibits non-linearity. One of the most effective and efficient methods to explore their relationship is the nutritional geometry framework, which fits a response surface for the gene expression over two prespecified nutrition variables. However, when the number of nutrients involved is large, it is challenging to find combinations of informative nutrients with respect to a certain gene and to test whether the relationship is stronger than chance. Methods for identifying informative combinations are essential to understanding the relationship between nutrients and genes.
Results: We introduce Local Consistency Nutrition to Graphics (LC-N2G), a novel approach for ranking and identifying combinations of nutrients associated with gene expression. In LC-N2G, we first propose a model-free quantity called the Local Consistency statistic to measure whether there is a non-random relationship between combinations of nutrients and gene expression measurements, based on (1) the similarity between samples in the nutrient space and (2) their difference in gene expression. Then, combinations with small LC statistics are selected and a permutation test is performed to evaluate their significance. Finally, response surfaces are generated for the subset of significant relationships. Evaluation on simulated and real data shows that LC-N2G can accurately find combinations that are correlated with gene expression.
Conclusion: LC-N2G is practically powerful for identifying the informative nutrition variables correlated with gene expression. Therefore, LC-N2G is important in the area of nutrigenomics for understanding the relationship between nutrition and gene expression information.
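The idea behind the Local Consistency statistic and its permutation test can be sketched as follows. This is an illustrative reconstruction, not the paper's exact definition: expression differences between samples are weighted by a Gaussian kernel on their distance in nutrient space, so small values indicate that similar diets yield similar expression, and shuffling the expression labels calibrates what "small" means.

```python
import math
import random

def local_consistency(nutrients, expression):
    """LC-like statistic: squared expression differences between sample pairs,
    weighted by a Gaussian kernel on nutrient-space distance. Small values
    mean nearby diets give similar expression."""
    n = len(nutrients)
    lc, wsum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(nutrients[i], nutrients[j]))
            w = math.exp(-d2)
            lc += w * (expression[i] - expression[j]) ** 2
            wsum += w
    return lc / wsum

def permutation_p_value(nutrients, expression, n_perm=200, seed=0):
    """Fraction of label permutations with LC at least as small as observed."""
    rng = random.Random(seed)
    observed = local_consistency(nutrients, expression)
    shuffled = list(expression)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if local_consistency(nutrients, shuffled) <= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy data: expression responds smoothly to two nutrient levels (e.g. protein,
# carbohydrate), so the observed LC is far smaller than under permutation
diets = [(p, c) for p in range(5) for c in range(5)]
expr = [p + 2 * c for p, c in diets]
p_val = permutation_p_value(diets, expr)
```

For a smooth nutrient-expression relationship the permutation p-value is tiny, while shuffled (non-informative) nutrient combinations would yield p-values spread uniformly.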


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3632
Author(s):  
Alessandra Anzolin ◽  
Jlenia Toppi ◽  
Manuela Petti ◽  
Febo Cincotti ◽  
Laura Astolfi

EEG signals are widely used to estimate brain circuits associated with specific tasks and cognitive processes. The testing of connectivity estimators is still an open issue because of the lack of a ground truth in real data. Existing solutions, such as the generation of simulated data based on a manually imposed connectivity pattern or on mass oscillators, can model only a few real cases, with a limited number of signals and spectral properties that do not reflect those of real brain activity. Furthermore, the generation of time series reproducing non-ideal and non-stationary ground-truth models is still missing. In this work, we present the SEED-G toolbox for the generation of pseudo-EEG data with imposed connectivity patterns, overcoming the existing limitations and enabling control of several parameters for data simulation according to the user’s needs. We first described the toolbox, including guidelines for its correct use, and then we tested its performance, showing how, in a wide range of conditions, datasets composed of up to 60 time series were successfully generated in less than 5 s and with spectral features similar to real data. Then, SEED-G is employed to study the effect of inter-trial variability on Partial Directed Coherence (PDC) estimates, confirming its robustness.
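A minimal version of the imposed-connectivity idea can be sketched with a first-order multivariate autoregressive (MVAR) generator. SEED-G itself is far richer (realistic spectra, non-stationarity, trial-to-trial variability), so the model below is only an illustration: nonzero off-diagonal coefficients impose known directed links that a connectivity estimator such as PDC should then recover.

```python
import random

def generate_mvar(A, T, noise_std=1.0, seed=0):
    """Generate T samples of an order-1 MVAR process x[t] = A @ x[t-1] + noise.
    Nonzero off-diagonal entries of A impose directed connectivity."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    data = []
    for _ in range(T):
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0, noise_std)
             for i in range(n)]
        data.append(x)
    return data  # T x n

# Imposed ground-truth pattern: channel 0 drives channel 1, channel 1 drives
# channel 2; the matrix is stable (all eigenvalues inside the unit circle)
A = [[0.5, 0.0, 0.0],
     [0.4, 0.5, 0.0],
     [0.0, 0.4, 0.5]]
eeg = generate_mvar(A, T=1000)

# The imposed 0 -> 1 link shows up as positive lag-1 covariance
drive = sum(eeg[t - 1][0] * eeg[t][1] for t in range(1, len(eeg))) / (len(eeg) - 1)
```

Because the generating matrix is known exactly, any estimator run on `eeg` can be scored against the imposed pattern, which is the ground-truth role SEED-G plays at much larger scale.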


Author(s):  
Nipon Theera-Umpon ◽  
Udomsak Boonprasert ◽  

This paper demonstrates an application of support vector machines (SVM) to oceanic disaster search and rescue operations. Support vector regression (SVR) for system identification of a nonlinear black-box model is utilized in this research. The SVR-based ocean model helps the search and rescue unit by predicting the target’s position at any given time instant. The closer the predicted location is to the actual location, the shorter the search time and the smaller the loss. One of the most popular ocean models, namely the Princeton Ocean Model, is applied to provide the ground truth of the target leeway. In the experiments, the results on simulated data show that the proposed SVR-based ocean model provides good predictions compared to the Princeton Ocean Model. Moreover, the experimental results on real data collected by the Royal Thai Navy also show that the proposed model can be used as an auxiliary tool in search and rescue operations.


2021 ◽  
pp. 089443932110408
Author(s):  
Jose M. Pavía

Ecological inference models aim to infer individual-level relationships from aggregate data. They are routinely used to estimate voter transitions between elections, disclose split-ticket voting behaviors, or infer racial voting patterns in U.S. elections. A large number of procedures have been proposed in the literature to solve these problems; therefore, an assessment and comparison of them are overdue. The secret ballot, however, makes this a difficult endeavor, since real individual data are usually not accessible. The most recent work on ecological inference has assessed methods using a very small number of data sets with ground truth, combined with artificial, simulated data. This article dramatically increases the number of real instances by presenting a unique database (available in the R package ei.Datasets) composed of data from more than 550 elections where the true inner-cell values of the global cross-classification tables are known. The article describes how the data sets are organized, details the data curation and data wrangling processes performed, and analyses the main features characterizing the different data sets.
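The basic problem such methods solve can be illustrated with a deliberately naive 2x2 sketch (none of the procedures assessed in the literature works this simply): district-level vote shares from two elections constrain, but do not by themselves identify, the individual transition rates, which can be fitted here by a brute-force least-squares grid search.

```python
def estimate_transitions(x, y, step=0.01):
    """Grid-search the 2x2 voter transition rates (p_aa: share of party A
    voters staying with A; p_ba: share of party B voters switching to A)
    that minimize the squared error of the aggregate accounting identity
    y_i ~ p_aa * x_i + p_ba * (1 - x_i) across districts."""
    best, best_err = None, float("inf")
    n_steps = int(round(1 / step)) + 1
    for i in range(n_steps):
        p_aa = i * step
        for j in range(n_steps):
            p_ba = j * step
            err = sum((yi - (p_aa * xi + p_ba * (1 - xi))) ** 2
                      for xi, yi in zip(x, y))
            if err < best_err:
                best, best_err = (p_aa, p_ba), err
    return best

# Synthetic districts with a known ground truth: 80% of A voters stay with A,
# 30% of B voters switch to A; x and y are each district's share for A in
# the first and second election respectively
x = [0.2, 0.35, 0.5, 0.65, 0.8]
y = [0.8 * xi + 0.3 * (1 - xi) for xi in x]
p_aa, p_ba = estimate_transitions(x, y)
```

With real elections the inner-cell values are unknown, which is exactly why a database of more than 550 elections with known cross-classification tables makes systematic assessment of these estimators possible.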

