A sequential algorithm to detect diffusion switching along intracellular particle trajectories

2019 ◽  
Vol 36 (1) ◽  
pp. 317-329 ◽  
Author(s):  
Vincent Briane ◽  
Myriam Vimond ◽  
Cesar Augusto Valades-Cruz ◽  
Antoine Salomon ◽  
Christian Wunder ◽  
...  

Abstract Motivation Recent advances in molecular biology and fluorescence microscopy imaging have made it possible to infer the dynamics of single molecules in living cells. Changes of dynamics can occur along a trajectory. The problem is then to estimate the temporal change-points, that is, the times at which a change of dynamics occurs. The number of points in the trajectory required to detect such changes depends on both the magnitude and the type of the motion changes. Here, the number of points per trajectory is of the order of 10², even if in practice dramatic motion changes can be detected with fewer points. Results We propose a non-parametric procedure based on test statistics computed on local windows along the trajectory to detect the change-points. This algorithm controls the number of false change-point detections in the case where the trajectory is fully Brownian. We also develop a strategy for aggregating the detections obtained with different window sizes so that the window size is no longer a parameter to optimize. A Monte Carlo study is presented to demonstrate the performance of the method and to compare the procedure with two competing algorithms. Finally, we illustrate the efficacy of the method on real data in 2D and 3D, depicting the motion of mRNA complexes (called mRNA-binding proteins) in neuronal dendrites, and Galectin-3 endocytosis and trafficking within the cell. Availability and implementation A user-friendly Matlab package containing examples and the code of the simulations used in the paper is available at http://serpico.rennes.inria.fr/doku.php?id=software:cpanalysis:index. Supplementary information Supplementary data are available at Bioinformatics online.
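
The change-point procedure in this abstract rests on statistics computed in local windows along the trajectory. Here is a minimal sketch of that idea. It assumes a 2D trajectory sampled at a constant time step. For each time t, it compares the mean squared displacement over the w steps before t with that over the w steps after t, and reports times where the absolute log ratio is large. The statistic, the threshold log 3, the greedy peak picking and every function name are hypothetical illustrations. This is not the authors' test: it does not calibrate false detections under Brownian motion, and it does not aggregate window sizes.

```python
import math
import random


def simulate_switching_trajectory(n1, n2, d1, d2, dt=1.0, seed=0):
    """2D Brownian trajectory whose diffusion coefficient switches from d1 to d2 after n1 steps."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    traj = [(x, y)]
    for i in range(n1 + n2):
        d = d1 if i < n1 else d2
        sigma = math.sqrt(2.0 * d * dt)  # per-axis std of a Brownian increment
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        traj.append((x, y))
    return traj


def window_statistic(traj, w):
    """|log ratio| of the mean squared displacement in the w steps after vs. before each time t."""
    sq = [(b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2 for a, b in zip(traj, traj[1:])]
    stats = {}
    for t in range(w, len(sq) - w + 1):
        before = sum(sq[t - w:t]) / w
        after = sum(sq[t:t + w]) / w
        stats[t] = abs(math.log(after / before))
    return stats


def detect_change_points(traj, w=20, threshold=math.log(3.0)):
    """Greedy peak picking: keep the strongest statistic, suppress +/- w around it, repeat."""
    candidates = window_statistic(traj, w)
    change_points = []
    while candidates:
        t_best = max(candidates, key=candidates.get)
        if candidates[t_best] < threshold:
            break
        change_points.append(t_best)  # index of the first increment after the switch
        for t in range(t_best - w, t_best + w + 1):
            candidates.pop(t, None)
    return sorted(change_points)


traj = simulate_switching_trajectory(100, 100, d1=1.0, d2=10.0, seed=1)
print(detect_change_points(traj))
```

The log ratio scores increases and decreases in diffusivity alike. With Gaussian increments and no switch, the ratio of the two window means follows an F(2w, 2w) distribution. A calibrated version would therefore replace the fixed threshold with a quantile of that null distribution.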

Author(s):  
Kodai Minoura ◽  
Ko Abe ◽  
Yuka Maeda ◽  
Hiroyoshi Nishikawa ◽  
Teppei Shimamura

Abstract Summary Recent advancements in high-dimensional single-cell technologies, such as mass cytometry, enable longitudinal experiments that track the dynamics of cell populations and identify change points where the proportions vary significantly. However, current research is limited by the lack of tools specialized for analyzing longitudinal mass cytometry data. To infer cell population dynamics from such data, we developed a statistical framework named CYBERTRACK2.0. The framework's analytic performance was validated on synthetic and real data, and its results are consistent with previous research. Availability and implementation CYBERTRACK2.0 is available at https://github.com/kodaim1115/CYBERTRACK2. Supplementary information Supplementary data are available at Bioinformatics online.
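
CYBERTRACK2.0 targets change points in cell population proportions. The framework itself is a statistical model fitted to the cytometry measurements. The sketch below only illustrates the quantity of interest, assuming cells have already been assigned to clusters at each time point. It flags time points where the total variation distance between consecutive proportion vectors exceeds a threshold. The threshold (0.2) and the function names are hypothetical.

```python
from collections import Counter


def cluster_proportions(labels, n_clusters):
    """Fraction of cells assigned to each cluster 0..n_clusters-1 at one time point."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def proportion_change_points(samples, n_clusters, threshold=0.2):
    """Indices t where the cluster proportions shift by more than `threshold` from t-1 to t."""
    props = [cluster_proportions(s, n_clusters) for s in samples]
    return [
        t for t in range(1, len(props))
        if total_variation(props[t - 1], props[t]) > threshold
    ]


samples = [
    [0] * 80 + [1] * 20,  # time point 0
    [0] * 78 + [1] * 22,  # time point 1
    [0] * 30 + [1] * 70,  # time point 2: population shift
    [0] * 32 + [1] * 68,  # time point 3
]
print(proportion_change_points(samples, n_clusters=2))  # [2]
```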


Author(s):  
Alma Andersson ◽  
Joakim Lundeberg

Abstract Motivation Collecting spatial signals in large numbers has become a routine task in multiple omics fields, but parsing these rich datasets still poses certain challenges. In whole or near-full transcriptome spatial techniques, spurious expression profiles are intermixed with those exhibiting an organized structure. To distinguish profiles with spatial patterns from background noise, a metric that quantifies spatial structure is desirable. Current methods designed for similar purposes tend to be built around a framework of statistical hypothesis testing, so we explored a fundamentally different strategy. Results We propose an unexplored approach to analyze spatial transcriptomics data, simulating diffusion of individual transcripts to extract genes with spatial patterns. The method performed as expected when presented with synthetic data. When applied to real data, it identified genes with distinct spatial profiles, involved in key biological processes or characteristic of certain cell types. Compared to existing methods, ours appeared less influenced by the genes' expression levels and showed better run-time performance when run with multiple cores. Availability and implementation Open-source Python package with a command line interface (CLI), freely available at https://github.com/almaan/sepal under an MIT licence. A mirror of the GitHub repository can be found at Zenodo, doi: 10.5281/zenodo.4573237. Supplementary information Supplementary data are available at Bioinformatics online.
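
The approach above simulates diffusion of transcripts. A natural intuition is that spatially random expression smooths out quickly, while an organized pattern persists longer. Here is a minimal sketch that turns this into a score: the number of diffusion steps until the normalized profile stops changing. Treating this convergence time as the score is an assumption for illustration. So are the rectangular grid, the explicit finite-difference scheme, the reflecting boundaries, the max-change stopping rule and the function name. sepal's own spot layouts and convergence criterion may differ.

```python
import random


def diffusion_time(grid, rate=0.1, tol=1e-6, max_steps=100_000):
    """Count explicit diffusion steps until the normalized profile stops changing.

    `grid` holds non-negative expression values on a rectangular array of spots.
    Missing neighbours at the border are skipped (reflecting, no-flux boundary),
    so the total mass stays 1 throughout.
    """
    rows, cols = len(grid), len(grid[0])
    total = sum(map(sum, grid))
    u = [[v / total for v in row] for row in grid]  # normalize to unit mass
    for step in range(1, max_steps + 1):
        new = [[0.0] * cols for _ in range(rows)]
        max_change = 0.0
        for i in range(rows):
            for j in range(cols):
                lap = 0.0  # discrete Laplacian over the in-grid neighbours
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= ni < rows and 0 <= nj < cols:
                        lap += u[ni][nj] - u[i][j]
                new[i][j] = u[i][j] + rate * lap  # explicit step, stable for rate <= 0.25
                max_change = max(max_change, abs(new[i][j] - u[i][j]))
        u = new
        if max_change < tol:
            return step
    return max_steps


# A half-and-half pattern versus spatially random values on a 10 x 10 grid.
structured = [[1.0] * 5 + [0.0] * 5 for _ in range(10)]
rng = random.Random(0)
noise = [[rng.random() for _ in range(10)] for _ in range(10)]
print(diffusion_time(structured), diffusion_time(noise))
```

The half-and-half pattern lives almost entirely in the slowest-decaying spatial mode, so it needs more steps to converge than the random profile. The random profile's energy is spread mostly over fast-decaying, high-frequency modes.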


Metrika ◽  
2021 ◽  
Author(s):  
Andreas Anastasiou ◽  
Piotr Fryzlewicz

Abstract We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID's accuracy because it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, an information criterion, SDLL, or a hybrid involving the first two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods, and most of the time it outperforms them. ID is implemented in the R packages IDetect and breakfast, available from CRAN.
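
Isolate-Detect relies on the CUSUM contrast for a change in mean. The contrast is evaluated on intervals that expand from the data edges, so that ideally each interval contains at most one change-point when a detection occurs. The sketch below implements that CUSUM contrast together with a simplified, left-to-right-only version of the expansion. The actual method alternates left- and right-anchored expansions and offers several model-selection rules, all implemented in the IDetect package. The step size of 5 and the threshold sqrt(2 log T) (unit noise level) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def cusum_contrast(x, s, e):
    """Return (b, value): the split b in [s, e) maximizing the absolute CUSUM
    contrast for a single change in mean on x[s..e] (inclusive)."""
    n = e - s + 1
    total = sum(x[s:e + 1])
    left = 0.0
    best_b, best_val = s, 0.0
    for b in range(s, e):
        left += x[b]
        n_left = b - s + 1
        n_right = n - n_left
        val = abs(
            math.sqrt(n_right / (n * n_left)) * left
            - math.sqrt(n_left / (n * n_right)) * (total - left)
        )
        if val > best_val:
            best_b, best_val = b, val
    return best_b, best_val


def isolate_detect_one_sided(x, threshold, step=5):
    """Simplified, left-to-right only isolation: grow [s, e] by `step` points
    until the CUSUM contrast exceeds `threshold`, record the change-point,
    then restart just after it. (The real ID also expands from the right.)"""
    last = len(x) - 1
    change_points = []
    s, e = 0, min(step, last)
    while s < last:
        b, val = cusum_contrast(x, s, e)
        if val > threshold:
            change_points.append(b)  # last index of the segment before the change
            s = b + 1
            e = min(s + step, last)
        elif e == last:
            break
        else:
            e = min(e + step, last)
    return change_points


signal = [0.0] * 50 + [3.0] * 50 + [0.0] * 50
threshold = math.sqrt(2 * math.log(len(signal)))  # assumes unit noise level
print(isolate_detect_one_sided(signal, threshold))  # [49, 99]
```

The example is noise-free, so both change-points are recovered exactly. With noisy data, the threshold constant and the noise-scale estimate become the main tuning concerns.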


2020 ◽  
Vol 36 (12) ◽  
pp. 3890-3891
Author(s):  
Linjie Wu ◽  
Han Wang ◽  
Yuchao Xia ◽  
Ruibin Xi

Abstract Motivation Whole-genome sequencing (WGS) is widely used for copy number variation (CNV) detection. However, in most bacteria, the circular genome structure and high replication rate enrich reads near the replication origin. CNV detection based on read depth can be seriously affected by this replication bias. Results Using ∼200 bacterial WGS datasets, we show that the replication bias is widespread. We develop CNV-BAC (CNV-Bacteria), which properly normalizes the replication bias and other known biases in bacterial WGS data and accurately detects CNVs. Simulation and real data analysis show that CNV-BAC achieves the best performance in CNV detection compared with available algorithms. Availability and implementation CNV-BAC is available at https://github.com/XiDsLab/CNV-BAC. Supplementary information Supplementary data are available at Bioinformatics online.
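
Replication bias makes read depth decay with distance from the origin of replication. Below is a minimal sketch of a correction. As a simplification, it assumes that log depth decreases linearly with the circular distance from a known origin. It fits that trend by least squares, divides it out, and looks for bins whose depth ratio departs from 1. This is not CNV-BAC's model, which also normalizes other known biases. The bin layout, origin position and function names are hypothetical.

```python
import math


def circular_distance(pos, origin, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(pos - origin) % genome_length
    return min(d, genome_length - d)


def normalize_replication_bias(depths, bin_size, origin, genome_length):
    """Divide out a log-linear trend of read depth versus distance from the origin."""
    xs = [circular_distance(i * bin_size + bin_size / 2, origin, genome_length)
          for i in range(len(depths))]
    ys = [math.log(d) for d in depths]  # assumes every bin has depth > 0
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx              # ordinary least squares on the log scale
    intercept = my - slope * mx
    return [d / math.exp(intercept + slope * x) for d, x in zip(depths, xs)]


# Toy genome: 1,000 bp in 100 bins of 10 bp, origin at position 0.
genome_length, bin_size, origin = 1000, 10, 0
depths = []
for i in range(100):
    dist = circular_distance(i * bin_size + bin_size / 2, origin, genome_length)
    depth = 200 * 2 ** (-dist / 500)  # replication bias: depth halves from origin to terminus
    if 40 <= i <= 44:
        depth *= 2                    # simulated duplication
    depths.append(depth)

ratios = normalize_replication_bias(depths, bin_size, origin, genome_length)
print([round(r, 2) for r in ratios[38:47]])
```

The duplicated bins slightly pull the ordinary least squares trend toward themselves. A robust fit, or one that excludes candidate CNV bins, would reduce that effect.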


2019 ◽  
Vol 36 (7) ◽  
pp. 2017-2024
Author(s):  
Weiwei Zhang ◽  
Ziyi Li ◽  
Nana Wei ◽  
Hua-Jun Wu ◽  
Xiaoqi Zheng

Abstract Motivation Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different genotypes or phenotypes is a critical step in uncovering the epigenetic mechanism of tumorigenesis and identifying biomarkers for cancer subtyping. However, as a major confounding factor, uneven distributions of tumor purity between two groups of tumor samples will lead to biased discovery of DM sites if not properly accounted for. Results Here we propose InfiniumDM, a generalized least squares model that adjusts for the tumor purity effect in differential methylation analysis. Our method is applicable to a variety of experimental designs, including those with or without normal controls and with different sources of normal tissue contamination. Using simulated datasets, we compared our method with conventional methods, including minfi, limma and limma corrected by tumor purity. Our method shows significantly better performance across different differential methylation thresholds, sample sizes, mean purity deviations and other settings. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving similar purposes. Availability and implementation InfiniumDM is a part of R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify). Supplementary information Supplementary data are available at Bioinformatics online.
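
InfiniumDM is described as a generalized least squares model that adjusts for tumor purity. Here is a minimal sketch of the underlying idea. It assumes, hypothetically, that the observed beta value at a CpG site is a purity-weighted mixture of a normal-cell level and a tumor-cell level, with the tumor level shifted by delta in group 1. Fitting that linear model separates the tumor-specific difference delta from the purity imbalance between groups. Optional weights give a diagonal-covariance GLS, and omitting them gives ordinary least squares. The mixture model, the weighting and all function names are assumptions for illustration; the actual InfiniumDM model and its variance structure may differ.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def purity_adjusted_difference(beta, purity, group, weights=None):
    """Weighted least squares fit of
        beta_i = (1 - p_i) * mu_normal + p_i * (mu_tumor + delta * g_i) + error_i
    returning delta. `weights` are inverse error variances; default is OLS."""
    if weights is None:
        weights = [1.0] * len(beta)
    design = [[1.0 - p, p, p * g] for p, g in zip(purity, group)]
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, design))
             for j in range(3)] for i in range(3)]
    xtwy = [sum(w * row[i] * y for w, row, y in zip(weights, design, beta))
            for i in range(3)]
    _mu_normal, _mu_tumor, delta = solve_linear(xtwx, xtwy)
    return delta


# Toy CpG site with NO tumor-specific difference, but group 1 is purer than group 0.
purity = [0.3, 0.4, 0.5, 0.7, 0.8, 0.9]
group = [0, 0, 0, 1, 1, 1]
beta = [0.2 * (1 - p) + 0.5 * p for p in purity]  # normal level 0.2, tumor level 0.5

mean_g1 = sum(v for v, g in zip(beta, group) if g) / 3
mean_g0 = sum(v for v, g in zip(beta, group) if not g) / 3
naive = mean_g1 - mean_g0
print(naive, purity_adjusted_difference(beta, purity, group))
```

In this toy site, the naive difference of group means is 0.12 purely because group 1 is purer. The purity-adjusted delta is zero up to floating-point rounding.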


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Zahra Amini Farsani ◽  
Volker J. Schmid

Abstract Co-localization analysis is a popular method for quantitative analysis in fluorescence microscopy imaging. The localization of marked proteins in the cell nucleus gives deep insight into biological processes in the nucleus. Several metrics have been developed for measuring the co-localization of two markers; however, they depend on subjective thresholding of the background and on the assumption of linearity. We propose a robust method to estimate the bivariate distribution function of two color channels, from which we can quantify their co- or anti-colocalization. The proposed method is a combination of the Maximum Entropy Method (MEM) and a Gaussian copula, which we call the Maximum Entropy Copula (MEC). This new method can measure the spatial and nonlinear correlation of signals to determine marker colocalization in fluorescence microscopy images. The proposed method is compared with MEM for bivariate probability distributions. The new colocalization metric is validated on simulated and real data. The results show that MEC can determine co- and anti-colocalization even in high-background settings. MEC can therefore be used as a robust tool for colocalization analysis.
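
MEC combines maximum-entropy estimation with a Gaussian copula, and the copula part can be illustrated on its own. The sketch transforms each channel's paired pixel intensities to standard-normal scores through their ranks, then takes the Pearson correlation of the scores. This is a standard rank-based estimate of the Gaussian copula correlation. A positive value suggests co-localization and a negative one anti-colocalization. Any strictly monotone relation, linear or not, gives a value of ±1. This is a sketch under stated assumptions (paired intensities, ties broken by order, hypothetical function names). It omits the maximum-entropy estimation that distinguishes MEC.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map values to standard-normal quantiles of their ranks (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def gaussian_copula_correlation(channel_a, channel_b):
    """Correlation parameter of a Gaussian copula estimated from normal scores.
    Positive: co-localization; negative: anti-colocalization; near 0: independence."""
    return pearson(normal_scores(channel_a), normal_scores(channel_b))


green = [float(v) for v in range(1, 21)]
red = [v ** 3 for v in green]  # nonlinear but monotone relation between channels
print(pearson(green, red), gaussian_copula_correlation(green, red))
```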


2021 ◽  
Author(s):  
Lajos Horváth ◽  
Zhenya Liu ◽  
Gregory Rice ◽  
Yuqian Zhao

Abstract The problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM-type statistic is proposed. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min{N, T} → ∞, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study shows that our test outperforms several existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated with real data applications to detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
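
Below is a minimal sketch of a CUSUM-type statistic for a common change in mean across a panel. For each candidate break k, it squares each series' CUSUM S_i(k) - (k/T)·S_i(T), sums over the N series, and takes the maximizing k. Prefix sums keep the cost at O(NT). This plain aggregate is an assumption for illustration. It does not remove the common factors, so strong cross-sectional dependence would distort it, and it has no bootstrap calibration. Both of these are central to the paper's statistic. Function names are hypothetical.

```python
def panel_cusum(panel):
    """Aggregate squared CUSUM across N series of common length T.

    Returns (k_hat, stats) where stats[k - 1] is the statistic for a break after
    observation k (k = 1..T-1) and k_hat maximizes it. Assumes each series has
    already been scaled to a comparable noise level."""
    n_series, T = len(panel), len(panel[0])
    prefix = []
    for series in panel:
        running, sums = 0.0, [0.0]
        for v in series:
            running += v
            sums.append(running)
        prefix.append(sums)  # sums[k] = x_1 + ... + x_k
    stats = []
    for k in range(1, T):
        total = sum((s[k] - k / T * s[T]) ** 2 for s in prefix)
        stats.append(total / (n_series * T))
    k_hat = max(range(1, T), key=lambda k: stats[k - 1])
    return k_hat, stats


# Five series with different levels, all shifting upward by 1 after observation 25.
panel = [[float(i)] * 25 + [i + 1.0] * 15 for i in range(5)]
k_hat, _ = panel_cusum(panel)
print(k_hat)  # 25
```

Constant per-series offsets cancel inside each CUSUM, so only the common shift drives the statistic.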


2020 ◽  
Vol 36 (11) ◽  
pp. 3431-3438
Author(s):  
Ziyi Li ◽  
Zhenxing Guo ◽  
Ying Cheng ◽  
Peng Jin ◽  
Hao Wu

Abstract Motivation In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. Results We introduce TOAST/-P and TOAST/+P, two partial reference-free algorithms within TOAST (TOols for the Analysis of heterogeneouS Tissues) for estimating the cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, into the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimates than existing methods. Availability and implementation The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Shengji Jia ◽  
Lei Shi

Abstract Motivation Knowing the number and the exact locations of multiple change points in genomic sequences serves several biological needs. The cumulative segmented algorithm (cumSeg) has recently been proposed as a computationally efficient approach for multiple change-point detection. It is based on a simple transformation of the data and provides results that are quite robust to model misspecification. However, the errors also accumulate in the transformed model, introducing heteroscedasticity and serial correlation. As a result, the estimated change points have markedly different variances, even though all change-point locations should be equally important in the original genomic sequence. Results In this study, we develop two new change-point detection procedures in the framework of cumulative segmented regression. Simulations reveal that the proposed methods not only substantially improve the efficiency of each change-point estimator but also yield estimators with similar variances for all the change points. By applying the proposed algorithms to Coriell and SNP genotyping data, we illustrate their performance in detecting copy number variations. Supplementary information The proposed algorithms are implemented in R and are available at Bioinformatics online.
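
cumSeg works on the cumulative sums of the data. A change in mean in the original sequence becomes a change in slope (a kink) of the cumulative sums, which segmented regression can locate. Here is a minimal single-breakpoint sketch. It fits y_t = a + b·t + c·max(0, t - psi) to the cumulative sums by least squares for every candidate psi and keeps the best one. Then b estimates the mean before the change and b + c the mean after it. The grid search, the single breakpoint and the function names are illustrative assumptions. cumSeg itself handles multiple breakpoints. This sketch also ignores the heteroscedastic, serially correlated errors of the cumulative model that the new procedures address.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def segmented_fit(ts, ys, psi):
    """Least squares fit of y = a + b*t + c*max(0, t - psi); returns (rss, [a, b, c])."""
    design = [[1.0, t, max(0.0, t - psi)] for t in ts]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    coef = solve_linear(xtx, xty)
    rss = sum((y - sum(c * d for c, d in zip(coef, row))) ** 2 for row, y in zip(design, ys))
    return rss, coef


def cumulative_breakpoint(x, min_seg=2):
    """Locate one change in mean as the kink of the cumulative sums (grid search over psi)."""
    ys, running = [], 0.0
    for v in x:
        running += v
        ys.append(running)  # cumulative sum up to time t = 1..n
    ts = [float(t) for t in range(1, len(x) + 1)]
    best = None
    for psi in range(min_seg, len(x) - min_seg):
        rss, coef = segmented_fit(ts, ys, float(psi))
        if best is None or rss < best[0]:
            best = (rss, psi, coef)
    return best[1], best[2]


x = [1.0] * 30 + [4.0] * 30
psi, (a, b, c) = cumulative_breakpoint(x)
print(psi, round(b, 3), round(b + c, 3))  # 30 1.0 4.0
```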


2019 ◽  
Vol 35 (14) ◽  
pp. i427-i435 ◽  
Author(s):  
Héctor Climente-González ◽  
Chloé-Agathe Azencott ◽  
Samuel Kaski ◽  
Makoto Yamada

Abstract Motivation Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including, among others, lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not have these drawbacks. Results We compare block HSIC Lasso to other state-of-the-art feature selection techniques on both synthetic and real data, including experiments on three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes previously linked to brain development and function are involved in the biological differences between the types of neurons. Availability and implementation Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso). Supplementary information Supplementary data are available at Bioinformatics online.
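
Block HSIC Lasso builds on the Hilbert-Schmidt Independence Criterion (HSIC), a kernel dependence measure that also captures non-monotone relationships. The sketch below computes only the plain empirical HSIC, tr(K H L H)/(n - 1)², between one feature and the outcome. It uses Gaussian kernels on standardized data and ranks features by the result. That is marginal HSIC screening, a swapped-in simplification. It has no block estimator, no Lasso and no handling of redundant features, all of which block HSIC Lasso provides (pyHSICLasso implements the real method). The kernel width of 1 and the function names are assumptions.

```python
import math
import random


def gaussian_gram(values, sigma=1.0):
    """Gaussian kernel matrix of a 1-D sample, computed after standardizing it."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n) or 1.0
    z = [(v - mean) / sd for v in values]
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in z] for a in z]


def center(k):
    """Double-centre a kernel matrix: H K H with H = I - 11^T / n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(x, y):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, with Gaussian kernels."""
    n = len(x)
    kc = center(gaussian_gram(x))
    l = gaussian_gram(y)
    return sum(kc[i][j] * l[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


rng = random.Random(0)
x1 = [rng.uniform(-1.0, 1.0) for _ in range(80)]      # relevant feature
x2 = [rng.uniform(-1.0, 1.0) for _ in range(80)]      # irrelevant feature
y = [v * v + 0.05 * rng.gauss(0.0, 1.0) for v in x1]  # non-monotone dependence on x1
print(hsic(x1, y), hsic(x2, y))
```

Here y depends on x1 through x1², which a linear correlation largely misses for a feature symmetric around zero. HSIC still scores x1 above the independent feature x2.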

