Inference of genome 3D architecture by modeling overdispersion of Hi-C data

2021 ◽  
Author(s):  
Nelle Varoquaux ◽  
William S. Noble ◽  
Jean-Philippe Vert

We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use maximum likelihood, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. We first confirm the presence of overdispersion in several real Hi-C data sets, and we show that the overdispersion arises even in simulated data sets. We then propose a new model, called Pastis-NB, in which we replace the Poisson model of contact counts with a negative binomial one, parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently of the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms: three MDS-based methods (ShRec3D, ChromSDE, and Pastis-MDS) and a statistical method based on a Poisson model of the data (Pastis-PM). We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than the other models across real Hi-C replicates and across different resolutions. A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. Supplementary information is available at https://nellev.github.io/pastisnb/.
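The mean-variance distinction that motivates the negative binomial choice can be seen directly in simulation. A minimal sketch with NumPy; the mean and dispersion values are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, r = 50.0, 5.0  # mean and dispersion parameter; illustrative values only

# Poisson counts: the variance is forced to equal the mean.
pois = rng.poisson(mu, size=100_000)

# Negative binomial counts with mean mu and dispersion r:
# variance = mu + mu**2 / r > mu, so overdispersion is tunable.
p = r / (r + mu)  # NumPy's success-probability parametrization
nb = rng.negative_binomial(r, p, size=100_000)

print(pois.mean(), pois.var())  # both close to 50
print(nb.mean(), nb.var())      # mean close to 50, variance close to 550
```

The extra `mu**2 / r` term in the variance is what the separate dispersion parameter buys: the mean of the counts is unchanged, while the spread is free to match the data.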

2012 ◽  
Vol 36 (2) ◽  
pp. 88-103 ◽  
Author(s):  
Lai-Fa Hung

Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that its mean and variance are equal. However, in social science research it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an overdispersion framework and proposes new estimation methods. The parameters of the proposed model can be estimated using the Markov chain Monte Carlo method implemented in WinBUGS and the marginal maximum likelihood method implemented in SAS. An empirical example is examined, in which models suggested by the empirical results are fitted and discussed.


2021 ◽  
Author(s):  
Benbo Gao ◽  
Jing Zhu ◽  
Soumya Negi ◽  
Xinmin Zhang ◽  
Stefka Gyoneva ◽  
...  

Abstract
Summary: We developed Quickomics, a feature-rich R Shiny-powered tool to enable biologists to fully explore complex omics data and perform advanced analysis in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks after primary analysis of omics data is completed. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from data. The modular design makes the tool extensible with ease.
Availability: Researchers can experience the functionalities with their own data or with the demo RNA-Seq and proteomics data sets by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial at https://bit.ly/3rXIyhL. The source code under the GPLv3 license is provided at https://github.com/interactivereport/
Contact: [email protected], [email protected]
Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.


1985 ◽  
Vol 5 (1) ◽  
pp. 59-65 ◽  
Author(s):  
Edward F. Vonesh

Recurrent peritonitis is a major complication of Continuous Ambulatory Peritoneal Dialysis (CAPD). As a therapy for patients with end-stage renal disease, CAPD entails a continuous interaction between the patient and various medical devices. The assumptions one makes regarding this interaction play an essential role when estimating the rate of recurrent peritonitis for a given patient population. Assuming that each patient has a constant rate of peritonitis, two models for evaluating the risk of recurrent peritonitis are considered. One model, the Poisson probability model, applies when the rate of peritonitis is the same from patient to patient. When this occurs, the frequency of peritoneal infections will be randomly distributed among patients (Corey, 1981). A second model, the negative binomial probability model, applies when the rate of peritonitis varies from one patient to another. In this event, the distribution of peritoneal infections will differ from patient to patient. The Poisson model would be applicable when, for example, patients behave similarly with respect to their interactions with the medical devices and with potential risk factors. The negative binomial model, on the other hand, makes allowances for patient differences both in terms of their handling of routine exchanges and in their exposure to various risk factors. This paper provides methods for estimating the mean peritonitis rate under each model. In addition, "survival" curve estimates depicting the probability of remaining peritonitis free (i.e., "surviving") over time are provided. It is shown, using data from a multi-center clinical trial, that the risk of peritonitis is best described in terms of survival curves rather than the mean peritonitis rate. For both models, the mean peritonitis rate was found to be 0.85 episodes per year.
However, under the negative binomial model, the one-year survival rate, expressed as the percentage of patients remaining free of peritonitis, is 52% as compared with only 42% under the Poisson model. Moreover, the negative binomial model provided a significantly better fit to the observed frequency of peritonitis. These findings suggest that the negative binomial model provides a more realistic and accurate portrayal of the risk of peritonitis and that this risk is not nearly as high as would otherwise be indicated by a Poisson analysis.
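The gap between the two one-year survival figures follows from the closed forms for the probability of zero events. A sketch using the reported rate of 0.85 episodes per year; the dispersion shape `k` is an illustrative choice, not a value fitted in the paper:

```python
import math

rate = 0.85  # mean peritonitis rate (episodes/year) reported for both models
t = 1.0      # one year of follow-up

# Poisson: every patient shares the same rate, so
# P(no episode by time t) = exp(-rate * t).
p_poisson = math.exp(-rate * t)

# Gamma-mixed Poisson (negative binomial): rates vary across patients with
# shape k, so P(no episode by t) = (1 + rate*t/k) ** (-k).
# k = 1.3 is purely illustrative.
k = 1.3
p_negbin = (1 + rate * t / k) ** (-k)

print(f"Poisson one-year peritonitis-free:        {p_poisson:.0%}")
print(f"Negative binomial one-year peritonitis-free: {p_negbin:.0%}")
```

With the same mean rate, the mixed model always gives a higher zero-event probability: the patients with below-average rates contribute disproportionately to the peritonitis-free fraction, which is the effect the abstract describes.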


Genetics ◽  
1999 ◽  
Vol 151 (4) ◽  
pp. 1605-1619 ◽  
Author(s):  
Mikko J Sillanpää ◽  
Elja Arjas

Abstract A general fine-scale Bayesian quantitative trait locus (QTL) mapping method for outcrossing species is presented. It is suitable for an analysis of complete and incomplete data from experimental designs of F2 families or backcrosses. The amount of genotyping of parents and grandparents is optional, as well as the assumption that the QTL alleles in the crossed lines are fixed. Grandparental origin indicators are used, but without forgetting the original genotype or allelic origin information. The method treats the number of QTL in the analyzed chromosome as a random variable and allows some QTL effects from other chromosomes to be taken into account in a composite interval mapping manner. A block-update of ordered genotypes (haplotypes) of the whole family is sampled once in each marker locus during every round of the Markov Chain Monte Carlo algorithm used in the numerical estimation. As a byproduct, the method gives the posterior distributions for linkage phases in the family and therefore it can also be used as a haplotyping algorithm. The Bayesian method is tested and compared with two frequentist methods using simulated data sets, considering two different parental crosses and three different levels of available parental information. The method is implemented as a software package and is freely available under the name Multimapper/outbred at URL http://www.rni.helsinki.fi/~mjs/.


1989 ◽  
Vol 19 (2) ◽  
pp. 199-212 ◽  
Author(s):  
Georges Dionne ◽  
Charles Vanasse

Abstract
The objective of this paper is to extend well-known ratemaking models in automobile insurance. The analysis begins by introducing a regression component into the Poisson model in order to use all available information in estimating the distribution. In a second step, a random variable is included in the regression component of the Poisson model, and a negative binomial model with a regression component is derived. We then present our main contribution: a bonus-malus system that integrates a priori and a posteriori information on an individual basis. We show how net premium tables can be derived from the model, and examples of such tables are presented.
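The a posteriori side of such a bonus-malus system rests on the standard gamma-Poisson credibility update: if individual claim rates follow a gamma distribution (so that marginal claim counts are negative binomial), the posterior expected rate has a simple closed form. A minimal sketch of that update; the prior parameters are illustrative, not values from the paper:

```python
def net_premium(a: float, tau: float, claims: int, years: float) -> float:
    """Posterior expected claim rate after observing `claims` over `years`.

    Prior: individual rate ~ Gamma(shape=a, rate=tau), so the marginal
    claim count is negative binomial and the prior mean rate is a / tau.
    The gamma posterior after a Poisson observation is conjugate, giving
    posterior mean (a + claims) / (tau + years).
    """
    return (a + claims) / (tau + years)

a, tau = 1.5, 15.0  # illustrative prior: mean rate a/tau = 0.10 claims/year
print(net_premium(a, tau, 0, 3))  # claim-free 3 years -> bonus (below 0.10)
print(net_premium(a, tau, 2, 3))  # two claims in 3 years -> malus (above 0.10)
```

A claim-free record pulls the premium below the a priori rate and claims push it above, which is exactly the bonus-malus behaviour the abstract describes; the paper's contribution is combining this with the a priori regression component.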


2020 ◽  
Author(s):  
Faezeh Bayat ◽  
Maxwell Libbrecht

Abstract
Motivation: A sequencing-based genomic assay such as ChIP-seq outputs a real-valued signal for each position in the genome that measures the strength of activity at that position. Most genomic signals lack the property of variance stabilization. That is, a difference between 100 and 200 reads usually has a very different statistical importance from a difference between 1,100 and 1,200 reads. A statistical model such as a negative binomial distribution can account for this pattern, but learning these models is computationally challenging. Therefore, many applications, including imputation and segmentation and genome annotation (SAGA), instead use Gaussian models together with a transformation such as log or inverse hyperbolic sine (asinh) to stabilize variance.
Results: We show here that existing transformations do not fully stabilize variance in genomic data sets. To solve this issue, we propose VSS, a method that produces variance-stabilized signals for sequencing-based genomic signals. VSS learns the empirical relationship between the mean and variance of a given signal data set and produces transformed signals that normalize for this dependence. We show that VSS successfully stabilizes variance and that doing so improves downstream applications such as SAGA. VSS will eliminate the need for downstream methods to implement complex mean-variance relationship models, and will enable genomic signals to be easily understood by eye.
Contact: [email protected]
Availability: https://github.com/faezeh-bayat/Variance-stabilized-units-for-sequencing-based-genomic-signals


2018 ◽  
Author(s):  
Carlos Martínez-Mira ◽  
Ana Conesa ◽  
Sonia Tarazona

Abstract
Motivation: As new integrative methodologies are being developed to analyse multi-omic experiments, validation strategies are required for benchmarking. In silico approaches such as simulated data are popular as they are fast and cheap. However, few tools are available for creating synthetic multi-omic data sets.
Results: MOSim is a new R package for easily simulating multi-omic experiments consisting of gene expression data, other regulatory omics and the regulatory relationships between them. MOSim supports different experimental designs, including time series data.
Availability: The package is freely available under the GPL-3 license from the Bitbucket repository (https://bitbucket.org/ConesaLab/mosim/).
Contact: [email protected]
Supplementary information: Supplementary material is available at bioRxiv online.


2017 ◽  
Vol 17 (6) ◽  
pp. 359-380 ◽  
Author(s):  
Alan Huang

Conway–Maxwell–Poisson (CMP) distributions are flexible generalizations of the Poisson distribution for modelling overdispersed or underdispersed counts. The main hindrance to their wider use in practice seems to be the inability to directly model the mean of counts, making them neither compatible with nor comparable to competing count regression models, such as the log-linear Poisson, negative-binomial or generalized Poisson regression models. This note illustrates how CMP distributions can be parametrized via the mean, so that simpler and more easily interpretable mean-models can be used, such as a log-linear model. Other link functions are also available, of course. In addition to establishing attractive theoretical and asymptotic properties of the proposed model, its good finite-sample performance is exhibited through various examples and a simulation study based on real datasets. Moreover, the MATLAB routine to fit the model to data is demonstrated to be up to an order of magnitude faster than the current software to fit standard CMP models, and over two orders of magnitude faster than the recently proposed hyper-Poisson model.
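The reparametrization amounts to inverting the mean function: for a fixed dispersion, solve for the rate that yields the desired mean. A numerical sketch (not the paper's MATLAB routine) using the CMP pmf proportional to lambda^y / (y!)^nu, with a truncated support and log-scale bisection; the truncation limit and target values are illustrative:

```python
import math

def cmp_mean(lam: float, nu: float, ymax: int = 200) -> float:
    """Mean of a CMP(lam, nu) distribution, truncated at ymax (a sketch;
    adequate for small means). Weights are lam**y / (y!)**nu, computed in
    log space for numerical stability."""
    logw = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(ymax)]
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    z = sum(w)
    return sum(y * wy for y, wy in zip(range(ymax), w)) / z

def cmp_rate_for_mean(mu: float, nu: float,
                      lo: float = 1e-8, hi: float = 1e6,
                      iters: int = 200) -> float:
    """Solve for lam so that the CMP mean equals mu. The mean is
    increasing in lam for fixed nu, so bisection (on the log scale)
    converges."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if cmp_mean(mid, nu) < mu:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# nu = 1 recovers the Poisson case, where the rate equals the mean.
print(cmp_rate_for_mean(4.0, 1.0))  # close to 4
# nu > 1 (underdispersion) needs a larger rate for the same mean.
print(cmp_rate_for_mean(4.0, 1.5))
```

Once the rate is expressed as a function of the mean, a log-linear model for the mean can be fitted directly, which is the compatibility with standard count regressions that the note emphasizes.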


Parasitology ◽  
2008 ◽  
Vol 135 (10) ◽  
pp. 1225-1235 ◽  
Author(s):  
M. J. DENWOOD ◽  
M. J. STEAR ◽  
L. MATTHEWS ◽  
S. W. J. REID ◽  
N. TOFT ◽  
...  

Summary
Understanding the frequency distribution of parasites and parasite stages among hosts is essential for efficient experimental design and statistical analysis, and is also required for the development of sustainable methods of controlling infection. Nematodirus battus is one of the most important organisms that infect sheep, but the distribution of parasites among hosts is unknown. An initial analysis indicated a high frequency of animals without N. battus and with zero egg counts, suggesting the possibility of a zero-inflated distribution. We developed a Bayesian analysis using Markov chain Monte Carlo methods to estimate the parameters of the zero-inflated negative binomial distribution. The analysis of 3000 simulated data sets indicated that this method outperformed the maximum likelihood procedure. Application of this technique to faecal egg counts from lambs in a commercial upland flock indicated that N. battus counts were indeed zero-inflated. Estimating the extent of zero-inflation is important for effective statistical analysis and for the accurate identification of genetically resistant animals.
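The zero-inflated negative binomial is a two-part mixture: a structural-zero component for uninfected hosts plus an ordinary negative binomial for the rest. A simulation sketch of that data-generating process (not the paper's MCMC estimator); the mixing proportion, mean, and shape are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hosts = 50_000
pi_zero, r, mu = 0.4, 0.5, 120.0  # illustrative: 40% uninfected hosts

# Zero-inflated negative binomial: with probability pi_zero the host is
# uninfected (structural zero); otherwise the egg count is NB(mean mu,
# shape r), which can still produce zeros by chance.
p = r / (r + mu)
counts = rng.negative_binomial(r, p, size=n_hosts)
counts[rng.random(n_hosts) < pi_zero] = 0

frac_zero = (counts == 0).mean()
nb_zero = p ** r  # P(count = 0) under the NB component alone
print(frac_zero, "observed zeros vs", nb_zero, "expected from NB alone")
```

The observed zero fraction is close to `pi_zero + (1 - pi_zero) * p**r`, well above what the negative binomial component alone predicts; detecting and quantifying that excess is exactly the inference problem the paper's Bayesian analysis addresses.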


2016 ◽  
Vol 115 (1) ◽  
pp. 434-444 ◽  
Author(s):  
Wahiba Taouali ◽  
Giacomo Benvenuti ◽  
Pascal Wallisch ◽  
Frédéric Chavane ◽  
Laurent U. Perrinet

The repeated presentation of an identical visual stimulus in the receptive field of a neuron may evoke different spiking patterns at each trial. Probabilistic methods are essential to understand the functional role of this variance within the neural activity. In that case, a Poisson process is the most common model of trial-to-trial variability. For a Poisson process, the variance of the spike count is constrained to be equal to the mean, irrespective of the duration of measurements. Numerous studies have shown that this relationship does not generally hold. Specifically, a majority of electrophysiological recordings show an “overdispersion” effect: responses that exhibit more intertrial variability than expected from a Poisson process alone. A model that is particularly well suited to quantify overdispersion is the Negative-Binomial distribution model. This model is well-studied and widely used but has only recently been applied to neuroscience. In this article, we address three main issues. First, we describe how the Negative-Binomial distribution provides a model apt to account for overdispersed spike counts. Second, we quantify the significance of this model for any neurophysiological data by proposing a statistical test, which quantifies the odds that overdispersion could be due to the limited number of repetitions (trials). We apply this test to three neurophysiological data sets along the visual pathway. Finally, we compare the performance of this model to the Poisson model on a population decoding task. We show that the decoding accuracy is improved when accounting for overdispersion, especially under the hypothesis of tuned overdispersion.
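A common way to quantify whether spike counts are more variable than a Poisson process allows, given only a limited number of trials, is a chi-square dispersion test on the variance-to-mean ratio. This is a standard dispersion test offered as a sketch, not the specific test proposed in the article; the simulated rates are illustrative:

```python
import numpy as np
from scipy import stats

def overdispersion_pvalue(spike_counts) -> float:
    """One-sided Poisson dispersion test over repeated trials.

    Under a Poisson model, (n - 1) * var / mean is approximately
    chi-square with n - 1 degrees of freedom; a small p-value means more
    trial-to-trial variability than Poisson predicts.
    """
    x = np.asarray(spike_counts, dtype=float)
    n = x.size
    stat = (n - 1) * x.var(ddof=1) / x.mean()
    return stats.chi2.sf(stat, df=n - 1)

rng = np.random.default_rng(2)
poisson_trials = rng.poisson(20, size=100)
# Gamma-mixed Poisson trials are overdispersed (negative binomial):
# the underlying rate varies from trial to trial around a mean of 20.
rates = rng.gamma(shape=4.0, scale=5.0, size=100)
negbin_trials = rng.poisson(rates)

print(overdispersion_pvalue(poisson_trials))  # typically large
print(overdispersion_pvalue(negbin_trials))   # typically tiny
```

For genuinely Poisson trials the statistic sits near its degrees of freedom, while tuned rate fluctuations inflate it sharply, mirroring the overdispersion the article reports along the visual pathway.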

