A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory

2019 ◽  
Vol 15 (3) ◽  
pp. e1006794 ◽  
Author(s):  
Rodoniki Athanasiadou ◽  
Benjamin Neymotin ◽  
Nathan Brandt ◽  
Wei Wang ◽  
Lionel Christiaen ◽  
...  
2013 ◽  
Vol 14 (7) ◽  
pp. R74 ◽  
Author(s):  
Keyan Zhao ◽  
Zhi-xiang Lu ◽  
Juw Park ◽  
Qing Zhou ◽  
Yi Xing

2020 ◽  
pp. 47-63
Author(s):  
Bendix Carstensen

This chapter examines prevalence data, using a dataset which contains the number of diabetes patients and the total number of persons in Denmark as of January 1, 2010, classified by age and sex. Prevalence of a disease condition in a population is simply the proportion of affected people. The chapter uses prevalence to illustrate core modelling concepts: the model itself, the likelihood, the maximum likelihood estimation principle, and the properties of the results, all of which underlie most modern epidemiological methods. It also explains the concept of a statistical model, leading to the distinction between empirical and theoretical prevalences. The chapter then focuses on the task of comparing different models for the same data, models that describe the data in varying degrees of detail.
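As a rough illustration of the likelihood ideas this abstract mentions, the sketch below assumes a simple binomial model for a single age-sex group, under which the maximum likelihood estimate of the theoretical prevalence is the empirical prevalence. The counts and the function names are hypothetical and are not taken from the chapter or the Danish dataset.

```python
import math

def binomial_log_likelihood(p, cases, persons):
    """Binomial log-likelihood of prevalence p (constant coefficient term dropped)."""
    return cases * math.log(p) + (persons - cases) * math.log(1.0 - p)

def prevalence_mle(cases, persons):
    """Closed-form maximum likelihood estimate: the empirical prevalence."""
    return cases / persons

# Hypothetical counts for one age-sex group (illustration only).
cases, persons = 120, 4000
p_hat = prevalence_mle(cases, persons)

# The log-likelihood is lower at nearby values of p than at the estimate.
ll_at_mle = binomial_log_likelihood(p_hat, cases, persons)
ll_below = binomial_log_likelihood(p_hat - 0.001, cases, persons)
ll_above = binomial_log_likelihood(p_hat + 0.001, cases, persons)
print(p_hat, ll_at_mle > ll_below, ll_at_mle > ll_above)
```

Comparing models of differing detail, as the chapter goes on to do, would amount to comparing such log-likelihoods across models with different numbers of prevalence parameters.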


Mathematics ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. 1801
Author(s):  
Abdulhakim A. Al-Babtain ◽  
Ibrahim Elbatal ◽  
Christophe Chesneau ◽  
Farrukh Jamal

This paper is devoted to a new class of distributions called the Box-Cox gamma-G family. It is a natural generalization of the useful Ristić–Balakrishnan-G family of distributions, containing a wide variety of power gamma-G distributions, including the odd gamma-G distributions. The key tool for this generalization is the use of the Box-Cox transformation involving a tuning power parameter. Diverse mathematical properties of interest are derived. Then a specific member with three parameters based on the half-Cauchy distribution is studied and considered as a statistical model. The method of maximum likelihood is used to estimate the related parameters, along with a simulation study illustrating the theoretical convergence of the estimators. Finally, two different real datasets are analyzed to show the fitting power of the new model compared to other appropriate models.


1970 ◽  
Vol 19 (1-2) ◽  
pp. 79-79 ◽  
Author(s):  
M. V. Stack

Weights of developing incisors in 10 pairs of twin fetuses have previously been related to ages in order to compare dental growth status (Stack, 1963). Ages (T0) at which weights (W) of mineralised portions of teeth of the temporary dentition become significant have been computed (Stack, 1968), using a Fortran program giving maximum likelihood estimates of the required parameters, based on the statistical model of Angleton and Pettus (1966). Availability of the additional parameter T0 allows a reexamination of the previous data, now fortified by observations on incisors from five further pairs of twin fetuses. Estimates of ages were made from the relationship (T − T0) = k·W½, where values for upper and lower central incisors were 2.0 and 2.8 for k, 19.5 and 18.5 for T0, respectively. Less reliable estimates were obtained from the lateral incisors. These values were computed from observations on 40 dentitions from normal singleton fetuses. Ages were also estimated from tabulated body weights (Documenta Geigy).
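The age relationship quoted above can be rearranged as T = T0 + k·W½. The following is a minimal sketch of that arithmetic using the k and T0 values given in the abstract for central incisors. The abstract does not state the units of T or W, so the input weight here is purely illustrative, and the dictionary and function names are assumptions made for this example.

```python
import math

# (k, T0) pairs quoted in the abstract for upper and lower central incisors.
PARAMS = {
    "upper_central": (2.0, 19.5),
    "lower_central": (2.8, 18.5),
}

def estimate_age(weight, tooth):
    """Estimate age T from mineralised weight W via T = T0 + k * sqrt(W)."""
    k, t0 = PARAMS[tooth]
    return t0 + k * math.sqrt(weight)

# Illustrative weight only; units are whatever the original study used.
age_upper = estimate_age(4.0, "upper_central")
age_lower_at_zero = estimate_age(0.0, "lower_central")
print(age_upper, age_lower_at_zero)
```

At W = 0 the estimate reduces to T0, which matches the abstract's description of T0 as the age at which the mineralised weight becomes significant.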


2019 ◽  
Author(s):  
Anna Liza Kretzschmar ◽  
Arjun Verma ◽  
Shauna Murray ◽  
Tim Kahlke ◽  
Mathieu Fourment ◽  
...  

From publicly available next-gen sequencing datasets of non-model organisms, such as marine protists, arise opportunities to explore their evolutionary relationships. In this study we explored the effects that dataset and model selection have on the phylogenetic inference of the Gonyaulacales, single-celled marine algae of the phylum Dinoflagellata with genomes that show extensive paralogy. We developed a method for identifying and extracting single copy genes from RNA-seq libraries and compared phylogenies inferred from these single copy genes with those inferred from commonly used genetic markers and phylogenetic methods. Comparison of two datasets and three different phylogenetic models showed that exclusive use of ribosomal DNA sequences, maximum likelihood and gene concatenation produced very different results from those obtained with the multi-species coalescent. The multi-species coalescent has recently been recognized as being robust to the inclusion of paralogs, including hidden paralogs present in single copy gene sets (pseudoorthologs). Comparisons of model fit strongly favored the multi-species coalescent for these data over a concatenated alignment (single tree) model. Our findings suggest that the multi-species coalescent (inferred either via Maximum Likelihood or Bayesian Inference) should be considered for future phylogenetic studies of organisms where accurate selection of orthologs is difficult.


Geophysics ◽  
1987 ◽  
Vol 52 (12) ◽  
pp. 1621-1630 ◽  
Author(s):  
Georgios B. Giannakis ◽  
Jerry M. Mendel

Entropy concepts are embodied in the maximum‐likelihood deconvolution (MLD) method, just as they are in minimum‐entropy (MED) methods. MLD asymptotically minimizes Shannon’s entropy of the reflectivity sequence (which, in MLD, is modeled as a Bernoulli‐Gaussian random sequence). Study of maximum‐likelihood detection and estimation of the reflectivity sequence reveals that MLD is embodied in the general framework of the “adaptive” MED methods. Comparisons based on similarities and differences between MLD and various existing MED techniques show that MLD is robust due to explicit inclusion of noise in its statistical model.
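To make the Bernoulli‐Gaussian reflectivity model concrete, here is a minimal simulation sketch. It assumes the common formulation in which each sample is nonzero with probability `lam` and, when nonzero, is drawn from a zero-mean Gaussian with standard deviation `sigma`. The parameter names and values are hypothetical and are not taken from the paper, and the sketch does not implement MLD itself.

```python
import random

def bernoulli_gaussian(n, lam, sigma, seed=0):
    """Draw n samples: Gaussian(0, sigma) with probability lam, otherwise 0.0."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) if rng.random() < lam else 0.0
            for _ in range(n)]

# Sparse spike train: roughly 10% of the samples are nonzero reflectors.
sequence = bernoulli_gaussian(n=10_000, lam=0.1, sigma=1.0)
nonzero_count = sum(1 for x in sequence if x != 0.0)
print(len(sequence), nonzero_count)
```

The sparsity controlled by `lam` is what gives such a sequence low entropy relative to a dense Gaussian sequence of the same length.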


Extremes ◽  
2004 ◽  
Vol 7 (4) ◽  
pp. 309-336 ◽  
Author(s):  
Alexander Kukush ◽  
Yuri Chernikov ◽  
Dietmar Pfeifer

2019 ◽  
Author(s):  
Shirong Deng ◽  
Feifei Xiao

In the past few years, extensive studies have focused on the analysis of genome function, especially on expression quantitative trait loci (eQTL), which offer promise for characterizing functional sequencing variation and for understanding the basic processes of gene regulation. However, most studies of eQTL mapping have not implemented models that allow for the non-equivalence of parental alleles, the so-called parent-of-origin effects (POEs); thus, the number and effects of imprinted genes remain important open questions. Imprinting is a type of POE in which the expression of certain genes depends on their allelic parent of origin; such genes are important contributors to phenotypic variation in conditions such as diabetes and many cancer types. In addition, multi-collinearity is an important issue arising from modeling multiple genetic effects. To address these challenges, we proposed a statistical framework to test the main allelic effects of the candidate eQTLs along with the POE with an orthogonal model for RNA sequencing (RNA-seq) data. Using simulations, we demonstrated the desirable power and Type I error of the orthogonal model, which also achieved accurate estimation of the genetic effects and over-dispersion of the RNA-seq data. These methods were applied to an existing HapMap project trio dataset to validate the reported imprinted genes and to discover novel imprinted genes. Using the orthogonal method, we validated existing imprinted genes and discovered two novel imprinted genes with significant dominance effect.

Author Summary: In the past decades, an unprecedented wealth of knowledge has been accumulated for understanding variation at the human DNA level. However, this DNA-level knowledge has not been sufficiently translated into an understanding of the mechanisms of human diseases. Gene expression quantitative trait locus (eQTL) mapping, which aims to explore the genetic basis of gene expression, is one of the most promising approaches to fill this gap. Genomic imprinting is an epigenetic phenomenon that is an important contributor to phenotypic variation in human complex diseases and may explain some of the “hidden” heritable variability. Many imprinted genes are known to play important roles in human complex diseases such as diabetes, breast cancer and obesity. However, traditional eQTL mapping approaches do not allow for the detection of imprinting, which is usually involved in gene expression imbalance. In this study, we have for the first time demonstrated that the orthogonal statistical model can be applied to eQTL mapping for RNA sequencing (RNA-seq) data. We showed by simulated and real data that the orthogonal model outperformed the usual functional model for detecting main effects in most cases, which addressed the issue of confounding between the dominance and additive effects. Application of the statistical model to the HapMap data resulted in discovery of some potential eQTLs with imprinting effects and dominance effects on expression of RB1 and IGF1R genes. In summary, we developed a comprehensive framework for modeling imprinting effect for eQTL mapping, by decomposing the effects into multiple genetic components. This study provides new insights into statistical modeling of eQTL mapping with RNA-seq data, allowing for uncorrelated parameter estimation of genetic effects, covariates and the over-dispersion parameter.

