PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides

Biostatistics ◽  
2018 ◽  
Vol 20 (4) ◽  
pp. 632-647
Author(s):  
Laurent Jacob ◽  
Florence Combes ◽  
Thomas Burger

We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry-based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Because of numerous homologous sequences, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide–protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time with the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedures outperform state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.
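The abstract describes a likelihood ratio test built on a linear model of peptide–protein relationships. The sketch below is not the PEPA statistic, and it does not reproduce the linear-time computation. It is a generic Gaussian likelihood ratio test for nested linear models, and it rests on an illustrative assumption of ours: each peptide's log-intensity equals a peptide-specific baseline plus the condition effects of every protein the peptide maps to. Under that assumption, a shared peptide receives several protein effects. The helpers `build_design` and `gaussian_lrt`, the toy membership matrix and the simulated effects are all hypothetical.

```python
import math

import numpy as np


def build_design(membership, condition):
    """Rows are (peptide, sample) pairs; columns are peptide baselines, then protein effects."""
    n_pep = membership.shape[0]
    rows = []
    for p in range(n_pep):
        for c in condition:
            baseline = np.eye(n_pep)[p]   # one baseline level per peptide
            effects = membership[p] * c   # a shared peptide carries several protein effects
            rows.append(np.concatenate([baseline, effects]))
    return np.array(rows, dtype=float)


def gaussian_lrt(y, X_full, drop_col):
    """LRT of H0: coefficient in column `drop_col` is zero (Gaussian noise, variance profiled out)."""
    n = len(y)
    X_null = np.delete(X_full, drop_col, axis=1)
    beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
    beta_null = np.linalg.lstsq(X_null, y, rcond=None)[0]
    rss_full = float(np.sum((y - X_full @ beta_full) ** 2))
    rss_null = float(np.sum((y - X_null @ beta_null) ** 2))
    # -2 log(likelihood ratio) = n * log(RSS_null / RSS_full); clip rounding noise below 0
    stat = max(n * math.log(rss_null / rss_full), 0.0)
    # One restriction -> asymptotic chi-square(1): P(chi2_1 > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical toy data: 6 peptides, 3 proteins (A, B, C); peptide 2 is shared by A and B.
membership = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])
condition = np.array([0, 0, 0, 1, 1, 1])  # 3 replicates per condition
X = build_design(membership, condition)
rng = np.random.default_rng(0)
beta = np.concatenate([rng.normal(20.0, 1.0, 6), [2.0, 0.0, 0.0]])  # only protein A changes
y = X @ beta + rng.normal(0.0, 0.3, X.shape[0])

stat_a, p_a = gaussian_lrt(y, X, drop_col=6)  # column of protein A's effect
stat_b, p_b = gaussian_lrt(y, X, drop_col=7)  # column of protein B's effect
print(f"A: stat={stat_a:.1f}, p={p_a:.2e}; B: stat={stat_b:.2f}, p={p_b:.3f}")
```

Dropping the effect of a protein that truly changes inflates the residual sum of squares and gives a tiny p-value. Dropping an unchanged protein gives a statistic of the order of a single χ²₁ draw.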

2017 ◽  
Author(s):  
Laurent Jacob ◽  
Florence Combes ◽  
Thomas Burger



2020 ◽  
Vol 117 (29) ◽  
pp. 16880-16890 ◽  
Author(s):  
Larry Wasserman ◽  
Aaditya Ramdas ◽  
Sivaraman Balakrishnan

We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as "universal." The method is very simple and is based on a modified version of the usual likelihood-ratio statistic that we call "the split likelihood-ratio test" (split LRT) statistic. The (limiting) null distribution of the classical likelihood-ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum-likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings where computing the MLE is hard, it is sufficient to upper bound the maximum likelihood in order to construct valid tests and intervals. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid P values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.
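The split LRT can be illustrated on a toy problem chosen for this sketch and not taken from the paper: testing H0: μ ≤ 0 for i.i.d. N(μ, 1) observations with known variance. The procedure has three steps:

- The data are split into D0 and D1.
- Any estimator θ̂1 is fitted on D1 (here the sample mean), and the null-constrained MLE θ̂0 is fitted on D0.
- The statistic U = L0(θ̂1)/L0(θ̂0) uses the likelihood of D0 only.

Rejecting when U ≥ 1/α keeps the type I error at most α, because U has expectation at most 1 under the null (Markov's inequality). The function name, the 50/50 split and the Gaussian model are assumptions of this sketch.

```python
import math

import numpy as np


def split_lrt_mean_nonpositive(x, alpha=0.05, seed=0):
    """Split LRT of H0: mu <= 0 for x_i ~ N(mu, 1) with known unit variance (toy example)."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))  # random split into D0 and D1
    half = len(x) // 2
    d0, d1 = x[:half], x[half:]
    theta1 = float(d1.mean())            # any estimator fitted on D1 (here the unrestricted MLE)
    theta0 = min(float(d0.mean()), 0.0)  # MLE under H0 (mu <= 0), fitted on D0
    # log U = log L0(theta1) - log L0(theta0), with log L0(t) = -sum_{D0} (x - t)^2 / 2 + const
    log_u = float(np.sum((d0 - theta0) ** 2 - (d0 - theta1) ** 2) / 2.0)
    return log_u, log_u >= math.log(1.0 / alpha)  # reject when U >= 1/alpha


rng = np.random.default_rng(1)
log_u_alt, reject_alt = split_lrt_mean_nonpositive(rng.normal(1.0, 1.0, 200))
log_u_null, reject_null = split_lrt_mean_nonpositive(rng.normal(-1.0, 1.0, 200))
print(reject_alt, reject_null)
```

No asymptotic null distribution is needed: the threshold 1/α is fixed in advance. This is the point of the "universal" construction, at the cost of using only half the data in each role.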


Author(s):  
Andrew D. Barbour

It is shown that the Wilks large-sample likelihood ratio statistic λₙ, for testing between composite hypotheses Θ₀ ⊂ Θ₁ on the basis of a sample of size n, behaves, as n varies, like a diffusion process related to an equilibrium Ornstein-Uhlenbeck process whenever the null hypothesis is true. This fact is used to construct large-sample sequential tests based on λₙ that are the same whatever the underlying distributions. In particular, the underlying distributions need not belong to an exponential family.


2019 ◽  
Vol 11 (16) ◽  
pp. 1930 ◽  
Author(s):  
Hui Luo ◽  
Zhenhong Li ◽  
Zhen Dong ◽  
Anxi Yu ◽  
Yongsheng Zhang ◽  
...  

The application of SAR tomography (TomoSAR) to urban infrastructure and other man-made buildings has gained increasing popularity with the development of modern high-resolution spaceborne satellites. Urban tomography focuses on separating the overlaid targets within one azimuth-range resolution cell and on reconstructing their reflectivity profiles. In this work, we build on the existing methods of compressive sensing (CS) and the generalized likelihood ratio test (GLRT) to develop a multiple-scatterer detection method named CS-GLRT. It automatically recognizes the number of scatterers superimposed within a single pixel and reconstructs the backscattered reflectivity profiles of the detected scatterers. The proposed CS-GLRT adopts a two-step strategy. In the first step, an L1-norm minimization is carried out pixel by pixel to give a robust, super-resolved estimate of the candidate positions. In the second step, a multiple hypothesis test is implemented in the GLRT to achieve model order selection; here the mapping matrix is restricted to the previously selected columns, i.e., to the candidate positions, and the parameters are estimated by the least squares (LS) method. Numerical experiments on simulated data show that the method can separate closely located scatterers with a quasi-constant false alarm rate (QCFAR) and reach an estimation accuracy approaching the Cramér–Rao lower bound (CRLB). Experiments on real TerraSAR-X Spotlight data show that CS-GLRT detects single scatterers at high density, distinguishes a considerable number of double scatterers, and even detects triple scatterers. The estimated results agree well with the ground truth and help interpret the true structure of the complex buildings studied in the SAR images.
It should be noted that this method is especially suitable for urban areas with very dense infrastructure and man-made buildings, and for datasets with a tightly controlled baseline distribution.
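The second CS-GLRT step is model order selection using least-squares fits restricted to candidate positions, and it can be sketched roughly as follows. Every detail is a hypothetical assumption of this sketch:

- The data follow the usual discretized TomoSAR model g = Rγ + ε, with steering columns exp(−j2πξₙsₗ).
- The candidate positions have already been returned by the L1-norm step, which is not implemented here.
- The best subset of candidates is found by exhaustive search.
- A sequential test adds one more scatterer while the relative drop in residual energy exceeds a fixed `threshold`.

The paper's actual GLRT statistic and its QCFAR threshold calibration are not reproduced.

```python
import itertools

import numpy as np


def steering_matrix(xi, s):
    """Hypothetical TomoSAR steering matrix: entry (n, l) = exp(-2j*pi*xi_n*s_l)."""
    return np.exp(-2j * np.pi * np.outer(xi, s))


def ls_rss(A, g, cols):
    """Residual sum of squares of the least-squares fit of g on columns `cols` of A."""
    if len(cols) == 0:
        return float(np.vdot(g, g).real)
    sub = A[:, list(cols)]
    coef = np.linalg.lstsq(sub, g, rcond=None)[0]
    r = g - sub @ coef
    return float(np.vdot(r, r).real)


def select_scatterers(A, g, candidates, max_order=3, threshold=0.5):
    """Choose the number of scatterers by sequential GLRT-style tests on LS residuals."""
    max_order = min(max_order, len(candidates))
    best_rss, best_sets = [ls_rss(A, g, ())], [()]
    for k in range(1, max_order + 1):
        # Best k-subset of the candidate positions (exhaustive; fine for a few candidates)
        rss, cols = min((ls_rss(A, g, c), c) for c in itertools.combinations(candidates, k))
        best_rss.append(rss)
        best_sets.append(cols)
    order = 0
    for k in range(max_order):
        # Relative drop in residual energy when going from k to k + 1 scatterers
        t = (best_rss[k] - best_rss[k + 1]) / max(best_rss[k + 1], np.finfo(float).tiny)
        if t <= threshold:
            break
        order = k + 1
    return best_sets[order]


n_acq = 20
A = steering_matrix(np.arange(n_acq) / n_acq, np.arange(40) / 2.0)
gamma = np.zeros(40, dtype=complex)
gamma[10], gamma[30] = 1.0, 0.8j  # two scatterers superimposed in one pixel
rng = np.random.default_rng(0)
g = A @ gamma + 0.05 * (rng.standard_normal(n_acq) + 1j * rng.standard_normal(n_acq))
selected = select_scatterers(A, g, candidates=[9, 10, 11, 30, 31])
print(selected)
```

Restricting the fits to a short candidate list keeps the subset search small. Without the first step, an exhaustive search over the full elevation grid would be impractical.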


1992 ◽  
Vol 42 (3-4) ◽  
pp. 255-260 ◽  
Author(s):  
A.K. Basu ◽  
Debasis Bhattacharya

The non-null asymptotic distribution of the randomly stopped log-likelihood ratio statistic for a general dependent process has been obtained. We observe that, under certain regularity conditions, the limiting distribution of the randomly stopped log-likelihood ratio statistic under the alternative hypothesis is again a mixture of normal distributions. Some possible applications of the result are also pointed out in this note.


1979 ◽  
Vol 11 (04) ◽  
pp. 737-749
Author(s):  
Robert V. Foutz ◽  
R. C. Srivastava

Statistical inference for Markov processes is commonly based on the maximum likelihood method of estimation and the likelihood ratio criterion for testing hypotheses. Constructing estimators and test statistics by these methods requires that a model be chosen in the form of a family of transition density functions. In this paper, asymptotic properties of the maximum likelihood estimator and of the likelihood ratio statistic λₙ are examined when the model chosen for their construction is incorrect, that is, when no density in the model is a density for the transition probability distribution of the Markov process. It is shown that if the maximum likelihood estimator and λₙ are constructed from a 'regular' incorrect model, then the estimator is consistent and asymptotically normally distributed, and the asymptotic null distribution of −2 log λₙ is that of a linear combination of independent chi-squared random variables. These results are applied to propose measures of the performance of the test based on λₙ when the statistic is constructed from an incorrect model.
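Under a 'regular' incorrect model, −2 log λₙ is asymptotically distributed as a linear combination Σ cᵢ χ²₁ of independent chi-squared variables. A simple way to turn an observed statistic into an approximate p-value is Monte Carlo simulation of that combination. In this sketch the weights are assumed known, although in practice they would have to be estimated. The function name, the number of draws, the seed and the example weights are illustrative choices.

```python
import numpy as np


def weighted_chi2_pvalue(stat, weights, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(sum_i w_i * Z_i^2 >= stat), Z_i i.i.d. standard normal."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    draws = rng.chisquare(df=1, size=(n_draws, w.size)) @ w  # one weighted sum per row
    return float(np.mean(draws >= stat))


p_one = weighted_chi2_pvalue(3.841459, [1.0])       # chi2_1 95% quantile
p_two = weighted_chi2_pvalue(5.991465, [1.0, 1.0])  # chi2_2 95% quantile
p_mis = weighted_chi2_pvalue(3.841459, [2.0, 0.5])  # hypothetical misspecified-model weights
print(p_one, p_two, p_mis)
```

With all weights equal to 1 the combination reduces to an ordinary χ²ₖ, which gives a quick sanity check (p ≈ 0.05 at the 95% quantile). With the hypothetical weights above, the tail probability at the χ²₁ critical value is clearly larger than 0.05. In that case, calibrating the test with the usual χ²₁ would reject too often.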

