Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library

2022 ◽  
pp. 146906672110667
Author(s):  
Miroslav Hruska ◽  
Dusan Holub

Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection, integrating match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete- and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only remaining such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved the detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. Knowledge of a correct sequence tag in advance of peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
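
As a minimal illustration of the unified probabilistic framework described above, the following Python sketch combines a peptide prior with a set of match-based likelihood functions via Bayes' theorem. The function names and the caller-supplied likelihoods are assumptions for illustration, not the models implemented in the study.

    import math

    def posterior_over_candidates(candidates, spectrum, prior, likelihood_models):
        """Posterior probability of each candidate peptide given one spectrum.

        candidates        -- candidate peptide sequences
        prior             -- dict mapping peptide -> prior probability
                             (e.g. derived from a reference proteome)
        likelihood_models -- functions f(peptide, spectrum) -> likelihood of the
                             observed evidence (fragments, retention time,
                             isotopic distribution, precursor mass)
        """
        log_scores = {}
        for pep in candidates:
            log_p = math.log(prior.get(pep, 1e-12))                    # peptide prior model
            for model in likelihood_models:
                log_p += math.log(max(model(pep, spectrum), 1e-300))   # match-based models
            log_scores[pep] = log_p
        # normalize in log space for numerical stability
        top = max(log_scores.values())
        total = sum(math.exp(v - top) for v in log_scores.values())
        return {pep: math.exp(v - top) / total for pep, v in log_scores.items()}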

2000 ◽  
Vol 2 (5) ◽  
pp. 461-466 ◽  
Author(s):  
Gérard Rossé ◽  
Erich Kueng ◽  
Malcolm G. P. Page ◽  
Vesna Schauer-Vukasinovic ◽  
Thomas Giller ◽  
...  

2012 ◽  
Vol 204-208 ◽  
pp. 3457-3461
Author(s):  
Tian Qi Li ◽  
Fei Geng

To study the probability of secondary fires occurring after an earthquake in urban areas, a probability model for analyzing the hazard of fire ignition and spread is established and applied. The model considers the damage level of buildings under earthquake excitation and the probability of leakage and diffusion of combustible materials at each damage level, combined with weather, season, housing density, and other factors, to determine the probability of an earthquake-induced secondary fire in a single building. On this basis, taking the city's natural administrative districts as the unit of analysis and incorporating regional factors such as population density and the distribution and density of property within each district, a hazard indicator is calculated to identify the areas of the city at high risk of secondary fire. A Geographic Information System was used as the platform to delineate these urban high-hazard areas for earthquake-induced secondary fires.
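
The calculation described above can be sketched as follows (Python). The damage states, conditional ignition probabilities, environmental scaling factor, and the weights in the district indicator are illustrative assumptions, not values from the paper.

    import math

    def building_fire_probability(damage_state_probs, ignition_given_damage, env_factor=1.0):
        """P(secondary fire in one building): sum over damage states of
        P(damage state | earthquake) * P(ignition | damage state), scaled by an
        environmental factor standing in for weather, season, and housing density."""
        p = sum(damage_state_probs[s] * ignition_given_damage[s] for s in damage_state_probs)
        return min(1.0, p * env_factor)

    def district_hazard_indicator(building_probs, population_density, property_density,
                                  w_pop=0.5, w_prop=0.5):
        """District-level hazard indicator: probability of at least one ignition in the
        district, weighted by regional exposure (the weights are assumptions)."""
        p_any = 1.0 - math.prod(1.0 - p for p in building_probs)
        exposure = w_pop * population_density + w_prop * property_density
        return p_any * exposure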


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6902 ◽  
Author(s):  
Simon Roux ◽  
Gareth Trubl ◽  
Danielle Goudeau ◽  
Nandita Nath ◽  
Estelle Couradeau ◽  
...  

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10- to 100-fold for low-input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low-input metagenomes.
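
As a rough illustration of the read-mapping check described above, the sketch below computes a per-contig depth profile and a simple unevenness measure (coefficient of variation of depth). It is not the authors' pipeline, and the alignment representation is an assumption.

    from statistics import mean, pstdev

    def depth_profile(contig_length, alignments):
        """Per-base depth of coverage on one contig, from read alignments given
        as (start, end) tuples (0-based, end exclusive)."""
        depth = [0] * contig_length
        for start, end in alignments:
            for i in range(max(0, start), min(contig_length, end)):
                depth[i] += 1
        return depth

    def coverage_unevenness(depth):
        """Coefficient of variation of per-base depth; higher values reflect the
        uneven, short-insert-enriched coverage reported for PCR-amplified libraries."""
        mu = mean(depth)
        return pstdev(depth) / mu if mu > 0 else float("inf")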


2021 ◽  
pp. 287-300
Author(s):  
Christian Dahlman ◽  
Eivind Kolflaath

This chapter addresses a classical challenge to the Bayesian approach. It examines different ways of setting the prior probability of the prosecutor’s hypothesis in a criminal trial, in particular, the classical Bayesian solution of setting the prior at 1/N, where N is the number of possible perpetrators in the geographical area where the crime was committed. The authors argue that this solution is at odds with the presumption of innocence, and that other proposals are also problematic, either theoretically or in practice. According to the authors, a presumed prior determined ex lege is less problematic than other solutions, and the problem of the prior can be avoided by a reconceptualization of the standard of proof.
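
For concreteness, the classical 1/N solution discussed in the chapter can be written out as a Bayesian update (the numbers in the example are illustrative, not taken from the chapter):

    P(G) = \frac{1}{N}, \qquad
    \mathrm{odds}(G) = \frac{1}{N-1}, \qquad
    \mathrm{odds}(G \mid E) = \mathrm{LR} \cdot \frac{1}{N-1}, \qquad
    P(G \mid E) = \frac{\mathrm{LR}}{\mathrm{LR} + N - 1}

For example, with N = 100,000 possible perpetrators and evidence carrying a likelihood ratio LR = 1,000,000, the posterior probability is 1,000,000 / 1,099,999 ≈ 0.91.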


Author(s):  
David Porubsky ◽  
◽  
Peter Ebert ◽  
Peter A. Audano ◽  
Mitchell R. Vollger ◽  
...  

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
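
The assembly metrics quoted above (contig N50, consensus quality value) can be computed as in the following sketch; this is a generic illustration of the definitions, not the evaluation code used in the study.

    import math

    def contig_n50(contig_lengths):
        """N50: the largest length L such that contigs of length >= L together
        cover at least half of the total assembly size."""
        total = sum(contig_lengths)
        running = 0
        for length in sorted(contig_lengths, reverse=True):
            running += length
            if 2 * running >= total:
                return length
        return 0

    def quality_value(num_errors, num_bases):
        """Phred-scaled consensus accuracy: QV = -10 * log10(error rate).
        QV > 40 corresponds to fewer than 1 error per 10,000 bases."""
        if num_errors == 0:
            return float("inf")
        return -10.0 * math.log10(num_errors / num_bases)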


2019 ◽  
Vol 9 (3) ◽  
pp. 627-655 ◽  
Author(s):  
Andee Kaplan ◽  
Daniel J Nordman ◽  
Stephen B Vardeman

A probability model exhibits instability if small changes in a data outcome result in large and often unanticipated changes in probability. This instability is a property of the probability model, specified by a distributional form and a particular configuration of parameters. For correlated data structures found in several application areas, there is increasing interest in identifying such sensitivity in model probability structure. We consider the problem of quantifying instability for general probability models defined on sequences of observations, where each sequence of length $N$ has a finite number of possible values that can be taken at each point. A sequence of probability models, indexed by $N$, together with an associated parameter sequence, is considered to accommodate data of expanding dimension. Model instability is formally shown to occur when a certain log probability ratio under such models grows faster than $N$. In this case, a one-component change in the data sequence can shift probability by orders of magnitude. Also, as instability becomes more extreme, the resulting probability models are shown to tend to degeneracy, placing all their probability on potentially small portions of the sample space. These results on instability apply to large classes of models commonly used in random graphs, network analysis and machine learning contexts.
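
One way to make the instability criterion sketched above concrete (our reading of the abstract; the exact definition in the paper may differ): for models $P_N$ on sequences $x \in S^N$, consider the log-ratio of extremal probabilities

    \Delta_N \;=\; \log \frac{\max_{x \in S^N} P_N(x)}{\min_{x \in S^N} P_N(x)}

Instability then corresponds to $\Delta_N / N \to \infty$ as $N \to \infty$; in that regime, changing a single component of the sequence can shift the model's probability by orders of magnitude, and in the extreme the model concentrates essentially all of its mass on a small portion of the sample space.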


1994 ◽  
Vol 26 (4) ◽  
pp. 831-854 ◽  
Author(s):  
Jeffrey D. Helterbrand ◽  
Noel Cressie ◽  
Jennifer L. Davidson

In this research, we present a statistical theory and an algorithm to identify one-pixel-wide closed object boundaries in gray-scale images. Closed-boundary identification is an important problem because boundaries of objects are major features in images. In spite of this, most statistical approaches to image restoration and texture identification place inappropriate stationary model assumptions on the image domain. One way to characterize the structural components present in images is to identify one-pixel-wide closed boundaries that delineate objects. By defining a prior probability model on the space of one-pixel-wide closed boundary configurations and appropriately specifying transition probability functions on this space, a Markov chain Monte Carlo algorithm is constructed that theoretically converges to a statistically optimal closed-boundary estimate. Moreover, this approach ensures that any approximation to the statistically optimal boundary estimate will have the necessary property of closure.
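
The construction described above can be illustrated with a generic Metropolis-style sampler over boundary configurations (Python). The prior, likelihood, and closure-preserving proposal are placeholders supplied by the caller; this is a sketch of the general MCMC pattern, not the authors' algorithm or their transition probabilities.

    import math
    import random

    def boundary_map_estimate(initial_boundary, log_prior, log_likelihood, propose,
                              n_iter=10000):
        """Metropolis sampler over one-pixel-wide closed-boundary configurations.

        log_prior      -- log prior probability of a boundary configuration
        log_likelihood -- log-likelihood of the observed gray-scale image given a boundary
        propose        -- returns a new configuration that preserves closure
                          (a symmetric proposal is assumed for simplicity)
        Returns the highest-scoring configuration visited (a MAP-style estimate)."""
        current = initial_boundary
        current_score = log_prior(current) + log_likelihood(current)
        best, best_score = current, current_score
        for _ in range(n_iter):
            candidate = propose(current)
            candidate_score = log_prior(candidate) + log_likelihood(candidate)
            # accept with probability min(1, exp(candidate_score - current_score))
            if random.random() < math.exp(min(0.0, candidate_score - current_score)):
                current, current_score = candidate, candidate_score
                if current_score > best_score:
                    best, best_score = current, current_score
        return best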


1980 ◽  
Vol 5 (2) ◽  
pp. 129-156 ◽  
Author(s):  
George B. Macready ◽  
C. Mitchell Dayton

A variety of latent class models, which are restricted forms of a more general class of probability models, has been presented during the last 10 years. Each of these models involves an a priori dependency structure among a set of dichotomously scored tasks that define latent class response patterns across the tasks. In turn, the probabilities related to these latent class patterns, along with a set of “omission” and “intrusion” error rates for each task, are the parameters used in defining models within this general class. One problem in using these models is that the defining parameters for a specific model may not be “identifiable.” To deal with this problem, researchers have considered curtailing the form of the model of interest by placing restrictions on the defining parameters. The purpose of this paper is to describe a two-stage conditional estimation procedure which results in reasonable estimates of specific models even though they may be nonidentifiable. This procedure involves the following stages: (a) establishment of initial parameter estimates and (b) step-wise maximum likelihood solutions for latent class probabilities and classification errors, with iteration of this process until stable parameter estimates across successive iterations are obtained.
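
The step-wise estimation loop in stage (b) can be sketched as follows (Python). This is a simplified, EM-style illustration under independent task errors; the response representation is assumed, and for brevity only the latent class probabilities are updated while the omission and intrusion rates are held at their stage-(a) initial estimates.

    def stepwise_latent_class(responses, latent_patterns, omission, intrusion,
                              n_iter=100, tol=1e-6):
        """responses       -- observed 0/1 response vectors across tasks
           latent_patterns -- permissible latent 0/1 patterns (the latent classes)
           omission        -- per-task P(respond 0 | latent 1), initial estimates
           intrusion       -- per-task P(respond 1 | latent 0), initial estimates
           Returns (class_probs, omission, intrusion) after iterating to stability."""
        n_classes = len(latent_patterns)
        class_probs = [1.0 / n_classes] * n_classes

        def cond_prob(observed, latent):
            # P(observed response vector | latent pattern), independent task errors
            p = 1.0
            for t, (o, l) in enumerate(zip(observed, latent)):
                if l == 1:
                    p *= omission[t] if o == 0 else 1.0 - omission[t]
                else:
                    p *= intrusion[t] if o == 1 else 1.0 - intrusion[t]
            return p

        for _ in range(n_iter):
            # posterior class membership for each observed response pattern
            posteriors = []
            for observed in responses:
                joint = [class_probs[c] * cond_prob(observed, latent_patterns[c])
                         for c in range(n_classes)]
                total = sum(joint) or 1e-300
                posteriors.append([j / total for j in joint])
            # update latent class probabilities (error-rate updates omitted for brevity)
            new_probs = [sum(post[c] for post in posteriors) / len(responses)
                         for c in range(n_classes)]
            if max(abs(a - b) for a, b in zip(new_probs, class_probs)) < tol:
                class_probs = new_probs
                break
            class_probs = new_probs
        return class_probs, omission, intrusion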

