Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation

2021 ◽  
Author(s):  
Anton Kalyuzhnyy ◽  
Patrick A. Eyers ◽  
Claire E. Eyers ◽  
Zhi Sun ◽  
Eric W. Deutsch ◽  
...  

Mass spectrometry-based phosphoproteomics allows large-scale generation of phosphorylation site data. However, analytical pipelines need to be carefully designed and optimised to minimise incorrect identification of phosphopeptide sequences or incorrect localisation of phosphorylation sites within those peptides. Public databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available MS data, but to our knowledge there is no database-level control for false discovery of sites, which likely leads to overestimation of the number of true phosphosites. It is therefore difficult for researchers to assess which phosphosites are “real” and which are likely to be artefacts of data processing. By profiling the human phosphoproteome, we aimed to estimate the false discovery rate (FDR) of phosphosites based on available evidence in PSP and/or PA and to predict a more realistic count of true phosphosites. We ranked sites into phosphorylation likelihood sets based on layers of accumulated evidence and then analysed them in terms of amino acid conservation across 100 species, sequence properties, and functional annotations of associated proteins. We demonstrated significant differences between the sets and developed a method for independent phosphosite FDR estimation. Remarkably, we estimated false discovery rates of 86.1%, 95.4% and 82.2% within the sets of described phosphoserine (pSer), phosphothreonine (pThr) and phosphotyrosine (pTyr) sites, respectively, for which only a single piece of identification evidence is available (the vast majority of sites in PSP). Overall, we estimate that ∼56,000 Ser, 10,000 Thr and 12,000 Tyr phosphosites in the human proteome have truly been identified to date, based on evidence in PSP and/or PA, which is lower than most published estimates. Furthermore, our analysis estimated ∼91,000 Ser, 49,000 Thr and 26,000 Tyr sites that are likely to represent false-positive phosphosite identifications.
We conclude that researchers should be aware of the significant potential for false-positive sites to be present in public databases and should evaluate the evidence behind the phosphosites used in their research.
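The arithmetic behind the abstract's headline numbers is simple: given a set of reported sites and an estimated FDR for that set, the expected number of true sites is the set size scaled by one minus the FDR. A minimal sketch (the set size below is a hypothetical example, not a figure from the paper; the 86.1% FDR is the reported single-evidence pSer estimate):

```python
def true_site_estimate(n_sites, fdr):
    """Expected number of true sites in a set of reported sites
    with an estimated false discovery rate `fdr`."""
    return n_sites * (1.0 - fdr)

# Hypothetical set of 10,000 single-evidence pSer sites at the reported 86.1% FDR:
print(round(true_site_estimate(10_000, 0.861)))  # → 1390
```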

2010 ◽  
Vol 17 (1) ◽  
pp. 58-62 ◽  
Author(s):  
James J. Chen ◽  
Paula K. Robeson ◽  
Michael J. Schell

Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 167 ◽  
Author(s):  
Qingyang Zhang

The nonparanormal graphical model has emerged as an important tool for modeling dependency structure between variables because it accommodates non-Gaussian data while maintaining the good interpretability and computational convenience of Gaussian graphical models. In this paper, we consider the problem of detecting differential substructure between two nonparanormal graphical models with false discovery rate control. We construct a new statistic based on a truncated estimator of the unknown transformation functions, together with a bias-corrected sample covariance. Furthermore, we show that the new test statistic converges to the same distribution as its oracle counterpart. Both synthetic data and real cancer genomic data are used to illustrate the promise of the new method. Our proposed testing framework is simple and scalable, facilitating its application to large-scale data. The computational pipeline has been implemented in the R package DNetFinder, which is freely available through the Comprehensive R Archive Network.
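The key property the nonparanormal model exploits is that rank-based correlations are invariant under monotone marginal transformations, so the latent Gaussian correlation can be recovered without knowing the transformation functions. A minimal sketch of the classical bridge from Spearman's rho to the latent correlation (this illustrates the general nonparanormal idea, not the paper's truncated estimator; function names are my own):

```python
import numpy as np

def spearman_rho(x, y):
    # Rank-based correlation: invariant to monotone transformations of the
    # marginals, which is what makes it usable under the nonparanormal model.
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx = (rx - rx.mean()) / rx.std()
    ry = (ry - ry.mean()) / ry.std()
    return float(np.mean(rx * ry))

def latent_corr(x, y):
    # Classical sine bridge: rho_latent = 2 * sin(pi/6 * rho_spearman).
    return 2.0 * np.sin(np.pi / 6.0 * spearman_rho(x, y))

rng = np.random.default_rng(0)
z = rng.standard_normal(500)
x = z + 0.1 * rng.standard_normal(500)
y = np.exp(z + 0.1 * rng.standard_normal(500))  # monotone-transformed copy
print(round(latent_corr(x, y), 2))  # close to 1 despite the exp transform
```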


2018 ◽  
Vol 113 (523) ◽  
pp. 1172-1183 ◽  
Author(s):  
Pallavi Basu ◽  
T. Tony Cai ◽  
Kiranmoy Das ◽  
Wenguang Sun

2018 ◽  
Vol 17 (7) ◽  
pp. 2328-2334 ◽  
Author(s):  
Xusheng Wang ◽  
Drew R. Jones ◽  
Timothy I. Shaw ◽  
Ji-Hoon Cho ◽  
Yuanyuan Wang ◽  
...  

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0255939
Author(s):  
Sibaji Gaj ◽  
Daniel Ontaneda ◽  
Kunio Nakamura

Gadolinium-enhancing lesions reflect active disease and are critical for in-patient monitoring in multiple sclerosis (MS). In this work, we have developed the first fully automated method to segment and count gadolinium-enhancing lesions from routine clinical MRI of MS patients. The proposed method first segments potential lesions using a 2D-UNet from multi-channel scans (T1 post-contrast, T1 pre-contrast, FLAIR, T2, and proton density) and then classifies the lesions using a random forest classifier. The algorithm was trained and validated on 600 MRIs with manual segmentation. We compared the effect of loss functions (Dice, cross entropy, and bootstrapping cross entropy) and the number of input contrasts, and compared the lesion counts with those made by radiologists using 2,846 images. Dice, lesion-wise sensitivity, and false discovery rate with all five contrasts were 0.698, 0.844, and 0.307, which improved to 0.767, 0.969, and 0.00 for large lesions (>100 voxels). The model using the bootstrapping loss function provided statistically significant increases of 7.1% in sensitivity and 2.3% in Dice compared with the model using cross entropy loss. T1 post/pre-contrast and FLAIR were the most important contrasts. For large lesions, the 2D-UNet model trained using T1 pre-contrast, FLAIR, T2, and PD had a lesion-wise sensitivity of 0.688 and a false discovery rate of 0.083, even without T1 post-contrast. For counting lesions in the 2,846 routine MRI images, the model combining the 2D-UNet and random forest, trained with bootstrapping cross entropy, achieved an accuracy of 87.7% using T1 pre-contrast, T1 post-contrast, and FLAIR when lesion counts were categorized as 0, 1, and 2 or more. The model performs well on routine non-standardized MRI datasets, allows large-scale analysis of clinical datasets, and may have clinical applications.
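Lesion-wise sensitivity and false discovery rate follow directly from detection counts: sensitivity is true detections over all real lesions, while FDR is spurious detections over all detections. A minimal sketch (the counts below are hypothetical, chosen only to land near the reported 0.844/0.307):

```python
def lesion_metrics(true_positives, false_positives, false_negatives):
    """Lesion-wise sensitivity and false discovery rate from detection counts."""
    sensitivity = true_positives / (true_positives + false_negatives)
    fdr = false_positives / (true_positives + false_positives)
    return sensitivity, fdr

# Hypothetical counts: 84 detected real lesions, 37 spurious detections, 16 missed.
sens, fdr = lesion_metrics(84, 37, 16)
print(round(sens, 3), round(fdr, 3))  # → 0.84 0.306
```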


Metabolites ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 53
Author(s):  
Shin June Kim ◽  
Youngjae Oh ◽  
Jaesik Jeong

Due to advances in technology, data are becoming increasingly complex and large-scale, and more advanced techniques are required to analyze them. For omics data from two different groups, it is of interest to find significant biomarkers between the groups while controlling an error rate such as the false discovery rate (FDR). Over the last few decades, many methods that control the local false discovery rate have been developed, ranging from one-dimensional to k-dimensional FDR procedures. For our comparison study, we selected three that have unique and significant properties, in chronological order: Efron's approach, Ploner's approach, and Kim's approach. The first is a one-dimensional approach, while the other two are two-dimensional. Furthermore, we consider two variants of Ploner's approach. We compare the performance of these methods on both simulated and real data.
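For orientation, the error rate these local-FDR methods refine is the one controlled by the classical Benjamini-Hochberg step-up procedure. A minimal sketch of BH (this is the standard global procedure, not any of the three local-FDR approaches compared in the paper):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Classical BH step-up procedure.
    Returns the indices of rejected hypotheses at FDR level `alpha`."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    # Find the largest rank r with p_(r) <= r/m * alpha.
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))  # → [0, 1]
```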


2019 ◽  
Author(s):  
Johannes Köster ◽  
Louis J. Dijkstra ◽  
Tobias Marschall ◽  
Alexander Schönhuth

As witnessed by various population-scale cancer genome sequencing projects, accurate discovery of somatic variants has become of central importance in modern cancer research. However, count statistics on somatic insertions and deletions (indels) discovered so far indicate that many discoveries must have been missed. The reason is that the combination of uncertainties relating to, for example, gap and alignment ambiguities, twilight-zone indels, cancer heterogeneity, sample purity, sampling, and strand bias is hard to quantify accurately. Here, a unifying statistical model is provided whose dependency structures make it possible to accurately quantify all inherent uncertainties in a short time. As a major consequence, the false discovery rate (FDR) in somatic indel discovery can now be controlled with high accuracy. As demonstrated on simulated and real data, this dramatically increases the number of true discoveries while safely suppressing the FDR. Specifically supported by workflow design, our approach can be integrated as a post-processing step in large-scale projects. The software is publicly available at https://varlociraptor.github.io and can be easily installed via Bioconda [Grüning et al., 2018].
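When a model yields a posterior error probability (PEP) for each candidate call, FDR can be controlled by selecting the largest set of calls whose mean PEP stays below the target level, since the expected FDR of a selection equals the mean PEP of the selected calls. A minimal sketch of this generic selection rule (an illustration of the principle, not the paper's specific model; the PEP values are hypothetical):

```python
def select_by_expected_fdr(peps, alpha=0.05):
    """Given posterior error probabilities for candidate calls, return the
    largest prefix (sorted by PEP) whose expected FDR stays at or below alpha."""
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    selected, running = [], 0.0
    for i in order:
        running += peps[i]
        # Expected FDR of the selection = mean PEP of the selected calls.
        if running / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected

peps = [0.001, 0.002, 0.01, 0.04, 0.2, 0.5]
print(select_by_expected_fdr(peps, alpha=0.05))  # → [0, 1, 2, 3]
```

Note that the rule can admit individual calls whose PEP exceeds alpha, as long as the running mean stays below it; this is what makes PEP-based selection more powerful than thresholding each call separately.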


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analyses for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results to 15 large-scale multi-lab replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes that do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.

