Enhanced longitudinal differential expression detection in proteomics with robust reproducibility optimization regression

2021 ◽  
Author(s):  
Tommi Valikangas ◽  
Tomi Suomi ◽  
Courtney E Chandler ◽  
Alison J Scott ◽  
Bao Q Tran ◽  
...  

Quantitative proteomics has matured into an established tool, and longitudinal proteomic experiments have begun to emerge. However, no effective, simple-to-use differential expression method for longitudinal proteomics data has been released. Typically, such data are noisy, contain missing values, and have only a few time points and biological replicates. To address this need, we provide a comprehensive evaluation of several existing differential expression methods for high-throughput longitudinal omics data and introduce a new method, Robust longitudinal Differential Expression (RolDE). The methods were evaluated using nearly 2000 semi-simulated spike-in proteomic datasets and a large experimental dataset. RolDE performed best overall: it was the most tolerant to missing values, displayed good reproducibility, and was the top method in ranking the results in a biologically meaningful way. Furthermore, unlike many approaches, the open-source RolDE does not require prior knowledge of the types of differences sought and can easily be applied even by inexperienced users.

2020 ◽  
Author(s):  
Janine Egert ◽  
Bettina Warscheid ◽  
Clemens Kreutz

Abstract. Motivation: Imputation is a prominent strategy when dealing with missing values (MVs) in proteomics data analysis pipelines. However, the performance of different imputation methods is difficult to assess and varies strongly depending on data characteristics. To overcome this issue, we present the concept of a data-driven selection of a suitable imputation algorithm (DIMA). Results: The performance and broad applicability of DIMA are demonstrated on 121 quantitative proteomics data sets from the PRIDE database and on simulated data consisting of 5–50% MVs with different proportions of missing not at random and missing completely at random values. DIMA reliably suggests a high-performing imputation algorithm which is always among the three best algorithms and results in a root mean square error difference (ΔRMSE) ≤ 10% in 84% of the cases. Availability and Implementation: Source code is freely available for download at github.com/clemenskreutz/OmicsData.
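The idea of choosing an imputation algorithm from the data itself can be illustrated with a minimal sketch. This is not the DIMA implementation: the function names, the three toy imputers, and the random masking scheme are all assumptions made for illustration, and the sketch hides values completely at random rather than reproducing the missingness pattern of the data. It hides a fraction of the observed values, lets each candidate fill them in, and ranks the candidates by RMSE on the hidden entries.

```python
import math
import random
import statistics

# Toy candidate imputers (hypothetical stand-ins for real algorithms).
# Missing values are represented as None.
def impute_mean(column):
    fill = statistics.mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def impute_median(column):
    fill = statistics.median(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def impute_min(column):
    fill = min(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def rmse(truth, estimate):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))

def select_imputer(column, imputers, mask_fraction=0.3, seed=0):
    """Hide some observed values, impute them, and score each candidate by RMSE."""
    observed = [i for i, v in enumerate(column) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    rng = random.Random(seed)
    # Always leave at least one observed value for the imputers to work with.
    n_hide = min(len(observed) - 1, max(1, int(len(observed) * mask_fraction)))
    hidden = rng.sample(observed, n_hide)
    masked = list(column)
    for i in hidden:
        masked[i] = None
    truth = [column[i] for i in hidden]
    scores = {}
    for name, impute in imputers.items():
        filled = impute(masked)
        scores[name] = rmse(truth, [filled[i] for i in hidden])
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    intensities = [10.0, 10.2, 9.9, None, 10.1, 30.0, 10.0, None, 9.8, 10.3]
    candidates = {"mean": impute_mean, "median": impute_median, "min": impute_min}
    best, scores = select_imputer(intensities, candidates)
    print(best, scores)
```

A more faithful version would mask values according to the estimated mix of missing-not-at-random and missing-completely-at-random entries, which is the distinction the abstract emphasizes.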


2016 ◽  
Vol 15 (4) ◽  
pp. 1116-1125 ◽  
Author(s):  
Cosmin Lazar ◽  
Laurent Gatto ◽  
Myriam Ferro ◽  
Christophe Bruley ◽  
Thomas Burger

2016 ◽  
Vol 14 (06) ◽  
pp. 1650030
Author(s):  
Nada Abidi ◽  
Raimo Franke ◽  
Peter Findeisen ◽  
Frank Klawonn

To better understand the dynamics of the underlying processes in cells, it is necessary to take measurements over a time course. Modern high-throughput technologies are often used for this purpose to measure the behavior of cell products like metabolites, peptides, proteins, [Formula: see text]RNA or mRNA at different points in time. Compared to classical time series, the number of time points is usually very limited and the measurements are taken at irregular time intervals. The main reasons for this are the costs of the experiments and the fact that the dynamic behavior usually shows a strong reaction and fast changes shortly after a stimulus and then slowly converges to a certain stable state. Another reason might simply be missing values. It is common to repeat the experiments and to have replicates in order to carry out a more reliable analysis. The ideal assumptions that the initial stimulus started at exactly the same time for all replicates and that the replicates are perfectly synchronized are seldom satisfied. Therefore, there is a need to first adjust or align the time-resolved data before further analysis is carried out. Dynamic time warping (DTW) is considered one of the common alignment techniques for time series data with equidistant time points. In this paper, we modified the DTW algorithm so that it can align sequences with measurements at different, non-equidistant time points with large gaps in between. This type of data is usually known as time-resolved data and is characterized by irregular time intervals between measurements as well as non-identical time points for different replicates. The new algorithm can easily be used to align time-resolved data from high-throughput experiments and to address existing problems such as time scarcity and noise in the measurements. We propose a modified DTW method that adapts to the requirements imposed by time-resolved data by using monotone cubic interpolation splines. Our approach provides a nonlinear alignment of two sequences that need neither equidistant time points nor measurements at identical time points. The proposed method is evaluated with artificial as well as real data. The software is available as an R package, tra (Time-Resolved data Alignment), which is freely available at: http://public.ostfalia.de/klawonn/tra.zip .
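To make the alignment problem concrete, here is a minimal sketch of classic DTW applied to two replicates measured at different, irregular time points. It is not the modified algorithm of the paper or the tra package: as a simplifying assumption it first resamples both replicates onto a shared grid with piecewise-linear interpolation (the paper uses monotone cubic interpolation splines), and all function names are hypothetical.

```python
from bisect import bisect_right

def interpolate(times, values, t):
    """Piecewise-linear interpolation at time t, clamped to the observed range.
    Assumes `times` is strictly increasing."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)
    return values[j - 1] + w * (values[j] - values[j - 1])

def resample(times, values, grid):
    """Evaluate a replicate on a common time grid."""
    return [interpolate(times, values, t) for t in grid]

def dtw(a, b):
    """Classic DTW with absolute-difference cost.
    Returns (total cost, warping path as a list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

if __name__ == "__main__":
    # Two replicates of the same response, sampled at different time points;
    # the second one reacts slightly later.
    rep1_t, rep1_v = [0, 1, 2, 4, 8, 24], [0.0, 2.0, 3.0, 2.5, 1.5, 1.0]
    rep2_t, rep2_v = [0, 2, 3, 6, 12, 24], [0.0, 1.8, 3.1, 2.4, 1.4, 1.0]
    grid = [0, 1, 2, 3, 4, 6, 8, 12, 24]
    total, path = dtw(resample(rep1_t, rep1_v, grid), resample(rep2_t, rep2_v, grid))
    print(total, path)
```

Resampling onto a grid is the simplest way to reuse standard DTW, but it smooths over the large gaps between late time points, which is one motivation for handling non-equidistant measurements inside the alignment itself.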


2021 ◽  
Author(s):  
Min Shi ◽  
Shamim Mollah

Abstract: High-throughput studies of biological systems are rapidly generating a wealth of 'omics'-scale data. Many of these studies are time series that collect proteomics and genomics data capturing dynamic observations. While time-series omics data are essential to unravel the mechanisms of various diseases, they often include missing (or incomplete) values, resulting in data shortage. Missing and insufficient data are especially problematic for downstream applications such as omics data integration and computational analyses that need complete and sufficient data representations. Data imputation and forecasting methods have been widely used to mitigate these issues. However, existing imputation and forecasting techniques typically address static omics data representing a single time point and perform forecasting on data with complete values. As a result, these techniques lack the ability to capture the time-ordered nature of data and cannot handle omics data containing missing values at multiple time points. Result: We propose a network-based method for time-series omics data imputation and forecasting (NeTOIF) that handles omics data containing missing values at multiple time points. NeTOIF takes advantage of topological relationships (e.g., protein-protein and gene-gene interactions) among omics data samples and incorporates a graph convolutional network to first infer the missing values at different time points. Then, we combine these inferred values with the original omics data to perform time-series imputation and forecasting using a long short-term memory network. Evaluating NeTOIF with a proteomic and a genomic dataset demonstrated a distinct advantage of NeTOIF over existing data imputation and forecasting methods. The average mean square error of NeTOIF improved 11.3% for imputation and 6.4% for forecasting compared to the baseline methods.


With the advancement of high-throughput technology, identifying differential expression has become an essential task in multiple domains of biomedical research, such as transcriptomics, proteomics, and metabolomics. A wide variety of computational methods and statistical approaches have been developed for detecting differential expression. Most of these methods model the expression levels of the entire set of features simultaneously. In this article, we provide a review with an emphasis on moderated-t methods published in the last two decades. We compare the similarities and differences between them and discuss their limitations in applications.
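For readers unfamiliar with the term, a moderated t-statistic replaces each feature's own variance estimate with one shrunk toward a prior variance shared across features. The sketch below shows the widely used empirical-Bayes form (the one implemented in limma), which may differ in detail from the individual methods the review covers. As a simplifying assumption, the prior variance and prior degrees of freedom are passed in as known numbers; in practice they are estimated from all features. Function and parameter names are hypothetical.

```python
import math

def moderated_t(effect, sample_var, df, prior_var, prior_df, unscaled_sd):
    """Moderated t-statistic for a single feature.

    effect      -- estimated coefficient (e.g. a log fold change)
    sample_var  -- the feature's own residual variance, with `df` degrees of freedom
    prior_var   -- prior variance shared across features (assumed given here)
    prior_df    -- prior degrees of freedom (assumed given here)
    unscaled_sd -- unscaled standard deviation of the coefficient from the design
    """
    # Shrink the feature-wise variance toward the prior variance.
    posterior_var = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    t = effect / (math.sqrt(posterior_var) * unscaled_sd)
    # The statistic is compared against a t-distribution with augmented df.
    return t, prior_df + df

if __name__ == "__main__":
    # A feature with an implausibly small variance from only 2 residual df:
    ordinary, _ = moderated_t(1.0, 0.01, 2, prior_var=1.0, prior_df=0, unscaled_sd=0.7)
    shrunk, total_df = moderated_t(1.0, 0.01, 2, prior_var=1.0, prior_df=4, unscaled_sd=0.7)
    print(ordinary, shrunk, total_df)
```

With `prior_df=0` the statistic reduces to the ordinary t-statistic, so the prior degrees of freedom control how strongly small-sample variance estimates are stabilized.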


2021 ◽  
Vol 22 (10) ◽  
pp. 5369
Author(s):  
Martina Pirro ◽  
Yassene Mohammed ◽  
Arnoud H. de Ru ◽  
George M. C. Janssen ◽  
Rayman T. N. Tjokrodirijo ◽  
...  

Developments in mass spectrometry (MS)-based analyses of glycoproteins have been important to study changes in glycosylation related to disease. Recently, the characteristic pattern of oxonium ions in glycopeptide fragmentation spectra has been used to assign different sets of glycopeptides. In particular, this was helpful to discriminate between O-GalNAc and O-GlcNAc. Here, we sought to investigate how such information can be used to examine quantitative proteomics data. For this purpose, we used tandem mass tag (TMT)-labeled samples from total cell lysates and secreted proteins from three different colorectal cancer cell lines. Following automated glycopeptide assignment (Byonic) and evaluation of the presence and relative intensity of oxonium ions, we observed that, in particular, the ratio of the ions at m/z 144.066 and 138.055, respectively, could be used to discriminate between O-GlcNAcylated and O-GalNAcylated peptides, with concomitant relative quantification between the different cell lines. Among the O-GalNAcylated proteins, we also observed anterior gradient protein 2 (AGR2), a protein whose glycosylation site and status were hitherto not well documented. Using a combination of multiple fragmentation methods, we then not only assigned the site of modification, but also showed different glycosylation between intracellular (ER-resident) and secreted AGR2. Overall, our study shows the potential for broad application of the relative intensities of oxonium ions for the confident assignment of glycopeptides, even in complex proteomics datasets.


2021 ◽  
Vol 22 (8) ◽  
pp. 4069
Author(s):  
Xiaoyang Chen ◽  
Zhangxin Pei ◽  
Pingping Li ◽  
Xiabing Li ◽  
Yuhang Duan ◽  
...  

Rice false smut is a fungal disease distributed worldwide and caused by Ustilaginoidea virens. In this study, we identified a putative ester cyclase (named as UvEC1) as being significantly upregulated during U. virens infection. UvEC1 contained a SnoaL-like polyketide cyclase domain, but the functions of ketone cyclases such as SnoaL in plant fungal pathogens remain unclear. Deletion of UvEC1 caused defects in vegetative growth and conidiation. UvEC1 was also required for response to hyperosmotic and oxidative stresses and for maintenance of cell wall integrity. Importantly, ΔUvEC1 mutants exhibited reduced virulence. We performed a tandem mass tag (TMT)-based quantitative proteomic analysis to identify differentially accumulating proteins (DAPs) between the ΔUvEC1-1 mutant and the wild-type isolate HWD-2. Proteomics data revealed that UvEC1 has a variety of effects on metabolism, protein localization, catalytic activity, binding, toxin biosynthesis and the spliceosome. Taken together, our findings suggest that UvEC1 is critical for the development and virulence of U. virens.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Kalpana Raja ◽  
Matthew Patrick ◽  
Yilin Gao ◽  
Desmond Madu ◽  
Yuyang Yang ◽  
...  

In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing these big data has been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of the high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information to interpret and provide biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss recent advances in approaches that integrate results from omics data with information generated by text mining to uncover novel biomedical information.

