Proper imputation of missing values in proteomics datasets for differential expression analysis

Author(s):  
Mingyi Liu ◽  
Ashok Dongre

Abstract Label-free shotgun proteomics is an important tool in biomedical research, where tandem mass spectrometry with data-dependent acquisition (DDA) is frequently used for protein identification and quantification. However, DDA datasets contain a significant number of missing values (MVs) that severely hinder proper analysis. Existing literature suggests that different imputation methods should be used for the two types of MVs: missing completely at random or missing not at random. However, the simulated or biased datasets utilized by most such studies offer few clues about the composition, and thus the proper imputation, of MVs in real-life proteomic datasets. Moreover, the impact of imputation methods on downstream differential expression analysis, a critical goal for many biomedical projects, is largely undetermined. In this study, we investigated public DDA datasets of various tissue/sample types to determine the composition of MVs in them. We then developed simulated datasets that imitate the MV profile of real-life datasets. Using such datasets, we compared the impact of various popular imputation methods on the analysis of differentially expressed proteins. Finally, we make recommendations on which imputation method(s) to use for proteomic data beyond just DDA datasets.
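
The distinction between the two MV types can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 5% random-loss rate, the 10% abundance cutoff, and the two imputation rules (row mean and global minimum) are all assumptions chosen for illustration.

```python
import numpy as np

def add_missing_values(log_intensity, mcar_rate, mnar_quantile, rng):
    """Return a copy with NaNs from two mechanisms (both rates are hypothetical knobs)."""
    data = log_intensity.astype(float).copy()
    # Missing not at random: values below an abundance cutoff are censored.
    cutoff = np.quantile(data, mnar_quantile)
    mnar_mask = data < cutoff
    # Missing completely at random: every value has the same chance of being lost.
    mcar_mask = rng.random(data.shape) < mcar_rate
    data[mnar_mask | mcar_mask] = np.nan
    return data, mnar_mask, mcar_mask

def impute_row_mean(data):
    """Fill NaNs with each protein's (row's) observed mean; a simple MCAR-style rule."""
    observed = ~np.isnan(data)
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.where(observed, data, 0.0).sum(axis=1, keepdims=True)
    global_mean = data[observed].mean()
    # Rows with no observed value fall back to the global mean.
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, data, row_means)

def impute_global_min(data):
    """Fill NaNs with the smallest observed value; a simple MNAR-style rule."""
    return np.where(np.isnan(data), np.nanmin(data), data)

# Simulated log2 intensities: 200 proteins x 6 samples (assumed shape and scale).
rng = np.random.default_rng(0)
truth = rng.normal(loc=20.0, scale=2.0, size=(200, 6))
observed, mnar_mask, mcar_mask = add_missing_values(truth, 0.05, 0.10, rng)

mean_filled = impute_row_mean(observed)
min_filled = impute_global_min(observed)

# Mean absolute error on the censored (MNAR) entries for each rule.
mnar_error_mean = np.abs(mean_filled - truth)[mnar_mask].mean()
mnar_error_min = np.abs(min_filled - truth)[mnar_mask].mean()
```

In this toy setup the censored values all lie below every observed value, so the minimum-based fill is never further from the truth than the row-mean fill on those entries. Real datasets mix both mechanisms, which is why the composition of MVs matters when choosing a method.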

2010 ◽  
Vol 6 (11) ◽  
pp. 2218 ◽  
Author(s):  
Silvia Rocchiccioli ◽  
Enrico Congiu ◽  
Claudia Boccardi ◽  
Lorenzo Citti ◽  
Luciano Callipo ◽  
...  

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Matúš Medo ◽  
Daniel M. Aebersold ◽  
Michaela Medová

Abstract Background: Data from discovery proteomic and phosphoproteomic experiments typically include missing values that correspond to proteins that have not been identified in the analyzed sample. Replacing the missing values with random numbers, a process known as “imputation”, avoids apparent infinite fold-change values. However, the procedure comes at a cost: imputing a large number of missing values has the potential to significantly impact the results of the subsequent differential expression analysis. Results: We propose a method that identifies differentially expressed proteins by ranking their observed changes with respect to the changes observed for other proteins. Missing values are taken into account by this method directly, without the need to impute them. We illustrate the performance of the new method on two distinct datasets and show that it is robust to missing values and, at the same time, provides results that are otherwise similar to those obtained with edgeR, which is a state-of-the-art differential expression analysis method. Conclusions: The new method for the differential expression analysis of proteomic data is available as an easy-to-use Python package.
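
The infinite fold-change problem mentioned in this abstract can be shown with a small Python sketch. It does not reproduce the proposed ranking method or the package's API. The intensities, the treatment of missing values as zero, and the uniform imputation range of 50 to 150 are assumptions made for illustration.

```python
import numpy as np

def log2_fold_change(treated, control):
    """log2 of the ratio of mean intensities; infinite when one mean is zero."""
    with np.errstate(divide="ignore"):
        return float(np.log2(np.mean(treated)) - np.log2(np.mean(control)))

rng = np.random.default_rng(1)
treated = np.array([1200.0, 1500.0, 1100.0])      # hypothetical intensities
control = np.array([np.nan, np.nan, np.nan])      # protein not identified in control

# Treating "not identified" as zero intensity yields an infinite fold change.
control_as_zero = np.nan_to_num(control, nan=0.0)
raw_fc = log2_fold_change(treated, control_as_zero)

# Imputing with small random numbers (assumed range) makes the value finite.
random_fill = rng.uniform(50.0, 150.0, size=control.shape)
control_imputed = np.where(np.isnan(control), random_fill, control)
imputed_fc = log2_fold_change(treated, control_imputed)
```

The finite value here depends entirely on the imputed draws rather than on any measurement, which is the cost of imputation the abstract describes.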


2021 ◽  
Author(s):  
Eloi Schmauch ◽  
Pia Laitinen ◽  
Tiia A Turunen ◽  
Mari-Anna Vaananen ◽  
Tarja Malm ◽  
...  

MicroRNAs (miRNAs) are small RNA molecules that act as regulators of gene expression through targeted mRNA degradation. They are involved in many biological and pathophysiological processes and are widely studied as potential biomarkers and therapeutic agents for human diseases, including cardiovascular disorders. Recently discovered isoforms of miRNAs (isomiRs) exist in high quantities and are very diverse. Despite having few differences from their corresponding reference miRNAs, they display specific functions and expression profiles across tissues and conditions. However, they are still overlooked and understudied, as we lack a comprehensive view of their condition-specific regulation and impact on differential expression analysis. Here, we show that isomiRs can have major effects on differential expression analysis results, as their expression is independent of their host miRNA genes or reference sequences. We present two miRNA-seq datasets from human umbilical vein endothelial cells, and assess isomiR expression in response to senescence and compartment-specificity (nuclear/cytosolic) under hypoxia. We compare three different methods for miRNA analysis, including isomiR-specific analysis, and show that ignoring isomiRs induces major biases in differential expression. Moreover, isomiR analysis permits higher resolution of complex signal dissection, such as the impact of hypoxia on compartment localization, and differential isomiR type enrichments between compartments. Finally, we show important distribution differences across conditions, independently of global miRNA expression signals. Our results raise concerns over the quasi-exclusive use of miRNA reference sequences in miRNA-seq processing and experimental assays. We hope that our work will guide future isomiR expression studies, which will correct some biases introduced by gold-standard analysis, improving the resolution of such assays and the biological significance of their downstream studies.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Matthew Chung ◽  
Vincent M. Bruno ◽  
David A. Rasko ◽  
Christina A. Cuomo ◽  
José F. Muñoz ◽  
...  

Abstract Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ammar Zaghlool ◽  
Adnan Niazi ◽  
Åsa K. Björklund ◽  
Jakub Orzechowski Westholm ◽  
Adam Ameur ◽  
...  

Abstract Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function and on the interpretation of gene expression signatures in cells. Here, we separated cytosolic and nuclear RNA from human fetal and adult brain samples and performed a comprehensive analysis of cytosolic and nuclear transcriptomes. There are significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. We show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Differential expression analysis between fetal and adult frontal cortex shows that results obtained from the cytosolic RNA differ from results using nuclear RNA both at the level of transcript types and the number of differentially expressed genes. Our data provide a resource for the subcellular localization of thousands of RNA transcripts in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for expression analysis.

