Proper imputation of missing values in proteomics datasets for differential expression analysis

Real Life ◽

Label Free ◽

Imputation Methods ◽

Abstract Label-free shotgun proteomics is an important tool in biomedical research, where tandem mass spectrometry with data-dependent acquisition (DDA) is frequently used for protein identification and quantification. However, DDA datasets contain a large number of missing values (MVs) that severely hinder proper analysis. Existing literature suggests that different imputation methods should be used for the two types of MVs: missing completely at random (MCAR) and missing not at random (MNAR). However, the simulated or biased datasets used in most such studies offer few clues about the composition of MVs in real-life proteomic datasets, and thus about how to impute them properly. Moreover, the impact of imputation methods on downstream differential expression analysis, a critical goal for many biomedical projects, is largely undetermined. In this study, we investigated public DDA datasets of various tissue/sample types to determine the composition of their MVs. We then developed simulated datasets that imitate the MV profile of real-life datasets. Using these datasets, we compared the impact of various popular imputation methods on the analysis of differentially expressed proteins. Finally, we make recommendations on which imputation method(s) to use for proteomic data beyond DDA datasets.
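The distinction between the two MV types can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the study's pipeline: it assumes a protein-by-sample matrix of log2 intensities, models MNAR as censoring of values below a global intensity quantile and MCAR as independent random dropout, and compares two generic fillers (per-protein mean, often used for MCAR, and a per-sample low quantile mimicking a detection limit, often used for MNAR). All function names are hypothetical.

```python
import math
import random
import statistics

def add_missing_values(matrix, mnar_fraction, mcar_fraction, seed=0):
    """Copy `matrix` (rows = proteins, columns = samples, log2 intensities)
    and replace some entries with None.
    MNAR: values below the `mnar_fraction` intensity quantile are censored.
    MCAR: each remaining value is dropped with probability `mcar_fraction`."""
    rng = random.Random(seed)
    flat = sorted(v for row in matrix for v in row)
    if mnar_fraction > 0:
        idx = min(int(mnar_fraction * len(flat)), len(flat) - 1)
        cutoff = flat[idx]
    else:
        cutoff = -math.inf
    out = []
    for row in matrix:
        new_row = []
        for v in row:
            if v < cutoff:
                new_row.append(None)   # MNAR: too low to be detected
            elif rng.random() < mcar_fraction:
                new_row.append(None)   # MCAR: lost regardless of intensity
            else:
                new_row.append(v)
        out.append(new_row)
    return out

def impute_row_mean(matrix):
    """MCAR-oriented filler: each None becomes the mean of the observed
    values of the same protein (row); rows with no observations stay None."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = statistics.fmean(observed) if observed else None
        out.append([fill if v is None else v for v in row])
    return out

def impute_min_det(matrix, q=0.01):
    """MNAR-oriented filler: each None becomes a low quantile `q` of the
    observed values in the same sample (column), like a detection limit."""
    fills = []
    for j in range(len(matrix[0])):
        col = sorted(row[j] for row in matrix if row[j] is not None)
        fills.append(col[int(q * (len(col) - 1))] if col else None)
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]

data = [
    [20.0, 21.0, 19.5],
    [15.0, 14.5, 15.5],
    [25.0, 24.0, 26.0],
    [12.0, 16.0, 12.5],
]
masked = add_missing_values(data, mnar_fraction=0.25, mcar_fraction=0.0)
print(masked[3])                   # [None, 16.0, None]
print(impute_row_mean(masked)[3])  # [16.0, 16.0, 16.0]
print(impute_min_det(masked)[3])   # [15.0, 16.0, 15.5]
```

In this toy case, row-mean filling writes 16.0 into cells that went missing precisely because they were low (true values 12.0 and 12.5), an upward bias; the detection-limit filler stays closer to the low end of each sample. Both fillers are generic illustrations, not the specific methods benchmarked in the study.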

A proteomic study of microgravity cardiac effects: feature maps of label-free LC-MALDI data for differential expression analysis

Molecular BioSystems ◽

10.1039/c0mb00065e ◽

2010 ◽

Vol 6 (11) ◽

pp. 2218 ◽

Cited By ~ 3

Author(s):

Silvia Rocchiccioli ◽

Enrico Congiu ◽

Claudia Boccardi ◽

Lorenzo Citti ◽

Luciano Callipo ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Label Free ◽

Feature Maps ◽

Cardiac Effects ◽

Proteomic Study

ProtRank: bypassing the imputation of missing values in differential expression analysis of proteomic data

BMC Bioinformatics ◽

10.1186/s12859-019-3144-3 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Matúš Medo ◽

Daniel M. Aebersold ◽

Michaela Medová

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Missing Values ◽

New Method ◽

Random Numbers ◽

Differentially Expressed Proteins ◽

Analysis Method ◽

Proteomic Data ◽

Python Package

Abstract Background: Data from discovery proteomic and phosphoproteomic experiments typically include missing values that correspond to proteins that have not been identified in the analyzed sample. Replacing the missing values with random numbers, a process known as “imputation”, avoids apparent infinite fold-change values. However, the procedure comes at a cost: imputing a large number of missing values has the potential to significantly impact the results of the subsequent differential expression analysis. Results: We propose a method that identifies differentially expressed proteins by ranking their observed changes with respect to the changes observed for other proteins. The method takes missing values into account directly, without the need to impute them. We illustrate the performance of the new method on two distinct datasets and show that it is robust to missing values while providing results otherwise similar to those obtained with edgeR, a state-of-the-art differential expression analysis method. Conclusions: The new method for the differential expression analysis of proteomic data is available as an easy-to-use Python package.
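The core idea of ranking changes without imputing can be sketched as follows. This is a simplified illustration written for this listing, not the ProtRank package's algorithm or API: it assumes positive intensities stored in per-sample dicts, treats a protein detected only after treatment as the largest increase (and only before treatment as the largest decrease), skips proteins missing in both samples, and averages normalized ranks over all control/treated pairs. A real tool would also need a significance or false-discovery step, omitted here. Function names are hypothetical.

```python
import math

def pairwise_ranks(control, treated):
    """Normalized ranks (0 = strongest decrease, 1 = strongest increase) of
    log2 fold changes between one control and one treated sample.
    Inputs map protein -> positive intensity; absent or None = missing."""
    scores = {}
    for prot in set(control) | set(treated):
        c, t = control.get(prot), treated.get(prot)
        if c is None and t is None:
            continue                  # no information in this pair
        if c is None:
            scores[prot] = math.inf   # appears only after treatment
        elif t is None:
            scores[prot] = -math.inf  # disappears after treatment
        else:
            scores[prot] = math.log2(t / c)
    ordered = sorted(scores.items(), key=lambda kv: kv[1])
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        for k in range(i, j + 1):     # tied scores share their average rank
            ranks[ordered[k][0]] = (i + j) / 2
        i = j + 1
    n = len(ordered)
    return {p: (r / (n - 1) if n > 1 else 0.5) for p, r in ranks.items()}

def rank_score(controls, treateds):
    """Average each protein's normalized rank over all control/treated pairs."""
    totals, counts = {}, {}
    for c in controls:
        for t in treateds:
            for prot, r in pairwise_ranks(c, t).items():
                totals[prot] = totals.get(prot, 0.0) + r
                counts[prot] = counts.get(prot, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}

controls = [{"A": 100.0, "B": 50.0, "C": 80.0, "D": 30.0}]
treateds = [{"A": 400.0, "B": 50.0, "C": 40.0}]  # D not detected after treatment
print(rank_score(controls, treateds))
```

Here protein A (fourfold up) scores 1.0 and the undetected protein D scores 0.0 without any value being invented for it, which is the property the abstract emphasizes: no infinite fold change enters a statistic, and no random number stands in for the missing measurement.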

The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis

BMC Bioinformatics ◽

10.1186/1471-2105-14-124 ◽

2013 ◽

Vol 14 (1) ◽

pp. 124 ◽

Cited By ~ 41

Author(s):

Xing Qiu ◽

Hulin Wu ◽

Rui Hu

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Gene Differential Expression ◽

Testing Power ◽
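For readers unfamiliar with the procedure named in this title, the sketch below shows textbook quantile normalization: every sample is forced onto a common reference distribution built from the mean of the k-th smallest values across samples. It assumes equal-length samples with no missing values and breaks ties by position; the specific variants and the rank normalization studied in the paper are not described in this listing and are not reproduced here. The function name is hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one per sample).
    Each value is replaced by the cross-sample mean of the values sharing its
    rank, so all samples end up with identical distributions.
    Ties are broken by position (a simplification)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rank
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result

print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Because only within-sample ranks survive, the procedure removes between-sample differences in distribution shape, which is why its effect on the power of downstream tests is worth studying.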

isomiRs-specific differential expression is the rule, not the exception: Are we missing hundreds of species in microRNA analysis?

10.1101/2021.12.15.472814 ◽

2021 ◽

Author(s):

Eloi Schmauch ◽

Pia Laitinen ◽

Tiia A Turunen ◽

Mari-Anna Vaananen ◽

Tarja Malm ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Umbilical Vein ◽

Expression Profiles ◽

Biological Significance ◽

Complex Signal ◽

Reference Sequences ◽

Specific Regulation ◽

MicroRNAs (miRNAs) are small RNA molecules that act as regulators of gene expression through targeted mRNA degradation. They are involved in many biological and pathophysiological processes and are widely studied as potential biomarkers and therapeutic agents for human diseases, including cardiovascular disorders. Recently discovered isoforms of miRNAs (isomiRs) exist in high quantities and are very diverse. Despite differing only slightly from their corresponding reference miRNAs, they display specific functions and expression profiles across tissues and conditions. However, they are still overlooked and understudied, as we lack a comprehensive view of their condition-specific regulation and their impact on differential expression analysis. Here, we show that isomiRs can have major effects on differential expression analysis results, as their expression is independent of their host miRNA genes or reference sequences. We present two miRNA-seq datasets from human umbilical vein endothelial cells and assess isomiR expression in response to senescence and compartment specificity (nuclear/cytosolic) under hypoxia. We compare three different methods for miRNA analysis, including isomiR-specific analysis, and show that ignoring isomiRs induces major biases in differential expression. Moreover, isomiR analysis permits higher-resolution dissection of complex signals, such as the impact of hypoxia on compartment localization and differential isomiR type enrichment between compartments. Finally, we show important differences in distribution across conditions, independent of global miRNA expression signals. Our results raise concerns over the quasi-exclusive use of miRNA reference sequences in miRNA-seq processing and experimental assays. We hope that our work will guide future isomiR expression studies, helping to correct some biases introduced by gold-standard analysis and improving the resolution of such assays and the biological significance of their downstream studies.
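Why collapsing isomiRs onto their reference sequence can hide differential expression is easiest to see with a toy count table. The numbers and isomiR labels below are hypothetical, equal sequencing depth is assumed for both conditions (so no library-size normalization is applied), and the pseudocount is an illustrative choice; this is not one of the three analysis methods compared in the preprint.

```python
import math

# Hypothetical read counts for one reference miRNA and two of its isomiRs
# in two conditions (equal sequencing depth assumed). Labels are made up.
COUNTS = {
    # isomiR label: (reference miRNA, condition A, condition B)
    "miR-X|canonical": ("miR-X", 1000, 1000),
    "miR-X|3p_trim1": ("miR-X", 800, 200),
    "miR-X|5p_shift1": ("miR-X", 200, 800),
}

def log2_fc(a, b, pseudocount=1.0):
    """log2((B + pc) / (A + pc)); the pseudocount avoids division by zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))

def collapse_to_reference(counts):
    """Reference-based view: sum all isomiR counts per reference miRNA."""
    totals = {}
    for ref, a, b in counts.values():
        ta, tb = totals.get(ref, (0, 0))
        totals[ref] = (ta + a, tb + b)
    return totals

for ref, (a, b) in collapse_to_reference(COUNTS).items():
    print(ref, round(log2_fc(a, b), 2))   # miR-X 0.0
for label, (_, a, b) in COUNTS.items():
    print(label, round(log2_fc(a, b), 2))
```

The collapsed reference shows no change at all, while the two variant isomiRs each change roughly fourfold in opposite directions; an isomiR-level count table keeps that signal, a reference-level one sums it away.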

Faculty Opinions recommendation of Differential expression analysis for sequence count data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.6932959.7123057 ◽

2010 ◽

Author(s):

Sarah Teichmann ◽

Daniel Hebenstreit

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽

10.1186/s13059-021-02337-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew Chung ◽

Vincent M. Bruno ◽

David A. Rasko ◽

Christina A. Cuomo ◽

José F. Muñoz ◽

...

Keyword(s):

Best Practices ◽

Differential Expression ◽

Expression Analysis ◽

Single Species ◽

Rna Seq ◽

Species Analysis ◽

Differential Gene ◽

Multiple Species ◽

Downstream Analysis

Abstract Advances in transcriptome sequencing allow simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, an approach termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.
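The point about relative organism abundance and unequal read totals can be illustrated with counts-per-million (CPM) computed separately for each organism. This is a minimal sketch of one common approach, assumed here for illustration and not necessarily the exact procedure the paper recommends: each gene is assigned to a species, and the library size used for normalization is the number of reads assigned to that species in that sample. Function and gene names are hypothetical.

```python
def per_species_cpm(counts, species_of):
    """Counts-per-million computed within each organism.

    counts:     {sample: {gene: read_count}}
    species_of: {gene: species_name}
    The denominator is the total reads assigned to the gene's species in that
    sample, so a shift in the host/pathogen read ratio between samples does
    not by itself change the normalized values."""
    result = {}
    for sample, gene_counts in counts.items():
        # Library size per organism within this sample.
        lib_size = {}
        for gene, c in gene_counts.items():
            sp = species_of[gene]
            lib_size[sp] = lib_size.get(sp, 0) + c
        result[sample] = {}
        for gene, c in gene_counts.items():
            total = lib_size[species_of[gene]]
            result[sample][gene] = 1e6 * c / total if total else 0.0
    return result

species_of = {"h1": "host", "h2": "host", "p1": "pathogen", "p2": "pathogen"}
counts = {
    "S1": {"h1": 900_000, "h2": 90_000, "p1": 5_000, "p2": 5_000},
    "S2": {"h1": 500_000, "h2": 50_000, "p1": 225_000, "p2": 225_000},
}
norm = per_species_cpm(counts, species_of)
print(norm["S1"]["p1"], norm["S2"]["p1"])  # 500000.0 500000.0
```

Both toy samples have one million reads, so CPM over the combined library would report a 45-fold difference for `p1` that reflects only how much pathogen RNA was captured; normalizing within the pathogen's own reads removes that artifact.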

Characterization of the nuclear and cytosolic transcriptomes in human brain tissue reveals new insights into the subcellular distribution of RNA transcripts

Scientific Reports ◽

10.1038/s41598-021-83541-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ammar Zaghlool ◽

Adnan Niazi ◽

Åsa K. Björklund ◽

Jakub Orzechowski Westholm ◽

Adam Ameur ◽

...

Keyword(s):

Human Brain ◽

Expression Analysis ◽

Adult Brain ◽

Sequencing Data ◽

Human Brain Tissue ◽

Protein Coding ◽

Rna Transcripts ◽

Nuclear Rna ◽

Abstract Transcriptome analysis has mainly relied on RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization on our understanding of gene function and on the interpretation of gene expression signatures in cells. Here, we separated cytosolic and nuclear RNA from human fetal and adult brain samples and performed a comprehensive analysis of cytosolic and nuclear transcriptomes. There are significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. We show that transcripts of nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared with the rest of protein-coding genes. Differential expression analysis between fetal and adult frontal cortex shows that results obtained from cytosolic RNA differ from those obtained from nuclear RNA, both in transcript types and in the number of differentially expressed genes. Our data provide a resource for the subcellular localization of thousands of RNA transcripts in the human brain and highlight differences between using the cytosolic and the nuclear transcriptomes for expression analysis.
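A simple way to express compartment enrichment is a per-gene log2 ratio of cytosolic to nuclear abundance after library-size normalization, which can then be summarized for a gene set such as nuclear-encoded mitochondrial genes. The sketch below is an illustration under assumptions: one sample per compartment, hypothetical gene names and counts, CPM normalization with a pseudocount of 1, and a plain mean per gene group. The study's actual statistical test is not described here, and real analyses would use replicates and a dedicated differential expression tool.

```python
import math
import statistics

def cpm(counts):
    """Counts-per-million within one library."""
    total = sum(counts.values())
    return {g: 1e6 * c / total for g, c in counts.items()}

def cyto_nuclear_log2_ratio(cyto_counts, nuc_counts, pseudocount=1.0):
    """Per-gene log2(cytosolic CPM / nuclear CPM); positive = cytosol-enriched."""
    cyto, nuc = cpm(cyto_counts), cpm(nuc_counts)
    return {g: math.log2((cyto[g] + pseudocount) / (nuc[g] + pseudocount))
            for g in cyto.keys() & nuc.keys()}

def mean_ratio(ratios, genes):
    """Mean log2 ratio over the genes of a set that are present in `ratios`."""
    return statistics.fmean(ratios[g] for g in genes if g in ratios)

# Hypothetical counts: "mt1"/"mt2" stand in for nuclear-encoded mitochondrial genes.
cyto = {"mt1": 400, "mt2": 300, "g1": 150, "g2": 150}
nuc = {"mt1": 100, "mt2": 100, "g1": 400, "g2": 400}
ratios = cyto_nuclear_log2_ratio(cyto, nuc)
print(round(mean_ratio(ratios, ["mt1", "mt2"]), 2),
      round(mean_ratio(ratios, ["g1", "g2"]), 2))
```

A positive group mean for the mitochondrial set and a negative one for the remaining genes is the pattern that would correspond, in this toy setting, to cytosolic enrichment of those transcripts.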