High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Weitong Cui ◽  
Huaru Xue ◽  
Lei Wei ◽  
Jinghua Jin ◽  
Xuewen Tian ◽  
...  

Abstract Background RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated.
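One way to quantify the reproducibility discussed in this abstract is to compare the DEG lists obtained from repeated subsamples of the same cohort. The sketch below is only an illustration: the abstract does not state which overlap metric the authors used, so the Jaccard index, the function names, and the gene lists are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(deg_lists):
    """Average Jaccard index over all pairs of DEG lists."""
    pairs = list(combinations(deg_lists, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical DEG lists from three subsamples of the same cohort.
runs = [
    {"A", "B", "C", "D"},
    {"B", "C", "D", "E"},
    {"C", "D", "F"},
]
reproducibility = mean_pairwise_overlap(runs)
```

A value near 1 would indicate that the same genes are called in every subsample; a low value would point to sample-specific DEGs of the kind the abstract describes.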

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Sushant Patkar ◽  
Kerstin Heselmeyer-Haddad ◽  
Noam Auslander ◽  
Daniela Hirsch ◽  
Jordi Camps ◽  
...  

Abstract Background Many carcinomas have recurrent chromosomal aneuploidies specific to the tissue of tumor origin. The reason for this specificity is not completely understood. Methods In this study, we looked at the frequency of chromosomal arm gains and losses in different cancer types from The Cancer Genome Atlas (TCGA) and compared them to the mean gene expression of each chromosome arm in corresponding normal tissues of origin from the Genotype-Tissue Expression (GTEx) database, in addition to the distribution of tissue-specific oncogenes and tumor suppressors on different chromosome arms. Results This analysis revealed a complex picture of factors driving tumor karyotype evolution in which some recurrent chromosomal copy number changes reflect the chromosome arm-wide gene expression levels of their normal tissue of tumor origin. Conclusions We conclude that the cancer type-specific distribution of chromosomal arm gains and losses is potentially “hardwiring” gene expression levels characteristic of the normal tissue of tumor origin, in addition to broadly modulating the expression of tissue-specific tumor driver genes.


2020 ◽  
Author(s):  
Hansapani Rodrigo ◽  
Bryan Martinez ◽  
Roberto De La Garza ◽  
Upal Roy

Abstract Background: HIV Associated Neurological Disorders (HAND) is relatively common among people with HIV-1 infection, even those taking combined antiretroviral treatment (cART). Genome-wide screening of transcription regulation in brain tissue helps to identify substantial abnormalities in patients’ gene transcripts and to discover possible biomarkers for HAND. This study explores the possibility of identifying differentially expressed (DE) genes that can serve as potential biomarkers to detect HAND. We investigated the gene expression levels of three subject groups with different impairment levels of HAND, along with a control group, in three distinct brain sectors: white matter, frontal cortex, and basal ganglia. Methods: Linear models with weighted least squares, along with Benjamini-Hochberg multiple-testing correction, were used to identify DE genes in each brain region. Genes with an adjusted p-value of less than 0.01 were identified as differentially expressed. Principal component analyses (PCA) were performed to detect any groupings among the subject groups. Significance Analysis of Microarrays (SAM) and random forests (RF) methods with two distinct approaches were used to identify DE genes. Results: A total of 710 genes in basal ganglia, 794 genes in the frontal cortex, and 1481 genes in white matter were screened. The highest proportion of DE genes was observed within two brain regions, the frontal neocortex and basal ganglia. PCA did not reveal clear groupings among the four subject groups. SAM and RF models identify the genes CIRBP, RBM3, GPNMB, ISG15, IFIT6, IFI6, and IFIT3 as differentially expressed in the frontal cortex or basal ganglia among the subject groups. The gene GADD45A, a protein-coding gene whose transcript levels tend to increase with stressful growth arrest conditions, was consistently ranked among the top genes by both RF models within the frontal cortex.
Conclusions: Our study contributes to a comprehensive understanding of gene expression levels in subjects with different severity levels of HAND. Several genes that appear to play critical roles in the inflammatory response were found, and they have strong potential as biomarkers for detecting HAND, pending further investigation.
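The Benjamini-Hochberg step named in the Methods above can be sketched in a few lines. This is a minimal illustration of the standard procedure, not the authors' pipeline: the function name and the p-values are hypothetical, and only the 0.01 cutoff comes from the abstract.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone: each one is capped by the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.0001, 0.003, 0.019, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)

# Genes called DE at the 0.01 adjusted p-value cutoff used in the abstract.
de_gene_indices = [i for i, q in enumerate(adjusted) if q < 0.01]
```

With these made-up inputs, only the first two genes pass the cutoff, because the third gene's adjusted value rises to about 0.032 once the number of tests is taken into account.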


2021 ◽  
Author(s):  
Jian-Rong Li ◽  
Mabel Tang ◽  
Yafang Li ◽  
Christopher I Amos ◽  
Chao Cheng

Abstract Background: Expression quantitative trait loci (eQTLs) analyses have been widely used to identify genetic variants associated with gene expression levels to understand what molecular mechanisms underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Results: Here, we present a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes based on RNA-seq data and performs association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-Seq data, we identified a total of 142,801 stQTLs for 3,942 genes and 186,132 eQTLs for 4,751 genes from 15,122,700 genetic variants for 13,476 genes, respectively. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3’UTR regions, while eQTLs were enriched in the CDS, 3’UTR, 5’UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight into the association between genetic variants and gene expression levels.


2010 ◽  
Vol 08 (supp01) ◽  
pp. 177-192 ◽  
Author(s):  
XI WANG ◽  
ZHENGPENG WU ◽  
XUEGONG ZHANG

Owing to its unprecedented resolution and level of detail, RNA-seq technology based on next-generation high-throughput sequencing significantly boosts the ability to study transcriptomes. The estimation of genes' transcript abundance levels or gene expression levels has always been an important question in research on transcriptional regulation and gene functions. On the basis of the concept of Reads Per Kilo-base per Million reads (RPKM), taking the union-intersection genes (UI-based) and summing up inferred isoform abundance (isoform-based) are the two current strategies to estimate gene expression levels, but they produce different estimates. In this paper, we made the first attempt to compare the two strategies' performances through a series of simulation studies. Our results showed that the isoform-based method not only gives more accurate estimates but also has less uncertainty than the UI-based strategy. If the non-uniformity of read distribution is taken into account, the isoform-based method can further reduce estimation errors. We applied both strategies to real RNA-seq datasets of technical replicates, and found that the isoform-based strategy also displays a better performance. For a more accurate estimation of gene expression levels from RNA-seq data, even if the abundance levels of isoforms are not of interest, it is still better to first infer the isoform abundances and sum them up to get the expression level of a gene as a whole.
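The RPKM definition and the isoform-based strategy described above can be illustrated with a short sketch. It assumes that reads have already been assigned to isoforms by some inference method, which is the hard part and is not shown; the function names and all numbers are hypothetical.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def gene_rpkm_isoform_based(isoforms, total_mapped_reads):
    """Isoform-based gene expression: sum of per-isoform RPKM values.

    `isoforms` is a list of (assigned_reads, length_bp) pairs, where the
    read assignment is assumed to come from a separate inference step.
    """
    return sum(rpkm(reads, length, total_mapped_reads)
               for reads, length in isoforms)


# Hypothetical gene with two isoforms in a library of 10 million reads.
isoforms = [(400, 2000), (100, 1000)]
gene_level = gene_rpkm_isoform_based(isoforms, 10_000_000)
```

Here the two isoforms contribute 20 and 10 RPKM, giving a gene-level estimate of 30. A UI-based estimate would instead count reads over the gene's union-intersection region and apply the same RPKM formula once.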


Genes ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 19 ◽  
Author(s):  
Chao Zhang ◽  
Xiang-Dong Liu

Wing dimorphism is considered an adaptive trait of insects. Brown planthoppers (BPHs) Nilaparvata lugens, a serious pest of rice, are either macropterous or brachypterous. Genetic and environmental factors are both likely to control wing morph determination in BPHs, but the hereditary law and gene network are still unknown. Here, we investigated changes in gene expression levels between macropterous and brachypterous BPHs by creating artificially bred morphotype lines. Nearly pure-bred strains of macropterous and brachypterous BPHs were established, and their transcriptomes and gene expression levels were compared. Over ten thousand differentially expressed genes (DEGs) between macropterous and brachypterous strains were found in the egg, nymph, and adult stages, and the three stages shared 6523 DEGs. The regulation of actin cytoskeleton, focal adhesion, tight junction, and adherens junction pathways were consistently enriched with DEGs across the three stages, whereas insulin signaling pathway, metabolic pathways, vascular smooth muscle contraction, platelet activation, oxytocin signaling pathway, sugar metabolism, and glycolysis/gluconeogenesis were significantly enriched with DEGs in a specific stage. Gene expression trend profiles across the three stages differed between the two strains. Eggs, nymphs, and adults from the macropterous strain were distinguishable from those of the brachypterous strain based on gene expression levels, and genes that were related to wing morphs were differentially expressed between wing strains or strain × stage. A proposed mode based on genes and environments to modulate the wing dimorphism of BPHs was provided.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jin Wang ◽  
Qinxue Zhang ◽  
Xiong You ◽  
Xilin Hou

Background: Non-heading Chinese cabbage (Brassica rapa ssp. chinensis) is an important leaf vegetable grown worldwide. However, there has so far been insufficient combined transcriptome and small RNA sequencing analysis of cold tolerance, which hinders further functional genomics research. Results: In this study, 63.43 Gb of clean data was obtained from the transcriptome analysis. The clean data of each sample reached 6.99 Gb, and the Q30 base percentage was 93.68% or above. The clean reads of each sample were aligned with the designated reference genome (Brassica rapa, IVFCAASv1), and the alignment efficiency varied from 81.54 to 87.24%. According to the comparison results, 1,860 new genes were discovered in Pak-choi, of which 1,613 were functionally annotated. Among them, 13 common differentially expressed genes were detected in all materials, including seven upregulated and six downregulated. We also used quantitative real-time PCR to confirm the changes in the expression levels of these genes. In addition, we sequenced miRNA of the same material. Our findings revealed a total of 34,182,333 small RNA reads, 88,604,604 kinds of small RNAs, among which the most common size was 24 nt. Across all materials, there were eight common differential miRNAs. According to the correspondence between miRNAs and their target genes, we carried out Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analysis on the set of target genes of each group of differentially expressed miRNAs. This analysis showed that the distributions of candidate target genes differ between materials. We not only used transcriptome sequencing and small RNA sequencing but also used experiments to verify the expression levels of the differentially expressed genes obtained by sequencing.
Sequencing combined with experiments proved the mechanism of some differential gene expression levels after low-temperature treatment. Conclusion: In all, this study provides a resource for genetic and genomic research under abiotic stress in Pak-choi.


2014 ◽  
Author(s):  
Jenny Tung ◽  
Xiang Zhou ◽  
Susan C Alberts ◽  
Matthew Stephens ◽  
Yoav Gilad

Gene expression variation is well documented in human populations and its genetic architecture has been extensively explored. However, we still know little about the genetic architecture of gene expression variation in other species, particularly our closest living relatives, the nonhuman primates. To address this gap, we performed an RNA sequencing (RNA-seq)-based study of 63 wild baboons, members of the intensively studied Amboseli baboon population in Kenya. Our study design allowed us to measure gene expression levels and identify genetic variants using the same data set, enabling us to perform complementary mapping of putative cis-acting expression quantitative trait loci (eQTL) and measurements of allele-specific expression (ASE) levels. We discovered substantial evidence for genetic effects on gene expression levels in this population. Surprisingly, we found more power to detect individual eQTL in the baboons relative to a HapMap human data set of comparable size, probably as a result of greater genetic variation, enrichment of SNPs with high minor allele frequencies, and longer-range linkage disequilibrium in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes. Interestingly, genes with eQTL significantly overlapped between the baboon and human data sets, suggesting that some genes may tolerate more genetic perturbation than others, and that this property may be conserved across species. Finally, we used a Bayesian sparse linear mixed model to partition genetic, demographic, and early environmental contributions to variation in gene expression levels. We found a strong genetic contribution to gene expression levels for almost all genes, while individual demographic and environmental effects tended to be more modest. 
Together, our results establish the feasibility of eQTL mapping using RNA-seq data alone, and act as an important first step towards understanding the genetic architecture of gene expression variation in nonhuman primates.


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Jenny Tung ◽  
Xiang Zhou ◽  
Susan C Alberts ◽  
Matthew Stephens ◽  
Yoav Gilad

Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates.

