Exploring gene expression levels in Pancreatic Ductal Adenocarcinoma (PDAC) using RNA-Seq data

Author(s):  
Alokita Jaiswal ◽  
Imlimaong Aier
2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Weitong Cui ◽  
Huaru Xue ◽  
Lei Wei ◽  
Jinghua Jin ◽  
Xuewen Tian ◽  
...  

Abstract Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.
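One way to quantify the reproducibility discussed in this abstract is to compare the DEG lists obtained from repeated random subsamples of the same cohort. The sketch below assumes the Jaccard index as the overlap measure; the abstract does not state which metric the authors used, and the gene identifiers and function names are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty lists agree trivially
    return len(a & b) / len(union)


def mean_pairwise_jaccard(deg_lists):
    """Average Jaccard index over all pairs of DEG lists."""
    pairs = list(combinations(deg_lists, 2))
    if not pairs:
        raise ValueError("need at least two DEG lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical DEG lists from three subsamples (placeholder gene IDs,
# not results from the study).
runs = [
    {"GENE1", "GENE2", "GENE3", "GENE4"},
    {"GENE2", "GENE3", "GENE5"},
    {"GENE2", "GENE3", "GENE4", "GENE6"},
]
score = mean_pairwise_jaccard(runs)
print(f"mean pairwise Jaccard: {score:.3f}")
```

A score near 1 would indicate that different subsamples recover nearly the same DEGs; a low score corresponds to the sample-specific DEGs the abstract warns about.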


2021 ◽  
Author(s):  
Jian-Rong Li ◽  
Mabel Tang ◽  
Yafang Li ◽  
Christopher I Amos ◽  
Chao Cheng

Abstract Background: Expression quantitative trait loci (eQTLs) analyses have been widely used to identify genetic variants associated with gene expression levels to understand what molecular mechanisms underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Results: Here, we presented a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes based on RNA-seq data and performed association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-Seq data, we identified a total of 142,801 stQTLs for 3,942 genes and 186,132 eQTLs for 4,751 genes from 15,122,700 genetic variants for 13,476 genes, respectively. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3’UTR regions, while eQTLs were enriched in the CDS, 3’UTR, 5’UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight into the association between genetic variants and gene expression levels.


2020 ◽  
Author(s):  
Huatian Luo ◽  
Da-qiu Chen ◽  
Jing-jing Pan ◽  
Zhang-wei Wu ◽  
Can Yang ◽  
...  

Abstract Background: Pancreatic cancer has many pathologic types, among which pancreatic ductal adenocarcinoma (PDAC) is the most common. Bioinformatics has become a very common tool for the selection of potentially pathogenic genes. Methods: Three data sets containing the gene expression profiles of PDAC were downloaded from the Gene Expression Omnibus (GEO) database. The limma package for R was used to identify the differentially expressed genes (DEGs). To analyze functions and signaling pathways, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used. To visualize the protein-protein interaction (PPI) network of the DEGs, Cytoscape was used together with the Search Tool for the Retrieval of Interacting Genes (STRING). The hub genes were identified with the cytoHubba plug-in for Cytoscape. To verify the expression levels of the hub genes, Gene Expression Profiling Interactive Analysis (GEPIA) was used. Finally, the UALCAN online analysis tool was used to analyze overall survival. Results: The 376 DEGs were highly enriched in biological processes including signal transduction and apoptotic process, and in several pathways, mainly the Protein digestion and absorption and Pancreatic secretion pathways. Nucleolar and spindle associated protein 1 (NUSAP1) and SHC binding and spindle associated 1 (SHCBP1) were found to be highly expressed in pancreatic ductal adenocarcinoma tissues. NUSAP1 and SHCBP1 were strongly correlated with prognosis. Conclusions: The findings of this bioinformatics analysis indicate that NUSAP1 and SHCBP1 may be key factors in the prognosis and treatment of pancreatic cancer.
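The hub-gene step in this abstract can be illustrated with a minimal sketch, assuming hubs are ranked by node degree in the PPI network. The abstract does not say which cytoHubba ranking method was used, so degree is an assumption here, and the protein names and edges are hypothetical placeholders rather than results from the study.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    # Normalise each edge so (a, b) and (b, a) count once; drop self-loops.
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(edges, k):
    """Return the k highest-degree nodes as (name, degree) pairs."""
    degree = node_degrees(edges)
    # Sort by descending degree, then by name so ties are ordered stably.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical PPI edge list (placeholder names, one duplicated edge).
ppi_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P3", "P1"), ("P4", "P5"),
]
hubs = top_hubs(ppi_edges, 2)
print(hubs)
```

Degree is the simplest centrality measure; other rankings can reorder the candidates, so a hub list is best read together with the method that produced it.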


2010 ◽  
Vol 08 (supp01) ◽  
pp. 177-192 ◽  
Author(s):  
XI WANG ◽  
ZHENGPENG WU ◽  
XUEGONG ZHANG

Due to its unprecedented high resolution and detailed information, RNA-seq technology based on next-generation high-throughput sequencing significantly boosts the ability to study transcriptomes. The estimation of genes' transcript abundance levels or gene expression levels has always been an important question in research on transcriptional regulation and gene function. On the basis of the concept of Reads Per Kilo-base per Million reads (RPKM), taking the union-intersection genes (UI-based) and summing up inferred isoform abundance (isoform-based) are the two current strategies to estimate gene expression levels, but they produce different estimates. In this paper, we made the first attempt to compare the two strategies' performances through a series of simulation studies. Our results showed that the isoform-based method not only gives more accurate estimates but also has less uncertainty than the UI-based strategy. When the non-uniformity of read distribution is taken into account, the isoform-based method can further reduce estimation errors. We applied both strategies to real RNA-seq datasets of technical replicates, and found that the isoform-based strategy also displays a better performance. For a more accurate estimation of gene expression levels from RNA-seq data, even if the abundance levels of isoforms are not of interest, it is still better to first infer the isoform abundances and sum them up to get the expression level of a gene as a whole.
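The RPKM measure and the two gene-level strategies named in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: the read counts, region lengths, and function names are hypothetical, and the isoform read assignments are taken as given rather than inferred by a statistical model.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads.

    RPKM = reads / (length in kb * total mapped reads in millions)
         = reads * 1e9 / (length in bp * total mapped reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def gene_rpkm_isoform_based(isoform_rpkms):
    """Isoform-based strategy: sum the inferred isoform abundances."""
    return sum(isoform_rpkms)


# Hypothetical gene with two isoforms in a library of 10 million reads.
library_size = 10_000_000
isoform_a = rpkm(400, 2000, library_size)  # 400 reads assigned, 2 kb
isoform_b = rpkm(100, 1000, library_size)  # 100 reads assigned, 1 kb
isoform_based = gene_rpkm_isoform_based([isoform_a, isoform_b])

# UI-based strategy: one RPKM over a single gene-level region. The
# count and length below are made-up values for that region.
ui_based = rpkm(450, 1800, library_size)

print(isoform_based, ui_based)
```

With these made-up inputs the two strategies return different gene-level values, which is the kind of discrepancy the simulation study set out to evaluate.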


2014 ◽  
Author(s):  
Jenny Tung ◽  
Xiang Zhou ◽  
Susan C Alberts ◽  
Matthew Stephens ◽  
Yoav Gilad

Gene expression variation is well documented in human populations and its genetic architecture has been extensively explored. However, we still know little about the genetic architecture of gene expression variation in other species, particularly our closest living relatives, the nonhuman primates. To address this gap, we performed an RNA sequencing (RNA-seq)-based study of 63 wild baboons, members of the intensively studied Amboseli baboon population in Kenya. Our study design allowed us to measure gene expression levels and identify genetic variants using the same data set, enabling us to perform complementary mapping of putative cis-acting expression quantitative trait loci (eQTL) and measurements of allele-specific expression (ASE) levels. We discovered substantial evidence for genetic effects on gene expression levels in this population. Surprisingly, we found more power to detect individual eQTL in the baboons relative to a HapMap human data set of comparable size, probably as a result of greater genetic variation, enrichment of SNPs with high minor allele frequencies, and longer-range linkage disequilibrium in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes. Interestingly, genes with eQTL significantly overlapped between the baboon and human data sets, suggesting that some genes may tolerate more genetic perturbation than others, and that this property may be conserved across species. Finally, we used a Bayesian sparse linear mixed model to partition genetic, demographic, and early environmental contributions to variation in gene expression levels. We found a strong genetic contribution to gene expression levels for almost all genes, while individual demographic and environmental effects tended to be more modest. 
Together, our results establish the feasibility of eQTL mapping using RNA-seq data alone, and act as an important first step towards understanding the genetic architecture of gene expression variation in nonhuman primates.


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Jenny Tung ◽  
Xiang Zhou ◽  
Susan C Alberts ◽  
Matthew Stephens ◽  
Yoav Gilad

Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 1042-1042
Author(s):  
Kohei Hosokawa ◽  
Sachiko Kajigaya ◽  
Keyvan Keyvanfar ◽  
Qiao Wangmin ◽  
Yanling Xie ◽  
...  

Abstract Background. Paroxysmal nocturnal hemoglobinuria (PNH) is a rare acquired blood disease, characterized by hemolytic anemia, bone marrow (BM) failure, and venous thrombosis. The etiology of PNH is a somatic mutation in the phosphatidylinositol glycan class A gene (PIG-A) on the X chromosome, which causes deficiency in glycosyl phosphatidylinositol-anchored proteins (GPI-APs). The involvement of T cells in PNH is strongly supported by clinical overlap between PNH and aplastic anemia (AA); the presence of GPI-AP deficient cells in AA associated with favorable response to immunosuppressive therapy; and an oligoclonal T cell repertoire in PNH patients. However, the molecular mechanisms responsible for the aberrant immune responses in PNH patients are not well understood. To identify aberrant molecular mechanisms involved in immune targeting of hematopoietic stem cells in BM, RNA sequencing (RNA-seq) was applied to examine the transcriptome of T cell subsets from PNH patients and healthy controls. Method. Blood samples were obtained after informed consent from 15 PNH patients and 15 age-matched healthy controls. For RNA extraction, freshly isolated peripheral blood mononuclear cells were sorted on the same day of blood draw to obtain four different T cell (CD3+ CD14- CD19- ViViD-) populations [CD4+ naïve (CD45RA+ CD45RO-), CD4+ memory (CD45RA- CD45RO+), CD8+ naïve (CD45RA+ CD45RO-), and CD8+ memory (CD45RA- CD45RO+) T cells] by fluorescence-activated cell sorter. RNA-Seq analysis from three PNH and three healthy controls was performed using the Illumina HiSeq™ 2000 platform. The Ingenuity® Pathway Analysis and Gene set enrichment analysis (GSEA) were employed to elucidate transcriptional pathways. RNA-seq data were validated by flow cytometry and quantitative real-time RT-PCR (RT-qPCR). Results and Discussion. Differentially expressed gene analysis of four T cell subsets showed distinct gene expression signatures in individual T cell subsets. 
In CD4+ naïve T cells, 11 gene expression levels were significantly different: five upregulated (including SRRM2 and TNFSF8) and six downregulated genes (including GIMAP6) (> 2 fold change, false discovery rate [FDR] < 0.05). In CD4+ memory T cells, 25 gene expression levels were significantly different: 15 upregulated (including JUND and TOB1) and 10 downregulated genes (including GIMAP4). In CD8+ naïve T cells, only two gene expression levels were significantly different: upregulated CTSW and downregulated RPL9. In CD8+ memory T cells, seven gene expression levels were significantly different: two upregulated (CTSW and DPP4) and five downregulated genes (including SLC12A7). Further, differentially expressed gene analysis was performed by combining CD4+ naïve, CD4+ memory, CD8+ naïve, and CD8+ memory T cells from PNH or healthy controls, respectively. Out of 55 gene expression levels that were significantly different, 41 were upregulated (including TNFAIP3, JUN, JUND, TOB1, TNFSF8, and CD69) and 14 downregulated (including GIMAP4). By canonical pathways analysis, putative gene network interactions of differentially expressed genes were significantly enriched for canonical pathways of TNFR1, TNFR2, IL-17A, and CD27 signaling. By GSEA, the most significantly upregulated gene sets in CD4+ naïve, CD4+ memory, CD8+ naïve, and CD8+ memory T cells from PNH patients displayed gene signatures related to the "IGF1 pathway", "Pre-NOTCH expression and processing", "AP-1 pathway", and "ATF2 pathway", respectively. For validation of the RNA-seq data, we chose seven genes (TNFAIP3, JUN, JUND, TOB1, TNFSF8, CD69, and CTSW) because these are important mediators involved in the regulation of T cells and dysregulation of these genes is associated with autoimmune diseases. Differential expression levels of TNFAIP3, JUN, and TOB1 were validated by RT-qPCR. By flow cytometry, higher expression of CD69 and TNFSF8 was confirmed in CD4+ and CD8+ T cells from PNH compared to healthy controls. 
Conclusion. Using RNA-seq, we identified novel molecular mechanisms and pathways which may underlie the aberrant T cell immune status in PNH. Specific dysregulation of T cell intracellular signaling may contribute to BM failure and the inflammatory environment in PNH. Understanding these pathways may provide new therapeutic strategies to modulate T cell immune responses in BM failure. Disclosures Hosokawa: Aplastic Anemia and MDS International Foundation: Research Funding. Rios: GSK/Novartis: Research Funding. Weinstein: GSK/Novartis: Research Funding. Townsley: GSK/Novartis: Research Funding.
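The selection rule quoted in this abstract (> 2 fold change, FDR < 0.05) can be sketched as a simple filter. Two assumptions are made here that the abstract does not confirm: FDR is computed with the Benjamini-Hochberg procedure, and the fold-change threshold is applied symmetrically (more than 2x up or less than 0.5x down). Gene names and values are hypothetical placeholders.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fold_changes, pvalues, min_fold=2.0, max_fdr=0.05):
    """Keep genes passing both the fold-change and the FDR threshold."""
    fdr = benjamini_hochberg(pvalues)
    log_cutoff = math.log2(min_fold)
    selected = []
    for gene, fc, q in zip(genes, fold_changes, fdr):
        if abs(math.log2(fc)) > log_cutoff and q < max_fdr:
            selected.append((gene, "up" if fc > 1 else "down", q))
    return selected


# Hypothetical input: fold change is the patient/control expression ratio.
genes = ["geneA", "geneB", "geneC", "geneD"]
fold_changes = [4.0, 0.25, 1.2, 3.0]
pvalues = [0.001, 0.01, 0.02, 0.5]
degs = select_degs(genes, fold_changes, pvalues)
print(degs)
```

In this toy input geneC fails the fold-change cutoff and geneD fails the FDR cutoff, so only geneA and geneB are retained.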


2014 ◽  
Author(s):  
Nuno A Fonseca ◽  
John A Marioni ◽  
Alvis Brazma

Accurately quantifying gene expression levels is a key goal of experiments using RNA-sequencing to assay the transcriptome. This typically requires aligning the short reads generated to the genome or transcriptome before quantifying expression of pre-defined sets of genes. Differences in the alignment/quantification tools can have a major effect upon the expression levels found, with important consequences for biological interpretation. Here we address two main issues: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And, how close are the expression levels inferred to the "true" expression levels? We evaluate fifty gene profiling pipelines in experimental and simulated data sets with different characteristics (e.g., read length and sequencing depth). In the absence of knowledge of the "ground truth" in real RNA-seq data sets, we used simulated data to assess the differences between the true expression and those reconstructed by the analysis pipelines. Even though this approach does not take into account all known biases present in RNA-seq data, it still allows us to assess the accuracy of the gene expression values inferred by different analysis pipelines. The results show that i) overall there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; ii) the error in the estimated gene expression values can vary considerably across genes; and iii) a small set of genes have expression estimates with consistently high error (across data sets and methods). Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
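The two kinds of accuracy assessment mentioned in this abstract, an overall correlation with the true values and a per-gene error, can be sketched on simulated data as follows. The abstract does not specify the correlation coefficient or error definition, so Pearson correlation on log2(x + 1) values and relative error are assumptions; the gene names and expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def relative_error(true_value, estimate):
    """Absolute error as a fraction of the (positive) true value."""
    return abs(estimate - true_value) / true_value


# Hypothetical simulated truth and one pipeline's estimates.
truth = {"g1": 10.0, "g2": 100.0, "g3": 1000.0, "g4": 50.0}
estimate = {"g1": 12.0, "g2": 95.0, "g3": 1010.0, "g4": 5.0}

names = sorted(truth)
r = pearson(
    [math.log2(truth[g] + 1) for g in names],
    [math.log2(estimate[g] + 1) for g in names],
)
errors = {g: relative_error(truth[g], estimate[g]) for g in names}
worst_gene = max(errors, key=errors.get)
print(f"r = {r:.3f}, worst gene = {worst_gene}")
```

A high overall correlation can coexist with a large error for individual genes, which is why the per-gene view is reported alongside the summary statistic.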


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e16744-e16744
Author(s):  
Daruka Mahadevan ◽  
Ritu Pandey ◽  
Yuliang Chen ◽  
Jacob Essif ◽  
Aisha Al-Khinji

e16744 Background: Carcinoembryonic cell adhesion molecule 6 (CEACAM6) is a cell adhesion receptor of the Ig-superfamily overexpressed in human Pancreatic Ductal Adenocarcinoma (PDA), enriching to the classical activated stroma subtype. CEACAM6 has multifaceted roles in PDA and is a poor prognostic marker (Pandey et al. Sci Rep 2019). We report functional correlative studies across PDA cell lines with high vs KO vs low CEACAM6 and a PDX model with a therapeutic Mab. Methods: RNA-Seq and microarray expression data of PDA cell lines were downloaded from GEO using R (4.3), normalized and log transformed for analysis: CEACAM6 high vs. low were assessed for differential gene expression changes. Correlation of CEACAM6 levels with genes of interest was studied and compared with the CEACAM6 KO proteomic profile of HPAF-II cells. CEACAM6 WT vs. KO cells were profiled for protein kinase (PK) activity (PAMChip) and gene expression changes by RNA-Seq. NSG-CD34+ mice bearing PDX were evaluated with a humanized anti-CEACAM6 Mab for anti-tumor activity. Results: Differential expression analyses between PDA cell lines with low vs KO vs high CEACAM6 resulted in identifying similar markers changing in quantitative proteomics. KRT20, SYTL1, SKIL, CES1P1, MAN1A1 were down-regulated and HMOX1, CPNE2, ABCD1 were up-regulated in CEACAM6 low or KO cell lines. Specific PKs are upregulated in CEACAM6 KO enriching to the TK family (EPH A1, 3, 4, 8 and HCK), AGC family (e.g. AKT, PKA) and cellular apoptosis (e.g. BAD). RNA-Seq of CEACAM6 WT vs KO cells reconfirmed the up-regulation of MMP1, IL2RG, ATP6V0D2 and low expression of KRT20, AGK and MAN1A1 in CEACAM6 KO cells. Pharmacologic inhibition with a humanized anti-CEACAM6 scFv-Fc (IgG4) in PDA PDX of NSG CD34+ mice demonstrated ~55% tumor growth inhibition (TGI) with enhanced survival of > 14 days vs. control. Conclusions: CEACAM6 is expressed exclusively in primates and humans and plays multifaceted oncogenic roles in PDA pathogenesis. 
When CEACAM6 is disrupted, ECM proteins are altered reshaping the stroma, activating specific PKs and priming apoptosis. The therapeutic anti-CEACAM6 Mab possesses anti-tumor activity with associated cellular apoptosis and increased mouse survival.

