Aligning the Aligners: Comparison of RNA Sequencing Data Alignment and Gene Expression Quantification Tools for Clinical Breast Cancer Research

2019 ◽  
Vol 9 (2) ◽  
pp. 18 ◽  
Author(s):  
Isaac D. Raplee ◽  
Alexei V. Evsikov ◽  
Caralina Marín de Evsikova

The rapid expansion of transcriptomics and affordability of next-generation sequencing (NGS) technologies generate rocketing amounts of gene expression data across biology and medicine, including cancer research. Concomitantly, many bioinformatics tools were developed to streamline gene expression analysis and quantification. We tested the concordance of NGS RNA sequencing (RNA-seq) analysis outcomes between the two predominant programs for read alignment, HISAT2 and STAR, and the two most popular programs for quantifying gene expression in NGS experiments, edgeR and DESeq2, using RNA-seq data from a breast cancer progression series, which includes histologically confirmed normal, early neoplasia, ductal carcinoma in situ and infiltrating ductal carcinoma samples microdissected from formalin-fixed, paraffin-embedded (FFPE) breast tissue blocks. We identified significant differences in aligners’ performance: HISAT2 was prone to misalign reads to retrogene genomic loci, whereas STAR generated more precise alignments, especially for early neoplasia samples. edgeR and DESeq2 produced similar lists of differentially expressed genes, with edgeR producing more conservative, though shorter, lists of genes. Gene Ontology (GO) enrichment analysis revealed no skewness in significant GO terms identified among differentially expressed genes by edgeR versus DESeq2. As transcriptomics of FFPE samples becomes a vanguard of precision medicine, choice of bioinformatics tools becomes critical for clinical research. Our results indicate that STAR and edgeR are well-suited tools for differential gene expression analysis from FFPE samples.
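One simple way to quantify how similar two differentially expressed gene lists are is by set overlap. The sketch below is illustrative only: the function name, the choice of the Jaccard index as the similarity measure, and the placeholder gene identifiers are assumptions and are not taken from the study.

```python
def compare_deg_lists(first, second):
    """Summarize the overlap between two lists of gene identifiers.

    Returns the shared genes, the genes unique to each list, and the
    Jaccard index (shared / union). Two empty lists are treated as
    identical (Jaccard index 1.0), which is a convention chosen here.
    """
    a, b = set(first), set(second)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder identifiers, not real results from either tool.
shorter_list = ["G1", "G2", "G3"]
longer_list = ["G2", "G3", "G4", "G5"]
summary = compare_deg_lists(shorter_list, longer_list)
print(summary["shared"], round(summary["jaccard"], 2))
```

A shorter, more conservative list that is largely contained in the longer one would show few entries under `only_first` and a Jaccard index limited mainly by the length difference.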

Author(s):  
Isaac Raplee ◽  
Alexei Evsikov ◽  
Caralina Marín de Evsikova

The rapid expansion of transcriptomics from increased affordability of next-generation sequencing (NGS) technologies generates rocketing amounts of gene expression data across biology and medicine, and notably in cancer research. Concomitantly, many bioinformatics tools were developed to streamline gene expression analysis and quantification. We tested the concordance of NGS RNA sequencing (RNA-seq) analysis outcomes between the two predominant programs for read alignment, HISAT2 and STAR, and the two most popular programs for quantifying gene expression in NGS experiments, edgeR and DESeq2, using RNA-seq data from a series of breast cancer progression specimens, which include histologically confirmed normal, early neoplasia, ductal carcinoma in situ and infiltrating ductal carcinoma samples microdissected from formalin-fixed, paraffin-embedded (FFPE) breast tissue blocks. We identified significant differences in aligners’ performance: HISAT2 was prone to misalign reads to retrogene genomic loci, whereas STAR generated more precise alignments, especially for early neoplasia samples. edgeR and DESeq2 produced similar lists of differentially expressed genes in stage comparisons, with edgeR producing more conservative, though shorter, lists of genes. Even so, Gene Ontology (GO) enrichment analysis revealed no skewness in significant GO categories identified among differentially expressed genes by edgeR vs DESeq2. As transcriptome analysis of archived FFPE samples becomes a vanguard of precision medicine, identification and fine-tuning of bioinformatics tools becomes critical for clinical research. Our results indicate that STAR and edgeR are well-suited tools for differential gene expression analysis from FFPE samples.


2020 ◽  
Author(s):  
Arash Akbarzadeh ◽  
Aimee Lee S. Houde ◽  
Ben J.G. Sutherland ◽  
Oliver P. Günther ◽  
Kristina M. Miller

Abstract Identifying early gene expression responses to hypoxia (i.e., low dissolved oxygen) as a tool to assess the degree of exposure to this stressor is crucial for salmonids, because they are increasingly exposed to hypoxic stress due to anthropogenic habitat change, e.g., global warming, excessive nutrient loading, and persistent algal blooms. Our goal was to discover and validate gill gene expression biomarkers specific to the hypoxia response in salmonids across multi-stressor conditions. Gill tissue was collected from 24 freshwater juvenile Chinook salmon (Oncorhynchus tshawytscha), held in normoxia [dissolved oxygen (DO) > 8 mg L−1] and hypoxia (DO = 4–5 mg L−1) at 10 and 18 °C for up to six days. RNA-sequencing (RNA-seq) was then used to discover 240 differentially expressed genes between hypoxic and normoxic conditions, but not affected by temperature. The most significantly differentially expressed genes had functional roles in the cell cycle and suppression of cell proliferation associated with hypoxic conditions. The most significant genes (n = 30) were selected for real-time qPCR assay development. These assays demonstrated a strong correlation (r = 0.88; p < 0.001) between the expression values from RNA-seq and the fold changes from qPCR. Further, qPCR of the 30 candidate hypoxia biomarkers was applied to an additional 322 Chinook salmon exposed to hypoxic and normoxic conditions to reveal the top biomarkers to define hypoxic stress. Multivariate analyses revealed that smolt stage, water salinity, and morbidity status were relevant factors to consider with the expression of these genes in relation to hypoxic stress. These hypoxia candidate genes will be put into application screening Chinook salmon to determine the identity of stressors impacting the fish.


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Brandon Monier ◽  
Jing Zhao ◽  
Qin Ma

Abstract Differential gene expression (DGE) is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes (DEGs) across two or more conditions. Interpretation of the DGE results can be non-intuitive and time consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we present an R package, ViDGER (Visualization of Differential Gene Expression Results using R), which contains nine functions that generate information-rich visualizations for the interpretation of DGE results from three widely used tools, Cuffdiff, DESeq2, and edgeR.


2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 10558-10558 ◽  
Author(s):  
Natalie Galanina ◽  
Emmet Sprecher ◽  
Veerle Bossuyt ◽  
Sudipa Sarkar ◽  
Ian E. Krop ◽  
...  

10558 Background: The use of a ‘brief exposure’ to single agent T allows the measurement of dynamic changes in the transcriptome that may predict response to T-based combinations. We have shown that most gene expression changes in HER2+ tumors treated with T occur in tumors that ultimately achieve a pCR. Our further analysis suggests several patterns of transcriptional change in pCR tumors, suggesting different mechanisms of action of T. RNA-seq analysis provides more in-depth annotation of these mechanisms. Methods: Fresh tumor core biopsies were taken at a 2 week time point after a single dose of T (8mg/m2) from 80 HER2+ early breast cancer patients enrolled on a clinical trial of T>T+C. Nucleic acids were extracted using Qiagen AllPrep and were analyzed with Illumina HT12v3 Beadchip and Illumina 610 QUAD V1 SNP arrays. RNA was also processed for sequencing using the Ovation RNA-Seq System and paired-end sequenced using an Illumina Genome Analyzer IIx. RNA-seq data were analyzed with Tophat/Cufflinks. Network analysis was performed with Metacore. Results: Among pCR tumors, distinct patterns of differential expression pre/post T were observed, in both microarray and RNA-seq data. ERBB2 down-regulation was characteristic of pCR in one subgroup by microarray. In this group, differentially expressed genes belonged to interaction networks involved in apoptosis and cell cycle regulation. In contrast, tumors with no change in ERBB2 showed differentially expressed genes that belonged to networks related to chromatin assembly and regulation of immune pathways. NOLC1, RPL41, ZCHHC17, and B2M had altered alternative splicing product distributions in both groups. In the ERBB2 down-regulated group, genes with changed expression were enriched for targets of STAT3 and YY1. Conclusions: RNA-seq and microarray reveal distinct responses in tumors that achieve pCR to T-containing regimens. These methods provide predictive markers for validation in subsequent clinical trials.


2020 ◽  
Vol 10 (9) ◽  
pp. 3321-3336
Author(s):  
Arash Akbarzadeh ◽  
Aimee Lee S Houde ◽  
Ben J G Sutherland ◽  
Oliver P Günther ◽  
Kristina M Miller

Abstract Identifying early gene expression responses to hypoxia (i.e., low dissolved oxygen) as a tool to assess the degree of exposure to this stressor is crucial for salmonids, because they are increasingly exposed to hypoxic stress due to anthropogenic habitat change, e.g., global warming, excessive nutrient loading, and persistent algal blooms. Our goal was to discover and validate gill gene expression biomarkers specific to the hypoxia response in salmonids across multi-stressor conditions. Gill tissue was collected from 24 freshwater juvenile Chinook salmon (Oncorhynchus tshawytscha), held in normoxia [dissolved oxygen (DO) > 8 mg L-1] and hypoxia (DO = 4–5 mg L-1) at 10 and 18 °C for up to six days. RNA-sequencing (RNA-seq) was then used to discover 240 differentially expressed genes between hypoxic and normoxic conditions, but not affected by temperature. The most significantly differentially expressed genes had functional roles in the cell cycle and suppression of cell proliferation associated with hypoxic conditions. The most significant genes (n = 30) were selected for real-time qPCR assay development. These assays demonstrated a strong correlation (r = 0.88; P < 0.001) between the expression values from RNA-seq and the fold changes from qPCR. Further, qPCR of the 30 candidate hypoxia biomarkers was applied to an additional 322 Chinook salmon exposed to hypoxic and normoxic conditions to reveal the top biomarkers to define hypoxic stress. Multivariate analyses revealed that smolt stage, water salinity, and morbidity status were relevant factors to consider with the expression of these genes in relation to hypoxic stress. These hypoxia candidate genes will be put into application screening Chinook salmon to determine the identity of stressors impacting the fish.


2021 ◽  
Vol 20 ◽  
pp. 153303382098329
Author(s):  
Yujie Weng ◽  
Wei Liang ◽  
Yucheng Ji ◽  
Zhongxian Li ◽  
Rong Jia ◽  
...  

Human epidermal growth factor receptor 2 (HER2)+ breast cancer is considered the most dangerous type of breast cancer. Herein, we used bioinformatics methods to identify potential key genes in HER2+ breast cancer to enable its diagnosis, treatment, and prognosis prediction. Datasets of HER2+ breast cancer and normal tissue samples retrieved from Gene Expression Omnibus and The Cancer Genome Atlas databases were subjected to analysis for differentially expressed genes using R software. The identified differentially expressed genes were subjected to gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses followed by construction of protein-protein interaction networks using the STRING database to identify key genes. The genes were further validated via survival and differential gene expression analyses. We identified 97 upregulated and 106 downregulated genes that were primarily associated with processes such as mitosis, protein kinase activity, cell cycle, and the p53 signaling pathway. Visualization of the protein-protein interaction network identified 10 key genes (CCNA2, CDK1, CDC20, CCNB1, DLGAP5, AURKA, BUB1B, RRM2, TPX2, and MAD2L1), all of which were upregulated. Survival analysis using PROGgeneV2 showed that CDC20, CCNA2, DLGAP5, RRM2, and TPX2 are prognosis-related key genes in HER2+ breast cancer. A nomogram showed that high expression of RRM2, DLGAP5, and TPX2 was positively associated with the risk of death. TPX2, which has not previously been reported in HER2+ breast cancer, was associated with breast cancer development, progression, and prognosis and is therefore a potential key gene. It is hoped that this study can provide a new method for the diagnosis and treatment of HER2+ breast cancer.


BMC Cancer ◽  
2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Michal Marczyk ◽  
Chunxiao Fu ◽  
Rosanna Lau ◽  
Lili Du ◽  
Alexander J. Trevarton ◽  
...  

Abstract Background Utilization of RNA sequencing methods to measure gene expression from archival formalin-fixed paraffin-embedded (FFPE) tumor samples in translational research and clinical trials requires reliable interpretation of the impact of pre-analytical variables on the data obtained, particularly the methods used to preserve samples and to purify RNA. Methods Matched tissue samples from 12 breast cancers were fresh frozen (FF) and preserved in RNAlater or fixed in formalin and processed as FFPE tissue. Total RNA was extracted and purified from FF samples using the Qiagen RNeasy kit, and in duplicate from FFPE tissue sections using three different kits (Norgen, Qiagen and Roche). All RNA samples underwent whole transcriptome RNA sequencing (wtRNAseq) and targeted RNA sequencing for 31 transcripts included in a signature of sensitivity to endocrine therapy. We assessed the effect of RNA extraction kit on the reliability of gene expression levels using linear mixed-effects model analysis, concordance correlation coefficient (CCC) and differential analysis. All protein-coding genes in the wtRNAseq and three gene expression signatures for breast cancer were assessed for concordance. Results Despite variable quality of the RNA extracted from FFPE samples by different kits, all had similar concordance of overall gene expression from wtRNAseq between matched FF and FFPE samples (median CCC 0.63–0.66) and between technical replicates (median expression difference 0.13–0.22). More than half of genes were differentially expressed between FF and FFPE, but with low fold change (median |LFC| 0.31–0.34). Two out of three breast cancer signatures studied were highly robust in all samples using any kit, whereas the third signature was similarly discordant irrespective of the kit used. The targeted RNAseq assay was concordant between FFPE and FF samples using any of the kits (CCC 0.91–0.96). 
Conclusions The selection of kit to purify RNA from FFPE did not influence the overall quality of results from wtRNAseq; thus, variable reproducibility of gene signatures probably relates to the reliability of the individual genes selected and possibly to the algorithm. Targeted RNAseq showed promising performance for clinical deployment of quantitative assays in breast cancer from FFPE samples, although numerical scores were not identical to those from wtRNAseq and would require calibration.
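For readers unfamiliar with the concordance correlation coefficient (CCC) used above, a minimal sketch of Lin's standard definition is given below. It is not the authors' code: the function name and the toy expression values are assumptions, and the population (1/n) form of the variances and covariance is assumed.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variances and covariance. Unlike Pearson's r,
    it penalizes a systematic shift or scale difference between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov_xy / denominator


# Toy log-expression values for one gene set in matched samples (made up).
fresh_frozen = [1.0, 2.0, 3.0]
ffpe_shifted = [2.0, 3.0, 4.0]  # perfectly correlated but shifted by +1
print(concordance_ccc(fresh_frozen, fresh_frozen))  # identical inputs give 1.0
print(concordance_ccc(fresh_frozen, ffpe_shifted))  # below 1 despite r = 1
```

The second call illustrates why CCC is a stricter agreement measure than correlation: a constant offset between matched samples lowers it even when the ranking of values is unchanged.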


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 244 ◽  
Author(s):  
Antonio Victor Campos Coelho ◽  
Rossella Gratton ◽  
João Paulo Britto de Melo ◽  
José Leandro Andrade-Santos ◽  
Rafael Lima Guimarães ◽  
...  

HIV-1 infection elicits complex dynamics in the expression of various host genes. High-throughput sequencing has added a substantial amount of information regarding HIV-1 infection and pathogenesis. RNA sequencing (RNA-Seq) is currently the tool of choice to investigate gene expression in a wide range of experimental settings. This study aims at performing a meta-analysis of RNA-Seq expression profiles in samples of HIV-1 infected CD4+ T cells compared to uninfected cells to assess consistently differentially expressed genes in the context of HIV-1 infection. We selected two studies (22 samples: 15 experimentally infected and 7 mock-infected). We found 208 differentially expressed genes in infected cells when compared to uninfected/mock-infected cells. This result had moderate overlap when compared to previous studies of HIV-1 infection transcriptomics, but we identified 64 genes already known to interact with HIV-1 according to the HIV-1 Human Interaction Database. A gene ontology (GO) analysis revealed enrichment of several pathways involved in immune response, cell adhesion, cell migration, inflammation, apoptosis, Wnt, Notch and ERK/MAPK signaling.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 4582-4582
Author(s):  
Wei Liao ◽  
Gwen Jordaan ◽  
Artur Jaroszewicz ◽  
Matteo Pellegrini ◽  
Sanjai Sharma

Abstract 4582 High throughput sequencing of cellular mRNA provides a comprehensive analysis of the transcriptome. Besides identifying differentially expressed genes in different cell types, it also provides information on mRNA isoforms and splicing alterations. We analyzed mRNA from two CLL specimens and from normal peripheral blood B cells by this approach and performed data analysis to identify differentially expressed and spliced genes. The results showed that CLL specimens express approximately 40% more transcripts compared to normal B cells. The FPKM data (fragments per kilobase of exon per million) revealed a higher transcript expression on chromosome 12 in CLL#1, indicating the presence of trisomy 12, which was confirmed by fluorescence in situ hybridization assay. With a two-fold change in FPKM as a cutoff and a p value cutoff of 0.05 as compared to the normal B cell control, 415 and 174 genes in CLL#1 and 676 and 235 genes in CLL#2 were up- and downregulated, respectively. In these two CLL specimens, 45% to 75% of differentially expressed genes are common to both specimens, indicating that genetically disparate CLL specimens have a high percentage of a core set of genes that are potentially important for CLL biology. Selected differentially expressed genes with increased expression (selectin P ligand, SELPLG, and adhesion molecule interacts with CXADR antigen 1, AMICA) and decreased expression (Fos, Jun, CD69 and Rhob) based on the FPKM from RNA-sequencing data were also analyzed in additional CLL specimens by real time PCR analysis. The expression data from RNA-seq closely match the fold-change in expression as measured by RT-PCR analysis and confirm the validity of the RNA-seq analysis. Interestingly, Fos was identified as one of the most downregulated genes in CLL. Using the Cufflinks and Cuffdiff software, the splicing patterns of genes in CLL specimens and normal B cells were analyzed.
Approximately 1100 to 1250 genes in the two CLL specimens were significantly differentially spliced as compared to normal B cells. In this analysis as well, there is a core set of 800 common genes which are differentially spliced in the two CLL specimens. The RNA-sequencing analysis accurately identifies differentially expressed novel genes and splicing variations that will help us understand the biology of CLL. Disclosures: No relevant conflicts of interest to declare.
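The FPKM measure and the two-fold, p = 0.05 cutoffs described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Cufflinks/Cuffdiff implementation: the function names, the strict `p < alpha` comparison, the handling of zero expression, and the example numbers are all hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def classify_change(fpkm_case, fpkm_control, p_value, fold=2.0, alpha=0.05):
    """Label a gene 'up', 'down', or 'unchanged' relative to the control.

    Assumption: a gene must pass both the p-value cutoff and the
    fold-change cutoff; genes with zero FPKM in both samples are unchanged.
    """
    if p_value >= alpha:
        return "unchanged"
    if fpkm_case > 0 and fpkm_case >= fold * fpkm_control:
        return "up"
    if fpkm_control > 0 and fpkm_control >= fold * fpkm_case:
        return "down"
    return "unchanged"


# Made-up numbers: 500 fragments on 2,000 bp of exon, 10 million mapped fragments.
case_value = fpkm(500, 2000, 10_000_000)
print(case_value, classify_change(case_value, 10.0, p_value=0.01))
```

Because FPKM normalizes for both transcript length and sequencing depth, values from the CLL and control libraries can be compared directly before the fold-change filter is applied.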


2020 ◽  
Author(s):  
Qiang Song ◽  
Man Huang ◽  
Guicheng Wu ◽  
Lu Dou ◽  
Wenjin Zhang ◽  
...  

Abstract Background Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) is the most sensitive technique for evaluating gene expression levels. Choosing appropriate reference genes (RGs) is critical for normalizing and evaluating changes in the expression of target genes. However, uniform and reliable RGs for breast cancer research have not been identified, limiting the value of target gene expression studies. Here, we provide a novel approach for mining RGs by using the RNA-seq dataset to identify reliable and accurate RGs that can be applied to different types of breast cancer tissues and cell lines. Methods First, we compiled the transcriptome profiling data from the TCGA database involving 1217 samples to identify novel RGs, and then ten genes (SF1, TARDBP, THRAP3, QRICH1, TRA2B, SRSF3, YY1, DNAJC8, RNF10, and RHOA) with relatively stable expression levels were chosen as novel candidate RGs. Additionally, six conventional RGs (ACTB, TUBA1A, RPL13A, B2M, GAPDH, and GUSB) were also selected. To determine and validate the optimal RGs, we performed qRT-PCR experiments on 87 samples from 5 types of surgically excised breast tumor specimens, including HR+HER2-, HR+HER2+, HR-HER2-, HR-HER2+, breast cancer after neoadjuvant chemotherapy (NAC) and their matched para-carcinoma tissues. We also included a benign breast tumor sample. Six biological replicates were included for each tissue. Moreover, we assessed 7 breast cancer cell lines (MCF-10A, MCF-7, T-47D, MDA-MB-231, MDA-MB-468, as well as MDA-MB-231 with either CNR2 knockdown or overexpression; 3 biological replicates for each line). Five statistical algorithms (geNorm, NormFinder, ΔCt method, BestKeeper, and ComprFinder) were used to assess the stability of expression of each RG across all breast cancer tissues and cell lines.
Results Our results show that RG combinations SF1+TRA2B+THRAP3 and THRAP3+RHOA+QRICH1 showed stable expression in breast cancer tissues and cell lines, respectively, and that these two combinations displayed good interchangeability. Therefore, we propose that the above two combinations are optimal triplet RGs for breast cancer research. Conclusions In summary, we identified novel and reliable RG combinations for breast cancer research based on a public RNA-seq dataset which lays a solid foundation for accurate normalization of qRT-PCR results across different breast cancer tissues and cells.

