ReQTL: identifying correlations between expressed SNVs and gene expression using RNA-sequencing data

2019 ◽  
Vol 36 (5) ◽  
pp. 1351-1359 ◽  
Author(s):  
Liam F Spurr ◽  
Nawaf Alomran ◽  
Pavlos Bousounis ◽  
Dacian Reece-Stremtan ◽  
N M Prashant ◽  
...  

Abstract

Motivation: By testing for associations between DNA genotypes and gene expression levels, expression quantitative trait locus (eQTL) analyses have been instrumental in understanding how thousands of single nucleotide variants (SNVs) may affect gene expression. Compared with DNA genotypes, RNA genetic variation is a phenotypic trait that reflects the actual allele content of the studied system. RNA genetic variation at expressed SNV loci can be estimated as the proportion of alleles bearing the variant nucleotide (variant allele fraction, VAF_RNA). VAF_RNA is a continuous measure that allows precise allele quantitation at loci where the RNA alleles do not scale with the genotype count. We describe a method to correlate VAF_RNA with gene expression and assess its ability to identify genetically regulated expression solely from RNA-sequencing (RNA-seq) datasets.

Results: We introduce ReQTL, an eQTL modification that replaces the DNA allele count with the variant allele fraction at expressed SNV loci in the transcriptome (VAF_RNA). We exemplify the method on sets of RNA-seq data from human tissues obtained through the Genotype-Tissue Expression (GTEx) project and demonstrate that ReQTL analyses are computationally feasible and can identify a subset of expressed eQTL loci.

Availability and implementation: A toolkit to perform ReQTL analyses is available at https://github.com/HorvathLab/ReQTL.

Supplementary information: Supplementary data are available at Bioinformatics online.
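The core substitution in ReQTL can be sketched in a few lines of Python. Each sample contributes VAF_RNA = alt / (ref + alt) in place of a 0/1/2 genotype, and VAF_RNA is then regressed against expression.

This is a minimal illustration under stated assumptions, not the HorvathLab toolkit:

- The helper names (`vaf_rna`, `linear_association`, `reqtl_test`) are hypothetical.
- Samples with no reads at the SNV are simply skipped.
- A plain least-squares fit stands in for whatever regression engine the toolkit actually uses.

```python
import math


def vaf_rna(ref_reads, alt_reads):
    """VAF_RNA = alt / (ref + alt); None when the SNV has no covering reads."""
    total = ref_reads + alt_reads
    return alt_reads / total if total > 0 else None


def linear_association(x, y):
    """Least-squares slope, Pearson r, and t statistic for y ~ x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("VAF or expression has zero variance")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    # t statistic with n - 2 degrees of freedom; infinite for a perfect fit
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return slope, r, t


def reqtl_test(ref_counts, alt_counts, expression):
    """Associate one expressed SNV (per-sample read counts) with one gene."""
    vafs, expr = [], []
    for ref, alt, value in zip(ref_counts, alt_counts, expression):
        v = vaf_rna(ref, alt)
        if v is not None:  # skip samples where the SNV is not expressed
            vafs.append(v)
            expr.append(value)
    return linear_association(vafs, expr)


# Hypothetical data: five covered samples plus one sample with zero reads.
slope, r, t = reqtl_test(
    ref_counts=[20, 15, 10, 5, 0, 0],
    alt_counts=[0, 5, 10, 15, 20, 0],
    expression=[1.0, 2.0, 3.0, 4.0, 5.0, 9.9],
)
print(slope, r)  # prints: 4.0 1.0
```

Because VAF_RNA is continuous, the same regression machinery used for genotype dosages applies unchanged. Only the predictor column differs.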

2018 ◽  
Author(s):  
Liam Spurr ◽  
Nawaf Alomran ◽  
Piotr Słowiński ◽  
Muzi Li ◽  
Pavlos Bousounis ◽  
...  

Motivation: By testing for associations between DNA genotypes and gene expression levels, expression quantitative trait locus (eQTL) analyses have been instrumental in understanding how thousands of single nucleotide variants (SNVs) may affect gene expression. Compared with DNA genotypes, RNA genetic variation is a phenotypic trait that reflects the actual allele content of the studied system. RNA genetic variation can be measured in expressed genome regions, and it differs from the DNA genotype at sites subject to regulatory forces. Therefore, assessing the correlation between RNA genetic variation and gene expression can reveal regulatory genomic relationships in addition to eQTLs.

Results: We introduce ReQTL, an eQTL modification that replaces the DNA allele count with the variant allele frequency (VAF) at expressed SNV loci in the transcriptome. We exemplify the method on sets of RNA-sequencing data from human tissues obtained through the Genotype-Tissue Expression Project (GTEx) and demonstrate that ReQTL analyses show consistently high performance and sufficient power to identify both previously known and novel molecular associations. Most of the SNVs implicated in significant cis-ReQTLs identified by our analysis were previously reported as significant cis-eQTL loci. Notably, trans-ReQTL loci in our data were substantially enriched in RNA-editing sites. In summary, ReQTL analyses are computationally feasible and do not require matched DNA data. They therefore have high potential to facilitate the discovery of novel molecular interactions through exploration of increasingly accessible RNA-sequencing datasets.

Availability and implementation: Sample scripts used in our ReQTL analyses are available with the Supplementary Material (ReQTL_sample_code).

Supplementary information: Re_QTL_Supplementary_Data.zip
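The cis/trans distinction above depends on how far an SNV lies from the gene it is tested against. The sketch below classifies an SNV-gene pair.

The 1 Mb window (`CIS_WINDOW_BP`), the `Snv`/`Gene` records and `classify_pair` are assumptions for illustration. They are not parameters or code taken from the ReQTL work.

```python
from dataclasses import dataclass

# Assumed cis window: 1 Mb on either side of the gene body. This is a common
# eQTL convention used here for illustration, not a value stated in the source.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Snv:
    chrom: str
    pos: int  # 1-based genomic position


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def classify_pair(snv: Snv, gene: Gene, window: int = CIS_WINDOW_BP) -> str:
    """Label an SNV-gene pair 'cis' if the SNV is on the same chromosome and
    within `window` bp of the gene body, otherwise 'trans'."""
    if snv.chrom != gene.chrom:
        return "trans"
    if gene.start - window <= snv.pos <= gene.end + window:
        return "cis"
    return "trans"


gene = Gene("GENE_A", "chr1", 2_000_000, 2_010_000)  # hypothetical coordinates
print(classify_pair(Snv("chr1", 1_500_000), gene))  # cis
print(classify_pair(Snv("chr1", 5_000_000), gene))  # trans
print(classify_pair(Snv("chr2", 2_005_000), gene))  # trans
```

An expressed SNV inside the gene body itself falls within the window and is therefore always labelled cis.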


2019 ◽  
Author(s):  
Justin Sein ◽  
Liam F. Spurr ◽  
Pavlos Bousounis ◽  
N M Prashant ◽  
Hongyu Liu ◽  
...  

Summary: RsQTL is a tool for identifying splicing quantitative trait loci (sQTLs) from RNA-sequencing (RNA-seq) data. It correlates the variant allele fraction at expressed SNV loci in the transcriptome (VAF_RNA) with the proportion of molecules spanning local exon-exon junctions at loci with differential intron excision (percent spliced in, PSI). We exemplify the method on sets of RNA-seq data from human tissues obtained through the Genotype-Tissue Expression Project (GTEx). RsQTL does not require matched DNA and can identify a subset of expressed sQTL loci. Because VAF_RNA is dynamic, RsQTL is applicable to the assessment of conditional and dynamic variation-splicing relationships.

Availability and implementation: https://github.com/HorvathLab/

Supplementary information: RsQTL_Supplementary_Data.zip


2020 ◽  
Author(s):  
Hongyu Liu ◽  
N M Prashant ◽  
Liam F. Spurr ◽  
Pavlos Bousounis ◽  
Nawaf Alomran ◽  
...  

Abstract: Recently, pioneering eQTL studies on single-cell RNA-seq (scRNA-seq) data have revealed new and cell-specific regulatory SNVs. Because eQTL analyses correlate genotypes and gene expression across multiple individuals, they are confined to SNVs with sufficient population frequency. Here, we present an alternative sc-eQTL approach, scReQTL, in which we replace the genotypes with the expressed variant allele fraction (VAF_RNA) at heterozygous SNV sites. Our approach exploits the fact that, when estimated from multiple cells, VAF_RNA can be used to assess the effects of rare SNVs in a single individual. scReQTLs are enriched in known genetic interactions and can therefore be used to identify novel regulatory SNVs.
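The key idea of scReQTL is that the samples are cells from one individual rather than individuals. A minimal sketch computes per-cell VAF_RNA at one heterozygous SNV and rank-correlates it with a gene's expression across cells. It makes these choices:

- **Coverage filter.** Cells with too few reads at the SNV are dropped; the `min_reads=2` threshold is an assumption.
- **Statistic.** Spearman correlation is an illustrative choice, not necessarily the statistic used by scReQTL.
- **Names.** All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def sc_reqtl(cell_ref, cell_alt, cell_expr, min_reads=2):
    """Spearman rho across one individual's cells between per-cell VAF_RNA
    and one gene's expression; also returns the number of cells used."""
    vafs, expr = [], []
    for ref, alt, value in zip(cell_ref, cell_alt, cell_expr):
        if ref + alt >= min_reads:  # drop cells with too little SNV coverage
            vafs.append(alt / (ref + alt))
            expr.append(value)
    return pearson(average_ranks(vafs), average_ranks(expr)), len(vafs)


# Hypothetical cells: the last one has a single read and is filtered out.
print(sc_reqtl([5, 4, 2, 0, 1], [0, 1, 3, 5, 0], [1.0, 2.0, 3.0, 4.0, 9.0]))
# prints: (1.0, 4)
```

Allelic expression in single cells is often monoallelic, so per-cell VAF_RNA clusters near 0 and 1. A rank-based statistic is one reasonable way to tolerate that.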


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.

Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq.

Supplementary information: Supplementary data are available at bioRχiv online.
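To make "simulating from an empirical distribution" concrete, the sketch below draws new expression values for one gene. It uses inverse-transform sampling from a piecewise-linear empirical CDF of the observed values.

It is a deliberately simplified stand-in, not SPsimSeq's algorithm (SPsimSeq itself is an R package). The sketch treats genes independently and ignores library size, batch and group structure. `sample_empirical` and the example values are hypothetical.

```python
import random


def sample_empirical(source, n, rng):
    """Draw n values by inverse-transform sampling from the piecewise-linear
    empirical CDF of `source`; draws stay within [min(source), max(source)]."""
    if not source:
        raise ValueError("source must contain at least one value")
    xs = sorted(source)
    m = len(xs)
    if m == 1:
        return [xs[0]] * n
    draws = []
    for _ in range(n):
        u = rng.random() * (m - 1)  # continuous position between order statistics
        lo = int(u)                 # lower order statistic (at most m - 2)
        frac = u - lo
        draws.append(xs[lo] + frac * (xs[lo + 1] - xs[lo]))
    return draws


# Hypothetical log-CPM values for one gene across source samples.
observed = [4.1, 4.8, 5.0, 5.3, 6.2, 6.9]
rng = random.Random(42)  # seeded for reproducibility
simulated = sample_empirical(observed, 10, rng)
print([round(v, 2) for v in simulated])
```

Interpolating between order statistics yields continuous values that follow the observed distribution's shape without assuming a parametric family. The trade-off is that it never extrapolates beyond the observed range.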


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Xin Chen ◽  
Yiran Zhang ◽  
Juan Xie ◽  
Cankun Wang ◽  
...  

Abstract

Motivation: One of the main benefits of modern RNA-sequencing (RNA-Seq) technology is more accurate gene expression estimation compared with previous generations of expression data, such as microarrays. However, numerous issues can cause an RNA-Seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). MMRs affect gene expression estimation and all downstream analyses, including differential gene expression and functional enrichment. Current analysis pipelines lack tools to effectively test the reliability of gene expression estimates and thus cannot ensure the validity of all downstream analyses.

Results: Our investigation of 95 RNA-Seq datasets from seven species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs in plant and animal species. Here we present a tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene's expression level. The underlying algorithm is based on extracted genomic and transcriptomic features. These are combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to identify reliable expression estimates and to restrict further analysis to genes whose expression estimates are of sufficient quality. The tool also enables researchers to investigate further re-alignment methods to obtain more accurate expression estimates for genes with low reliability.

Availability: GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/

Supplementary information: Supplementary data are available at Bioinformatics online.
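The MMR definition above is a read that maps to multiple locations with the same alignment score. It can be turned into a simple fraction-of-reads calculation.

This sketch does not reproduce GeneQC's features, elastic-net model or mixture fitting. The (read ID, locus, score) input format and `mmr_fraction` are hypothetical; a real pipeline would parse these from aligner output.

```python
def mmr_fraction(alignments):
    """Fraction of reads whose best alignment score is achieved at 2+ loci.

    alignments: iterable of (read_id, locus, score) tuples, one per reported
    alignment (a hypothetical, pre-parsed input format).
    """
    best = {}  # read_id -> best alignment score seen so far
    ties = {}  # read_id -> number of loci sharing that best score
    for read_id, _locus, score in alignments:
        if read_id not in best or score > best[read_id]:
            best[read_id] = score
            ties[read_id] = 1
        elif score == best[read_id]:
            ties[read_id] += 1
    if not best:
        return 0.0
    return sum(1 for n in ties.values() if n > 1) / len(best)


# Hypothetical alignments: r2 and r4 tie at their best score, r1 and r3 do not.
alignments = [
    ("r1", "chr1:100", 60),
    ("r2", "chr2:500", 50), ("r2", "chr3:900", 50),
    ("r3", "chr1:700", 40), ("r3", "chr4:200", 30),
    ("r4", "chr5:10", 10), ("r4", "chr6:20", 20), ("r4", "chr7:30", 20),
]
print(mmr_fraction(alignments))  # prints: 0.5
```

Only ties at the best score count. A read with one clearly better hit and weaker secondary hits (r3 above) is treated as uniquely placed.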


2020 ◽  
Author(s):  
NM Prashant ◽  
Nawaf Alomran ◽  
Yu Chen ◽  
Hongyu Liu ◽  
Pavlos Bousounis ◽  
...  

Summary: SCReadCounts is a method for cell-level estimation of the number of sequencing reads bearing a particular nucleotide at genomic positions of interest from barcoded scRNA-seq alignments. SCReadCounts generates an array of outputs, including cell-SNV matrices with the absolute variant-harboring read counts, as well as cell-SNV matrices with the expressed variant allele fraction (VAF_RNA). We demonstrate its application to estimating cell-level expression of somatic mutations and RNA editing in cancer datasets. SCReadCounts is benchmarked against GATK and Samtools and is freely available as a 64-bit self-contained binary distribution (Linux), along with macOS and Python installations.

Availability: https://github.com/HorvathLab/NGS/tree/master/SCReadCounts

Supplementary information: SCReadCounts_Supplementary_Data.zip
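The per-cell counting described above can be illustrated on pre-extracted observations. Each hypothetical record gives a cell barcode, an SNV ID, and the base a read shows at that SNV.

A real implementation would derive these records from barcoded BAM alignments, which this sketch does not attempt. Ignoring reads that show neither the reference nor the variant base is an assumption for illustration. `cell_snv_matrix` is a hypothetical name, not SCReadCounts' interface.

```python
from collections import Counter


def cell_snv_matrix(records, snvs):
    """Per-cell reference/variant read counts and VAF_RNA at SNVs of interest.

    records: iterable of (cell_barcode, snv_id, base), one per read
    snvs:    dict mapping snv_id -> (ref_base, alt_base)
    Returns {(cell, snv_id): (ref_count, alt_count, vaf)}.
    """
    counts = Counter()
    for cell, snv_id, base in records:
        ref, alt = snvs[snv_id]
        if base == ref:
            counts[(cell, snv_id, "ref")] += 1
        elif base == alt:
            counts[(cell, snv_id, "alt")] += 1
        # any other base supports neither allele and is ignored
    matrix = {}
    for cell, snv_id in sorted({(c, s) for c, s, _ in counts}):
        r = counts[(cell, snv_id, "ref")]
        a = counts[(cell, snv_id, "alt")]
        matrix[(cell, snv_id)] = (r, a, a / (r + a))
    return matrix


# Hypothetical reads from two cells at one A>G site.
records = [
    ("AAAC", "snv1", "A"), ("AAAC", "snv1", "G"), ("AAAC", "snv1", "G"),
    ("TTTG", "snv1", "A"), ("TTTG", "snv1", "C"),
]
for key, value in cell_snv_matrix(records, {"snv1": ("A", "G")}).items():
    print(key, value)
```

Keeping the absolute counts next to the fraction matters at single-cell depth. A VAF of 1.0 from one read and from fifty reads carry very different evidence.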


2021 ◽  
pp. 510-524
Author(s):  
Jeffrey C. Thompson ◽  
Erica L. Carpenter ◽  
Benjamin A. Silva ◽  
Jamie Rosenstein ◽  
Austin L. Chien ◽  
...  

PURPOSE Although the majority of patients with metastatic non–small-cell lung cancer (mNSCLC) lacking a detectable targetable mutation will receive pembrolizumab-based therapy in the frontline setting, predicting which patients will experience a durable clinical benefit (DCB) remains challenging. MATERIALS AND METHODS Patients with mNSCLC receiving pembrolizumab monotherapy or in combination with chemotherapy underwent a 74-gene next-generation sequencing panel on blood samples obtained at baseline and at 9 weeks. The change in circulating tumor DNA levels on-therapy (molecular response) was quantified using a ratio calculation with response defined by a > 50% decrease in mean variant allele fraction. Patient response was assessed using RECIST 1.1; DCB was defined as complete or partial response or stable disease that lasted > 6 months. Progression-free survival and overall survival were recorded. RESULTS Among 67 patients, 51 (76.1%) had > 1 variant detected at a variant allele fraction > 0.3% and thus were eligible for calculation of molecular response from paired baseline and 9-week samples. Molecular response values were significantly lower in patients with an objective radiologic response (log mean 1.25% v 27.7%, P < .001). Patients achieving a DCB had significantly lower molecular response values compared to patients with no durable benefit (log mean 3.5% v 49.4%, P < .001). Molecular responders had significantly longer progression-free survival (hazard ratio, 0.25; 95% CI, 0.13 to 0.50) and overall survival (hazard ratio, 0.27; 95% CI, 0.12 to 0.64) compared with molecular nonresponders. CONCLUSION Molecular response assessment using circulating tumor DNA may serve as a noninvasive, on-therapy predictor of response to pembrolizumab-based therapy in addition to standard of care imaging in mNSCLC. This strategy requires validation in independent prospective studies.
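The molecular-response calculation described above can be sketched as a ratio of mean VAFs. A ratio below 50% corresponds to the >50% decrease in mean VAF that defines a molecular responder.

The following details are assumptions, not stated in the abstract:

- VAFs are given in percent per hypothetical variant ID.
- A variant enters the calculation if it exceeds 0.3% at baseline.
- A variant not detected on-therapy counts as 0%.
- The ratio is reported as a percentage.

```python
def molecular_response(baseline_vafs, on_therapy_vafs, min_vaf=0.3):
    """Ratio (%) of mean on-therapy VAF to mean baseline VAF.

    VAFs are percentages keyed by variant ID. Assumed rules: only variants
    above `min_vaf` at baseline are used, and a variant missing on-therapy
    counts as 0%. Returns None when no variant qualifies.
    """
    eligible = [v for v, vaf in baseline_vafs.items() if vaf > min_vaf]
    if not eligible:
        return None
    mean_base = sum(baseline_vafs[v] for v in eligible) / len(eligible)
    mean_on = sum(on_therapy_vafs.get(v, 0.0) for v in eligible) / len(eligible)
    return 100.0 * mean_on / mean_base


def is_molecular_responder(ratio_pct):
    """A >50% decrease in mean VAF means the ratio falls below 50%."""
    return ratio_pct < 50.0


# Hypothetical patient: two variants qualify, one (0.2%) is below threshold.
baseline = {"var1": 10.0, "var2": 6.0, "var3": 0.2}
week9 = {"var1": 2.0, "var2": 2.0}
ratio = molecular_response(baseline, week9)
print(ratio, is_molecular_responder(ratio))  # prints: 25.0 True
```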


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11875
Author(s):  
Tomoko Matsuda

Large volumes of high-throughput sequencing data have been submitted to the Sequence Read Archive (SRA). The lack of experimental metadata associated with these data makes it very difficult to reuse them and to judge their quality. RNA sequencing (RNA-Seq) reveals the presence and quantity of RNA in a biological sample at a given moment. It is therefore necessary to consider that, in many organisms, gene expression responds within a short time interval (several seconds to a few minutes). To isolate RNA that accurately reflects the transcriptome at the point of harvest, raw biological samples should therefore be processed as soon as possible. This means freezing them in liquid nitrogen, immersing them in RNA stabilization reagent, or lysing and homogenizing them in RNA lysis buffer containing guanidine thiocyanate. As the number of samples handled simultaneously increases, the time until the RNA is protected can increase. Here, to evaluate the effect of different lag times in RNA protection on RNA-Seq data, we harvested CHO-S cells after 3, 5, 6, and 7 days of cultivation. We added RNA lysis buffer at 15, 30, 45, and 60 min after harvest and conducted RNA-Seq. These RNA samples showed high RNA integrity number (RIN) values, indicating non-degraded RNA. Sequence data from libraries prepared with these RNA samples were of high quality according to FastQC. At the same cultivation day, global trends of gene expression were similar across the time course of RNA lysis buffer addition. However, the expression of some genes differed significantly between time-course samples from the same cultivation day, and most of these differentially expressed genes were related to apoptosis. We conclude that the time lag between sample harvest and RNA protection influences the expression of specific genes.
It is, therefore, necessary to know not only RIN values of RNA and the quality of the sequence data but also how the experiment was performed when acquiring RNA-Seq data from the database.


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 135-135
Author(s):  
Shengfa F Liao ◽  
Shamimul Hasan ◽  
Jean M Feugang

Abstract Animal life is essentially a set of gene expression processes. A thorough understanding of how dietary nutrients and other environmental factors drive these processes is fundamental to modern animal nutrition research aimed at improving animal growth, development, health, production, and reproductive performance. Nutrigenomics is a genome-wide approach that uses the knowledge and techniques of genomics (including transcriptomics) and molecular biology. It studies the effects of dietary nutrients on cellular gene expression, cellular metabolic responses and, ultimately, the phenotypic changes of a living organism. Transcriptomics can be applied to investigate an animal tissue transcriptome at a defined physiological or nutritional state, which provides a holistic view of the intracellular expression of RNA, especially mRNA. As a novel, promising transcriptomics approach, RNA sequencing (RNA-Seq) technology can monitor the expression of all genes simultaneously in response to dietary intervention. The principle and history of RNA-Seq technology will be briefly reviewed. The three principal steps of this methodology will then be described: laboratory analysis of tissue samples, bioinformatics analysis of the generated sequence data, and biological interpretation of the data. The application of RNA-Seq technology in different areas of animal nutrition research, including maternal nutrition, feeding strategy, and gut microbiota, will be summarized. Lastly, the application of RNA-Seq technology in swine science and nutrition research will also be discussed.
In short, RNA-Seq technology has great potential to provide new insights into nutrient-gene interactions in agricultural animals, and thereby to further improve feeding or production efficiency. The application of this cutting-edge technology in animal nutrition research is expected to continue to grow in the foreseeable future. This research was supported in part by a USDA-NIFA Multistate Project (No. 1007691).


2020 ◽  
Vol 36 (13) ◽  
pp. 4021-4029
Author(s):  
Hyundoo Jeong ◽  
Zhandong Liu

Abstract Summary Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data, therefore, need to be carefully processed before in-depth analysis. Here, we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local subnetwork of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single-cell sequencing), on synthetic and eight real single-cell sequencing datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise. Availability and implementation The source code for the proposed method is freely available at https://github.com/hyundoo/PRIME. Supplementary information Supplementary data are available at Bioinformatics online.
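A much simplified illustration of network-based imputation follows. Each cell's nearest neighbours, chosen by cosine similarity of expression profiles, stand in for PRIME's cell correspondence network. Zero entries are then replaced by the neighbours' mean for that gene.

This is plain k-nearest-neighbour smoothing under assumed parameters (`k`, cosine similarity), not PRIME's probabilistic method. `knn_impute` and the example matrix are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two expression profiles (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_impute(matrix, k=2):
    """Fill zero entries of a cells x genes matrix with the mean of that gene
    over the k most similar other cells. Non-zero values are left untouched,
    so only putative dropouts change."""
    n = len(matrix)
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(row, matrix[j]),
            reverse=True,
        )[:k]
        for g, value in enumerate(row):
            if value == 0 and neighbours:
                imputed[i][g] = sum(matrix[j][g] for j in neighbours) / len(neighbours)
    return imputed


# Hypothetical 4 cells x 3 genes; zeros are treated as possible dropouts.
matrix = [[5, 0, 1], [5, 4, 1], [4, 4, 1], [0, 0, 9]]
print(knn_impute(matrix, k=2))
```

Restricting changes to zero entries mirrors the goal stated above: correcting dropouts without smoothing away genuine measured expression.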

