Towards mouse genetic-specific RNA-sequencing read mapping

2021 ◽  
Author(s):  
Nastassia Gobet ◽  
Maxime Jan ◽  
Paul Franken ◽  
Ioannis Xenarios

Genetic variations affect behavior and cause disease, but understanding how these variants drive complex traits is still an open question. A common approach is to link the genetic variants to intermediate molecular phenotypes such as the transcriptome using RNA-sequencing (RNA-seq). Paradoxically, these variants between the samples are usually ignored at the beginning of RNA-seq analyses of many model organisms. This can skew the transcriptome estimates that are used later for downstream analyses, such as expression quantitative trait locus (eQTL) detection. Here, we assessed the impact of reference-based analysis on the transcriptome and eQTLs in a widely used mouse genetic population: the BXD panel of recombinant inbred lines. We highlight existing reference bias in the transcriptome data analysis and propose practical solutions that combine available genetic variants, genotypes, and genome reference sequence. The use of custom BXD line references improved downstream analysis compared to the classical genome reference. These insights would likely benefit genetic studies with a transcriptomic component and demonstrate that genome references might need to be reassessed and improved.

FACETS ◽  
2017 ◽  
Vol 2 (2) ◽  
pp. 610-641 ◽  
Author(s):  
Rebekah A. Oomen ◽  
Jeffrey A. Hutchings

The need to better understand how plasticity and evolution affect organismal responses to environmental variability is paramount in the face of global climate change. The potential for using RNA sequencing (RNA-seq) to study complex responses by non-model organisms to the environment is evident in a rapidly growing body of literature. This is particularly true of fishes for which research has been motivated by their ecological importance, socioeconomic value, and increased use as model species for medical and genetic research. Here, we review studies that have used RNA-seq to study transcriptomic responses to continuous abiotic variables to which fishes have likely evolved a response and that are predicted to be affected by climate change (e.g., salinity, temperature, dissolved oxygen concentration, and pH). Field and laboratory experiments demonstrate the potential for individuals to respond plastically to short- and long-term environmental stress and reveal molecular mechanisms underlying developmental and transgenerational plasticity, as well as adaptation to different environmental regimes. We discuss experimental, analytical, and conceptual issues that have arisen from this work and suggest avenues for future study.


2020 ◽  
Vol 100 (10) ◽  
pp. 1345-1355 ◽  
Author(s):  
Stefaniya Boneva ◽  
Anja Schlecht ◽  
Daniel Böhringer ◽  
Hans Mittelviefhaus ◽  
Thomas Reinhard ◽  
...  

Abstract This study aims to compare the potential of standard RNA-sequencing (RNA-Seq) and 3′ massive analysis of cDNA ends (MACE) RNA-sequencing for the analysis of fresh tissue and describes transcriptome profiling of formalin-fixed paraffin-embedded (FFPE) archival human samples by MACE. To compare MACE to standard RNA-Seq on fresh tissue, four healthy conjunctiva from four subjects were collected during vitreoretinal surgery, halved and immediately transferred to RNA lysis buffer without prior fixation and then processed for either standard RNA-Seq or MACE RNA-Seq analysis. To assess the impact of FFPE preparation on MACE, a third part was fixed in formalin and processed for paraffin embedding, and its transcriptional profile was compared with the unfixed specimens analyzed by MACE. To investigate the impact of FFPE storage time on MACE results, 24 FFPE-treated conjunctival samples from 24 patients were analyzed as well. Nineteen thousand six hundred fifty-nine transcribed genes were detected by both MACE and standard RNA-Seq on fresh tissue, while 3251 and 2213 transcripts were identified explicitly by MACE or RNA-Seq, respectively. Standard RNA-Seq tended to yield longer detected transcripts more often than MACE technology despite normalization, indicating that the MACE technology is less susceptible to a length bias. FFPE processing revealed negligible effects on MACE sequencing results. Several quality-control measurements showed that long-term storage in paraffin did not decrease the diversity of MACE libraries. We noted a nonlinear relation between storage time and the number of raw reads with an accelerated decrease within the first 1000 days in paraffin, while the numbers remained relatively stable in older samples. Interestingly, the number of transcribed genes detected was independent of FFPE storage time. RNA of sufficient quality and quantity can be extracted from FFPE samples to obtain comprehensive transcriptome profiling using MACE technology. We thus present MACE as a novel opportunity for utilizing FFPE samples stored in histological archives.


2009 ◽  
Vol 296 (5) ◽  
pp. L713-L725 ◽  
Author(s):  
Li Gao ◽  
Kathleen C. Barnes

It has been well established that acute lung injury (ALI), and the more severe presentation of acute respiratory distress syndrome (ARDS), constitute complex traits characterized by a multigenic and multifactorial etiology. Identification and validation of genetic variants contributing to disease susceptibility and severity have been hampered by the profound heterogeneity of the clinical phenotype and the role of environmental factors, which include treatment, on outcome. The critical nature of ALI and ARDS, compounded by the impact of phenotypic heterogeneity, has rendered the amassing of sufficiently powered studies especially challenging. Nevertheless, progress has been made in the identification of genetic variants in select candidate genes, which has enhanced our understanding of the specific pathways involved in disease manifestation. Identification of novel candidate genes for which genetic association studies have confirmed a role in disease has been greatly aided by the powerful tool of high-throughput expression profiling. This article will review these studies to date, summarizing candidate genes associated with ALI and ARDS, acknowledging those that have been replicated in independent populations, with a special focus on the specific pathways for which candidate genes identified so far can be clustered.


2015 ◽  
Vol 9S4 ◽  
pp. BBI.S29334 ◽  
Author(s):  
Jessica P. Hekman ◽  
Jennifer L Johnson ◽  
Anna V. Kukekova

Domesticated species occupy a special place in the human world due to their economic and cultural value. In the era of genomic research, domesticated species provide unique advantages for investigation of diseases and complex phenotypes. RNA sequencing, or RNA-seq, has recently emerged as a new approach for studying transcriptional activity of the whole genome, changing the focus from individual genes to gene networks. RNA-seq analysis in domesticated species may complement genome-wide association studies of complex traits with economic importance or direct relevance to biomedical research. However, RNA-seq studies are more challenging in domesticated species than in model organisms. These challenges are at least in part associated with the lack of quality genome assemblies for some domesticated species and the absence of genome assemblies for others. In this review, we discuss strategies for analyzing RNA-seq data, focusing particularly on questions and examples relevant to domesticated species.


2014 ◽  
Author(s):  
Gregory A Moyerbrailean ◽  
Chris T Harvey ◽  
Cynthia A Kalita ◽  
Xiaoquan Wen ◽  
Francesca Luca ◽  
...  

Ongoing large experimental characterization is crucial to determine all regulatory sequences, yet we do not know which genetic variants in those regions are non-silent. Here, we present a novel analysis integrating sequence and DNase I footprinting data for 653 samples to predict the impact of a sequence change on transcription factor binding for a panel of 1,372 motifs. Most genetic variants in footprints (5,810,227) do not show evidence of allele-specific binding (ASB). In contrast, functional genetic variants predicted by our computational models are highly enriched for ASB (3,217 SNPs at 20% FDR). Comparing silent to functional non-coding genetic variants, the latter are 1.22-fold enriched for GWAS traits, have lower allele frequencies, and affect footprints more distal to promoters or active in fewer tissues. Finally, integration of the annotations into 18 GWAS meta-studies improves identification of likely causal SNPs and transcription factors relevant for complex traits.


2016 ◽  
Author(s):  
Alan Medlar ◽  
Laura Laakso ◽  
Andreia Miraldo ◽  
Ari Löytynoja

Abstract High-throughput RNA-seq data has become ubiquitous in the study of non-model organisms, but its use in comparative analysis remains a challenge. Without a reference genome for mapping, sequence data has to be de novo assembled, producing large numbers of short, highly redundant contigs. Preparing these assemblies for comparative analyses requires the removal of redundant isoforms, assignment of orthologs and converting fragmented transcripts into gene alignments. In this article we present Glutton, a novel tool to process transcriptome assemblies for downstream evolutionary analyses. Glutton takes as input a set of fragmented, possibly erroneous transcriptome assemblies. Utilising phylogeny-aware alignment and reference data from a closely related species, it reconstructs one transcript per gene, finds orthologous sequences and produces accurate multiple alignments of coding sequences. We present a comprehensive analysis of Glutton’s performance across a wide range of divergence times between study and reference species. We demonstrate the impact choice of assembler has on both the number of alignments and the correctness of ortholog assignment and show substantial improvements over heuristic methods, without sacrificing correctness. Finally, using inference of Darwinian selection as an example of downstream analysis, we show that Glutton-processed RNA-seq data give results comparable to those obtained from full length gene sequences even with distantly related reference species. Glutton is available from http://wasabiapp.org/software/glutton/ and is licensed under the GPLv3.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 55-55
Author(s):  
Shengfa F Liao ◽  
M Shamimul Hasan

Abstract In life science, the RNA sequencing (RNA-seq) technique is a state-of-the-art research approach for tissue or cell transcriptome analyses. In recent years, RNA-seq has been applied to profile gene expression in response to dietary nutrients or feed additives to gain a thorough understanding of the complex nutrient-gene interactions in agricultural animals. In this presentation, we will selectively review the application of the RNA-seq technique in nutrigenomics studies in swine. Such studies have investigated the impact of various sources and quantities of dietary fatty acids, protein (including alternative protein), energy, probiotics, and plant-derived bioactive compounds on gene expression in major metabolic tissues, such as liver, muscle, and adipose. Although the RNA-seq methodology is a powerful quantitative tool for transcriptomics analysis, it still has various technical challenges and pitfalls throughout its practice steps, which include experiment design, sample collection, sample laboratory analysis, data statistical and bioinformatic analyses, and data interpretation. Currently, many options are available for use in some steps, but a thorough understanding of each option is critical for making the right decisions and avoiding inconclusive results. Therefore, this presentation will also provide an overview of the “best practices” for applying the RNA-seq technique in swine nutrigenomics studies, which include appropriately designing experiments, collecting samples, and analyzing the data in order to have confidence in the results obtained from this approach. In short, the aims of this presentation are to provide some basic guidelines for researchers new to the field and to promote a discussion of standardization or “best practices” of RNA-seq methodology for animal nutrigenomics studies.


Author(s):  
Basten L Snoek ◽  
Mark G Sterken ◽  
Harm Nijveen ◽  
Rita J M Volkers ◽  
Joost Riksen ◽  
...  

Abstract Studying genetic variation of gene expression provides a powerful way to unravel the molecular components underlying complex traits. Expression QTL studies have been performed in several different model species, yet most of these linkage studies have been based on genetic segregation of two parental alleles. Recently we developed a multi-parental segregating population of 200 recombinant inbred lines (mpRILs) derived from four wild isolates (JU1511, JU1926, JU1931 and JU1941) in the nematode Caenorhabditis elegans. We used RNA-seq to investigate how multiple alleles affect gene expression in these mpRILs. We found 1,789 genes differentially expressed between the parental lines. Transgression, expression beyond any of the parental lines in the mpRILs, was found for 7,896 genes. For expression QTL mapping almost 9,000 SNPs were available. By combining these SNPs and the RNA-seq profiles of the mpRILs, we detected almost 6,800 eQTLs. Most trans-eQTLs (63%) co-locate in six newly identified trans-bands. The trans-eQTLs found in previous 2-parental allele eQTL experiments and this study showed some overlap (17.5%-46.8%), highlighting, on the one hand, that a large group of genes is affected by polymorphic regulators across populations and conditions and, on the other hand, that the mpRIL population allows identification of novel gene expression regulatory loci. Taken together, the analysis of our mpRIL population provides a more refined insight into C. elegans complex trait genetics and eQTLs in general, as well as a starting point to further test and develop advanced statistical models for detection of multi-allelic eQTLs and systems genetics studying the genotype-phenotype relationship.


2020 ◽  
Author(s):  
Benjamin Kellman ◽  
Hratch Baghdassarian ◽  
Tiziano Pramparo ◽  
Isaac Shamie ◽  
Vahid Gazestani ◽  
...  

Abstract Background: Both RNA-Seq and sample freeze-thaw are ubiquitous. However, knowledge about the impact of freeze-thaw on downstream analyses is limited. The lack of common quality metrics that are sufficiently sensitive to freeze-thaw and RNA degradation, e.g. the RNA Integrity Score, makes such assessments challenging. Results: Here we quantify the impact of repeated freeze-thaw cycles on the reliability of RNA-Seq by examining poly(A)-enriched and ribosomal RNA depleted RNA-seq from frozen leukocytes drawn from a toddler Autism cohort. To do so, we estimate the relative noise, or percentage of random counts, separating technical replicates. Using this approach we measured noise associated with RIN and freeze-thaw cycles. As expected, RIN does not fully capture sample degradation due to freeze-thaw. We further examined differential expression results and found that three freeze-thaws should extinguish the differential expression reproducibility of similar experiments. Freeze-thaw also resulted in a 3’ shift in the read coverage distribution along the gene body of poly(A)-enriched samples compared to ribosomal RNA depleted samples, suggesting that library preparation may exacerbate freeze-thaw-induced sample degradation. Conclusion: The use of poly(A)-enrichment for RNA sequencing is pervasive in library preparation of frozen tissue, and thus, it is important during experimental design and data analysis to consider the impact of repeated freeze-thaw cycles on reproducibility.


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Xin Chen ◽  
Yiran Zhang ◽  
Juan Xie ◽  
Cankun Wang ◽  
...  

Abstract Motivation: One of the main benefits of using modern RNA-sequencing (RNA-Seq) technology is the more accurate gene expression estimations compared with previous generations of expression data, such as the microarray. However, numerous issues can result in the possibility that an RNA-Seq read can be mapped to multiple locations on the reference genome with the same alignment scores, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of these MMRs is reflected in gene expression estimation and all downstream analyses, including differential gene expression, functional enrichment, etc. Current analysis pipelines lack the tools to effectively test the reliability of gene expression estimations, and thus are incapable of ensuring the validity of all downstream analyses. Results: Our investigation into 95 RNA-Seq datasets from seven species (totaling 1,951 GB) indicates that an average of roughly 22% of all reads are MMRs for plant and animal species. Here we present a tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene’s expression level. The underlying algorithm is designed based on extracted genomic and transcriptomic features, which are then combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to determine reliable expression estimations and conduct further analysis on the gene expression that is of sufficient quality. This tool also enables researchers to investigate continued re-alignment methods to determine more accurate gene expression estimates for those with low reliability. Availability: GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/. Supplementary information: Supplementary data are available at Bioinformatics online.

