Recent Applications of RNA Sequencing in Food and Agriculture

RNA sequencing (RNA-Seq) is the leading next-generation sequencing (NGS) approach for mapping and quantifying transcriptomes and for determining transcriptional structure; it is now routine, high-throughput, and cost-effective. The transcriptome is the complete collection of transcripts found in a cell, tissue, or organism at a given time point or under a specific developmental, environmental, or physiological condition. The emergence and evolution of RNA-Seq chemistries have changed the landscape and the pace of transcriptome research in the life sciences over the past decade. This chapter introduces RNA-Seq and surveys its recent applications in food and agriculture, ranging from differential gene expression, variant calling and detection, allele-specific expression, alternative splicing, alternative polyadenylation site usage, microRNA profiling, circular RNAs, single-cell RNA-Seq, and metatranscriptomics to systems biology. A few popular RNA-Seq databases and analysis tools are also presented for each application. We are beginning to witness the broader impact of RNA-Seq in addressing complex biological questions in food and agriculture.


Uncovering the Complexity of Transcriptomes with RNA-Seq

Journal of Biomedicine and Biotechnology ◽

10.1155/2010/853916 ◽

2010 ◽

Vol 2010 ◽

pp. 1-19 ◽

Cited By ~ 185

Author(s):

Valerio Costa ◽

Claudia Angelini ◽

Italia De Feis ◽

Alfredo Ciccodicola

Keyword(s):

Massively Parallel Sequencing ◽

Point Of View ◽

Rna Seq ◽

Specific Expression ◽

Allele Specific ◽

Data Files ◽

Sequencing Platforms ◽

New Applications ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

In recent years, the introduction of massively parallel sequencing platforms for Next Generation Sequencing (NGS) protocols, able to sequence hundreds of thousands of DNA fragments simultaneously, has dramatically changed the landscape of genetic studies. RNA-Seq for transcriptome studies, ChIP-Seq for DNA-protein interactions, and CNV-Seq for large genome nucleotide variations are only some of the intriguing new applications supported by these innovative platforms. Among them, RNA-Seq is perhaps the most complex NGS application. Expression levels of specific genes, differential splicing, and allele-specific expression of transcripts can be accurately determined by RNA-Seq experiments to address many biological issues. These attributes are not readily achievable with the previously widespread hybridization-based or tag sequence-based approaches. However, the unprecedented level of sensitivity and the large amount of data produced by NGS platforms bring clear advantages as well as new challenges. While this technology offers great power to make new biological observations and discoveries, it also requires considerable effort in the development of new bioinformatics tools to deal with these massive data files. The paper surveys the RNA-Seq methodology, focusing in particular on the challenges that this application presents from both a biological and a bioinformatics point of view.


Investigation of allele specific expression in various tissues of broiler chickens using the detection tool VADT

Scientific Reports ◽

10.1038/s41598-021-83459-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

M. Joseph Tomlinson ◽

Shawn W. Polson ◽

Jing Qiu ◽

Juniper A. Lake ◽

William Lee ◽

...

Keyword(s):

Broiler Chickens ◽

Nucleotide Polymorphisms ◽

Rna Seq ◽

Specific Expression ◽

Single Nucleotide ◽

Allele Specific Expression ◽

Detection Tool ◽

Commercial Broiler ◽

Significant Phenomenon ◽

Allele Specific

Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assist in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large subset of the samples (n = 68). ASE analysis was performed using a custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
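The binomial test mentioned in this abstract can be illustrated with a minimal sketch. This is not VADT's code: the function names, the null proportion of 0.5, and the significance level are assumptions made for illustration, and no multiple-testing correction or read filtering is applied.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k. Intended for modest read depths (n up to roughly 1000);
    deeper coverage would need log-space arithmetic.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def shows_ase(ref_reads: int, alt_reads: int, alpha: float = 0.05) -> bool:
    """Flag a biallelic SNP when its allele counts depart from a 50:50 split.

    alpha is a hypothetical per-SNP cut-off, not a value taken from the study.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return False  # no reads, so the site is not informative
    return binomial_two_sided_p(ref_reads, n) < alpha
```

For instance, `shows_ase(30, 10)` flags the site, while `shows_ase(22, 18)` does not. In a genome-wide screen the per-SNP p-values would normally be adjusted for multiple testing before calling ASE.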


Replicate sequencing libraries are important for quantification of allelic imbalance

Nature Communications ◽

10.1038/s41467-021-23544-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Asia Mendelevich ◽

Svetlana Vinogradova ◽

Saumya Gupta ◽

Andrey A. Mironov ◽

Shamil R. Sunyaev ◽

...

Keyword(s):

Allelic Imbalance ◽

False Positive Rate ◽

Error Rates ◽

Differential Analysis ◽

Rna Seq ◽

Specific Expression ◽

Technical Noise ◽

Specific Analysis ◽

Positive Rate ◽

Allele Specific

A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
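Why a second library is needed to see technical noise can be sketched as follows. This is not the Qllelic algorithm, whose details the abstract does not give; it is a simplified illustration that assumes two replicate libraries per sample and compares the observed disagreement between their allelic fractions with what pure binomial sampling would predict. The function name and input layout are hypothetical.

```python
def overdispersion_factor(replicate_pairs):
    """Estimate how much replicate disagreement exceeds binomial expectation.

    replicate_pairs: iterable of ((ref1, alt1), (ref2, alt2)) read counts for
    the same site in two replicate libraries (hypothetical input layout).
    Returns the mean ratio of the squared difference in allelic fraction to
    its expected binomial variance. A value near 1 is consistent with pure
    sampling noise; larger values suggest extra technical noise.
    """
    ratios = []
    for (r1, a1), (r2, a2) in replicate_pairs:
        n1, n2 = r1 + a1, r2 + a2
        if n1 == 0 or n2 == 0:
            continue  # site not covered in one of the libraries
        pooled = (r1 + r2) / (n1 + n2)  # pooled reference-allele fraction
        expected_var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
        if expected_var == 0:
            continue  # only one allele observed, so there is no variance scale
        diff = r1 / n1 - r2 / n2
        ratios.append(diff**2 / expected_var)
    if not ratios:
        raise ValueError("no informative sites")
    return sum(ratios) / len(ratios)
```

With a single library there is no second allelic fraction to compare against, so a quantity of this kind cannot be computed at all, which is the design point the abstract makes.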


Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Nature Communications ◽

10.1038/s41467-021-21894-x ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ryan Lusk ◽

Evan Stene ◽

Farnoush Banaei-Kashani ◽

Boris Tabakoff ◽

Katerina Kechris ◽

...

Keyword(s):

Rna Sequencing ◽

Dna Sequence ◽

Mammalian Species ◽

Alternative Polyadenylation ◽

Sequence Information ◽

Rna Seq ◽

Average Precision ◽

Polyadenylation Sites ◽

Dna Nucleotide Sequence

Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
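The evaluation metrics named in this abstract (precision, recall, and average precision) can be computed as in the sketch below. It assumes that predicted and true polyadenylation sites have already been reduced to exactly comparable keys, such as a chromosome, strand, and position bin; that matching rule and the function names are illustrative assumptions, not aptardi's evaluation code.

```python
def precision_recall(predicted: set, truth: set):
    """Return (precision, recall) for a set of predicted sites."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def average_precision(ranked_predictions, truth: set) -> float:
    """Average the precision observed at each rank where a true site appears.

    ranked_predictions: unique sites ordered from most to least confident.
    True sites that are never predicted contribute zero to the average.
    """
    hits = 0
    running_total = 0.0
    for rank, site in enumerate(ranked_predictions, start=1):
        if site in truth:
            hits += 1
            running_total += hits / rank
    return running_total / len(truth) if truth else 0.0
```

Recall here matches the definition given in the abstract: the proportion of true polyadenylation sites that the algorithm detects.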


Analysis of Allele-Specific Expression in Mouse Liver by RNA-Seq: A Comparison With Cis-eQTL Identified Using Genetic Linkage

Genetics ◽

10.1534/genetics.113.153882 ◽

2013 ◽

Vol 195 (3) ◽

pp. 1157-1166 ◽

Cited By ~ 34

Author(s):

Sandrine Lagarrigue ◽

Lisa Martin ◽

Farhad Hormozdiari ◽

Pierre-François Roux ◽

Calvin Pan ◽

...

Keyword(s):

Mouse Liver ◽

Genetic Linkage ◽

Rna Seq ◽

Specific Expression ◽

Allele Specific Expression ◽

Allele Specific


DEBKS: A Tool to Detect Differentially Expressed Circular RNA

10.1101/2020.10.14.336982 ◽

2020 ◽

Author(s):

Zelin Liu ◽

Huiru Ding ◽

Jianqi She ◽

Chunhua Chen ◽

Weiguang Zhang ◽

...

Keyword(s):

Open Source ◽

Rna Sequencing ◽

Open Source Software ◽

Simulated Data ◽

Circular Rna ◽

Host Gene ◽

Circular Rnas ◽

Biological Processes ◽

Rna Seq ◽

Disease Pathogenesis

Circular RNAs (circRNAs) are involved in various biological processes and in disease pathogenesis. However, only a small number of functional circRNAs have been identified among hundreds of thousands of circRNA species, partly because most current methods are based on circular junction counts and overlook the fact that circRNA is formed from the host gene by back-splicing (BS). To distinguish between expression originating from BS and that from the host gene, we present DEBKS, a software program to streamline the discovery of differential BS between two rRNA-depleted RNA sequencing (RNA-seq) sample groups. By applying real and simulated data and employing RT-qPCR for validation, we demonstrate that DEBKS is efficient and accurate in detecting circRNAs with differential BS events between paired and unpaired sample groups. DEBKS is available at https://github.com/yangence/DEBKS as open-source software.
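The distinction drawn here between raw circular junction counts and back-splicing relative to the host gene can be illustrated with a toy calculation. The ratio below is a hypothetical stand-in chosen for illustration; the abstract does not describe the statistic DEBKS actually uses, and the function names and input layout are assumptions.

```python
def back_splice_ratio(bs_reads: int, linear_reads: int) -> float:
    """Share of junction reads at a locus that support back-splicing.

    bs_reads: reads spanning the back-splice (circular) junction.
    linear_reads: reads spanning the host gene's linear junctions at the
    same splice sites (hypothetical input definition).
    """
    total = bs_reads + linear_reads
    if total == 0:
        raise ValueError("no junction reads at this locus")
    return bs_reads / total


def mean_ratio_difference(group_a, group_b) -> float:
    """Difference in mean back-splice ratio between two sample groups.

    Each group is a list of (bs_reads, linear_reads) tuples, one per sample.
    """
    def mean_ratio(samples):
        ratios = [back_splice_ratio(bs, lin) for bs, lin in samples]
        return sum(ratios) / len(ratios)

    return mean_ratio(group_a) - mean_ratio(group_b)
```

A sample with counts `(10, 90)` and another with `(20, 180)` differ two-fold in circular junction counts but have the same ratio of 0.1, so a count-based comparison would report a change that reflects host gene expression rather than back-splicing.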


Variant calling from RNA-seq data of the brain transcriptome of pigs and its application for allele-specific expression and imprinting analysis

Gene ◽

10.1016/j.gene.2017.10.076 ◽

2018 ◽

Vol 641 ◽

pp. 367-375 ◽

Cited By ~ 6

Author(s):

Maria Oczkowicz ◽

Tomasz Szmatoła ◽

Katarzyna Piórkowska ◽

Katarzyna Ropka-Molik

Keyword(s):

Variant Calling ◽

Rna Seq ◽

Specific Expression ◽

Allele Specific Expression ◽

Brain Transcriptome ◽

Allele Specific ◽

The Brain


Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression

Bioinformatics ◽

10.1093/bioinformatics/bty078 ◽

2018 ◽

Vol 34 (13) ◽

pp. 2177-2184 ◽

Cited By ~ 33

Author(s):

Narayanan Raghupathy ◽

Kwangbom Choi ◽

Matthew J Vincent ◽

Glen L Beane ◽

Keith S Sheppard ◽

...

Keyword(s):

Hierarchical Analysis ◽

Rna Seq ◽

Specific Expression ◽

Allele Specific Expression ◽

Allele Specific


Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets

Nature Communications ◽

10.1038/s41467-020-18320-z ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 2

Author(s):

Emily Berger ◽

Deniz Yorukoglu ◽

Lillian Zhang ◽

Sarah K. Nyquist ◽

Alex K. Shalek ◽

...

Keyword(s):

Long Range ◽

Genetic Variants ◽

Read Length ◽

Rna Seq ◽

Sequencing Data ◽

Specific Expression ◽

Integrative Framework ◽

Whole Exome ◽

Allele Specific ◽

Diverse Data

Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus enabling the use of reads that cover only individual variants. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
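The observation that allelic imbalance can link variants on the same chromosome can be shown with a deliberately simple, deterministic toy. HapTree-X itself is a probabilistic framework and this is not its algorithm; the sketch assumes two heterozygous SNPs in the same gene that share one expression imbalance, and the `min_fraction` cut-off is a hypothetical parameter.

```python
def major_allele(counts: dict, min_fraction: float):
    """Return (major, minor) alleles if the imbalance is strong enough.

    counts maps each of exactly two alleles to its RNA-seq read count.
    Returns None when the site is uncovered or too close to balanced.
    """
    (a1, c1), (a2, c2) = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = c1 + c2
    if total == 0 or c1 / total < min_fraction:
        return None
    return a1, a2


def phase_pair(snp1_counts: dict, snp2_counts: dict, min_fraction: float = 0.65):
    """Place the higher-expressed alleles of two SNPs on the same haplotype.

    Returns two haplotypes as tuples, or None if either SNP is uninformative.
    """
    first = major_allele(snp1_counts, min_fraction)
    second = major_allele(snp2_counts, min_fraction)
    if first is None or second is None:
        return None
    return (first[0], second[0]), (first[1], second[1])
```

For example, `phase_pair({"A": 80, "G": 20}, {"C": 15, "T": 85})` pairs `A` with `T` and `G` with `C`, even though no single read covers both variants.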


Estimating the Allele-Specific Expression of SNVs From 10× Genomics Single-Cell RNA-Sequencing Data

Genes ◽

10.3390/genes11030240 ◽

2020 ◽

Vol 11 (3) ◽

pp. 240 ◽

Cited By ~ 2

Author(s):

Prashant N. M. ◽

Hongyu Liu ◽

Pavlos Bousounis ◽

Liam Spurr ◽

Nawaf Alomran ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Sequencing Data ◽

Specific Expression ◽

Single Nucleotide ◽

Healthy Donors ◽

Allele Expression ◽

Single Cell Rna Sequencing ◽

Allele Specific

With the recent advances in single-cell RNA-sequencing (scRNA-seq) technologies, the estimation of allele expression from single cells is becoming increasingly reliable. Allele expression is both quantitative and dynamic and is an essential component of the genomic interactome. Here, we systematically estimate the allele expression from heterozygous single nucleotide variant (SNV) loci using scRNA-seq data generated on the 10× Genomics Chromium platform. We analyzed 26,640 human adipose-derived mesenchymal stem cells (from three healthy donors), sequenced to an average of 150K sequencing reads per cell (more than 4 billion scRNA-seq reads in total). High-quality SNV calls assessed in our study contained approximately 15% exonic and >50% intronic loci. To analyze the allele expression, we estimated the expressed variant allele fraction (VAF_RNA) from SNV-aware alignments and analyzed its variance and distribution (mono- and bi-allelic) at different minimum sequencing read thresholds. Our analysis shows that when assessing positions covered by a minimum of three unique sequencing reads, over 50% of the heterozygous SNVs show bi-allelic expression, while at a threshold of 10 reads, nearly 90% of the SNVs are bi-allelic. In addition, our analysis demonstrates the feasibility of scVAF_RNA estimation from current scRNA-seq datasets and shows that the 3′-based library generation protocol of 10× Genomics scRNA-seq data can be informative in SNV-based studies, including analyses of transcriptional kinetics.
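The expressed variant allele fraction and its dependence on a minimum read threshold can be sketched as below. The abstract does not state the exact rule used to call a site mono- or bi-allelic, so this sketch assumes that a site is bi-allelic when at least one read supports each allele; the function names and input layout are likewise hypothetical.

```python
def vaf_rna(ref_reads: int, alt_reads: int) -> float:
    """Expressed variant allele fraction: variant reads over all reads."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return alt_reads / total


def biallelic_fraction(sites, min_reads: int):
    """Fraction of sufficiently covered sites at which both alleles are seen.

    sites: list of (ref_reads, alt_reads) per heterozygous SNV in one cell.
    Sites with fewer than min_reads total reads are excluded.
    Returns None when no site passes the coverage threshold.
    """
    covered = [(r, a) for r, a in sites if r + a >= min_reads]
    if not covered:
        return None
    biallelic = sum(1 for r, a in covered if r > 0 and a > 0)
    return biallelic / len(covered)
```

Raising `min_reads` removes thinly covered sites, where seeing only one allele is often a sampling effect, which is consistent with the higher bi-allelic share the abstract reports at a threshold of 10 reads than at three.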
