Full-Length Transcriptome Sequencing of the Scleractinian Coral Montipora foliosa Reveals the Gene Expression Profile of Coral–Zooxanthellae Holobiont

Biology ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1274
Author(s):  
Yunqing Liu ◽  
Xin Liao ◽  
Tingyu Han ◽  
Ao Su ◽  
Zhuojun Guo ◽  
...  

Coral–zooxanthellae holobionts are among the most productive ecosystems in the ocean. With global warming and ocean acidification, coral ecosystems face unprecedented challenges, and protecting them requires understanding the coral–zooxanthellae symbiosis. Although transcriptomes of some Scleractinia (stony corals) have been sequenced, a reliable full-length transcriptome is still lacking because of the short read lengths of second-generation sequencing and the uncertainty of the resulting assemblies. Here, PacBio Sequel II sequencing, polished with Illumina RNA-seq data, was used to obtain a relatively complete transcriptome of the scleractinian coral M. foliosa and to quantify M. foliosa gene expression. A total of 38,365 consensus sequences and 20,751 unique genes were identified. Seven databases were used for gene function annotation, and 19,972 genes were annotated in at least one database. We found 131 zooxanthellae transcripts and 18,829 M. foliosa transcripts. A total of 6328 lncRNAs, 847 M. foliosa transcription factors (TFs), and 2 zooxanthellae TFs were identified. In zooxanthellae, we found pathways related to symbiosis, such as photosynthesis and nitrogen metabolism; in M. foliosa, symbiosis-related pathways include oxidative phosphorylation and nitrogen metabolism, among others. We summarized the isoforms and expression levels of the symbiont recognition genes. Among the membrane proteins, we found three glycan biosynthesis pathways, which may be involved in organic matter storage and monosaccharide stabilization in M. foliosa. Our results provide a better resource for studying coral symbiosis.
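Counting genes "annotated in at least one database" amounts to taking the union of the per-database hit sets. A minimal sketch, assuming the hits from each database have already been reduced to sets of gene IDs; the database names and gene IDs below are hypothetical.

```python
from typing import Dict, Set


def annotated_in_any(hits: Dict[str, Set[str]]) -> Set[str]:
    """Union of all per-database hit sets: genes annotated in at least one database."""
    union: Set[str] = set()
    for genes in hits.values():
        union |= genes
    return union


def unannotated(all_genes: Set[str], hits: Dict[str, Set[str]]) -> Set[str]:
    """Genes with no hit in any database."""
    return all_genes - annotated_in_any(hits)


# Hypothetical toy input: two databases, four genes.
all_genes = {"g1", "g2", "g3", "g4"}
hits = {"db_A": {"g1", "g2"}, "db_B": {"g2", "g3"}}
print(len(annotated_in_any(hits)))           # 3
print(sorted(unannotated(all_genes, hits)))  # ['g4']
```

Applied to the counts reported above, and assuming the 19,972 annotated genes are a subset of the 20,751 unique genes, 20,751 − 19,972 = 779 genes would fall in the unannotated set.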

Author(s):  
Marine Guilcher ◽  
Arnaud Liehrmann ◽  
Chloé Seyman ◽  
Thomas Blein ◽  
Guillem Rigaill ◽  
...  

Plastid gene expression involves many post-transcriptional maturation steps, resulting in a complex transcriptome composed of multiple isoforms. Although short-read RNA-seq has considerably improved our understanding of the molecular mechanisms controlling these processes, it cannot sequence full-length transcripts. This information is, however, crucial for understanding the interplay between the various steps of plastid gene expression. Here, a study of the Arabidopsis leaf plastid transcriptome using Nanopore sequencing showed that many splicing and editing events were not independent but co-occurred. For a given transcript, maturation events also appeared to be chronologically ordered, with splicing happening after most sites were edited.
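If splicing tends to happen after editing, spliced reads should carry a higher fraction of edited sites than unspliced reads of the same transcript. A minimal sketch of that comparison, assuming each long read has already been summarized as a splicing status plus counts of covered and edited sites; the `Read` record is a hypothetical representation, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Read:
    """Hypothetical per-read summary derived from long-read alignments."""
    spliced: bool  # True if the intron is removed in this read
    edited: int    # editing sites observed in the edited state
    covered: int   # editing sites covered by the read


def edited_fraction(reads: List[Read]) -> float:
    """Pooled fraction of covered editing sites that are edited (NaN if none are covered)."""
    covered = sum(r.covered for r in reads)
    if covered == 0:
        return float("nan")
    return sum(r.edited for r in reads) / covered


def compare_by_splicing(reads: List[Read]) -> Tuple[float, float]:
    """Return (edited fraction in spliced reads, edited fraction in unspliced reads)."""
    spliced = [r for r in reads if r.spliced]
    unspliced = [r for r in reads if not r.spliced]
    return edited_fraction(spliced), edited_fraction(unspliced)


reads = [Read(True, 4, 4), Read(True, 3, 4), Read(False, 1, 4), Read(False, 2, 4)]
print(compare_by_splicing(reads))  # (0.875, 0.375)
```

A higher edited fraction among spliced reads is consistent with editing preceding splicing; it is a descriptive comparison, not a test of ordering on its own.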


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3091 ◽  
Author(s):  
Anna V. Klepikova ◽  
Artem S. Kasianov ◽  
Mikhail S. Chesnokov ◽  
Natalia L. Lazarevich ◽  
Aleksey A. Penin ◽  
...  

Background
RNA-seq is a useful tool for gene expression analysis. However, its robustness is strongly affected by a number of artifacts, one of which is the presence of duplicated reads.
Results
To assess how different methods of removing duplicated reads influence gene expression estimates in cancer genomics, we analyzed paired samples of hepatocellular carcinoma (HCC) and non-tumor liver tissue. Four data analysis protocols were applied to each sample: processing without deduplication, deduplication using the method implemented in samtools, and deduplication based on one or two molecular indices (MI). We also analyzed the influence of sequencing layout (single-end or paired-end) and read length. We found that deduplication without MI greatly affects estimated expression values; this effect is most pronounced for highly expressed genes.
Conclusion
The use of unique molecular identifiers greatly improves the accuracy of RNA-seq analysis, especially for highly expressed genes. We developed a set of scripts that handle MI and incorporate them into RNA-seq analysis pipelines. Deduplication without MI affects the results of differential gene expression analysis, producing a high proportion of false negatives, whereas omitting duplicate removal biases results towards false positives. Where using MI is not possible, we recommend a paired-end sequencing layout.
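The difference between position-only deduplication and MI-aware deduplication can be illustrated with exact-match keys. A minimal sketch, assuming single-end reads already summarized by chromosome, 5' position, strand, and molecular index; the `Read` record and the toy reads are hypothetical, and this is not the samtools implementation.

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # 5' mapping position of the read
    strand: str
    umi: str     # molecular index (MI) attached to the original molecule


def dedup_by_position(reads: Iterable[Read]) -> int:
    """Reads kept when every read sharing a mapping position is treated as a PCR duplicate."""
    return len({(r.chrom, r.start, r.strand) for r in reads})


def dedup_by_umi(reads: Iterable[Read]) -> int:
    """Reads kept when only reads sharing both position and molecular index are collapsed."""
    return len({(r.chrom, r.start, r.strand, r.umi) for r in reads})


# Four reads from a highly expressed gene, all starting at the same position,
# derived from three distinct molecules (one molecule was amplified twice).
reads = [
    Read("chr1", 100, "+", "AAC"),
    Read("chr1", 100, "+", "AAC"),
    Read("chr1", 100, "+", "GTA"),
    Read("chr1", 100, "+", "TTG"),
]
print(len(reads), dedup_by_position(reads), dedup_by_umi(reads))  # 4 1 3
```

Position-only deduplication collapses three genuine molecules into one, and such chance collisions become more frequent as expression rises. Paired-end data add the mate's position to the key, which makes collisions rarer. Production tools also tolerate sequencing errors in the MI, which this exact-match sketch ignores.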


2018 ◽  
Author(s):  
Wan R. Yang ◽  
Daniel Ardeljan ◽  
Clarissa N. Pacyna ◽  
Lindsay M. Payer ◽  
Kathleen H. Burns

Transposable elements are interspersed repeat sequences that make up much of the human genome. Conventional approaches to RNA-seq analysis often exclude these sequences, fail to optimally adjudicate read alignments, or align reads to interspersed repeat consensus sequences without considering these transcripts in their genomic context. As a result, the contribution of repetitive sequences to transcriptomes is not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), an RNA-seq analysis pipeline that integrates repeat and genome annotation (RepeatMasker), read alignment (STAR), gene expression (StringTie), and differential expression (DESeq2). SQuIRE uniquely provides a locus-specific picture of interspersed repeat-encoded RNA expression. SQuIRE can be downloaded at github.com/wyang17/SQuIRE.
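The contrast between consensus-style and locus-specific quantification can be shown with two counting keys. A minimal sketch, assuming each read has already been assigned to a single annotated repeat copy (how multi-mapping reads are apportioned is not shown); the `RepeatHit` record and coordinates are hypothetical, and this is not SQuIRE's code.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class RepeatHit(NamedTuple):
    read_id: str
    subfamily: str  # RepeatMasker subfamily name
    chrom: str
    start: int
    end: int


def count_by_subfamily(hits: Iterable[RepeatHit]) -> Counter:
    """Consensus-style counting: all genomic copies of a subfamily are pooled."""
    return Counter(h.subfamily for h in hits)


def count_by_locus(hits: Iterable[RepeatHit]) -> Counter:
    """Locus-specific counting: each genomic copy keeps its own count."""
    return Counter((h.subfamily, h.chrom, h.start, h.end) for h in hits)


hits = [
    RepeatHit("r1", "L1HS", "chr1", 1000, 7000),
    RepeatHit("r2", "L1HS", "chr1", 1000, 7000),
    RepeatHit("r3", "L1HS", "chrX", 5000, 11000),
]
print(count_by_subfamily(hits))                             # Counter({'L1HS': 3})
print(count_by_locus(hits)[("L1HS", "chr1", 1000, 7000)])   # 2
```

The subfamily count hides that one copy accounts for two of the three reads; keeping the genomic coordinates in the key preserves that information for downstream differential expression.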


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 2331-2331
Author(s):  
Vikram R Paralkar ◽  
Tejaswini Mishra ◽  
Jing Luan ◽  
Yu Yao ◽  
Neeraja Konuthula ◽  
...  

Abstract 2331: Long noncoding RNAs (lncRNAs) are RNA transcripts longer than 200 nt that regulate gene expression independently of protein-coding potential. Thousands of lncRNAs are estimated to play vital roles in diverse cellular processes and to be involved in numerous diseases, including cancer. We hypothesize that multiple lncRNAs regulate erythrocyte and megakaryocyte formation by modulating gene expression. To identify lncRNAs in erythro-megakaryopoiesis, we purified two biological replicates each of murine Ter119+ erythroblasts, CD41+ megakaryocytes, and bipotential megakaryocyte-erythroid progenitors (MEPs) [Lin− Kit+, Sca1−, CD16/32−, CD34−]. We performed strand-specific, paired-end, 200-nt-read-length deep sequencing (RNA-Seq) to a depth of ∼200 million reads per sample on the Illumina GAII platform. We used the TopHat and Cufflinks suite of bioinformatic tools to assemble and compare de novo transcriptomes from these three cell types, producing a high-confidence set of 69,488 transcripts. We confirmed that the RNA-seq assemblies accurately reflect gene expression predicted from prior studies. For example, Ter119+ cells were highly enriched for key erythroid transcripts encoding globins, heme synthetic enzymes, and specialized membrane proteins. Megakaryocytes expressed high levels of genes encoding lineage-specific integrins and platelet markers. MEPs expressed numerous progenitor genes, including Gata2, Kit, and Myc. Thus, the RNA-seq data are of high quality and sufficient complexity to accurately represent erythroid, megakaryocytic, and MEP transcriptomes. We used a series of Unix-based bioinformatic filtering tools to identify lncRNAs expressed in these transcriptomes. We identified 605 “stringent” lncRNAs and 813 “potential noncoding” transcripts. Of the lncRNAs, 47% are novel unannotated transcripts, supporting the use of de novo RNA-Seq in unique cell populations for lncRNA discovery.
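The abstract does not list the filters used. A minimal sketch of the two criteria implied by its lncRNA definition (length above 200 nt, no substantial protein-coding potential), using the longest ORF as a stand-in for coding-potential assessment. The 100-codon cutoff and the single-strand ORF scan (reasonable for strand-specific libraries) are assumptions, not the authors' settings.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_LENGTH_NT = 200   # from the lncRNA definition in the abstract (> 200 nt)
MAX_ORF_CODONS = 100  # hypothetical coding-potential cutoff (assumption)


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-initiated ORF on the given strand, in codons (stop codon excluded).

    An ORF that runs off the 3' end without a stop is counted up to its last full codon.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        if start is not None:
            best = max(best, (len(seq) - start) // 3)
    return best


def is_candidate_lncrna(seq: str) -> bool:
    """Longer than 200 nt and no ORF reaching the (hypothetical) codon cutoff."""
    return len(seq) > MIN_LENGTH_NT and longest_orf_codons(seq) < MAX_ORF_CODONS


print(longest_orf_codons("ATGAAATAG"))  # 2
```

Dedicated coding-potential classifiers use richer features than ORF length; the ORF rule is chosen here only because it is easy to state and verify.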
Among the 605 “stringent” lncRNAs, 103 are erythroid-restricted, 133 are megakaryocyte-restricted and 280 are MEP-restricted, consistent with reports that lncRNAs exhibit exquisitely cell-type-specific expression. Current efforts are aimed at generating a more comprehensive map of lncRNA expression at specific stages of erythroid and megakaryocyte/platelet development, and at performing high-throughput functional screens to analyze the currently identified lncRNAs. Our studies are beginning to define new layers of gene regulation in normal erythro-megakaryopoiesis and are relevant to the pathophysiology of related disorders, including various anemias, myeloproliferative and myelodysplastic syndromes, and leukemias. Disclosures: No relevant conflicts of interest to declare.


2019 ◽  
Vol 21 (Supplement_6) ◽  
pp. vi101-vi101
Author(s):  
Piroon Jejaroenpun ◽  
Thidathip Wongsurawat ◽  
Annick DeLoose ◽  
David Ussery ◽  
Intawat Nookaew ◽  
...  

The RNA sequencing (RNA-Seq) technique is now routinely used in many research fields, including cancer research, to quantitatively explore genome-wide expression. The most common RNA-Seq methods produce billions of short reads of 100–600 base pairs, from which it is sometimes difficult to reconstruct isoform-level transcriptomes and fusion genes. These limitations of short reads can be overcome with third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT). This study aims to perform full-length cDNA sequencing on the ONT platform and to investigate its ability to (1) identify differential gene expression, (2) detect differential transcript isoform usage, and (3) detect fusion genes. To this end, CNS-1 cells were implanted into the frontal lobes of three Lewis rats. The CNS-1 model is a histocompatible astrocytoma cell line with an invasive pattern mimicking glioblastoma (GBM). Two weeks after transplantation, the tumors and the normal brain tissue from the contralateral side were collected as matched normal-tumor pairs. Total RNA extracted from the samples was subjected to full-length cDNA sequencing on a portable MinION sequencer. In tumor samples, 615 genes involved in the cell cycle were upregulated, whereas 1067 genes involved in neurological functions were downregulated. Finally, we could identify differential transcript isoform expression and fusion genes from the matched normal-tumor pairs. Overall, full-length sequencing of the cDNA molecules permitted a detailed characterization of differential gene expression, isoform complexity, and fusion genes. In the near future, we will apply these methods to human samples.
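Full-length reads make fusion detection conceptually simple: a single read whose aligned segments fall in two different genes is direct evidence for a fusion transcript. A minimal sketch, assuming reads have already been aligned and each read reduced to the list of genes its segments overlap; the input format, gene names, and the two-read support threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def fusion_candidates(read_genes: Dict[str, List[str]],
                      min_reads: int = 2) -> Dict[Tuple[str, str], int]:
    """Gene pairs supported by at least `min_reads` reads whose segments hit exactly two genes.

    read_genes maps a read ID to the genes overlapped by its aligned segments.
    Pairs are unordered, so fusion orientation is not recovered here.
    """
    support: Counter = Counter()
    for genes in read_genes.values():
        distinct = sorted(set(genes))
        if len(distinct) == 2:
            support[(distinct[0], distinct[1])] += 1
    return {pair: n for pair, n in support.items() if n >= min_reads}


read_genes = {
    "r1": ["GeneA", "GeneB"],
    "r2": ["GeneB", "GeneA"],
    "r3": ["GeneA"],
    "r4": ["GeneC", "GeneD"],
}
print(fusion_candidates(read_genes))  # {('GeneA', 'GeneB'): 2}
```

Requiring several supporting reads filters out chimeric artifacts from library preparation; dedicated fusion callers additionally check breakpoint consistency and orientation.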


2014 ◽  
Author(s):  
Nuno A Fonseca ◽  
John A Marioni ◽  
Alvis Brazma

Accurately quantifying gene expression levels is a key goal of RNA-sequencing experiments that assay the transcriptome. This typically requires aligning the generated short reads to the genome or transcriptome before quantifying the expression of predefined sets of genes. Differences between alignment and quantification tools can have a major effect on the inferred expression levels, with important consequences for biological interpretation. Here we address two main questions: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And how close are the inferred expression levels to the "true" expression levels? We evaluate fifty gene profiling pipelines on experimental and simulated data sets with different characteristics (e.g., read length and sequencing depth). Because the "ground truth" is unknown in real RNA-seq data sets, we used simulated data to assess the differences between the true expression and that reconstructed by the analysis pipelines. Although this approach does not account for all known biases present in RNA-seq data, it still allows the accuracy of the gene expression values inferred by different pipelines to be assessed. The results show that i) overall there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; ii) the error in the estimated gene expression values can vary considerably across genes; and iii) a small set of genes have expression estimates with consistently high error (across data sets and methods). Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
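On simulated data, the three findings correspond to simple computations: a correlation between true and estimated values, a per-gene error, and the set of genes whose error stays high across pipelines. A minimal sketch, assuming Spearman correlation, absolute log2 error with a pseudocount of 1, and a two-fold (log2 error > 1) threshold; these metric choices are illustrative, not necessarily the ones used in the study.

```python
import math
from typing import Dict, List, Sequence, Set


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of ranks (undefined if an input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def abs_log2_error(truth: Dict[str, float], est: Dict[str, float],
                   pseudo: float = 1.0) -> Dict[str, float]:
    """Per-gene |log2(est + pseudo) - log2(truth + pseudo)|."""
    return {g: abs(math.log2(est[g] + pseudo) - math.log2(truth[g] + pseudo)) for g in truth}


def consistently_high_error(errors: List[Dict[str, float]], threshold: float = 1.0) -> Set[str]:
    """Genes whose error exceeds `threshold` in every pipeline or data set."""
    return {g for g in errors[0] if all(e[g] > threshold for e in errors)}


# Hypothetical simulated truth and two pipelines' estimates.
genes = ["g1", "g2", "g3"]
truth = {"g1": 10, "g2": 100, "g3": 1000}
est_a = {"g1": 40, "g2": 90, "g3": 1000}
est_b = {"g1": 35, "g2": 100, "g3": 950}
print(spearman([truth[g] for g in genes], [est_a[g] for g in genes]))  # 1.0
print(consistently_high_error([abs_log2_error(truth, est_a), abs_log2_error(truth, est_b)]))  # {'g1'}
```

The toy data reproduce the pattern described above: a perfect rank correlation can coexist with one gene whose estimate is off by more than two-fold in every pipeline.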


2021 ◽  
Vol 22 (20) ◽  
pp. 11297
Author(s):  
Marine Guilcher ◽  
Arnaud Liehrmann ◽  
Chloé Seyman ◽  
Thomas Blein ◽  
Guillem Rigaill ◽  
...  

Plastid gene expression involves many post-transcriptional maturation steps, resulting in a complex transcriptome composed of multiple isoforms. Although short-read RNA-Seq has considerably improved our understanding of the molecular mechanisms controlling these processes, it cannot sequence full-length transcripts. This information is crucial, however, for understanding the interplay between the various steps of plastid gene expression. Here, we describe a protocol to study the plastid transcriptome using nanopore sequencing. In Arabidopsis thaliana leaves, with about 1.5 million strand-specific reads mapped to the chloroplast genome, we could recapitulate most of the complexity of the plastid transcriptome (polygenic transcripts, multiple isoforms associated with post-transcriptional processing) using virtual Northern blots. Although transcripts longer than about 2,500 nucleotides were missing, the study of the co-occurrence of editing and splicing events identified 42 pairs of events that did not occur independently. This study also highlighted a preferential chronology of maturation events, with splicing happening after most sites were edited.
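The abstract does not name the statistical test behind the 42 non-independent pairs. A common choice for asking whether two binary events on the same molecule occur independently is Fisher's exact test on a 2×2 table of reads covering both events. The sketch below assumes that choice, and assumes each read has already been classified as edited or unedited at one site and spliced or unspliced at one intron.

```python
from math import comb


def table_prob(x: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability that the top-left cell equals x, given the table margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: event 1 present / absent (e.g. site edited / unedited).
    Columns: event 2 present / absent (e.g. intron spliced / unspliced).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = table_prob(a, row1, col1, n)
    p = 0.0
    # Sum the probabilities of all tables with the same margins that are
    # no more likely than the observed one.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        px = table_prob(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(p, 1.0)


# Hypothetical read counts: edited+spliced, edited only, spliced only, neither.
p_value = fisher_two_sided(30, 2, 3, 25)
```

When every pair of editing and splicing events is tested, the resulting p-values would need a multiple-testing correction before pairs are called non-independent.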


2019 ◽  
Author(s):  
Camille Sessegolo ◽  
Corinne Cruaud ◽  
Corinne Da Silva ◽  
Audric Cologne ◽  
Marion Dubarry ◽  
...  

Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Gene expression levels are generally well captured by these technologies, but caveats remain because of the limited read length and the fact that RNA molecules must be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer that can sequence long reads and, most importantly, RNA molecules directly. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. For comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies and tested the TeloPrime preparation kit, which is dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T's. This bias is marked for runs of at least 15 T's but is already detectable for runs of at least 9 T's, and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline the bioinformatics challenges that remain for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current protocols that map reads to the genome largely overestimate their expression at the expense of their parent gene. The entire dataset is available from http://www.genoscope.cns.fr/externe/ONT_mouse_RNA.
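Estimating how many transcripts are exposed to the truncation bias reduces to scanning sequences for T runs at the two thresholds reported above (9 and 15). A minimal sketch, assuming the input is a dictionary of sense-strand transcript sequences for the expressed transcripts; the transcript IDs and sequences are hypothetical, and the sketch counts transcripts, not truncated reads.

```python
import re
from typing import Dict


def has_t_run(seq: str, min_run: int) -> bool:
    """True if the sequence contains a run of at least `min_run` consecutive T's."""
    return re.search(f"T{{{min_run},}}", seq.upper()) is not None


def fraction_with_t_run(transcripts: Dict[str, str], min_run: int) -> float:
    """Fraction of transcript sequences carrying such a run (0.0 for an empty input)."""
    if not transcripts:
        return 0.0
    hits = sum(has_t_run(seq, min_run) for seq in transcripts.values())
    return hits / len(transcripts)


transcripts = {
    "tx1": "ACG" + "T" * 10 + "GCA",
    "tx2": "ACG" + "T" * 16 + "GCA",
    "tx3": "ACGTTTACG",
}
for threshold in (9, 15):
    print(threshold, fraction_with_t_run(transcripts, threshold))
```

Here two of the three toy transcripts pass the 9-T threshold and one passes the 15-T threshold, mirroring how the looser cutoff flags a larger share of transcripts.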


2019 ◽  
Author(s):  
Koen Van den Berge ◽  
Hector Roux de Bézieux ◽  
Kelly Street ◽  
Wouter Saelens ◽  
Robrecht Cannoodt ◽  
...  

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression during biological processes such as the cell cycle, cell type differentiation, and cellular activation. Downstream of trajectory inference, it is vital to discover genes that are associated with the lineages in the trajectory to illuminate the underlying biological processes. Furthermore, genes that are differentially expressed between developmental or activation lineages might be highly relevant for further unravelling the system under study. Current data analysis procedures, however, typically cluster cells and assess differential expression between the clusters, which fails to exploit the continuous resolution provided by trajectory inference to its full potential. The few available non-cluster-based methods only assess broad differences in gene expression between lineages and hence fail to pinpoint the exact types of divergence. We introduce a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of (i) within-lineage differential expression, by detecting associations between gene expression and pseudotime over an entire lineage or by comparing gene expression between points or regions within the lineage, and (ii) between-lineage differential expression, by comparing gene expression between lineages over the entire lineages or at specific points or regions. By incorporating observation-level weights, the model can additionally account for zero inflation, which is commonly observed in single-cell RNA-seq data from full-length protocols. We evaluate the method on simulated and real datasets from droplet-based and full-length protocols and show that the flexible inference framework can yield biological insights through a clear interpretation of the data.
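In the framework described above, each count is modelled as negative binomial, and zero inflation is handled by down-weighting individual observations. A minimal sketch of only the weighted likelihood term that such a fit maximizes, assuming the common mean/size parameterization; the pseudotime smoothers that produce the means, and the way the weights are estimated, are taken as given and not shown.

```python
import math
from typing import Sequence


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu > 0 and size theta > 0.

    Variance is mu + mu**2 / theta.
    """
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def weighted_nb_loglik(counts: Sequence[int], means: Sequence[float],
                       weights: Sequence[float], theta: float) -> float:
    """Sum of w_i * log NB(y_i | mu_i, theta); a weight near 0 down-weights a likely excess zero."""
    return sum(w * nb_logpmf(y, m, theta) for y, m, w in zip(counts, means, weights))


# Hypothetical counts for one gene in three cells, with the second zero flagged as likely excess.
ll = weighted_nb_loglik([3, 0, 5], [2.5, 2.5, 4.0], [1.0, 0.1, 1.0], theta=2.0)
```

Weighting the log-likelihood rather than adding an explicit zero-inflation component keeps the model a standard negative binomial fit, which is what lets the same smoothing and testing machinery be reused.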

