A de novo transcriptional atlas in Danaus plexippus reveals variability in dosage compensation across tissues

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
José M. Ranz ◽  
Pablo M. González ◽  
Bryan D. Clifton ◽  
Nestor O. Nazario-Yepiz ◽  
Pablo L. Hernández-Cervantes ◽  
...  

Abstract: A detailed knowledge of gene function in the monarch butterfly is still lacking. Here we generated a genome assembly from a Mexican nonmigratory population and used RNA-seq data from 14 biological samples for gene annotation and to construct an atlas portraying the breadth of gene expression during most of the monarch life cycle. Two thirds of the genes show expression changes, with long noncoding RNAs being particularly finely regulated during adulthood, and male-biased expression being four times more common than female-biased. The two portions of the monarch heterochromosome Z, one ancestral to the Lepidoptera and the other resulting from a chromosomal fusion, display distinct association with sex-biased expression, reflecting sample-dependent incompleteness or absence of dosage compensation in the ancestral but not the novel portion of the Z. This study presents extended genomic and transcriptomic resources that will facilitate a better understanding of the monarch’s adaptation to a changing environment.
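To make the dosage-compensation claim concrete: in Lepidoptera males carry two Z chromosomes and females one, so without compensation Z-linked genes would be expected to show roughly twofold higher expression in males (a log2 male/female ratio near 1), whereas complete compensation gives a ratio near 0. The following is a minimal sketch of that comparison, not the authors' pipeline. The function name, the pseudocount, and all expression values are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def median_log2_male_female(genes, pseudocount=1.0):
    """Median log2(male/female) expression ratio for a group of genes.

    `genes` is a list of (male_expression, female_expression) tuples.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    ratios = [
        math.log2((male + pseudocount) / (female + pseudocount))
        for male, female in genes
    ]
    return median(ratios)


# Hypothetical toy values, NOT data from the study.
autosomal = [(10.0, 10.0), (31.0, 31.0), (7.0, 7.0)]    # equal in both sexes
z_ancestral = [(19.0, 9.0), (39.0, 19.0), (7.0, 3.0)]   # about 2x higher in males
z_neo = [(15.0, 15.0), (23.0, 23.0), (9.0, 9.0)]        # equal in both sexes

for label, group in [
    ("autosomes", autosomal),
    ("ancestral Z", z_ancestral),
    ("neo-Z", z_neo),
]:
    print(label, median_log2_male_female(group))
```

With these toy values the ancestral-Z group has a median log2 ratio of 1.0 (the pattern expected when compensation is absent), while the autosomal and neo-Z groups sit at 0.0. In real data the comparison would be made per sample, since the abstract reports that the degree of compensation depends on the sample.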

2017 ◽  
Author(s):  
Mickael Orgeur ◽  
Marvin Martens ◽  
Stefan T. Börno ◽  
Bernd Timmermann ◽  
Delphine Duprez ◽  
...  

Abstract: The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads and the gene annotation that defines gene features must also be taken into account. Partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.


2020 ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


2020 ◽  
Author(s):  
José M. Ranz ◽  
Pablo M. González ◽  
Bryan D. Clifton ◽  
Nestor O. Nazario ◽  
Pablo L. Hernández-Cervantes ◽  
...  

ABSTRACT: The monarch butterfly epitomizes insect biodiversity decline. Understanding the genetic basis of the adaptation of the monarch to a changing environment requires genomic and transcriptomic resources that better reflect its genetic diversity while being informative about gene functionality during the life cycle. We report a reference-quality genome assembly from an individual resident at a nonmigratory colony in Mexico, and a new gene annotation and expression atlas for 14,865 genes, including 492 unreported long noncoding RNA (lncRNA) genes, based on RNA-seq data from 14 larval and pupal stages, plus adult morphological sections. Two thirds of the genes show significant expression changes associated with a life stage or section, with lncRNAs being more finely regulated during adulthood than protein-coding genes, and male-biased expression being four times more common than female-biased. The two portions of the heterochromosome Z display distinct patterns of differential expression between the sexes, reflecting that dosage compensation is either absent or incomplete, depending on the sample, in the ancestral but not in the novel portion of the Z. This study represents a major advance in the genomic and transcriptome resources available for D. plexippus while providing the first systematic analysis of its transcriptional program across most of its life cycle.


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 3783-3783
Author(s):  
Alexia Katsarou ◽  
Nikolaos Trasanidis ◽  
Jaime Alvarez-Benayas ◽  
Foteini Papaleonidopoulou ◽  
Keren Keren ◽  
...  

Overexpression of the transcription factor MAF, as a result of its juxtaposition to the IgH enhancer [MAF-translocated t(14;16)], is a myeloma-initiating event in 3-5% of patients with multiple myeloma (MM) and confers a poor prognosis. MAF is also overexpressed in another 40% of cases, often in co-operation with the oncogene MMSET. The mechanisms by which MAF overexpression impacts on the regulatory genome to generate the MAF-driven oncogenic transcriptome and its direct targets are not known. To address this, we employed a multi-layer -omics approach using primary myeloma plasma cells (PC) as well as myeloma cell lines (MMCL). First, we determined the chromatin accessibility and transcriptome profiles of MAF-translocated myeloma by performing ATAC-seq and RNA-seq, respectively, in purified bone marrow CD138+ PC from two patients with t(14;16) and three healthy donors. We identified 6,640 differentially accessible regions, 87% of which displayed enhanced chromatin accessibility in MAF samples compared to normal PC. Secondary analysis comparing this with ATAC-seq data from a set of 28 other MM samples, including hyperdiploid, MMSET and CCND1-translocated MM, revealed 33% of those regions to be MAF subgroup specific (1,949 regions), with the rest shared between MAF and other cytogenetic groups. Gene annotation and pathway enrichment analysis using GREAT confirmed overrepresentation of the MAF myeloma patient signature, as previously identified in microarray datasets. RNA-seq detected significant upregulation of approximately 900 genes in MAF samples compared to normal counterparts, including MAF itself (the fourth top hit) as well as its presumed targets (CCND2, ITGB7 and NUAK1). Next, we obtained the MAF cistrome using ChIP-seq in the MAF-translocated MMCL MM1.S and integrated it with the primary PC ATAC-seq data. This revealed that 31% (618/1,949) of the differentially accessible regions in MAF-translocated MM PC are also MAF-bound.
Additional overlay with the ENCODE ChromHMM epigenome map showed that 47% of MAF binding sites are on active enhancers and 42% on active promoters, signifying potential direct regulation of the corresponding genes. Next, we superimposed the accessible and MAF-bound loci on the epigenomic landscapes of normal PC and other B-cell types using their corresponding ChromHMM maps (Blueprint consortium data). Interestingly, 56% (345/618) of the MAF-specific regions were not active in any stage of B-cell development. This suggests that aberrant MAF overexpression and chromatin binding in PC is associated with de novo activation of these chromatin regions, over half of which (200/345; 58%) are enhancers; we termed these "neo-enhancers". Upon de novo motif analysis of MAF ChIP-seq in MAF-translocated JJN3 and MM1.S MMCL, we confirmed MAF as the first and, interestingly, IRF4 as the second top hit, suggesting a possible MAF-IRF4 functional interaction in myelomagenesis. Indeed, overlay of the accessible MAF-bound loci with IRF4 ChIP-seq data in MM1.S revealed 63% co-occupancy (including 62% of "neo-enhancers"), suggesting a novel and extensive co-operative chromatin-based network between the two transcription factors. Final integration of the accessible MAF-bound regions with the paired transcriptomes of primary myeloma PC revealed a set of 206 candidate enhancer-gene pairs. Strikingly, we identified two IRF4-cobound "neo-enhancers" linked to overexpression of TLR4 and CCR1, two genes known for their roles in myeloma cell proliferation and migration. We confirmed significant downregulation of both genes upon shRNA-mediated knockdown of MAF in the two MAF-translocated MMCL, MM1.S and JJN3, as well as the lethality of MAF depletion. Further, MAF overexpression in MAF-negative myeloma backgrounds led to transcriptional upregulation of these genes, further validating them as MAF targets.
While CRISPR/Cas9i experiments targeting TLR4 are ongoing, preliminary results validated the functional role of the "neo-enhancer" in CCR1 gene expression. In conclusion, we demonstrate for the first time an extensive re-organisation of the PC chromatin conferred by oncogenic MAF in MM; we reveal its extensive co-operation with IRF4 in this process; and we validate the directly MAF-regulated genes and functionally characterise neo-enhancers of key MAF-dependent genes that, in addition to MAF itself, are also critical for myeloma biology. Disclosures: Hatjiharissi: Janssen: Honoraria. Caputo: GSK: Research Funding. Karadimitris: GSK: Research Funding.


2020 ◽  
Author(s):  
PENG MA ◽  
Xiao Zhang ◽  
Bowen Luo ◽  
Zhen Chen ◽  
Xuan He ◽  
...  

Abstract Background: Long noncoding RNAs (lncRNAs) play important roles in essential biological processes. However, our understanding of lncRNAs as competing endogenous RNAs (ceRNAs) and their responses to nitrogen stress is still limited. Results: Here, we surveyed the lncRNAs and miRNAs in maize inbred line P178 leaves and roots at the seedling stage under high-nitrogen and low-nitrogen (LN) conditions using lncRNA-Seq and small RNA-Seq. A total of 894 differentially expressed lncRNAs and 38 different miRNAs were identified. Co-expression analysis found that two lncRNAs and four lncRNA-targets could competitively combine with ZmmiR159 and ZmmiR164, respectively. To dissect the genetic regulatory mechanisms by which lncRNAs might enable adaptation to limited nitrogen availability, an association mapping panel containing a high-density single-nucleotide polymorphism (SNP) array (56,110 SNPs) combined with variable LN resistance-related phenotypes obtained from hydroponics was used for a genome-wide association study (GWAS). By combining GWAS and RNA-Seq, 170 differentially expressed lncRNAs within the range of significant markers were screened. Moreover, 40 consistently LN-responsive genes, including those involved in glutamine biosynthesis and nitrogen acquisition in root, were identified. Transient expression assays in Nicotiana benthamiana demonstrated that LNC_002923 could inhibit ZmmiR159-guided cleavage of Zm00001d015521. Conclusions: These lncRNAs containing trait-associated significant SNPs could be considered to be related to root development and nutrient utilization. Taken together, the results of our study provide new insights into the potential regulatory roles of lncRNAs in response to LN stress, and give valuable information for further screening of candidates as well as the improvement of maize regarding LN-responsive resistance.


2020 ◽  
Author(s):  
Michal Levin ◽  
Marion Scheibe ◽  
Falk Butter

Abstract Background: The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-Seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through great efforts of big consortia, for most systems this kind of data is either absent or not adequately precise. Results: Combining in-depth transcriptome sequencing and high resolution mass spectrometry, we here use proteotranscriptomics to improve gene annotation of protein-coding genes in the Bombyx mori cell line BmN4, which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach we provide the exact coding sequence and evidence for more than 6,200 genes on the protein level. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions: We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it makes it possible to study any species even in the absence of genome sequence information, a situation in which proteogenomics would be impossible.


2019 ◽  
Author(s):  
Xue-ying Zhang ◽  
Xian-zhi Sun ◽  
Sheng Zhang ◽  
Jing-hui Yang ◽  
Fang-fang Liu ◽  
...  

Abstract Background: Aphid (Macrosiphoniella sanbourni) stress drastically influences the yield and quality of chrysanthemum, and grafting has been widely used to improve tolerance to biotic and abiotic stresses. However, the effect of grafting on the resistance of chrysanthemum to aphids remains unclear. Therefore, we used the RNA-Seq platform to perform a de novo transcriptome assembly to analyze the transcriptional response to aphid stress of the self-rooted grafted chrysanthemum (Chrysanthemum morifolium T. 'Hangbaiju') and the grafted Artemisia-chrysanthemum (grafted onto Artemisia scoparia W.). Results: The results showed that there were 1337 differentially expressed genes (DEGs), among which 680 were upregulated and 667 were downregulated, in the grafted Artemisia-chrysanthemum compared to the self-rooted grafted chrysanthemum. These genes were mainly involved in sucrose metabolism, the biosynthesis of secondary metabolites, the plant hormone signaling pathway and the plant-to-pathogen pathway. KEGG and GO enrichment analyses revealed the coordinated upregulation of these genes from numerous functional categories related to aphid stress responses. In addition, we determined the physiological indicators of chrysanthemum under aphid stress, and the results were consistent with the molecular sequencing results. All evidence indicated that grafting chrysanthemum onto A. scoparia W. upregulated aphid stress responses in chrysanthemum. Conclusion: In summary, our study presents a genome-wide transcript profile of the self-rooted grafted chrysanthemum and the grafted Artemisia-chrysanthemum and provides insights into the molecular mechanisms of C. morifolium T. in response to aphid infestation. These data will contribute to further studies of aphid tolerance and the exploration of new candidate genes for chrysanthemum molecular breeding. Keywords: Chrysanthemum, Grafting, Aphid stress, Gene expression, RNA-Seq


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9585
Author(s):  
Wei Xia ◽  
Yajing Dou ◽  
Rui Liu ◽  
Shufang Gong ◽  
Dongyi Huang ◽  
...  

Long noncoding RNAs (lncRNAs) are an important class of genes and play important roles in a range of biological processes. However, few reports have described the identification of lncRNAs in oil palm. In this study, we applied strand-specific RNA-seq with rRNA removal to identify 1,363 lncRNAs from the equally mixed tissues of oil palm spear leaf and six different developmental stages of mesocarp (8–24 weeks). Based on strand-specific RNA-seq data and 18 released oil palm transcriptomes, we systematically characterized the expression patterns of lncRNA loci and their target genes. A total of 875 unique target genes for natural antisense lncRNAs (NAT-lncRNA, 712), long intergenic noncoding RNAs (lincRNAs, 92), intronic-lncRNAs (33), and sense-lncRNAs (52) were predicted. A majority of lncRNA loci (77.8%–89.6%) had low expression in 18 transcriptomes, while only 89 lncRNA loci had medium to high expression in at least one transcriptome. Coexpression analysis between lncRNAs and their target genes indicated that 6% of lncRNAs had expression patterns positively correlated with those of target genes. Based on single nucleotide polymorphism (SNP) markers derived from our previous research, 6,882 SNPs were detected for lncRNAs and 28 SNPs belonging to 21 lncRNAs were associated with the variation of fatty acid contents. Moreover, seven lncRNAs showed expression patterns positively correlated with those of genes in de novo fatty acid synthesis pathways. Our study identified a collection of lncRNAs for oil palm and provided clues for further research into lncRNAs that may regulate mesocarp development and lipid metabolism.


2015 ◽  
Vol 36 (5) ◽  
pp. 809-819 ◽  
Author(s):  
Gireesh K. Bogu ◽  
Pedro Vizán ◽  
Lawrence W. Stanton ◽  
Miguel Beato ◽  
Luciano Di Croce ◽  
...  

Discovering and classifying long noncoding RNAs (lncRNAs) across all mammalian tissues and cell lines remains a major challenge. Previously, mouse lncRNAs were identified using transcriptome sequencing (RNA-seq) data from a limited number of tissues or cell lines. Additionally, associating a few hundred lncRNA promoters with chromatin states in a single mouse cell line has identified two classes of chromatin-associated lncRNA. However, the discovery and classification of lncRNAs is still pending in many other tissues in mouse. To address this, we built a comprehensive catalog of lncRNAs by combining known lncRNAs with high-confidence novel lncRNAs identified by mapping and de novo assembling billions of RNA-seq reads from eight tissues and a primary cell line in mouse. Next, we integrated this catalog of lncRNAs with multiple genome-wide chromatin state maps and found two different classes of chromatin state-associated lncRNAs, including promoter-associated (plncRNAs) and enhancer-associated (elncRNAs) lncRNAs, across various tissues. Experimental knockdown of an elncRNA resulted in the downregulation of the neighboring protein-coding Kdm8 gene, encoding a histone demethylase. Our findings provide 2,803 novel lncRNAs and a comprehensive catalog of chromatin-associated lncRNAs across different tissues in mouse.

