CircParser: a novel streamlined pipeline for circular RNA structure and host gene prediction in non-model organisms

Circular RNAs (circRNAs) are long noncoding RNAs that play significant roles in various biological processes, including embryonic development and stress responses. These regulatory molecules can modulate microRNA activity and are involved in different molecular pathways as indirect regulators of gene expression. Thousands of circRNAs have been described in diverse taxa thanks to recent advances in high-throughput sequencing technologies, which have made a wide variety of total RNA sequencing datasets publicly available. A number of tools for de novo circRNA and host gene prediction are available to date, but their ability to accurately predict circRNA host genes is limited when genome assemblies or annotations are of low quality. Here, we present CircParser, a simple and fast Unix/Linux pipeline that uses the outputs of the most common in silico circRNA prediction tools (CIRI, CIRI2, CircExplorer2, find_circ, and circFinder) to annotate circRNAs, assigning presumptive host genes from local or public databases such as the National Center for Biotechnology Information (NCBI). The pipeline can also classify circRNAs by their structural components (exonic, intronic, exon-intronic, or intergenic) using a genome annotation file.
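
The structural classification step can be illustrated with a minimal sketch. It is not CircParser's code. It assumes that back-splice coordinates and gene/exon intervals from the genome annotation file have already been parsed into simple 1-based intervals on a single chromosome and strand. The `Feature` type, the function `classify_circrna`, and its decision rules are hypothetical, and real pipelines may define the classes differently.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """A 1-based, inclusive interval on one chromosome and strand (hypothetical)."""
    feature_id: str
    start: int
    end: int


def _overlaps(f: Feature, start: int, end: int) -> bool:
    return f.start <= end and start <= f.end


def _contains(f: Feature, pos: int) -> bool:
    return f.start <= pos <= f.end


def classify_circrna(start: int, end: int, genes: list[Feature],
                     exons: list[Feature]) -> tuple[str, Optional[str]]:
    """Return (structural class, host gene id) for a back-splice span.

    Assumed rules: no overlapping gene -> intergenic; a host gene but no
    overlapping exon -> intronic; both junction ends inside exons -> exonic;
    anything else -> exon-intronic.
    """
    hosts = [g for g in genes if _overlaps(g, start, end)]
    if not hosts:
        return "intergenic", None
    host = hosts[0].feature_id  # first overlapping gene is taken as the host
    if not any(_overlaps(e, start, end) for e in exons):
        return "intronic", host
    start_in_exon = any(_contains(e, start) for e in exons)
    end_in_exon = any(_contains(e, end) for e in exons)
    if start_in_exon and end_in_exon:
        return "exonic", host
    return "exon-intronic", host
```

Consider a gene spanning 1000-5000 with exons at 1000-1200, 2000-2300 and 4000-4200. Under these rules, a back-splice span of 2000-4200 is exonic, 1300-1900 is intronic, 2100-3000 is exon-intronic, and 6000-7000 is intergenic.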

HiC-Hiker: a probabilistic model to determine contig orientation in chromosome-length scaffolds with Hi-C

Bioinformatics ◽

10.1093/bioinformatics/btaa288 ◽

2020 ◽

Vol 36 (13) ◽

pp. 3966-3974

Author(s):

Ryo Nakabayashi ◽

Shinichi Morishita

Keyword(s):

Viterbi Algorithm ◽

De Novo ◽

Gene Prediction ◽

Effective Means ◽

Cost Effective ◽

Synteny Block ◽

Chromosome Length ◽

Model Organisms ◽

Contact Frequency ◽

Reference Quality

Abstract

Motivation: De novo assembly of reference-quality genomes used to require enormously laborious work. In particular, building genome markers for ordering assembled contigs along chromosomes is extremely time-consuming, so such markers are available only for well-established model organisms. To resolve this issue, recent studies have demonstrated that Hi-C can be a powerful and cost-effective means of producing chromosome-length scaffolds for non-model species with no genome marker resources. This works because the Hi-C contact frequency between a pair of loci can be a good estimator of their genomic distance, even when there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, reducing errors in contig orientation remains challenging because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics.

Results: To reduce these contig orientation errors, we propose a new algorithm, HiC-Hiker, which has a firm grounding in probability theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA on human and worm genome contigs generated from short reads and observed a remarkable reduction in the contig orientation error rate, from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm can take into account long-range information between distal contigs and precisely estimates Hi-C read contact probabilities among contigs, which may also be useful for determining contig order.

Availability and implementation: HiC-Hiker is freely available at https://github.com/ryought/hic_hiker.
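
The Viterbi step can be sketched as a two-state dynamic program over a chain of ordered contigs, where each contig is either kept as assembled (state 0) or reverse-complemented (state 1). This sketch rests on an assumption: a table `pair_loglik[i][a][b]` gives a precomputed log-likelihood of the Hi-C contacts between contig i in orientation a and contig i+1 in orientation b. Deriving that table from contact counts is out of scope. HiC-Hiker's actual model also uses long-range contacts between distal contigs, which this first-order chain omits. The function name is hypothetical.

```python
from typing import Sequence


def most_probable_orientations(
        pair_loglik: Sequence[Sequence[Sequence[float]]]) -> list[int]:
    """Viterbi over contigs with two states each (0 = forward, 1 = reverse).

    pair_loglik[i][a][b] is an assumed log-likelihood for contig i in
    orientation a next to contig i+1 in orientation b. Returns one
    orientation per contig (len(pair_loglik) + 1 values).
    """
    n = len(pair_loglik) + 1
    score = [0.0, 0.0]  # best path log-likelihood ending in each state
    backptr: list[list[int]] = []
    for i in range(n - 1):
        new_score = [0.0, 0.0]
        ptr = [0, 0]
        for b in (0, 1):
            cands = [score[a] + pair_loglik[i][a][b] for a in (0, 1)]
            best_a = 0 if cands[0] >= cands[1] else 1
            new_score[b] = cands[best_a]
            ptr[b] = best_a
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Because each step keeps only the best predecessor per state, the search costs O(n) rather than enumerating all 2^n orientation combinations.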

A Multireference-Based Whole Genome Assembly for the Obligate Ant-Following Antbird, Rhegmatorhina melanosticta (Thamnophilidae)

Diversity ◽

10.3390/d11090144 ◽

2019 ◽

Vol 11 (9) ◽

pp. 144 ◽

Cited By ~ 4

Author(s):

Laís Coelho ◽

Lukas Musher ◽

Joel Cracraft

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Population Genomics ◽

De Novo ◽

Structural Difference ◽

Whole Genome ◽

Sequencing Technology ◽

A Genome ◽

Avian Genomes ◽

Chromosome Level

Current-generation high-throughput sequencing technology has made it possible to generate more genomic-scale data than ever before, greatly improving our understanding of avian biology across a range of disciplines. Recent developments in linked-read sequencing (Chromium 10×) and reference-based whole-genome assembly offer the exciting prospect of more accessible chromosome-level genome sequencing in the near future. We sequenced and assembled a genome of the Hairy-crested Antbird (Rhegmatorhina melanosticta), the first publicly available genome for any antbird (Thamnophilidae). Our objectives were to (1) assemble scaffolds to chromosome level based on multiple reference genomes and report on differences relative to other genomes, (2) assess genome completeness and compare content with other related genomes, and (3) assess the suitability of linked-read sequencing technology for future comparative phylogenomics and population genomics studies. Our R. melanosticta assembly was both highly contiguous (de novo scaffold N50 = 3.3 Mb; reference-based N50 = 53.3 Mb) and relatively complete (it contained close to 90% of evolutionarily conserved single-copy avian genes and known tetrapod ultraconserved elements). The high contiguity and completeness of this assembly enabled the genome to be successfully mapped to the chromosome level, which uncovered a consistent structural difference between R. melanosticta and other avian genomes. Our results are consistent with the observation that avian genomes are structurally conserved. Additionally, our results demonstrate the utility of linked-read sequencing for non-model genomics. Finally, we demonstrate the value of our R. melanosticta genome for future researchers by mapping reduced-representation sequencing data and by accurately reconstructing the phylogenetic relationships among a sample of thamnophilid species.
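
Contiguity above is summarized by the scaffold N50. The sketch below uses the common definition: sort sequence lengths in decreasing order and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative only and are not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # crossed 50% of the assembly size
            return length
    raise AssertionError("unreachable for non-empty input")
```

For lengths 2 through 10 (total 54 bases), the running sum reaches 27 at the third-longest sequence, so `n50` returns 8.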

Peer Review #2 of "CircParser: a novel streamlined pipeline for circular RNA structure and host gene prediction in non-model organisms (v0.1)"

10.7287/peerj.8757v0.1/reviews/2 ◽

2020 ◽

Author(s):

A Sokolov

Keyword(s):

Peer Review ◽

Rna Structure ◽

Gene Prediction ◽

Circular Rna ◽

Host Gene ◽

Model Organisms

Proteotranscriptomics assisted gene annotation and spatial proteomics of Bombyx mori BmN4 cell line

10.21203/rs.3.rs-23159/v2 ◽

2020 ◽

Author(s):

Michal Levin ◽

Marion Scheibe ◽

Falk Butter

Keyword(s):

Mass Spectrometry ◽

Bombyx Mori ◽

Cell Line ◽

De Novo ◽

High Resolution Mass Spectrometry ◽

Gene Annotation ◽

Transcriptome Assembly ◽

Model Organisms ◽

Sequence Information ◽

A Genome

Abstract

Background: Identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-Seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through great efforts by large consortia, for most systems such data are either absent or insufficiently precise.

Results: Combining in-depth transcriptome sequencing and high-resolution mass spectrometry, we use proteotranscriptomics to improve the annotation of protein-coding genes in the Bombyx mori cell line BmN4, an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach, we provide the exact coding sequence and protein-level evidence for more than 6,200 genes. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage.

Conclusions: We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improving previous annotations or generating new gene models. As this technique is based on de novo transcriptome assembly, it can be applied to any species, even when no genome sequence is available and proteogenomics would therefore be impossible.
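
One common step in proteotranscriptomic workflows is translating de novo assembled transcripts into candidate open reading frames (ORFs), which then form the protein database searched with the mass spectra. The abstract does not describe how the authors called ORFs. The sketch below is therefore an assumption: it returns the longest ATG-initiated, stop-terminated ORF across all six reading frames using the standard genetic code. The names `translate` and `longest_orf` are hypothetical.

```python
import re

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf(transcript: str) -> str:
    """Longest ATG-to-stop ORF over six frames, as a peptide without the stop."""
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, seq.translate(_COMPLEMENT)[::-1]):  # forward, reverse complement
        for frame in range(3):
            protein = translate(strand[frame:])
            # An ORF runs from an M to the first downstream stop codon (*).
            for match in re.finditer(r"M[^*]*\*", protein):
                peptide = match.group()[:-1]
                if len(peptide) > len(best):
                    best = peptide
    return best
```

Scanning both strands matters here because de novo assembled transcripts carry no guaranteed orientation. For example, `longest_orf("augaaauuuggguag")` returns `"MKFG"`.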

A genome-wide circular RNA transcriptome in Rat

10.1101/2021.02.20.432122 ◽

2021 ◽

Author(s):

Disha Sharma ◽

Paras Sehgal ◽

Sridhar Sivasubbu ◽

Vinod Scaria

Keyword(s):

Developmental Stages ◽

Donor Site ◽

Model Organism ◽

Circular Rna ◽

Model Organisms ◽

Circular Rnas ◽

Acceptor Site ◽

Tissue Samples ◽

A Genome ◽

The Difference

Abstract

Background: Circular RNAs are a novel class of non-coding RNAs that form a circular structure through backsplicing between a 5' donor site and a 3' acceptor site. A number of circRNAs have been discovered in model organisms including human, mouse and Drosophila, among others. There are a few candidate-based studies of circular RNAs in the rat, a well-studied model organism. The availability of a recent transcriptome dataset encompassing 11 tissues, 4 developmental stages and both sexes motivated us to explore the landscape of circular RNAs in this organism.

Methodology: To understand the differences among pipelines, we applied them all to the same BodyMap RNA sequencing dataset. A number of pipelines have been published to identify backsplice junctions for circRNA discovery, but studies comparing these tools have suggested that combining tools is a better approach for identifying high-confidence circular RNAs. We employed 5 different combinations of tools, namely tophat_CIRCexplorer2, segemehl_CIRCexplorer2, star_CIRCexplorer, Bowtie2_findcirc and Bowtie2_findcirc (noHisat2), to identify circular RNAs from the dataset.

Results: Our analysis identified a number of tissue-specific, developmental-stage-specific and sex-specific circular RNAs. We independently validated 16 of 24 selected candidate circRNA junctions in 5 tissue samples. We additionally estimated the expression of 5 circRNA candidates using real-time PCR, and our analysis suggests that 3 of these candidates are tissue-enriched.

Conclusion: This is one of the most comprehensive studies of its kind, providing a circular RNA transcriptome for the rat and characterizing the differences among computational pipelines.
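
The idea of combining tools to obtain high-confidence circRNAs can be sketched as a simple vote: keep only back-splice junctions reported by at least a chosen number of pipelines. This sketch assumes that each pipeline's calls have already been normalized to identical `(chromosome, start, end, strand)` tuples. In practice, tools can differ in coordinate conventions, so normalization would come first. The threshold and the function name `consensus_junctions` are hypothetical, and the pipeline names serve only as dictionary keys.

```python
from collections import Counter

Junction = tuple[str, int, int, str]  # (chromosome, start, end, strand)


def consensus_junctions(calls: dict[str, set[Junction]],
                        min_pipelines: int = 2) -> set[Junction]:
    """Keep back-splice junctions reported by at least `min_pipelines` pipelines."""
    # Each pipeline contributes a set, so a junction is counted once per pipeline.
    counts = Counter(j for junctions in calls.values() for j in junctions)
    return {j for j, n in counts.items() if n >= min_pipelines}
```

For example, if a junction on chr1 is reported by three pipelines and two other junctions by one pipeline each, only the chr1 junction survives the default threshold of two.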

Comprehensive transcriptome analysis of grafting onto Artemisia scoparia W. to affect the aphid resistance of chrysanthemum (Chrysanthemum morifolium T.)

10.21203/rs.2.10583/v3 ◽

2019 ◽

Author(s):

Xue-ying Zhang ◽

Xian-zhi Sun ◽

Sheng Zhang ◽

Jing-hui Yang ◽

Fang-fang Liu ◽

...

Keyword(s):

Stress Responses ◽

Molecular Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Chrysanthemum Morifolium ◽

The Self ◽

Rna Seq ◽

Aphid Infestation ◽

A Genome ◽

Artemisia Scoparia

Abstract

Background: Aphid (Macrosiphoniella sanbourni) stress drastically affects the yield and quality of chrysanthemum, and grafting has been widely used to improve tolerance to biotic and abiotic stresses. However, the effect of grafting on the resistance of chrysanthemum to aphids remains unclear. We therefore used the RNA-Seq platform to perform a de novo transcriptome assembly and analyze the transcriptional responses to aphid stress of self-rooted grafted chrysanthemum (Chrysanthemum morifolium T. 'Hangbaiju') and of grafted Artemisia-chrysanthemum (chrysanthemum grafted onto Artemisia scoparia W.).

Results: There were 1337 differentially expressed genes (DEGs) in the grafted Artemisia-chrysanthemum compared with the self-rooted grafted chrysanthemum, of which 680 were upregulated and 667 were downregulated. These genes were mainly involved in sucrose metabolism, the biosynthesis of secondary metabolites, the plant hormone signaling pathway and the plant-pathogen interaction pathway. KEGG and GO enrichment analyses revealed coordinated upregulation of genes from numerous functional categories related to aphid stress responses. In addition, we measured physiological indicators of chrysanthemum under aphid stress, and the results were consistent with the sequencing results. All evidence indicated that grafting chrysanthemum onto A. scoparia W. upregulated aphid stress responses in chrysanthemum.

Conclusion: Our study presents a genome-wide transcript profile of the self-rooted grafted chrysanthemum and the grafted Artemisia-chrysanthemum and provides insights into the molecular mechanisms of the response of C. morifolium T. to aphid infestation. These data will contribute to further studies of aphid tolerance and to the discovery of new candidate genes for chrysanthemum molecular breeding.

Key words: Chrysanthemum, Grafting, Aphid stress, Gene expression, RNA-Seq
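
KEGG and GO enrichment analyses typically ask whether a pathway contains more DEGs than expected by chance. The abstract does not name the statistical test used. The sketch below assumes the widely used one-sided hypergeometric test: the probability of observing at least `n_overlap` pathway genes among `n_degs` DEGs drawn from `n_genome` genes, of which `n_pathway` belong to the pathway. The function name is hypothetical, and multiple-testing correction across pathways is not shown.

```python
from math import comb


def hypergeom_enrichment_p(n_genome: int, n_pathway: int,
                           n_degs: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap) for pathway enrichment."""
    upper = min(n_pathway, n_degs)
    total = comb(n_genome, n_degs)  # ways to draw the DEG set from the genome
    hits = sum(comb(n_pathway, k) * comb(n_genome - n_pathway, n_degs - k)
               for k in range(n_overlap, upper + 1))
    return hits / total
```

As a sanity check, drawing 3 DEGs from 10 genes with a 3-gene pathway gives a probability of 1/120 that all three DEGs fall in the pathway.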

Population genomics of bank vole populations reveals associations between immune related genes and the epidemiology of Puumala hantavirus in Sweden

10.1101/148163 ◽

2017 ◽

Author(s):

Audrey Rohfritsch ◽

Maxime Galan ◽

Mathieu Gautier ◽

Karim Gharbi ◽

Gert Olsson ◽

...

Keyword(s):

High Throughput ◽

Bank Vole ◽

High Throughput Sequencing ◽

Population Genomics ◽

Reservoir Host ◽

Model Organisms ◽

Myodes Glareolus ◽

A Genome ◽

Outlier Loci ◽

Immune Related Genes

Abstract

Infectious pathogens are major selective forces acting on individuals. The recent advent of high-throughput sequencing technologies now makes it possible to investigate the genetic bases of resistance or susceptibility to infections in non-model organisms. From an evolutionary perspective, analyzing the genetic diversity observed at these genes in natural populations provides insight into the mechanisms maintaining polymorphism and their epidemiological consequences. We explored these questions in the context of the interactions between Puumala hantavirus (PUUV) and its reservoir host, the bank vole Myodes glareolus. Despite the continuous spatial distribution of M. glareolus in Europe, the distribution of PUUV is strongly heterogeneous. Different defence strategies might have evolved in bank voles as a result of co-adaptation with PUUV, which may in turn reinforce the spatial heterogeneity of PUUV distribution. We performed a genome scan of six bank vole populations sampled along a north-south transect in Sweden, covering PUUV-endemic and non-endemic areas. We combined candidate gene analyses (Tlr4, Tlr7 and Mx2 genes) with high-throughput sequencing of RAD (restriction-site associated DNA) markers. We found evidence for outlier loci showing high levels of genetic differentiation. Of the 52 outliers that matched mouse protein-coding genes, ten corresponded to immune-related genes and were detected using ecological associations with variation in PUUV prevalence. One third of the enriched pathways concerned immune processes, including platelet activation and the TLR pathway. In the future, functional experiments should help confirm the role of these immune-related genes in the interactions between M. glareolus and PUUV.
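
Outlier loci are loci whose genetic differentiation among populations is unusually high. As a minimal illustration of such a differentiation measure, the sketch below computes Nei's G_ST for a biallelic marker from per-population allele frequencies, weighting populations equally. Genome scans like the one described generally rely on more elaborate model-based methods. The functions `nei_gst` and `flag_outliers`, and the fixed threshold, are hypothetical simplifications.

```python
def nei_gst(freqs: list[float]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs holds the frequency of one allele in each population (equal weights).
    """
    if not freqs:
        raise ValueError("no populations given")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic locus: no differentiation to measure
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population
    return (h_t - h_s) / h_t


def flag_outliers(per_locus_freqs: dict[str, list[float]],
                  threshold: float) -> list[str]:
    """Loci whose G_ST reaches a chosen threshold, sorted by name."""
    return sorted(locus for locus, f in per_locus_freqs.items()
                  if nei_gst(f) >= threshold)
```

Fixed differences between two populations (allele frequencies 0 and 1) give G_ST = 1, while identical frequencies give 0.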

Intragenic tRNA-promoted R-loops orchestrate transcription interference for plant oxidative stress responses

The Plant Cell ◽

10.1093/plcell/koab220 ◽

2021 ◽

Author(s):

Kunpeng Liu ◽

Qianwen Sun

Keyword(s):

Oxidative Stress ◽

Arabidopsis Thaliana ◽

Stress Responses ◽

Rna Polymerases ◽

Host Gene ◽

Trna Genes ◽

Nudix Hydrolase ◽

C Subunit ◽

Host Genes ◽

Transcription Interference

Abstract

Eukaryotic genomes are transcribed by at least three RNA polymerases: RNAPI, II, and III. Co-transcriptional R-loops play diverse roles in genome regulation and maintenance. However, little is known about how R-loops regulate transcription interference, which occurs when different RNA polymerases transcribe the same genomic template. Here, we established that intragenic tRNA genes can promote sense R-loop enrichment (named intra-tR-loops) in Arabidopsis thaliana, and found that intra-tR-loops are decreased in an RNAPIII mutant, nrpc7-1 (NUCLEAR RNA POLYMERASE C, SUBUNIT 7). NRPC7 co-localizes with RNAPIIS2P at intragenic tRNA genes and interferes with RNAPIIS2P elongation. Conversely, the binding of NRPC7 at intragenic tRNA genes increases following inhibition of RNAPII elongation. The transcription of specific tRNA host genes is inhibited by RNAPIII, and this inhibition is intra-tR-loop dependent. Moreover, alleviating the inhibition of the host gene AtNUDX1 (Arabidopsis Nudix hydrolase 1) by tRNAPro-induced intra-tR-loops promotes oxidative stress tolerance in Arabidopsis thaliana. Our work suggests that intra-tR-loops regulate host gene expression by modulating interference between RNA polymerases.

Assembly by Reduced Complexity (ARC): a hybrid approach for targeted assembly of homologous sequences.

10.1101/014662 ◽

2015 ◽

Cited By ~ 17

Author(s):

Samual S Hunter ◽

Robert T Lyon ◽

Brice A.J. Sarver ◽

Kayla Hardwick ◽

Larry J Forney ◽

...

Keyword(s):

De Novo Assembly ◽

High Throughput Sequencing ◽

De Novo ◽

Hybrid Approach ◽

Reference Sequence ◽

Model Organisms ◽

Exome Capture ◽

Mitochondrial Genomes ◽

Homologous Sequences ◽

Reduced Complexity

Analysis of high-throughput sequencing (HTS) data is a difficult problem, especially for non-model organisms, where comparison of homologous sequences may be hindered by the lack of a closely related reference genome. Current mapping-based methods rely on the availability of a highly similar reference sequence, whereas de novo assemblies produce anonymous (unannotated) contigs that are not easily compared across samples. Here, we present Assembly by Reduced Complexity (ARC), a hybrid mapping and assembly approach for targeted assembly of homologous sequences. ARC is an open-source project (http://ibest.github.io/ARC/) implemented in Python and consists of the following stages: 1) align sequence reads to reference targets, 2) use the alignment results to distribute reads into target-specific bins, 3) assemble each bin (target) to produce contigs, and 4) replace the previous reference targets with the assembled contigs and iterate. We show that ARC is able to assemble high-quality, unbiased mitochondrial genomes seeded from 11 progressively divergent references, and to assemble full mitochondrial genomes starting from short, poor-quality ancient DNA reads. We also show that ARC compares favorably with de novo assembly of a large exome capture dataset in CPU and memory requirements: it assembled 7,627 individual targets across 55 samples, completing over 1.3 million assemblies in less than 78 hours while using under 32 GB of system memory. ARC breaks the assembly problem down into many smaller problems, solving the anonymous-contig and poor-scaling problems inherent in some de novo assembly methods as well as the reference bias inherent in traditional read mapping.
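
The four-stage loop (map, bin, assemble, replace and iterate) can be sketched in a few functions. This is a toy illustration of the iteration, not ARC's implementation. Shared k-mers stand in for read alignment, and a greedy end-extension stands in for a real assembler. All function names and parameter defaults are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads: list[str], targets: dict[str, str],
              k: int) -> dict[str, list[str]]:
    """Stages 1-2 stand-in: bin each read with every target sharing a k-mer."""
    index = {name: kmers(seq, k) for name, seq in targets.items()}
    bins: dict[str, list[str]] = {name: [] for name in targets}
    for read in reads:
        read_kmers = kmers(read, k)
        for name, target_kmers in index.items():
            if read_kmers & target_kmers:
                bins[name].append(read)
    return bins


def greedy_extend(seed: str, reads: list[str], min_overlap: int) -> str:
    """Stage 3 stand-in: extend the seed with reads overlapping either end
    by at least min_overlap bases (a toy, not a real assembler)."""
    contig = seed
    extended = True
    while extended:
        extended = False
        for read in reads:
            if read in contig:
                continue  # already incorporated
            for olen in range(len(read) - 1, min_overlap - 1, -1):
                if contig.endswith(read[:olen]):  # read hangs off the right end
                    contig += read[olen:]
                    extended = True
                    break
                if contig.startswith(read[-olen:]):  # read hangs off the left end
                    contig = read[:-olen] + contig
                    extended = True
                    break
    return contig


def arc_like(reads: list[str], targets: dict[str, str], k: int = 4,
             min_overlap: int = 4, iterations: int = 5) -> dict[str, str]:
    """Stage 4: replace targets with the new contigs and iterate until stable."""
    for _ in range(iterations):
        bins = bin_reads(reads, targets, k)
        new_targets = {name: greedy_extend(seq, bins[name], min_overlap)
                       for name, seq in targets.items()}
        if new_targets == targets:  # nothing changed: converged
            break
        targets = new_targets
    return targets
```

With a short seed taken from the middle of a sequence, the first round recruits only reads that overlap the seed. Each later round recruits reads that overlap the newly assembled ends. This is why re-seeding with assembled contigs lets the target grow beyond the original reference.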

Field-based species identification in eukaryotes using real-time nanopore sequencing

10.1101/107656 ◽

2017 ◽

Cited By ~ 1

Author(s):

Joe Parker ◽

Andrew J. Helmstetter ◽

Dion Devey ◽

Alexander S.T. Papadopulos

Keyword(s):

Real Time ◽

Species Identification ◽

Dna Sequences ◽

High Throughput Sequencing ◽

De Novo ◽

Reference Database ◽

Nanopore Sequencing ◽

Technological Advances ◽

A Genome ◽

Hybrid Genome

Advances in DNA sequencing and informatics have revolutionised biology over the past four decades, but technological limitations have left many applications unexplored [1,2]. Recently, portable, real-time nanopore sequencing (RTnS) has become available. This offers opportunities to rapidly collect and analyse genomic data anywhere [3–5]. However, the generation of datasets from large, complex genomes has been constrained to laboratories [6,7]. The portability of RTnS and the long DNA sequences it produces offer great potential for field-based species identification, but the feasibility and accuracy of these technologies for this purpose have not been assessed. Here, we show that a field-based RTnS analysis of closely related plant species (Arabidopsis spp.) [8] has many advantages over laboratory-based high-throughput sequencing (HTS) methods for species-level identification-by-sequencing and de novo phylogenomics. Samples were collected and sequenced in a single day by RTnS using a portable, “al fresco” laboratory. Our analyses demonstrate that identifying unknown RTnS reads by matching them to a reference database enables rapid and confident species identification. Individually annotated RTnS reads can be used to infer the evolutionary relationships of A. thaliana. Furthermore, hybrid genome assembly with RTnS and HTS reads substantially improved upon a genome assembled from HTS reads alone. Field-based RTnS makes real-time, rapid specimen identification and genome-wide analyses possible. These technological advances are set to revolutionise research in the biological sciences [9] and have broad implications for conservation, taxonomy, border agencies and citizen science.
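
Identifying a sample by matching reads to a reference database can be sketched as per-read assignment followed by a majority vote. The abstract does not specify the matching method. This sketch swaps in exact k-mer sharing as a simple stand-in. Real nanopore reads carry a relatively high per-base error rate, so practical classifiers use error-tolerant alignment instead. The function names, the default `k`, and the `min_fraction` cutoff are hypothetical.

```python
from collections import Counter
from typing import Optional


def kmer_set(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, refs: dict[str, set[str]], k: int,
                  min_fraction: float = 0.1) -> Optional[str]:
    """Assign a read to the reference sharing the largest fraction of its
    k-mers; return None below min_fraction or when the top two tie."""
    read_kmers = kmer_set(read, k)
    if not read_kmers or not refs:
        return None
    ranked = sorted(((len(read_kmers & ks) / len(read_kmers), sp)
                     for sp, ks in refs.items()), reverse=True)
    best_score, best_species = ranked[0]
    if best_score < min_fraction:
        return None  # no convincing match in the database
    if len(ranked) > 1 and ranked[1][0] == best_score:
        return None  # ambiguous between references
    return best_species


def identify_sample(reads: list[str], references: dict[str, str],
                    k: int = 15, min_fraction: float = 0.1) -> Optional[str]:
    """Majority vote over per-read assignments; None if no read is assigned."""
    refs = {sp: kmer_set(seq, k) for sp, seq in references.items()}
    votes: Counter = Counter()
    for read in reads:
        species = classify_read(read, refs, k, min_fraction)
        if species is not None:
            votes[species] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Discarding ambiguous or unmatched reads before voting keeps a few uninformative reads from swaying the sample-level call.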
