TRAPID 2.0: a web application for taxonomic and functional analysis of de novo transcriptomes

2020 ◽  
Author(s):  
François Bucchini ◽  
Andrea Del Cortona ◽  
Łukasz Kreft ◽  
Alexander Botzki ◽  
Michiel Van Bel ◽  
...  

ABSTRACT Advances in high-throughput sequencing have resulted in a massive increase of RNA-Seq transcriptome data. However, the promise of rapid gene expression profiling in a specific tissue, condition, unicellular organism, or microbial community comes with new computational challenges. Owing to the limited availability of well-resolved reference genomes, de novo assembled (meta)transcriptomes have emerged as popular tools for investigating the gene repertoire of previously uncharacterized organisms. Yet, despite their potential, these datasets often contain fragmented or contaminant sequences, and their analysis remains difficult. To alleviate some of these challenges, we developed TRAPID 2.0, a web application for the fast and efficient processing of assembled transcriptome data. The initial processing phase performs a global characterization of the input data, providing each transcript with several layers of annotation, comprising structural, functional, and taxonomic information. The exploratory phase enables downstream analyses from the web application. Available analyses include the assessment of gene space completeness, the functional analysis and comparison of transcript subsets, and the study of transcripts in an evolutionary context. A comparison with similar tools highlights TRAPID’s unique features. Finally, analyses performed within TRAPID 2.0 are complemented by interactive data visualizations, facilitating the extraction of new biological insights, as demonstrated with diatom community metatranscriptomes.
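The abstract does not say how gene space completeness is scored. One common simplified approach, assumed here and not taken from TRAPID, is the fraction of a reference set of conserved gene families that is represented by at least one transcript. The family identifiers, the `transcript_to_family` mapping, and the function name below are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def gene_space_completeness(
    transcript_to_family: Dict[str, str],
    core_families: Iterable[str],
) -> Tuple[float, Set[str]]:
    """Return the fraction of core families hit by at least one transcript,
    plus the set of core families that were not found (simplified, hypothetical scoring)."""
    core = set(core_families)
    if not core:
        raise ValueError("core family set must not be empty")
    # Families represented by at least one transcript in the input data
    observed = set(transcript_to_family.values())
    missing = core - observed
    completeness = (len(core) - len(missing)) / len(core)
    return completeness, missing


if __name__ == "__main__":
    # Toy data: four transcripts, two of them assigned to the same family
    assignments = {"t1": "GF01", "t2": "GF02", "t3": "GF02", "t4": "GF09"}
    core_set = ["GF01", "GF02", "GF03", "GF04"]
    score, absent = gene_space_completeness(assignments, core_set)
    print(f"completeness = {score:.2f}, missing = {sorted(absent)}")
    # completeness = 0.50, missing = ['GF03', 'GF04']
```

Families found outside the core set (here `GF09`) do not raise the score; listing the missing families is what makes such a score actionable.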

2021 ◽  
Author(s):  
Biswajit Bhowmick ◽  
Huaqing Chen ◽  
Jesus Lozano-Fernandez ◽  
Joel Vizueta ◽  
Rickard Ignell ◽  
...  

The poultry red mite (PRM), Dermanyssus gallinae (De Geer), and the northern fowl mite (NFM), Ornithonyssus sylviarum (Canestrini and Fanzago), are the most serious pests of poultry, and both have an expanding global prevalence. Research on NFM has been constrained by a lack of genomic and transcriptomic data. Here, we report and analyze the first transcriptome data for this species. A total of 28,999 unigenes were assembled, of which 19,750 (68.1%) were annotated using seven functional databases. The biological functions of these unigenes were predicted using the GO, KOG, and KEGG databases. To gain insight into the chemosensory receptor-based system of parasitiform mites, we further assessed the gene repertoire of gustatory receptors (GRs) and ionotropic receptors (IRs), both of which encode putative ligand-gated ion channel proteins. While these receptors are well characterized in insect model species, our understanding of chemosensory detection in mites and ticks is in its infancy. To address this paucity of data, we identified 9 IR/iGluR and 2 GR genes by analyzing transcriptome data from the NFM, and annotated 9 GR and 41 IR/iGluR genes in the PRM genome. Taken together, the transcriptomic and genomic annotations of these two species provide a valuable reference for studies of parasitiform mites and help to understand how chemosensory gene family expansion/contraction events may have been reshaped by an obligate parasitic lifestyle compared with their closest free-living relatives. Future studies should include additional species to validate this observation and should functionally characterize the identified proteins as a step toward tools for controlling these poultry pests.
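A unigene usually counts as annotated when it has a hit in at least one database, so the annotated total is the size of the union of the per-database hit sets, not their sum. A minimal sketch, assuming hypothetical hit sets for toy unigenes; the database names are illustrative only and are not the study's seven databases.

```python
from typing import Dict, Set, Tuple


def annotation_summary(hits: Dict[str, Set[str]], total_unigenes: int) -> Tuple[int, float]:
    """Return (number of unigenes hit in at least one database, percentage of all unigenes)."""
    if total_unigenes <= 0:
        raise ValueError("total_unigenes must be positive")
    annotated: Set[str] = set()
    for ids in hits.values():
        annotated |= ids  # set union: a unigene counts once, however many databases hit it
    return len(annotated), 100.0 * len(annotated) / total_unigenes


if __name__ == "__main__":
    # Hypothetical hits for five toy unigenes (u1..u5) in three databases
    toy_hits = {"NR": {"u1", "u2"}, "Swiss-Prot": {"u2", "u3"}, "KOG": {"u3"}}
    print(annotation_summary(toy_hits, total_unigenes=5))  # (3, 60.0)
    # The same percentage formula applied to the counts reported above
    print(round(100 * 19_750 / 28_999, 1))  # 68.1
```

Summing the per-database counts instead would double-count `u2` and `u3` and overstate the annotated fraction.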


2019 ◽  
Vol 220 (8) ◽  
pp. 1312-1324 ◽  
Author(s):  
Sarah Mollerup ◽  
Maria Asplund ◽  
Jens Friis-Nielsen ◽  
Kristín Rós Kjartansdóttir ◽  
Helena Fridholm ◽  
...  

Abstract Background Viruses and other infectious agents cause more than 15% of human cancer cases. High-throughput sequencing-based studies of virus-cancer associations have mainly focused on cancer transcriptome data. Methods In this study, we applied a diverse selection of presequencing enrichment methods targeting all major viral groups to characterize the viruses present in 197 samples from 18 sample types of cancerous origin. Using high-throughput sequencing, we generated 710 datasets constituting 57 billion sequencing reads. Results Detailed in silico investigation of the viral content of de novo assembled contigs and individual sequencing reads, including the exclusion of viral artefacts, yielded a map of the viruses detected. Our data reveal a virome dominated by papillomaviruses, anelloviruses, herpesviruses, and parvoviruses. More than half of the included samples contained one or more viruses; however, no link between specific viruses and cancer types was found. Conclusions Our study sheds light on viral presence in cancers and provides highly relevant virome data for future reference.
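The summary "more than half of the included samples contained one or more viruses" amounts to counting samples that keep at least one viral hit after artefact exclusion. A minimal sketch, assuming a hypothetical per-sample list of detected viruses and a hypothetical artefact list; it is not the authors' pipeline.

```python
from typing import Dict, List, Set


def samples_with_virus(detections: Dict[str, List[str]], artefacts: Set[str]) -> Dict[str, List[str]]:
    """Remove artefactual hits, then keep only samples with at least one remaining virus."""
    cleaned = {sample: [v for v in viruses if v not in artefacts] for sample, viruses in detections.items()}
    return {sample: viruses for sample, viruses in cleaned.items() if viruses}


def fraction_positive(detections: Dict[str, List[str]], artefacts: Set[str]) -> float:
    """Share of all samples that retain one or more viruses after artefact exclusion."""
    if not detections:
        raise ValueError("no samples given")
    return len(samples_with_virus(detections, artefacts)) / len(detections)


if __name__ == "__main__":
    toy = {
        "S1": ["HPV16", "TTV"],
        "S2": ["phiX174"],  # e.g. a sequencing spike-in control, treated here as an artefact
        "S3": [],
        "S4": ["EBV"],
    }
    artefact_list = {"phiX174"}
    print(sorted(samples_with_virus(toy, artefact_list)))  # ['S1', 'S4']
    print(fraction_positive(toy, artefact_list))  # 0.5
```

Filtering artefacts before counting matters: without it, sample `S2` would be reported as virus-positive.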


Nature ◽  
2021 ◽  
Author(s):  
Fides Zenk ◽  
Yinxiu Zhan ◽  
Pavel Kos ◽  
Eva Löser ◽  
Nazerke Atinbayeva ◽  
...  

Abstract Fundamental features of 3D genome organization are established de novo in the early embryo, including clustering of pericentromeric regions, the folding of chromosome arms and the segregation of chromosomes into active (A-) and inactive (B-) compartments. However, the molecular mechanisms that drive de novo organization remain unknown [1,2]. Here, by combining chromosome conformation capture (Hi-C), chromatin immunoprecipitation with high-throughput sequencing (ChIP–seq), 3D DNA fluorescence in situ hybridization (3D DNA FISH) and polymer simulations, we show that heterochromatin protein 1a (HP1a) is essential for de novo 3D genome organization during Drosophila early development. The binding of HP1a at pericentromeric heterochromatin is required to establish clustering of pericentromeric regions. Moreover, HP1a binding within chromosome arms is responsible for overall chromosome folding and has an important role in the formation of B-compartment regions. However, depletion of HP1a does not affect the A-compartment, which suggests that a different molecular mechanism segregates active chromosome regions. Our work identifies HP1a as an epigenetic regulator that is involved in establishing the global structure of the genome in the early embryo.
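A/B compartments are commonly called from the first principal component (PC1) of the correlation matrix of a distance-normalized (observed/expected) Hi-C map, with the sign oriented so that A bins correlate with a marker of active chromatin. The pure-Python sketch below follows that generic recipe; it is not this study's pipeline, and the toy matrix, the activity track, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def observed_over_expected(contacts: Matrix) -> Matrix:
    """Divide each contact by the mean contact count at the same genomic distance (diagonal offset)."""
    n = len(contacts)
    expected = []
    for d in range(n):
        diag = [contacts[i][i + d] for i in range(n - d)]
        expected.append(sum(diag) / len(diag))
    return [
        [contacts[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def leading_eigenvector(m: Matrix, iterations: int = 200) -> List[float]:
    """Power iteration; suitable here because a correlation matrix is positive semidefinite."""
    n = len(m)
    # Non-uniform start: a uniform vector is orthogonal to PC1 when compartments are balanced
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def call_compartments(oe: Matrix, activity_track: Sequence[float]) -> List[str]:
    """Label bins A/B from the sign of PC1, oriented so that A correlates positively with the track."""
    corr = [[pearson(row_i, row_j) for row_j in oe] for row_i in oe]
    pc1 = leading_eigenvector(corr)
    if pearson(pc1, activity_track) < 0:
        pc1 = [-x for x in pc1]
    return ["A" if x > 0 else "B" for x in pc1]


if __name__ == "__main__":
    truth = ["A", "A", "B", "B", "A", "B"]
    # Toy matrix that is already observed/expected: same-compartment pairs are enriched
    oe_toy = [[2.0 if a == b else 0.5 for b in truth] for a in truth]
    gene_density = [5, 4, 1, 0, 6, 1]  # hypothetical activity track
    print(call_compartments(oe_toy, gene_density))  # ['A', 'A', 'B', 'B', 'A', 'B']
```

The O/E step matters on real data because contact frequency decays steeply with distance; without it, PC1 tends to capture that decay rather than compartmentalization.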


2021 ◽  
Vol 22 (S2) ◽  
Author(s):  
Daniele D’Agostino ◽  
Pietro Liò ◽  
Marco Aldinucci ◽  
Ivan Merelli

Abstract Background High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the pairwise contacts among different chromosome regions. On the other hand, a graph-based representation can be advantageous for describing the complex topology achieved by DNA in the nucleus of eukaryotic cells. Methods Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the data produced: with a graph-based representation, a large number of edges (contacts), which carry the information, connecting nodes (genes) must be managed adequately. For this reason, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. In particular, the possibility of describing graphs through statistical indicators and, even more, of correlating them through statistical distributions makes it possible to highlight similarities and differences among Hi-C experiments in different cell conditions or different cell types. Results These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the Neo4j graph database (version 3.5). Conclusion As more experiments accumulate, the tool will provide invaluable support for comparing the neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.
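In a graph view of Hi-C data, each retained contact becomes an edge between two genomic nodes. Simple statistical indicators, such as the degree distribution, can then be compared between experiments. The sketch below assumes a dense contact matrix indexed by hypothetical gene identifiers and an arbitrary count threshold, and it uses total variation distance as one possible comparison. It is plain Python, not Neo4j, and not NeoHiC's implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, float]


def matrix_to_edges(genes: Sequence[str], contacts: List[List[float]], min_count: float) -> List[Edge]:
    """Keep each unordered pair (i < j) whose contact count reaches min_count."""
    edges: List[Edge] = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if contacts[i][j] >= min_count:
                edges.append((genes[i], genes[j], contacts[i][j]))
    return edges


def degree_distribution(edges: List[Edge]) -> Dict[int, int]:
    """Map degree -> number of nodes with that degree (nodes without edges are omitted)."""
    degree: Counter = Counter()
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(Counter(degree.values()))


def total_variation(p: Dict[int, int], q: Dict[int, int]) -> float:
    """Total variation distance between two degree distributions after normalization."""
    total_p, total_q = sum(p.values()), sum(q.values())
    if total_p == 0 or total_q == 0:
        raise ValueError("both distributions need at least one node")
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) / total_p - q.get(k, 0) / total_q) for k in keys)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]  # hypothetical gene nodes
    condition_1 = [[0, 12, 3, 8], [12, 0, 9, 1], [3, 9, 0, 7], [8, 1, 7, 0]]
    condition_2 = [[0, 15, 11, 9], [15, 0, 2, 1], [11, 2, 0, 3], [9, 1, 3, 0]]
    d1 = degree_distribution(matrix_to_edges(genes, condition_1, min_count=5))
    d2 = degree_distribution(matrix_to_edges(genes, condition_2, min_count=5))
    print(d1, d2)  # {2: 4} {3: 1, 1: 3}
    print(total_variation(d1, d2))  # 1.0
```

Condition 1 forms a ring of contacts, whereas condition 2 forms a hub around `g1`. The two distributions share no degree values, so the distance reaches its maximum of 1.0.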


Plants ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 753
Author(s):  
Miroslav Glasa ◽  
Richard Hančinský ◽  
Katarína Šoltys ◽  
Lukáš Predajňa ◽  
Jana Tomašechová ◽  
...  

In recent years, high-throughput sequencing (HTS) has brought new possibilities to the study of the diversity and complexity of plant viromes. Mixed infection of a single plant with several viruses is frequently observed in such studies. We analyzed the virome of 10 tomato and sweet pepper samples from Slovakia, all showing the presence of potato virus Y (PVY) infection. Most datasets allowed the determination of the nearly complete sequence of a single-variant PVY genome belonging to one of the PVY recombinant strains (N-Wi, NTNa, or NTNb). However, in three tomato samples (T1, T40, and T62), N-type and O-type sequences spanning the same genome region were documented, indicating mixed infections involving variants of different PVY strains, which hampered the automated assembly of the PVY genomes present in these samples. The N- and O-type in silico data were further confirmed by specific RT-PCR assays targeting the UTR-P1 and NIa genomic parts. Although full genomes could not be assembled de novo directly in this situation, their deep coverage by relatively long paired reads allowed their manual re-assembly using very stringent mapping parameters. These results highlight the complexity of PVY infection in some host plants and the challenges that can arise when trying to precisely identify the PVY isolates involved in mixed infections.
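When N-type and O-type variants cover the same genome region, stringent mapping can separate their reads by keeping only reads that pass a very high identity cutoff against exactly one reference. The sketch below illustrates that idea with ungapped, equal-length comparisons and a hypothetical 98% cutoff. Real read mappers handle gaps and offsets, and the sequences here are toy strings, not PVY data.

```python
from typing import Dict, List


def identity(read: str, ref: str) -> float:
    """Fraction of identical positions for an ungapped comparison of equal-length strings."""
    if len(read) != len(ref) or not read:
        raise ValueError("this sketch compares equal-length, non-empty sequences only")
    return sum(a == b for a, b in zip(read, ref)) / len(read)


def assign_reads(reads: Dict[str, str], refs: Dict[str, str], min_identity: float = 0.98) -> Dict[str, List[str]]:
    """Assign each read to the single reference it matches at >= min_identity.
    Reads that pass for several references (ambiguous) or for none stay unassigned."""
    bins: Dict[str, List[str]] = {name: [] for name in refs}
    bins["unassigned"] = []
    for read_id, seq in reads.items():
        passing = [name for name, ref in refs.items() if identity(seq, ref) >= min_identity]
        if len(passing) == 1:
            bins[passing[0]].append(read_id)
        else:
            bins["unassigned"].append(read_id)
    return bins


def mutate(seq: str, positions: List[int]) -> str:
    """Toy helper: introduce transitions (A<->G, C<->T) at the given 0-based positions."""
    swap = {"A": "G", "G": "A", "C": "T", "T": "C"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


if __name__ == "__main__":
    n_ref = "ACGT" * 12 + "AC"  # 50-nt toy "N-type" segment
    o_ref = mutate(n_ref, [5, 15, 25, 35, 45])  # toy "O-type" segment, 5 differences
    reads = {
        "r1": n_ref,  # exact N-type read
        "r2": mutate(o_ref, [0]),  # O-type read with one sequencing error
        "r3": mutate(n_ref, [1, 2]),  # too divergent to pass the cutoff for either reference
    }
    print(assign_reads(reads, {"N-type": n_ref, "O-type": o_ref}))
    # {'N-type': ['r1'], 'O-type': ['r2'], 'unassigned': ['r3']}
```

A stricter cutoff leaves more reads unassigned. This trade-off is what allows deep coverage by long paired reads to compensate when the two variants are resolved separately.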


Plant Disease ◽  
2020 ◽  
Author(s):  
Yeonhwa Jo ◽  
Hoseong Choi ◽  
Jin Kyong Cho ◽  
Won Kyong Cho

Cherry virus F (CVF) is a tentative member of the genus Fabavirus in the family Secoviridae, with a genome consisting of two RNA segments (Koloniuk et al. 2018). To date, CVF has been documented only in sweet cherry (Prunus avium) in the Czech Republic (Koloniuk et al. 2018), Canada, and Greece. In May 2014, we collected leaf samples from four symptomatic (leaf spots and dapple fruit) and two asymptomatic (Sun and Gadam) Japanese plum cultivars grown in an orchard in Hoengseong, South Korea, to identify viruses and viroids infecting plum trees. Total RNA from individual plum trees was extracted using two commercial kits: Fruit-mate for RNA Purification Kit (Takara, Shiga, Japan) and RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). We generated six mRNA libraries from the six plum cultivars for RNA sequencing using the TruSeq RNA Library Preparation Kit v2 (Illumina, CA, U.S.A.) as described previously (Jo et al. 2017). The mRNA libraries were sequenced as paired-end reads (2 × 100 bp) on a HiSeq 2000 system (Macrogen, Seoul, Korea). The raw sequence reads were de novo assembled by Trinity v. 2.8.6 with default parameters (Haas et al. 2013). The assembled contigs were subjected to a BLASTX search against the NCBI non-redundant protein database. Of the two asymptomatic cultivars, the transcriptome of cv. Gadam contained five contigs specific to CVF: two were specific to CVF RNA1 (2,571 reads, coverage 42.15%) and three to RNA2 (2,025 reads, coverage 53.04%). The five contigs ranged in size from 241 to 5,986 bp. Contigs of 5,986 and 3,867 bp, referred to as CVF isolate Gadam RNA1 (GenBank MN896996) and RNA2 (GenBank MN896995), respectively, were subjected to a BLASTP search against the NCBI non-redundant protein database. The polyprotein sequences of RNA1 and RNA2 shared 95.3% and 93.11% amino acid identity with isolates SwC-H_1a from the Czech Republic (GenBank acc. no. AWB36326) and Stac-3B_c8 from Canada (AZZ10055), respectively. To confirm CVF infection in cv. Gadam, RT-PCR was conducted using CVF RNA1-specific primers designed based on the CVF reference genome sequences (MH998210 and MH998216), 5’-CCACCAAATAGGCAAGAGGTCAC-3’ (position 3190–3212) and 5’-CACAATCACCATCAATGGTCTCTGC-3’ (position 3742–3766), and CVF RNA2-specific primers, 5’-CTGCTTTATGATGCTAGACATCAAGATG-3’ (position 1015–1042) and 5’-ACAATAGGCATGCTCATCTCAACCTC-3’ (position 1594–1619). The amplified 577-bp RNA1-specific and 605-bp RNA2-specific amplicons were cloned and Sanger sequenced. The cloned amplicons for isolate Gadam RNA1 (GenBank MN896993) and RNA2 (GenBank MN896994) showed 99.48% and 99.17% nucleotide identity, respectively, to the RNA1 and RNA2 sequences determined by high-throughput sequencing. Additionally, we tested five plants of each of the six plum cultivars grown in the same orchard by PCR using the primers and protocol described above. Of the 30 trees, CVF was detected in three trees of cv. Gadam with both primer pairs. To our knowledge, this is the first report of CVF infecting Japanese plum and the first report of the virus in Korea. However, its prevalence in other Prunus species, including apricot, European plum, and peach, remains to be elucidated.
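The amplicon sizes follow directly from the primer coordinates. A product runs from the first base of the forward primer to the last base of the reverse primer, so its length is end - start + 1. The check below uses only the sequences and positions given above, treating positions as 1-based and inclusive (an assumption of this sketch). It also confirms that each primer's length matches its coordinate span.

```python
from typing import Dict, Tuple

Primer = Tuple[str, int, int]  # (sequence, start, end), 1-based inclusive

# Primer pairs on the CVF reference genomes, as given in the text
PRIMERS: Dict[str, Tuple[Primer, Primer]] = {
    "RNA1": (("CCACCAAATAGGCAAGAGGTCAC", 3190, 3212),
             ("CACAATCACCATCAATGGTCTCTGC", 3742, 3766)),
    "RNA2": (("CTGCTTTATGATGCTAGACATCAAGATG", 1015, 1042),
             ("ACAATAGGCATGCTCATCTCAACCTC", 1594, 1619)),
}


def amplicon_length(forward: Primer, reverse: Primer) -> int:
    """Length from the first base of the forward primer to the last base of the reverse primer."""
    for seq, start, end in (forward, reverse):
        # Sanity check: each primer's length must equal its coordinate span
        if len(seq) != end - start + 1:
            raise ValueError(f"primer {seq} does not span {start}-{end}")
    return reverse[2] - forward[1] + 1


if __name__ == "__main__":
    for segment, (fwd, rev) in PRIMERS.items():
        print(segment, amplicon_length(fwd, rev), "bp")
    # RNA1 577 bp
    # RNA2 605 bp
```

Both results agree with the reported 577-bp and 605-bp amplicons.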


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Daniel Roush ◽  
Ana Giraldo-Silva ◽  
Ferran Garcia-Pichel

Abstract Cyanobacteria are a widespread and important bacterial phylum, responsible for a significant portion of global carbon and nitrogen fixation. Unfortunately, reliable and accurate automated classification of cyanobacterial 16S rRNA gene sequences is muddled by conflicting systematic frameworks, inconsistent taxonomic definitions (including of the phylum itself), and database errors. To address this, we introduce Cydrasil 3 (https://www.cydrasil.org), a curated 16S rRNA gene reference package, database, and web application designed to provide a full phylogenetic perspective for cyanobacterial systematics and routine identification. Cydrasil 3 contains over 1300 manually curated sequences longer than 1100 base pairs and can be used for phylogenetic placement or as a reference sequence set for de novo phylogenetic reconstructions. The web application (utilizing PaPaRA and EPA-ng) can place thousands of sequences into the reference tree and provides detailed instructions on how to analyze results. The Cydrasil web application offers no taxonomic assignments; instead, it provides phylogenetic placement, a searchable database with curation notes and metadata, and a mechanism for community feedback.
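EPA-ng writes placements in the JSON-based jplace format. Each query lists candidate edges, and their values are ordered by the file's `fields` array. The sketch below picks, for each query, the placement with the highest likelihood weight ratio. It assumes the `edge_num` and `like_weight_ratio` field names. The toy jplace string is hypothetical, and the sketch is not part of the Cydrasil web application.

```python
import json
from typing import Dict, Tuple


def best_placements(jplace_text: str) -> Dict[str, Tuple[int, float]]:
    """Return query name -> (edge_num, like_weight_ratio) of its top-weighted placement."""
    data = json.loads(jplace_text)
    fields = data["fields"]
    edge_idx = fields.index("edge_num")
    lwr_idx = fields.index("like_weight_ratio")
    best: Dict[str, Tuple[int, float]] = {}
    for placement in data["placements"]:
        top = max(placement["p"], key=lambda row: row[lwr_idx])
        # Queries are named with "n" (plain names) or "nm" ([name, multiplicity] pairs)
        names = placement.get("n") or [pair[0] for pair in placement.get("nm", [])]
        for name in names:
            best[name] = (int(top[edge_idx]), float(top[lwr_idx]))
    return best


EXAMPLE = """{
  "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3});",
  "placements": [
    {"p": [[0, -1200.5, 0.85, 0.02, 0.01], [2, -1202.3, 0.15, 0.01, 0.03]], "n": ["query_1"]},
    {"p": [[3, -980.1, 1.0, 0.1, 0.02]], "n": ["query_2"]}
  ],
  "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"],
  "version": 3,
  "metadata": {"invocation": "toy example"}
}"""

if __name__ == "__main__":
    print(best_placements(EXAMPLE))  # {'query_1': (0, 0.85), 'query_2': (3, 1.0)}
```

Reading column positions from `fields`, rather than hard-coding them, keeps the parser valid if a tool writes the columns in another order.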


Genes ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 1359
Author(s):  
Esther Camacho ◽  
Sandra González-de la Fuente ◽  
Jose C. Solana ◽  
Alberto Rastrojo ◽  
Fernando Carrasco-Ramiro ◽  
...  

Leishmania major is the main causative agent of cutaneous leishmaniasis in humans. The Friedlin strain of this species (LmjF) was chosen when a multi-laboratory consortium set out to decipher the first genome sequence of a parasite of the genus Leishmania. The objective was attained in 2005, a milestone for Leishmania molecular biology studies around the world. Although the LmjF genome was sequenced following a shotgun strategy with classical Sanger sequencing, the results were excellent, and this genome assembly served as the reference for subsequent genome assemblies of other Leishmania species. Here, we present a new assembly of the genome of this strain (named LMJFC for clarity), generated by combining two high-throughput sequencing platforms: Illumina short-read sequencing and PacBio Single Molecule Real-Time (SMRT) sequencing, which provides long reads. Apart from resolving uncertain nucleotide positions, several genomic regions were reorganized and a more precise composition of tandemly repeated gene loci was attained. Additionally, the genome annotation was improved by adding 542 genes and by defining more accurate coding sequences for around two hundred genes, based on the transcriptome delimitation also carried out in this work. As a result, we provide gene models (including untranslated regions and introns) for 11,238 genes. Genomic information ultimately determines the biology of every organism; therefore, our understanding of molecular mechanisms depends on the availability of precise genome sequences and accurate gene annotations. In this regard, this work provides an improved genome sequence and updated transcriptome annotations for the reference L. major Friedlin strain.
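Once transcript boundaries are delimited from RNA-seq data, the untranslated regions follow from the position of the coding sequence within the transcript. Strand determines which flank is the 5' UTR. A minimal sketch, assuming hypothetical 1-based inclusive coordinates and a single-exon model. It ignores introns, which the annotation described above does include, and it is not the authors' annotation pipeline.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeneModel:
    """Single-exon gene model; coordinates are 1-based and inclusive (hypothetical layout)."""
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    strand: str  # "+" or "-"

    def cds_length(self) -> int:
        return self.cds_end - self.cds_start + 1

    def utr_lengths(self) -> Tuple[int, int]:
        """Return (5' UTR length, 3' UTR length) in nucleotides."""
        if not (self.tx_start <= self.cds_start <= self.cds_end <= self.tx_end):
            raise ValueError("CDS must lie within the transcript")
        left = self.cds_start - self.tx_start  # bases upstream of the CDS on the genome
        right = self.tx_end - self.cds_end  # bases downstream of the CDS on the genome
        if self.strand == "+":
            return left, right
        if self.strand == "-":
            return right, left  # on the minus strand the genomic right flank is the 5' UTR
        raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    plus = GeneModel(1000, 3200, 1250, 2800, "+")
    minus = GeneModel(1000, 3200, 1250, 2800, "-")
    print(plus.utr_lengths(), minus.utr_lengths())  # (250, 400) (400, 250)
    print(plus.cds_length(), plus.cds_length() % 3 == 0)  # 1551 True
```

The strand check matters in Leishmania because genes are organized in long polycistronic clusters on both strands.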

