Analysis of the Genomic Basis of Functional Diversity in Dinoflagellates using a Transcriptome-Based Sequence Similarity Network

2017 ◽  
Author(s):  
Arnaud Meng ◽  
Erwan Corre ◽  
Ian Probert ◽  
Andres Gutierrez-Rodriguez ◽  
Raffaele Siano ◽  
...  

Abstract: Dinoflagellates are one of the most abundant and functionally diverse groups of eukaryotes. Despite an overall scarcity of genomic information for dinoflagellates, constantly emerging high-throughput sequencing resources can be used to characterize and compare these organisms. We assembled de novo and processed 46 dinoflagellate transcriptomes and used a sequence similarity network (SSN) to compare the underlying genomic basis of functional features within the group. This approach constitutes the most comprehensive picture to date of the genomic potential of dinoflagellates. A core proteome composed of 252 connected components (CCs) of putative conserved protein domains (pCDs) was identified. Of these, 206 were novel and 16 lacked any functional annotation in public databases. Integration of functional information in our network analyses allowed investigation of pCDs specifically associated with functional traits. With respect to toxicity, sequences homologous to those of proteins involved in toxin biosynthesis pathways (e.g. sxtA1-4 and sxtG) were not specific to known toxin-producing species. Although not fully specific to symbiosis, the most represented functions associated with proteins involved in the symbiotic trait were related to membrane processes and ion transport. Overall, our SSN approach led to the identification of 45,207 and 90,794 specific and constitutive pCDs of the toxic and symbiotic species, respectively, represented in our analyses. Of these, 56% and 57%, respectively (i.e. 25,393 and 52,193 pCDs), completely lacked annotation in public databases. This stresses the extent of our lack of knowledge, while emphasizing the potential of SSNs to identify candidate pCDs for further functional genomic characterization.
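
The connected-component step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the identity threshold, the toy sequence names, and the rule that a "core" component must contain sequences from every transcriptome are all assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(nodes, edges, min_identity=0.0):
    """Group nodes into connected components with a union-find structure.

    edges is an iterable of (node_a, node_b, identity) tuples; edges whose
    identity is below min_identity are ignored (threshold is hypothetical).
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in edges:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def core_components(components, sample_of, all_samples):
    """Keep components containing at least one sequence from every sample.

    Assumption: this is one plausible reading of a 'core' component.
    """
    wanted = set(all_samples)
    return [c for c in components if {sample_of[n] for n in c} == wanted]


# Toy data: six sequences from three hypothetical transcriptomes.
sample_of = {
    "s1_a": "sp1", "s2_a": "sp2", "s3_a": "sp3",
    "s1_b": "sp1", "s2_b": "sp2", "s3_c": "sp3",
}
edges = [
    ("s1_a", "s2_a", 0.90),
    ("s2_a", "s3_a", 0.85),
    ("s1_b", "s2_b", 0.95),
    ("s2_b", "s3_c", 0.40),  # below the threshold, so it is dropped
]

components = connected_components(sample_of, edges, min_identity=0.8)
core = core_components(components, sample_of, {"sp1", "sp2", "sp3"})
```

In this toy network only the component linking `s1_a`, `s2_a` and `s3_a` spans all three samples, so it is the only one flagged as core.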

2018 ◽  
Author(s):  
James M Wainaina ◽  
Elijah Ateka ◽  
Timothy Makori ◽  
Monica A Kehoe ◽  
Laura M Boykin

Background: Endornaviruses are non-pathogenic viruses with a global distribution that infect multiple agriculturally important crops, including legumes. However, complete genomes of endornaviruses from legumes are lacking, in particular from the sub-Saharan region. In this study, we report the first complete genomes of PvEV-1 and PvEV-2 and the evolutionary relationships of these genomes. Methods: Common bean (Phaseolus vulgaris) plants showing Bean common mosaic necrosis virus (BCMNV) symptoms were collected from Vihiga county, in the western highlands of Kenya, during field surveys in the region. High-throughput sequencing (RNA-Seq) was carried out on total RNA isolated from symptomatic leaf samples. Subsequently, de novo assembly and reference mapping were carried out to obtain the complete genomes of PvEV-1 and PvEV-2. Results: We identified the complete genomes of Phaseolus vulgaris endornavirus 1 and 2 (PvEV-1 and PvEV-2) from sub-Saharan Africa (SSA). The average genome size of PvEV-1 was ~13,890 nucleotides (nt) and that of PvEV-2 was ~14,698 nt, each encoding a single open reading frame (ORF). The ORFs ranged from 4,632 to 4,633 aa in PvEV-1 and from 4,899 to 4,954 aa in PvEV-2. Both ORFs encoded the RNA-dependent RNA polymerase (RdRP). The percentage sequence similarity between the PvEV-1 and PvEV-2 sequences from this study and GenBank sequences was 29% to 99%. Bayesian phylogenetic analysis resolved two well-supported monophyletic clades, with isolates from this study clustering with sequences from Brazil. Discussion: This study provides the first insights into the evolutionary relationships of PvEV from SSA and contributes towards filling the current knowledge gaps on endornaviruses.


2015 ◽  
Author(s):  
Dominik Forster ◽  
Micah Dunthorn ◽  
Thorsten Stoeck ◽  
Frédéric Mahé

Discovery of novel diversity in high-throughput sequencing (HTS) studies is a central task in environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on novel diversity discovery, we clustered an environmental marine protist HTS dataset of protist reads together with accessions from the taxonomically curated PR2 reference database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as novel by USEARCH and Swarm were actually highly similar to reference accessions. Using graph theory we found additional novel diversity within OTUs that would have gone unnoticed without further using their underlying network topologies. Our results suggest that novel diversity inferred from clustering approaches requires further validation, whereas graph theory provides a powerful tool for microbial ecology and the analyses of environmental HTS datasets.


2018 ◽  
Vol 27 (10) ◽  
pp. 2365-2380 ◽  
Author(s):  
Arnaud Meng ◽  
Erwan Corre ◽  
Ian Probert ◽  
Andres Gutierrez-Rodriguez ◽  
Raffaele Siano ◽  
...  


2019 ◽  
Author(s):  
Valerie Le Sage ◽  
Jack P. Kanarek ◽  
Eric Nturibi ◽  
Adalena V. Nanni ◽  
Dan J. Snyder ◽  
...  

Abstract: The genome of Influenza A viruses consists of eight negative-sense RNA segments that are bound by viral nucleoprotein (NP). We recently showed that NP binding is not uniform along the segments but exhibits regions of enrichment as well as depletion. Furthermore, genome-wide NP binding profiles are distinct even in strains with high sequence similarity, such as the two H1N1 strains A/WSN/1933 and A/California/07/2009. Here, we performed interstrain segment swapping experiments with segments of either high or low congruency in NP binding, which suggested that a segment with a similar overall NP binding profile preserved replication fitness of the resulting virus. Further sub-segmental swapping experiments demonstrated that NP binding is affected by changes to the underlying nucleotide sequence, as NP peaks can either become lost or appear de novo at mutated regions. Unexpectedly, these local nucleotide changes in one segment not only affect NP binding in cis, but also impact the genome-wide NP binding profile on other segments in a vRNA sequence-independent manner, suggesting that primary sequence alone is not the sole determinant for NP association to vRNA. Moreover, we observed that sub-segmental mutations that affect NP binding profiles can result in reduced replication fitness, which is caused by defects in vRNA packaging efficiency and an increase in semi-infectious particle production. Taken together, our results indicate that the pattern of NP binding to vRNA is important for efficient virus replication.
Author Summary: Each viral RNA (vRNA) segment is bound by the polymerase complex at the 5′ and 3′ ends, while the remainder of the vRNA is coated non-uniformly and non-randomly by nucleoprotein (NP). To explore the constraints of NP binding to vRNA, we used high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) of mutant H1N1 strains with exchanged vRNA sequences and observed that NP binding can be changed based on vRNA sequence. The most striking observation of our study is that nucleotide changes in one segment can have genome-wide effects on the NP binding profile of other segments. We refer to this phenomenon as the ‘butterfly effect’ of influenza packaging. Our results provide an important context in which to consider future studies regarding influenza packaging and assembly.


2018 ◽  
Author(s):  
Ping-Han Hsieh ◽  
Yen-Jen Oyang ◽  
Chien-Yu Chen

Background: Correct quantification of transcript expression is essential to understand the functional products of the genome in different physiological conditions and developmental stages. Recently, the development of high-throughput RNA sequencing (RNA-Seq) has allowed researchers to perform transcriptome analysis for organisms without a reference genome or transcriptome. For such projects, de novo transcriptome assembly must be carried out prior to quantification. However, the large number of erroneous contigs produced by assemblers might result in unreliable estimates of transcript abundance. In this regard, this study comprehensively investigates how assembly quality affects the performance of quantification for RNA-Seq analysis based on de novo transcriptome assembly. Results: Several important factors that might seriously affect the accuracy of RNA-Seq analysis were examined. First, we found that the assemblers perform comparatively well for transcriptomes with lower biological complexity. Second, we examined over-extended and incomplete contigs and demonstrated that assembly completeness has a strong impact on the estimation of contig abundance. Lastly, we investigated the behavior of the quantifiers with respect to sequence ambiguity, which might be originally present in the transcriptome or accidentally produced by assemblers. The results suggest that the quantifiers often over-estimate the expression of family-collapse contigs and under-estimate the expression of duplicated contigs. For organisms without a reference transcriptome, it remains challenging to detect inaccurate abundance estimates for family-collapse contigs. In contrast, we observed that under-estimation on duplicated contigs can be flagged by analyzing the read distribution of the duplicated contigs. Conclusions: In summary, we explicated the behavior of quantifiers when erroneous contigs are present and outlined the potential problems that assemblers might cause for the downstream analysis of RNA-Seq. We anticipate that the analyses conducted in this study provide valuable insights for the future development of transcriptome assembly and quantification. Availability: We proposed an open-source Python-based package, QuantEval, that builds connected components for the assembled contigs based on sequence similarity and evaluates the quantification results for each connected component. The package can be downloaded from https://github.com/dn070017/QuantEval.


2021 ◽  
Vol 4 ◽  
Author(s):  
Kálmán Tapolczai ◽  
François Keck ◽  
Valentin Vasselon ◽  
Géza Selmeczy ◽  
Maria Kahlert ◽  
...  

Diatom biomonitoring and ecological studies can greatly benefit from DNA metabarcoding compared to conventional microscopical analysis by potentially providing more reliable and accurate data in a cost- and time-efficient way. A conventional strategy for the bioinformatic treatment of sequencing data involves the clustering of quality-filtered sequences into Operational Taxonomic Units (OTUs) based on global sequence similarity, and their assignment to taxonomy using a reference library. Then, the obtained species lists of the successfully assigned taxa are used for subsequent analyses or quality index calculation. However, the high diversity of bioinformatic methods and parameters makes inter-study comparison difficult, especially because OTUs are specific to a given study. Clustering sequences into OTUs aims to reduce the biasing effect of sequencing artefacts and to reach an approximate species-level delimitation, at the price of potentially grouping together sequences with different ecology. A similar bias occurs when sequences that differ from each other in their ecological preference are assigned to the same taxon. The incompleteness of reference libraries can further introduce a bias by not taking into account unassigned sequences, thus losing the ecological information they possess. In order to overcome these biases, our studies tested new approaches on de novo developed diatom indices based on periphytic samples collected from streams in France and Hungary. Index development was performed with the leave-one-out cross-validation (LOOCV) technique by building a model on a training dataset containing n-1 samples and testing it on the remaining test sample. Test values were correlated with a reference environmental gradient. The model was based on the calculation of the optimum and tolerance of taxonomic units along the reference gradient and a modified Zelinka-Marvan diatom index equation. Taxonomic units tested in the studies were morphospecies, OTUs (95% similarity threshold), Individual Sequence Units (ISUs, via minimal bioinformatic quality filtering) and Exact Sequence Variants (ESVs, via the DADA2 denoising algorithm). The “clustering-free” approach (ISU- and ESV-based indices) performed better than the OTU-based one, providing a fine taxonomic resolution at which ecological differences between genetically close sequence variants could be detected. Thus, these indices are better suited to standardized and comparable routine bioassessment. The “taxonomy-free” approach revealed the ecological preferences of those molecular taxonomic units (ISUs/ESVs) that otherwise either (i) would have been assigned to the same taxon due to genetic similarity, or (ii) would not have been recognized because of their absence from the reference libraries. However, we also found that taxonomic information cannot be neglected in ecological studies when the presence of organisms under particular environmental conditions is to be explained or interpreted, e.g. via the traits they possess. New types of clustering methods are welcome in the future of biomonitoring, where the delimitation of taxonomic units should be refined with a greater emphasis on their ecology rather than on morphological or genetic criteria.
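
The index-development loop summarized in this abstract can be sketched as follows. This is a hedged illustration, not the authors' model: it uses abundance-weighted averaging for the optimum and tolerance, the classic Zelinka-Marvan form (the abstract mentions a modified equation whose details are not given here), and a hypothetical indicator weight of 1 / (1 + tolerance). The toy samples and gradient values are invented.

```python
import math


def optimum_tolerance(abundances, gradient):
    """Abundance-weighted mean (optimum) and weighted standard deviation
    (tolerance) of one taxonomic unit along an environmental gradient."""
    total = sum(abundances)
    if total == 0:
        return None
    opt = sum(a * x for a, x in zip(abundances, gradient)) / total
    var = sum(a * (x - opt) ** 2 for a, x in zip(abundances, gradient)) / total
    return opt, math.sqrt(var)


def zelinka_marvan(sample, optima, weights):
    """Classic Zelinka-Marvan form: sum(a * s * v) / sum(a * v), with
    abundance a, optimum s and indicator weight v per taxonomic unit."""
    known = [(a, optima[u], weights[u]) for u, a in sample.items() if u in optima]
    denominator = sum(a * v for a, _, v in known)
    if denominator == 0:
        return None
    return sum(a * s * v for a, s, v in known) / denominator


def loocv_index(samples, gradient):
    """Leave-one-out: fit optima on n-1 samples, score the held-out one."""
    predictions = []
    for i, held_out in enumerate(samples):
        train = [s for j, s in enumerate(samples) if j != i]
        train_gradient = [g for j, g in enumerate(gradient) if j != i]
        optima, weights = {}, {}
        for unit in {u for s in train for u in s}:
            fit = optimum_tolerance([s.get(unit, 0) for s in train], train_gradient)
            if fit is None:
                continue
            opt, tol = fit
            optima[unit] = opt
            # Assumption: units with a narrow tolerance get a higher weight.
            weights[unit] = 1.0 / (1.0 + tol)
        predictions.append(zelinka_marvan(held_out, optima, weights))
    return predictions


# Toy data: abundances of two hypothetical units (A, B) in four samples.
samples = [{"A": 10}, {"A": 5, "B": 5}, {"B": 10}, {"A": 2, "B": 8}]
gradient = [1.0, 2.0, 3.0, 2.5]

predictions = loocv_index(samples, gradient)
```

The held-out predictions would then be correlated with the reference gradient, as the abstract describes; that final step is omitted here to keep the sketch short.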


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1692 ◽  
Author(s):  
Dominik Forster ◽  
Micah Dunthorn ◽  
Thorsten Stoeck ◽  
Frédéric Mahé

Discovery of novel diversity in high-throughput sequencing studies is an important aspect in environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on the discovery of novel diversity, we clustered an environmental marine high-throughput sequencing dataset of protist amplicons together with reference sequences from the taxonomically curated Protist Ribosomal Reference (PR2) database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The potentially novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and in the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as potentially novel by USEARCH and Swarm were more than 97% similar to references of PR2. Using shortest path analyses on sequence similarity network OTUs and Swarm OTUs we found additional novel diversity within OTUs that would have gone unnoticed without further exploiting their underlying network topologies. These results demonstrate that graph theory provides powerful tools for microbial ecology and the analysis of environmental high-throughput sequencing datasets. Furthermore, sequence similarity networks were most accurate in delineating novel diversity from previously discovered diversity.
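
One way to picture a shortest path analysis inside a network OTU is a breadth-first search outward from the reference sequences. The sketch below is an assumption-laden illustration rather than the authors' procedure: the node names, the unweighted edges, and the cut-off of three steps for flagging an amplicon as distant are all hypothetical.

```python
from collections import deque


def distance_to_nearest_reference(adjacency, references):
    """Multi-source breadth-first search.

    Returns the number of edges separating each reachable node from its
    nearest reference node. Nodes absent from the result are unreachable.
    """
    dist = {r: 0 for r in references if r in adjacency}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Toy network: one reference sequence and four environmental amplicons.
adjacency = {
    "ref1": {"amp1"},
    "amp1": {"ref1", "amp2"},
    "amp2": {"amp1", "amp3"},
    "amp3": {"amp2"},
    "amp4": set(),  # singleton with no link to any reference
}

dist = distance_to_nearest_reference(adjacency, {"ref1"})

# Hypothetical cut-off: amplicons three or more steps from any reference.
distant = sorted(n for n, d in dist.items() if d >= 3)
unreachable = sorted(n for n in adjacency if n not in dist)
```

In this toy network `amp3` sits three steps from the reference and `amp4` is not connected to it at all, so both would be candidates for closer inspection even if a clustering method had grouped `amp3` into the same OTU as `ref1`.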


Nature ◽  
2021 ◽  
Author(s):  
Fides Zenk ◽  
Yinxiu Zhan ◽  
Pavel Kos ◽  
Eva Löser ◽  
Nazerke Atinbayeva ◽  
...  

Abstract: Fundamental features of 3D genome organization are established de novo in the early embryo, including clustering of pericentromeric regions, the folding of chromosome arms and the segregation of chromosomes into active (A-) and inactive (B-) compartments. However, the molecular mechanisms that drive de novo organization remain unknown [1,2]. Here, by combining chromosome conformation capture (Hi-C), chromatin immunoprecipitation with high-throughput sequencing (ChIP–seq), 3D DNA fluorescence in situ hybridization (3D DNA FISH) and polymer simulations, we show that heterochromatin protein 1a (HP1a) is essential for de novo 3D genome organization during Drosophila early development. The binding of HP1a at pericentromeric heterochromatin is required to establish clustering of pericentromeric regions. Moreover, HP1a binding within chromosome arms is responsible for overall chromosome folding and has an important role in the formation of B-compartment regions. However, depletion of HP1a does not affect the A-compartment, which suggests that a different molecular mechanism segregates active chromosome regions. Our work identifies HP1a as an epigenetic regulator that is involved in establishing the global structure of the genome in the early embryo.

