Generation of large mitochondrial and nuclear nucleotide sequences and phylogenetic analyses using high-throughput short-read datasets for endangered Placostylinae snails of the southwest Pacific

2021 ◽  
pp. 1-11
Author(s):  
Mathieu Quenu ◽  
Steven A. Trewick ◽  
Elizabeth E. Daly ◽  
Mary Morgan-Richards
2011 ◽  
Vol 140 (6) ◽  
pp. 1013-1017 ◽  
Author(s):  
S. E. MIDGLEY ◽  
C. K. HJULSAGER ◽  
L. E. LARSEN ◽  
G. FALKENHORST ◽  
B. BÖTTIGER

SUMMARY: Group A rotaviruses infect humans and a variety of animals. In July 2006 a rare rotavirus strain with G8P[14] specificity was identified in the stool samples of two adult patients with diarrhoea who lived in the same geographical area in Denmark. Nucleotide sequences of the VP7, VP4, VP6, and NSP4 genes of the identified strains were identical. Phylogenetic analyses showed that both Danish G8P[14] strains clustered with rotaviruses of animal (mainly bovine and caprine) origin. The high genetic relatedness to animal rotaviruses and the atypical epidemiological features suggest that these human G8P[14] strains were acquired through direct zoonotic transmission events.


PLoS ONE ◽  
2008 ◽  
Vol 3 (10) ◽  
pp. e3495 ◽  
Author(s):  
Katherine Sorber ◽  
Charles Chiu ◽  
Dale Webster ◽  
Michelle Dimon ◽  
J. Graham Ruby ◽  
...  

2008 ◽  
Vol 24 (13) ◽  
pp. i32-i40 ◽  
Author(s):  
I. Hajirasouliha ◽  
F. Hormozdiari ◽  
S. C. Sahinalp ◽  
I. Birol

2014 ◽  
Author(s):  
Jesse D Bloom

Phylogenetic analyses of molecular data require a quantitative model for how sequences evolve. Traditionally, the details of the site-specific selection that governs sequence evolution are not known a priori, making it challenging to create evolutionary models that adequately capture the heterogeneity of selection at different sites. However, recent advances in high-throughput experiments have made it possible to quantify the effects of all single mutations on gene function. I have previously shown that such high-throughput experiments can be combined with knowledge of underlying mutation rates to create a parameter-free evolutionary model that describes the phylogeny of influenza nucleoprotein far better than commonly used existing models. Here I extend this work by showing that published experimental data on TEM-1 beta-lactamase (Firnberg et al., 2014) can be combined with a few mutation rate parameters to create an evolutionary model that describes beta-lactamase phylogenies much better than most common existing models. This experimentally informed evolutionary model is superior even for homologs that are substantially diverged (about 35% divergence at the protein level) from the TEM-1 parent that was the subject of the experimental study. These results suggest that experimental measurements can inform phylogenetic evolutionary models that are applicable to homologs that span a substantial range of sequence divergence.
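The idea of combining measured site-specific preferences with mutation rates can be illustrated with a minimal sketch. The abstract does not state the functional form of the model, so everything below is an assumption made for illustration: the Halpern-Bruno-style fixation term, the function names, and the preference values are hypothetical and are not taken from the paper.

```python
import math


def fixation_factor(pref_from: float, pref_to: float) -> float:
    """Relative fixation probability for a change between two amino acids.

    ASSUMPTION: a Halpern-Bruno-style term, ln(r) / (1 - 1/r) with
    r = pref_to / pref_from. The abstract does not specify this form.
    """
    if pref_from <= 0 or pref_to <= 0:
        raise ValueError("preferences must be positive")
    if math.isclose(pref_from, pref_to):
        return 1.0  # limit of the expression as r approaches 1
    ratio = pref_to / pref_from
    return math.log(ratio) / (1.0 - 1.0 / ratio)


def substitution_rate(mutation_rate: float, prefs: dict, x: str, y: str) -> float:
    """Rate of substituting x by y at one site: mutation rate times fixation."""
    return mutation_rate * fixation_factor(prefs[x], prefs[y])


# Hypothetical preferences for one site (not experimental data).
site_prefs = {"A": 0.6, "G": 0.3, "S": 0.1}
rate_to_preferred = substitution_rate(1.0, site_prefs, "S", "A")
rate_to_disfavoured = substitution_rate(1.0, site_prefs, "A", "S")
```

Under this assumed form, changes toward a more preferred amino acid are accelerated and changes away from it are slowed, and the ratio of forward to reverse fixation factors equals the ratio of the two preferences.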


Author(s):  
Oliver Schwengers ◽  
Patrick Barth ◽  
Linda Falgenhauer ◽  
Torsten Hain ◽  
Trinad Chakraborty ◽  
...  

ABSTRACT: Plasmids are extrachromosomal genetic elements replicating independently of the chromosome which play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next generation sequencing methods, the number of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of both high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included into existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric, the replicon distribution score (RDS), which achieved an accuracy of 96.6%. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. 
Compared to PlasFlow, Platon achieved a higher accuracy (97.5%) and more balanced predictions (F1=82.6%) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced E. coli isolates. Platon is available at: platon.computational.bio

Data Summary

Platon was developed as a Python 3 command line application for Linux.
The complete source code and documentation is available on GitHub under a GPL3 license: https://github.com/oschwengers/platon and platon.computational.bio.
All database versions are hosted at Zenodo: DOI 10.5281/zenodo.3349651.
Platon is available via bioconda package platon.
Platon is available via PyPI package cb-platon.
Bacterial representative sequences for UniProt’s UniRef90 protein clusters, complete bacterial genome sequences from the NCBI RefSeq database, complete plasmid sequences from the NCBI genomes plasmid section, created artificial contigs, RDS threshold metrics and raw protein replicon hit counts used to create and evaluate the marker protein sequence database are hosted at Zenodo: DOI 10.5281/zenodo.3759169.
24 Escherichia coli isolates sequenced with short read (Illumina MiSeq) and long read sequencing technologies (Oxford Nanopore Technology GridION platform) used for real data benchmarks are available under the following NCBI BioProjects: PRJNA505407, PRJNA387731.

Impact Statement

Plasmids play a vital role in the spread of antibiotic resistance and pathogenicity genes. The increasing numbers of clinical outbreaks involving resistant pathogens worldwide pushed the scientific community to increase their efforts to comprehensively investigate bacterial genomes. Due to the maturation of next-generation sequencing technologies, nowadays entire bacterial genomes including plasmids are sequenced on a huge scale. To analyze draft assemblies, a mandatory first step is to separate plasmid from chromosome contigs. 
Recently, many bioinformatic tools have emerged to tackle this issue. Unfortunately, several tools are implemented only as interactive or web-based tools, which rules them out for the high-throughput analysis of large data sets. Other tools that do provide a high-throughput implementation often come with drawbacks: they provide taxon-specific databases only, do not provide an actionable (i.e. true binary) classification, or achieve classification performance biased towards either sensitivity or specificity.

Here, we introduce the tool Platon implementing a new replicon distribution-based approach combined with higher-level contig characterizations to address the aforementioned issues. In addition to the plasmid detection within draft assemblies, Platon provides the user with valuable information on certain higher-level contig characterizations. We show that Platon provides a balanced classification performance as well as a scalable implementation for high-throughput analyses. We therefore consider Platon to be a powerful, species-independent and flexible tool to scan large amounts of bacterial whole-genome sequencing data for their plasmid content.
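The abstract reports both accuracy and F1, and the two can diverge when plasmid contigs are much rarer than chromosome contigs. The following minimal sketch uses the standard definitions of these metrics; the function name and the confusion-matrix counts are invented for illustration and are not results from the Platon study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 100 plasmid contigs among 1000 contigs in total.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

With these made-up counts the accuracy is 0.96 while F1 is 0.80, because the many correctly classified chromosome contigs raise accuracy but do not enter the F1 calculation.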


2021 ◽  
Vol 12 ◽  
Author(s):  
Carla Dizon Redila ◽  
Savannah Phipps ◽  
Shahideh Nouri

Wheat streak mosaic (WSM), a viral disease affecting cereals and grasses, causes substantial losses in crop yields. Wheat streak mosaic virus (WSMV) is the main causal agent of the complex, but mixed infections with Triticum mosaic virus (TriMV) and High plains wheat mosaic emaravirus (HPWMoV) have been reported as well. Although resistant varieties are effective for disease control, a WSMV resistance-breaking isolate and several potential resistance-breaking isolates have been reported, suggesting that viral populations are genetically diverse. Previous phylogenetic studies of WSMV focused only on the virus coat protein (CP) sequence, and there is no such study for either TriMV or HPWMoV. Here, we studied the genetic variation and evolutionary mechanisms of natural populations of WSM-associated viruses, mainly in Kansas fields and fields in some other parts of the Great Plains, using high-throughput RNA sequencing. In total, 28 historic and field samples were used for total RNA sequencing to obtain full genome sequences of WSM-associated viruses. Field survey results showed WSMV as the predominant virus, followed by mixed infections of WSMV + TriMV. Phylogenetic analyses of the full genome sequences demonstrated that WSMV Kansas isolates are widely distributed in sub-clades. In contrast, phylogenetic analyses for TriMV isolates showed no significant diversity. Recombination was identified as the major evolutionary force of WSMV and TriMV variation in Kansas fields, and positive selection was detected in some coding regions of the genomes of both viruses. Furthermore, the full genome sequence of a second Kansas HPWMoV isolate was reported. Here, we also identified previously unknown WSMV isolates in the Great Plains sharing clades and high nucleotide sequence similarities with Central European isolates. 
The findings of this study will provide more insights into the genetic structure of WSM-associated viruses and, in turn, help in improving strategies for disease management.


2017 ◽  
Author(s):  
Hajime Suzuki ◽  
Masahiro Kasahara

Abstract

Motivation: Pairwise alignment of nucleotide sequences has previously been carried out using the seed-and-extend strategy, where we enumerate seeds (shared patterns) between sequences and then extend the seeds by Smith-Waterman-like semi-global dynamic programming to obtain full pairwise alignments. With the advent of massively parallel short read sequencers, algorithms and data structures for efficiently finding seeds have been extensively explored. However, recent advances in single-molecule sequencing technologies have enabled us to obtain millions of reads, each of which is orders of magnitude longer than those output by the short-read sequencers, demanding a faster algorithm for the extension step that accounts for most of the computation time required for pairwise local alignment. Our goal is to design a faster extension algorithm suitable for single-molecule sequencers with high sequencing error rates (e.g., 10-15%) and with more frequent insertions and deletions than substitutions.

Results: We propose an adaptive banded dynamic programming algorithm for calculating pairwise semi-global alignment of nucleotide sequences that allows a relatively high insertion or deletion rate while keeping band width relatively low (e.g., 32 or 64 cells) regardless of sequence lengths. Our new algorithm eliminated mutual dependences between elements in a vector, allowing an efficient Single-Instruction-Multiple-Data parallelization. We experimentally demonstrate that our algorithm runs approximately 5× faster than the extension alignment algorithm in NCBI BLAST+ while retaining similar sensitivity (recall). We also show that our extension algorithm is more sensitive than the extension alignment routine in DALIGNER, while the computation time is comparable.

Availability: The implementation of the algorithm and the benchmarking scripts are available at https://github.com/ocxtal/[email protected]
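To make the idea of restricting dynamic programming to a band concrete, here is a minimal sketch of the simpler fixed-width variant. It is not the adaptive algorithm proposed in the abstract: the band stays centred on the main diagonal instead of following the alignment, it computes a global rather than semi-global score, and there is no SIMD. The function name and the scoring parameters are assumptions chosen for illustration.

```python
def banded_global_score(a: str, b: str, band: int = 8,
                        match: int = 1, mismatch: int = -1,
                        gap: int = -1) -> float:
    """Global alignment score using only cells with |i - j| <= band.

    Fixed band around the main diagonal with a linear gap penalty.
    Rows are allocated at full width for readability; only the cells
    inside the band are ever filled in.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")

    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [neg_inf] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap

    for i in range(1, n + 1):
        cur = [neg_inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # prefix of a against the empty prefix of b
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Cells outside the band hold -inf and so never win the max.
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


score = banded_global_score("ACGT", "AGT", band=1)
```

A fixed band fails when insertions and deletions push the optimal path away from the diagonal, which is the situation that motivates letting the band move adaptively as described in the abstract.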

