Generation of large mitochondrial and nuclear nucleotide sequences and phylogenetic analyses using high-throughput short-read datasets for endangered Placostylinae snails of the southwest Pacific

Phylogenetic analyses of nucleotide sequences confirm a unique plant intercontinental disjunction between tropical Africa, the Caribbean, and the Hawaiian Islands

Journal of Plant Research ◽

10.1007/s10265-009-0258-0 ◽

2009 ◽

Vol 123 (1) ◽

pp. 57-65 ◽

Cited By ~ 13

Author(s):

Sandra Namoff ◽

Quentin Luke ◽

Francisco Jiménez ◽

Alberto Veloz ◽

Carl E. Lewis ◽

...

Keyword(s):

Phylogenetic Analyses ◽

Hawaiian Islands ◽

Nucleotide Sequences ◽

Tropical Africa ◽

The Caribbean

The long march: a sample preparation technique that enhances contig length and coverage by high-throughput short-read sequencing

SciVee ◽

10.4016/10180.01 ◽

2009 ◽

Keyword(s):

Sample Preparation ◽

High Throughput ◽

Preparation Technique ◽

Short Read ◽

Contig Length ◽

Short Read Sequencing ◽

Sample Preparation Technique

Suspected zoonotic transmission of rotavirus group A in Danish adults

Epidemiology and Infection ◽

10.1017/s0950268811001981 ◽

2011 ◽

Vol 140 (6) ◽

pp. 1013-1017 ◽

Cited By ~ 21

Author(s):

S. E. MIDGLEY ◽

C. K. HJULSAGER ◽

L. E. LARSEN ◽

G. FALKENHORST ◽

B. BÖTTIGER

Keyword(s):

Genetic Relatedness ◽

Phylogenetic Analyses ◽

Geographical Area ◽

Nucleotide Sequences ◽

Zoonotic Transmission ◽

Adult Patients ◽

Epidemiological Features ◽

Group A Rotaviruses ◽

Group A ◽

Stool Samples

SUMMARY

Group A rotaviruses infect humans and a variety of animals. In July 2006 a rare rotavirus strain with G8P[14] specificity was identified in the stool samples of two adult patients with diarrhoea, who lived in the same geographical area in Denmark. Nucleotide sequences of the VP7, VP4, VP6, and NSP4 genes of the identified strains were identical. Phylogenetic analyses showed that both Danish G8P[14] strains clustered with rotaviruses of animal, mainly bovine and caprine, origin. The high genetic relatedness to animal rotaviruses and the atypical epidemiological features suggest that these human G8P[14] strains were acquired through direct zoonotic transmission events.

The Long March: A Sample Preparation Technique that Enhances Contig Length and Coverage by High-Throughput Short-Read Sequencing

PLoS ONE ◽

10.1371/journal.pone.0003495 ◽

2008 ◽

Vol 3 (10) ◽

pp. e3495 ◽

Cited By ~ 22

Author(s):

Katherine Sorber ◽

Charles Chiu ◽

Dale Webster ◽

Michelle Dimon ◽

J. Graham Ruby ◽

...

Keyword(s):

Sample Preparation ◽

High Throughput ◽

Preparation Technique ◽

Short Read ◽

Contig Length ◽

Short Read Sequencing ◽

Sample Preparation Technique

BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing

Genome Research ◽

10.1101/gr.095299.109 ◽

2009 ◽

Vol 19 (10) ◽

pp. 1884-1895 ◽

Cited By ~ 58

Author(s):

W.-C. Kao ◽

K. Stevens ◽

Y. S. Song

Keyword(s):

High Throughput ◽

Short Read ◽

Short Read Sequencing ◽

Base Calling ◽

Model Based ◽

Calling Algorithm

Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies

Bioinformatics ◽

10.1093/bioinformatics/btn173 ◽

2008 ◽

Vol 24 (13) ◽

pp. i32-i40 ◽

Cited By ~ 10

Author(s):

I. Hajirasouliha ◽

F. Hormozdiari ◽

S. C. Sahinalp ◽

I. Birol

Keyword(s):

High Throughput ◽

Short Read

An experimentally informed evolutionary model improves phylogenetic fit to divergent lactamase homologs

10.1101/003848 ◽

2014 ◽

Author(s):

Jesse D Bloom

Keyword(s):

High Throughput ◽

Phylogenetic Analyses ◽

Sequence Divergence ◽

Evolutionary Model ◽

Molecular Data ◽

Quantitative Model ◽

Sequence Evolution ◽

Beta Lactamase ◽

Evolutionary Models ◽

High Throughput Experiments

Phylogenetic analyses of molecular data require a quantitative model for how sequences evolve. Traditionally, the details of the site-specific selection that governs sequence evolution are not known a priori, making it challenging to create evolutionary models that adequately capture the heterogeneity of selection at different sites. However, recent advances in high-throughput experiments have made it possible to quantify the effects of all single mutations on gene function. I have previously shown that such high-throughput experiments can be combined with knowledge of underlying mutation rates to create a parameter-free evolutionary model that describes the phylogeny of influenza nucleoprotein far better than commonly used existing models. Here I extend this work by showing that published experimental data on TEM-1 beta-lactamase (Firnberg et al., 2014) can be combined with a few mutation rate parameters to create an evolutionary model that describes beta-lactamase phylogenies much better than most common existing models. This experimentally informed evolutionary model is superior even for homologs that are substantially diverged (about 35% divergence at the protein level) from the TEM-1 parent that was the subject of the experimental study. These results suggest that experimental measurements can inform phylogenetic evolutionary models that are applicable to homologs that span a substantial range of sequence divergence.

Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein-sequence-based replicon distribution scores

10.1101/2020.04.21.053082 ◽

2020 ◽

Cited By ~ 2

Author(s):

Oliver Schwengers ◽

Patrick Barth ◽

Linda Falgenhauer ◽

Torsten Hain ◽

Trinad Chakraborty ◽

...

Keyword(s):

High Throughput ◽

Protein Sequence ◽

Scientific Community ◽

Vital Role ◽

Bacterial Genomes ◽

Short Read ◽

Link Type ◽

Sequencing Technologies ◽

Generation Sequencing

ABSTRACT

Plasmids are extrachromosomal genetic elements replicating independently of the chromosome which play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next generation sequencing methods, the number of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of both high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included into existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS) which achieved an accuracy of 96.6%. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. Compared to PlasFlow, Platon achieved a higher accuracy (97.5%) and more balanced predictions (F1=82.6%) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced E. coli isolates. Platon is available at: platon.computational.bio

Data Summary

Platon was developed as a Python 3 command line application for Linux.

The complete source code and documentation are available on GitHub under a GPL3 license: https://github.com/oschwengers/platon and platon.computational.bio.

All database versions are hosted at Zenodo: DOI 10.5281/zenodo.3349651.

Platon is available via bioconda package platon

Platon is available via PyPI package cb-platon

Bacterial representative sequences for UniProt’s UniRef90 protein clusters, complete bacterial genome sequences from the NCBI RefSeq database, complete plasmid sequences from the NCBI genomes plasmid section, created artificial contigs, RDS threshold metrics and raw protein replicon hit counts used to create and evaluate the marker protein sequence database are hosted at Zenodo: DOI 10.5281/zenodo.3759169

24 Escherichia coli isolates sequenced with short read (Illumina MiSeq) and long read sequencing technologies (Oxford Nanopore Technology GridION platform) used for real data benchmarks are available under the following NCBI BioProjects: PRJNA505407, PRJNA387731

Impact Statement

Plasmids play a vital role in the spread of antibiotic resistance and pathogenicity genes. The increasing numbers of clinical outbreaks involving resistant pathogens worldwide pushed the scientific community to increase their efforts to comprehensively investigate bacterial genomes. Due to the maturation of next-generation sequencing technologies, entire bacterial genomes including plasmids are nowadays sequenced at huge scale. To analyze draft assemblies, a mandatory first step is to separate plasmid from chromosome contigs. Recently, many bioinformatic tools have emerged to tackle this issue. Unfortunately, several tools are implemented only as interactive or web-based tools, which rules them out for the necessary high-throughput analysis of large data sets. Other tools that do provide a high-throughput implementation often come with certain drawbacks, e.g. providing taxon-specific databases only, not providing an actionable, i.e. true binary, classification, or achieving classification performance biased towards either sensitivity or specificity.

Here, we introduce the tool Platon implementing a new replicon distribution-based approach combined with higher-level contig characterizations to address the aforementioned issues. In addition to the plasmid detection within draft assemblies, Platon provides the user with valuable information on certain higher-level contig characterizations. We show that Platon provides a balanced classification performance as well as a scalable implementation for high-throughput analyses. We therefore consider Platon to be a powerful, species-independent and flexible tool to scan large amounts of bacterial whole-genome sequencing data for their plasmid content.
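
The Platon abstract reports accuracy and F1 side by side. As a minimal sketch of why the two can diverge when plasmid contigs are a minority class, the function below computes both from a confusion matrix. The function name and the counts in the usage note are hypothetical illustrations, not figures or code from the paper.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Return (accuracy, F1) for a binary classifier.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Here "positive" would mean
    "contig called plasmid-borne" (an assumption for illustration).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Made-up counts: 50 plasmid contigs among 1000 contigs in total.
acc, f1 = accuracy_and_f1(tp=40, fp=10, tn=940, fn=10)
print(f"accuracy={acc:.3f} F1={f1:.3f}")
```

With these made-up counts the many correctly rejected chromosome contigs keep accuracy high, while F1 depends only on how the plasmid class itself is handled. That is one reason to report both numbers.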

Full Genome Evolutionary Studies of Wheat Streak Mosaic-Associated Viruses Using High-Throughput Sequencing

Frontiers in Microbiology ◽

10.3389/fmicb.2021.699078 ◽

2021 ◽

Vol 12 ◽

Author(s):

Carla Dizon Redila ◽

Savannah Phipps ◽

Shahideh Nouri

Keyword(s):

Mosaic Virus ◽

RNA Sequencing ◽

Great Plains ◽

High Throughput ◽

Phylogenetic Analyses ◽

Full Genome Sequence ◽

Mixed Infections ◽

Full Genome ◽

Genome Sequences ◽

Resistance Breaking

Wheat streak mosaic (WSM), a viral disease affecting cereals and grasses, causes substantial losses in crop yields. Wheat streak mosaic virus (WSMV) is the main causal agent of the complex, but mixed infections with Triticum mosaic virus (TriMV) and High plains wheat mosaic emaravirus (HPWMoV) were reported as well. Although resistant varieties are effective for disease control, a WSMV resistance-breaking isolate and several potential resistance-breaking isolates have been reported, suggesting that viral populations are genetically diverse. Previous phylogenetic studies of WSMV were conducted by focusing only on the virus coat protein (CP) sequence, while there is no such study for either TriMV or HPWMoV. Here, we studied the genetic variation and evolutionary mechanisms of natural populations of WSM-associated viruses mainly in Kansas fields and fields in some other parts of the Great Plains using high-throughput RNA sequencing. In total, 28 historic and field samples were used for total RNA sequencing to obtain full genome sequences of WSM-associated viruses. Field survey results showed WSMV as the predominant virus followed by mixed infections of WSMV + TriMV. Phylogenetic analyses of the full genome sequences demonstrated that WSMV Kansas isolates are widely distributed in sub-clades. In contrast, phylogenetic analyses for TriMV isolates showed no significant diversity. Recombination was identified as the major evolutionary force of WSMV and TriMV variation in KS fields, and positive selection was detected in some encoding genomic regions in the genome of both viruses. Furthermore, the full genome sequence of a second Kansas HPWMoV isolate was reported. Here, we also identified previously unknown WSMV isolates in the Great Plains sharing clades and high nucleotide sequence similarities with Central Europe isolates. The findings of this study will provide more insights into the genetic structure of WSM-associated viruses and, in turn, help in improving strategies for disease management.

Acceleration of Nucleotide Semi-Global Alignment with Adaptive Banded Dynamic Programming

10.1101/130633 ◽

2017 ◽

Cited By ~ 9

Author(s):

Hajime Suzuki ◽

Masahiro Kasahara

Keyword(s):

Dynamic Programming ◽

Single Molecule ◽

Computation Time ◽

Error Rates ◽

Nucleotide Sequences ◽

Sequencing Error ◽

Local Alignment ◽

Global Alignment ◽

Alignment Algorithm ◽

Short Read

Abstract

Motivation: Pairwise alignment of nucleotide sequences has previously been carried out using the seed-and-extend strategy, where we enumerate seeds (shared patterns) between sequences and then extend the seeds by Smith-Waterman-like semi-global dynamic programming to obtain full pairwise alignments. With the advent of massively parallel short read sequencers, algorithms and data structures for efficiently finding seeds have been extensively explored. However, recent advances in single-molecule sequencing technologies have enabled us to obtain millions of reads, each of which is orders of magnitude longer than those output by the short-read sequencers, demanding a faster algorithm for the extension step that accounts for most of the computation time required for pairwise local alignment. Our goal is to design a faster extension algorithm suitable for single-molecule sequencers with high sequencing error rates (e.g., 10-15%) and with more frequent insertions and deletions than substitutions.

Results: We propose an adaptive banded dynamic programming algorithm for calculating pairwise semi-global alignment of nucleotide sequences that allows a relatively high insertion or deletion rate while keeping band width relatively low (e.g., 32 or 64 cells) regardless of sequence lengths. Our new algorithm eliminated mutual dependences between elements in a vector, allowing an efficient Single-Instruction-Multiple-Data parallelization. We experimentally demonstrate that our algorithm runs approximately 5× faster than the extension alignment algorithm in NCBI BLAST+ while retaining similar sensitivity (recall). We also show that our extension algorithm is more sensitive than the extension alignment routine in DALIGNER, while the computation time is comparable.

Availability: The implementation of the algorithm and the benchmarking scripts are available at https://github.com/ocxtal/[email protected]
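
To make the idea of banding concrete, here is a minimal sketch of a fixed-band extension alignment score. It is not the authors' adaptive algorithm: the band here never moves, there is no SIMD vectorization, and gaps use a simple linear penalty. The function name, scoring values and default band width are all assumptions chosen for illustration.

```python
def banded_extension_score(a, b, band=4, match=2, mismatch=-3, gap=-4):
    """Best score of an alignment that starts at the beginning of both
    sequences and may end anywhere, restricted to cells with |i - j| <= band.

    Only cells inside the band are stored, so work is O(len(a) * band)
    instead of O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    # Row 0: reaching column j costs j gaps; keep only columns in the band.
    prev = {j: j * gap for j in range(0, min(m, band) + 1)}
    best = 0  # the empty extension always scores 0
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            candidates = []
            if j > 0 and (j - 1) in prev:  # diagonal: match or mismatch
                sub = match if a[i - 1] == b[j - 1] else mismatch
                candidates.append(prev[j - 1] + sub)
            if j in prev:                  # vertical: gap in b
                candidates.append(prev[j] + gap)
            if (j - 1) in cur:             # horizontal: gap in a
                candidates.append(cur[j - 1] + gap)
            if candidates:
                cur[j] = max(candidates)
                best = max(best, cur[j])
        prev = cur
    return best


print(banded_extension_score("ACGTACGT", "ACGTACGT"))           # no indel
print(banded_extension_score("ACGTACGT", "ACGTTACGT", band=1))  # one insertion, inside band
print(banded_extension_score("ACGTACGT", "ACGTTACGT", band=0))  # band too narrow for the indel
```

The third call shows the limitation that motivates an adaptive band: once an insertion pushes the best path outside a fixed narrow band, the extension stalls at the last cell it could reach.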
