Genome sequence of the banana aphid, Pentalonia nigronervosa Coquerel (Hemiptera: Aphididae) and its symbionts

Author(s):  
Thomas C. Mathers ◽  
Sam T. Mugford ◽  
Saskia A. Hogenhout ◽  
Leena Tripathi

The banana aphid, Pentalonia nigronervosa Coquerel (Hemiptera: Aphididae), is a major pest of cultivated bananas (Musa spp., order Zingiberales), primarily due to its role as a vector of Banana bunchy top virus (BBTV), the most severe viral disease of banana worldwide. Here, we generated a highly complete genome assembly of P. nigronervosa using a single PCR-free Illumina sequencing library. Using the same sequence data, we also generated complete genome assemblies of the P. nigronervosa symbiotic bacteria Buchnera aphidicola and Wolbachia. To improve our initial assembly of P. nigronervosa, we developed a k-mer based deduplication pipeline to remove genomic scaffolds derived from the assembly of haplotigs (allelic variants assembled as separate scaffolds). To demonstrate the usefulness of this pipeline, we applied it to the recently generated assembly of the aphid Myzus cerasi, reducing the duplication of conserved BUSCO genes by 25%. Phylogenomic analysis of P. nigronervosa, our improved M. cerasi assembly, and seven previously published aphid genomes, spanning three aphid tribes and two subfamilies, reveals that P. nigronervosa falls within the tribe Macrosiphini, but is an outgroup to other Macrosiphini sequenced so far. As such, the genomic resources reported here will be useful both for understanding the evolution of Macrosiphini and for the study of P. nigronervosa. Furthermore, our approach of using low-cost, high-quality Illumina short reads to generate complete genome assemblies of understudied aphid species will help to fill in genomic black spots in the diverse aphid tree of life.
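The idea of k-mer based deduplication can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the `k` value, and the `min_shared` threshold are hypothetical, and the sketch ignores reverse complements and allelic mismatches that a real tool would have to handle. It simply flags a scaffold when most of its k-mers already occur in longer scaffolds that have been kept.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers in seq (uppercase, forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_haplotigs(scaffolds: Dict[str, str], k: int = 5,
                   min_shared: float = 0.8) -> List[str]:
    """Flag scaffolds whose k-mers are mostly present in longer, kept scaffolds.

    k and min_shared are illustrative values, not published parameters.
    """
    # Visit the longest scaffolds first so the longer copy of a region is kept.
    order = sorted(scaffolds, key=lambda name: len(scaffolds[name]), reverse=True)
    kept_kmers: Set[str] = set()
    flagged: List[str] = []
    for name in order:
        ks = kmers(scaffolds[name], k)
        if ks and len(ks & kept_kmers) / len(ks) >= min_shared:
            flagged.append(name)   # candidate haplotig: mostly redundant
        else:
            kept_kmers |= ks       # retained: its k-mers join the reference set
    return flagged


example = {
    "primary": "ACGTTGCAAGGCTTAGCATCGGATCC",
    "haplotig": "TTGCAAGGCTTAGCATCGG",   # contained in "primary"
    "unique": "GGGGGCCCCCAAAAATTTTT",
}
print(flag_haplotigs(example))
```

In practice k would be much larger (tens of bases) so that shared k-mers reflect true redundancy rather than chance matches.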

2020 ◽  
Vol 10 (12) ◽  
pp. 4315-4321


2020 ◽  
Vol 110 (11) ◽  
pp. 1759-1762
Author(s):  
Michael L. O’Leary ◽  
Lindsey P. Burbank ◽  
Rodrigo Krugner ◽  
Drake C. Stenger

Xylella fastidiosa is a xylem-limited bacterial plant pathogen that causes disease on numerous hosts. Additionally, X. fastidiosa asymptomatically colonizes a wide range of plant species. X. fastidiosa subsp. multiplex has been detected in olive (Olea europaea) trees grown in California, U.S.A., as well as in Europe. Strains of X. fastidiosa subsp. multiplex isolated from California olive trees are not known to cause disease on olive, although some can induce leaf-scorch symptoms on almond (Prunus dulcis). No genome assemblies currently exist for olive-associated X. fastidiosa subsp. multiplex strains; therefore, a hybrid assembly method was used to generate complete genome sequences for three X. fastidiosa subsp. multiplex strains (Fillmore, LM10, and RH1) isolated from olive trees grown in Ventura and Los Angeles counties of California.


Author(s):  
Minakshi Prasad ◽  
Koushlesh Ranjan ◽  
Gaya Prasad

Bluetongue disease (BT) is an infectious but non-contagious viral disease of wild and domestic ruminants. The complete genome of bluetongue virus (BTV) isolate K31-08/ABT/HSR was sequenced using the Ion Torrent PGM system. The sequence data were assembled de novo and contig sequences were prepared with reference to known sequences from GenBank. Analysis based on segment 10 segregates BTV into five distinct topotypes. Segment 10 of the K31-08/ABT/HSR isolate showed maximum identity of >99/99% (nucleotide/amino acid) with BTV 16 isolates from India and was placed among eastern topotype viruses from India and several other countries. The clustering of BTV isolates from different geographical regions into the same group indicates the spatial spread of segment 10 through the introduction of new genes via trade, illegal live vaccines, or reassortment. It also indicates a common origin of segment 10 irrespective of BTV serotype. The effect of reassortment and genetic drift on BTV can be predicted using complete genome sequencing.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12129
Author(s):  
Paul E. Oluniyi ◽  
Fehintola Ajogbasile ◽  
Judith Oguzie ◽  
Jessica Uwanibe ◽  
Adeyemi Kayode ◽  
...  

Next generation sequencing (NGS)-based studies have vastly increased our understanding of viral diversity. Viral sequence data obtained from NGS experiments are a rich source of information: these data can be used to study viral epidemiology, evolution, and transmission patterns, and can also inform drug and vaccine design. Viral genomes, however, represent a great challenge to bioinformatics due to their high mutation rate and their tendency to form quasispecies in the same infected host, bringing about the need to implement advanced bioinformatics tools to assemble consensus genomes that are well representative of the viral population circulating in individual patients. Many tools have been developed to preprocess sequencing reads, carry out de novo or reference-assisted assembly of viral genomes, and assess the quality of the genomes obtained. Most of these tools, however, exist as standalone workflows and usually require huge computational resources. Here we present VGEA (Viral Genomes Easily Analyzed), a Snakemake workflow for analyzing RNA viral genomes. VGEA enables users to map sequencing reads to the human genome to remove human contaminants, split bam files into forward and reverse reads, carry out de novo assembly of forward and reverse reads to generate contigs, pre-process reads for quality and contamination, map reads to a reference tailored to the sample using corrected contigs supplemented by the user’s choice of reference sequences, and evaluate/compare genome assemblies. We designed a project with the aim of creating a flexible, easy-to-use and all-in-one pipeline from existing/stand-alone bioinformatics tools for viral genome analysis that can be deployed on a personal computer.
VGEA was built on the Snakemake workflow management system and utilizes existing tools for each step: fastp (Chen et al., 2018) for read trimming and read-level quality control, BWA (Li & Durbin, 2009) for mapping sequencing reads to the human reference genome, SAMtools (Li et al., 2009) for extracting unmapped reads and also for splitting bam files into fastq files, IVA (Hunt et al., 2015) for de novo assembly to generate contigs, shiver (Wymant et al., 2018) to pre-process reads for quality and contamination, then map to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences, SeqKit (Shen et al., 2016) for cleaning shiver assembly for QUAST, QUAST (Gurevich et al., 2013) to evaluate/assess the quality of genome assemblies and MultiQC (Ewels et al., 2016) for aggregation of the results from fastp, BWA and QUAST. Our pipeline was successfully tested and validated with SARS-CoV-2 (n = 20), HIV-1 (n = 20) and Lassa Virus (n = 20) datasets all of which have been made publicly available. VGEA is freely available on GitHub at: https://github.com/pauloluniyi/VGEA under the GNU General Public License.
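A workflow manager such as Snakemake runs steps in an order that respects their dependencies. The Python sketch below illustrates that ordering for the tools named above, using the standard-library `graphlib` module rather than Snakemake itself. The dependency edges are an assumption inferred from the order in which the tools are listed; they are not taken from the VGEA Snakefile.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the steps it is ASSUMED to depend on (illustrative only).
DEPENDS_ON: Dict[str, Set[str]] = {
    "fastp": set(),                      # read trimming and quality control
    "BWA": {"fastp"},                    # map reads to the human reference
    "SAMtools": {"BWA"},                 # extract unmapped reads, bam -> fastq
    "IVA": {"SAMtools"},                 # de novo assembly into contigs
    "shiver": {"IVA"},                   # map reads to a tailored reference
    "SeqKit": {"shiver"},                # clean the shiver assembly for QUAST
    "QUAST": {"SeqKit"},                 # assess assembly quality
    "MultiQC": {"fastp", "BWA", "QUAST"} # aggregate reports
}


def run_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one execution order in which every step follows its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


print(run_order(DEPENDS_ON))
```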


2020 ◽  
Author(s):  
Brendan N. Reid ◽  
Rachel L. Moran ◽  
Christopher J. Kopack ◽  
Sarah W. Fitzpatrick

Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.


2015 ◽  
Vol 112 (33) ◽  
pp. 10200-10207 ◽  
Author(s):  
Jan Janouškovec ◽  
Denis V. Tikhonenkov ◽  
Fabien Burki ◽  
Alexis T. Howe ◽  
Martin Kolísko ◽  
...  

Apicomplexans are a major lineage of parasites, including causative agents of malaria and toxoplasmosis. How such highly adapted parasites evolved from free-living ancestors is poorly understood, particularly because they contain nonphotosynthetic plastids with which they have a complex metabolic dependency. Here, we examine the origin of apicomplexan parasitism by resolving the evolutionary distribution of several key characteristics in their closest free-living relatives, photosynthetic chromerids and predatory colpodellids. Using environmental sequence data, we describe the diversity of these apicomplexan-related lineages and select five species that represent this diversity for transcriptome sequencing. Phylogenomic analysis recovered a monophyletic lineage of chromerids and colpodellids as the sister group to apicomplexans, and a complex distribution of retention versus loss for photosynthesis, plastid genomes, and plastid organelles. Reconstructing the evolution of all plastid and cytosolic metabolic pathways related to apicomplexan plastid function revealed an ancient dependency on plastid isoprenoid biosynthesis, predating the divergence of apicomplexans and dinoflagellates. Similarly, plastid genome retention is strongly linked to the retention of two genes in the plastid genome, sufB and clpC, altogether suggesting a relatively simple model for plastid retention and loss. Lastly, we examine the broader distribution of a suite of molecular characteristics previously linked to the origins of apicomplexan parasitism and find that virtually all are present in their free-living relatives. The emergence of parasitism may not be driven by acquisition of novel components, but rather by loss and modification of the existing, conserved traits.


2019 ◽  
Vol 8 (4) ◽  
Author(s):  
Everlyn Kamau ◽  
Charles N. Agoti ◽  
Joyce M. Ngoi ◽  
Zaydah R. de Laurent ◽  
John Gitonga ◽  
...  

Dengue infection remains poorly characterized in Africa and little is known regarding its associated viral genetic diversity. Here, we report dengue virus type 2 (DENV-2) sequence data from 10 clinical samples, including 5 complete genome sequences of the cosmopolitan genotype, obtained from febrile adults seeking outpatient care in coastal Kenya.


2020 ◽  
Vol 10 (3) ◽  
pp. 899-906 ◽  
Author(s):  
Thomas C. Mathers

Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.
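Contiguity figures such as "contig N50 = 2.00 Mb" and "half of the genome in only 40 scaffolds" follow from a simple calculation over sequence lengths. The Python sketch below shows the standard definitions of N50 and L50; the function name and the example lengths are illustrative and are not data from the assembly.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least half of the total assembly length.
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("at least one non-empty sequence is required")
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 2 >= total:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches the total")


# Toy example: total length 290, so half is 145; 80 + 70 = 150 covers it.
print(n50_l50([80, 70, 50, 40, 30, 20]))
```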


2019 ◽  
Vol 35 (21) ◽  
pp. 4430-4432 ◽  
Author(s):  
René L Warren ◽  
Lauren Coombe ◽  
Hamid Mohamadi ◽  
Jessica Zhang ◽  
Barry Jaquish ◽  
...  

Motivation: In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes. Results: We first tested ntEdit and the state-of-the-art assembly improvement tools GATK, Pilon and Racon on controlled Escherichia coli and Caenorhabditis elegans sequence data. Generally, ntEdit performs well at low sequence depths (<20×), fixing the majority (>97%) of base substitutions and indels, and its performance is largely constant with increased coverage. In all experiments conducted using a single CPU, the ntEdit pipeline executed in <14 s and <3 m, on average, on E. coli and C. elegans, respectively. We performed similar benchmarks on a sub-20× coverage human genome sequence dataset, inspecting accuracy and resource usage in editing chromosomes 1 and 21, and whole genome. ntEdit scaled linearly, executing in 30–40 m on those sequences. We show how ntEdit ran in <2 h 20 m to improve upon long and linked read human genome assemblies of NA12878, using high-coverage (54×) Illumina sequence data from the same individual, fixing frame shifts in coding sequences. We also generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophyte), and used it to edit our pseudo haploid assemblies of the 20 Gb interior and white spruce genomes in <4 and <5 h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024. Availability and implementation: https://github.com/bcgsc/ntedit. Supplementary information: Supplementary data are available at Bioinformatics online.
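To make the Bloom filter idea concrete, here is a minimal Python sketch that stores read k-mers in a Bloom filter and then lists draft positions whose k-mers are not supported by the reads. It is not ntEdit's implementation: the class, function names, filter size, and hash scheme are all hypothetical choices for illustration, and it only detects unsupported positions without attempting any edit.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_read_filter(reads: Iterable[str], k: int) -> BloomFilter:
    """Insert every k-mer of every read into a Bloom filter."""
    bloom = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
    return bloom


def unsupported_positions(draft: str, read_filter: BloomFilter, k: int) -> List[int]:
    """Return start positions of draft k-mers that are absent from the filter."""
    return [i for i in range(len(draft) - k + 1)
            if draft[i:i + k] not in read_filter]


reads = ["ACGTACGGTCA", "CGGTCATTGAC"]
read_filter = build_read_filter(reads, k=5)
print(unsupported_positions("ACGTACGGTCATTGAC", read_filter, k=5))
```

Because a Bloom filter can return false positives, a position reported as supported may occasionally be wrong, whereas a position reported as unsupported is always truly absent from the reads.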


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5895 ◽  
Author(s):  
Thomas Andreas Kohl ◽  
Christian Utpatel ◽  
Viola Schleusener ◽  
Maria Rosaria De Filippo ◽  
Patrick Beckert ◽  
...  

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance with the highest resolution, up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference mapping based workflow, MTBseq reports detected variant positions annotated with known association to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.
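The notion of a joint variant list and a FASTA alignment of SNP positions can be sketched in a few lines of Python. This is an illustration only and does not reproduce MTBseq's code or file formats: the function names, the 0-based positions, and the toy reference and samples are all assumptions.

```python
from typing import Dict, List, Tuple


def snp_alignment(reference: str,
                  samples: Dict[str, Dict[int, str]]) -> Tuple[List[int], Dict[str, str]]:
    """Build a joint list of variant positions and a per-sample SNP alignment.

    samples maps a sample name to {0-based position: alternate base}. At joint
    positions where a sample has no call, the reference base is used.
    """
    positions = sorted({pos for calls in samples.values() for pos in calls})
    alignment = {
        name: "".join(calls.get(pos, reference[pos]) for pos in positions)
        for name, calls in samples.items()
    }
    return positions, alignment


def to_fasta(alignment: Dict[str, str]) -> str:
    """Format the alignment as FASTA text, one record per sample."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


reference = "ACGTACGT"
samples = {"s1": {1: "T", 5: "A"}, "s2": {5: "A"}, "s3": {}}
positions, alignment = snp_alignment(reference, samples)
print(positions)
print(to_fasta(alignment))
```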

