Exploring the Diversity of Bacillus whole genome sequencing projects using Peasant, the Prokaryotic Assembly and Annotation Tool

2017 ◽  
Author(s):  
Jonathon Brenner ◽  
Laurynas Kalesinskas ◽  
Catherine Putonti

Abstract
Background: The persistent decrease in the cost and difficulty of whole genome sequencing of microbial organisms has led to a dramatic increase in the number of species and strains characterized from a wide variety of environments. Microbial genome sequencing can now be conducted by small laboratories and as part of undergraduate curricula. While sequencing is now routine in microbiology, the assembly, annotation, and downstream analysis of the resulting data still require computational resources and expertise, often including familiarity with programming languages. To address this problem, we have created a lightweight, user-friendly tool for the assembly and annotation of microbial sequencing projects.
Results: The Prokaryotic Assembly and Annotation Tool, Peasant, automates read quality control, genome assembly, and annotation for microbial sequencing projects. Peasant can generate high-quality assemblies and annotations without the need for programming expertise or high-performance computing resources. It also calculates statistics so that users can evaluate their sequencing project. To illustrate the computational speed and accuracy of Peasant, the SRA records of 322 Illumina platform whole genome sequencing assays for Bacillus species were retrieved from NCBI, then assembled and annotated on a single desktop computer. From the resulting assemblies and annotations, a comprehensive analysis of the diversity of over 200 high-quality samples was conducted, examining both the 16S rRNA phylogenetic marker and the Bacillus core genome.
Conclusions: Peasant provides an intuitive solution for high-quality whole genome sequence assembly and annotation for users with limited programming experience and/or computational resources. The analysis of the Bacillus whole genome sequencing projects exemplifies the utility of this tool. Furthermore, the study conducted here provides insight into the diversity of the species and is the largest such comparison conducted to date.

Viruses ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 846 ◽  
Author(s):  
Jan Forth ◽  
Leonie Forth ◽  
Jacqueline King ◽  
Oxana Groza ◽  
Alexandra Hübner ◽  
...  

African swine fever (ASF) is a severe disease of suids caused by African swine fever virus (ASFV). Its dsDNA genome (170–194 kbp) is scattered with homopolymers and repeats as well as inverted terminal repeats (ITRs), which hamper whole-genome sequencing. To date, only a few genome sequences have been published, and sequence-quality data enabling in-depth investigations are available for only some of them. Especially in Europe and Asia, where ASFV has spread continuously since its introduction into Georgia in 2007, very low genetic variability has been reported among the circulating ASFV strains. Therefore, only whole-genome sequences can serve as a basis for detailed virus comparisons. Here, we report an effective workflow for ASFV whole-genome sequencing that combines target enrichment with Illumina and Nanopore sequencing. Using this approach, we generated an improved, high-quality ASFV Georgia 2007/1 whole-genome sequence, correcting 71 sequencing errors and adding 956 and 231 bp at the respective ITRs. This genome, derived from the primary outbreak in 2007, can now serve as a reference for future whole-genome analyses of related ASFV strains and for molecular approaches. Using both the workflow and the reference genome, we generated the first ASFV whole-genome sequence from Moldova, expanding the sequence data available from Eastern Europe.


2018 ◽  
Author(s):  
Adam C. Naj ◽  
Honghuang Lin ◽  
Badri N. Vardarajan ◽  
Simon White ◽  
Daniel Lancour ◽  
...  

Abstract
The Alzheimer’s Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed “consensus calling,” to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant, 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively, and the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, 10.3% and 28.3% were exclusive to Atlas or GATK, respectively, and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
Abbreviations: AD, Alzheimer’s disease; QC, Quality Control; LSSAC, Large-Scale Sequencing and Analysis Center; Broad, Broad Institute Genomics Service; Baylor, Baylor College of Medicine Human Genome Sequencing Center; WashU, Washington University-St. Louis McDonnell Genome Institute; WGS, whole genome sequencing; WES, whole exome sequencing; indel, insertion-deletion variants; VCF, variant call format; MI, Mendelian inconsistency; MC, Mendelian consistency; GWAS, genome-wide association study; VR, referent allele read depth; DP, overall read depth; MS, mapping score; GQ, genotype quality score; Ti/Tv, Transition/Transversion; CS, concordance code
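
The abstract relies on two genotype-level checks: Mendelian inconsistencies within parent-offspring pairs, and the split of calls into concordant, pipeline-exclusive and discordant categories across GATK and Atlas. The Python below is a minimal sketch of those two checks, not the ADSP protocol itself, whose additional filters are not given here. It assumes genotypes are pairs of allele indices at a bi-allelic site and treats a parent-offspring pair as inconsistent when the two share no allele, since a child inherits one allele from each parent. As the abstract's wording suggests, it keeps single-pipeline calls and drops discordant ones. All function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical representation: allele indices at a bi-allelic site, e.g. (0, 1) = het
Genotype = Tuple[int, int]


def mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """A parent-offspring pair is inconsistent if they share no allele."""
    return not (set(parent) & set(child))


def classify_call(gatk: Optional[Genotype], atlas: Optional[Genotype]) -> str:
    """Bucket one sample/variant genotype by which pipelines called it."""
    if gatk is None and atlas is None:
        return "missing"
    if gatk is None:
        return "atlas_only"
    if atlas is None:
        return "gatk_only"
    # Compare unordered genotypes so (0, 1) and (1, 0) count as the same call
    return "concordant" if sorted(gatk) == sorted(atlas) else "discordant"


def consensus(gatk: Optional[Genotype], atlas: Optional[Genotype]) -> Optional[Genotype]:
    """Keep concordant or single-pipeline calls; drop discordant or missing ones."""
    label = classify_call(gatk, atlas)
    if label in ("concordant", "gatk_only"):
        return gatk
    if label == "atlas_only":
        return atlas
    return None


print(mendelian_inconsistent((0, 0), (1, 1)))  # True: no shared allele
print(mendelian_inconsistent((0, 1), (1, 1)))  # False
print(classify_call((0, 1), (1, 0)))           # concordant
print(consensus((0, 1), (1, 1)))               # None (discordant, excluded)
```

In practice these checks would run per site over VCF genotype fields after the pipeline-specific and variant-level QC described above.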


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Weili Cai ◽  
Schyler Nunziata ◽  
John Rascoe ◽  
Michael J. Stulberg

Abstract
Huanglongbing (HLB) is a deadly citrus disease found worldwide, caused by the phloem-limited bacterium ‘Candidatus Liberibacter asiaticus’ (CLas), which is vectored by Asian citrus psyllids. To manage this disease effectively, it is crucial to understand the relationships among bacterial isolates from different geographical locations. Whole genome sequencing approaches can provide more precise molecular characterization of the diversity among populations. Because no in vitro culture of CLas is available, obtaining its whole genome sequence is still a challenge, especially for samples with medium to low titer. Hundreds of millions of sequencing reads are needed to obtain good coverage of CLas from an HLB-positive citrus sample. To overcome this limitation, we present a new method based on Agilent SureSelect XT HS target enrichment, which specifically enriches CLas from a metagenomic sample while greatly reducing cost and increasing whole genome coverage of the pathogen. In this study, the CLas genome was successfully sequenced with 99.3% genome coverage and over 72X sequencing coverage from low-titer tissue samples (equivalent to 28.52 Cq using Li 16S qPCR). More importantly, this method also effectively captures regions of diversity in the CLas genome, enabling precise molecular characterization of different strains.
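
The two figures reported here measure different things. Genome coverage (breadth) is the fraction of reference positions covered by reads, and sequencing coverage such as 72X is the average read depth per position. The Python below is a minimal sketch that assumes per-position depths are already available from aligning the enriched reads to a CLas reference. The function name, the default `min_depth` threshold of 1 and the toy data are illustrative assumptions, because the abstract does not say how breadth was thresholded.

```python
from typing import Sequence, Tuple


def coverage_stats(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth, mean_depth) from per-position read depths of one genome.

    breadth: fraction of positions with depth >= min_depth (assumed threshold)
    mean_depth: average depth over all positions (the "NNX" figure)
    """
    if not depths:
        raise ValueError("depths must be non-empty")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Hypothetical 10-position "genome" with one uncovered position
toy_depths = [70, 75, 80, 0, 72, 90, 65, 88, 77, 83]
breadth, depth = coverage_stats(toy_depths)
print(f"breadth = {breadth:.1%}, mean depth = {depth:.1f}X")
# prints: breadth = 90.0%, mean depth = 70.0X
```

High mean depth does not by itself imply near-complete breadth, which is why enrichment studies typically report both.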


2015 ◽  
Vol 117 (suppl_1) ◽  
Author(s):  
Matthew Wheeler ◽  
Daryl Waggott ◽  
Megan Grove ◽  
Frederick Dewey ◽  
Cuiping Pan ◽  
...  

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For individual patients the clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform the interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate the discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and carrying putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovering genes that modify the severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests, we scored genes as potential modifiers. Ninety-six genes reached a modifier score of at least 6 out of 12 (score 9: 2 genes; score 8: 8; score 7: 17; score 6: 69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as the top-scoring modifiers of hypertrophic cardiomyopathy (HCM) phenotypes (score = 9), while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the 96 top-scoring genes, 21 had been identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top-scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy-associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p = 0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM.
Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.


2015 ◽  
Vol 81 (17) ◽  
pp. 6024-6037 ◽  
Author(s):  
Matthew J. Stasiewicz ◽  
Haley F. Oliver ◽  
Martin Wiedmann ◽  
Henk C. den Bakker

Abstract
While the food-borne pathogen Listeria monocytogenes can persist in food-associated environments, there are no whole-genome sequence (WGS)-based methods to differentiate persistent from sporadic strains. Whole-genome sequencing of 188 isolates from a longitudinal study of L. monocytogenes in retail delis was used to (i) apply single-nucleotide polymorphism (SNP)-based phylogenetics for subtyping of L. monocytogenes, (ii) use SNP counts to differentiate persistent from repeatedly reintroduced strains, and (iii) identify genetic determinants of L. monocytogenes persistence. WGS analysis revealed three prophage regions that explained differences between three pairs of phylogenetically similar populations whose pulsed-field gel electrophoresis types differed by ≤3 bands. WGS-SNP-based phylogenetics found that putatively persistent L. monocytogenes isolates represent SNP patterns (i) unique to a single retail deli, supporting persistence within the deli (11 clades), (ii) unique to a single state, supporting clonal spread within a state (7 clades), or (iii) spanning multiple states (5 clades). Isolates within any one of the 11 deli-specific clades differed by a median of 10 SNPs or fewer. Isolates from 12 putative persistence events differed by significantly fewer SNPs (median, 2 to 22 SNPs) than isolates of the same subtype from other delis (median, up to 77 SNPs), supporting persistence of the strain. In 13 events, nearly indistinguishable isolates (0 to 1 SNP) were found across multiple delis. No individual genes were enriched among persistent isolates compared to sporadic isolates. Our data show that WGS analysis improves food-borne pathogen subtyping and the identification of persistent bacterial pathogens in food-associated environments.
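
The persistence calls above rest on pairwise SNP counts between isolates and on the median of those counts within a group. The Python below is a minimal sketch that assumes isolates are already represented as aligned strings of their variable sites. Real WGS-SNP pipelines derive these from read mapping and variant calling. The function names and toy sequences are hypothetical and do not reproduce the study's data or thresholds.

```python
from itertools import combinations
from statistics import median
from typing import Dict


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned SNP sequences differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def median_pairwise_snps(isolates: Dict[str, str]) -> float:
    """Median SNP distance over all unordered pairs of isolates in a group."""
    if len(isolates) < 2:
        raise ValueError("need at least two isolates")
    names = sorted(isolates)
    dists = [snp_distance(isolates[i], isolates[j]) for i, j in combinations(names, 2)]
    return median(dists)


# Hypothetical toy alignments over 8 variable sites
same_deli = {"deli1_a": "ACGTACGT", "deli1_b": "ACGTACGA", "deli1_c": "ACGTACTT"}
other_delis = {"deli2_a": "TCGAACGT", "deli3_a": "ACCTGCGA", "deli4_a": "GGGTACCT"}

print(median_pairwise_snps(same_deli))    # 1
print(median_pairwise_snps(other_delis))  # 5
```

A small within-deli median relative to the between-deli median is the pattern the study interprets as support for persistence, although the cut-offs it used are not given in the abstract.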


2016 ◽  
Vol 4 (6) ◽  
Author(s):  
Claudia Carolina Carbonari ◽  
Nahuel Fittipaldi ◽  
Sarah Teatero ◽  
Taryn B. T. Athey ◽  
Luis Pianciola ◽  
...  

Shiga toxin-producing Escherichia coli strains are associated worldwide with sporadic human infections and outbreaks. In this work, we report the availability of high-quality draft whole-genome sequences for 19 O157:H7 strains isolated in Argentina.


2021 ◽  
Author(s):  
Dario Fernández Do Porto ◽  
Johana Monteserin ◽  
Josefina Campos ◽  
Ezequiel J Sosa ◽  
Mario Matteo ◽  
...  

Abstract
Background: Whole-genome sequencing has shown that the Mycobacterium tuberculosis infection process can be more heterogeneous than previously thought. Compartmentalized infections, exogenous reinfections, and microevolution are manifestations of this clonal complexity. The mechanisms driving the microevolution (the genetic variability of M. tuberculosis over short time scales) of a parental strain into clonal variants within a patient are a relevant issue that has not yet been completely addressed. To our knowledge, a whole genome sequence microevolution analysis in a single patient with inadequate adherence to treatment has not been previously reported.
Case presentation: In this work, we applied whole genome sequencing for a more in-depth analysis of the microevolution of a parental Mycobacterium tuberculosis strain into clonal variants within a patient with poor treatment compliance in Argentina. We analyzed the whole-genome sequences of 8 consecutive M. tuberculosis isolates obtained from the patient over 57 months of intermittent therapy. Nineteen mutations (9 short-term and 10 fixed variants) emerged, most of them associated with drug resistance. The first isolate was already resistant to isoniazid, rifampicin, and streptomycin; thereafter, the strain developed resistance to fluoroquinolones and pyrazinamide. Surprisingly, the isolates remained susceptible to the prodrug ethionamide after acquiring a frameshift mutation in ethA, a gene required for its activation. We also found a novel variant, T-54G, in the 5' untranslated region of whiB7, a region reportedly related to kanamycin resistance. Notably, discrepancies between canonical and phage-based kanamycin susceptibility testing had previously been found for the isolate harboring this mutation. In our patient, microevolution was mainly driven by drug selective pressure. Rare short-term mutations became fixed together with resistance-conferring mutations during therapy.
Conclusions: This report highlights the relevance of whole-genome sequencing in the clinic for characterizing pre-XDR and MDR resistance profiles, particularly in patients with incomplete and/or intermittent treatment.

