High conservation combined with high plasticity: genomics and evolution of Borrelia bavariensis

10.21203/rs.3.rs-29892/v2 ◽

2020 ◽

Author(s):

Noémie S Becker ◽

Robert Ethan Rollins ◽

Kateryna Nosenko ◽

Alexander Paulus ◽

Samantha Martin ◽

...

Keyword(s):

Lyme Borreliosis ◽

Next Generation Sequencing Data ◽

Borrelia Burgdorferi Sensu Lato ◽

High Plasticity ◽

Sequencing Data ◽

Short Read ◽

Genome Reconstruction ◽

A Genome ◽

Long Read ◽

Genomic Studies

Abstract Background: Borrelia bavariensis is one of the agents of Lyme borreliosis (or Lyme disease) in Eurasia. The genome of the Borrelia burgdorferi sensu lato species complex, which includes B. bavariensis, is known to be very complex and fragmented, making the assembly of whole genomes from next-generation sequencing data a challenge. Results: We present a genome reconstruction for 33 B. bavariensis isolates from Eurasia based on long-read (Pacific Biosciences, for three isolates) and short-read (Illumina) data. We show that combining both sequencing techniques allows proper reconstruction of all plasmids in most cases, but that a very close reference is necessary when only short-read sequencing data are available. B. bavariensis genomes combine a high degree of genetic conservation with high plasticity: all isolates share the main chromosome and five plasmids, but the repertoire of other plasmids is highly variable. In addition to plasmid losses and gains through horizontal transfer, we also observe several fusions between plasmids. Although European isolates of B. bavariensis have little diversity in genome content, there is some geographic structure to this variation. In contrast, each Asian isolate has a unique plasmid repertoire, and we observe no geographically based differences between Japanese and Russian isolates. Comparing the genomes of Asian and European populations of B. bavariensis suggests that some genes that differ markedly between the two populations may be good candidates for adaptation to the tick vector (Ixodes ricinus in Europe and I. persulcatus in Asia). Conclusions: We present the characterization of genomes of a large sample of B. bavariensis isolates and show that their plasmid content is highly variable. This study opens the way for genomic studies seeking to understand host and vector adaptation as well as human pathogenicity in Eurasian Lyme borreliosis agents.

High conservation combined with high plasticity: genomics and evolution of Borrelia bavariensis

BMC Genomics ◽

10.1186/s12864-020-07054-3 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Noémie S. Becker ◽

Robert E. Rollins ◽

Kateryna Nosenko ◽

Alexander Paulus ◽

Samantha Martin ◽

...

Keyword(s):

Lyme Borreliosis ◽

Next Generation Sequencing Data ◽

High Plasticity ◽

Sequencing Data ◽

Short Read ◽

Genome Reconstruction ◽

A Genome ◽

Long Read ◽

Genomic Studies ◽

High Conservation

Abstract Background: Borrelia bavariensis is one of the agents of Lyme borreliosis (or Lyme disease) in Eurasia. The genome of the Borrelia burgdorferi sensu lato species complex, which includes B. bavariensis, is known to be very complex and fragmented, making the assembly of whole genomes from next-generation sequencing data a challenge. Results: We present a genome reconstruction for 33 B. bavariensis isolates from Eurasia based on long-read (Pacific Biosciences, for three isolates) and short-read (Illumina) data. We show that combining both sequencing techniques allows proper reconstruction of all plasmids in most cases, but that a very close reference is necessary when only short-read sequencing data are available. B. bavariensis genomes combine a high degree of genetic conservation with high plasticity: all isolates share the main chromosome and five plasmids, but the repertoire of other plasmids is highly variable. In addition to plasmid losses and gains through horizontal transfer, we also observe several fusions between plasmids. Although European isolates of B. bavariensis have little diversity in genome content, there is some geographic structure to this variation. In contrast, each Asian isolate has a unique plasmid repertoire, and we observe no geographically based differences between Japanese and Russian isolates. Comparing the genomes of Asian and European populations of B. bavariensis suggests that some genes that differ markedly between the two populations may be good candidates for adaptation to the tick vector (Ixodes ricinus in Europe and I. persulcatus in Asia). Conclusions: We present the characterization of genomes of a large sample of B. bavariensis isolates and show that their plasmid content is highly variable. This study opens the way for genomic studies seeking to understand host and vector adaptation as well as human pathogenicity in Eurasian Lyme borreliosis agents.
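The contrast between replicons shared by all isolates and a variable plasmid repertoire can be illustrated with a small presence/absence computation. The following is only a minimal sketch: the isolate names and replicon labels are made-up placeholders, not data from the study, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(repertoires: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split replicons into those found in every isolate (core)
    and those found in only some isolates (accessory)."""
    if not repertoires:
        return set(), set()
    all_sets = list(repertoires.values())
    core = set.intersection(*all_sets)
    accessory = set.union(*all_sets) - core
    return core, accessory


def distinct_repertoires(repertoires: Dict[str, Set[str]]) -> int:
    """Count how many different replicon combinations occur among the isolates."""
    return len({frozenset(r) for r in repertoires.values()})


# Hypothetical isolates and replicon labels, for illustration only.
example = {
    "isolate_A": {"chromosome", "p1", "p2", "p3"},
    "isolate_B": {"chromosome", "p1", "p2", "p4"},
    "isolate_C": {"chromosome", "p1", "p2"},
}
core, accessory = core_and_accessory(example)
n_distinct = distinct_repertoires(example)
```

In this toy input every isolate has a different repertoire, which is the pattern the abstract reports for the Asian isolates; a real analysis would start from assembled and annotated replicons rather than hand-written sets.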

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽

10.1038/s41467-021-24041-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chong Chu ◽

Rebeca Borges-Monroy ◽

Vinayak V. Viswanadham ◽

Soohyun Lee ◽

Heng Li ◽

...

Keyword(s):

Transposable Element ◽

Structure And Function ◽

Endogenous Retroviruses ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Sequencing Technologies ◽

Long Read ◽

And Function

Abstract Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.

Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

Genome Biology ◽

10.1186/gb-2010-11-10-r99 ◽

2010 ◽

Vol 11 (10) ◽

Cited By ~ 53

Author(s):

Nils Homer ◽

Stanley F Nelson

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Short Read ◽

Variant Discovery ◽

Generation Sequencing

The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab028 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Xueyi Dong ◽

Luyi Tian ◽

Quentin Gouil ◽

Hasaru Kariyawasam ◽

Shian Su ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Transcriptomic Analysis ◽

Statistical Testing ◽

Rna Seq ◽

Sequencing Data ◽

Short Read ◽

Sequencing Platform ◽

Long Read

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequence error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results from the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.

Rapid Mycobacterium tuberculosis spoligotyping from uncorrected long reads using Galru

10.1101/2020.05.31.126490 ◽

2020 ◽

Author(s):

Andrew J. Page ◽

Nabil-Fareed Alikhan ◽

Michael Strinden ◽

Thanh Le Viet ◽

Timofey Skvortsov

Keyword(s):

Mycobacterium Tuberculosis ◽

State Of The Art ◽

Sequence Data ◽

Human Pathogen ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Long Reads ◽

Long Read

Abstract Spoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short read genome sequencing data; however, no methods exist for long read sequence data such as from Nanopore or PacBio. We present a novel software package, Galru, which can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows near real-time spoligotyping from long read data as they are being sequenced, giving rapid sample typing. We compare it to the existing state-of-the-art software and find that its results are identical to those obtained from short read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.
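A spoligotype is commonly written as a presence/absence pattern over an ordered list of spacer sequences. The sketch below shows that idea on toy data only. It is not Galru's algorithm: the spacers and the read are invented, and exact substring matching is an assumption made for brevity that would not tolerate the errors found in uncorrected long reads.

```python
from typing import List

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def spacer_pattern(read: str, spacers: List[str]) -> str:
    """Return a binary string with '1' where a spacer occurs exactly
    in the read (on either strand) and '0' where it does not."""
    forward = read.upper()
    reverse = reverse_complement(forward)
    return "".join(
        "1" if s.upper() in forward or s.upper() in reverse else "0"
        for s in spacers
    )


# Toy spacers and read, invented for illustration (real spacers are longer).
toy_spacers = ["ACGTAC", "GGGCCC", "TTTAAA", "CATCAT"]
toy_read = "TTACGTACGGATGATGCC"
pattern = spacer_pattern(toy_read, toy_spacers)
```

Here the first spacer matches the forward strand and the last matches the reverse strand, so both are reported as present; a practical tool would use error-tolerant alignment instead of exact matching.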

PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa320 ◽

2020 ◽

Author(s):

Jie Huang ◽

Stefano Pallotti ◽

Qianling Zhou ◽

Marcus Kleber ◽

Xiaomeng Xin ◽

...

Keyword(s):

Next Generation Sequencing ◽

Snp Array ◽

Simple Approach ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Short Read ◽

Array Data ◽

Short Reads ◽

Generation Sequencing

Abstract The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the APOE classic polymorphism (*1/*2/*3/*4), since it represents one of the best examples of functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the large Whole Exome Sequencing (WES) and SNP-array data obtained from the multi-ethnic UK BioBank (UKBB, N=48,855). By applying PERHAPS, based on piecing together the paired-end reads according to their FASTQ-labels, we extracted the haplotype data, along with their frequencies and the individual diplotype. Concordance rates between WES directly called diplotypes and the ones generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), whether the sample is stratified by SNP-array genotyping batch or by self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of obtained haplotype frequencies with the ones available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite some acknowledged technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.
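The idea of piecing paired-end reads together by their shared label and reading off the alleles at two SNP positions can be sketched as follows. This is a simplified illustration, not the PERHAPS implementation: the input format (one record per fragment label, position and base), the positions 100 and 238, and the function name are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def call_haplotypes(
    observations: Iterable[Tuple[str, int, str]], pos_a: int, pos_b: int
) -> Counter:
    """Count two-SNP haplotypes directly observed on single fragments.

    Each observation is (fragment_label, position, base); both mates of a
    read pair are assumed to carry the same fragment_label.
    """
    alleles_by_fragment: Dict[str, Dict[int, str]] = defaultdict(dict)
    for label, pos, base in observations:
        if pos in (pos_a, pos_b):
            # If both mates cover the same position, the later record wins;
            # a real tool would check the two bases for agreement.
            alleles_by_fragment[label][pos] = base

    counts: Counter = Counter()
    for alleles in alleles_by_fragment.values():
        # Only fragments covering both SNPs give phase information.
        if pos_a in alleles and pos_b in alleles:
            counts[alleles[pos_a] + alleles[pos_b]] += 1
    return counts


# Hypothetical observations: frag3 covers only one SNP and is ignored.
toy_observations = [
    ("frag1", 100, "T"), ("frag1", 238, "C"),
    ("frag2", 100, "C"), ("frag2", 238, "C"),
    ("frag3", 100, "T"),
    ("frag4", 100, "T"), ("frag4", 238, "C"),
]
haplotype_counts = call_haplotypes(toy_observations, pos_a=100, pos_b=238)
```

Because both alleles are seen on the same DNA fragment, no statistical phasing is needed for these counts; the trade-off is that only fragments spanning both sites contribute.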

GCAT-SEEKquence: Genome Consortium for Active Teaching of Undergraduates through Increased Faculty Access to Next-Generation Sequencing Data

CBE—Life Sciences Education ◽

10.1187/cbe.11-08-0065 ◽

2011 ◽

Vol 10 (4) ◽

pp. 342-345 ◽

Cited By ~ 15

Author(s):

Vincent P. Buonaccorsi ◽

Michael D. Boyle ◽

Deborah Grove ◽

Craig Praul ◽

Eric Sakk ◽

...

Keyword(s):

Undergraduate Students ◽

Biology Education ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Undergraduate Biology ◽

Active Teaching ◽

Nextgen Sequencing ◽

A Genome ◽

Genome Consortium

To transform undergraduate biology education, faculty need to provide opportunities for students to engage in the process of science. The rise of research approaches using next-generation (NextGen) sequencing has been impressive, but incorporation of such approaches into the undergraduate curriculum remains a major challenge. In this paper, we report proceedings of a National Science Foundation–funded workshop held July 11–14, 2011, at Juniata College. The purpose of the workshop was to develop a regional research coordination network for undergraduate biology education (RCN/UBE). The network is collaborating with a genome-sequencing core facility located at Pennsylvania State University (University Park) to enable undergraduate students and faculty at small colleges to access state-of-the-art sequencing technology. We aim to create a database of references, protocols, and raw data related to NextGen sequencing, and to find innovative ways to reduce costs related to sequencing and bioinformatics analysis. It was agreed that our regional network for NextGen sequencing could operate more effectively if it were partnered with the Genome Consortium for Active Teaching (GCAT) as a new arm of that consortium, entitled GCAT-SEEK(quence). This step would also permit the approach to be replicated elsewhere.

Complete Genome Sequence of Rubrobacter xylanophilus Strain AA3-22, Isolated from Arima Onsen in Japan

Microbiology Resource Announcements ◽

10.1128/mra.00818-19 ◽

2019 ◽

Vol 8 (34) ◽

Cited By ~ 1

Author(s):

Natsuki Tomariguchi ◽

Kentaro Miyazaki

Keyword(s):

Genome Sequence ◽

Complete Genome Sequence ◽

Complete Genome ◽

Hot Spring ◽

Sequencing Data ◽

Short Read ◽

Content Type ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Long Read

Rubrobacter xylanophilus strain AA3-22, belonging to the phylum Actinobacteria, was isolated from nonvolcanic Arima Onsen (hot spring) in Japan. Here, we report the complete genome sequence of this organism, which was obtained by combining Oxford Nanopore long-read and Illumina short-read sequencing data.

A computational toolset for rapid identification of SARS-CoV-2, other viruses and microorganisms from sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa231 ◽

2020 ◽

Cited By ~ 1

Author(s):

Shifu Chen ◽

Changshou He ◽

Yingqiang Li ◽

Zhicheng Li ◽

Charles E Melançon

Keyword(s):

Middle East ◽

Severe Acute Respiratory Syndrome ◽

Rapid Identification ◽

Large Set ◽

Sequencing Data ◽

Short Read ◽

Microbial Genomes ◽

Extension Method ◽

Long Read ◽

Adapter Trimming

Abstract In this paper, we present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. We present fastv as an ultra-fast tool to detect microbial sequences present in sequencing data, identify target microorganisms and visualize coverage of microbial genomes. This tool is based on the k-mer mapping and extension method. K-mer sets are generated by UniqueKMER, another tool provided in this toolset. UniqueKMER can generate complete sets of unique k-mers for each genome within a large set of viral or microbial genomes. For convenience, unique k-mers for microorganisms and common viruses that afflict humans have been generated and are provided with the tools. As a lightweight tool, fastv accepts FASTQ data as input and directly outputs the results in both HTML and JSON formats. Prior to the k-mer analysis, fastv automatically performs adapter trimming, quality pruning, base correction and other preprocessing to ensure the accuracy of k-mer analysis. Specifically, fastv provides built-in support for rapid severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) identification and typing. Experimental results showed that fastv achieved 100% sensitivity and 100% specificity for detecting SARS-CoV-2 from sequencing data; and can distinguish SARS-CoV-2 from SARS, Middle East respiratory syndrome and other coronaviruses. This toolset is available at: https://github.com/OpenGene/fastv.
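The notion of a k-mer set that is unique to each genome within a collection can be shown with a few lines of code. The sketch below is an illustration under simplifying assumptions, not the UniqueKMER or fastv implementation: it ignores reverse complements and the extension step, holds everything in memory, and uses invented genome names and sequences.

```python
from collections import defaultdict
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """For each genome, keep only the k-mers that occur in no other genome."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    unique: Dict[str, Set[str]] = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def count_hits(read: str, unique: Dict[str, Set[str]], k: int) -> Dict[str, int]:
    """Count how many genome-specific k-mers of each genome appear in a read."""
    read_kmers = kmers(read, k)
    return {name: len(read_kmers & ks) for name, ks in unique.items()}


# Invented toy genomes; the shared k-mer ACGT is excluded from both sets.
toy_genomes = {"virus_x": "ACGTACGT", "virus_y": "ACGTTTGA"}
toy_unique = unique_kmers(toy_genomes, k=4)
hits = count_hits("GTACG", toy_unique, k=4)
```

A read is then assigned to the genome whose unique k-mers it hits; restricting matching to genome-specific k-mers is what lets closely related genomes be told apart.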
