From partial to whole genome imputation of SARS-CoV-2 for epidemiological surveillance

2021 ◽  
Author(s):  
Francisco M Ortuno ◽  
Carlos Loucera ◽  
Carlos S Casimiro-Soriguer ◽  
Jose A Lepe ◽  
Pedro Camacho Martinez ◽  
...  

The current SARS-CoV-2 pandemic has emphasized the utility of viral whole genome sequencing in the surveillance and control of the pathogen. An unprecedented, ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete, and therefore useless, sequences. Viral sequences evolve in the context of a complex phylogeny, so different positions along the genome are in linkage disequilibrium; an imputation method should therefore be able to predict missing positions from the available sequencing data. We developed impuSARS, an application that includes Minimac, the most widely used strategy for genomic data imputation, together with a reference panel of 239,301 sequences built by taking advantage of the enormous number of SARS-CoV-2 whole genome sequences available. impuSARS was tested under a wide range of conditions (missing continuous fragments, amplicons, or sparse individual positions) and reconstructed the original sequences with high fidelity. impuSARS can also impute whole genomes from commercial kits covering less than 20% of the genome, or from the Spike protein alone, with a precision of 0.96. It also recovers the lineage with 100% precision for almost all lineages, even in very poorly covered genomes (< 20%). Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would otherwise be discarded. impuSARS can be incorporated into any primary data processing pipeline for SARS-CoV-2 whole genome sequencing.
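The imputation idea in this abstract rests on linkage disequilibrium: the observed positions of a partial genome constrain which reference sequences it resembles, and those references suggest the missing bases. The Python sketch below illustrates only that intuition with a simple nearest-reference rule. It is not Minimac (which models the query as a mosaic of reference haplotypes with a hidden Markov model) and not impuSARS itself; the function names, the use of "N" for missing bases, the concordance score, and the toy panel are all illustrative assumptions.

```python
# Hypothetical nearest-reference imputation sketch (NOT Minimac or impuSARS).
# Assumes query and panel sequences are already aligned to equal length and
# that missing bases are written as "N".
MISSING = "N"


def mismatches(query: str, ref: str) -> int:
    """Count disagreements between query and ref at the query's observed sites."""
    return sum(1 for q, r in zip(query, ref) if q != MISSING and q != r)


def impute(query: str, panel: list[str]) -> str:
    """Fill each missing base from the panel sequence closest to the query.

    Ties are broken by panel order (min() keeps the first best match).
    """
    best = min(panel, key=lambda ref: mismatches(query, ref))
    return "".join(r if q == MISSING else q for q, r in zip(query, best))


def imputation_concordance(imputed: str, truth: str, query: str) -> float:
    """Fraction of originally missing positions imputed correctly.

    An illustrative score; the paper's precision metric may be defined differently.
    """
    missing = [i for i, q in enumerate(query) if q == MISSING]
    if not missing:
        return 1.0
    return sum(1 for i in missing if imputed[i] == truth[i]) / len(missing)
```

For example, `impute("ACGANNNNTC", ["ACGTACGTAC", "ACGAACGTTC", "TCGTACGAAC"])` picks the second reference (zero mismatches at observed sites) and returns `"ACGAACGTTC"`. A real panel of hundreds of thousands of genomes would need indexing and a probabilistic model rather than a linear scan.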

2020 ◽  
Vol 58 (11) ◽  
Author(s):  
Thomas A. Kohl ◽  
Katharina Kranzer ◽  
Sönke Andres ◽  
Thierry Wirth ◽  
Stefan Niemann ◽  
...  

ABSTRACT Mycobacterium bovis is the primary cause of bovine tuberculosis (bTB) and infects a wide range of domestic animal and wildlife species as well as humans. In Germany, bTB still emerges sporadically in cattle herds, free-ranging wildlife, diverse captive animal species, and humans. To understand the underlying population structure and estimate population size fluctuations through time, we analyzed 131 M. bovis strains from animals (n = 38) and humans (n = 93) collected in Germany from 1999 to 2017 by whole-genome sequencing (WGS), mycobacterial interspersed repetitive-unit–variable-number tandem-repeat (MIRU-VNTR) typing, and spoligotyping. Based on WGS data analysis, 122 of the 131 M. bovis strains were classified into 13 major clades, of which 6 contained strains from both human and animal cases and 7 contained only strains from human cases. Bayesian analyses suggest that the M. bovis population went through two sharp declines, one in the middle of the 18th century and another in the 1950s. WGS-based cluster analysis grouped 46 strains into 13 clusters ranging in size from 2 to 11 members and involving either a single host type (e.g., only cattle) or mixed hosts. Animal strains of four clusters were obtained over a 9-year span, pointing toward autochthonous persistent bTB infection cycles. As expected, WGS had a higher discriminatory power than spoligotyping and MIRU-VNTR typing. In conclusion, our data confirm that WGS combined with suitable bioinformatics is the method of choice for prospective molecular epidemiological surveillance of M. bovis. The M. bovis population in Germany is diverse, with subtle but real interactions between different host groups.
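WGS-based cluster analysis of the kind described above typically links strains whose pairwise SNP distance falls at or below a chosen threshold. The Python sketch below assumes single-linkage clustering over aligned SNP profiles, so a strain joins a cluster if it is close to any member. The linkage rule, the function names, the threshold value, and the toy profiles are illustrative assumptions; the abstract does not state the criterion the authors used.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of aligned SNP positions at which two strain profiles differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def snp_clusters(profiles: dict[str, str], threshold: int) -> list[list[str]]:
    """Hypothetical single-linkage clustering by SNP-distance threshold.

    Strains within `threshold` SNPs of each other are linked (union-find);
    only groups with at least 2 members are reported as clusters.
    """
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)
```

With `threshold=1`, the toy profiles `{"cattle1": "AAAAAAAA", "cattle2": "AAAAAAAT", "human1": "AAAAAATT", "deer1": "TTTTTTTT"}` yield one mixed-host cluster, `[["cattle1", "cattle2", "human1"]]`: cattle1 and human1 differ by 2 SNPs but are chained through cattle2, which is the characteristic behavior of single linkage.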


2020 ◽  
Author(s):  
Yingxi Yang ◽  
Yuchen Yang ◽  
Le Huang ◽  
Jai G. Broome ◽  
Adolfo Correa ◽  
...  

With advances in whole genome sequencing (WGS) technology, multiple statistical methods for aggregate association testing have been developed. Many common approaches aggregate variants within a genomic window of fixed or varying size and do not draw on existing knowledge to define appropriate test units; as a result, most identified regions are not clearly linked to genes, which limits biological understanding. Functional information from new technologies (such as Hi-C and its derivatives), which can help link enhancers to the genes they affect, can be leveraged to predefine variant sets for aggregate testing in WGS. Therefore, in this paper we propose the eSCAN (Scan the Enhancers) method for genome-wide assessment of enhancer regions in sequencing studies, combining the dynamic window selection of SCANG with increased incorporation of genomic annotation. eSCAN searches biologically meaningful windows, increasing power and aiding biological interpretation, as demonstrated by simulation studies under a wide range of scenarios. We also apply eSCAN to association analysis of blood cell traits using TOPMed WGS data from the Women’s Health Initiative (WHI) and the Jackson Heart Study (JHS). Results from this real-data example show that eSCAN captures more significant signals, and that these signals are shorter and drive the associations of larger regions detected by other methods.
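The idea of predefining variant sets from functional annotation can be illustrated by grouping variants into annotated windows (for example, enhancer coordinates linked to a gene) and computing a simple per-individual burden score for each window. This Python sketch is a hypothetical illustration only: it does not reproduce eSCAN's test statistic or SCANG's dynamic window selection, and the function names, inclusive window coordinates, 0/1/2 minor-allele genotype coding, and toy data are assumptions.

```python
def variant_sets(
    positions: list[int], windows: list[tuple[int, int]]
) -> dict[tuple[int, int], list[int]]:
    """Map each annotated window (start, end), inclusive, to the indices of
    the variants whose positions fall inside it."""
    return {
        w: [i for i, p in enumerate(positions) if w[0] <= p <= w[1]]
        for w in windows
    }


def burden_scores(genotypes: list[list[int]], variant_idx: list[int]) -> list[int]:
    """Per-individual burden for one variant set: the sum of minor-allele
    counts (coded 0/1/2) over that set's variants. One row per individual."""
    return [sum(row[i] for i in variant_idx) for row in genotypes]
```

For variants at positions `[100, 150, 400, 420]` and windows `[(90, 160), (390, 450), (600, 700)]`, the first two windows each receive two variants and the third receives none. Each window's burden scores could then be tested against a trait with any aggregate association test; that step is deliberately left out here.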


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 63 ◽  
Author(s):  
Maxime Garcia ◽  
Szilveszter Juhos ◽  
Malin Larsson ◽  
Pall I. Olason ◽  
Marcel Martin ◽  
...  

Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.


Author(s):  
M.I. Terekhova ◽  
E.V. Rogacheva ◽  
I.A. Derevyanchenko ◽  
L.A. Kraeva ◽  
...  

The increasing number of antibiotic-resistant L. monocytogenes isolates makes it necessary to establish genotypic resistance profiles to ensure appropriate antibiotic therapy of listeriosis. In this study, whole-genome sequencing and de novo assembly were performed on L. monocytogenes strains from St. Petersburg and the Vologda region. We determined the MLST sequence type (ST), phylogenetic lineage, and PCR serogroup in silico for the isolates under study, and identified genes and mutations associated with antibiotic resistance. Overall, the genetic composition was similar between strains from the two regions and included a wide range of antibiotic resistance mechanisms. The Listeria strains carried genes encoding resistance to β-lactam antibiotics, fluoroquinolones, tetracyclines, and macrolides, classes that are commonly used to treat Listeria infection. The present study is important for the sanitary and epidemiological surveillance of listeriosis in Russia.


2008 ◽  
Vol 191 (5) ◽  
pp. 1725-1725 ◽  
Author(s):  
Steven L. Salzberg ◽  
Daniela Puiu ◽  
Daniel D. Sommer ◽  
Vish Nene ◽  
Norman H. Lee

ABSTRACT Wolbachia species are endosymbionts of a wide range of invertebrates, including mosquitoes, fruit flies, and nematodes. The wPip strains can cause cytoplasmic incompatibility in some strains of the Culex mosquito. Here we describe the genome sequence of a Wolbachia strain that was discovered in the whole-genome sequencing data for the mosquito Culex quinquefasciatus strain JHB.


2021 ◽  
Vol 97 (6) ◽  
pp. 587-593
Author(s):  
A. S. Vodopianov ◽  
R. V. Pisanov ◽  
S. O. Vodopianov ◽  
I. P. Oleynikov

Aim. To improve the method for assessing the quality of single nucleotide polymorphisms (SNPs) used for SNP typing, based on an analysis of their distribution in the primary whole genome sequencing data (reads).
Materials and methods. Whole genome sequencing data for 56 Vibrio cholerae strains, obtained on different types of sequencers, were used. The software was developed in the Java programming language. Cluster analysis and dendrogram construction were performed with the authors' own software using the UPGMA method.
Results and discussion. The «instability» of the number of SNPs detected in the genome of the cholera causative agent was demonstrated. A method for selecting the list of SNPs for phylogenetic analysis, based on analysis of the primary whole genome sequencing data (reads), was developed. A method of using «control genomes» for cluster analysis of whole genome sequencing data was proposed.
Conclusion. A list of 3198 «stable SNPs» for phylogenetic analysis was compiled. Genetic affinity was shown between non-toxigenic strains that carry the tcpA gene (ctxAB–tcpA+) and preCTX strains of V. cholerae.
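UPGMA, named in the abstract as the clustering method, repeatedly merges the two closest clusters, places the merge at half their distance, and recomputes distances to the new cluster as size-weighted averages. The following is a generic Python sketch of that algorithm, not the authors' Java software; the function name, the Newick output format, and the toy distance matrix are assumptions. In practice the input would be pairwise distances computed over a selected SNP set, such as the «stable SNP» list.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Generic UPGMA on a symmetric distance matrix; returns a Newick string.

    Ties between equal minimum distances are broken by dictionary order.
    """
    # Active clusters: id -> (Newick fragment, number of leaves, node height).
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)

    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        label_i, size_i, height_i = clusters.pop(i)
        label_j, size_j, height_j = clusters.pop(j)
        height = dij / 2  # ultrametric: the merge sits halfway
        label = f"({label_i}:{height - height_i:g},{label_j}:{height - height_j:g})"

        # Size-weighted average distance from the new cluster to each remaining one.
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)

        # Drop distances involving the merged clusters, then add the new ones.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(new_d)
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1

    return next(iter(clusters.values()))[0] + ";"
```

For example, `upgma(["A", "B", "C", "D"], [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]])` returns `(D:5,(C:3,(A:1,B:1):2):2);`. The quadratic scan for the minimum keeps the sketch short; for hundreds of genomes a library implementation would be preferable.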


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sung Yong Park ◽  
Gina Faraci ◽  
Pamela M. Ward ◽  
Jane F. Emerson ◽  
Ha Youn Lee

As of September 2020, global COVID-19 cases had climbed to more than 33 million, with over a million total deaths. Real-time, large-scale SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no method has simultaneously achieved high precision, a simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who tested positive for COVID-19 between April and June 2020 in Los Angeles County, California, USA. These sequences were highly diverse within the G clade and carried nine novel amino acid mutations, including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health responses to sudden threats.

