SARS-CoV-2 surveillance in Italy through phylogenomic inferences based on Hamming distances derived from pan-SNPs, -MNPs and -InDels

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Adriano Di Pasquale ◽  
Nicolas Radomski ◽  
Iolanda Mangone ◽  
Paolo Calistri ◽  
Alessio Lorusso ◽  
...  

Abstract Background Faced with the ongoing global pandemic of coronavirus disease, the ‘National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis’ (GENPAT), formally established at the ‘Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise’ (IZSAM) in Teramo (Italy), is in charge of SARS-CoV-2 surveillance at the genomic scale. In a context where SARS-CoV-2 surveillance requires correct and fast assessment of epidemiological clusters from a substantial number of samples, the present study proposes an analytical workflow for accurately identifying the PANGO lineages of SARS-CoV-2 samples and building discriminant minimum spanning trees (MST), bypassing the usual time-consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution models. Results GENPAT constituted two collections of SARS-CoV-2 samples. The first collection consisted of SARS-CoV-2 positive swabs collected by IZSAM from the Abruzzo region (Italy), then sequenced by next generation sequencing (NGS) and analyzed in GENPAT (n = 1592), while the second collection included samples from several Italian provinces retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) (n = 17,201). The main results of the present work showed that (i) GENPAT and GISAID detected the same PANGO lineages, (ii) the PANGO lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. ‘UK variant’) are major concerns today in several Italian provinces, and that the new MST-based method (iii) clusters most of the PANGO lineages together, (iv) has a higher discriminatory power than PANGO lineages, and (v) is faster than the usual phylogenomic methods based on MSA and substitution models. Conclusions The genome sequencing efforts of Italian provinces, combined with a structured national system of NGS data management, provided support for SARS-CoV-2 surveillance in Italy.
We propose to build phylogenomic trees of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method that avoids the typical time-consuming steps related to MSA and substitution model-based phylogenomic inference.
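
The idea named in the title and abstract above (pairwise Hamming distances between variant profiles, then a minimum spanning tree) can be illustrated with a minimal sketch. It is not the GENPAT workflow: the sample names, the binary presence/absence profiles, and the choice of Prim's algorithm are all assumptions made for illustration, since the abstract does not say how the profiles are encoded or which MST algorithm is used.

```python
def hamming(a, b):
    """Count the positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Build an MST with Prim's algorithm over pairwise Hamming distances.

    `profiles` maps a sample name to a tuple of 0/1 flags, one flag per
    variant (a hypothetical encoding of pan-SNPs, -MNPs and -InDels).
    Returns a list of (sample_in_tree, new_sample, distance) edges.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge that connects the tree to a new sample.
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical toy data: four samples, five variant positions.
profiles = {
    "sample_A": (1, 0, 0, 1, 0),
    "sample_B": (1, 0, 0, 1, 1),
    "sample_C": (0, 1, 1, 0, 0),
    "sample_D": (0, 1, 1, 0, 1),
}

mst = minimum_spanning_tree(profiles)
total = sum(dist for _, _, dist in mst)
for u, v, dist in mst:
    print(f"{u} -- {v}: {dist}")
```

No multiple sequence alignment or substitution model is involved in this sketch; only a comparison of variant flags is needed, which is consistent with the speed advantage the abstract claims. This naive version recomputes distances on every iteration, so a real implementation would presumably cache the distance matrix.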

2021 ◽  
Author(s):  
Adriano Di Pasquale ◽  
Nicolas Radomski ◽  
Iolanda Mangone ◽  
Paolo Calistri ◽  
Alessio Lorusso ◽  
...  

BACKGROUND: Faced with the ongoing global pandemic of coronavirus disease, the 'National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis' (GENPAT), formally established at the 'Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise' (IZSAM) in Teramo (Italy), supports the genomic surveillance of SARS-CoV-2. In a context where SARS-CoV-2 surveillance needs proper and fast assessment of epidemiological clusters from a large number of samples, the present manuscript proposes a workflow for accurately identifying the PANGOLIN lineages of SARS-CoV-2 samples and building discriminant minimum spanning trees (MST), bypassing the usual time-consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution models. RESULTS: GENPAT constituted two collections of SARS-CoV-2 samples. The samples of the first collection were isolated by IZSAM in the Abruzzo region (Italy), then shotgun sequenced and analyzed in GENPAT (n = 1 592), while those of the second collection were isolated from several Italian provinces and retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) (n = 17 201). The main outcomes of the present study showed that (i) GENPAT and GISAID identified identical PANGOLIN lineages, (ii) the PANGOLIN lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. 'UK variant') are major concerns today in several Italian provinces, and that the new MST-based method (iii) clusters most of the PANGOLIN lineages together, (iv) has a higher discriminatory power than PANGOLIN, and (v) is faster than the usual phylogenomic methods based on MSA and substitution models. CONCLUSIONS: The shotgun sequencing efforts of Italian provinces, combined with a structured national system of metagenomics data management, provided support for SARS-CoV-2 surveillance in Italy.
We recommend inferring phylogenomic relationships of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method that bypasses the usual time-consuming steps related to MSA and substitution model-based phylogenomic inference.


Author(s):  
Fatma E. Taşbent ◽  
Mehmet Özdemir ◽  
Özge M. Akcan ◽  
Esma K. Kurt

Abstract Objective Genome sequencing is useful for following changes in the mutations and variants of viral agents during pandemics. In this study, we performed next-generation sequencing of complete severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) genomes from pediatric patients. Methods Six pediatric patients aged 0 to 18 years who were positive for SARS-CoV-2 by reverse transcription polymerase chain reaction were included in this study. SARS-CoV-2 genome sequencing was performed using Oxford Nanopore Technologies MinION, following the ARTIC Network protocols. Sequencing data were obtained using the FASTQ program and their quality was assessed. The sequence information of all samples was uploaded to the Global Initiative on Sharing All Influenza Data (GISAID) database. Genome, variant, clade, and phylogenetic tree analyses were performed with bioinformatic analysis. Results Of the six samples, two belonged to Nextstrain clade 20A, two to 20B, and two to 19A. According to Pango lineages, B.1.36, B.1.218, B.1, and B.1.260 lineages were detected. A total of 84 mutations were observed across all samples. None of the variants were classified as variants of concern (VOC) or variants of interest (VOI) according to the Pango database. Conclusion This study is the first comprehensive sequence analysis registered in the GISAID database reported from the Konya region in Turkey. Similar studies will be informative to track changes in the virus genome, obtain epidemiological data, guide studies on diagnosis and treatment, and evaluate vaccine efficacy.


2018 ◽  
Vol 56 (8) ◽  
Author(s):  
Cath Arnold ◽  
Kirstin Edwards ◽  
Meeta Desai ◽  
Steve Platt ◽  
Jonathan Green ◽  
...  

ABSTRACT Routine whole-genome analysis for infectious diseases can inform various public health scenarios, including identification of microbial pathogens, relating individual cases to an outbreak of infectious disease, establishing an association between an outbreak of food poisoning and a specific food vehicle, inferring drug susceptibility, source tracing of contaminants, and study of variations in the genome that affect pathogenicity/virulence. We describe the setup, validation, and ongoing verification of a centralized whole-genome-sequencing (WGS) laboratory to carry out sequencing for these public health functions for the National Infection Services, Public Health England, in the United Kingdom. The performance characteristics and quality control metrics measured during validation and verification of the entire end-to-end process (accuracy, precision, reproducibility, and repeatability) are described and include information regarding the automated pass and release of data to service users without intervention.


2016 ◽  
Vol 72 (9) ◽  
pp. 1017-1025 ◽  
Author(s):  
Pavel Mikulecký ◽  
Jirí Zahradník ◽  
Petr Kolenko ◽  
Jiří Černý ◽  
Tatsiana Charnavets ◽  
...  

Interferon-γ receptor 2 is a cell-surface receptor that is required for interferon-γ signalling and therefore plays a critical immunoregulatory role in innate and adaptive immunity against viral and also bacterial and protozoal infections. A crystal structure of the extracellular part of human interferon-γ receptor 2 (IFNγR2) was solved by molecular replacement at 1.8 Å resolution. Similar to other class 2 receptors, IFNγR2 has two fibronectin type III domains. The characteristic structural features of IFNγR2 are concentrated in its N-terminal domain: an extensive π–cation motif of stacked residues KWRWRH, a NAG–W–NAG sandwich (where NAG stands for N-acetyl-D-glucosamine) and finally a helix formed by residues 78–85, which is unique among class 2 receptors. Mass spectrometry and mutational analyses showed the importance of N-linked glycosylation to the stability of the protein and confirmed the presence of two disulfide bonds. Structure-based bioinformatic analysis revealed independent evolutionary behaviour of both receptor domains and, together with multiple sequence alignment, identified putative binding sites for interferon-γ and receptor 1, the ligands of IFNγR2.


10.2196/22299 ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. e22299
Author(s):  
Boxiang Liu ◽  
Kaibo Liu ◽  
He Zhang ◽  
Liang Zhang ◽  
Yuchen Bian ◽  
...  

Background COVID-19 became a global pandemic not long after its identification in late 2019. The genomes of SARS-CoV-2 are being rapidly sequenced and shared on public repositories. To keep up with these updates, scientists need to frequently refresh and reclean data sets, which is an ad hoc and labor-intensive process. Further, scientists with limited bioinformatics or programming knowledge may find it difficult to analyze SARS-CoV-2 genomes. Objective To address these challenges, we developed CoV-Seq, an integrated web server that enables simple and rapid analysis of SARS-CoV-2 genomes. Methods CoV-Seq is implemented in Python and JavaScript. The web server and source code URLs are provided in this article. Results Given a new sequence, CoV-Seq automatically predicts gene boundaries and identifies genetic variants, which are displayed in an interactive genome visualizer and are downloadable for further analysis. A command-line interface is available for high-throughput processing. In addition, we aggregated all publicly available SARS-CoV-2 sequences from the Global Initiative on Sharing All Influenza Data (GISAID), National Center for Biotechnology Information (NCBI), European Nucleotide Archive (ENA), and China National GeneBank (CNGB), and extracted genetic variants from these sequences for download and downstream analysis. The CoV-Seq database is updated weekly. Conclusions We have developed CoV-Seq, an integrated web service for fast and easy analysis of custom SARS-CoV-2 sequences. The web server provides an interactive module for the analysis of custom sequences and a weekly updated database of genetic variants of all publicly accessible SARS-CoV-2 sequences. We believe CoV-Seq will help improve our understanding of the genetic underpinnings of COVID-19.


2021 ◽  
Vol 12 ◽  
Author(s):  
Benjamin Morga ◽  
Maude Jacquot ◽  
Camille Pelletier ◽  
Germain Chevignon ◽  
Lionel Dégremont ◽  
...  

The mechanisms underlying virus emergence are rarely well understood, making the appearance of outbreaks largely unpredictable. This is particularly true for pathogens with low per-site mutation rates, such as DNA viruses, that do not exhibit a large amount of evolutionary change among genetic sequences sampled at different time points. However, whole-genome sequencing can reveal the accumulation of novel genetic variation between samples, promising to render most, if not all, microbial pathogens measurably evolving and suitable for analytical techniques derived from population genetic theory. Here, we aim to assess the measurability of evolution on epidemiological time scales of the Ostreid herpesvirus 1 (OsHV-1), a double stranded DNA virus of which a new variant, OsHV-1 μVar, emerged in France in 2008, spreading across Europe and causing dramatic economic and ecological damage. We performed phylogenetic analyses of heterochronous (n = 21) OsHV-1 genomes sampled worldwide. Results show sufficient temporal signal in the viral sequences to proceed with phylogenetic molecular clock analyses and they indicate that the genetic diversity seen in these OsHV-1 isolates has arisen within the past three decades. OsHV-1 samples from France and New Zealand did not cluster together suggesting a spatial structuration of the viral populations. The genome-wide study of simple and complex polymorphisms shows that specific genomic regions are deleted in several isolates or accumulate a high number of substitutions. These contrasting and non-random patterns of polymorphism suggest that some genomic regions are affected by strong selective pressures. Interestingly, we also found variant genotypes within all infected individuals. Altogether, these results provide baseline evidence that whole genome sequencing could be used to study population dynamic processes of OsHV-1, and more broadly herpesviruses.


2017 ◽  
Vol 114 (11) ◽  
pp. E2077-E2085 ◽  
Author(s):  
Zasha Weinberg ◽  
James W. Nelson ◽  
Christina E. Lünse ◽  
Madeline E. Sherlock ◽  
Ronald R. Breaker

Riboswitches are RNAs that form complex, folded structures that selectively bind small molecules or ions. As with certain groups of protein enzymes and receptors, some riboswitch classes have evolved to change their ligand specificity. We developed a procedure to systematically analyze known riboswitch classes to find additional variants that have altered their ligand specificity. This approach uses multiple-sequence alignments, atomic-resolution structural information, and riboswitch gene associations. Among the discoveries are unique variants of the guanine riboswitch class that most tightly bind the nucleoside 2′-deoxyguanosine. In addition, we identified variants of the glycine riboswitch class that no longer recognize this amino acid, additional members of a rare flavin mononucleotide (FMN) variant class, and also variants of c-di-GMP-I and -II riboswitches that might recognize different bacterial signaling molecules. These findings further reveal the diverse molecular sensing capabilities of RNA, which highlights the potential for discovering a large number of additional natural riboswitch classes.


mBio ◽  
2017 ◽  
Vol 8 (1) ◽  
Author(s):  
Annie N. Cowell ◽  
Dorothy E. Loy ◽  
Sesh A. Sundararaman ◽  
Hugo Valdivia ◽  
Kathleen Fisch ◽  
...  

ABSTRACT Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens. IMPORTANCE Malaria is a disease caused by Plasmodium parasites that caused 214 million symptomatic cases and 438,000 deaths in 2015. Plasmodium vivax is the most widely distributed species, causing the majority of malaria infections outside sub-Saharan Africa.
Whole-genome sequencing (WGS) of Plasmodium parasites from clinical samples has revealed important insights into the epidemiology and mechanisms of drug resistance of malaria. However, WGS of P. vivax is challenging due to low parasite levels in humans and the lack of a routine system to culture the parasites. Selective whole-genome amplification (SWGA) preferentially amplifies the genomes of pathogens from mixtures of target and host gDNA. Here, we demonstrate that SWGA is a simple, robust method that can be used to enrich P. vivax genomic DNA (gDNA) from unprocessed human blood samples and dried blood spots for cost-effective, high-quality WGS.
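
The two coverage figures quoted in the abstract above (an average coverage of 24× and the fraction of the core genome covered by ≥5 reads) are simple summaries of per-position read depth. The sketch below shows how such summaries could be computed; the function name and the toy depth values are hypothetical and are not taken from the study.

```python
def coverage_summary(depths, min_depth=5):
    """Return (mean depth, fraction of positions with depth >= min_depth).

    `depths` is a sequence of per-position read depths, for example as
    parsed from a depth table produced by an alignment toolkit.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    breadth = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, breadth


# Hypothetical toy depths for ten genome positions.
depths = [0, 3, 5, 12, 30, 40, 8, 6, 2, 14]
mean_depth, breadth = coverage_summary(depths)
print(f"mean depth: {mean_depth:.1f}x, covered by >=5 reads: {breadth:.0%}")
```

Reporting both numbers matters because a high mean can hide uneven coverage: a few very deep regions raise the average while many positions remain below the threshold needed for variant calling.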


2013 ◽  
Vol 10 (4) ◽  
pp. 1510-1521
Author(s):  
Lalitha Saroja Thota ◽  
Allam Appa Rao

Advances in information technology are reaching the discipline of medicine and empowering researchers with superior tools. By taking advantage of information technology, today's researchers can successfully navigate the flood of data, and many diabetic complications can be overcome. Biomarkers play a major role in detecting disease at its early stages and also help in assessing the state of treatment and how the body is responding to medication. The dramatic rise in obesity-associated diabetes has resulted in an alarming increase in the incidence and prevalence of obesity, an important complication of diabetes. The twin epidemics of diabetes and obesity pose daunting challenges worldwide. Differences among individuals in their susceptibility to both these conditions probably reflect their genetic constitutions. Predicting obesity-associated diabetes is both useful and important because the number of obese patients is increasing while its main cause cannot yet be defined. Bioinformatics, a truly multidisciplinary science, aims to bring the benefits of computer technologies to bear in understanding the biology of life itself. The dramatic improvements in genomic and bioinformatic resources are accelerating the pace of gene discovery for many medical diseases. It is tempting to speculate about the key susceptible gene/protein biomarkers that bridge diabetes mellitus and obesity. The emergence of post-genomic technologies has led to the development of strategies aimed at identifying specific and sensitive biomarkers from the thousands of molecules present in a tissue or biological fluid. In this regard, we evaluated the role of several genes/proteins that are believed to be involved in the evolution of obesity-associated diabetes by employing a sequence mining technique and multiple sequence alignment with the ClustalW tool, and constructed a phylogram from functional protein sequences extracted from NCBI.
The phylogram was constructed using the Neighbor-Joining algorithm. Our bioinformatic analysis reports a biomarker, the resistin gene, as an ominous link with obesity-associated diabetes. This bioinformatic study will be useful for future studies towards therapeutic inventions for obesity-associated type 2 diabetes.

