IPD 2.0: To derive insights from an evolving SARS-CoV-2 genome

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Sanket Desai ◽  
Aishwarya Rane ◽  
Asim Joshi ◽  
Amit Dutt

Abstract Background: Rapid analysis of SARS-CoV-2 genomic data plays a crucial role in surveillance and in the adoption of measures to control the spread of Covid-19. Fast, inclusive and adaptive methods are required for the heterogeneous SARS-CoV-2 sequence data being generated at an unprecedented rate. Results: We present an updated version of the SARS-CoV-2 analysis module of our automated computational pipeline, Infectious Pathogen Detector (IPD) 2.0, which performs genomic analysis to understand the variability and dynamics of the virus. It adopts the recent clade nomenclature and demonstrates a clade prediction accuracy of 92.8%. IPD 2.0 also contains a SARS-CoV-2 updater module that automatically upgrades the variant database using genome sequences from GISAID. As a proof of principle, by analyzing 208,911 SARS-CoV-2 genome sequences, we generated an extensive database of 2.58 million sample-wise variants. A comparative account of lineage-specific mutations in the newer SARS-CoV-2 strains emerging in the UK, South Africa and Brazil, together with data reported from India, identifies overlapping and lineage-specific acquired mutations, suggesting repetitive convergent and adaptive evolution. Conclusions: A novel, dynamic feature of the SARS-CoV-2 module of IPD 2.0 makes it a contemporary tool for analyzing the diverse and growing number of genomic strains of the virus, and allows it to serve as a vital tool to facilitate rapid genomic surveillance in a population and to identify variants involved in breakthrough infections. IPD 2.0 is freely available from http://www.actrec.gov.in/pi-webpages/AmitDutt/IPD/IPD.html and the web application is available at http://ipd.actrec.gov.in/ipdweb/.
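The abstract does not describe how IPD 2.0 derives its sample-wise variants, so the following is only a minimal sketch of the general idea: listing substitutions of each genome against a reference and counting how many samples carry each variant. It assumes the genomes are already aligned to the reference (equal-length rows), skips insertions, deletions and `N` bases, and uses hypothetical function names (`call_variants`, `build_variant_table`) and toy sequences rather than real SARS-CoV-2 data.

```python
from collections import Counter


def call_variants(reference: str, sample: str) -> list[str]:
    """Return substitutions as REF<pos>ALT strings, e.g. 'A5T' (1-based reference coordinates).

    Assumes `sample` has already been aligned to `reference` (equal-length rows).
    Insertions ('-' in the reference row), deletions ('-' in the sample row)
    and ambiguous 'N' bases are skipped in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    variants = []
    ref_pos = 0
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base == "-":
            continue  # insertion relative to the reference: not reported here
        ref_pos += 1
        if alt_base in "-N":
            continue  # deletion or ambiguous call: not reported here
        if alt_base != ref_base:
            variants.append(f"{ref_base}{ref_pos}{alt_base}")
    return variants


def build_variant_table(reference: str, samples: dict[str, str]):
    """Map each sample ID to its variants and count how many samples carry each variant."""
    per_sample = {sid: call_variants(reference, seq) for sid, seq in samples.items()}
    counts = Counter(v for variants in per_sample.values() for v in set(variants))
    return per_sample, counts


# Hypothetical toy alignment (not real SARS-CoV-2 data)
reference = "ACGTACGTAC"
samples = {
    "sample_1": "ACGTTCGTAC",
    "sample_2": "ACGTTCGAAC",
    "sample_3": "ACGNACGTAC",
}
per_sample, counts = build_variant_table(reference, samples)
print(per_sample["sample_2"])  # ['A5T', 'T8A']
print(counts.most_common(1))   # [('A5T', 2)]
```

Tracking the reference coordinate separately from the alignment column keeps positions comparable across samples even when some rows carry insertions.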

2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. In many cases it has allowed taxa to be demarcated into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are from type strains; with nearly 11 000 type strains known, this limits the use of genomic data in comparative taxonomic studies. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea, coupled with computational advances, will boost the credibility of taxonomy in the genomic era. This special issue of the International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into workflows that run from the isolation of new strains to the description of new taxa.


2020 ◽  
Author(s):  
Marco Cacciabue ◽  
Pablo Aguilera ◽  
María Inés Gismondi ◽  
Oscar Taboga

Summary: Covidex is an open-source, alignment-free machine learning subtyping tool for viral species. It is a Shiny app that allows fast and accurate classification of SARS-CoV-2 and FMDV genome sequences into pre-defined clusters. Users can also build their own classification models with the Covidex model generator. Availability: Covidex is open-source, cross-platform compatible, and available under the terms of the GNU General Public License v3 (http://www.gnu.org/licenses/gpl.txt). Covidex is available via SourceForge at https://sourceforge.net/projects/covidex or as the web application https://cacciabue.shinyapps.io/shiny2/.
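The abstract does not specify which classifier Covidex uses, so the following is only a sketch of what "alignment-free" subtyping can look like, not Covidex's model: each genome is reduced to a k-mer frequency vector, and a query is assigned to the pre-defined cluster whose average profile is most similar by cosine similarity (a nearest-centroid classifier swapped in for illustration). The function names, `k = 3`, the cluster labels and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Relative frequency of every ACGT k-mer; k-mers containing other symbols are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labelled: list[tuple[str, str]], k: int = 3) -> dict[str, list[float]]:
    """Average the k-mer profiles of the training sequences in each cluster."""
    grouped: dict[str, list[list[float]]] = {}
    for seq, label in labelled:
        grouped.setdefault(label, []).append(kmer_profile(seq.upper(), k))
    return {label: [sum(col) / len(profiles) for col in zip(*profiles)]
            for label, profiles in grouped.items()}


def classify(seq: str, centroids: dict[str, list[float]], k: int = 3) -> str:
    """Assign `seq` to the cluster whose centroid has the highest cosine similarity."""
    profile = kmer_profile(seq.upper(), k)
    return max(centroids, key=lambda label: cosine(profile, centroids[label]))


# Hypothetical training data: two toy clusters with very different composition
training = [
    ("ATATATTAATAT", "cluster_A"), ("TATAATATTA", "cluster_A"),
    ("GCGCGGCCGCGC", "cluster_B"), ("CGGCGCCGCG", "cluster_B"),
]
centroids = train_centroids(training)
print(classify("ATATTATATA", centroids))  # cluster_A
print(classify("GGCGCGCC", centroids))    # cluster_B
```

Because no alignment is needed, each new genome costs one pass to count k-mers, which is what makes alignment-free approaches attractive for fast subtyping.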


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Takuma Nishimaki ◽  
Keiko Sato

Abstract Background: Phylogenetic analysis strongly depends on evolutionary models. Most evolutionary models for estimating genetic differences and phylogenetic relationships do not account for gap sites in sequence alignments. Appropriately incorporating evolutionary information from sites containing insertions and deletions into genetic difference measures will improve the accuracy of phylogenetic estimates. Results: We introduced a new measure for estimating genetic differences and presented P*R*O*P, a web application for phylogenetic analysis based on genetic differences that take the effect of gaps into account. As an example of phylogenetic analysis using P*R*O*P, we used complete p53 amino acid sequences from 31 organisms and showed that genetic differences computed with and without information on gap-containing sites yield trees with different topologies. Conclusions: P*R*O*P is available at https://www.rs.tus.ac.jp/bioinformatics/prop, and users can perform phylogenetic analysis by uploading sequence data to the website. The most distinctive feature of P*R*O*P is that its genetic difference is estimated without eliminating gap sites from aligned sequences, which helps users detect meaningful differences in an evolutionary process. The source code is available on GitHub: https://github.com/TUS-Satolab/PROP.
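The abstract does not give the formula of the P*R*O*P measure, so the following is not that measure; it is a minimal sketch of why gap treatment matters at all. A simple p-distance is computed between two aligned sequences either with gap columns excluded (pairwise deletion) or with residue-versus-gap columns counted as differences. The function name, the `gaps` parameter and the toy peptide fragments are hypothetical.

```python
def p_distance(a: str, b: str, gaps: str = "exclude") -> float:
    """Proportion of differing sites between two aligned sequences.

    gaps="exclude": drop every column where either sequence has '-' (pairwise deletion).
    gaps="count":   score a residue-vs-gap column as a difference; gap-gap columns are dropped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gaps not in ("exclude", "count"):
        raise ValueError("gaps must be 'exclude' or 'count'")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if "-" in (x, y):
            if gaps == "count":
                compared += 1
                diffs += 1
            continue
        compared += 1
        diffs += int(x != y)
    return diffs / compared if compared else 0.0


# Hypothetical aligned fragments
s1, s2, s3 = "MEEPQSDP", "MEEPQ--P", "MDEPQSDP"
for mode in ("exclude", "count"):
    print(mode, p_distance(s1, s2, mode), p_distance(s1, s3, mode))
# exclude 0.0 0.125
# count 0.25 0.125
```

With gaps excluded, s2 looks closer to s1 than s3 does; once gaps are counted the order reverses, which is the kind of change that can alter tree topology.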


2017 ◽  
Author(s):  
James Hadfield ◽  
Nicholas J. Croucher ◽  
Richard J Goater ◽  
Khalil Abudahab ◽  
David M Aanensen ◽  
...  

ABSTRACT Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesising and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application that runs in a web browser and allows fast exploration of large-scale population genomics datasets, combining the output from multiple genomic analysis methods in an intuitive and interactive manner. Availability: Phandango is a web application freely available for use at https://jameshadfield.github.io/phandango and includes a diverse collection of datasets as examples. Source code, together with a detailed wiki page, is available on GitHub at https://github.com/jameshadfield/


2018 ◽  
Vol 3 ◽  
pp. 118
Author(s):  
Anna Smielewska ◽  
Edward Emmott ◽  
Kyriaki Ranellou ◽  
Ashley Popay ◽  
Ian Goodfellow ◽  
...  

Background: Human parainfluenza virus type 3 (HPIV3) is a prominent cause of respiratory infection, with a significant impact in both pediatric and transplant patient cohorts. Currently there is a paucity of whole genome sequence data that would allow detailed epidemiological and phylogenetic analysis of circulating strains in the UK. Although HPIV3 is known to peak annually in the UK, to date no whole genome sequences of UK HPIV3 strains are available. Methods: Clinical strains were obtained from HPIV3-positive respiratory patient samples collected between 2011 and 2015. These were amplified using an amplicon-based method, sequenced on the Illumina platform and assembled using a new, robust bioinformatics pipeline. Phylogenetic analysis was carried out in the context of other epidemiological studies and currently available whole genome sequence data, with stringent exclusion of significantly culture-adapted strains of HPIV3. Results: In this paper we present twenty full genome sequences of circulating UK strains of HPIV3 and a detailed phylogenetic analysis of them. We analysed the variability along the HPIV3 genome and identified a short hypervariable region in the non-coding segment between the M (matrix) and F (fusion) genes. The epidemiological classifications obtained using this region and using whole genome data were compared and found to be identical. Conclusions: The majority of HPIV3 strains were observed at different geographical locations and over a wide temporal spread, reflecting the global distribution of HPIV3. Consistent with previous data, no particular subcluster or strain was identified as specific to the UK, suggesting that a number of genetically diverse strains circulate at any one time. A small hypervariable region in the HPIV3 genome was identified, and it was shown that, in the absence of full genome data, this region could be used for epidemiological surveillance of HPIV3.
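The abstract does not state how variability along the genome was measured, so the following is a sketch under an assumed method, not the authors' analysis: per-column Shannon entropy of a multiple alignment, averaged over a sliding window, with the highest-scoring window reported as the candidate hypervariable region. The function names, the window size and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the bases in one alignment column, ignoring gaps and Ns."""
    bases = [b for b in column if b not in "-N"]
    if not bases:
        return 0.0
    n = len(bases)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bases).values())


def most_variable_window(alignment: list[str], window: int) -> tuple[int, int, float]:
    """Return (start, end, mean entropy) of the most variable window, 1-based and inclusive."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    if not 1 <= window <= length:
        raise ValueError("window must be between 1 and the alignment length")
    entropies = [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]
    best = max(range(length - window + 1), key=lambda s: sum(entropies[s:s + window]))
    return best + 1, best + window, sum(entropies[best:best + window]) / window


# Hypothetical toy alignment: columns 9-12 vary, the rest are conserved
alignment = [
    "ACGTACGTACGTAAAA",
    "ACGTACGTTGCAAAAA",
    "ACGTACGTGATCAAAA",
    "ACGTACGTCTGGAAAA",
]
print(most_variable_window(alignment, window=4))  # (9, 12, 1.875)
```

On real data the window size would need tuning, and entropy could be replaced by pairwise nucleotide diversity without changing the overall scan.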


2020 ◽  
Author(s):  
C. N’Dira Sanoussi ◽  
Mireia Coscolla ◽  
Boatema Ofori-Anyinam ◽  
Isaac Darko Otchere ◽  
Martin Antonio ◽  
...  

Abstract: Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum), from each other. However, the genome variability and gene content of L5 and L6 strains in particular have not been fully explored, and may be important both for pathobiology and for current approaches to the genomic analysis of MTBC isolates, including transmission studies. We compared the genomes of 358 L5 clinical isolates (3 complete genomes and 355 Illumina whole-genome-sequenced (WGS) isolates) to the L5 complete genomes and H37Rv, and identified multiple genes differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split of the L5.3 sublineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the choice of reference genome, and with potential over-estimation of recent transmission when H37Rv is used as the reference genome. Our data show that using H37Rv as the reference genome results in missing SNPs in genes unique to L5 strains. This potentially leads to an underestimation of the diversity present in the genomes of L5 strains and in turn affects transmission clustering rates. As such, fully capturing gene diversity, especially for high-resolution outbreak analysis, requires a modification of the single, H37Rv-centric reference genome mapping approach currently used in most WGS data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying that a concatenated or reference-free (de novo) genome assembly approach may be needed for particular questions. Data summary: Sequence data for the Illumina dataset are available at the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) under study accession numbers PRJEB38317 and PRJEB38656. Individual run accession numbers are listed in Table S8. PacBio raw reads for the L5 Benin genome are available under ENA accession SAME3170744. The assembled L5 Benin genome is available on NCBI under accession PRJNA641267. So that the gene naming conventions in the three L5 genomes can be followed, we have uploaded the annotated GFF files to figshare at https://doi.org/10.6084/m9.figshare.12911849.v1. Custom Python scripts used in this analysis can be found at https://github.com/conmeehan/pathophy.
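How the choice of reference genome can shift transmission clustering rates can be illustrated with a small sketch, assuming a common design that the abstract does not spell out: isolates are linked by single linkage whenever their pairwise SNP distance is at or below a cut-off (12 SNPs here, a frequently used value but an assumption in this example), and the clustering rate is taken as the proportion of isolates that fall into any cluster (other definitions, such as the "n minus one" rate, exist). Isolate names, distances and helper functions are hypothetical; the second distance set mimics SNPs being missed in lineage-specific genes.

```python
from itertools import combinations


def snp_clusters(ids: list[str], distance: dict[frozenset, int], threshold: int = 12) -> list[list[str]]:
    """Single-linkage clusters: isolates are joined if their SNP distance is <= threshold."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if distance[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def clustering_rate(ids: list[str], clusters: list[list[str]]) -> float:
    """Proportion of isolates that belong to any cluster (one of several possible definitions)."""
    return sum(len(c) for c in clusters) / len(ids)


ids = ["iso1", "iso2", "iso3", "iso4"]
# Hypothetical pairwise SNP distances
full = {frozenset(p): d for p, d in {
    ("iso1", "iso2"): 5, ("iso1", "iso3"): 20, ("iso1", "iso4"): 40,
    ("iso2", "iso3"): 15, ("iso2", "iso4"): 38, ("iso3", "iso4"): 30,
}.items()}
reduced = dict(full)
reduced[frozenset(("iso2", "iso3"))] = 10  # SNPs missed in genes absent from the reference

for label, dist in (("full", full), ("reduced", reduced)):
    clusters = snp_clusters(ids, dist)
    print(label, clusters, clustering_rate(ids, clusters))
# full [['iso1', 'iso2']] 0.5
# reduced [['iso1', 'iso2', 'iso3']] 0.75
```

Losing a handful of SNPs is enough to pull an extra isolate under the cut-off, which is how a less suitable reference can inflate the apparent amount of recent transmission.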


Author(s):  
Qian Tian ◽  
Jiacheng Chuan ◽  
Xianchao Sun ◽  
Aiguo Zhou ◽  
Li Wang ◽  
...  

Clavibacter michiganensis is a Gram-stain-positive bacterium with eight subspecies, five of which have been redefined as different species on the basis of their genome sequence data. On the basis of phylogenetic analysis of dnaA gene sequences, Clavibacter strains isolated from barley have been grouped in a clade separate from the other species and subspecies of the genus Clavibacter. In this study, the biochemical, physiological, fatty acid and genetic characteristics of strains DM1T and DM3, which represent the barley isolates, were examined. On the basis of multi-locus sequence typing and other biochemical and physiological features, including colony colour, carbon source utilisation and enzyme activities, DM1T and DM3 are clearly differentiated from the aforementioned eight species and subspecies of the genus Clavibacter. Moreover, genomic analysis reveals that the DNA G+C contents of DM1T and DM3 are 73.7 and 73.5 %, respectively, and that the average nucleotide identity (ANI) values between DM1T and DM3 and the other species and subspecies range from 90.4 to 92.0 %. The ANI value between DM1T and DM3 is 98.0 %. These results indicate that DM1T and DM3 are distinct from the other known species and subspecies of the genus Clavibacter. Therefore, we propose a novel species, C. zhangzhiyongii, with DM1T (=CFCC 16553T=LMG 31970T) as the type strain.
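The species argument above rests on two simple genome statistics. The following minimal sketch shows how G+C content is computed from a sequence and how reported ANI values compare with a species cut-off. The 95 % threshold is an assumption of this example (a boundary of roughly 95 to 96 % ANI is widely used for prokaryotic species, but the abstract does not state which value was applied); the ANI figures are the ones reported in the abstract, while the function names and the short test sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def same_species(ani_percent: float, threshold: float = 95.0) -> bool:
    """Assumed cut-off: ANI at or above the threshold suggests the same species."""
    return ani_percent >= threshold


# ANI values reported in the abstract
print(same_species(98.0))  # DM1T vs DM3 -> True
print(same_species(92.0))  # upper end of DM1T/DM3 vs other Clavibacter taxa -> False
print(same_species(90.4))  # lower end of that range -> False
print(round(gc_content("GGCCAT"), 2))  # hypothetical toy sequence -> 66.67
```

Computing ANI itself requires fragmenting and aligning whole genomes and is left to dedicated tools; only the interpretation step is sketched here.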


Author(s):  
Sanket Desai ◽  
Aishwarya Rane ◽  
Asim Joshi ◽  
Amit Dutt

Abstract: We present an updated version of our automated computational pipeline, Infectious Pathogen Detector (IPD) 2.0, with a SARS-CoV-2 module, to perform genomic analysis to understand the pathogenesis and virulence of the virus. By analysing the 208,911 SARS-CoV-2 genome sequences currently available (as accessed on 28 Dec 2020), we generated an extensive database of sample-wise variants and clade annotations, which forms the core of the SARS-CoV-2 analysis module of the pipeline. A comparative account of lineage-specific mutations in the newer SARS-CoV-2 strains emerging in the UK, South Africa and Brazil, along with data reported from India, identifies overlapping and lineage-specific acquired mutations, suggesting repetitive convergent and adaptive evolution. Thus, persistence of the pandemic may lead to the emergence of newer regional strains with improved fitness. IPD 2.0 also adopts the recent dynamic clade nomenclature and shows improvements over its predecessor in clade-assignment accuracy, processing time and portability, and thus could be a vital tool to help facilitate genomic surveillance in a population to identify variants involved in breakthrough infections.
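The comparison of overlapping and lineage-specific mutations amounts to set operations over per-lineage mutation lists. The sketch below uses an illustrative, deliberately incomplete subset of well-documented spike substitutions for B.1.1.7 (UK), B.1.351 (South Africa) and P.1 (Brazil); it is not the dataset analysed by IPD 2.0, and the helper names are hypothetical. One caveat the sketch makes visible: D614G is shared by all three lineages because it was inherited from their common B.1 ancestry, so overlap alone does not prove convergence; that still needs phylogenetic context.

```python
from collections import Counter

# Illustrative, incomplete subsets of well-documented spike mutations per lineage
lineage_mutations = {
    "B.1.1.7": {"del69-70", "N501Y", "P681H", "D614G"},
    "B.1.351": {"K417N", "E484K", "N501Y", "D614G"},
    "P.1": {"K417T", "E484K", "N501Y", "D614G"},
}


def shared_by_all(mutations: dict[str, set[str]]) -> set[str]:
    """Mutations present in every lineage."""
    return set.intersection(*mutations.values())


def shared_by_at_least(mutations: dict[str, set[str]], n: int) -> set[str]:
    """Mutations present in at least `n` lineages."""
    counts = Counter(m for muts in mutations.values() for m in muts)
    return {m for m, c in counts.items() if c >= n}


def lineage_specific(mutations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Mutations found in one lineage and in none of the others."""
    result = {}
    for lineage, muts in mutations.items():
        others = set().union(*(m for other, m in mutations.items() if other != lineage))
        result[lineage] = muts - others
    return result


print(sorted(shared_by_all(lineage_mutations)))          # ['D614G', 'N501Y']
print(sorted(shared_by_at_least(lineage_mutations, 2)))  # ['D614G', 'E484K', 'N501Y']
print(lineage_specific(lineage_mutations)["P.1"])        # {'K417T'}
```

N501Y and E484K, which arose independently in separate lineages, are the kind of recurring substitutions the abstract describes as evidence of convergent evolution.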


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1405.1-1406
Author(s):  
F. Morton ◽  
J. Nijjar ◽  
C. Goodyear ◽  
D. Porter

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) have, individually and collaboratively, produced or recommended diagnostic classification, response and functional status criteria for a range of rheumatic diseases. While a number of resources are available for performing these calculations individually, we are not aware of any tool that easily calculates these values for whole patient cohorts. Objectives: To develop a new software tool that enables data analysts, as well as researchers and clinicians without programming skills, to calculate ACR/EULAR-related measures for a number of different rheumatic diseases. Methods: Criteria developed by ACR and/or EULAR and approved for diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis (RA) were identified. Functions to calculate these criteria were written in the R programming language and incorporated into an R package. Additionally, an R/Shiny web application was developed so that the calculations can be performed in a web browser using data provided as CSV or Microsoft Excel files. Results: acreular is a freely available, open-source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR-related RA measures for whole patient cohorts. Measures such as the ACR/EULAR (2010) RA classification criteria can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing "raw" data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information.
The accompanying web application is included in the R package but is also hosted externally at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to calculate these measures easily by uploading a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows additional study covariates to be incorporated, enabling automatic calculation of multigroup comparative statistics and visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following data upload. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates easy calculation of ACR/EULAR RA-related disease measures for whole patient cohorts. Calculations can be performed either within R or using the accompanying web application, which also enables graphical visualisation of the data and calculation of comparative statistics. We plan to develop the package further by adding additional RA-related criteria and ACR/EULAR-related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared; Jagtar Nijjar: Shareholder of GlaxoSmithKline plc, Consultant of Janssen Pharmaceuticals UK, Employee of GlaxoSmithKline plc, Paid instructor for Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie; Carl Goodyear: None declared; Duncan Porter: None declared.
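To show what calculating these measures from precalculated components involves, here is a minimal Python sketch of three published scores. It is not acreular's R API, and the function names are hypothetical. The ACR/EULAR 2010 score sums four domains: joints 0 to 5, serology 0/2/3 for negative/low-positive/high-positive, acute-phase reactants 0/1, and symptom duration 0/1 with 6 weeks or more scoring 1. A total of 6 or more out of 10 classifies definite RA. DAS28-ESR uses the standard formula, and EULAR response is derived from the DAS28 improvement and the current DAS28. One edge-case choice is an assumption, flagged in the comments: more than 10 involved joints that are all large is scored 1.

```python
import math

SEROLOGY_SCORE = {"negative": 0, "low": 2, "high": 3}  # RF/ACPA: negative, low- or high-positive


def joint_score(small: int, large: int) -> int:
    """ACR/EULAR 2010 joint-involvement domain (0-5) from small/large joint counts."""
    if small + large > 10 and small >= 1:
        return 5
    if 4 <= small <= 10:
        return 3
    if 1 <= small <= 3:
        return 2
    if large >= 2:  # 2-10 large joints; >10 large-only joints also scored 1 here (assumption)
        return 1
    return 0


def acr_eular_2010(small: int, large: int, serology: str,
                   apr_abnormal: bool, duration_days: int) -> tuple[int, bool]:
    """Return (total score 0-10, classified as definite RA when score >= 6)."""
    score = (joint_score(small, large)
             + SEROLOGY_SCORE[serology]
             + (1 if apr_abnormal else 0)          # abnormal CRP or ESR
             + (1 if duration_days >= 42 else 0))  # symptoms for >= 6 weeks
    return score, score >= 6


def das28_esr(tjc28: int, sjc28: int, esr: float, patient_global: float) -> float:
    """DAS28-ESR from 28-joint tender/swollen counts, ESR (mm/h) and patient global (0-100 mm)."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr) + 0.014 * patient_global)


def eular_response(baseline_das28: float, current_das28: float) -> str:
    """EULAR response category from DAS28 improvement and the current DAS28."""
    improvement = baseline_das28 - current_das28
    if improvement > 1.2 and current_das28 <= 3.2:
        return "good"
    if improvement > 1.2 or (improvement > 0.6 and current_das28 <= 5.1):
        return "moderate"
    return "none"


print(acr_eular_2010(small=5, large=2, serology="low", apr_abnormal=True, duration_days=60))  # (7, True)
print(round(das28_esr(tjc28=4, sjc28=2, esr=30, patient_global=50), 2))  # 4.6
print(eular_response(6.0, 3.0), eular_response(6.0, 5.0), eular_response(6.0, 5.5))  # good moderate none
```

Applying these functions row by row to a cohort table mirrors the whole-cohort use case the abstract describes.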

