Evaluation of whole-genome sequence data analysis approaches for short- and long-read sequencing of Mycobacterium tuberculosis

2021 ◽  
Vol 7 (11) ◽  
Author(s):  
Nilay Peker ◽  
Leonard Schuele ◽  
Nienke Kok ◽  
Miguel Terrazos ◽  
Stefan M. Neuenschwander ◽  
...  

Whole-genome sequencing (WGS) of Mycobacterium tuberculosis (MTB) isolates can be used to obtain an accurate diagnosis, to guide clinical decision making, to control tuberculosis (TB) and to investigate outbreaks. We evaluated the performance of long-read (LR) and/or short-read (SR) sequencing in terms of anti-TB drug-resistance prediction using the TBProfiler and Mykrobe tools, the fraction of the genome recovered, assembly accuracy, and the robustness of two typing approaches based on core-genome SNP (cgSNP) typing and core-genome multi-locus sequence typing (cgMLST). Most discrepancies between phenotypic drug-susceptibility testing (DST) and drug-resistance prediction were observed for the first-line drugs rifampicin, isoniazid, pyrazinamide and ethambutol, mainly with LR sequence data. Resistance predictions for second-line drugs made by both TBProfiler and Mykrobe with SR and LR sequence data were in complete agreement with phenotypic DST except for one isolate. The SR assemblies were more accurate than the LR assemblies, with significantly (P<0.05) fewer indels and mismatches per 100 kbp. However, the hybrid and LR assemblies had slightly higher genome fractions. For LR assemblies, Canu assembly followed by Racon and Medaka polishing was the most accurate approach. The cgSNP approach, based on either reads or assemblies, was more robust than the cgMLST approach, especially for LR sequence data. In conclusion, anti-TB drug-resistance prediction remains challenging, especially for first-line drugs and particularly with LR sequence data alone. In addition, SR assemblies appear more accurate than LR ones, and reproducible phylogenies can be achieved using cgSNP approaches.
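As a rough illustration of the two quantities this abstract compares, the sketch below counts pairwise SNP differences between isolates in a core-genome alignment (the basis of cgSNP typing) and normalises an error count to "per 100 kbp", the unit used for the assembly indel and mismatch rates. It assumes the core alignment already exists and that gaps ('-') and ambiguous bases ('N') should be skipped. The function names are hypothetical, and this is not the pipeline used in the study.

```python
from itertools import combinations

# Alignment characters treated as missing data: positions where either
# isolate has one of these are skipped rather than counted as SNPs.
MISSING = set("-N")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned core-genome sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in MISSING and b not in MISSING and a != b
    )


def snp_distance_matrix(alignment: dict) -> dict:
    """Pairwise SNP distances for every pair of isolates (keys sorted by name)."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


def per_100kbp(count: int, aligned_length: int) -> float:
    """Normalise an error count (e.g. mismatches) to events per 100 kbp."""
    return count * 100_000 / aligned_length
```

For example, `snp_distance_matrix({"iso1": "ACGTACGT", "iso2": "ACGTACGA"})` gives a distance of 1 for the pair. Skipping missing positions, rather than scoring them as differences, is one common convention; it keeps low-coverage regions from inflating distances.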

Antibiotics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 548 ◽  
Author(s):  
Jorge Cervantes ◽  
Noemí Yokobori ◽  
Bo-Young Hong

Clinical management of tuberculosis (TB) in endemic areas is often challenged by a lack of resources, including laboratories for Mycobacterium tuberculosis (Mtb) culture. Traditional phenotypic drug susceptibility testing for Mtb is costly and time-consuming, while PCR-based methods are limited to selected target loci. Here we used a portable, USB-powered, long-read sequencing instrument (MinION) to investigate Mtb genomic DNA from clinical isolates for the presence of anti-TB drug-resistance-conferring mutations. The EPI2ME data analysis platform, together with antibiotic-resistance analysis using the real-time ARMA workflow, identified the Mtb species as well as extensive resistance gene profiles. The approach was highly sensitive, detecting almost all of the drug-resistance-conferring mutations previously identified by whole-genome sequencing analysis. Our findings support the practical use of this system for detecting antimicrobial resistance genes and for providing Mtb genomic information. Future improvements in the error rate through statistical analysis, drug-resistance prediction algorithms and reference databases would make this platform suited to the clinical setting. The small size and relatively low cost of the device, as well as its rapid and simple library preparation protocol and analysis, make it an attractive option for settings with limited laboratory infrastructure.
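The abstract does not describe how ARMA matches reads to resistance determinants, so the following is only a generic sketch of catalogue-based resistance calling: variants already called from the reads are looked up in a table of known resistance-conferring mutations. The three catalogue entries are well-known Mtb examples, but the table is illustrative, not a curated catalogue, and the function name is hypothetical.

```python
# Illustrative (not curated) catalogue: (gene, amino-acid change) -> drug.
# Real tools rely on much larger, curated mutation catalogues.
CATALOGUE = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("embB", "M306V"): "ethambutol",
}


def predict_resistance(variants):
    """Group called variants by the drug they are catalogued against.

    `variants` is an iterable of (gene, change) tuples; variants absent
    from the catalogue are ignored rather than reported.
    """
    hits = {}
    for gene, change in variants:
        drug = CATALOGUE.get((gene, change))
        if drug is not None:
            hits.setdefault(drug, []).append(f"{gene}_{change}")
    return hits
```

Calling `predict_resistance([("rpoB", "S450L")])` returns `{"rifampicin": ["rpoB_S450L"]}`. A lookup like this can only report mutations already in the catalogue, which is one reason the abstract points to better reference databases as a route to clinical use.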


2021 ◽  
Author(s):  
Karina Pikalyova ◽  
Alexey Orlov ◽  
Arkadii Lin ◽  
Olga Tarasova ◽  
Gilles Marcou ◽  
...  

Motivation: Human immunodeficiency virus (HIV) drug resistance is a global healthcare issue. The emergence of drug resistance demands treatment adaptation. Computational methods that predict the drug-resistance profile from the genomic data of HIV isolates are advantageous for monitoring drug resistance in patients. Yet existing computational methods for drug-resistance prediction are either unsuitable for the complex mutational patterns of emerging HIV strains or lack interpretability of their prediction results, which is of paramount importance in clinical practice. New approaches to HIV drug-resistance prediction that combine high accuracy with interpretability are therefore required. Results: In this work, a new methodology for the analysis of protein sequence data based on generative topographic mapping was developed and applied to HIV drug-resistance profiling. It achieved high accuracy of resistance prediction and allowed intuitive interpretation of the prediction results. The approach was successfully applied to predicting HIV resistance to protease, reverse-transcriptase and integrase inhibitors and to in-depth analysis of HIV resistance-inducing mutation patterns. It can therefore serve as an efficient and interpretable tool to suggest optimal treatment regimens. Availability: https://github.com/karinapikalyova/ISIDASeq
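The abstract gives no algorithmic detail, so this is only a generic sketch of generative topographic mapping (GTM) itself, following the EM formulation of Bishop, Svensén and Williams, applied to arbitrary numeric descriptor vectors. It also includes a responsibility-weighted "class landscape", the kind of map that makes GTM results interpretable. Grid sizes, the random initialisation, the regularisation value and all function names are assumptions, and this is not the ISIDASeq implementation.

```python
import numpy as np


def _grid(n):
    """Regular n x n grid of 2-D points spanning [-1, 1]^2."""
    g = np.linspace(-1.0, 1.0, n)
    return np.array([(a, b) for a in g for b in g])


def fit_gtm(T, grid=10, n_rbf=4, lam=1e-3, n_iter=30, seed=0):
    """Fit a minimal GTM to data T (N x D) by EM.

    Returns the latent grid X (K x 2) and the responsibilities R (K x N)
    from the last E-step.
    """
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    N, D = T.shape
    X = _grid(grid)                       # K latent nodes
    C = _grid(n_rbf)                      # M radial-basis-function centres
    width = 2.0 / (n_rbf - 1)             # RBF width = centre spacing (assumed)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    Phi = np.hstack([np.exp(-d2 / (2 * width ** 2)), np.ones((len(X), 1))])
    W = rng.normal(scale=0.1, size=(Phi.shape[1], D))  # random init (PCA is usual)
    beta = 1.0                            # inverse noise variance
    for _ in range(n_iter):
        # E-step: posterior probability of node k given sample n,
        # computed with a max-shift (log-sum-exp) for numerical stability.
        dist = (((Phi @ W)[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # K x N
        logr = -0.5 * beta * dist
        logr -= logr.max(axis=0, keepdims=True)
        R = np.exp(logr)
        R /= R.sum(axis=0, keepdims=True)
        # M-step: regularised least squares for W, then re-estimate beta.
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (lam / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ (R @ T))
        dist = (((Phi @ W)[:, None, :] - T[None, :, :]) ** 2).sum(-1)
        beta = N * D / (R * dist).sum()
    return X, R


def class_landscape(R, labels):
    """Responsibility-weighted fraction of resistant (label 1) samples per node."""
    labels = np.asarray(labels, dtype=float)
    return (R @ labels) / np.maximum(R.sum(axis=1), 1e-12)
```

Each sample can be placed on the 2-D map at its mean latent position `R.T @ X`. Colouring nodes by `class_landscape` then shows which map regions are dominated by resistant sequences, which is where the interpretability of the approach comes from.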


Viruses ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 560
Author(s):  
Margaret C. Steiner ◽  
Keylie M. Gibson ◽  
Keith A. Crandall

The fast replication rate and lack of repair mechanisms of human immunodeficiency virus (HIV) contribute to its high mutation frequency, and some mutations lead to the evolution of resistance to antiretroviral therapies (ART). Studying HIV drug resistance therefore allows real-time evaluation of evolutionary mechanisms. Characterizing the biological process of drug resistance is also critically important for the sustained effectiveness of ART. The link between "black box" deep learning methods applied to this problem and the evolutionary principles governing drug resistance has so far been overlooked. Here, we used publicly available HIV-1 sequence data and drug resistance assay results for 18 ART drugs to evaluate the performance of three architectures (multilayer perceptron, bidirectional recurrent neural network and convolutional neural network) for drug resistance prediction, together with a biological analysis. We identified convolutional neural networks as the best-performing architecture and demonstrated a correspondence between the importance of biologically relevant features in the classifier and overall performance. Our results suggest that the high classification performance of deep learning models does depend on drug resistance mutations (DRMs). The models also heavily weighted several features that are not known DRM locations, indicating the utility of model interpretability for addressing causal relationships in viral genotype–phenotype data.
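The abstract does not specify the input encoding. One-hot encoding of amino-acid sequences is a common choice for such models, and the sketch below assumes it. It also shows a single convolutional filter sliding over the encoded sequence, the basic operation that lets CNN filters respond to position-local residue patterns such as mutations. The names are hypothetical, and this is not the authors' model.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Encode a protein sequence as an (L x 20) one-hot matrix.

    Unknown or ambiguous residues (e.g. 'X') become all-zero rows.
    """
    mat = np.zeros((len(seq), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq.upper()):
        j = INDEX.get(aa)
        if j is not None:
            mat[pos, j] = 1.0
    return mat


def conv1d_relu(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 1-D cross-correlation (as in deep-learning libraries) of an
    (L x 20) input with a (k x 20) filter, followed by ReLU."""
    k = kernel.shape[0]
    out = np.array([(x[i:i + k] * kernel).sum() for i in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)
```

Using `one_hot("K")` as a filter on `one_hot("AKA")` gives activations `[0, 1, 0]`: the filter fires only where lysine occurs. A trained network learns many such filters, and inspecting which positions drive them is one route to the feature-importance analysis the abstract describes.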


2021 ◽  
Vol 7 (11) ◽  
Author(s):  
Tim H. Heupink ◽  
Lennert Verboven ◽  
Robin M. Warren ◽  
Annelies Van Rie

Improved understanding of the genomic variants that allow Mycobacterium tuberculosis (Mtb) to acquire drug resistance, or tolerance, and to increase its virulence is important for controlling the current tuberculosis epidemic. Current approaches to Mtb sequencing, however, cannot reveal Mtb’s full genomic diversity because of their strict requirements for low contamination levels, high Mtb sequence coverage and elimination of complex regions. We have developed the XBS (compleX Bacterial Samples) bioinformatics pipeline, which implements joint calling and machine-learning-based variant filtering tools to specifically improve variant detection in the important Mtb samples that do not meet these criteria, such as those from unbiased sputum samples. Using novel simulated datasets, which permit exact accuracy verification, we compared XBS with the UVP and MTBseq pipelines. Accuracy statistics showed that all three pipelines performed equally well for sequence data resembling those obtained from culture isolates with high depth of coverage and low-level contamination. In complex genomic regions, however, XBS accurately identified 9.0 % more SNPs and 8.1 % more single nucleotide insertions and deletions than the WHO-endorsed unified analysis variant pipeline (UVP). XBS also had superior accuracy for sequence data resembling those obtained directly from sputum samples, where depth of coverage is typically very low and contamination levels are high. XBS was the only pipeline not affected by low depth of coverage (5–10×), type of contamination or excessive contamination levels (>50 %). Simulation results were confirmed using whole-genome sequencing (WGS) data from clinical samples: XBS showed higher sensitivity (98.8 %) when analysing culture isolates and identified 13.9 % more variable sites in WGS data from sputum samples than MTBseq, without evidence of false-positive variants when rRNA regions were excluded.
The XBS pipeline facilitates sequencing of less-than-perfect Mtb samples. These advances will benefit future clinical applications of Mtb sequencing, especially WGS directly from clinical specimens, thereby avoiding in vitro biases and making many more samples available for drug resistance and other genomic analyses. The additional genetic resolution and increased sample success rate will improve genome-wide association studies and sequence-based transmission studies.
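Simulated datasets allow "exact accuracy verification" because the true variant set is known. The sketch below shows the usual way such figures are computed: called variants are compared with the truth set to obtain sensitivity and precision, and a relative gain of the "X % more SNPs" kind is derived from two counts. Representing variants as (position, ref, alt) tuples and the function names are assumptions; this is not the XBS evaluation code.

```python
def benchmark(called: set, truth: set) -> dict:
    """Compare called variants with a known truth set.

    Variants are hashable records, e.g. (position, ref, alt) tuples.
    """
    tp = len(called & truth)   # true variants that were called
    fp = len(called - truth)   # calls not in the truth set
    fn = len(truth - called)   # true variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "precision": precision}


def relative_gain(count_a: int, count_b: int) -> float:
    """Percentage by which count_a exceeds count_b (e.g. '9.0 % more SNPs')."""
    return 100 * (count_a - count_b) / count_b
```

With four true variants and four calls, three of them correct, `benchmark` reports a sensitivity and precision of 0.75 each. Reporting both matters here because a pipeline that calls more variants in difficult regions gains sensitivity only if precision does not fall.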


2020 ◽  
Vol 57 (1) ◽  
pp. 2001796 ◽  
Author(s):  
Silke Feuerriegel ◽  
Thomas A. Kohl ◽  
Christian Utpatel ◽  
Sönke Andres ◽  
Florian P. Maurer ◽  
...  

2020 ◽  
Author(s):  
Gargi Datta ◽  
Nabeeh A Hasan ◽  
Michael Strong ◽  
Sonia M Leach

Background: The increasing incidence of drug resistance in tuberculosis and other infectious diseases is an escalating cause for concern, emphasizing the urgent need for robust computational and molecular methods to identify drug-resistant strains. Although machine-learning-based approaches using whole-genome sequence data can facilitate the inference of drug resistance, current implementations do not make optimal use of information in public databases and are not robust to small sample sizes and mixed attribute types. Results: In this paper we introduce the Composite MetaDistance method, an approach for feature selection and classification of high-dimensional, unbalanced datasets with mixed-attribute features from various data sources. We introduce a mixed-attribute, multi-view distance function to calculate distances between samples, with optimal handling of nominal features and different feature views. We also introduce a novel feature set for drug-resistance prediction in Mycobacterium tuberculosis, using data from diverse sources. We compare the performance of Composite MetaDistance with that of multiple machine learning algorithms for Mycobacterium tuberculosis drug-resistance prediction for three drugs. Composite MetaDistance consistently outperforms existing algorithms for small training sets and performs as well as other algorithms for larger training sets. Conclusion: The feature set introduced in this paper uses mutational and publicly available information for each gene and is much richer than any previously devised. The prediction algorithm, Composite MetaDistance, is sample-size agnostic and particularly robust for small sample sizes. Proper handling of nominal features improves performance even when there are very few nominal features. We expect Composite MetaDistance to be even more robust for datasets with a higher percentage of nominal features.
The algorithm is application-independent and can be used for any mixed-attribute dataset.
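The abstract does not define the Composite MetaDistance function. As a stand-in illustration of what a mixed-attribute, multi-view distance involves, the sketch below uses Gower's classic approach: range-scaled absolute differences for numeric features, simple matching for nominal ones, and a weighted average across feature views. The weighting scheme and all names are assumptions, and this is not the authors' method.

```python
def gower_distance(a, b, kinds, ranges):
    """Gower-style distance between two records with mixed attribute types.

    kinds[i] is "num" or "nom"; ranges[i] is the observed range of numeric
    feature i (ignored for nominal features). Result lies in [0, 1].
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "nom":
            total += 0.0 if x == y else 1.0    # simple matching for nominals
        else:
            total += abs(x - y) / r if r else 0.0  # range-scaled numeric gap
    return total / len(kinds)


def multi_view_distance(a_views, b_views, view_specs, weights):
    """Weighted average of per-view Gower distances.

    view_specs[v] is a (kinds, ranges) pair for view v; the weighting
    scheme is an illustrative assumption.
    """
    total_w = sum(weights)
    return sum(
        w * gower_distance(a, b, kinds, ranges)
        for a, b, (kinds, ranges), w in zip(a_views, b_views, view_specs, weights)
    ) / total_w
```

For instance, records `[1.0, "A"]` and `[3.0, "B"]` with a numeric range of 4.0 are at distance (0.5 + 1.0) / 2 = 0.75. Treating nominal features by matching, rather than forcing them into arbitrary numeric codes, avoids imposing a false ordering on categories.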


Data in Brief ◽  
2020 ◽  
Vol 33 ◽  
pp. 106416
Author(s):  
Asset Daniyarov ◽  
Askhat Molkenov ◽  
Saule Rakhimova ◽  
Ainur Akhmetova ◽  
Zhannur Nurkina ◽  
...  
