Bridging the TB data gap: in silico extraction of rifampicin-resistant tuberculosis diagnostic test results from whole genome sequence data

PeerJ ◽

10.7717/peerj.7564 ◽

2019 ◽

Vol 7 ◽

pp. e7564

Author(s):

Kamela C. S. Ng ◽

Jean Claude S. Ngabonziza ◽

Pauline Lempens ◽

Bouke C. de Jong ◽

Frank van Leth ◽

...

Keyword(s):

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Data Generation ◽

Continuous Analysis ◽

Middle Income ◽

TB Control ◽

Sequencing Technologies ◽

Surveillance Programs ◽

Low And Middle Income

Background: Mycobacterium tuberculosis rapid diagnostic tests (RDTs) are widely employed in routine laboratories and national surveys for detection of rifampicin-resistant (RR)-TB. However, as next-generation sequencing technologies have become more commonplace in research and surveillance programs, RDTs are being increasingly complemented by whole genome sequencing (WGS). While comparison between RDTs is difficult, all RDT results can be derived from WGS data. This can facilitate continuous analysis of RR-TB burden regardless of the data generation technology employed. By converting WGS to RDT results, we enable comparison of data with different formats and sources, particularly for low- and middle-income high TB-burden countries that employ different diagnostic algorithms for drug resistance surveys. This allows national TB control programs (NTPs) and epidemiologists to utilize all available data in the setting for improved RR-TB surveillance.

Methods: We developed the Python-based MTB Genome to Test (MTBGT) tool that transforms WGS-derived data into laboratory-validated results of the primary RDTs: Xpert MTB/RIF, Xpert MTB/RIF Ultra, GenoType MTBDRplus v2.0, and Genoscholar NTM+MDRTB II. The tool was validated through RDT results of RR-TB strains with diverse resistance patterns and geographic origins and applied on routine-derived WGS data.

Results: The MTBGT tool correctly transformed the single nucleotide polymorphism (SNP) data into the RDT results and generated tabulated frequencies of the RDT probes as well as rifampicin-susceptible cases. The tool supplemented the RDT probe reactions output with the RR-conferring mutation based on identified SNPs. The MTBGT tool facilitated continuous analysis of RR-TB and Xpert probe reactions from different platforms and collection periods in Rwanda.

Conclusion: Overall, the MTBGT tool allows low- and middle-income countries to make sense of the increasingly generated WGS data in light of the readily available RDT results, and to assess whether currently implemented RDTs adequately detect RR-TB in their setting. With its feature to transform WGS to RDT results and facilitate continuous RR-TB data analysis, the MTBGT tool may bridge the gap between and among data from periodic surveys, continuous surveillance, research, and routine tests, and may be integrated within the national information system for use by the NTP and epidemiologists to improve setting-specific RR-TB control. The MTBGT source code and accompanying documentation are available at https://github.com/KamelaNg/MTBGT.
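
As a rough illustration of the WGS-to-RDT conversion described in this abstract, the following Python sketch maps mutated rpoB codons to the probe reactions of a five-probe assay and tabulates probe frequencies. It is not the MTBGT implementation: the function names are hypothetical, and the codon range assigned to each probe is a placeholder chosen for illustration, not a validated probe coordinate.

```python
from collections import Counter

# ASSUMPTION: placeholder codon ranges per probe, for illustration only.
# They are NOT validated probe coordinates of any commercial assay.
PROBE_CODONS = {
    "A": range(507, 512),
    "B": range(512, 518),
    "C": range(518, 523),
    "D": range(523, 529),
    "E": range(529, 534),
}


def probe_reactions(mutated_codons):
    """Return {probe: 'absent' | 'present'} for a set of mutated rpoB codons.

    A probe is reported as 'absent' when any mutated codon falls inside
    the codon range that the probe is assumed to cover.
    """
    reactions = {}
    for probe, codons in PROBE_CODONS.items():
        hit = any(codon in codons for codon in mutated_codons)
        reactions[probe] = "absent" if hit else "present"
    return reactions


def call_sample(mutated_codons):
    """Return (rifampicin call, probe reactions) for one sample."""
    reactions = probe_reactions(mutated_codons)
    resistant = "absent" in reactions.values()
    call = "RR detected" if resistant else "RR not detected"
    return call, reactions


def tabulate_absent_probes(samples):
    """Count absent probes across samples ({sample_id: set of codons})."""
    counts = Counter()
    for codons in samples.values():
        reactions = probe_reactions(codons)
        absent = [p for p, state in reactions.items() if state == "absent"]
        counts.update(absent if absent else ["none"])
    return counts
```

Under these assumptions, a sample with a mutation at codon 531 would be called "RR detected" with probe E absent, while a sample with no mutation in the covered region falls into the "none" bucket of the frequency table.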

Bridging the TB data gap: in silico extraction of rifampicin-resistant tuberculosis diagnostic test results from whole genome sequence data

10.1101/628099 ◽

2019 ◽

Author(s):

Kamela Charmaine S. Ng ◽

Jean Claude S. Ngabonziza ◽

Pauline Lempens ◽

Bouke Catherine de Jong ◽

Frank van Leth ◽

...

Keyword(s):

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Data Generation ◽

Continuous Analysis ◽

Middle Income ◽

TB Control ◽

Sequencing Technologies ◽

Surveillance Programs ◽

Low And Middle Income

Abstract:

Background: Mycobacterium tuberculosis rapid diagnostic tests (RDTs) are widely employed in routine laboratories and national surveys for detection of rifampicin-resistant (RR)-TB. However, as next-generation sequencing technologies have become more commonplace in research and surveillance programs, RDTs are being increasingly complemented by whole genome sequencing (WGS). While comparison between RDTs is difficult, all RDT results can be derived from WGS data. This can facilitate continuous analysis of RR-TB burden regardless of the data generation technology employed. By converting WGS to RDT results, we enable comparison of data with different formats and sources, particularly for low- and middle-income high TB-burden countries that employ different diagnostic algorithms for drug resistance surveys. This allows national TB control programs (NTPs) and epidemiologists to utilize all available data in the setting for improved RR-TB surveillance.

Methods: We developed the Python-based MTB Genome to Test (MTBGT) tool that transforms WGS-derived data into laboratory-validated results of the primary RDTs: Xpert MTB/RIF, Xpert MTB/RIF Ultra, GenoType MTBDRplus v2.0, and Genoscholar NTM+MDRTB II. The tool was validated through RDT results of RR-TB strains with diverse resistance patterns and geographic origins and applied on routine-derived WGS data.

Results: The MTBGT tool correctly transformed the SNP data into the RDT results and generated tabulated frequencies of the RDT probes as well as rifampicin-susceptible cases. The tool supplemented the RDT probe reactions output with the RR-conferring mutation based on identified SNPs. The MTBGT tool facilitated continuous analysis of RR-TB and Xpert probe reactions from different platforms and collection periods in Rwanda.

Conclusion: Overall, the MTBGT tool allows low- and middle-income countries to make sense of the increasingly generated WGS data in light of the readily available RDT results, and to assess whether currently implemented RDTs adequately detect RR-TB in their setting. With its feature to transform WGS to RDT results and facilitate continuous RR-TB data analysis, the MTBGT tool may bridge the gap between and among data from periodic surveys, continuous surveillance, research, and routine tests, and may be integrated within the existing national connectivity platform for use by the NTP and epidemiologists to improve setting-specific RR-TB control. The MTBGT source code and accompanying documentation are available at https://github.com/KamelaNg/MTBGT.

Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly

TIGER: inferring DNA replication timing from whole-genome sequence data

Bioinformatics ◽

10.1093/bioinformatics/btab166 ◽

2021 ◽

Cited By ~ 1

Author(s):

Amnon Koren ◽

Dashiell J Massey ◽

Alexa N Bracci

Keyword(s):

DNA Replication ◽

Genome Sequence ◽

Genomic DNA ◽

Sequence Data ◽

Replication Timing ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Whole Genome ◽

Genome Sequence Data ◽

DNA Replication Timing

Abstract:

Motivation: Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges.

Results: We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale.

Availability and Implementation: TIGER is available at https://github.com/TheKorenLab/TIGER.

Supplementary information: Supplementary data are available at Bioinformatics online.
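
The idea of reading replication dynamics from copy number along a chromosome can be sketched in a few lines of Python. This is only a toy illustration under simple assumptions (fixed-size windows, read counts already tallied per window, median normalization, a plain moving average); it is not the TIGER pipeline, which additionally accounts for other sources of coverage variation, and both function names are hypothetical.

```python
from statistics import median


def relative_copy_number(window_counts):
    """Scale per-window read counts by the sample median.

    Values above 1 indicate windows that are over-represented, which in a
    proliferating sample is consistent with earlier replication.
    """
    med = median(window_counts)
    return [count / med for count in window_counts]


def moving_average(values, width=3):
    """Smooth a profile with a centered moving average of odd `width`.

    Windows near the chromosome ends are averaged over the neighbors
    that exist, so the output has the same length as the input.
    """
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A smoothed profile of this kind gives a relative signal only; comparing it to experimental replication timing would require the corrections and filtering that the abstract alludes to.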

Whole genome sequence data of Bacillus australimaris strain B28A, isolated from Marine Water in India

Data in Brief ◽

10.1016/j.dib.2021.107240 ◽

2021 ◽

pp. 107240

Author(s):

Wael Ali Mohammed Hadi ◽

Boby T Edwin ◽

A Jayakumaran Nair

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Marine Water ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Whole genome sequence data of Mycobacterium tuberculosis XDR strain, isolated from patient in Kazakhstan

Data in Brief ◽

10.1016/j.dib.2020.106416 ◽

2020 ◽

Vol 33 ◽

pp. 106416

Author(s):

Asset Daniyarov ◽

Askhat Molkenov ◽

Saule Rakhimova ◽

Ainur Akhmetova ◽

Zhannur Nurkina ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Elucidating the genetic basis of an oligogenic birth defect using whole genome sequence data in a non-model organism, Bubalus bubalis

Scientific Reports ◽

10.1038/srep39719 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 10

Author(s):

Lynsey K. Whitacre ◽

Jesse L. Hoff ◽

Robert D. Schnabel ◽

Sara Albarella ◽

Francesca Ciotola ◽

...

Keyword(s):

Genome Sequence ◽

Birth Defect ◽

Genetic Basis ◽

Sequence Data ◽

Model Organism ◽

Bubalus Bubalis ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Whole genome characterization of strains belonging to the Ralstonia solanacearum species complex and in silico analysis of TaqMan assays for detection in this heterogenous species complex

European Journal of Plant Pathology ◽

10.1007/s10658-020-02190-8 ◽

2021 ◽

Author(s):

Viola Kurm ◽

Ilse Houwers ◽

Claudia E. Coipan ◽

Peter Bonants ◽

Cees Waalwijk ◽

...

Keyword(s):

Ralstonia Solanacearum ◽

In Silico ◽

Species Complex ◽

Sequence Data ◽

In Silico Analysis ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequences ◽

PCR Assays

Abstract Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 out of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered into three subgroups (IIBa, IIBb and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several cases of potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for classification of strains, but also shows potential for selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.

46 Footprints of Selection in Angus and Hanwoo Beef Cattle Using Imputed Whole Genome Sequence Data

Journal of Animal Science ◽

10.1093/jas/skab235.042 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 25-25

Author(s):

Muhammad Yasir Nawaz ◽

Rodrigo Pelicioni Savegnago ◽

Cedric Gondro

Keyword(s):

Beef Cattle ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Fixation Index ◽

Whole Genome ◽

Extended Haplotype Homozygosity ◽

Extended Haplotype ◽

Genome Sequence Data ◽

Genomic Regions

Abstract In this study, we detected genome-wide footprints of selection in Hanwoo and Angus beef cattle using different allele frequency and haplotype-based methods based on imputed whole genome sequence data. Our dataset included 13,202 Angus and 10,437 Hanwoo animals with 10,057,633 and 13,241,550 imputed SNPs, respectively. A subset of data with 6,873,624 SNPs common to the two populations was used to estimate signatures of selection parameters, both within (runs of homozygosity and extended haplotype homozygosity) and between (allele fixation index, extended haplotype homozygosity) the breeds in order to infer evidence of selection. We observed that correlations between various measures of selection ranged from 0.01 to 0.42. Assuming these parameters were complementary to each other, we combined them into a composite selection signal to identify regions under selection in both beef breeds. The composite signal was based on the average of fractional ranks of individual selection measures for every SNP. We identified some selection signatures that were common between the breeds while others were independent. We also observed that more genomic regions were selected in Angus than in Hanwoo. Candidate genes within significant genomic regions may help explain mechanisms of adaptation, domestication history and loci for important traits in Angus and Hanwoo cattle. In the future, we will use the top SNPs under selection for genomic prediction of carcass traits in both breeds.
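
The composite signal described above (the average of fractional ranks of the individual selection measures for every SNP) can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes that a higher score means stronger evidence of selection for every measure, that tied scores share their average rank, and that ranks are scaled by the number of SNPs; the function names and the example measure names are hypothetical.

```python
def fractional_ranks(values):
    """Rank values in ascending order and scale the ranks to (0, 1].

    Tied values receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / n
        i = j + 1
    return ranks


def composite_signal(measures):
    """Average the fractional ranks of several measures per SNP.

    `measures` maps a measure name to a list of per-SNP scores; all lists
    must follow the same SNP order.
    """
    ranked = [fractional_ranks(scores) for scores in measures.values()]
    n_snps = len(ranked[0])
    return [sum(r[s] for r in ranked) / len(ranked) for s in range(n_snps)]
```

For example, with two measures scored over three SNPs, `composite_signal({"fst": [0.1, 0.5, 0.3], "ehh": [2.0, 1.0, 3.0]})` places the third SNP highest because it ranks near the top on both measures.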

148 Multiple Dysregulated Novel Pathways and Genes in Aleutian Mink Disease Revealed by Selection Signatures and Gene Network Analyses Using Whole-genome Sequence Data

Journal of Animal Science ◽

10.1093/jas/skab235.137 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 76-76

Author(s):

Seyed Milad Vahedi ◽

Karim Karimi ◽

Siavash Salek Ardestani ◽

Younes Miar

Keyword(s):

Sequence Data ◽

American Mink ◽

Enrichment Analysis ◽

Whole Genome Sequence ◽

Fixation Index ◽

Pathway Enrichment Analysis ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Network Analyses ◽

Genome Level

Abstract Aleutian disease (AD) is a chronic persistent infection in domestic mink caused by Aleutian mink disease virus (AMDV). Reduced fertility in female mink and depressed pelt quality are the main reasons for AD’s negative economic impact on the mink industry. A total of 79 American mink from the Canadian Center for Fur Animal Research at Dalhousie University (Truro, NS, Canada) were classified, based on the results of counter immunoelectrophoresis (CIEP) tests, into positive (n = 48) and negative (n = 31) groups. Whole-genome sequences comprising 4,176 scaffolds and 8,039,737 single nucleotide polymorphisms (SNPs) were used to trace the selection footprints for response to AMDV infection at the genome level. Window-based fixation index (Fst) and nucleotide diversity (θπ) statistics were estimated to compare the genomes of positive and negative animals. The top 1% genomic windows that overlapped between the two statistics were considered potential regions under selection pressure. A total of 98 genomic regions harboring 33 candidate genes were detected as selective signals. Most of the identified genes were involved in the development and functions of the immune system (PPP3CA, SMAP2, TNFRSF21, SKIL, and AKIRIN2), musculoskeletal system (COL9A2, PPP1R9A, ANK2, AKAP9, and STRIT1), nervous system (ASCL1, ZFP69B, SLC25A27, MCF2, and SLC7A14), reproductive system (CAMK2D, GJB7, SSMEM1, and C6orf163), liver (PAH and DPYD), and lung (SLC35A1). Gene-expression network analysis showed the interactions among 27 identified genes. Moreover, pathway enrichment analysis of the constructed gene network revealed significant oxytocin (KEGG: hsa04921) and GnRH signaling (KEGG: hsa04912) pathways, which are likely to be impaired by AMDV, leading to reduced fecundity in dams. These results provide a perspective on the genetic architecture of response to AD in American mink and novel insight into the pathogenesis of AMDV.
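
The step of intersecting the top 1% of genomic windows from two statistics can be sketched in Python as below. This is a simplified illustration rather than the authors' pipeline: it assumes both statistics have already been computed per window and oriented so that larger values are more extreme, and the function names are hypothetical.

```python
import math


def top_fraction(scores, fraction=0.01):
    """Return the indices of windows whose score is in the top `fraction`.

    At least one window is always returned; ties at the cutoff are broken
    by window order.
    """
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def candidate_windows(stat_a, stat_b, fraction=0.01):
    """Windows that fall in the top `fraction` of both per-window statistics."""
    shared = top_fraction(stat_a, fraction) & top_fraction(stat_b, fraction)
    return sorted(shared)
```

Requiring a window to be extreme under both statistics is a conservative filter: a window that stands out on only one measure is not reported as a candidate region.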

ALPHLARD: a Bayesian method for analyzing HLA genes from whole genome sequence data

BMC Genomics ◽

10.1186/s12864-018-5169-9 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 6

Author(s):

Shuto Hayashi ◽

Rui Yamaguchi ◽

Shinichi Mizuno ◽

Mitsuhiro Komura ◽

Satoru Miyano ◽

...

Keyword(s):

Genome Sequence ◽

Bayesian Method ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data ◽

HLA Genes