Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs

2020
Author(s):
Tsung-Yu Lu
Mark Chaisson

Abstract: Variable number tandem repeat (VNTR) sequences are composed of consecutive repeats of short segments of DNA with hypervariable repeat count and composition. They include protein-coding sequences and are associated with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. We solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We developed software to build an RPGG and to use it to estimate VNTR composition from short reads. We used this approach to discover VNTRs with length stratified by continental population, as well as novel expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

2021
Vol 12 (1)
Author(s):
Tsung-Yu Lu
Katherine M. Munson
Alexandra P. Lewis
Qihui Zhu
Luke J. Tallon
...

Abstract: Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and are associated with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and to use it to estimate VNTR composition from short reads. We use this approach to discover VNTRs with length stratified by continental population, as well as expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.
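The abstract does not say how the RPGG is represented, so the following is only an illustrative sketch. It assumes a k-mer (de Bruijn-style) graph in which identical k-mers from different haplotypes collapse into one node, so that both repeat structure and population variation share a single structure. The function names, the toy sequences, and the choice of k are all hypothetical and are not taken from the authors' software.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_repeat_graph(haplotypes, k):
    """Collapse identical k-mers across haplotypes into shared nodes.

    Returns (nodes, edges): nodes is a set of k-mers; edges is a set of
    (kmer, next_kmer) pairs seen consecutively in at least one haplotype.
    Assumption: a plain k-mer graph stands in for the RPGG here.
    """
    nodes, edges = set(), set()
    for hap in haplotypes:
        prev = None
        for km in kmers(hap, k):
            nodes.add(km)
            if prev is not None:
                edges.add((prev, km))
            prev = km
    return nodes, edges

def count_read_kmers(reads, nodes, k):
    """Count read k-mers that match a graph node; others are ignored."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in nodes:
                counts[km] += 1
    return counts

# Hypothetical locus: two haplotypes carrying 3 and 4 copies of "ACG".
haplotypes = ["ACGACGACGT", "ACGACGACGACGT"]
nodes, edges = build_repeat_graph(haplotypes, k=3)

# Hypothetical short reads drawn from the locus.
reads = ["ACGACG", "GACGT"]
counts = count_read_kmers(reads, nodes, k=3)
```

In this toy setup both haplotypes collapse onto the same small cycle of nodes, so repeat count shows up as k-mer multiplicity rather than as extra graph structure. That is one reason per-node read counts are a plausible proxy for VNTR length under the stated assumption.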


2004
Vol 186 (16)
pp. 5496-5505
Author(s):
Leo M. Schouls
Han G. J. van der Heide
Luc Vauterin
Paul Vauterin
Frits R. Mooi

ABSTRACT Bordetella pertussis, the causative agent of whooping cough, has remained endemic in The Netherlands despite extensive nationwide vaccination since 1953. In the 1990s, several epidemic periods resulted in many cases of pertussis. We have proposed that strain variation has played a major role in the upsurges of this disease in The Netherlands. Therefore, molecular characterization of strains is important for understanding pertussis epidemiology. For this reason, we have developed a multiple-locus variable-number tandem repeat analysis (MLVA) typing system for B. pertussis. By combining the MLVA profile with the allelic profile based on multiple-antigen sequence typing, we were able to further differentiate strains. The relationships between the various genotypes were visualized by constructing a minimum spanning tree. MLVA of Dutch strains of B. pertussis revealed that the genotypes of the strains isolated in the prevaccination period were diverse and clearly distinct from the strains isolated in the 1990s. Furthermore, there was a decrease in diversity in the strains from the late 1990s, with a remarkable clonal expansion that coincided with the epidemic periods. Using this genotyping, we have been able to show that B. pertussis is much more dynamic than expected.
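As a rough illustration of the minimum spanning tree step, the sketch below treats each MLVA profile as a tuple of repeat counts and links genotypes by the number of loci at which they differ. The abstract does not state the distance measure, the algorithm, or any profile values, so the categorical distance, the use of Prim's algorithm, and the four strain profiles are all assumptions made for this example.

```python
def locus_distance(a, b):
    """Number of loci at which two MLVA profiles differ (categorical)."""
    return sum(x != y for x, y in zip(a, b))

def minimum_spanning_tree(profiles):
    """Prim's algorithm over all pairs of profiles.

    profiles maps a strain name to a tuple of repeat counts.
    Returns a list of (distance, strain_in_tree, strain_added) edges.
    """
    names = list(profiles)
    in_tree = {names[0]}
    tree = []
    while len(in_tree) < len(names):
        best = None
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = locus_distance(profiles[u], profiles[v])
                # Keep the cheapest edge leaving the current tree.
                if best is None or d < best[0]:
                    best = (d, u, v)
        tree.append(best)
        in_tree.add(best[2])
    return tree

# Hypothetical repeat counts at four loci; not data from the study.
profiles = {
    "A": (7, 6, 4, 5),
    "B": (7, 6, 4, 6),
    "C": (7, 5, 4, 6),
    "D": (9, 5, 3, 6),
}
tree = minimum_spanning_tree(profiles)
total_weight = sum(d for d, _, _ in tree)
```

With these made-up profiles the tree chains A, B, C, and D together through single-locus and double-locus differences. A chain like this is the kind of pattern that would make closely related genotypes stand out when the tree is drawn.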

