GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

ABSTRACT Variable-number tandem repeat (VNTR) loci have shown a remarkable ability to discriminate among isolates of the recently emerged clonal pathogen Escherichia coli O157:H7, making them a very useful molecular epidemiological tool. However, little is known about the rates at which these sequences mutate, the factors that affect mutation rates, or the mechanisms by which mutations occur at these loci. Here, we measure mutation rates for 28 VNTR loci and investigate the effects of repeat copy number and mismatch repair on mutation rate using in vitro-generated populations for 10 E. coli O157:H7 strains. We find single-locus rates as high as 7.0 × 10⁻⁴ mutations/generation and a combined 28-locus rate of 6.4 × 10⁻⁴ mutations/generation. We observed single- and multirepeat mutations that were consistent with a slipped-strand mispairing mutation model, as well as a smaller number of large repeat copy number mutations that were consistent with recombination-mediated events. Repeat copy number within an array was strongly correlated with mutation rate both at the most mutable locus, O157-10 (r² = 0.565, P = 0.0196), and across all mutating loci. The combined locus model was significant whether locus O157-10 was included (r² = 0.833, P < 0.0001) or excluded (r² = 0.452, P < 0.0001) from the analysis. Deficient mismatch repair did not affect mutation rate at any of the 28 VNTRs with repeat unit sizes of >5 bp, although a poly(G) homomeric tract was destabilized in the mutS strain. Finally, we describe a general model for VNTR mutations that encompasses insertions and deletions, single- and multiple-repeat mutations, and their relative frequencies based upon our empirical mutation rate data.

Size heterogeneity among antigenically related Giardia lamblia variant-specific surface proteins is due to differences in tandem repeat copy number.

Infection and Immunity ◽

10.1128/iai.62.4.1213-1218.1994 ◽

1994 ◽

Vol 62 (4) ◽

pp. 1213-1218 ◽

Cited By ~ 22

Author(s):

M R Mowatt ◽

B Y Nguyen ◽

J T Conrad ◽

R D Adam ◽

T E Nash

Keyword(s):

Specific Surface ◽

Tandem Repeat ◽

Giardia Lamblia ◽

Copy Number ◽

Surface Proteins ◽

Repeat Copy Number ◽

Size Heterogeneity ◽

Repeat Copy

GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

10.1101/246108 ◽

2018 ◽

Cited By ~ 1

Author(s):

Devika Ganesamoorthy ◽

Minh Duc Cao ◽

Tania Duarte ◽

Wenhan Chen ◽

Lachlan Coin

Keyword(s):

High Throughput ◽

Tandem Repeat ◽

Copy Number ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Sequence Data ◽

Complex Diseases ◽

Sequencing Analysis ◽

Reference Dataset ◽

Long Read

ABSTRACT Background: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. Methods: We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X WGS, 200X WGS, 395X capture and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. Conclusions: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
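
The idea of turning a read count into a copy number estimate with a 95% HPD interval can be illustrated with a toy model. The sketch below is not the GtTR algorithm: it assumes a flat prior over integer copy numbers and a Poisson read count whose mean is the copy number times a per-copy read rate. The function names, the `reads_per_copy` parameter, and the input numbers are hypothetical and chosen only for illustration.

```python
import math


def copy_number_posterior(read_count, reads_per_copy, max_copies=200):
    """Posterior over integer copy numbers 1..max_copies.

    Assumed model (not taken from the abstract): flat prior, and
    read_count ~ Poisson(copy_number * reads_per_copy).
    """
    copies = range(1, max_copies + 1)
    log_post = []
    for c in copies:
        mean = c * reads_per_copy
        # Poisson log-likelihood of the observed read count.
        log_post.append(
            read_count * math.log(mean) - mean - math.lgamma(read_count + 1)
        )
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return {c: w / total for c, w in zip(copies, weights)}


def hpd_interval(posterior, mass=0.95):
    """Range spanned by the highest-probability values covering `mass`."""
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    kept, covered = [], 0.0
    for c, p in ranked:
        kept.append(c)
        covered += p
        if covered >= mass:
            break
    return min(kept), max(kept)


# Hypothetical inputs: same underlying copy number at two sequencing depths.
low_depth = copy_number_posterior(read_count=240, reads_per_copy=20.0)
high_depth = copy_number_posterior(read_count=2400, reads_per_copy=200.0)

mode = max(low_depth, key=low_depth.get)
low_interval = hpd_interval(low_depth)
high_interval = hpd_interval(high_depth)
print("mode:", mode)
print("95% HPD at lower depth:", low_interval)
print("95% HPD at higher depth:", high_interval)
```

Under these assumptions the interval at higher depth is no wider than the one at lower depth, which mirrors the abstract's qualitative point that genotype resolution improves with depth.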

The role of fimV and the importance of its tandem repeat copy number in twitching motility, pigment production, and morphology in Legionella pneumophila

Archives of Microbiology ◽

10.1007/s00203-010-0590-8 ◽

2010 ◽

Vol 192 (8) ◽

pp. 625-631 ◽

Cited By ~ 12

Author(s):

David A. Coil ◽

Jozef Anné

Keyword(s):

Tandem Repeat ◽

Legionella Pneumophila ◽

Copy Number ◽

Pigment Production ◽

Twitching Motility ◽

Repeat Copy Number ◽

Repeat Copy

Variation in repeat copy number of the Epithelial adhesin 1 tandem repeat region leads to variable protein display through multiple mechanisms

10.1101/872853 ◽

2019 ◽

Author(s):

Colin J. Raposo ◽

Kyle A. McElroy ◽

Stephen M. Fuchs

Keyword(s):

Cell Surface ◽

Tandem Repeat ◽

Copy Number ◽

Surface Display ◽

Host Tissue ◽

Repeat Region ◽

Repeat Copy Number ◽

Tandem Repeat Region ◽

Linker Domain ◽

Repeat Copy

Abstract The pathogenic yeast Candida glabrata is reliant on a suite of cell surface adhesins that play a variety of roles necessary for transmission, establishment, and proliferation during infection. One particular adhesin, Epithelial Adhesin 1 [Epa1p], is responsible for binding to host tissue, a process which is essential for fungal propagation. Epa1p structure consists of three domains: an N-terminal intercellular binding domain responsible for epithelial cell binding, a C-terminal GPI anchor for cell wall linkage, and a serine/threonine-rich linker domain connecting these terminal domains. The linker domain contains a 40-amino acid tandem repeat region, which we have found to be variable in repeat copy number between isolates from clinical sources. We hypothesized that natural variation in Epa1p repeat copy number may modulate protein function. To test this, we recombinantly expressed Epa1p with various repeat copy numbers in S. cerevisiae to determine how differences in repeat copy number affect Epa1p expression, surface display, and binding to human epithelial cells. Our data suggest that repeat copy number variation has pleiotropic effects, influencing gene expression, protein surface display, shedding from the cell surface, and host tissue adhesion of the Epa1p adhesin. Understanding these links between repeat copy number variants and mechanisms of infection provides new insight into the variety of ways in which repetitive proteins contribute to the pathogenicity of C. glabrata.

Variation in primary sequence and tandem repeat copy number among i-antigens of Ichthyophthirius multifiliis [Mol. Biochem. Parasitol. 120 (2002) 93–106]

Molecular and Biochemical Parasitology ◽

10.1016/s0166-6851(02)00062-2 ◽

2002 ◽

Vol 122 (1) ◽

pp. 117

Author(s):

Yuankai Lin ◽

Tian Long Lin ◽

Chia-Ching Wang ◽

Xuting Wang ◽

Knut Stieger ◽

...

Keyword(s):

Tandem Repeat ◽

Copy Number ◽

Ichthyophthirius Multifiliis ◽

Primary Sequence ◽

Repeat Copy Number ◽

Repeat Copy

Application of Copy Number Variation Sequencing in Genetic Analysis of Miscarriages in Early and Middle Pregnancy

Cytogenetic and Genome Research ◽

10.1159/000512801 ◽

2020 ◽

Vol 160 (11-12) ◽

pp. 634-642

Author(s):

Shiqiang Luo ◽

Xingyuan Chen ◽

Tizhen Yan ◽

Jiaolian Ya ◽

Zehui Xu ◽

...

Keyword(s):

Copy Number Variation ◽

High Throughput ◽

Copy Number ◽

High Throughput Sequencing ◽

Chromosomal Abnormalities ◽

Pregnancy Termination ◽

Mendelian Inheritance ◽

Copy Number Variations ◽

Abnormal Chromosome ◽

Number Variation

High-throughput sequencing based on copy number variation (CNV-seq) is commonly used to detect chromosomal abnormalities. This study identifies chromosomal abnormalities in aborted embryos/fetuses in early and middle pregnancy and explores the application value of CNV-seq in determining the causes of pregnancy termination. High-throughput sequencing was used to detect chromosome copy number variations (CNVs) in 116 aborted embryos in early and middle pregnancy. The detection data were compared with the Database of Genomic Variants (DGV), the Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (DECIPHER), and the Online Mendelian Inheritance in Man (OMIM) database to determine the CNV type and the clinical significance. High-throughput sequencing results were successfully obtained in 109 out of 116 specimens, a detection success rate of 93.97%. In brief, there were 64 cases with abnormal chromosome numbers and 23 cases with CNVs, of which 10 were pathogenic mutations and 13 were variants of uncertain significance. An abnormal chromosome number is the most important reason for embryo termination in early and middle pregnancy, followed by pathogenic chromosome CNVs. CNV-seq can quickly and accurately detect chromosome abnormalities and identify microdeletion and microduplication CNVs that cannot be detected by conventional chromosome analysis, making it convenient and efficient for genetic etiology diagnosis in miscarriage.
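
The proportions reported in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names are illustrative, and expressing the two case groups as a share of the 109 successfully sequenced specimens is an assumption about the denominator, not a figure from the abstract.

```python
# Counts taken directly from the abstract.
total_specimens = 116
sequenced = 109
abnormal_chromosome_number = 64
cnv_cases = 23
pathogenic_cnvs = 10
uncertain_significance_cnvs = 13

# Detection success rate, as a percentage rounded to two decimals.
success_rate = round(100 * sequenced / total_specimens, 2)

# Assumed denominator: successfully sequenced specimens.
abnormal_share = round(100 * abnormal_chromosome_number / sequenced, 2)
cnv_share = round(100 * cnv_cases / sequenced, 2)

print(f"success rate: {success_rate}%")
print(f"abnormal chromosome number: {abnormal_share}% of sequenced specimens")
print(f"CNV cases: {cnv_share}% of sequenced specimens")
```

The success rate reproduces the 93.97% stated in the abstract, and the pathogenic and uncertain-significance counts sum to the 23 CNV cases.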