The Mutational Dynamics of Short Tandem Repeats in Large, Multigenerational Families

2021 ◽  
Author(s):  
Cody J Steely ◽  
Scott Watkins ◽  
Lisa Baird ◽  
Lynn Jorde

Short tandem repeats (STRs) are tandemly repeated sequences of 1-6 bp motifs. STRs compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. Here, to estimate the genome-wide pattern of mutations at STR loci, we analyzed blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. Using HipSTR, we identified de novo STR mutations in the 2nd generation of these pedigrees. Analyzing ~1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. We find that perfect repeats mutate ~2x more often than imperfect repeats. De novo STRs are significantly enriched in Alu elements (p < 2.2e-16). Approximately 30% of STR mutations occur within Alu elements, which compose only ~11% of the genome, and ~10% are found in LINE-1 insertions, which compose ~17% of the genome. Phasing these de novo mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be ~85, which is similar to the average number of observed de novo single nucleotide variants.
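The per-individual estimate can be checked roughly against the per-locus rate and the number of loci analyzed. The sketch below uses only the rounded figures quoted in the abstract; the helper names are illustrative. Dividing the share of mutations by the share of the genome is an assumed, simplified way to summarize the reported percentages, not the authors' enrichment test.

```python
# Rough consistency check of figures quoted in the abstract.
# Inputs are the rounded values from the text; helper names are hypothetical.

RATE_PER_LOCUS = 5.24e-5   # de novo mutations per locus per generation
N_LOCI = 1.6e6             # approximate number of STR loci analyzed


def expected_mutations(rate, n_loci):
    """Expected de novo STR mutations per individual per generation."""
    return rate * n_loci


def fold_enrichment(share_of_mutations, share_of_genome):
    """Share of mutations in an element class divided by the share of
    the genome that class occupies (a simplifying assumption)."""
    return share_of_mutations / share_of_genome


per_individual = expected_mutations(RATE_PER_LOCUS, N_LOCI)
alu_fold = fold_enrichment(0.30, 0.11)     # Alu: ~30% of mutations, ~11% of genome
line1_fold = fold_enrichment(0.10, 0.17)   # LINE-1: ~10% of mutations, ~17% of genome

print(f"expected mutations per individual: {per_individual:.1f}")
print(f"Alu fold enrichment: {alu_fold:.2f}")
print(f"LINE-1 fold enrichment: {line1_fold:.2f}")
```

With these rounded inputs the product comes to about 84, in line with the reported ~85. The Alu ratio is well above 1 and the LINE-1 ratio is below 1.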

2016 ◽  
Author(s):  
Thomas Willems ◽  
Dina Zielinski ◽  
Assaf Gordon ◽  
Melissa Gymrek ◽  
Yaniv Erlich

Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.


2015 ◽  
Vol 25 (5) ◽  
pp. 736-749 ◽  
Author(s):  
Arkarachai Fungtammasan ◽  
Guruprasad Ananda ◽  
Suzanne E. Hile ◽  
Marcia Shu-Wei Su ◽  
Chen Sun ◽  
...  

2021 ◽  
Author(s):  
Linda Zhou ◽  
Chunmin Ge ◽  
Thomas Malachowski ◽  
Ji Hun Kim ◽  
Keerthivasan Raanin Chandradoss ◽  
...  

Short tandem repeat (STR) instability is causally linked to pathologic transcriptional silencing in a subset of repeat expansion disorders. In fragile X syndrome (FXS), instability of a single CGG STR tract is thought to repress FMR1 via local DNA methylation. Here, we report the acquisition of more than ten Megabase-sized H3K9me3 domains in FXS, including a 5-8 Megabase block around FMR1. Distal H3K9me3 domains encompass synaptic genes with STR instability, and spatially co-localize in trans concurrently with FMR1 CGG expansion and the dissolution of TADs. CRISPR engineering of mutation-length FMR1 CGG to normal-length preserves heterochromatin, whereas cut-out to pre-mutation-length attenuates a subset of H3K9me3 domains. Overexpression of a pre-mutation-length CGG de-represses both FMR1 and distal heterochromatinized genes, indicating that long-range H3K9me3-mediated silencing is exquisitely sensitive to STR length. Together, our data uncover a genome-wide surveillance mechanism by which STR tracts spatially communicate over vast distances to heterochromatinize the pathologically unstable genome in FXS. One-Sentence Summary: Heterochromatinization of distal synaptic genes with repeat instability in fragile X is reversible by overexpression of a pre-mutation length CGG tract.


2021 ◽  
Author(s):  
Jiru Han ◽  
Jacob E Munro ◽  
Anthony Kocoski ◽  
Alyssa E Barry ◽  
Melanie Bahlo

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).


2019 ◽  
Author(s):  
David Jakubosky ◽  
Erin N. Smith ◽  
Matteo D’Antonio ◽  
Marc Jan Bonder ◽  
William W. Young Greenwald ◽  
...  

Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assembled a set of 719 deep whole genome sequencing (WGS) samples (mean 42x) from 477 distinct individuals which we used to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We used 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and developed a systematic filtering strategy to create one of the most complete and well characterized maps of SVs and STRs to date.


2018 ◽  
Author(s):  
Shubham Saini ◽  
Ileena Mitra ◽  
Nima Mousavi ◽  
Stephanie Feupe Fotsing ◽  
Melissa Gymrek

Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in a variety of complex traits. However, existing technologies focusing on single nucleotide polymorphisms (SNPs) have not allowed for systematic STR association studies. Here, we leverage next-generation sequencing data from 479 families to create a SNP+STR reference haplotype panel for genome-wide imputation of STRs into SNP data. Imputation achieved an average of 97% concordance between genotyped and imputed STR genotypes in an external dataset compared to 63% expected under a random model. Performance varied widely across STRs, with near perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic forensics markers. We demonstrate that imputation increases power over individual SNPs to detect STR associations using simulated phenotypes and gene expression data. This resource will enable the first large-scale STR association studies using existing SNP datasets, and will likely yield new insights into complex traits.
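To make the concordance comparison concrete, here is a minimal sketch of one way to score imputed genotypes against directly genotyped ones at a single STR. The abstract does not spell out its definitions, so everything here is an assumption: the function names, the treatment of genotypes as unordered pairs, and the baseline (the chance that two genotypes drawn independently from the observed genotype frequencies agree). The toy genotypes are invented for illustration and are not data from the study.

```python
from collections import Counter


def normalize(genotype):
    """Treat a diploid genotype as an unordered pair of repeat lengths."""
    return tuple(sorted(genotype))


def concordance(truth, imputed):
    """Fraction of samples whose imputed genotype equals the observed one."""
    if len(truth) != len(imputed) or not truth:
        raise ValueError("call sets must be non-empty and the same length")
    matches = sum(normalize(t) == normalize(i) for t, i in zip(truth, imputed))
    return matches / len(truth)


def random_match_probability(truth):
    """Chance that two genotypes drawn independently from the observed
    genotype frequencies agree (one possible 'random' baseline)."""
    counts = Counter(normalize(g) for g in truth)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


# Toy genotypes in repeat units (hypothetical).
truth = [(10, 10), (10, 12), (12, 10), (12, 12)]
imputed = [(10, 10), (12, 10), (10, 12), (10, 12)]

observed = concordance(truth, imputed)
baseline = random_match_probability(truth)
print(f"concordance: {observed:.3f}, random baseline: {baseline:.3f}")
```

Under this definition the baseline falls as a locus becomes more polymorphic, because matching by chance gets harder as the number of possible genotypes grows.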


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 200 ◽  
Author(s):  
Andreas Halman ◽  
Alicia Oshlack

Background: Short tandem repeats are an important source of genetic variation. They are highly mutable and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length, which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.
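The evaluation idea in the Methods can be sketched as follows. Males carry a single X chromosome, so outside the pseudoautosomal regions a heterozygous STR call on a male X can be counted as a genotyping error. The `Call` record, its field names, and the thresholds below are hypothetical, chosen only to show how coverage and quality filtering trade retained calls against error rate. This is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One STR genotype call on a male X chromosome (hypothetical record)."""
    allele_a: int    # repeat length of the first called allele
    allele_b: int    # repeat length of the second called allele
    coverage: int    # reads supporting the call
    quality: float   # tool-reported quality score in [0, 1]


def het_error_rate(calls, min_coverage=0, min_quality=0.0):
    """Return (heterozygous error rate, number of calls kept) after filtering.

    A heterozygous call on a hemizygous locus is counted as an error.
    Returns (None, 0) when no call passes the filters.
    """
    kept = [c for c in calls
            if c.coverage >= min_coverage and c.quality >= min_quality]
    if not kept:
        return None, 0
    errors = sum(c.allele_a != c.allele_b for c in kept)
    return errors / len(kept), len(kept)


# Toy calls, invented for illustration.
calls = [
    Call(12, 12, coverage=30, quality=0.99),
    Call(12, 13, coverage=5, quality=0.60),
    Call(8, 8, coverage=22, quality=0.95),
    Call(15, 16, coverage=25, quality=0.97),
]

unfiltered = het_error_rate(calls)
filtered = het_error_rate(calls, min_coverage=10, min_quality=0.9)
print("unfiltered (rate, kept):", unfiltered)
print("filtered (rate, kept):", filtered)
```

In this toy set, filtering lowers the error rate but also discards a call, the same trade-off between accuracy and number of calls that the Conclusions describe.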


2019 ◽  
Vol 35 (22) ◽  
pp. 4716-4723 ◽  
Author(s):  
Daniel Tello ◽  
Juanita Gil ◽  
Cristian D Loaiza ◽  
John J Riascos ◽  
Nicolás Cardozo ◽  
...  

Motivation: Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features. Results: Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparable accuracy and better efficiency compared to the existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture. Availability and implementation: NGSEP is available as open source software at http://ngsep.sf.net. Supplementary information: Supplementary data are available at Bioinformatics online.

