Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets

2017 ◽  
Vol 114 (22) ◽  
pp. 5671-5676 ◽  
Author(s):  
Michael D. Edge ◽  
Bridget F. B. Algee-Hewitt ◽  
Trevor J. Pemberton ◽  
Jun Z. Li ◽  
Noah A. Rosenberg

Combining genotypes across datasets is central to facilitating advances in genetics. Data aggregation efforts often face the challenge of record matching: identifying dataset entries that represent the same individual. We show that records can be matched across genotype datasets that share no markers, using linkage disequilibrium between loci that appear in different datasets. Using two datasets for the same 872 people, one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications, we find that 90–98% of forensic STR records can be connected to the corresponding SNP records, and vice versa. Accuracy increases to 99–100% when ∼30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic to maintaining databases that contain even small numbers of markers, including databases of forensic significance.
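A minimal sketch of the matching idea, assuming each SNP record already comes with imputed probabilities for every STR genotype (for example, from a SNP+STR reference panel that exploits linkage disequilibrium). The abstract does not state the scoring rule. Here, as an assumption, a record pair is scored by the summed log-probability of the observed STR genotypes, and each STR record is assigned to its best-scoring SNP record. The function names, the probability floor, and the toy probabilities are all hypothetical.

```python
import math

# Type aliases (Python 3.9+). A genotype is an unordered pair of STR allele
# lengths, stored sorted so that (14, 13) and (13, 14) compare equal.
Genotype = tuple[int, int]
# Hypothetical imputation output for one SNP record:
# locus -> {genotype -> imputed probability}.
ImputedProbs = dict[str, dict[Genotype, float]]


def log_match_score(str_record: dict[str, Genotype],
                    imputed: ImputedProbs,
                    floor: float = 1e-6) -> float:
    """Sum over STR loci of log P(observed genotype | SNP record).

    `floor` (an assumed smoothing value) stops a genotype that imputation
    never proposed from sending the score to -infinity."""
    score = 0.0
    for locus, genotype in str_record.items():
        key = tuple(sorted(genotype))
        p = imputed.get(locus, {}).get(key, 0.0)
        score += math.log(max(p, floor))
    return score


def match_records(str_records: dict[str, dict[str, Genotype]],
                  snp_imputations: dict[str, ImputedProbs]) -> dict[str, str]:
    """Assign each STR record to the SNP record with the highest score.

    This is a one-to-many rule: two STR records may pick the same SNP record."""
    matches = {}
    for str_id, record in str_records.items():
        matches[str_id] = max(
            snp_imputations,
            key=lambda snp_id: log_match_score(record, snp_imputations[snp_id]),
        )
    return matches


# Toy example with made-up probabilities at one CODIS locus.
snp_imputations = {
    "snpA": {"D8S1179": {(13, 14): 0.7, (12, 13): 0.3}},
    "snpB": {"D8S1179": {(12, 13): 0.8, (13, 14): 0.2}},
}
str_records = {"str1": {"D8S1179": (14, 13)}, "str2": {"D8S1179": (12, 13)}}
print(match_records(str_records, snp_imputations))
# {'str1': 'snpA', 'str2': 'snpB'}
```

Because the rule is one-to-many, two STR records can land on the same SNP record; enforcing a one-to-one assignment would require solving an assignment problem over the full score matrix.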

2021 ◽  
Author(s):  
Linda Zhou ◽  
Chunmin Ge ◽  
Thomas Malachowski ◽  
Ji Hun Kim ◽  
Keerthivasan Raanin Chandradoss ◽  
...  

Short tandem repeat (STR) instability is causally linked to pathologic transcriptional silencing in a subset of repeat expansion disorders. In fragile X syndrome (FXS), instability of a single CGG STR tract is thought to repress FMR1 via local DNA methylation. Here, we report the acquisition of more than ten megabase-sized H3K9me3 domains in FXS, including a 5–8-megabase block around FMR1. Distal H3K9me3 domains encompass synaptic genes with STR instability and co-localize spatially in trans, concurrently with FMR1 CGG expansion and the dissolution of TADs. CRISPR engineering of the mutation-length FMR1 CGG tract to normal length preserves heterochromatin, whereas cutting it back to pre-mutation length attenuates a subset of H3K9me3 domains. Overexpression of a pre-mutation-length CGG tract de-represses both FMR1 and distal heterochromatinized genes, indicating that long-range H3K9me3-mediated silencing is exquisitely sensitive to STR length. Together, our data uncover a genome-wide surveillance mechanism by which STR tracts communicate spatially over vast distances to heterochromatinize the pathologically unstable genome in FXS.

One-Sentence Summary: Heterochromatinization of distal synaptic genes with repeat instability in fragile X is reversible by overexpression of a pre-mutation-length CGG tract.


2018 ◽  
Author(s):  
Shubham Saini ◽  
Ileena Mitra ◽  
Nima Mousavi ◽  
Stephanie Feupe Fotsing ◽  
Melissa Gymrek

Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in a variety of complex traits. However, existing technologies focusing on single nucleotide polymorphisms (SNPs) have not allowed systematic STR association studies. Here, we leverage next-generation sequencing data from 479 families to create a SNP+STR reference haplotype panel for genome-wide imputation of STRs into SNP data. Imputation achieved an average concordance of 97% between genotyped and imputed STR genotypes in an external dataset, compared with 63% expected under a random model. Performance varied widely across STRs, from near-perfect concordance at bi-allelic STRs to 70% at highly polymorphic forensic markers. Using simulated phenotypes and gene expression data, we demonstrate that imputation increases power over individual SNPs to detect STR associations. This resource will enable the first large-scale STR association studies using existing SNP datasets and will likely yield new insights into complex traits.
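A short sketch of how genotype concordance can be computed, treating a genotype as an unordered allele pair. The abstract does not define its random model; the baseline below (the chance that a genotype drawn at random from the observed genotype frequencies matches) is one plausible null chosen for illustration, and the toy genotypes are made up.

```python
from collections import Counter

Genotype = tuple[int, int]


def _norm(g: Genotype) -> Genotype:
    # Ignore allele order: (12, 10) and (10, 12) are the same genotype.
    return tuple(sorted(g))


def concordance(observed: list[Genotype], imputed: list[Genotype]) -> float:
    """Fraction of samples whose imputed STR genotype matches the observed one."""
    if not observed or len(observed) != len(imputed):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(_norm(o) == _norm(i) for o, i in zip(observed, imputed))
    return hits / len(observed)


def random_baseline(observed: list[Genotype]) -> float:
    """Expected concordance if each imputed genotype were drawn at random from
    the observed genotype frequencies: the sum of squared genotype frequencies.
    This is one plausible null, not necessarily the paper's random model."""
    if not observed:
        raise ValueError("need at least one genotype")
    counts = Counter(_norm(g) for g in observed)
    n = len(observed)
    return sum((c / n) ** 2 for c in counts.values())


observed = [(10, 12), (12, 10), (11, 11), (10, 10)]
imputed = [(12, 10), (10, 12), (11, 12), (10, 10)]
print(concordance(observed, imputed))  # 0.75
print(random_baseline(observed))       # 0.375
```

The baseline shrinks as a locus gets more polymorphic, which is why the same raw concordance is more impressive at a highly polymorphic forensic marker than at a bi-allelic STR.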


2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Shubham Saini ◽  
Ileena Mitra ◽  
Nima Mousavi ◽  
Stephanie Feupe Fotsing ◽  
Melissa Gymrek

Gene ◽  
2016 ◽  
Vol 587 (1) ◽  
pp. 83-90 ◽  
Author(s):  
A. Bushehri ◽  
M.R. Mashhoudi Barez ◽  
S.K. Mansouri ◽  
A. Biglarian ◽  
M. Ohadi

2020 ◽  
pp. 002580242096500
Author(s):  
Supakit Khacha-ananda ◽  
Phatcharin Mahawong

Short tandem repeats (STRs) are widely used as DNA markers in paternity testing and criminal investigations because of their high polymorphism among individuals within populations. However, many factors influence the genetic variation of STRs, so characterizing STRs within individual populations is needed to build databases that support scientifically reliable STR genotyping for forensic purposes. We aimed to examine the allele frequencies and forensic parameters of X-chromosomal STRs (X-STRs) in a northern Thai population. A retrospective descriptive study was conducted by collecting X-STR data from unrelated individuals living in northern Thailand. Allele frequencies and forensic parameters, including polymorphism information content (PIC), power of discrimination in females and males (PDf and PDm), mean exclusion chance (MEC), and haplotype frequency, were calculated, and Hardy–Weinberg equilibrium was tested. A total of 132 alleles were observed, with frequencies ranging from 0.0064 to 0.4904. All loci except DXS8378 and DXS7423 had PIC > 0.6, indicating high polymorphism. DXS10135 was the most diverse locus, with the highest PD and MEC, while DXS7423 was the least polymorphic marker, with the lowest PD and MEC. In males, haplotype diversity was highest for linkage group III (DXS10101-DXS10103-HPRTB), at 0.9895. Genetic distance analysis showed that the northern Thai population was closely related to a Taiwanese population (DA = 0.023). No locus deviated significantly from Hardy–Weinberg equilibrium except DXS10148. This study establishes a northern Thai X-STR reference database for forensic genetic use.
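A sketch of several of the per-locus parameters named above, computed from allele frequencies with the standard textbook formulas. The abstract does not give its formulas; PDf here assumes Hardy–Weinberg genotype proportions in females, and DA is assumed to mean Nei's DA distance. MEC and haplotype diversity are omitted, and the toy frequencies are made up.

```python
import math


def pic(freqs: list[float]) -> float:
    """Polymorphism information content (Botstein et al. 1980):
    1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    homo = sum(p * p for p in freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return 1 - homo - cross


def pd_male(freqs: list[float]) -> float:
    """Males are hemizygous for X, so a genotype is a single allele:
    PDm = 1 - sum p_i^2."""
    return 1 - sum(p * p for p in freqs)


def pd_female(freqs: list[float]) -> float:
    """Females carry two X alleles; assuming Hardy-Weinberg proportions,
    1 - sum of squared genotype frequencies = 1 - 2 (sum p_i^2)^2 + sum p_i^4."""
    s2 = sum(p * p for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - 2 * s2 * s2 + s4


def nei_da(pop_x: list[dict[str, float]], pop_y: list[dict[str, float]]) -> float:
    """Nei's DA distance over r loci: 1 - (1/r) sum_loci sum_alleles sqrt(x * y).
    Each population is a list of per-locus {allele: frequency} dicts."""
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("populations must cover the same, non-empty set of loci")
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        total += sum(math.sqrt(p * fy.get(allele, 0.0)) for allele, p in fx.items())
    return 1 - total / len(pop_x)


freqs = [0.5, 0.5]  # toy two-allele locus
print(pic(freqs), pd_male(freqs), pd_female(freqs))  # 0.375 0.5 0.625
```

PDm is always at most PDf for the same frequencies, because a hemizygous male shows one allele where a female shows a pair.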


2018 ◽  
Author(s):  
Stephanie Feupe Fotsing ◽  
Jonathan Margoliash ◽  
Catherine Wang ◽  
Shubham Saini ◽  
Richard Yanicky ◽  
...  

Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression have so far had limited power to detect associations and to provide insights into putative mechanisms. Here, we leverage whole-genome sequencing and expression data for 17 tissues from the Genotype-Tissue Expression Project (GTEx) to identify STRs whose repeat number is associated with the expression of nearby genes (eSTRs). Our analysis reveals more than 28,000 eSTRs. We employ fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked to published GWAS signals and implicate specific eSTRs in complex traits including height, schizophrenia, inflammatory bowel disease, and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and the eSTRs identified here will serve as a valuable resource for future studies of complex traits.
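The abstract does not spell out its association model. A common simple form regresses a gene's expression on an STR "dosage"; the sketch below assumes dosage is the mean repeat number of the two alleles and omits the covariates, normalization, and multiple-testing control a real eSTR scan would need. The genotypes and expression values are made up.

```python
import math


def dosage(genotype: tuple[int, int]) -> float:
    """Assumed STR dosage: mean repeat number of the two alleles."""
    return (genotype[0] + genotype[1]) / 2


def estr_association(dosages: list[float], expression: list[float]) -> dict:
    """Least-squares fit of expression ~ dosage, returning slope, Pearson r,
    and the t statistic for r with n - 2 degrees of freedom."""
    n = len(dosages)
    if n < 3 or n != len(expression):
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(dosages) / n, sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary")
    r = sxy / math.sqrt(sxx * syy)
    # A perfect fit (|r| = 1) has an unbounded t statistic.
    t = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return {"slope": sxy / sxx, "r": r, "t": t}


genotypes = [(10, 10), (10, 12), (12, 12), (12, 14)]
expr = [5.1, 5.9, 7.2, 7.8]  # made-up expression values
print(estr_association([dosage(g) for g in genotypes], expr))
```

Using repeat number as a continuous predictor tests a linear length effect; STRs whose effect is non-monotonic in length would need a different encoding.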


2015 ◽  
Vol 25 (5) ◽  
pp. 736-749 ◽  
Author(s):  
Arkarachai Fungtammasan ◽  
Guruprasad Ananda ◽  
Suzanne E. Hile ◽  
Marcia Shu-Wei Su ◽  
Chen Sun ◽  
...  

2021 ◽  
Author(s):  
Bernabe I Bustos ◽  
Kimberley Billingsley ◽  
Cornelis Blauwendraat ◽  
J. Raphael Gibbs ◽  
Ziv Gan-Or ◽  
...  

Parkinson's disease (PD) is a complex neurodegenerative disorder with a strong genetic component, in which most known disease-associated variants are single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). DNA repetitive elements account for >50% of the human genome; however, little is known about their contribution to PD etiology. While select short tandem repeats (STRs) within candidate genes have been studied in PD, their genome-wide contribution remains unknown. Here we present the first genome-wide association study (GWAS) of STRs in PD. Through a meta-analysis of 16 imputed GWAS cohorts from the International Parkinson's Disease Genomics Consortium (IPDGC), totalling 39,087 individuals (16,642 PD cases and 22,445 controls of European ancestry), we identified 34 genome-wide significant STR loci (p < 5.34 × 10⁻⁶), with the strongest signal located in KANSL1 (chr17:44205351:[T]11, p = 3 × 10⁻³⁹, OR = 1.31, 95% CI 1.26–1.36). Conditional-joint analyses suggested that four significant STRs mapping near NDUFAF2, TRIML2, MIRNA-129-1, and NCOR1 were independent of known PD risk SNPs. Including STRs in heritability estimates increased the variance explained relative to SNPs alone. Gene expression analysis of STRs (eSTRs) in RNA-seq data from 13 brain regions identified significant associations between STRs and the expression of multiple genes, including known PD genes. Further functional annotation of candidate STRs revealed that significant eSTRs within NDUFAF2 and ZSWIM7 overlap regulatory features and are associated with changes in the expression levels of nearby genes. Here we show that STRs at known and novel candidate PD loci contribute to PD risk and have functional effects in disease-relevant tissues and pathways, supporting previously reported disease-associated genes and providing further evidence for their functional prioritization. These data represent a valuable resource for researchers currently dissecting PD risk loci.
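The abstract does not name its meta-analysis method. A fixed-effect, inverse-variance combination of per-cohort log odds ratios is a common choice in GWAS, so it is sketched here as an assumption; the cohort estimates are made up. For scale, the reported OR of 1.31 corresponds to a log odds ratio of about 0.27.

```python
import math


def fixed_effect_meta(betas: list[float], ses: list[float]) -> dict:
    """Inverse-variance fixed-effect meta-analysis of per-cohort log odds ratios.

    Each cohort is weighted by 1 / SE^2; the pooled SE is sqrt(1 / sum of weights)."""
    if not betas or len(betas) != len(ses):
        raise ValueError("need one SE per effect estimate")
    weights = [1 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return {
        "beta": beta,
        "se": se,
        "p": p,
        "odds_ratio": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
    }


# Two made-up cohorts with identical estimates: the pooled SE shrinks by sqrt(2).
print(fixed_effect_meta([0.27, 0.27], [0.10, 0.10]))
```

A locus would then be called significant when the pooled p-value falls below the study's threshold (p < 5.34 × 10⁻⁶ in the abstract).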


2021 ◽  
Author(s):  
Cody J Steely ◽  
Scott Watkins ◽  
Lisa Baird ◽  
Lynn Jorde

Short tandem repeats (STRs) are tandemly repeated sequences of 1–6 bp motifs. STRs compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases, including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. Here, to estimate the genome-wide pattern of mutations at STR loci, we analyzed blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. Using HipSTR, we identified de novo STR mutations in the second generation of these pedigrees. Analyzing ~1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. We find that perfect repeats mutate ~2× more often than imperfect repeats. De novo STR mutations are significantly enriched in Alu elements (p < 2.2 × 10⁻¹⁶): approximately 30% of STR mutations occur within Alu elements, which compose only ~11% of the genome, and ~10% are found in LINE-1 insertions, which compose ~17% of the genome. Phasing these de novo mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of genome-wide de novo STR mutations per individual to be ~85, similar to the average number of observed de novo single nucleotide variants.
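A back-of-the-envelope check of the figures above. The estimator `mutation_rate` is a hypothetical simplification (mutations divided by callable locus–child pairs), not the authors' procedure, and the per-individual product assumes the per-locus rate applies uniformly to the ~1.6 million analyzed loci; the paper's ~85 may have been derived differently.

```python
def mutation_rate(n_mutations: int, n_callable_loci: int, n_children: int) -> float:
    """Hypothetical simplified estimator: de novo mutations per locus per
    generation = mutations / (callable loci x children examined)."""
    return n_mutations / (n_callable_loci * n_children)


def expected_per_individual(rate: float, n_loci: float) -> float:
    """Expected de novo STR mutations per individual if `rate` applied uniformly."""
    return rate * n_loci


def fold_enrichment(frac_of_mutations: float, frac_of_genome: float) -> float:
    """Observed share of mutations divided by the share expected from genome fraction."""
    return frac_of_mutations / frac_of_genome


print(round(expected_per_individual(5.24e-5, 1.6e6), 1))  # 83.8
print(round(fold_enrichment(0.30, 0.11), 2))               # 2.73 (Alu)
print(round(fold_enrichment(0.10, 0.17), 2))               # 0.59 (LINE-1)
```

The product lands close to the reported ~85 per individual, and the ratios make the contrast explicit: Alu elements carry roughly 2.7 times their genomic share of de novo STR mutations, while LINE-1 insertions carry only about 0.6 times theirs.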


2019 ◽  
Author(s):  
Siddharth Jain ◽  
Bijan Mazaheri ◽  
Netanel Raviv ◽  
Jehoshua Bruck

The current paradigm in data science is based on the belief that, given sufficient amounts of data, classifiers are likely to uncover the distinction between true and false hypotheses. In particular, the abundance of genomic data creates opportunities for discovering disease-risk associations and for aiding screening and treatment. However, working with large amounts of data is statistically beneficial only if the data are statistically unbiased. Here we demonstrate that the methods used to amplify DNA samples in TCGA have a substantial effect on short tandem repeat (STR) information. In particular, we design a classifier that uses STR information to distinguish samples with analyte code D from samples with analyte code W. This artificial bias might be detrimental to data-driven approaches and might undermine conclusions drawn from past and future genome-wide studies.
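The abstract does not describe the classifier or its features. As a stand-in, the sketch below uses a nearest-centroid rule on per-sample STR feature vectors; the two features (a stutter-read fraction and a mean absolute repeat-length error) and all values are hypothetical, chosen only to show how an amplification-linked signal could separate analyte codes.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(samples_by_code: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """One centroid per analyte code (e.g. 'D' and 'W')."""
    return {code: centroid(vectors) for code, vectors in samples_by_code.items()}


def predict(model: dict[str, list[float]], features: list[float]) -> str:
    """Return the analyte code whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda code: math.dist(model[code], features))


# Hypothetical per-sample STR features: [stutter-read fraction, mean |length error|].
training = {
    "D": [[0.02, 0.10], [0.03, 0.12]],
    "W": [[0.08, 0.30], [0.09, 0.28]],
}
model = train(training)
print(predict(model, [0.025, 0.11]), predict(model, [0.085, 0.29]))  # D W
```

If even a classifier this simple separates the codes well on held-out samples, the STR calls carry a technical signature of sample preparation, which is the kind of bias the abstract warns about.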

