Predicting the effects of SNPs on transcription factor binding affinity

2019 ◽  
Author(s):  
Sierra S Nishizaki ◽  
Natalie Ng ◽  
Shengcheng Dong ◽  
Robert S Porter ◽  
Cody Morterud ◽  
...  

Abstract Motivation: Genome-wide association studies have revealed that 88% of disease-associated single-nucleotide polymorphisms (SNPs) reside in noncoding regions. However, noncoding SNPs remain understudied, partly because they are challenging to prioritize for experimental validation. To address this deficiency, we developed the SNP effect matrix pipeline (SEMpl). Results: SEMpl estimates transcription factor-binding affinity by observing differences in chromatin immunoprecipitation followed by deep sequencing signal intensity for SNPs within functional transcription factor-binding sites (TFBSs) genome-wide. By cataloging the effects of every possible mutation within the TFBS motif, SEMpl can predict the consequences of SNPs for transcription factor binding. This knowledge can be used to identify potential disease-causing regulatory loci. Availability and implementation: SEMpl is available from https://github.com/Boyle-Lab/SEM_CPP. Supplementary information: Supplementary data are available at Bioinformatics online.
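
The idea of cataloging the effect of every possible mutation within a motif can be pictured as a lookup table with one row per motif position and one score per nucleotide. The sketch below is only an illustration under stated assumptions: the matrix values are invented, the function names are hypothetical, and nothing here reflects SEMpl's actual code, file format or scoring scale.

```python
# Hypothetical effect matrix for a 4-bp motif. Each row is a motif position,
# each entry an invented score (0 = no change, negative = weaker binding).
TOY_EFFECT_MATRIX = [
    {"A": 0.0, "C": -1.2, "G": -0.8, "T": -1.5},
    {"A": -0.9, "C": 0.0, "G": -1.1, "T": -0.3},
    {"A": -2.0, "C": -1.7, "G": 0.0, "T": -1.9},
    {"A": -0.1, "C": -0.2, "G": -0.4, "T": 0.0},
]


def snp_effect(matrix, position, ref, alt):
    """Score change when `ref` is replaced by `alt` at a 0-based motif position."""
    row = matrix[position]
    return row[alt] - row[ref]


def site_score(matrix, sequence):
    """Sum the per-position scores of a sequence aligned to the motif."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must equal motif width")
    return sum(row[base] for row, base in zip(matrix, sequence))
```

With these toy values, replacing G with A at the third motif position lowers the site score by 2.0, which is the kind of per-SNP prediction the abstract describes.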

2020 ◽  
Vol 36 (19) ◽  
pp. 4957-4959
Author(s):  
David B Blumenthal ◽  
Lorenzo Viola ◽  
Markus List ◽  
Jan Baumbach ◽  
Paolo Tieri ◽  
...  

Abstract Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited: they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.
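
To make "SNP interactions of arbitrary order" and "categorical and quantitative phenotypes" concrete, here is a minimal sketch of a simulated interaction effect. It assumes a simple multiplicative interaction model and Gaussian noise, both chosen for illustration; the function names are hypothetical, this is not EpiGEN's API, and LD is not modelled at all.

```python
import random


def interaction_term(genotype, snp_indices):
    """Product of minor-allele counts (0, 1 or 2) over the interacting SNPs."""
    product = 1
    for index in snp_indices:
        product *= genotype[index]
    return product


def simulate_quantitative(genotypes, snp_indices, effect, noise_sd, seed=0):
    """Quantitative phenotype = effect * interaction term + Gaussian noise."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [
        effect * interaction_term(genotype, snp_indices) + rng.gauss(0.0, noise_sd)
        for genotype in genotypes
    ]


def dichotomize(values, threshold):
    """Turn quantitative values into a case (1) / control (0) phenotype."""
    return [1 if value > threshold else 0 for value in values]
```

Passing two indices in `snp_indices` gives a pairwise interaction, and passing three or more gives a higher-order one.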


2018 ◽  
Vol 35 (15) ◽  
pp. 2657-2659 ◽  
Author(s):  
Sunyoung Shin ◽  
Rebecca Hudson ◽  
Christopher Harrison ◽  
Mark Craven ◽  
Sündüz Keleş

Abstract Summary: Understanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms. Availability and implementation: atSNP Search is freely available at http://atsnp.biostat.wisc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
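
Evaluating a motif match with both the reference and the variant allele can be sketched as a log-odds scan over every motif window that overlaps the SNP. This is an illustration only: the 3-bp position probability matrix and the uniform background are invented, the function names are hypothetical, and the significance testing that atSNP Search performs is omitted.

```python
import math

BACKGROUND = 0.25  # assumed uniform background nucleotide frequency

# Hypothetical 3-bp position probability matrix (each column sums to 1).
TOY_PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def window_score(ppm, window):
    """Log-odds score (base 2) of one window against the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(ppm, window))


def best_match(ppm, sequence, snp_index):
    """Best score over all motif windows that contain the SNP position."""
    width = len(ppm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(sequence) - width)
    scores = [window_score(ppm, sequence[s:s + width]) for s in range(first, last + 1)]
    if not scores:
        raise ValueError("sequence is shorter than the motif")
    return max(scores)


def allele_scores(ppm, left_flank, ref, alt, right_flank):
    """Return (reference score, variant score) for a single-nucleotide variant."""
    snp_index = len(left_flank)
    ref_seq = left_flank + ref + right_flank
    alt_seq = left_flank + alt + right_flank
    return best_match(ppm, ref_seq, snp_index), best_match(ppm, alt_seq, snp_index)
```

A drop from the reference score to the variant score suggests a weakened motif match. Whether that drop is statistically meaningful is a separate question that this sketch does not address.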


2016 ◽  
Vol 283 (1835) ◽  
pp. 20160569 ◽  
Author(s):  
M. E. Goddard ◽  
K. E. Kemper ◽  
I. M. MacLeod ◽  
A. J. Chamberlain ◽  
B. J. Hayes

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
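
Fitting all SNP effects simultaneously under a prior can be illustrated with the simplest case, a common normal prior on every effect, which leads to ridge regression. The abstract does not say which prior the authors prefer, so the normal prior is an assumption here, as are the function names, the centred toy data and the coordinate-wise solver.

```python
def ridge_snp_effects(genotypes, phenotypes, shrinkage, sweeps=200):
    """Solve (X'X + shrinkage * I) b = X'y by Gauss-Seidel sweeps.

    genotypes: one row per individual, one column per SNP (assumed centred).
    phenotypes: one centred value per individual.
    shrinkage: residual variance / SNP-effect variance; must be > 0.
    """
    if shrinkage <= 0:
        raise ValueError("shrinkage must be positive")
    n_snps = len(genotypes[0])
    effects = [0.0] * n_snps
    residuals = list(phenotypes)  # y - X b, with b = 0 initially
    for _ in range(sweeps):
        for j in range(n_snps):
            column = [row[j] for row in genotypes]
            xx = sum(x * x for x in column)
            # Add SNP j's own contribution back before re-estimating it.
            rhs = sum(x * (r + x * effects[j]) for x, r in zip(column, residuals))
            updated = rhs / (xx + shrinkage)
            delta = updated - effects[j]
            residuals = [r - x * delta for x, r in zip(column, residuals)]
            effects[j] = updated
    return effects


def predict(genotypes, effects):
    """Predicted phenotype for each individual from the fitted SNP effects."""
    return [sum(x * b for x, b in zip(row, effects)) for row in genotypes]
```

Larger `shrinkage` values pull every estimate toward zero, which is what keeps the fit stable when SNPs outnumber individuals. The `predict` helper corresponds to the genomic prediction use described in the abstract.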


2020 ◽  
Author(s):  
Celine Charon ◽  
Rodrigue Allodji ◽  
Vincent Meyer ◽  
Jean-François Deleuze

Abstract Quality control methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1,089 NCBI recorded individuals for additional validation. Without variant pre-filtration based on quality control (QC), we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) <0.001 2.5-fold with or without QC pre-filtration and halved the number of very rare variants (5E-04). As a result, to maintain confidence and enough SNVs, we propose here a 2-step post-filtration approach to increase the number of very rare and rare variants compared to conservative post-filtration methods.
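
The comparison of post-filtration thresholds can be pictured as counting, per frequency class, the variants whose imputation quality score passes a cutoff. The sketch below uses the frequency ranges and the 0.3 and 0.8 cutoffs quoted in the abstract; the toy variants, the half-open bin boundaries and the function names are assumptions, and the authors' 2-step procedure is not reproduced because the abstract does not describe it.

```python
# Frequency classes from the abstract; treating each range as half-open
# (low <= frequency < high) is an assumption.
FREQUENCY_BINS = [
    ("very rare", 5e-4, 1e-3),
    ("rare", 1e-3, 5e-3),
]


def frequency_bin(frequency):
    """Name of the frequency class a variant falls in, or 'other'."""
    for label, low, high in FREQUENCY_BINS:
        if low <= frequency < high:
            return label
    return "other"


def count_passing(variants, min_quality):
    """Count variants per frequency class with quality >= min_quality.

    variants: iterable of (allele frequency, imputation quality score) pairs.
    """
    counts = {}
    for frequency, quality in variants:
        if quality >= min_quality:
            label = frequency_bin(frequency)
            counts[label] = counts.get(label, 0) + 1
    return counts


# Invented example variants: (allele frequency, imputation quality score).
TOY_VARIANTS = [(6e-4, 0.35), (7e-4, 0.9), (2e-3, 0.5), (3e-3, 0.85), (0.2, 0.99)]
```

On the toy data, raising the cutoff from 0.3 to 0.8 halves the retained very rare and rare variants while leaving the common variant untouched.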


Animals ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 2211
Author(s):  
Shan Lin ◽  
Zihui Wan ◽  
Junnan Zhang ◽  
Lingna Xu ◽  
Bo Han ◽  
...  

Albumin can be of particular benefit in fighting infections for newborn calves due to its anti-inflammatory and anti-oxidative stress properties. To identify the candidate genes related to the concentration of albumin in colostrum and serum, we collected the colostrum and blood samples from 572 Chinese Holstein cows within 24 h after calving and measured the concentration of albumin in the colostrum and serum using ELISA. The cows were genotyped with GeneSeek 150 K chips (containing 140,668 single nucleotide polymorphisms; SNPs). After quality control, we performed GWASs via GCTA software with 91,620 SNPs and 563 cows. Consequently, 9 and 7 genome-wide significant SNPs (false discovery rate (FDR) at 1%) were identified. Correspondingly, 42 and 206 functional genes that contained or were close to (within ±1 Mbp of) the significant SNPs were acquired. Integrating the biological process of these genes and the reported QTLs for immune and inflammation traits in cattle, 3 and 12 genes were identified as candidates for the concentration of colostrum and serum albumin, respectively; these are RUNX1, CBR1, OTULIN, CDK6, SHARPIN, CYC1, EXOSC4, PARP10, NRBP2, GFUS, PYCR3, EEF1D, GSDMD, PYCR2 and CXCL12. Our findings provide important information for revealing the genetic mechanism behind albumin concentration and for molecular breeding of disease-resistance traits in dairy cattle.
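
Declaring SNPs significant at an FDR of 1% is commonly done with the Benjamini-Hochberg step-up procedure. The abstract does not name the FDR method, so the use of Benjamini-Hochberg is an assumption, and the p-values in the usage note are invented.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return one boolean per p-value: True if significant at the given FDR.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= k / m * fdr, then call every test of rank <= k significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for index in order[:cutoff_rank]:
        significant[index] = True
    return significant
```

For example, `benjamini_hochberg([0.0001, 0.5, 0.003, 0.03, 0.9])` flags only the first and third tests. Because the rule is step-up, a p-value that misses its own rank threshold is still called significant when a larger p-value passes at a higher rank.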


2019 ◽  
Vol 48 (D1) ◽  
pp. D659-D667 ◽  
Author(s):  
Wenqian Yang ◽  
Yanbo Yang ◽  
Cecheng Zhao ◽  
Kun Yang ◽  
Dongyang Wang ◽  
...  

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Gongcheng Li ◽  
Tiejun Pan ◽  
Dan Guo ◽  
Long-Cheng Li

Single nucleotide polymorphisms (SNPs) occurring in noncoding sequences have largely been ignored in genome-wide association studies (GWAS). Yet mounting evidence suggests that many noncoding SNPs, especially those in the vicinity of protein-coding genes, play important roles in shaping chromatin structure and regulating gene expression and, as such, are implicated in a wide variety of diseases. One such regulatory SNP (rSNP) is the E-cadherin (CDH1) promoter −160C/A SNP (rs16260), which is known to affect E-cadherin promoter transcription by displacing transcription factor binding and has been extensively scrutinized for its association with several diseases, especially malignancies. Findings from studying this SNP highlight the clinical relevance of rSNPs and justify their inclusion in future GWAS to identify novel disease-causing SNPs.


2020 ◽  
Author(s):  
Huan Liu ◽  
Kaylia Duncan ◽  
Annika Helverson ◽  
Priyanka Kumari ◽  
Camille Mumm ◽  
...  

Abstract Genome-wide association studies for non-syndromic orofacial cleft (OFC) have identified single nucleotide polymorphisms (SNPs) at loci where the presumed risk-relevant gene is expressed in oral periderm. The functional subsets of such SNPs are difficult to predict because the sequence underpinnings of periderm enhancers are unknown. We applied ATAC-seq to models of human palate periderm, including zebrafish periderm, mouse embryonic palate epithelia, and a human oral epithelium cell line, and to complementary mesenchymal cell types. We identified sets of enhancers specific to the epithelial cells and trained gapped-kmer support-vector-machine classifiers on these sets. We used the classifiers to predict the effect of 14 OFC-associated SNPs at 12q13 near KRT18. All the classifiers picked the same SNP as having the strongest effect, but the significance was highest with the classifier trained on zebrafish periderm. Reporter and deletion analyses support this SNP as lying within a periderm enhancer regulating KRT18/KRT8 expression.
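
The features behind a gapped-kmer classifier can be sketched as counts of fixed-length words in which only some positions are informative and the rest are wildcards. The code below shows that feature extraction and how a single-nucleotide change shifts the counts. It is an illustration with hypothetical function names, and the trained classifier weights that would turn such differences into an effect score are not part of it.

```python
from itertools import combinations


def gapped_kmers(word, informative):
    """All versions of `word` keeping `informative` positions, gaps shown as '-'."""
    features = []
    for kept in combinations(range(len(word)), informative):
        kept_positions = set(kept)
        features.append(
            "".join(ch if i in kept_positions else "-" for i, ch in enumerate(word))
        )
    return features


def gapped_kmer_counts(sequence, word_length, informative):
    """Count gapped k-mers over every window of `word_length` in the sequence."""
    counts = {}
    for start in range(len(sequence) - word_length + 1):
        window = sequence[start:start + word_length]
        for feature in gapped_kmers(window, informative):
            counts[feature] = counts.get(feature, 0) + 1
    return counts


def feature_difference(ref_seq, alt_seq, word_length, informative):
    """Features whose counts differ between alleles (variant minus reference)."""
    ref = gapped_kmer_counts(ref_seq, word_length, informative)
    alt = gapped_kmer_counts(alt_seq, word_length, informative)
    return {
        feature: alt.get(feature, 0) - ref.get(feature, 0)
        for feature in set(ref) | set(alt)
        if alt.get(feature, 0) != ref.get(feature, 0)
    }
```

Only features that cover the changed base differ between the two alleles, so a SNP's predicted effect depends on how heavily a trained model weights those particular features.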

