Location deviations of DNA functional elements affected SNP mapping in the published databases and references

2019 ◽  
Vol 21 (4) ◽  
pp. 1293-1301
Author(s):  
Hewei Zheng ◽  
Xueying Zhao ◽  
Hong Wang ◽  
Yu Ding ◽  
Xiaoyan Lu ◽  
...  

Abstract The recent extensive application of next-generation sequencing has led to the rapid accumulation of multiple types of data for functional DNA elements. With the advent of precision medicine, the fine-mapping of risk loci based on these elements has become of paramount importance. In this study, we obtained the human reference genome (GRCh38) and the main DNA sequence elements, including protein-coding genes, miRNAs, lncRNAs and single nucleotide polymorphism flanking sequences, from different repositories. We then realigned these elements to identify their exact locations on the genome. Overall, 5%–20% of all sequence element locations deviated among databases, on the scale of kilobase-pair to megabase-pair. These deviations even affected the selection of genome-wide association study risk-associated genes. Our results implied that the location information for functional DNA elements may deviate among public databases. Researchers should take care when using cross-database sources and should perform pilot sequence alignments before element location-based studies.
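The cross-database check recommended in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Location` record, the `find_deviations` helper, the 1 kb threshold and the element identifiers are all hypothetical, and it assumes each database has already been reduced to a mapping from element ID to chromosome and start coordinate on the same assembly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical record: where one database places an element."""
    chrom: str
    start: int  # start coordinate on the shared reference assembly


def find_deviations(db_a, db_b, min_shift_bp=1000):
    """Compare element locations reported by two databases.

    Returns a dict mapping element ID to the absolute shift in base pairs,
    or to None when the two databases disagree on the chromosome.
    Elements shifted by less than min_shift_bp are not reported.
    """
    deviations = {}
    # Only elements present in both databases can be compared.
    for element_id in db_a.keys() & db_b.keys():
        a, b = db_a[element_id], db_b[element_id]
        if a.chrom != b.chrom:
            deviations[element_id] = None
        elif abs(a.start - b.start) >= min_shift_bp:
            deviations[element_id] = abs(a.start - b.start)
    return deviations


# Toy data with made-up identifiers and coordinates.
db_a = {
    "geneA": Location("chr1", 10_000),
    "geneB": Location("chr2", 50_000),
    "geneC": Location("chr3", 7_000),
}
db_b = {
    "geneA": Location("chr1", 10_020),     # 20 bp shift, below threshold
    "geneB": Location("chr2", 2_050_000),  # megabase-scale shift
    "geneC": Location("chrX", 7_000),      # different chromosome
}
result = find_deviations(db_a, db_b)
```

In practice the coordinates would come from realigning each element's sequence to the reference genome, as the abstract describes, rather than from the databases' own annotations.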

Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 643
Author(s):  
Thibaud Kuca ◽  
Brandy M. Marron ◽  
Joana G. P. Jacinto ◽  
Julia M. Paris ◽  
Christian Gerspach ◽  
...  

Genodermatoses such as hair disorders mostly follow a monogenic mode of inheritance. Congenital hypotrichosis (HY) belongs to this group of disorders and is characterized by abnormally reduced hair since birth. The purpose of this study was to characterize the clinical phenotype of a breed-specific non-syndromic form of HY in Belted Galloway cattle and to identify the causative genetic variant for this recessive disorder. An affected calf born in Switzerland presented with multiple small to large areas of alopecia on the limbs and on the dorsal part of the head, neck, and back. A genome-wide association study using Swiss and US Belted Galloway cattle encompassing 12 cases and 61 controls revealed an association signal on chromosome 29. Homozygosity mapping in a subset of cases refined the HY locus to a 1.5 Mb critical interval, and subsequent Sanger sequencing of protein-coding exons of positional candidate genes revealed a stop-gain variant in the HEPHL1 gene, which encodes a multi-copper ferroxidase protein called hephaestin-like 1 (c.1684A>T; p.Lys562*). A perfect concordance between the homozygous presence of this most likely pathogenic loss-of-function variant and the HY phenotype was found. Genotyping of more than 700 purebred Swiss and US Belted Galloway cattle showed the global spread of the mutation. This study provides a molecular test that will permit the avoidance of risk matings by systematic genotyping of relevant breeding animals. This rare recessive HEPHL1-related form of hypotrichosis provides a novel large animal model for similar human conditions. The results have been incorporated in the Online Mendelian Inheritance in Animals (OMIA) database (OMIA 002230-9913).


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Yousef Rahimi ◽  
Mohammad Reza Bihamta ◽  
Alireza Taleei ◽  
Hadi Alipour ◽  
Pär K. Ingvarsson

Abstract Background: Identification of loci for agronomic traits and characterization of their genetic architecture are crucial in marker-assisted selection (MAS). Genome-wide association studies (GWAS) have increasingly been used as potent tools in identifying marker-trait associations (MTAs). The introduction of new adaptive alleles in diverse genetic backgrounds may help to improve grain yield of old or newly developed varieties of wheat to balance supply and demand throughout the world. Landraces collected from different climate zones can be an invaluable resource for such adaptive alleles. Results: GWAS was performed using a collection of 298 Iranian bread wheat varieties and landraces to explore the genetic basis of agronomic traits during the 2016–2018 cropping seasons under normal (well-watered) and stressed (rain-fed) conditions. A high-quality genotyping-by-sequencing (GBS) dataset was obtained using either the original single nucleotide polymorphisms (SNPs; 10,938 SNPs) or the set with additional imputation (46,862 SNPs) based on the W7984 reference genome. The results confirm that the B genome carries the highest number of significant marker pairs in both varieties (49,880, 27.37%) and landraces (55,086, 28.99%). The strongest linkage disequilibrium (LD) between pairs of markers was observed on chromosome 2D (0.296). LD decay was lower in the D genome, compared to the A and B genomes. Association mapping under the two tested environments yielded a total of 313 and 394 significant (−log10 P > 3) MTAs for the original and imputed SNP data sets, respectively. Gene ontology results showed that 27 and 27.5% of MTAs of SNPs in the original set were located in protein-coding regions for well-watered and rain-fed conditions, respectively, whereas for the imputed data set 22.6 and 16.6% of MTAs were located in protein-coding genes for the well-watered and rain-fed conditions, respectively.
Conclusions: Our findings suggest that Iranian bread wheat landraces harbor valuable alleles that are adaptive under drought stress conditions. MTAs located within coding genes can be utilized in genome-based breeding of new wheat varieties. Although imputation of missing data increased the number of MTAs, the fraction of these MTAs located in coding genes decreased across the different sub-genomes.


2017 ◽  
Author(s):  
Filip Ruzicka ◽  
Mark S. Hill ◽  
Tanya M. Pennell ◽  
Ilona Flis ◽  
Fiona C. Ingleby ◽  
...  

The evolution of sexual dimorphism is constrained by a shared genome, leading to ‘sexual antagonism’ where different alleles at given loci are favoured by selection in males and females. Despite its wide taxonomic incidence, we know little about the identity, genomic location and evolutionary dynamics of antagonistic genetic variants. To address these deficits, we use sex-specific fitness data from 202 fully sequenced hemiclonal D. melanogaster fly lines to perform a genome-wide association study of sexual antagonism. We identify ~230 chromosomal clusters of candidate antagonistic SNPs. In contradiction to classic theory, we find no clear evidence that the X chromosome is a hotspot for sexually antagonistic variation. Characterising antagonistic SNPs functionally, we find a large excess of missense variants but little enrichment in terms of gene function. We also assess the evolutionary persistence of antagonistic variants by examining extant polymorphism in wild D. melanogaster populations. Remarkably, antagonistic variants are associated with multiple signatures of balancing selection across the D. melanogaster distribution range, indicating widespread and evolutionarily persistent (>10,000 years) genomic constraints. Based on our results, we propose that antagonistic variation accumulates due to constraints on the resolution of sexual conflict over protein coding sequences, thus contributing to the long-term maintenance of heritable fitness variation.


2019 ◽  
Author(s):  
Delesa Damena ◽  
Emile R. Chimusa

Abstract Objective: Estimating SNP-heritability (h2g) of severe malaria/resistance and its distribution across the genome might shed new light on the underlying biology. Method: We investigated h2g of severe malaria susceptibility and resistance from a genome-wide association study (GWAS) dataset (sample size = 11,657). We partitioned the h2g into chromosomes, allele frequencies and annotations. We further examined non-cell-type-specific and cell-type-specific enrichments from GWAS summary statistics. Results: We estimated the h2g of severe malaria at 0.21 (se = 0.05, p = 2.7×10⁻⁵), 0.20 (se = 0.05, p = 7.5×10⁻⁵) and 0.17 (se = 0.05, p = 7.2×10⁻⁴) in Gambian, Kenyan and Malawi populations, respectively. The h2g attributed to the GWAS significant SNPs and the well-known sickle cell (HbS) variant was approximately 0.07 and 0.03, respectively. We prepared an African population reference panel and obtained a comparable h2g estimate (0.21; se = 0.02, p < 1×10⁻⁵) from GWAS summary statistics meta-analysed across the three populations. Partitioning analysis from raw genotype data showed significant enrichment of h2g in protein-coding genic SNPs, while summary statistics analysis suggests a pattern of enrichment in multiple categories. Conclusion: We report for the first time that the heritability of malaria susceptibility and resistance is largely attributable to common SNPs and that the causal variants are overrepresented in protein-coding regions of the genome. Overall, our results suggest that malaria susceptibility and resistance is a polygenic trait. Further studies with larger sample sizes are needed to better understand the underpinning genetics of resistance and susceptibility to severe malaria.


Blood ◽  
2019 ◽  
Vol 133 (7) ◽  
pp. 724-729 ◽  
Author(s):  
Maoxiang Qian ◽  
Heng Xu ◽  
Virginia Perez-Andreu ◽  
Kathryn G. Roberts ◽  
Hui Zhang ◽  
...  

Abstract Acute lymphoblastic leukemia (ALL) is the most common malignancy in children. Characterized by high levels of Native American ancestry, Hispanics are disproportionally affected by this cancer with high incidence and inferior survival. However, the genetic basis for this disparity remains poorly understood because of a paucity of genome-wide investigation of ALL in Hispanics. Performing a genome-wide association study (GWAS) in 940 Hispanic children with ALL and 681 ancestry-matched non-ALL controls, we identified a novel susceptibility locus in the ERG gene (rs2836365; P = 3.76 × 10⁻⁸; odds ratio [OR] = 1.56), with independent validation (P = .01; OR = 1.43). Imputation analyses pointed to a single causal variant driving the association signal at this locus overlapping with putative regulatory DNA elements. The effect size of the ERG risk variant rose with increasing Native American genetic ancestry. The ERG risk genotype was underrepresented in ALL with the ETV6-RUNX1 fusion (P < .0005) but enriched in the TCF3-PBX1 subtype (P < .05). Interestingly, ALL cases with germline ERG risk alleles were significantly less likely to have somatic ERG deletion (P < .05). Our results provide novel insights into genetic predisposition to ALL and its contribution to racial disparity in this cancer.


2020 ◽  
Author(s):  
Fritz J. Sedlazeck ◽  
Bing Yu ◽  
Adam J. Mansfield ◽  
Han Chen ◽  
Olga Krasheninina ◽  
...  

Abstract Genome sequencing at population scale provides unprecedented access to the genetic foundations of human phenotypic diversity, but genotype-phenotype association analyses limited to small variants have failed to comprehensively characterize the genetic architecture of human health and disease because they ignore structural variants (SVs) known to contribute to phenotypic variation and pathogenic conditions [1–3]. Here we demonstrate the significance of SVs when assessing genotype-phenotype associations and the importance of ethnic diversity in study design by analyzing SVs across 19,652 individuals and the translational impact on 4,156 aptamer-based proteomic measurements across 4,021 multi-ethnic samples. The majority of 304,533 SVs detected are rare, although we identified 2,336 protein-coding genes impacted by common SVs. We identified 64 significant SV-protein associations that comprise 36 cis- and 28 trans-acting relationships, and 21 distinct SV regions overlapped with genome-wide association study loci. These findings represent a more comprehensive mapping of regulatory and translational endophenotypes underlying health and disease.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243791
Author(s):  
Caitlin Mills ◽  
Anushya Muruganujan ◽  
Dustin Ebert ◽  
Crystal N. Marconett ◽  
Juan Pablo Lewinger ◽  
...  

Enhancers are powerful and versatile agents of cell-type specific gene regulation, which are thought to play key roles in human disease. Enhancers are short DNA elements that function primarily as clusters of transcription factor binding sites that are spatially coordinated to regulate expression of one or more specific target genes. These regulatory connections between enhancers and target genes can therefore be characterized as enhancer-gene links that can affect development, disease, and homeostatic cellular processes. Despite their implication in disease and the establishment of cell identity during development, most enhancer-gene links remain unknown. Here we introduce a new, publicly accessible database of predicted enhancer-gene links, PEREGRINE. The PEREGRINE human enhancer-gene links interactive web interface incorporates publicly available experimental data from ChIA-PET, eQTL, and Hi-C assays across 78 cell and tissue types to link 449,627 enhancers to 17,643 protein-coding genes. These enhancer-gene links are made available through the new Enhancer module of the PANTHER database and website where the user may easily access the evidence for each enhancer-gene link, as well as query by target gene and enhancer location.


2020 ◽  
Vol 21 (8) ◽  
pp. 509-520
Author(s):  
Natália D Linhares ◽  
Daniela A Pereira ◽  
Izabela MCA Conceição ◽  
Glória R Franco ◽  
Walter L Eckalbar ◽  
...  

Aim: GDF15 levels are a biomarker for metformin use. We performed the functional annotation of noncoding genome-wide association study (GWAS) SNPs for GDF15 levels and the Genotype-Tissue Expression (GTEx)-expression quantitative trait loci (eQTLs) for GDF15 expression within metformin-activated enhancers around GDF15. Materials & methods: These enhancers were identified using chromatin immunoprecipitation followed by sequencing data for active (H3K27ac) and silenced (H3K27me3) histone marks on human hepatocytes treated with metformin, Encyclopedia of DNA Elements data and cis-regulatory elements assignment tools. Results: The GWAS lead SNP rs888663, the SNP rs62122429 associated with GDF15 levels in the Outcome Reduction with Initial Glargine Intervention trial, and the GTEx-expression quantitative trait locus rs4808791 for GDF15 expression in whole blood are located in a metformin-activated enhancer upstream of GDF15 and tightly linked in Europeans and East Asians. Conclusion: Noncoding variation within a metformin-activated enhancer may increase GDF15 expression and help to predict GDF15 levels.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yaozhong Liu ◽  
Na Liu ◽  
Fan Bai ◽  
Qiming Liu

Background: Atrial fibrillation (AF) is the most common arrhythmia. We aimed to construct competing endogenous RNA (ceRNA) networks associated with the susceptibility and persistence of AF by applying the weighted gene co-expression network analysis (WGCNA) and to prioritize key genes using the random walk with restart on multiplex networks (RWR-M) algorithm. Methods: RNA sequencing results from 235 left atrial appendage samples were downloaded from the GEO database. The top 5,000 lncRNAs/mRNAs with the highest variance were used to construct a gene co-expression network using the WGCNA method. AF susceptibility- or persistence-associated modules were identified by correlating the module eigengene with the atrial rhythm phenotype. In a module-specific manner, ceRNA pairs of lncRNA–mRNA were predicted. The RWR-M algorithm was applied to calculate the proximity between lncRNAs and known AF protein-coding genes. Random forest classifiers, based on the expression value of key lncRNA-associated ceRNA pairs, were constructed and validated against an independent data set. Results: From the 21 identified modules, magenta and tan modules were associated with AF susceptibility, whereas turquoise and yellow modules were associated with AF persistence. ceRNA networks in magenta and tan modules were primarily involved in the inflammatory process, whereas ceRNA networks in turquoise and yellow modules were primarily associated with electrical remodeling. A total of 106 previously identified AF-associated protein-coding genes were found in the ceRNA networks, including 16 that were previously implicated in the genome-wide association study. Myocardial infarction–associated transcript (MIAT) and LINC00964 were prioritized as key lncRNAs through RWR-M. The classifiers based on their associated ceRNA pairs were able to distinguish AF from sinus rhythm with respective AUC values of 0.810 and 0.940 in the training set and 0.870 and 0.922 in the independent test set. The AF-related single-nucleotide polymorphism rs35006907 was found in the intronic region of LINC00964 and negatively regulated LINC00964 expression. Conclusion: Our study constructed AF susceptibility- and persistence-associated ceRNA networks, linked genetics with epigenetics, identified MIAT and LINC00964 as key lncRNAs, and constructed random forest classifiers based on their associated ceRNA pairs. These results will help us to better understand the mechanisms underlying AF from the ceRNA perspective and provide candidate therapeutic and diagnostic tools.


2020 ◽  
Author(s):  
Manisha Ray ◽  
Saurav Sarkar ◽  
Surya Narayan Rath ◽  
Mukund Namdev Sable

Abstract The COVID-19 pandemic is having a devastating effect on the healthcare system and the economy of the world. The lack, so far, of a specific treatment regime and a candidate vaccine opens up scope for new approaches and drug discoveries to mitigate the suffering caused by the disease. In the present study, whole-genome sequences of SARS-CoV-2 isolated from 11 different nations were subjected to an evolutionary study and a genome-wide association study through in silico approaches, including multiple sequence alignment and phylogenetic study through MEGA7, and were analyzed through DNAsp. These investigations identified the nucleotide variations and single nucleotide mutations/polymorphisms in the genomic regions as well as the protein-coding regions. The resulting mutations have diversified the genomic contents of SARS-CoV-2 according to the altered nucleotides found in the 11 genome sequences. India and Nepal were found to have progressively more distinct species of SARS-CoV-2, with variations in the Spike protein and Nucleocapsid protein-coding sites. These genomic variations might explain the lower case fatality rate of India and Nepal, depending on the populations. The findings of this investigation add to the knowledge of genomic medicine and might be useful in the design of antibodies against SARS-CoV-2.

