Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome

2021 ◽  
Vol 49 (3) ◽  
pp. 1497-1516
Author(s):  
Wilfried M Guiblet ◽  
Marzia A Cremona ◽  
Robert S Harris ◽  
Di Chen ◽  
Kristin A Eckert ◽  
...  

Abstract Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.
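As an illustration of the quantity discussed in this abstract, the following is a minimal sketch of a divergence-based substitution frequency: the fraction of aligned, ungapped positions at which two sequences differ. The function name `substitution_frequency` and the toy sequences are hypothetical, and this is not the authors' pipeline, which relies on functional data analysis at single-base resolution.

```python
def substitution_frequency(ref: str, other: str) -> float:
    """Fraction of comparable aligned positions where the two sequences differ.

    Both inputs are assumed to be rows of the same pairwise alignment, so they
    must have equal length. Positions with a gap or an ambiguous base in either
    sequence are skipped.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                diffs += 1
    # No comparable positions: the frequency is undefined.
    return diffs / compared if compared else float("nan")


if __name__ == "__main__":
    # Toy aligned sequences (invented for illustration only).
    print(substitution_frequency("ACGTACGT", "ACGTACGA"))
```

Applying the same function separately to a locus and to its flanking windows would give a crude version of the locus-versus-flank comparison described above.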

2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Haihua Bai ◽  
Haiping Liu ◽  
Suyalatu Suyalatu ◽  
Xiaosen Guo ◽  
Shandan Chu ◽  
...  

Large-scale genome-wide association studies (GWAS) have identified approximately 80 single nucleotide polymorphisms (SNPs) conferring susceptibility to type 2 diabetes (T2D). However, most of these loci have not been replicated in diverse populations, and much genetic heterogeneity has been observed across ethnic groups. We tested 28 SNPs previously found to be associated with T2D by GWAS in a Mongolian sample from Northern China (497 diagnosed with T2D and 469 controls) for association with T2D and diabetes-related quantitative traits. We replicated the T2D association of 11 SNPs, namely, rs7578326 (IRS1), rs1531343 (HMGA2), rs8042680 (PRC1), rs7578597 (THADA), rs1333051 (CDKN2), rs6723108 (TMEM163), rs163182 and rs2237897 (KCNQ1), rs1387153 (MTNR1B), rs243021 (BCL11A), and rs10229583 (PAX4) in our sample. Further, we showed that the risk allele of the strongest T2D-associated SNP in our sample, rs757832 (IRS1), is associated with an increased level of TG. We observed substantial differences in T2D risk allele frequency between the Mongolian sample and the 1000G Caucasian sample for a few SNPs, including rs6723108 (TMEM163), whose risk allele reaches near fixation in the Mongolian sample. Further study of the genetic architecture of these variants in T2D susceptibility is needed to understand their role in heterogeneous populations.


2011 ◽  
Vol 96 (2) ◽  
pp. E394-E403 ◽  
Author(s):  
Neeraj K. Sharma ◽  
Kurt A. Langberg ◽  
Ashis K. Mondal ◽  
Steven C. Elbein ◽  
Swapan K. Das

Abstract Context: Genome-wide association scans (GWAS) have identified novel single nucleotide polymorphisms (SNPs) that increase T2D susceptibility and indicated the role of nearby genes in T2D pathogenesis. Objective: We hypothesized that T2D-associated SNPs act as cis-regulators of nearby genes in human tissues and that expression of these transcripts may correlate with metabolic traits, including insulin sensitivity (SI). Design, Settings, and Patients: Association of SNPs with the expression of their nearest transcripts was tested in adipose and muscle from 168 healthy individuals who spanned a broad range of SI and body mass index (BMI) and in transformed lymphocytes (TLs). We tested correlations between the expression of these transcripts in adipose and muscle with metabolic traits. Utilizing allelic expression imbalance (AEI) analysis we examined the presence of other cis-regulators for those transcripts in TLs. Results: SNP rs9472138 was significantly (P = 0.037) associated with the expression of VEGFA in TLs while rs6698181 was detected as a cis-regulator for the PKN2 in muscle (P = 0.00027) and adipose (P = 0.018). Significant association was also observed for rs17036101 (P = 0.001) with expression of SYN2 in adipose of Caucasians. Among 19 GWAS-implicated transcripts, expression of VEGFA in adipose was correlated with BMI (r = −0.305) and SI (r = 0.230). Although only a minority of the T2D-associated SNPs were validated as cis-eQTLs for nearby transcripts, AEI analysis indicated presence of other cis-regulatory polymorphisms in 54% of these transcripts. Conclusions: Our study suggests that a small subset of GWAS-identified SNPs may increase T2D susceptibility by modulating expression of nearby transcripts in adipose or muscle.


2021 ◽  
Vol 7 (24) ◽  
pp. eabg3097
Author(s):  
Bo Zhao ◽  
Yanpeng Xi ◽  
Junghyun Kim ◽  
Sibum Sung

Chromatin structure is critical for gene expression and many other cellular processes. In Arabidopsis thaliana, the floral repressor FLC adopts a self-loop chromatin structure via bridging of its flanking regions. This local gene loop is necessary for active FLC expression. However, the molecular mechanism underlying the formation of this class of gene loops is unknown. Here, we report the characterization of a group of linker histone-like proteins, named the GH1-HMGA family in Arabidopsis, which act as chromatin architecture modulators. We demonstrate that these family members redundantly promote the floral transition through the repression of FLC. A genome-wide study revealed that this family preferentially binds to the 5′ and 3′ ends of gene bodies. The loss of this binding increases FLC expression by stabilizing the FLC 5′ to 3′ gene looping. Our study provides mechanistic insights into how a family of evolutionarily conserved proteins regulates the formation of local gene loops.


2017 ◽  
Author(s):  
Claire Marchal ◽  
Takayo Sasaki ◽  
Daniel Vera ◽  
Korey Wilson ◽  
Jiao Sima ◽  
...  

Abstract Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early and late replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and sub-nuclear position. Moreover, RT is regulated during development and is altered in disease. Exploring mechanisms linking RT to other cellular processes in normal and diseased cells will be facilitated by rapid and robust methods with which to measure RT genome wide. Here, we describe a rapid, robust and relatively inexpensive protocol to analyze genome-wide RT by next-generation sequencing (NGS). This protocol yields highly reproducible results across laboratories and platforms. We also provide computational pipelines for analysis, parsing phased genomes using single nucleotide polymorphisms (SNP) for analyzing RT allelic asynchrony, and for direct comparison to Repli-chip data obtained by analyzing nascent DNA by microarrays.


2015 ◽  
Vol 6 (1) ◽  
Author(s):  
Min Yue ◽  
Xiangan Han ◽  
Leon De Masi ◽  
Chunhong Zhu ◽  
Xun Ma ◽  
...  

Abstract Understanding the molecular parameters that regulate cross-species transmission and host adaptation of potential pathogens is crucial to control emerging infectious disease. Although microbial pathotype diversity is conventionally associated with gene gain or loss, the role of pathoadaptive nonsynonymous single-nucleotide polymorphisms (nsSNPs) has not been systematically evaluated. Here, our genome-wide analysis of core genes within Salmonella enterica serovar Typhimurium genomes reveals a high degree of allelic variation in surface-exposed molecules, including adhesins that promote host colonization. Subsequent multinomial logistic regression, MultiPhen and Random Forest analyses of known/suspected adhesins from 580 independent Typhimurium isolates identify distinct host-specific nsSNP signatures. Moreover, population and functional analyses of host-associated nsSNPs for FimH, the type 1 fimbrial adhesin, highlight the role of key allelic residues in host-specific adherence in vitro. Together, our data provide the first concrete evidence that functional differences between allelic variants of bacterial proteins likely contribute to pathoadaptation to diverse hosts.


2020 ◽  
Vol 287 (1930) ◽  
pp. 20200712 ◽  
Author(s):  
Elahe Parvizi ◽  
Ceridwen I. Fraser ◽  
Ludovic Dutoit ◽  
Dave Craw ◽  
Jonathan M. Waters

Theory suggests that catastrophic earth-history events can drive rapid biological evolution, but empirical evidence for such processes is scarce. Destructive geological events such as earthquakes can represent large-scale natural experiments for inferring such evolutionary processes. We capitalized on a major prehistoric (800 yr BP) geological uplift event affecting a southern New Zealand coastline to test for the lasting genomic impacts of disturbance. Genome-wide analyses of three co-distributed keystone kelp taxa revealed that post-earthquake recolonization drove the evolution of novel, large-scale intertidal spatial genetic ‘sectors’ which are tightly linked to geological fault boundaries. Demographic simulations confirmed that, following widespread extirpation, parallel expansions into newly vacant habitats rapidly restructured genome-wide diversity. Interspecific differences in recolonization mode and tempo reflect differing ecological constraints relating to habitat choice and dispersal capacity among taxa. This study highlights the rapid and enduring evolutionary effects of catastrophic ecosystem disturbance and reveals the key role of range expansion in reshaping spatial genetic patterns.


Author(s):  
Anne Hinks ◽  
Wendy Thomson

Juvenile rheumatic diseases are heterogeneous, complex genetic diseases; to date only juvenile idiopathic arthritis (JIA) has been extensively studied in terms of identifying genetic risk factors. The MHC region is a well-established risk factor but in the last few years candidate gene and large-scale genome-wide association studies have been utilized in the search for non-HLA risk factors. There are now 17 JIA susceptibility loci which reach the genome-wide significance threshold for association and a further 7 regions with evidence for association in more than one study. In addition, some subtype-specific associations are emerging. These risk loci now need to be investigated further using fine-mapping strategies and then appropriate functional studies to show how the variant alters the gene function. This knowledge will not only lead to a better understanding of disease pathogenesis for juvenile rheumatic diseases but may also aid in the classification of these heterogeneous diseases. It may identify new pathways for potential therapeutic targets and help in the prediction of disease outcome and response to treatment.


2020 ◽  
Vol 117 (21) ◽  
pp. 11608-11613 ◽  
Author(s):  
Marcelo Blatt ◽  
Alexander Gusev ◽  
Yuriy Polyakov ◽  
Shafi Goldwasser

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.


2019 ◽  
Vol 48 (D1) ◽  
pp. D659-D667 ◽  
Author(s):  
Wenqian Yang ◽  
Yanbo Yang ◽  
Cecheng Zhao ◽  
Kun Yang ◽  
Dongyang Wang ◽  
...  

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.


Molecules ◽  
2019 ◽  
Vol 24 (9) ◽  
pp. 1711 ◽  
Author(s):  
Martin Bartas ◽  
Michaela Čutová ◽  
Václav Brázda ◽  
Patrik Kaura ◽  
Jiří Šťastný ◽  
...  

The role of local DNA structures in the regulation of basic cellular processes is an emerging field of research. Amongst local non-B DNA structures, the significance of G-quadruplexes was demonstrated in the last decade, and their presence and functional relevance have been demonstrated in many genomes, including that of humans. In this study, we analyzed the presence and locations of G-quadruplex-forming sequences by G4Hunter in all complete bacterial genomes available in the NCBI database. G-quadruplex-forming sequences were identified in all species; however, the frequency differed significantly across evolutionary groups. The highest frequency of G-quadruplex-forming sequences was detected in the subgroup Deinococcus-Thermus, and the lowest frequency in Thermotogae. G-quadruplex-forming sequences are non-randomly distributed and are favored in various evolutionary groups. G-quadruplex-forming sequences are enriched in ncRNA segments followed by mRNAs. Analyses of surrounding sequences showed G-quadruplex-forming sequences around tRNA and regulatory sequences. These data point to the unique and non-random localization of G-quadruplex-forming sequences in bacterial genomes.
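To make the G4Hunter-based scan mentioned in this abstract more concrete, here is a simplified sketch of G4Hunter-style scoring. It assumes the commonly described scheme in which each G in a run of n consecutive Gs scores min(n, 4), each C in a run scores the negative equivalent, A and T score 0, and the score of a window is the mean of its per-base scores. The function names and the default window of 25 are illustrative assumptions; this is not the G4Hunter tool itself, nor the exact settings used in the study.

```python
from itertools import groupby


def base_scores(seq: str) -> list[int]:
    """Per-base G4Hunter-style scores: +min(n, 4) for Gs, -min(n, 4) for Cs."""
    scores: list[int] = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))  # length of this run of identical bases
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def window_scores(seq: str, window: int = 25) -> list[float]:
    """Mean per-base score for every window of the given length (step 1)."""
    scores = base_scores(seq)
    if window <= 0 or len(scores) < window:
        return []
    total = sum(scores[:window])
    out = [total / window]
    # Slide the window one base at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        out.append(total / window)
    return out


if __name__ == "__main__":
    # Toy sequence (invented for illustration only).
    print(window_scores("GGGTTAGGGTTAGGGTTAGGG", window=21))
```

Windows whose absolute mean score exceeds a chosen threshold would be reported as candidate G-quadruplex-forming sequences; positive scores point to the G-rich strand and negative scores to the C-rich strand.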

