Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome

Abstract Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.

Association Analysis of Genetic Variants with Type 2 Diabetes in a Mongolian Population in China

Journal of Diabetes Research ◽

10.1155/2015/613236 ◽

2015 ◽

Vol 2015 ◽

pp. 1-7 ◽

Cited By ~ 11

Author(s):

Haihua Bai ◽

Haiping Liu ◽

Suyalatu Suyalatu ◽

Xiaosen Guo ◽

Shandan Chu ◽

...

Keyword(s):

Type 2 Diabetes ◽

Large Scale ◽

Risk Allele ◽

Association Studies ◽

Northern China ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide

Large-scale genome-wide association studies (GWAS) have identified approximately 80 single nucleotide polymorphisms (SNPs) conferring susceptibility to type 2 diabetes (T2D). However, most of these loci have not been replicated in diverse populations, and much genetic heterogeneity has been observed across ethnic groups. We tested 28 SNPs previously found to be associated with T2D by GWAS in a Mongolian sample from Northern China (497 diagnosed with T2D and 469 controls) for association with T2D and diabetes-related quantitative traits. We replicated the T2D association of 11 SNPs, namely, rs7578326 (IRS1), rs1531343 (HMGA2), rs8042680 (PRC1), rs7578597 (THADA), rs1333051 (CDKN2), rs6723108 (TMEM163), rs163182 and rs2237897 (KCNQ1), rs1387153 (MTNR1B), rs243021 (BCL11A), and rs10229583 (PAX4) in our sample. Further, we showed that the risk allele of the strongest T2D-associated SNP in our sample, rs7578326 (IRS1), is associated with an increased level of TG. We observed substantial differences in T2D risk allele frequency between the Mongolian sample and the 1000G Caucasian sample for a few SNPs, including rs6723108 (TMEM163), whose risk allele reaches near fixation in the Mongolian sample. Further study of the genetic architecture of these variants in T2D susceptibility is needed to understand their role in heterogeneous populations.
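
The abstract does not say which statistical test was used for replication. As a rough illustration of what a single-SNP case-control association test can look like, the sketch below runs an allelic 2x2 chi-square test. The function name `allelic_test` and all allele counts are hypothetical; only the sample sizes (497 cases and 469 controls, so 994 and 938 alleles) come from the text.

```python
import math

def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic 2x2 test: returns (odds ratio, chi-square, p-value for 1 df).

    Illustrative only; not the method reported by the study.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value

# Hypothetical allele counts: 2 * 497 = 994 case alleles, 2 * 469 = 938 control alleles.
odds_ratio, chi2, p_value = allelic_test(600, 394, 500, 438)
print(f"OR={odds_ratio:.2f} chi2={chi2:.2f} p={p_value:.4f}")
```

A real replication analysis would usually also adjust for covariates such as age and sex and correct for testing 28 SNPs, which this sketch leaves out.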

Type 2 Diabetes (T2D) Associated Polymorphisms Regulate Expression of Adjacent Transcripts in Transformed Lymphocytes, Adipose, and Muscle from Caucasian and African-American Subjects

The Journal of Clinical Endocrinology & Metabolism ◽

10.1210/jc.2010-1754 ◽

2011 ◽

Vol 96 (2) ◽

pp. E394-E403 ◽

Cited By ~ 11

Author(s):

Neeraj K. Sharma ◽

Kurt A. Langberg ◽

Ashis K. Mondal ◽

Steven C. Elbein ◽

Swapan K. Das

Keyword(s):

Genome Wide Association ◽

Small Subset ◽

Nucleotide Polymorphisms ◽

Healthy Individuals ◽

Allelic Expression Imbalance ◽

Single Nucleotide ◽

Metabolic Traits ◽

Genome Wide

Abstract Context: Genome-wide association scans (GWAS) have identified novel single nucleotide polymorphisms (SNPs) that increase T2D susceptibility and indicated the role of nearby genes in T2D pathogenesis. Objective: We hypothesized that T2D-associated SNPs act as cis-regulators of nearby genes in human tissues and that expression of these transcripts may correlate with metabolic traits, including insulin sensitivity (SI). Design, Settings, and Patients: Association of SNPs with the expression of their nearest transcripts was tested in adipose and muscle from 168 healthy individuals who spanned a broad range of SI and body mass index (BMI) and in transformed lymphocytes (TLs). We tested correlations between the expression of these transcripts in adipose and muscle and metabolic traits. Using allelic expression imbalance (AEI) analysis, we examined the presence of other cis-regulators for those transcripts in TLs. Results: SNP rs9472138 was significantly (P = 0.037) associated with the expression of VEGFA in TLs, while rs6698181 was detected as a cis-regulator for PKN2 in muscle (P = 0.00027) and adipose (P = 0.018). Significant association was also observed for rs17036101 (P = 0.001) with expression of SYN2 in adipose of Caucasians. Among 19 GWAS-implicated transcripts, expression of VEGFA in adipose was correlated with BMI (r = −0.305) and SI (r = 0.230). Although only a minority of the T2D-associated SNPs were validated as cis-eQTLs for nearby transcripts, AEI analysis indicated the presence of other cis-regulatory polymorphisms in 54% of these transcripts. Conclusions: Our study suggests that a small subset of GWAS-identified SNPs may increase T2D susceptibility by modulating expression of nearby transcripts in adipose or muscle.

Chromatin architectural proteins regulate flowering time by precluding gene looping

Science Advances ◽

10.1126/sciadv.abg3097 ◽

2021 ◽

Vol 7 (24) ◽

pp. eabg3097

Author(s):

Bo Zhao ◽

Yanpeng Xi ◽

Junghyun Kim ◽

Sibum Sung

Keyword(s):

Chromatin Structure ◽

Cellular Processes ◽

Genome Wide ◽

A Genome ◽

Evolutionarily Conserved ◽

Architectural Proteins ◽

Floral Repressor ◽

Flanking Regions ◽

Genome Wide Study

Chromatin structure is critical for gene expression and many other cellular processes. In Arabidopsis thaliana, the floral repressor FLC adopts a self-loop chromatin structure via bridging of its flanking regions. This local gene loop is necessary for active FLC expression. However, the molecular mechanism underlying the formation of this class of gene loops is unknown. Here, we report the characterization of a group of linker histone-like proteins, named the GH1-HMGA family in Arabidopsis, which act as chromatin architecture modulators. We demonstrate that these family members redundantly promote the floral transition through the repression of FLC. A genome-wide study revealed that this family preferentially binds to the 5′ and 3′ ends of gene bodies. The loss of this binding increases FLC expression by stabilizing the FLC 5′ to 3′ gene looping. Our study provides mechanistic insights into how a family of evolutionarily conserved proteins regulates the formation of local gene loops.

Repli-seq: genome-wide analysis of replication timing by next-generation sequencing

10.1101/104653 ◽

2017 ◽

Cited By ~ 8

Author(s):

Claire Marchal ◽

Takayo Sasaki ◽

Daniel Vera ◽

Korey Wilson ◽

Jiao Sima ◽

...

Keyword(s):

Next Generation Sequencing ◽

Replication Timing ◽

Nucleotide Polymorphisms ◽

Robust Methods ◽

Next Generation ◽

Single Nucleotide ◽

Cellular Processes ◽

Genome Wide ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early and late replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and sub-nuclear position. Moreover, RT is regulated during development and is altered in disease. Exploring mechanisms linking RT to other cellular processes in normal and diseased cells will be facilitated by rapid and robust methods with which to measure RT genome wide. Here, we describe a rapid, robust and relatively inexpensive protocol to analyze genome-wide RT by next-generation sequencing (NGS). This protocol yields highly reproducible results across laboratories and platforms. We also provide computational pipelines for analysis, parsing phased genomes using single nucleotide polymorphisms (SNPs) for analyzing RT allelic asynchrony, and for direct comparison to Repli-chip data obtained by analyzing nascent DNA by microarrays.

Allelic variation contributes to bacterial host specificity

Nature Communications ◽

10.1038/ncomms9754 ◽

2015 ◽

Vol 6 (1) ◽

Cited By ~ 46

Author(s):

Min Yue ◽

Xiangan Han ◽

Leon De Masi ◽

Chunhong Zhu ◽

Xun Ma ◽

...

Keyword(s):

Allelic Variation ◽

Multinomial Logistic Regression ◽

Gene Gain ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Serovar Typhimurium ◽

Functional Analyses ◽

High Degree

Abstract Understanding the molecular parameters that regulate cross-species transmission and host adaptation of potential pathogens is crucial to control emerging infectious disease. Although microbial pathotype diversity is conventionally associated with gene gain or loss, the role of pathoadaptive nonsynonymous single-nucleotide polymorphisms (nsSNPs) has not been systematically evaluated. Here, our genome-wide analysis of core genes within Salmonella enterica serovar Typhimurium genomes reveals a high degree of allelic variation in surface-exposed molecules, including adhesins that promote host colonization. Subsequent multinomial logistic regression, MultiPhen and Random Forest analyses of known/suspected adhesins from 580 independent Typhimurium isolates identify distinct host-specific nsSNP signatures. Moreover, population and functional analyses of host-associated nsSNPs for FimH, the type 1 fimbrial adhesin, highlight the role of key allelic residues in host-specific adherence in vitro. Together, our data provide the first concrete evidence that functional differences between allelic variants of bacterial proteins likely contribute to pathoadaptation to diverse hosts.

The genomic footprint of coastal earthquake uplift

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2020.0712 ◽

2020 ◽

Vol 287 (1930) ◽

pp. 20200712 ◽

Cited By ~ 1

Author(s):

Elahe Parvizi ◽

Ceridwen I. Fraser ◽

Ludovic Dutoit ◽

Dave Craw ◽

Jonathan M. Waters

Keyword(s):

Large Scale ◽

Biological Evolution ◽

Habitat Choice ◽

Natural Experiments ◽

Ecological Constraints ◽

Dispersal Capacity ◽

Interspecific Differences ◽

Genome Wide ◽

Genetic Patterns

Theory suggests that catastrophic earth-history events can drive rapid biological evolution, but empirical evidence for such processes is scarce. Destructive geological events such as earthquakes can represent large-scale natural experiments for inferring such evolutionary processes. We capitalized on a major prehistoric (800 yr BP) geological uplift event affecting a southern New Zealand coastline to test for the lasting genomic impacts of disturbance. Genome-wide analyses of three co-distributed keystone kelp taxa revealed that post-earthquake recolonization drove the evolution of novel, large-scale intertidal spatial genetic ‘sectors’ which are tightly linked to geological fault boundaries. Demographic simulations confirmed that, following widespread extirpation, parallel expansions into newly vacant habitats rapidly restructured genome-wide diversity. Interspecific differences in recolonization mode and tempo reflect differing ecological constraints relating to habitat choice and dispersal capacity among taxa. This study highlights the rapid and enduring evolutionary effects of catastrophic ecosystem disturbance and reveals the key role of range expansion in reshaping spatial genetic patterns.

Genetics of juvenile rheumatic diseases

10.1093/med/9780199642489.003.0043_update_002 ◽

2015 ◽

Author(s):

Anne Hinks ◽

Wendy Thomson

Keyword(s):

Risk Factors ◽

Rheumatic Diseases ◽

Large Scale ◽

Association Studies ◽

Genetic Diseases ◽

Response To Treatment ◽

Genome Wide Association Studies ◽

Established Risk Factor ◽

Genome Wide ◽

Juvenile Rheumatic Diseases

Juvenile rheumatic diseases are heterogeneous, complex genetic diseases; to date only juvenile idiopathic arthritis (JIA) has been extensively studied in terms of identifying genetic risk factors. The MHC region is a well-established risk factor but in the last few years candidate gene and large-scale genome-wide association studies have been utilized in the search for non-HLA risk factors. There are now 17 JIA susceptibility loci which reach the genome-wide significance threshold for association and a further 7 regions with evidence for association in more than one study. In addition, some subtype-specific associations are emerging. These risk loci now need to be investigated further using fine-mapping strategies and then appropriate functional studies to show how the variant alters the gene function. This knowledge will not only lead to a better understanding of disease pathogenesis for juvenile rheumatic diseases but may also aid in the classification of these heterogeneous diseases. It may identify new pathways for potential therapeutic targets and help in the prediction of disease outcome and response to treatment.

Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918257117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Author(s):

Marcelo Blatt ◽

Alexander Gusev ◽

Yuriy Polyakov ◽

Shafi Goldwasser

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
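
The two runtime figures quoted above can be checked against each other with simple arithmetic. The sketch below assumes ideal linear scaling across server nodes, which is an assumption of this example and not a claim taken from the paper; the helper name `ideal_parallel_minutes` is hypothetical.

```python
def ideal_parallel_minutes(single_node_hours, nodes):
    """Runtime in minutes if work divides evenly across nodes (ideal scaling)."""
    return single_node_hours * 60.0 / nodes

# Figures from the abstract: 5.6 h on one node, 31 nodes in parallel.
minutes = ideal_parallel_minutes(5.6, 31)
print(f"{minutes:.1f} min")
```

Under that assumption the result rounds to the quoted 11 minutes, which suggests the reported workload parallelizes with little overhead.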

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.

The Presence and Localization of G-Quadruplex Forming Sequences in the Domain of Bacteria

Molecules ◽

10.3390/molecules24091711 ◽

2019 ◽

Vol 24 (9) ◽

pp. 1711 ◽

Cited By ~ 18

Author(s):

Martin Bartas ◽

Michaela Čutová ◽

Václav Brázda ◽

Patrik Kaura ◽

Jiří Šťastný ◽

...

Keyword(s):

Regulatory Sequences ◽

Bacterial Genomes ◽

Dna Structures ◽

Cellular Processes ◽

G Quadruplex ◽

Data Point ◽

Functional Relevance

The role of local DNA structures in the regulation of basic cellular processes is an emerging field of research. Amongst local non-B DNA structures, the significance of G-quadruplexes was demonstrated in the last decade, and their presence and functional relevance have been demonstrated in many genomes, including humans. In this study, we analyzed the presence and locations of G-quadruplex-forming sequences by G4Hunter in all complete bacterial genomes available in the NCBI database. G-quadruplex-forming sequences were identified in all species; however, the frequency differed significantly across evolutionary groups. The highest frequency of G-quadruplex-forming sequences was detected in the subgroup Deinococcus-Thermus, and the lowest frequency in Thermotogae. G-quadruplex-forming sequences are non-randomly distributed and are favored in various evolutionary groups. G-quadruplex-forming sequences are enriched in ncRNA segments followed by mRNAs. Analyses of surrounding sequences showed G-quadruplex-forming sequences around tRNA and regulatory sequences. These data point to the unique and non-random localization of G-quadruplex-forming sequences in bacterial genomes.
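
For readers unfamiliar with G4Hunter, the sketch below shows the general scoring idea as it is commonly described: each G in a run of n consecutive Gs scores +min(n, 4), each C in a run scores -min(n, 4), other bases score 0, and scores are averaged over a sliding window. The function names and the window size are assumptions chosen for illustration; the abstract does not state the parameters used in the study, and this is not the G4Hunter tool itself.

```python
def run_scores(seq):
    """Per-base scores: G runs get +min(run, 4), C runs get -min(run, 4), else 0."""
    seq = seq.upper()
    scores = []
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # extend to the end of the current run of identical bases
        run = j - i
        if seq[i] == "G":
            value = min(run, 4)
        elif seq[i] == "C":
            value = -min(run, 4)
        else:
            value = 0
        scores.extend([value] * run)
        i = j
    return scores

def window_scores(seq, window=25):
    """Mean score of every full-length sliding window (empty if seq is too short)."""
    scores = run_scores(seq)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one base
        means.append(total / window)
    return means

# Illustrative call on a short G-rich repeat, using a hypothetical window of 6.
print(window_scores("GGGTTA" * 4, window=6)[:3])
```

Windows whose absolute mean exceeds a chosen threshold would be reported as candidate G-quadruplex-forming sequences; the threshold is a user setting and is left out here.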
