Mutation rate variations in the human genome are encoded in DNA shape

2021
Author(s): Zian Liu, Md Abul Hassan Samee

Abstract Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Accurate modeling of these mutation rates has long remained an open problem because the rates vary substantially across the human genome. A recent model, however, explained much of this variation by considering higher-order nucleotide interactions in the local (7-mer) sequence context around mutated nucleotides. Despite this model's predictive value, we still lack a clear understanding of the biophysical mechanisms underlying variation in genome-wide mutation rates. DNA shape features are geometric measurements of DNA structural properties, such as helical twist and tilt, and are known to capture information on interactions between neighboring nucleotides within a local context. Motivated by this characteristic of DNA shape features, we used them to model mutation rates in the human genome. These DNA shape feature-based models improved both accuracy (by up to 14%) and interpretability over current nucleotide sequence-based models. The models also identified the specific shape features that capture the most variability in mutation rates and distinguished between the most and the least mutated sequence contexts, thus characterizing mutation-promoting properties of genomic DNA. To our knowledge, this is the first attempt to demonstrate the structural underpinnings of nucleotide mutations in the human genome, and it lays the groundwork for future studies to incorporate DNA shape information in modeling genetic variation.
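To make "local (7-mer) sequence context" concrete, here is a minimal sketch that, for a toy sequence and a hypothetical set of mutated positions, estimates a per-context rate as the number of times the central base of a 7-mer is mutated divided by the number of times that 7-mer occurs. The function names and toy data are illustrative assumptions; this is not the model described in the abstract, which additionally uses DNA shape features.

```python
from collections import Counter
from typing import Dict, Iterator, Set, Tuple


def centered_kmers(seq: str, k: int = 7) -> Iterator[Tuple[int, str]]:
    """Yield (center_position, k-mer) for every full k-mer centered on a base.

    Assumes k is odd, so the focal (possibly mutated) base sits in the middle.
    """
    flank = k // 2
    for center in range(flank, len(seq) - flank):
        yield center, seq[center - flank:center + flank + 1]


def context_mutation_rates(seq: str, mutated_positions: Set[int],
                           k: int = 7) -> Dict[str, float]:
    """Rate per k-mer context = mutated occurrences / total occurrences."""
    seen: Counter = Counter()
    mutated: Counter = Counter()
    for center, kmer in centered_kmers(seq, k):
        seen[kmer] += 1
        if center in mutated_positions:
            mutated[kmer] += 1
    return {kmer: mutated[kmer] / n for kmer, n in seen.items()}


# Toy example: position 3 (0-based) is the only mutated base.
rates = context_mutation_rates("ACGTACGTACGT", {3})
```

In the toy sequence, the 7-mer `ACGTACG` occurs twice and its central base is mutated once, so its estimated rate is 0.5. The sketch does not merge reverse-complement contexts, which a real analysis might do.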

2015
Vol 112 (15)
pp. 4654-4659
Author(s): Tianyin Zhou, Ning Shen, Lin Yang, Namiko Abe, John Horton, ...

DNA binding specificities of transcription factors (TFs) are a key component of gene regulatory processes. Underlying mechanisms that explain the highly specific binding of TFs to their genomic target sites are poorly understood. A better understanding of TF−DNA binding requires the ability to quantitatively model TF binding to accessible DNA as its basic step, before additional in vivo components can be considered. Traditionally, these models were built based on nucleotide sequence. Here, we integrated 3D DNA shape information derived with a high-throughput approach into the modeling of TF binding specificities. Using support vector regression, we trained quantitative models of TF binding specificity based on protein binding microarray (PBM) data for 68 mammalian TFs. The evaluation of our models included cross-validation on specific PBM array designs, testing across different PBM array designs, and using PBM-trained models to predict relative binding affinities derived from in vitro selection combined with deep sequencing (SELEX-seq). Our results showed that shape-augmented models compared favorably to sequence-based models. Although both k-mer and DNA shape features can encode interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space. In addition, analyzing the feature weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not revealed by the DNA sequence. As such, this work combines knowledge from structural biology and genomics, and suggests a new path toward understanding TF binding and genome function.
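The dimensionality argument in this abstract can be illustrated by simple counting. This sketch assumes position-specific k-mer indicator features for k = 1 to 3 across a binding site, and, for shape, a hypothetical set of four shape features each contributing one value per position. These counts are illustrative assumptions, not the exact feature sets used in the study.

```python
def kmer_feature_count(site_length: int, max_k: int) -> int:
    """Count position-specific 1-mer..max_k-mer indicator features for a site.

    A k-mer can start at (site_length - k + 1) positions and take 4**k values.
    """
    return sum((site_length - k + 1) * 4 ** k for k in range(1, max_k + 1))


def shape_feature_count(site_length: int, n_shape_features: int) -> int:
    """Count shape features, assuming one value per position per shape feature."""
    return site_length * n_shape_features


site = 10
print("1-mer..3-mer features:", kmer_feature_count(site, 3))
print("1-mer + 4 shape features:",
      kmer_feature_count(site, 1) + shape_feature_count(site, 4))
```

For a 10 bp site, the 1-mer to 3-mer encoding yields 696 features, whereas 1-mers plus four per-position shape features yield 80, which is the kind of reduction the abstract refers to.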


2020
Vol 10 (9)
pp. 3309-3319
Author(s): Ajith V Pankajam, Suman Dash, Asma Saifudeen, Abhishek Dutta, Koodali T Nishant

Abstract A growing body of evidence suggests that mutation rates vary within species. We estimated genome-wide loss of heterozygosity (LOH), gross chromosomal changes, and single nucleotide mutation rates to determine intraspecific differences in hybrid and homozygous strains of Saccharomyces cerevisiae. Mutation accumulation lines of the S. cerevisiae hybrid backgrounds S288c/YJM789 (S/Y) and S288c/RM11-1a (S/R) were analyzed along with the homozygous diploids RM11, S288c, and YJM145. LOH was extensive in both S/Y and S/R hybrid backgrounds. The S/Y background also showed longer LOH tracts, gross chromosomal changes, and aneuploidy. Short copy number aberrations were observed in the S/R background. LOH data from the S/Y and S/R hybrids were used to construct an LOH map for S288c to identify hotspots. Further, we observe up to a sixfold difference in single nucleotide mutation rates among the S. cerevisiae S/Y and S/R genetic backgrounds. Our results demonstrate that LOH is common during mitotic divisions in S. cerevisiae hybrids and also highlight genome-wide differences in LOH patterns and single nucleotide mutation rates between commonly used S. cerevisiae hybrid genetic backgrounds.
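Mutation rates from mutation accumulation (MA) experiments are commonly expressed per base per generation. The sketch below uses the widely used form mutations / (lines × generations × callable sites × ploidy); the parameter names and the example numbers are hypothetical and are not taken from this study.

```python
def ma_mutation_rate(mutations: int, lines: int, generations: float,
                     callable_sites: int, ploidy: int = 2) -> float:
    """Per-base, per-generation mutation rate from mutation accumulation lines.

    mu = mutations / (lines * generations * callable_sites * ploidy)
    ploidy=2 assumes diploid lines, so each callable site is counted twice.
    """
    denom = lines * generations * callable_sites * ploidy
    if denom <= 0:
        raise ValueError("lines, generations, sites and ploidy must be positive")
    return mutations / denom


# Hypothetical numbers for two backgrounds; a fold difference is their ratio.
mu_a = ma_mutation_rate(mutations=120, lines=10, generations=1000,
                        callable_sites=12_000_000)
mu_b = ma_mutation_rate(mutations=20, lines=10, generations=1000,
                        callable_sites=12_000_000)
fold = mu_a / mu_b
```

With equal line counts, generations, and sites, the fold difference reduces to the ratio of mutation counts (here 6), which is one way a "sixfold difference" between backgrounds can arise.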


2019
Vol 9 (1)
Author(s): Chao-Hsin Chen, Chao-Yu Pan, Wen-chang Lin

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 of them overlapped with their adjacent genes; that is, approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the 5ʹ-tandem and 3ʹ-tandem overlapping subtypes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distribution of human overlapping protein-coding genes and found coincidental expression in tissues for most of them.
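Grouping overlapping genes into clusters, as in the two-gene to 22-gene clusters mentioned above, can be sketched as a sweep over intervals sorted by chromosome and start coordinate. The tuple layout and gene names below are hypothetical, and the sketch ignores strand, so it does not reproduce the paper's four-subtype classification.

```python
from typing import List, Tuple

# (gene_id, chromosome, start, end) with 1-based inclusive coordinates (assumed).
Gene = Tuple[str, str, int, int]


def overlap_clusters(genes: List[Gene]) -> List[List[str]]:
    """Group genes into clusters of transitively overlapping intervals.

    Only clusters with at least two genes are returned.
    """
    clusters: List[List[str]] = []
    current: List[str] = []
    current_chrom, current_end = None, -1
    for gene_id, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2], g[3])):
        if chrom == current_chrom and start <= current_end:
            # Overlaps the running cluster: extend it.
            current.append(gene_id)
            current_end = max(current_end, end)
        else:
            # Close the previous cluster and start a new one.
            if len(current) > 1:
                clusters.append(current)
            current, current_chrom, current_end = [gene_id], chrom, end
    if len(current) > 1:
        clusters.append(current)
    return clusters


toy = [("A", "1", 100, 500), ("B", "1", 400, 900), ("C", "1", 850, 1000),
       ("D", "1", 2000, 2500), ("E", "2", 100, 300), ("F", "2", 250, 400)]
clusters = overlap_clusters(toy)
```

Note that genes A and C do not overlap each other directly but end up in the same cluster through B; whether such transitive chains count as one cluster is a design choice the abstract does not specify.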


2020
Vol 205 (4)
pp. 1070-1083
Author(s): Mei San Tang, Emily R. Miraldi, Natasha M. Girgis, Richard A. Bonneau, P'ng Loke

2018
Vol 10 (8)
pp. 1251
Author(s): Boyu Liu, Jun Chen, Jiage Chen, Weiwei Zhang

Spectral and NDVI values have been used to calculate the change magnitudes of land cover, but they may produce many pseudo-changes because of inter-class variance. Recently, shape information from spectral or NDVI curves, such as direction, angle, gradient, or other mathematical indicators, has been used to improve the accuracy of land cover change detection. However, these measurements, each based on a single shape feature, can hardly capture the complete trends of curves affected by unsynchronized phenology. As a result, the calculated change magnitudes are indistinct, so changed and unchanged areas show low contrast. This problem has prevented traditional change detection methods from achieving higher accuracy using bi-temporal images or NDVI time series. In this paper, a multiple shape parameters-based change detection method is proposed that combines the spectral correlation operator with shape features of NDVI temporal curves (phase angle cumulant, baseline cumulant, relative cumulation rate, and zero-crossing rate). The change magnitude is derived by integrating all the inter-annual differences of these shape parameters. Change regions are discriminated by an automated threshold selection method known as histogram concavity analysis. The results showed that the mean difference in change magnitude between 2100 changed and 2523 unchanged pixels was 32% for the proposed method, the overall accuracy was approximately 88%, and the kappa coefficient was 0.76. A comparative analysis with bi-temporal image-based methods and NDVI time series-based methods demonstrates that the proposed method is more effective and robust than traditional methods in achieving high-contrast change magnitudes and accuracy.
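Two of the quantities named above can be sketched in a few lines. The zero-crossing rate here assumes the curve is centred on its own mean as the "baseline" and counts strict sign changes between consecutive samples; the paper's exact definition may differ. Cohen's kappa is computed from a 2×2 change/no-change confusion matrix in its standard form. All input values are hypothetical.

```python
from typing import Sequence


def zero_crossing_rate(curve: Sequence[float]) -> float:
    """Fraction of consecutive sample pairs where the mean-centred curve changes sign.

    Assumption: the curve's own mean is used as the baseline; samples that land
    exactly on the baseline are not counted as crossings.
    """
    if len(curve) < 2:
        raise ValueError("need at least two samples")
    baseline = sum(curve) / len(curve)
    centred = [v - baseline for v in curve]
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    return crossings / (len(curve) - 1)


def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 change / no-change confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row (predicted) and column (reference) totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical single-season NDVI curve and confusion matrix.
zcr = zero_crossing_rate([0.2, 0.5, 0.8, 0.5, 0.2])
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
```

For the toy curve, the centred values change sign twice over four steps, giving a rate of 0.5; the toy confusion matrix gives 85% overall agreement, 50% chance agreement, and therefore a kappa of 0.7.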


2019
Vol 4 (1)
Author(s): Kyubum Lee, Mindy Clyne, Wei Yu, Zhiyong Lu, Muin J. Khoury

Abstract Understanding the drivers of research on human genes is critical to successfully translating genomics into medicine and public health. Using publicly available curated online databases, we sought to identify specific genes that are featured in translational genetic research, in comparison to all genomics research publications. Articles in the CDC's Public Health Genomics and Precision Health Knowledge Base were stratified into studies that have moved beyond basic research to population and clinical epidemiologic studies (T1: clinical and population human genome epidemiology research), and studies that evaluate, implement, and assess the impact of genes in clinical and public health areas (T2+: beyond bench to bedside). We examined gene counts and numbers of publications within these phases of translation in comparison to all genes from Medline. We highlight the genes that are moving from basic research to clinical and public health translational research, namely in cancer and a few genetic diseases with high penetrance and clinical actionability. Identifying human genes of translational value is an important step towards determining an evidence-based trajectory of the human genome in clinical and public health practice over time.
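Counting publications per gene within each translation phase amounts to a grouped tally of distinct articles. This minimal sketch assumes each article record carries an identifier, a phase label ("T1" or "T2+", as in the abstract), and a set of gene symbols; the record layout and gene names are hypothetical, not the Knowledge Base's actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (article_id, phase, genes mentioned): a hypothetical record layout.
Article = Tuple[str, str, Set[str]]


def publications_per_gene(articles: Iterable[Article]) -> Dict[str, Dict[str, int]]:
    """Return {phase: {gene: number of distinct articles mentioning it}}."""
    # Track article ids per (phase, gene) so duplicate records count once.
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for article_id, phase, genes in articles:
        for gene in genes:
            seen[(phase, gene)].add(article_id)
    result: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (phase, gene), ids in seen.items():
        result[phase][gene] = len(ids)
    return dict(result)


records = [
    ("a1", "T1", {"GENE_A", "GENE_B"}),
    ("a2", "T1", {"GENE_A"}),
    ("a3", "T2+", {"GENE_A"}),
    ("a2", "T1", {"GENE_A"}),  # duplicate record, counted once
]
counts = publications_per_gene(records)
```

The number of genes per phase is then simply `len(counts[phase])`; comparing these tallies against all genes in Medline would need a separate, larger reference table.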


2017
Vol 2017
pp. 1-10
Author(s): Zheng Wang, Qingbiao Wu

Shape completion is an important task in image processing. One approach is to capture the shape information and complete the shape with a generative model, such as a Deep Boltzmann Machine (DBM). Because the DBM can model the distribution of shapes well, a completion can readily be obtained by sampling from the model. In this paper, we use the hidden activations of the DBM together with convolutional shape features to fit a regression model. We compare the output of the regression model with the incomplete shape feature to set a proper and compact mask for sampling from the DBM. Experiments show that our method can obtain realistic results without any prior information about the incomplete object shape.
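The idea of "sampling from the model" within a mask can be sketched with clamped Gibbs sampling. To keep the example small, this swaps in a single-layer restricted Boltzmann machine (RBM) as a simpler stand-in for the DBM, and it takes the mask as given rather than deriving it from the regression step described above. Weights, biases, and the mask are hypothetical inputs.

```python
import math
import random
from typing import List, Optional


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def complete_shape(visible: List[int], mask: List[bool],
                   weights: List[List[float]], vis_bias: List[float],
                   hid_bias: List[float], steps: int = 50,
                   seed: Optional[int] = 0) -> List[int]:
    """Fill masked binary pixels by Gibbs sampling with observed pixels clamped.

    mask[i] is True where pixel i is missing; weights[i][j] links visible i
    to hidden j. Only masked pixels are ever resampled.
    """
    rng = random.Random(seed)
    v = [rng.randint(0, 1) if missing else x for x, missing in zip(visible, mask)]
    n_visible, n_hidden = len(v), len(hid_bias)
    for _ in range(steps):
        # Sample every hidden unit given the current visible units.
        h = []
        for j in range(n_hidden):
            act = hid_bias[j] + sum(v[i] * weights[i][j] for i in range(n_visible))
            h.append(1 if rng.random() < sigmoid(act) else 0)
        # Resample only the masked visible units; observed pixels stay fixed.
        for i in range(n_visible):
            if mask[i]:
                act = vis_bias[i] + sum(h[j] * weights[i][j] for j in range(n_hidden))
                v[i] = 1 if rng.random() < sigmoid(act) else 0
    return v


w_rng = random.Random(1)
W = [[w_rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
observed = [1, 0, 1, 0, 0, 0]
missing = [False, False, False, True, True, True]
completed = complete_shape(observed, missing, W, [0.0] * 6, [0.0] * 3, steps=20, seed=3)
```

Clamping is what makes this a completion rather than free sampling: the known pixels condition the hidden units, which in turn shape the distribution of the unknown pixels.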


2014
Vol 70 (a1)
pp. C1163-C1163
Author(s): Anant Agrawal, Clara Kielkopf

Disease-causing mutations often occur in the polypyrimidine (Py) tract splice site signals of human gene transcripts. The essential U2AF65 protein recognizes Py tracts near 3′ splice sites and initiates assembly of the splicing "machine", a megadalton complex composed of approximately 100 proteins and five small nuclear RNAs. Our prior structures of a shortened U2AF65 variant revealed the basis for nucleotide interactions at a subset of binding sites. How intact U2AF65 recognizes the Py tract splice site signal has remained unknown. We determined a 2.0 Å resolution structure of intact U2AF65 recognizing an optimal, all-uridine Py tract. The new structure and complementary biochemical experiments reveal integral roles for main-chain atoms of the interdomain linker and for residues surrounding the two core RNA recognition motifs (RRM1 and RRM2) in recognition of the Py tract. The new U2AF65 structural information sheds light on the splicing defects caused by Py tract mutations in human genetic diseases. We test the U2AF65 structural relationship for a representative Py tract mutation that causes X-linked retinitis pigmentosa.


2021
Author(s): Li Ye, Chunquan Li, Jiquan Ma

The identification of enhancers has long been an important task in bioinformatics owing to their major role in regulating gene expression. For this reason, many computational algorithms for enhancer identification have been proposed over the years, ranging from statistical and machine learning methods to increasingly popular deep learning approaches. To boost performance, these methods tend to extract more features from single DNA sequences and integrate them into an ensemble classifier. Nevertheless, the sequence-derived features used in previous studies can hardly provide the 3D structural information of DNA sequences, which is regarded as an important factor affecting the binding preferences of transcription factors to regulatory elements such as enhancers. Given this, we propose DENIES, a deep learning-based two-layer predictor for identifying enhancers and their strength. Besides two common sequence-derived features (i.e. one-hot and k-mer), it introduces DNA shape to describe the 3D structures of DNA sequences. Performance comparisons with a series of state-of-the-art methods on the same datasets demonstrate the effectiveness and robustness of our method. The code implementation of our predictor is freely available at https://github.com/hlju-liye/DENIES.
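The two sequence-derived encodings named above, one-hot and k-mer, can be sketched as follows. The A, C, G, T column order, the all-zero row for unknown bases, and the use of relative frequencies are assumptions for illustration; the abstract does not specify how DENIES configures them.

```python
from itertools import product
from typing import Dict, List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a length-4 indicator vector in A, C, G, T order.

    Characters outside ACGT (e.g. N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every possible k-mer (4**k entries, fixed order)."""
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows that contain non-ACGT characters
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


matrix = one_hot("ACN")
freqs = kmer_frequencies("AAAC", 2)
```

The one-hot matrix keeps positional information at a fixed width of four columns, while the k-mer vector has a fixed length of 4**k regardless of sequence length, which is why the two are often combined.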

