A deep learning-based framework for estimating fine-scale germline mutation rates

2021 ◽  
Author(s):  
Yiyuan Fang ◽  
Shuyi Deng ◽  
Cai Li

Germline mutation rates are essential for genetic and evolutionary analyses. Yet, estimating accurate fine-scale mutation rates across the genome is a great challenge, due to relatively few observed mutations and intricate relationships between predictors and mutation rates. Here we present MuRaL (Mutation Rate Learner), a deep learning-based framework to predict fine-scale mutation rates using only genomic sequences as input. Harnessing human germline variants for comprehensive assessment, we show that MuRaL achieves better predictive performance than current state-of-the-art methods. Moreover, MuRaL can build models with relatively few training mutations and a moderate number of sequenced individuals. It can leverage transfer learning to build models with even less training data and time. We apply MuRaL to produce genome-wide mutation rate profiles for four species - Homo sapiens, Macaca mulatta, Arabidopsis thaliana and Drosophila melanogaster, demonstrating its high applicability. The generated mutation rate profiles and open source software can greatly facilitate related research.

2017 ◽  
Author(s):  
Jedidiah Carlson ◽  
Adam E Locke ◽  
Matthew Flickinger ◽  
Matthew Zawistowski ◽  
Shawn Levy ◽  
...  

Abstract A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here we use ∼36 million singleton variants from 3,560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ∼46,000 de novo mutations, and confirm our estimates are more accurate than previously published estimates based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.


2017 ◽  
Author(s):  
Antoine Frénoy ◽  
Sebastian Bonhoeffer

Abstract The stress-induced mutagenesis paradigm postulates that in response to stress, bacteria increase their genome-wide mutation rate, in turn increasing the chances that a descendant is able to withstand the stress. This has implications for antibiotic treatment: exposure to sub-inhibitory doses of antibiotics has been reported to increase bacterial mutation rates, and thus probably the rate at which resistance mutations appear and lead to treatment failure. Measuring mutation rates under stress, however, is problematic, because existing methods assume there is no death. Yet sub-inhibitory stress levels may induce a substantial death rate. Death events need to be compensated by extra replication to reach a given population size, thus giving more opportunities to acquire mutations. We show that ignoring death leads to a systematic overestimation of mutation rates under stress. We developed a system using plasmid segregation to measure death and growth rates simultaneously in bacterial populations. We use it to replicate classical experiments reporting antibiotic-induced mutagenesis. We found that a substantial death rate occurs at the tested sub-inhibitory concentrations, and taking this death into account lowers and sometimes removes the signal for stress-induced mutagenesis. Moreover, even when antibiotics increase mutation rate, sub-inhibitory treatments do not increase genetic diversity and evolvability, again because of effects of the antibiotics on population dynamics. Besides showing that population dynamics is a crucial but neglected parameter affecting evolvability, we provide better experimental and computational tools to study evolvability under stress, leading to a re-assessment of the magnitude and significance of the stress-induced mutagenesis paradigm.
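
The argument that death inflates naive mutation-rate estimates can be illustrated with a small calculation. The sketch below is not the authors' method: it assumes a simple deterministic birth-death model with constant birth rate b and death rate d, in which reaching a net gain of N cells takes N · b / (b − d) divisions. All function names are hypothetical and chosen for illustration only.

```python
def division_inflation_factor(death_to_birth_ratio: float) -> float:
    """Ratio of actual divisions to net population gain, assuming constant
    birth rate b and death rate d (illustrative model): b / (b - d)."""
    if not 0.0 <= death_to_birth_ratio < 1.0:
        raise ValueError("death/birth ratio must be in [0, 1) for a growing population")
    return 1.0 / (1.0 - death_to_birth_ratio)


def expected_divisions(n_start: float, n_final: float, death_to_birth_ratio: float) -> float:
    """Expected number of divisions needed to grow from n_start to n_final cells."""
    return (n_final - n_start) * division_inflation_factor(death_to_birth_ratio)


def corrected_rate(naive_rate: float, death_to_birth_ratio: float) -> float:
    """Rescale a per-division rate that was computed as if no cells had died."""
    return naive_rate / division_inflation_factor(death_to_birth_ratio)
```

Under these assumptions, a death rate equal to half the birth rate doubles the number of divisions behind the same final population size, so a naive per-division estimate would be twice the true value.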


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael Habig ◽  
Cecile Lorrain ◽  
Alice Feurtey ◽  
Jovan Komluski ◽  
Eva H. Stukenbrock

Abstract Mutations are the source of genetic variation and the substrate for evolution. Genome-wide mutation rates appear to be affected by selection and are probably adaptive. Mutation rates are also known to vary along genomes, possibly in response to epigenetic modifications, but causality is only assumed. In this study we determine the direct impact of epigenetic modifications and temperature stress on mitotic mutation rates in a fungal pathogen using a mutation accumulation approach. Deletion mutants lacking epigenetic modifications confirm that histone mark H3K27me3 increases whereas H3K9me3 decreases the mutation rate. Furthermore, cytosine methylation in transposable elements (TE) increases the mutation rate 15-fold, resulting in significantly less TE mobilization. Accessory chromosomes also have significantly higher mutation rates. Finally, we find that temperature stress substantially elevates the mutation rate. Taken together, we find that epigenetic modifications and environmental conditions modify the rate and the location of spontaneous mutations in the genome and alter its evolutionary trajectory.


2019 ◽  
Author(s):  
Michael D. Kessler ◽  
Douglas P. Loesch ◽  
James A. Perry ◽  
Nancy L. Heard-Costa ◽  
Brian E. Cade ◽  
...  

Abstract De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) program, we directly estimate and analyze DNM counts, rates, and spectra from 1,465 trios across an array of diverse human populations. Using the resulting call set of 86,865 single nucleotide DNMs, we find a significant positive correlation between local recombination rate and local DNM rate, which together can explain up to 35.5% of the genome-wide variation in population level rare genetic variation from 41K unrelated TOPMed samples. While genome-wide heterozygosity does correlate weakly with DNM count, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, interestingly, we do find significantly fewer DNMs in Amish individuals compared with other Europeans, even after accounting for parental age and sequencing center. Specifically, we find significant reductions in the number of T→C mutations in the Amish, which seems to underpin their overall reduction in DNMs. Finally, we calculate near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by non-additive genetic effects and/or the environment, and that a less mutagenic environment may be responsible for the reduced DNM rate in the Amish.
Significance Here we provide one of the largest and most diverse human de novo mutation (DNM) call sets to date, and use it to quantify the genome-wide relationship between local mutation rate and population-level rare genetic variation. While we demonstrate that the human single nucleotide mutation rate is similar across numerous human ancestries and populations, we also discover a reduced mutation rate in the Amish founder population, which shows that mutation rates can shift rapidly. Finally, we find that variation in mutation rates is not heritable, which suggests that the environment may influence mutation rates more significantly than previously realized.


Author(s):  
Zhuang Liu ◽  
Degen Huang ◽  
Kaiyu Huang ◽  
Zhuang Li ◽  
Jun Zhao

There is growing interest in the tasks of financial text mining. Over the past few years, Natural Language Processing (NLP) based on deep learning has advanced rapidly, and deep learning has shown promising results on financial text mining models. However, as NLP models require large amounts of labeled training data, applying deep learning to financial text mining is often unsuccessful due to the lack of labeled training data in financial fields. To address this issue, we present FinBERT (BERT for Financial Text Mining), a domain-specific language model pre-trained on large-scale financial corpora. In FinBERT, unlike in BERT, we construct six pre-training tasks covering more knowledge, simultaneously trained on general corpora and financial domain corpora, which enables the FinBERT model to better capture language knowledge and semantic information. The results show that our FinBERT outperforms all current state-of-the-art models. Extensive experimental results demonstrate the effectiveness and robustness of FinBERT. The source code and pre-trained models of FinBERT are available online.


2020 ◽  
Author(s):  
Yun Zhang ◽  
Ling Wang ◽  
Xinqiao Wang ◽  
Chengyun Zhang ◽  
Jiamin Ge ◽  
...  

Abstract An effective and rapid deep learning method for predicting chemical reactions contributes to the research and development of organic chemistry and drug discovery. Despite the outstanding capability of deep learning in retrosynthesis and forward synthesis, predictions based on small chemical datasets generally result in low accuracy due to an insufficiency of reaction examples. Here, we introduce a new state-of-the-art method, which integrates transfer learning with a transformer model to predict the outcomes of the Baeyer-Villiger reaction, a representative small-dataset reaction. The results demonstrate that introducing a transfer learning strategy markedly improves the top-1 accuracy of the transformer-transfer learning model (81.8%) over that of the transformer-baseline model (58.4%). Moreover, we further introduce data augmentation to the input reaction SMILES, which allows for better performance and improves the accuracy of the transformer-transfer learning model (86.7%). In summary, both transfer learning and data augmentation significantly improve the predictive performance of the transformer model, and are powerful methods for overcoming the restriction of limited training data in chemistry.


2018 ◽  
Author(s):  
Cai Li ◽  
Nicholas M. Luscombe

Abstract Understanding the patterns and genesis of germline de novo mutations is important for studying genome evolution and human diseases. Nucleosome organization is suggested to be a contributing factor to mutation rate variation across the genome. However, the small number of published de novo mutations and the low resolution of earlier nucleosome maps limited our understanding of how nucleosome organization affects germline mutation rates in the human genome. Here, we systematically investigated the relationship between nucleosome organization and fine-scale mutation rate variation by analyzing >300,000 de novo mutations from whole-genome trio sequencing and high-resolution nucleosome maps in human. We found that de novo mutation rates are elevated around strong, translationally stable nucleosomes, a previously under-appreciated aspect. We confirmed this observation having controlled for local sequence context and other potential confounding factors. Analysis of the underlying mutational processes suggests that the increased mutation rates around strong nucleosomes are shaped by a combination of low-fidelity replication, frequent DNA damage and insufficient/error-prone repair in these regions. Interestingly, strong nucleosomes are preferentially located in young SINE/LINE elements, implying frequent nucleosome re-positioning (i.e. shifting of dyad position) and their contribution to hypermutation at new retrotransposons during evolution. These findings provide novel insights into how chromatin organization affects germline mutation rates and have important implications in human genetics and genome evolution.


GigaScience ◽  
2021 ◽  
Vol 10 (10) ◽  
Author(s):  
Lucie A Bergeron ◽  
Søren Besenbacher ◽  
Mikkel H Schierup ◽  
Guojie Zhang

Abstract The lack of consensus methods to estimate germline mutation rates from pedigrees has led to substantial differences in computational pipelines in the published literature. Here, we answer Susanne Pfeifer's opinion piece discussing the pipeline choices of our recent article estimating the germline mutation rate of rhesus macaques (Macaca mulatta). We acknowledge the differences between the method that we applied and the one preferred by Pfeifer. Yet, we advocate for full transparency and justification of choices as long as rigorous comparison of pipelines remains absent because it is the only way to conclude on best practices for the field.


2021 ◽  
Author(s):  
Sanjeet Kumar ◽  
Kanika Bansal

COVID-19 has posed unforeseen circumstances and throttled major economies worldwide. India has witnessed two waves affecting around 31 million people, representing 16% of the cases globally. To date, the epidemic waves have not been comprehensively investigated to understand pandemic progress in India. In the present study, we aim for a cross-sectional analysis from its first incidence up to 26th July 2021. We have performed a pan-Indian evolutionary study using 20,086 high-quality complete genomes of SARS-CoV-2. Based on the number of cases reported and mutation rates, we could divide the Indian epidemic into seven different phases. The first three phases, constituting the pre-first wave, had a very low average mutation rate (<11), which increased in the first wave to 17 and then doubled in the second wave (~34). In accordance with the mutation rate, variants of concern (alpha, beta, gamma and delta) and interest (eta and kappa) also started appearing in the first wave (1.5% of the genomes), and dominated the second (~96% of genomes) and post-second wave (100% of genomes) phases. Whole genome-based phylogeny could demarcate the post-first wave isolates from previous ones by the point of diversification leading to incidences of VOCs and VOIs in India. Nation-wide mutational analysis depicted more than 0.5 million events with four major mutations in ~97% of the total 20,086 genomes in the study. These included two mutations in coding regions (spike (D614G) and NSP 12b (P314L) of RNA-dependent RNA polymerase), one silent mutation (NSP3 F106F) and one extragenic mutation (5′ UTR 241). Large-scale genome-wide mutational analysis is crucial in expanding knowledge on the evolution of deadly variants of SARS-CoV-2 and timely management of the pandemic.

