A method to build extended sequence context models of point mutations and indels

2021 ◽  
Author(s):  
Jörn Bethune ◽  
April Kleppe ◽  
Søren Besenbacher

Abstract The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k, since the data become too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers using IUPAC patterns. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. The results reveal that, for some mutation types, the mutation rate of a position is significantly affected by nucleotides that are up to four base pairs away. As the first method of its kind, it predicts rates not only for point mutations but also for indels. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong constraint, e.g. haploinsufficient genes.
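The grouping idea in this abstract can be illustrated with a minimal sketch: an IUPAC pattern stands for a set of k-mers, and one pooled rate is estimated for the whole set instead of one rate per k-mer. This is not the kmerPaPa algorithm or its API. The function names (`matches`, `expand`, `pattern_rates`) and the simple pooled-ratio estimate are assumptions made for illustration; only the IUPAC code table is standard.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, kmer):
    """Return True if every base of kmer is allowed by the pattern."""
    return len(pattern) == len(kmer) and all(
        base in IUPAC[code] for code, base in zip(pattern, kmer)
    )


def expand(pattern):
    """List every concrete k-mer that an IUPAC pattern covers."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pattern))]


def pattern_rates(patterns, mutated, unmutated):
    """Pool counts over each pattern's k-mers and return one rate per pattern.

    mutated / unmutated map k-mer -> number of mutated / unmutated positions.
    The pooled ratio is a hypothetical stand-in for a real rate estimator.
    """
    rates = {}
    for pat in patterns:
        m = sum(mutated.get(k, 0) for k in expand(pat))
        u = sum(unmutated.get(k, 0) for k in expand(pat))
        rates[pat] = m / (m + u) if (m + u) else float("nan")
    return rates
```

With made-up toy counts, `pattern_rates(["RCG"], {"ACG": 1}, {"ACG": 1, "GCG": 2})` pools the sparse ACG and GCG counts into a single estimate for the pattern RCG, which is the motivation for partitioning k-mers when per-k-mer data are too sparse.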

2016 ◽  
Author(s):  
Ann-Marie Oppold ◽  
Markus Pfenninger

Abstract Mutations are the ultimate basis of evolution, yet their occurrence rate is known for only a few species. We directly estimated the spontaneous mutation rate and the mutational spectrum in the non-biting midge C. riparius with a new approach. Individuals from ten mutation accumulation lines over five generations were deep genome sequenced to count de novo mutations (DNMs) that were not present in a pool of F1 individuals, representing parental genotypes. We identified 51 new single site mutations, of which 25 were insertions or deletions and 26 were single point mutations. This shift in the mutational spectrum compared to other organisms was explained by the high A/T content of the species. We estimated a haploid mutation rate of 2.1 × 10⁻⁹ (95% confidence interval: 1.4 × 10⁻⁹ – 3.1 × 10⁻⁹), which is in the range of recent estimates for other insects and supports the drift barrier hypothesis. We show that accurate mutation rate estimation from a high number of observed mutations is feasible with moderate effort even for non-model species.


2018 ◽  
Author(s):  
Rachael C. Aikens ◽  
Kelsey E. Johnson ◽  
Benjamin F. Voight

ABSTRACT Our understanding of mutation rate helps us build evolutionary models and make sense of genetic variation. Recent work indicates that the frequencies of specific mutation types have been elevated in Europe, and that many more, subtler signatures of global polymorphism variation may yet remain unidentified. Here, we present an analysis of the 1,000 Genomes Project (phase 3), suggesting additional putative signatures of mutation rate variation across populations and the extent to which they are shaped by local sequence context. First, we compiled a list of the most significantly variable polymorphism types in a cross-continental statistical test. Clustering polymorphisms together, we observed four sets of substitution types that showed similar trends of relative mutation rate across populations, and describe the patterns of these mutational clusters among continental groups. For the majority of these signatures, we found that a single flanking base pair of sequence context was sufficient to determine the majority of enrichment or depletion of a mutation type. However, local genetic context up to 2-3 base pairs away contributes additional variability, and helps to interpret a previously noted enrichment of certain polymorphism types in some East Asian groups. Building our understanding of mutation rate in this way can help us to construct more accurate evolutionary models and better understand the mechanisms that underlie genetic change.
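To make "flanking sequence context" concrete, the sketch below labels a substitution with its surrounding bases, so that one flanking base on each side gives a 3-mer type and two give a 5-mer type. It is an illustration only, not the authors' pipeline. The function name `context_type`, the 0-based coordinate, and the choice to fold strands so that the reference base is a pyrimidine (C or T) are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_type(seq, pos, alt, flank=1):
    """Label the substitution at 0-based pos with `flank` bases on each side.

    Returns a string such as 'TCC>TTC'. Types on the two strands are folded
    together by reverse-complementing whenever the reference base is a purine
    (an assumed convention, chosen here for illustration).
    """
    if pos - flank < 0 or pos + flank >= len(seq):
        raise ValueError("not enough flanking sequence around pos")
    ref_ctx = seq[pos - flank:pos + flank + 1]
    alt_ctx = seq[pos - flank:pos] + alt + seq[pos + 1:pos + flank + 1]
    if seq[pos] in "AG":
        ref_ctx, alt_ctx = revcomp(ref_ctx), revcomp(alt_ctx)
    return f"{ref_ctx}>{alt_ctx}"
```

Calling the function with `flank=1` and then `flank=2` on the same variant shows how widening the window splits one 3-mer type into several 5-mer types, which is the kind of additional context the abstract describes as contributing further variability.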


2008 ◽  
Vol 73 (1) ◽  
pp. 41-53
Author(s):  
Aleksandra Rakic ◽  
Petar Mitrasinovic

The present study uses molecular dynamics simulations to characterize the behavior of the GAA (1186-1188) hairpin triloops with their closing c-g base pairs in large ribonucleoligand complexes (PDB IDs: 1njn, 1nwy, 1jzx). The relative energies of the motifs in the complexes with respect to that in the reference structure (unbound form of rRNA; PDB ID: 1njp) display trends that agree with those of the conformational parameters reported in a previous study¹ utilizing the de novo pseudotorsional (η,θ) approach. The RNA regions around the actual RNA-ligand contacts, which experience the most substantial conformational changes upon formation of the complexes, were identified. The thermodynamic parameters, based on a two-state conformational model of RNA sequences containing 15, 21 and 27 nucleotides in the immediate vicinity of the particular binding sites, were evaluated. From a more structural standpoint, the strain of a triloop, being far from the specific contacts and interacting primarily with other parts of the ribosome, was established as a structural feature which conforms to the trend of the average values of the thermodynamic variables corresponding to the three motifs defined by the 15-, 21- and 27-nucleotide sequences. From a more functional standpoint, RNA-ligand recognition is suggested to be dictated by the types of ligands in the complexes.


2020 ◽  
pp. 411-425 ◽  
Author(s):  
Jing Zhao ◽  
Yang Xia

PURPOSE HER2 is a critical gene that drives various solid tumors in addition to those of breast cancer. For example, HER2 plays a role in non–small-cell lung cancer (NSCLC). Overexpression, amplification, and point mutations in HER2 have been described in patients with NSCLC; however, the potential roles of these alterations remain unclear. METHODS We summarize the evidence regarding the distinct impacts of different HER2 aberrations on antitumor agents. Also, we update the therapeutic efficacy of HER2-targeted agents, including anti-HER2 antibodies, antibody-drug conjugates, and small-molecule tyrosine kinase inhibitors, tested in HER2-aberrant NSCLC. RESULTS Although these drugs are not yet standard treatments, certain patients may benefit from these therapies. In this review, we aim to provide an improved understanding of HER2 aberrations in NSCLC, including NSCLC biology and the impacts of each aberration on prognosis and standard treatment. We also highlight the potential of novel anti-HER2 therapies approved by regulatory bodies and those in clinical development. CONCLUSION Compared with HER2 amplification or overexpression, HER2 mutations, especially HER2 exon 20 mutations, are emerging as the most clear targetable driver for HER2-directed therapies in lung cancer. De novo and inducible HER2 pathway activation need to be differentially managed. Further investigations with new strategies are needed.


2008 ◽  
Vol 190 (12) ◽  
pp. 4263-4271 ◽  
Author(s):  
Alexis I. Cocozaki ◽  
Ingrid R. Ghattas ◽  
Colin A. Smith

ABSTRACT Transcription antitermination in phages λ and P22 uses N proteins that bind to similar boxB RNA hairpins in regulated transcripts. In contrast to the λ N-boxB interaction, the P22 N-boxB interaction has not been extensively studied. A nuclear magnetic resonance structure of the P22 N peptide boxBleft complex and limited mutagenesis have been reported but do not reveal a consensus sequence for boxB. We have used a plasmid-based antitermination system to screen boxBs with random loops and to test boxB mutants. We find that P22 N requires boxB to have a GNRA-like loop with no simple requirements on the remaining sequences in the loop or stem. U:A or A:U base pairs are strongly preferred adjacent to the loop and appear to modulate N binding in cooperation with the loop and distal stem. A few GNRA-like hexaloops have moderate activity. Some boxB mutants bind P22 and λ N, indicating that the requirements imposed on boxB by P22 N overlap those imposed by λ N. Point mutations can dramatically alter boxB specificity between P22 and λ N. A boxB specific for P22 N can be mutated to λ N specificity by a series of single mutations via a bifunctional intermediate, as predicted by neutral theories of evolution.


2019 ◽  
Author(s):  
Glenn Hickey ◽  
David Heller ◽  
Jean Monlong ◽  
Jonas A. Sibbesen ◽  
Jouni Sirén ◽  
...  

Abstract Structural variants (SVs) remain challenging to represent and study relative to point mutations despite their demonstrated importance. We show that variation graphs, as implemented in the vg toolkit, provide an effective means for leveraging SV catalogs for short-read SV genotyping experiments. We benchmarked vg against state-of-the-art SV genotypers using three sequence-resolved SV catalogs generated by recent long-read sequencing studies. In addition, we use assemblies from 12 yeast strains to show that graphs constructed directly from aligned de novo assemblies improve genotyping compared to graphs built from intermediate SV catalogs in the VCF format.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7500 ◽  
Author(s):  
Mikhail I. Schelkunov ◽  
Maxim S. Nuraliev ◽  
Maria D. Logacheva

Although most plant species are photosynthetic, several hundred species have lost the ability to photosynthesize and instead obtain nutrients via various types of heterotrophic feeding. Their plastid genomes markedly differ from the plastid genomes of photosynthetic plants. In this work, we describe the sequenced plastid genome of the heterotrophic plant Rhopalocnemis phalloides, which belongs to the family Balanophoraceae and feeds by parasitizing other plants. The genome is highly reduced (18,622 base pairs vs. approximately 150 kbp in autotrophic plants) and possesses an extraordinarily high AT content, 86.8%, which is inferior only to AT contents of plastid genomes of Balanophora, a genus from the same family. The gene content of this genome is quite typical of heterotrophic plants, with all of the genes related to photosynthesis having been lost. The remaining genes are notably distorted by a high mutation rate and the aforementioned AT content. The high AT content has led to sequence convergence between some of the remaining genes and their homologs from AT-rich plastid genomes of protists. Overall, the plastid genome of R. phalloides is one of the most unusual plastid genomes known.
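The AT content quoted in this abstract is a simple base-composition fraction. A minimal sketch of that arithmetic follows; the function name `at_content` and the choice to ignore ambiguous characters such as N are assumptions, not part of the study.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("A") + seq.count("T")) / acgt
```

Applied to a full plastid genome sequence read from a FASTA file, a value near 0.87 would correspond to the 86.8% figure reported above.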


2018 ◽  
Author(s):  
Avantika Lal ◽  
Keli Liu ◽  
Robert Tibshirani ◽  
Arend Sidow ◽  
Daniele Ramazzotti

Abstract Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates DNA replication error as a background, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using standard metrics. We then apply SparseSignatures to whole genome sequences of 147 tumors from pancreatic cancer, discovering 8 signatures in addition to the background.

