sequence context
Recently Published Documents


TOTAL DOCUMENTS

360
(FIVE YEARS 90)

H-INDEX

45
(FIVE YEARS 5)

2021 ◽  
Author(s):  
Jörn Bethune ◽  
April Kleppe ◽  
Søren Besenbacher

Abstract The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k, since the data become too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers using IUPAC patterns. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies, revealing that for some mutation types, the mutation rate of a position is significantly affected by nucleotides that are up to four base pairs away. As the first method of its kind, it predicts rates not only for point mutations but also for indels. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong constraint, e.g. haploinsufficient genes.
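The idea of grouping k-mers under an IUPAC pattern can be illustrated with a minimal sketch. This is not kmerPaPa's API or its partition-selection algorithm; the function names `matches` and `pooled_rate` and the counts in the usage note are hypothetical, chosen only to show how one pattern pools the data of several sparse k-mers into a single rate estimate.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, kmer):
    """Return True if every base of kmer is allowed by the IUPAC pattern."""
    if len(pattern) != len(kmer):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, kmer))


def pooled_rate(pattern, counts):
    """Pool (mutated, sites) counts over all k-mers matching the pattern.

    counts maps each k-mer to a (n_mutated, n_sites) tuple (hypothetical
    input format, assumed for this sketch).
    """
    mutated = sum(m for kmer, (m, _) in counts.items() if matches(pattern, kmer))
    sites = sum(n for kmer, (_, n) in counts.items() if matches(pattern, kmer))
    return mutated / sites if sites else 0.0
```

With the invented counts `{"ACG": (3, 100), "GCG": (1, 100), "TCG": (5, 50)}`, the pattern `RCG` pools `ACG` and `GCG` (R stands for A or G) and leaves `TCG` out, so the pooled estimate is 4 mutations over 200 sites.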


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259185
Author(s):  
Claudia Perne ◽  
Sophia Peters ◽  
Maria Cartolano ◽  
Sukanya Horpaopan ◽  
Christina Grimm ◽  
...  

The spectrum of somatic genetic variation in colorectal adenomas caused by biallelic pathogenic germline variants in the MSH3 gene was comprehensively analysed to characterise mutational signatures and identify potential driver genes and pathways of MSH3-related tumourigenesis. Three patients from two families with MSH3-associated polyposis were included. Whole exome sequencing of nine adenomas and matched normal tissue was performed. The number of somatic variants in the MSH3-deficient adenomas and the pattern of single nucleotide variants (SNVs) were similar to those in sporadic adenomas, whereas the fraction of small insertions/deletions (indels) (21–42% of all small variants) was significantly higher. Interestingly, pathogenic somatic APC variants were found in all but one adenoma. The vast majority (12/13) of these were di-, tetra-, or penta-base pair (bp) deletions. The fraction of APC indels was significantly higher than that reported in patients with familial adenomatous polyposis (FAP) (p < 0.01) or in sporadic adenomas (p < 0.0001). In MSH3-deficient adenomas, the occurrence of APC indels in a repetitive sequence context was significantly higher than in FAP patients (p < 0.01). In addition, the MSH3-deficient adenomas harboured one to five (recurrent) somatic variants in 13 established or candidate driver genes for early colorectal carcinogenesis, including ACVR2A and ARID genes. Our data suggest that MSH3-related colorectal carcinogenesis follows the classical APC-driven pathway. In line with the specific function of MSH3 in the mismatch repair (MMR) system, we identified a characteristic APC mutational pattern in MSH3-deficient adenomas, and confirmed further driver genes for colorectal tumourigenesis.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Theodore G. Smith ◽  
Anuli C. Uzozie ◽  
Siyuan Chen ◽  
Philipp F. Lange

Abstract The local sequence context is the most fundamental feature determining the post-translational modification (PTM) of proteins. Recent technological improvements allow for the detection of new and less prevalent modifications. We found that established state-of-the-art algorithms for the detection of PTM motifs in complex datasets failed to keep up with this technological development and are no longer robust. To overcome this limitation, we developed RoLiM, a new linear motif deconvolution algorithm and webserver that enables robust and unbiased identification of local amino acid sequence determinants in complex biological systems, demonstrated here by the analysis of 68 modifications found across 30 tissues in the human draft proteome map. Furthermore, RoLiM analysis of a large-scale phosphorylation dataset comprising 30 kinase inhibitors of 10 protein kinases in the EGF signalling pathway identified prospective substrate motifs for PI3K and EGFR.


2021 ◽  
Author(s):  
Florian Störtz ◽  
Jeffrey Mak ◽  
Peter Minary

CRISPR/Cas programmable nuclease systems have become ubiquitous in the field of gene editing. With progressing development, applications in in vivo therapeutic gene editing are increasingly within reach, yet limited by possible adverse side effects from unwanted edits. Recent years have thus seen continuous development of off-target prediction algorithms trained on in vitro cleavage assay data gained from immortalised cell lines. Here, we implement novel deep learning algorithms and feature encodings for off-target prediction and systematically sample the resulting model space in order to find optimal models and inform future modelling efforts. We lay emphasis on physically informed features, hence terming our approach piCRISPR, which we gain on the large, diverse crisprSQL off-target cleavage dataset. We find that our best-performing model highlights the importance of sequence context and chromatin accessibility for cleavage prediction and outperforms state-of-the-art prediction algorithms in terms of area under precision-recall curve.


2021 ◽  
Author(s):  
David Bartee ◽  
Kellie D Nance ◽  
Jordan L Meier

N4-acetylcytidine (ac4C) is a post-transcriptional modification of RNA that is conserved across all domains of life. All characterized sites of ac4C in eukaryotic RNA occur in the central nucleotide of a CCG consensus sequence. However, the thermodynamic consequences of cytidine acetylation in this context have never been assessed due to its challenging synthesis. Here we report the synthesis and biophysical characterization of ac4C in its endogenous eukaryotic sequence context. First, we develop a synthetic route to homogeneous RNAs containing electrophilic acetyl groups. Next, we use thermal denaturation to interrogate the effects of ac4C on duplex stability and mismatch discrimination in a native sequence found in human ribosomal RNA. Finally, we demonstrate the ability of this chemistry to incorporate ac4C into the complex modification landscape of human tRNA, and use duplex melting combined with sequence analysis to highlight a potentially unique enforcing role for ac4C in this setting. By enabling the analysis of nucleic acid acetylation in its physiological sequence context, these studies establish a chemical foundation for understanding the function of a universally conserved nucleobase in biology and disease.
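The CCG consensus mentioned above lends itself to a small illustration: scanning an RNA sequence for every CCG and reporting the position of the central cytidine, the nucleotide where ac4C is described as occurring. This is only a sketch of the sequence-context lookup, not a predictor of acetylation; the function name `central_c_positions` and the example sequence in the usage note are hypothetical.

```python
import re


def central_c_positions(rna):
    """Return 0-based positions of the central C in every CCG occurrence.

    A zero-width lookahead is used so that overlapping matches are found.
    A CCG match only marks a candidate context, not a confirmed ac4C site.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=CCG)", rna.upper())]
```

For the made-up sequence `ACCGCCGU`, CCG starts at positions 1 and 4, so the candidate central cytidines sit at positions 2 and 5.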


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 996-996
Author(s):  
Marvyn T. Koning ◽  
Julieta Haydee Sepulveda Yanez ◽  
Diego Alvarez-Saravia ◽  
Bas Pilzecker ◽  
Pauline Van Schouwenburg ◽  
...  

Abstract Upon antigen recognition, activation-induced cytosine deaminase (AID) initiates affinity maturation of the B-cell receptor by somatic hypermutation (SHM) through error-prone DNA repair pathways. SHM typically creates single nucleotide substitutions, but tandem substitutions may also occur. While tandem substitutions have been described in mice and other species, the incidence of this phenomenon and its underlying mechanism in humans are currently unknown. We investigated the incidence and sequence context of tandem substitutions by massively parallel sequencing of V(D)J repertoires in healthy human donors generated by unbiased ARTISAN PCR. Selection of unique, clonally unrelated, antigen-experienced sequences carrying up to 5% mutations yielded 13,532 VDJ, 7,952 VJ-kappa and 7,598 VJ-lambda sequences. Comparison to the closest germline allele allowed for identification of a total of 122,878 single nucleotide substitutions (SNS), 10,735 tandem dinucleotide substitutions (TDNS) and 2,615 longer contiguous substitutions. After correcting for expected clusters of adjacent SNS, tandem substitutions comprised 5.7% of all AID-induced mutations. The mutation of more than one nucleotide in a single event was shown to overcome amino acid codon redundancy and may therefore enhance the adaptive immune response. Clustering of such mutations around AID hotspots and their overall distribution indicate that tandem substitutions are an integral part of the SHM spectrum. In the majority of tandem substitutions, the mutated sequence may be identified in the directly adjacent reference sequence context. Tandem substitutions in humans therefore represent single nucleotide juxtalocations. Such juxtalocations appear to be favored in polydipyrimidine stretches. These observations could be confirmed in patients with MSH2/6 deficiency, but were absent in a VDJ library from a UNG-deficient patient, indicating a strict dependence on abasic sites as an instigating mechanism.
Together, these findings delineate a model where tandem substitutions are predominantly generated by translesion synthesis across an apyrimidinic site that is typically created by UNG. During replication, apyrimidinic sites transiently adopt an extruded configuration, causing skipping of the extruded base. Consequent strand decontraction leads to the juxtalocation, after which exonucleases repair the apyrimidinic site and any directly adjacent mismatched base pairs. The mismatch repair pathway appears to account for the remainder of tandem substitutions. Our study shows that a significant portion of mutations acquired during SHM are caused by tandem substitutions, and that this mechanism may enhance affinity maturation and expedite the adaptive immune response by overcoming amino acid codon degeneracies or mutating two adjacent amino acid residues simultaneously. Figure 1. Corrected incidence of tandem dinucleotide substitutions in healthy donors. (A) Dinucleotide substitutions from unique IGHV, IGKV and IGLV sequences, corrected after in silico predictions of dinucleotide substitutions that did not occur in tandem. Burgundy cells represent sequence inversions; light and dark purple cells represent juxtalocations of the 5' and 3' base in the pair (as seen from the non-transcribed strand), respectively. For unshaded cells, juxtalocation could not be assessed due to one or more nucleotides in the reference sequence matching the mutated sequence. (B) Relative contribution of sequence inversions and juxtalocations. Disclosures: No relevant conflicts of interest to declare.
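The counting step described in this abstract, comparing each sequence to its germline allele and separating single substitutions from contiguous ones, can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the authors' pipeline: it assumes the two sequences are already aligned and of equal length, it ignores indels, and it does not apply the correction for independently occurring adjacent single substitutions. The function names and the example sequences in the usage note are hypothetical.

```python
def substitution_runs(germline, observed):
    """Return (start, length) for each run of contiguous mismatches.

    Assumes both sequences are pre-aligned and of equal length.
    """
    if len(germline) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    runs = []  # each entry is [start, end], both inclusive
    for pos, (g, o) in enumerate(zip(germline, observed)):
        if g == o:
            continue
        if runs and pos == runs[-1][1] + 1:
            runs[-1][1] = pos  # extend the current run
        else:
            runs.append([pos, pos])  # start a new run
    return [(start, end - start + 1) for start, end in runs]


def classify_runs(runs):
    """Count runs of length 1, length 2, and length 3 or more."""
    counts = {"single": 0, "tandem_dinucleotide": 0, "longer": 0}
    for _, length in runs:
        if length == 1:
            counts["single"] += 1
        elif length == 2:
            counts["tandem_dinucleotide"] += 1
        else:
            counts["longer"] += 1
    return counts
```

Comparing the made-up germline `ACGTACGTAC` with the observed `ATGTTTGTAC` gives one single substitution at position 1 and one two-base run starting at position 4. Whether an adjacent pair arose in a single event is exactly what the abstract's correction step addresses, and this sketch does not attempt it.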


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
András Tálas ◽  
Dorottya A. Simon ◽  
Péter I. Kulcsár ◽  
Éva Varga ◽  
Sarah L. Krausz ◽  
...  

Abstract Adenine and cytosine base editors (ABE, CBE) allow for precision genome engineering. Here, Base Editor Activity Reporter (BEAR), a plasmid-based fluorescent tool, is introduced, which can be applied to report on ABE and CBE editing in a virtually unrestricted sequence context or to label base-edited cells for enrichment. Using BEAR-enrichment, we increase the yield of base editing performed by nuclease-inactive base editors to the level of the nickase versions while maintaining significantly lower indel background. Furthermore, by exploiting the semi-high-throughput potential of BEAR, we examine whether increased-fidelity SpCas9 variants can be used to decrease SpCas9-dependent off-target effects of ABE and CBE. Comparing them on the same target sets reveals that CBE remains active on sequences where increased-fidelity mutations and/or mismatches decrease the activity of ABE. Our results suggest that the deaminase domain of ABE is less effective than that of CBE at acting on only transiently separated target DNA strands, explaining its lower mismatch tolerance.


2021 ◽  
Vol 20 ◽  
pp. S277
Author(s):  
E. Gaines ◽  
R. Mancinone ◽  
S. Laflin ◽  
W. Wang ◽  
S. Rowe ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Andrew Savinov ◽  
Benjamin M. Brandsen ◽  
Brooke E. Angell ◽  
Josh T. Cuperus ◽  
Stanley Fields

Abstract Background: The 3′ untranslated region (UTR) plays critical roles in determining the level of gene expression through effects on activities such as mRNA stability and translation. Functional elements within this region have largely been identified through analyses of native genes, which contain multiple co-evolved sequence features. Results: To explore the effects of 3′ UTR sequence elements outside of native sequence contexts, we analyze hundreds of thousands of random 50-mers inserted into the 3′ UTR of a reporter gene in the yeast Saccharomyces cerevisiae. We determine relative protein expression levels from the fitness of transformants in a growth selection. We find that the consensus 3′ UTR efficiency element significantly boosts expression, independent of sequence context; on the other hand, the consensus positioning element has only a small effect on expression. Some sequence motifs that are binding sites for Puf proteins substantially increase expression in the library, despite these proteins generally being associated with post-transcriptional downregulation of native mRNAs. Our measurements also allow a systematic examination of the effects of point mutations within efficiency element motifs across diverse sequence backgrounds. These mutational scans reveal the relative in vivo importance of individual bases in the efficiency element, which likely reflects their roles in binding the Hrp1 protein involved in cleavage and polyadenylation. Conclusions: The regulatory effects of some 3′ UTR sequence features, like the efficiency element, are consistent regardless of sequence context. In contrast, the consequences of other 3′ UTR features appear to be strongly dependent on their evolved context within native genes.

