Predicting base editing outcomes using position-specific sequence determinants

2021 ◽  
Author(s):  
Ananth Pallaseni ◽  
Elin Madli Peets ◽  
Jonas Koeppel ◽  
Juliane Weller ◽  
Luca Crepaldi ◽  
...  

Nucleotide-level control over DNA sequences is poised to power functional genomics studies and lead to new therapeutics. CRISPR/Cas base editors promise to achieve this ability, but the determinants of their activity remain incompletely understood. We measured base editing frequencies in two human cell lines for two cytosine and two adenine base editors at ~14,000 target sequences. Base editing activity is sequence-biased, with the largest effects from nucleotides flanking the target base, and is correlated with measures of Cas9 guide RNA efficiency. Whether a base is edited depends strongly on the combination of its position in the target and the preceding base: a preceding thymine leads to a wider editing window in both editor types, whereas a preceding guanine in cytosine editors and a preceding adenine in adenine editors lead to a narrower one. The impact of features on editing rate depends on the position, with guide RNA efficacy mainly influencing bases around the centre of the window, and sequence biases mainly influencing bases away from it. We use these observations to train a machine learning model to predict editing activity per position for both adenine and cytosine editors, with accuracy ranging from 0.49 to 0.72 between editors, and with better generalization performance across datasets than existing tools. We demonstrate the usefulness of our model by predicting the efficacy of guides that could correct disease mutations, and find that most of them suffer from more unwanted editing than corrected outcomes. This work unravels the position-specificity of base editing biases, and provides a solution to account for them, thus allowing more efficient planning of base edits in experimental and therapeutic contexts.
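
The two determinants the abstract highlights, the position of a base in the target and the base preceding it, can be read off a target sequence directly. The following is a minimal sketch only, not the authors' model or feature set: the function name `editable_sites`, the 1-based numbering from the 5' end, and the example sequence are all assumptions made for illustration.

```python
def editable_sites(target, target_base="C"):
    """List (position, preceding_base) for every occurrence of target_base.

    Positions are 1-based from the 5' end of the target (an assumed
    convention). The preceding base is None for the first position.
    """
    target = target.upper()
    sites = []
    for i, base in enumerate(target):
        if base == target_base:
            preceding = target[i - 1] if i > 0 else None
            sites.append((i + 1, preceding))
    return sites


# Hypothetical 20-nt target; "C" for a cytosine editor, "A" for an adenine editor.
example = "GATCCGTACGTTCAGGCTAA"
print(editable_sites(example, "C"))
```

Each pair could serve as one input row for a per-position predictor; which further features matter (for example, guide RNA efficiency scores) is described in the abstract but not reproduced here.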

2020 ◽  
Author(s):  
Dhruva Katrekar ◽  
Nathan Palmer ◽  
Yichen Xiang ◽  
Anushka Saha ◽  
Dario Meluzzi ◽  
...  

Abstract Adenosine deaminases acting on RNA (ADARs) can be repurposed to enable programmable RNA editing. However, their exogenous delivery leads to transcriptome-wide off-targeting, and enzymatic activity on certain RNA motifs, especially those flanked by a 5’ guanosine, is very low, which limits their utility as a transcriptome engineering toolset. To address this, we explored comprehensive ADAR2 protein engineering via three approaches. First, we performed a novel deep mutational scan of the deaminase domain that enabled direct coupling of variants to corresponding RNA editing activity. Experimentally measuring the impact of every amino acid substitution across 261 residues, i.e. ~5000 variants, on RNA editing revealed intrinsic domain properties and several mutations that greatly enhanced RNA editing. Second, we performed a domain-wide mutagenesis screen to identify variants that increased activity at 5’-GA-3’ motifs, and discovered novel mutants that enabled robust RNA editing. Third, we engineered the domain at the fragment level to create split deaminases. Notably, compared to full-length deaminase overexpression, split deaminases resulted in >1000-fold more specific RNA editing. Taken together, we anticipate this comprehensive deaminase engineering will enable broader utility of the ADAR toolset for RNA biotechnology and therapeutic applications.


Author(s):  
Adrien Oliva ◽  
Raymond Tobler ◽  
Alan Cooper ◽  
Bastien Llamas ◽  
Yassine Souilmi

Abstract The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA ‘reads’) against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30–80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four read-mapping programs (BWA-aln, BWA-mem, NovoAlign and Bowtie2) and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.


2021 ◽  
Vol 11 (8) ◽  
pp. 3561
Author(s):  
Diego Duarte ◽  
Chris Walshaw ◽  
Nadarajah Ramesh

Across the world, healthcare systems are under stress and this has been hugely exacerbated by the COVID pandemic. Key Performance Indicators (KPIs), usually in the form of time-series data, are used to help manage that stress. Making reliable predictions of these indicators, particularly for emergency departments (ED), can facilitate acute unit planning, enhance quality of care and optimise resources. This motivates models that can forecast relevant KPIs and this paper addresses that need by comparing the Autoregressive Integrated Moving Average (ARIMA) method, a purely statistical model, to Prophet, a decomposable forecasting model based on trend, seasonality and holidays variables, and to the General Regression Neural Network (GRNN), a machine learning model. The dataset analysed is formed of four hourly valued indicators from a UK hospital: Patients in Department; Number of Attendances; Unallocated Patients with a DTA (Decision to Admit); Medically Fit for Discharge. Typically, the data exhibit regular patterns and seasonal trends and can be impacted by external factors such as the weather or major incidents. The COVID pandemic is an extreme instance of the latter and the behaviour of sample data changed dramatically. The capacity to quickly adapt to these changes is crucial and is a factor that shows better results for GRNN in both accuracy and reliability.


AMB Express ◽  
2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Neeraja Punde ◽  
Jennifer Kooken ◽  
Dagmar Leary ◽  
Patricia M. Legler ◽  
Evelina Angov

Abstract Codon usage frequency influences protein structure and function. The frequency with which codons are used potentially impacts primary, secondary and tertiary protein structure. Poor expression, loss of function, insolubility, or truncation can result from species-specific differences in codon usage. “Codon harmonization” more closely aligns native codon usage frequencies with those of the expression host, particularly within putative inter-domain segments where slower rates of translation may play a role in protein folding. Heterologous expression of Plasmodium falciparum genes in Escherichia coli has been a challenge due to their AT-rich codon bias and highly repetitive DNA sequences. Here, codon harmonization was applied to the malarial antigen CelTOS (Cell-traversal protein for ookinetes and sporozoites). CelTOS is a highly conserved P. falciparum protein involved in cellular traversal through mosquito and vertebrate host cells. It reversibly refolds after thermal denaturation, making it a desirable malarial vaccine candidate. Protein expressed in E. coli from a codon-harmonized sequence of P. falciparum CelTOS (CH-PfCelTOS) was compared with protein expressed from the native codon sequence (N-PfCelTOS) to assess the impact of codon usage on protein expression levels, solubility, yield, stability, structural integrity, recognition by CelTOS-specific mAbs and immunogenicity in mice. While the translated proteins were expected to be identical, the products translated from the codon-harmonized sequence differed in helical content and showed a smaller distribution of polypeptides in mass spectra, indicating lower heterogeneity of the codon-harmonized version and fewer amino acid misincorporations. Hydrophobic-to-hydrophobic amino acid substitutions were observed more commonly than any other type. CH-PfCelTOS induced significantly higher antibody levels compared with N-PfCelTOS; however, no significant differences in either IFN-γ or IL-4 cellular responses were detected between the two antigens.
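
Codon usage frequency, the quantity that harmonization compares between the native organism and the expression host, can be tallied from a coding sequence. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `codon_usage` and the toy sequence are hypothetical, the input is assumed to be in frame, and frequencies are taken over all codons rather than normalised per amino acid, which would additionally need a codon table.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy sequence: ATG AAA AAA AAG
print(codon_usage("ATGAAAAAAAAG"))
```

Running the same tally over a gene set from each organism would give the two frequency tables that a harmonization step would then try to align.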


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tanglong Yuan ◽  
Nana Yan ◽  
Tianyi Fei ◽  
Jitan Zheng ◽  
Juan Meng ◽  
...  

Abstract Efficient and precise base editors (BEs) for C-to-G transversion are highly desirable. However, the sequence context affecting editing outcome largely remains unclear. Here we report engineered C-to-G BEs of high efficiency and fidelity, with the sequence context predictable via machine-learning methods. By changing the species origin and relative position of uracil-DNA glycosylase and deaminase, together with codon optimization, we obtain optimized C-to-G BEs (OPTI-CGBEs) for efficient C-to-G transversion. The motif preference of OPTI-CGBEs for editing 100 endogenous sites is determined in HEK293T cells. Using a sgRNA library comprising 41,388 sequences, we develop a deep-learning model that accurately predicts the OPTI-CGBE editing outcome for targeted sites with specific sequence context. These OPTI-CGBEs are further shown to be capable of efficient base editing in mouse embryos for generating Tyr-edited offspring. Thus, these engineered CGBEs are useful for efficient and precise base editing, with outcome predictable based on sequence context of targeted sites.


2015 ◽  
Author(s):  
Javier Estrada ◽  
Teresa Ruiz-Herrero ◽  
Clarissa Scholes ◽  
Zeba Wunderlich ◽  
Angela DePace

DNA-binding proteins control many fundamental biological processes such as transcription, recombination and replication. A major goal is to decipher the role that DNA sequence plays in orchestrating the binding and activity of such regulatory proteins. To address this goal, it is useful to rationally design DNA sequences with desired numbers, affinities and arrangements of protein binding sites. However, removing binding sites from DNA is computationally non-trivial since one risks creating new sites in the process of deleting or moving others. Here we present an online binding site removal tool, SiteOut, that enables users to design arbitrary DNA sequences that entirely lack binding sites for factors of interest. SiteOut can also be used to delete sites from a specific sequence, or to introduce site-free spacers between functional sequences without creating new sites at the junctions. In combination with commercial DNA synthesis services, SiteOut provides a powerful and flexible platform for synthetic projects that interrogate regulatory DNA. Here we describe the algorithm and illustrate the ways in which SiteOut can be used; it is publicly available at https://depace.med.harvard.edu/siteout/
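
The difficulty the abstract describes, that editing a sequence to remove one site can create another, can be illustrated with a small sketch. This is not the SiteOut algorithm, which the abstract does not detail. It assumes binding sites are given as exact-match strings, and the names `design_site_free` and `max_steps` are hypothetical. The sequence is grown one base at a time, a base is accepted only if it does not complete any listed motif, and the search backs up one base when no choice is valid.

```python
import random


def design_site_free(length, motifs, seed=0, max_steps=100_000):
    """Build a random DNA sequence that contains none of the given motifs.

    Motifs are exact-match strings (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    motifs = [m.upper() for m in motifs]
    seq = []
    for _ in range(max_steps):
        if len(seq) == length:
            break
        # A motif can only be newly created if it ends at the base just added.
        candidates = [
            b for b in "ACGT"
            if not any("".join(seq + [b]).endswith(m) for m in motifs)
        ]
        if candidates:
            seq.append(rng.choice(candidates))
        elif seq:
            seq.pop()  # dead end: back up one base and choose again
        else:
            break  # every single base is itself a motif
    if len(seq) != length:
        raise RuntimeError("no site-free sequence found within max_steps")
    return "".join(seq)


sites = ["GAATTC", "TATA", "GGCC"]
spacer = design_site_free(60, sites)
print(spacer, any(s in spacer for s in sites))
```

Because every occurrence of a motif must end at some position, checking only the suffix after each added base is enough to guarantee the finished sequence is free of the listed strings.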


2021 ◽  
Author(s):  
Brian P. Anton ◽  
Alexey Fomenkov ◽  
Victoria Wu ◽  
Richard J. Roberts

Abstract Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.


2021 ◽  
Vol 251 ◽  
pp. 01017
Author(s):  
Zhixiang Lu

With the vigorous development of the sharing economy, the short-term rental industry has also spawned many emerging industries within the sharing economy. However, the COVID-19 pandemic in 2020 affected many sharing-economy industries, including short-term housing rental. Taking the rental information of 1,004 short-term rental houses in New York in April 2020 as an example, this study uses machine learning and quantitative analysis to carry out statistical and visual analysis of the impact of different factors on housing rental status. Machine learning models are then used to predict changes in the rental status of a house over time. The results show that the prediction accuracy of the random forest model exceeds 94%, and that of the logistic model exceeds 74%. We also explore the impact of time span differences and regional differences on housing rental status.


2020 ◽  
Author(s):  
Youngbin Oh ◽  
Hyeonjin Kim ◽  
Bora Lee ◽  
Sang-Gyu Kim

Abstract Background: The Streptococcus pyogenes CRISPR system is composed of a Cas9 endonuclease (SpCas9) and a single-stranded guide RNA (gRNA) harboring a target-specific sequence. Theoretically, SpCas9 proteins could cleave as many targeted loci as gRNAs bind in a genome. Results: We introduce a PCR-free multiple gRNA cloning system for editing plant genomes. This method consists of two steps: (1) cloning annealed products of two oligonucleotides harboring target-binding sequence between tRNA and gRNA scaffold sequences in a pGRNA vector; and (2) assembling tRNA-gRNA units from several pGRNA vectors with a plant binary vector containing a SpCas9 expression cassette using the Golden Gate assembly method. We validated the editing efficiency and patterns of the multiplex gRNA expression system in wild tobacco (Nicotiana attenuata) protoplasts and in transformed plants by performing targeted deep sequencing. Two proximal cleavages by SpCas9-gRNA largely increased the editing efficiency and induced large deletions between two cleavage sites. Conclusions: This multiplex gRNA expression system enables high-throughput production of a single binary vector and increases the efficiency of plant genome editing.


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0247541
Author(s):  
Brian P. Anton ◽  
Alexey Fomenkov ◽  
Victoria Wu ◽  
Richard J. Roberts

Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.

