CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing

ABSTRACT Motivation Clustered regularly interspaced short palindromic repeats (CRISPR) technologies allow for facile genomic modification in a site-specific manner. A key step in this process is the in silico design of single guide RNAs to efficiently and specifically target a site of interest. To this end, it is necessary to enumerate all potential off-target sites within a given genome that could be inadvertently altered by nuclease-mediated cleavage. Currently available software for this task is limited by computational efficiency, variant support or annotation, and assessment of the functional impact of potential off-target effects. Results To overcome these limitations, we have developed CRISPRitz, a suite of software tools to support the design and analysis of CRISPR/CRISPR-associated (Cas) experiments. Using efficient data structures combined with parallel computation, we offer a rapid, reliable, and exhaustive search mechanism to enumerate a comprehensive list of putative off-target sites. As a proof of principle, we performed a head-to-head comparison with other available tools on several datasets. This analysis highlighted the unique features and superior computational performance of CRISPRitz, including support for genomic searching with DNA/RNA bulges and mismatches of arbitrary size as specified by the user, as well as consideration of genetic variants (variant-aware). In addition, graphical reports are offered for coding and non-coding regions that annotate the potential impact of putative off-target sites that lie within regions of functional genomic annotation (e.g. insulator and chromatin-accessible sites from the ENCyclopedia Of DNA Elements [ENCODE] project). Availability and implementation The software is freely available at https://github.com/pinellolab/CRISPRitz and https://github.com/InfOmics/CRISPRitz. Supplementary information Supplementary data are available at Bioinformatics online.
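The core task described above, enumerating genomic sites that resemble a guide, can be illustrated with a brute-force scan. The sketch below is a minimal illustration, assuming an SpCas9-style 3′ NGG PAM and mismatches only; it does not model DNA/RNA bulges or genetic variants, and the function names are hypothetical rather than part of CRISPRitz.

```python
# Hypothetical sketch: mismatch-only off-target enumeration for SpCas9-style
# guides (protospacer followed by an NGG PAM). Bulges and variants are NOT
# handled; this is an illustration, not CRISPRitz's algorithm.
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_off_targets(
    genome: str, guide: str, max_mismatches: int
) -> Iterator[Tuple[str, int, str, int]]:
    """Yield (strand, start, protospacer, mismatches) for each window that is
    followed by an NGG PAM and differs from `guide` at <= max_mismatches
    positions. `start` is 0-based on the strand being scanned."""
    genome, guide = genome.upper(), guide.upper()
    glen = len(guide)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        # The last valid window leaves room for the 3-nt PAM.
        for start in range(len(seq) - glen - 3 + 1):
            pam = seq[start + glen : start + glen + 3]
            if pam[1:] != "GG":
                continue
            protospacer = seq[start : start + glen]
            mm = count_mismatches(protospacer, guide)
            if mm <= max_mismatches:
                yield strand, start, protospacer, mm
```

This scan costs time proportional to genome length times guide length for every guide, which is the kind of cost that the efficient data structures and parallel computation mentioned in the abstract are meant to avoid.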


DeepKinZero: zero-shot learning for predicting kinase–phosphosite associations involving understudied kinases

Bioinformatics ◽

10.1093/bioinformatics/btaa013 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3652-3661 ◽

Cited By ~ 2

Author(s):

Iman Deznabi ◽

Busra Arabaci ◽

Mehmet Koyutürk ◽

Oznur Tastan

Keyword(s):

Protein Function ◽

Large Body ◽

Supplementary Information ◽

Baseline Model ◽

Source Codes ◽

Target Sites ◽

Specific Manner ◽

Proteome Level ◽

A Site ◽

Experimental Challenge

Abstract Motivation Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. The dysregulation of phosphorylation is associated with many diseases, including cancer. Although advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of the reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase-specific predictions, yet for a large body of kinases, only a few or no target sites are reported. Results We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other available methods. By expanding our knowledge of understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas. Availability and implementation The source code is available at https://github.com/Tastanlab/DeepKinZero. Supplementary information Supplementary data are available at Bioinformatics online.
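The zero-shot idea summarized above, scoring a site against a kinase that has no training examples, can be sketched with a bilinear compatibility function between a phosphosite embedding and a kinase embedding. This is a minimal sketch under that assumption: the embeddings, the matrix `W`, and the kinase names `KIN_A`/`KIN_B` are hypothetical, and DeepKinZero's actual bidirectional-RNN encoder and training procedure are not reproduced.

```python
# Hypothetical sketch of zero-shot prediction with a bilinear compatibility
# score F(x, k) = x^T W k. In practice W would be learned from kinases with
# known sites; here it is supplied directly. Not DeepKinZero's actual model.
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # shape: len(x) rows, len(k) columns


def compatibility(x: Vector, W: Matrix, k: Vector) -> float:
    """Return x^T W k for a phosphosite embedding x and a kinase embedding k."""
    Wk = [sum(w * kj for w, kj in zip(row, k)) for row in W]  # W k
    return sum(xi * v for xi, v in zip(x, Wk))                 # x^T (W k)


def predict_kinase(x: Vector, W: Matrix, kinases: Dict[str, Vector]) -> str:
    """Rank candidate kinases, including ones with no training sites,
    and return the name with the highest compatibility score."""
    return max(kinases, key=lambda name: compatibility(x, W, kinases[name]))
```

Because a kinase enters only through its embedding `k`, any kinase with a description vector can be ranked, even if no phosphosite for it was seen during training.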


movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples

Bioinformatics ◽

10.1093/bioinformatics/btaa997 ◽

2020 ◽

Author(s):

Wenbin Ye ◽

Tao Liu ◽

Hongjuan Fu ◽

Congting Ye ◽

Guoli Ji ◽

...

Keyword(s):

Biological Samples ◽

Tissue Specificity ◽

Single Cells ◽

Alternative Polyadenylation ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Mouse Sperm ◽

High Scalability ◽

A Site

Abstract Motivation Alternative polyadenylation (APA) is widely recognized as a pervasive and dynamically modulated mechanism. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. In particular, seven metrics are provided for measuring the tissue specificity or usage of APA sites across samples. Three methods are used for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation https://github.com/BMILAB/movAPA. Supplementary information Supplementary data are available at Bioinformatics online.
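As an illustration of what a tissue-specificity metric computes, the sketch below implements tau, an index widely used for tissue-specific expression, applied to the per-sample usage of a single poly(A) site. This is an assumption for illustration only: tau is not claimed to be one of movAPA's seven metrics, the function name is hypothetical, and the sketch is written in Python rather than movAPA's R.

```python
# Illustrative tissue-specificity index "tau" for one poly(A) site.
# Whether movAPA offers this exact metric is NOT assumed here.
from typing import Sequence


def tau_specificity(usage: Sequence[float]) -> float:
    """Return tau for non-negative per-sample values of one poly(A) site:
    0 means uniform usage across samples, 1 means usage in one sample only."""
    n = len(usage)
    if n < 2:
        raise ValueError("tau needs at least two samples")
    if any(v < 0 for v in usage):
        raise ValueError("values must be non-negative")
    peak = max(usage)
    if peak == 0:
        raise ValueError("site is unused in every sample")
    # Average shortfall of each sample relative to the peak sample.
    return sum(1 - v / peak for v in usage) / (n - 1)
```

Normalizing by the peak sample makes the index insensitive to overall sequencing depth of the site, so sites with very different total counts can be compared on the same 0 to 1 scale.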


RGEN-seq for highly sensitive amplification-free screen of off-target sites of gene editors

10.21203/rs.3.rs-961017/v1 ◽

2021 ◽

Author(s):

Alexander Kuzin ◽

Brendan Redler ◽

Jaya Onuska ◽

Alexei Slesarev

Keyword(s):

Target Detection ◽

Dna Cleavage ◽

Affinity Purification ◽

Pcr Amplification ◽

Detailed Comparison ◽

Detection Methods ◽

Guide Rnas ◽

Coverage Bias ◽

Biochemical Assays ◽

Target Sites

Abstract Sensitive detection of off-target sites produced by gene editing nucleases is crucial for developing reliable gene therapy platforms. Although several biochemical assays for the characterization of nuclease off-target effects have recently been published, significant technical and methodological issues still remain. Of note, existing methods rely on PCR amplification, tagging, and affinity purification, which can introduce bias, contaminants, and sample loss through handling. Here we describe a sensitive, PCR-free next-generation sequencing method (RGEN-seq) for unbiased detection of double-strand breaks generated by RNA-guided CRISPR-Cas9 endonuclease. Through use of novel sequencing adapters, the RGEN-seq method saves time, simplifies workflow, and removes genomic coverage bias and gaps associated with PCR and/or other enrichment procedures. RGEN-seq is fully compatible with existing off-target detection software; moreover, the unbiased nature of RGEN-seq offers a robust foundation for relating assigned DNA cleavage scores to propensity for off-target mutations in cells. A detailed comparison of RGEN-seq with other off-target detection methods is provided using a previously characterized set of guide RNAs.


Improving the Precision of Base Editing by Bubble Hairpin Single Guide RNA

mBio ◽

10.1128/mbio.00342-21 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

Zhiwei Hu ◽

Yannan Wang ◽

Qian Liu ◽

Yan Qiu ◽

Zhiyu Zhong ◽

...

Keyword(s):

Genome Editing ◽

Dna Breaks ◽

Single Nucleotide ◽

Base Editing ◽

Guide Rna ◽

Guide Rnas ◽

Double Stranded Dna ◽

Target Sites ◽

Guide Sequence ◽

Sgrna Design

ABSTRACT Base editing is a powerful genome editing approach that enables single-nucleotide changes without double-stranded DNA breaks (DSBs). However, off-target effects as well as other undesired edits at on-target sites remain obstacles for its application. Here, we report that bubble hairpin single guide RNAs (BH-sgRNAs), which contain a hairpin structure with a bubble region on the 5′ end of the guide sequence, can be efficiently applied to both cytosine base editor (CBE) and adenine base editor (ABE) and significantly decrease off-target editing without sacrificing on-target editing efficiency. Meanwhile, such a design also improves the purity of C-to-T conversions induced by base editor 3 (BE3) at on-target sites. Our results present a distinctive and effective strategy to improve the specificity of base editing. IMPORTANCE Base editors are DSB-free genome editing tools and have been widely used in diverse living systems. However, these tools have been reported to cause substantial off-target editing. To meet this challenge, we developed a new approach to improve the specificity of base editors by using hairpin sgRNAs with a bubble. Furthermore, our sgRNA design also dramatically reduced indels and unwanted base substitutions at on-target sites. We believe that the BH-sgRNA design is a significant improvement over existing sgRNAs of base editors, and our design promises to be adaptable to various base editors. We expect that it will help improve the safety of gene therapy.


BioPartsBuilder: a synthetic biology tool for combinatorial assembly of biological parts

Bioinformatics ◽

10.1093/bioinformatics/btv664 ◽

2015 ◽

Vol 32 (6) ◽

pp. 937-939 ◽

Cited By ~ 8

Author(s):

Kun Yang ◽

Giovanni Stracquadanio ◽

Jingchuan Luo ◽

Jef D. Boeke ◽

Joel S. Bader

Keyword(s):

Large Scale ◽

Supplementary Information ◽

Market Place ◽

Design Standards ◽

Reusable Components ◽

Assembly Design ◽

Amazon Web Services ◽

Dna Elements ◽

Combinatorial Assembly ◽

The Cost

Abstract Summary: Combinatorial assembly of DNA elements is an efficient method for building large-scale synthetic pathways from standardized, reusable components. These methods are particularly useful because they enable assembly of multiple DNA fragments in one reaction, at the cost of requiring that each fragment satisfies design constraints. We developed BioPartsBuilder as a biologist-friendly web tool to design biological parts that are compatible with DNA combinatorial assembly methods, such as Golden Gate and related methods. It retrieves biological sequences, enforces compliance with assembly design standards and provides a fabrication plan for each fragment. Availability and implementation: BioPartsBuilder is accessible at http://public.biopartsbuilder.org and an Amazon Web Services image is available from the AWS Market Place (AMI ID: ami-508acf38). Source code is released under the MIT license, and available for download at https://github.com/baderzone/biopartsbuilder. Supplementary information: Supplementary data are available at Bioinformatics online.
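One design constraint of this kind can be made concrete for Golden Gate assembly with the Type IIS enzyme BsaI: a part should not contain an internal BsaI recognition site (GGTCTC, or GAGACC on the opposite strand), because the enzyme would cut inside the part during assembly. The check below is a hypothetical sketch of that single rule; BioPartsBuilder's own rule set, supported enzymes, and code are not reproduced.

```python
# Hypothetical compliance check for BsaI-based Golden Gate assembly: a part
# must not contain an internal BsaI site (GGTCTC) on either strand. Other
# enzymes and design rules are not modeled here.
from typing import List

BSAI_SITE = "GGTCTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def internal_sites(seq: str, site: str = BSAI_SITE) -> List[int]:
    """0-based start positions where `site` or its reverse complement occurs."""
    seq = seq.upper()
    targets = {site, reverse_complement(site)}
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i : i + width] in targets]


def is_golden_gate_compatible(seq: str, site: str = BSAI_SITE) -> bool:
    """True when the fragment contains no internal recognition site."""
    return not internal_sites(seq, site)
```

Reporting positions rather than a bare yes/no leaves room for a later step that removes each flagged site, for example by a synonymous codon change inside a coding sequence.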


Modifying a covarying protein–DNA interaction changes substrate preference of a site-specific endonuclease

Nucleic Acids Research ◽

10.1093/nar/gkz866 ◽

2019 ◽

Vol 47 (20) ◽

pp. 10830-10841 ◽

Cited By ~ 1

Author(s):

Marc Laforet ◽

Thomas A McMurrough ◽

Michael Vu ◽

Christopher M Brown ◽

Kun Zhang ◽

...

Keyword(s):

Binding Sites ◽

Gene Editing ◽

Dna Interaction ◽

Substrate Preference ◽

Homing Endonucleases ◽

Hairpin Loop ◽

Dna Binding Sites ◽

Target Sites ◽

Site Variation ◽

A Site

Abstract Identifying and validating intermolecular covariation between proteins and their DNA-binding sites can provide insights into mechanisms that regulate selectivity and starting points for engineering new specificity. LAGLIDADG homing endonucleases (meganucleases) can be engineered to bind non-native target sites for gene-editing applications, but not all redesigns successfully reprogram specificity. To gain a global overview of residues that influence meganuclease specificity, we used information theory to identify protein–DNA covariation. Directed evolution experiments on one predicted pair, 227/+3, revealed variants with surprising shifts in I-OnuI substrate preference at the central 4 bases where cleavage occurs. Structural studies showed significant remodeling distant from the covarying position, including restructuring of an inter-hairpin loop, DNA distortions near the scissile phosphates, and new base-specific contacts. Our findings are consistent with a model whereby the functional impacts of covariation can be indirectly propagated to neighboring residues outside of direct contact range, allowing meganucleases to adapt to target site variation and indirectly expand the sequence space accessible for cleavage. We suggest that some engineered meganucleases may have unexpected cleavage profiles that were not rationally incorporated during the design process.
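The information-theoretic covariation mentioned above is commonly quantified with mutual information between a protein alignment column and a target-site position across homologs. The sketch below computes plain mutual information in bits from paired symbols; it is an illustrative assumption, not the exact statistic or corrections used in this study, and the residues in the example are hypothetical.

```python
# Illustrative mutual information (in bits) between one protein position and
# one DNA position across paired homolog/target-site sequences. Published
# covariation analyses often add corrections (e.g. for sampling or
# phylogeny); none are applied here.
from collections import Counter
from math import log2
from typing import Sequence


def mutual_information(protein_col: Sequence[str], dna_col: Sequence[str]) -> float:
    """MI between paired symbols, e.g. a protein residue and a target-site base."""
    if len(protein_col) != len(dna_col) or not protein_col:
        raise ValueError("columns must be non-empty and equally long")
    n = len(protein_col)
    p_prot = Counter(protein_col)
    p_dna = Counter(dna_col)
    joint = Counter(zip(protein_col, dna_col))
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) )
        mi += p_ab * log2(p_ab / ((p_prot[a] / n) * (p_dna[b] / n)))
    return mi
```

Perfect covariation of two equally frequent states gives 1 bit, while independent columns give 0, so high-scoring pairs are candidates for the kind of directed-evolution follow-up described above.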


Crisflash: open-source software to generate CRISPR guide RNAs against genomes annotated with individual variation

Bioinformatics ◽

10.1093/bioinformatics/btz019 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3146-3147 ◽

Cited By ~ 10

Author(s):

Adrien L S Jacquin ◽

Duncan T Odom ◽

Margus Lukk

Keyword(s):

Open Source Software ◽

Software Tool ◽

Supplementary Information ◽

Small Scale ◽

Genome Sequences ◽

Guide Rnas ◽

Genome Modification ◽

Order Of Magnitude ◽

Sgrna Design ◽

Reference Genomes

Abstract Summary The CRISPR/Cas9 system requires short guide RNAs (sgRNAs) to direct genome modification. Most currently available tools for sgRNA design operate only with standard reference genomes, and are best suited for small-scale projects. To address these limitations, we developed Crisflash, a software tool for fast sgRNA design and potential off-target discovery, built for performance and flexibility. Crisflash can rapidly design CRISPR guides against any sequenced genome or set of genome sequences, and can optimize guide accuracy by incorporating user-supplied variant data. Crisflash is over an order of magnitude faster than comparable tools, even using a single CPU core, and efficiently and robustly scores the potential off-targeting of all possible candidate CRISPR guide oligonucleotides. Availability and implementation https://github.com/crisflash Supplementary information Supplementary data are available at Bioinformatics online.
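Incorporating variant data matters because a single-nucleotide variant can create or destroy a PAM or alter a protospacer. The sketch below, assuming SpCas9 NGG PAMs and SNVs only (no indels, no VCF parsing), applies substitutions to a reference and enumerates candidate guides on both strands, so guides lost or gained can be found by set difference. Function names are hypothetical and do not reflect Crisflash's interface.

```python
# Hypothetical sketch: variant-aware enumeration of SpCas9 candidate guides.
# Only SNVs are handled; indels, phasing and VCF parsing are out of scope.
from typing import Dict, Set

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_snvs(reference: str, snvs: Dict[int, str]) -> str:
    """Return `reference` with 0-based single-nucleotide substitutions applied."""
    seq = list(reference.upper())
    for pos, alt in snvs.items():
        if not 0 <= pos < len(seq) or len(alt) != 1:
            raise ValueError(f"invalid SNV at {pos}: {alt!r}")
        seq[pos] = alt.upper()
    return "".join(seq)


def enumerate_guides(seq: str, guide_len: int = 20) -> Set[str]:
    """All protospacers of `guide_len` nt followed by an NGG PAM, either strand."""
    guides: Set[str] = set()
    forward = seq.upper()
    for strand_seq in (forward, reverse_complement(forward)):
        for start in range(len(strand_seq) - guide_len - 3 + 1):
            # PAM occupies [start+guide_len, start+guide_len+3); check the GG.
            if strand_seq[start + guide_len + 1 : start + guide_len + 3] == "GG":
                guides.add(strand_seq[start : start + guide_len])
    return guides
```

For example, `enumerate_guides(ref) - enumerate_guides(apply_snvs(ref, snvs))` lists reference guides that a sample's variants disrupt, and the reverse difference lists guides the variants create.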


ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity

Bioinformatics ◽

10.1093/bioinformatics/btaa656 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xiaoyong Pan ◽

Jasper Zuallaert ◽

Xi Wang ◽

Hong-Bin Shen ◽

Elda Posada Campos ◽

...

Keyword(s):

Deep Learning ◽

In Silico ◽

Protein Domain ◽

Machine Learning Techniques ◽

Supplementary Information ◽

Test Results ◽

Saliency Maps ◽

Learning Techniques ◽

Output Module ◽

Protein Toxicity

Abstract Motivation Genetically engineering food crops involves introducing proteins from other species into crop plant species or modifying already existing proteins with gene editing techniques. In addition, newly synthesized proteins can be used as therapeutic protein drugs against diseases. For both research and safety regulation purposes, being able to assess the potential toxicity of newly introduced/synthesized proteins is of high importance. Results In this study, we present ToxDL, a deep learning-based approach for in silico prediction of protein toxicity from sequence alone. ToxDL consists of (i) a module encompassing a convolutional neural network that has been designed to handle variable-length input sequences, (ii) a domain2vec module for generating protein domain embeddings and (iii) an output module that classifies proteins as toxic or non-toxic, using the outputs of the two aforementioned modules. Independent test results obtained for animal proteins and cross-species transferability results obtained for bacterial proteins indicate that ToxDL outperforms traditional homology-based approaches and state-of-the-art machine-learning techniques. Furthermore, through visualizations based on saliency maps, we are able to verify that the proposed network learns known toxic motifs. Moreover, the saliency maps allow for directed in silico modification of a sequence, thus making it possible to alter its predicted protein toxicity. Availability and implementation ToxDL is freely available at http://www.csbio.sjtu.edu.cn/bioinf/ToxDL/. The source code can be found at https://github.com/xypan1232/ToxDL. Supplementary information Supplementary data are available at Bioinformatics online.
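A convolutional module can accept variable-length input if each filter's activations are reduced by global max pooling, which yields one value per filter regardless of sequence length. The sketch below shows that mechanism with one hand-set filter on a one-hot protein encoding; it is an assumption about a common technique, not ToxDL's architecture, weights, or domain2vec module, and the names are hypothetical.

```python
# Illustrative single-filter 1D convolution with global max pooling over a
# one-hot protein sequence. Max pooling makes the output size independent of
# sequence length. This is not ToxDL's architecture or trained weights.
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence: str) -> List[List[float]]:
    """Encode each residue as a 20-dim indicator vector (unknown -> all zeros)."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in sequence.upper()]


def conv_max_pool(sequence: str, motif_filter: List[List[float]]) -> float:
    """Slide a (width x 20) filter along the sequence; return the max activation."""
    x = one_hot(sequence)
    width = len(motif_filter)
    if len(x) < width:
        raise ValueError("sequence shorter than filter")
    best = float("-inf")
    for start in range(len(x) - width + 1):
        window = x[start : start + width]
        # Dot product between the filter and the one-hot window.
        activation = sum(
            w * v
            for w_row, x_row in zip(motif_filter, window)
            for w, v in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best
```

With a filter equal to `one_hot("CC")`, the pooled value is 2.0 wherever "CC" occurs and lower otherwise, whatever the protein's length; a trained network learns many such filters instead of hand-setting one.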
