Recognition of CRISPR off-target cleavage sites with SeqGAN

2021 ◽  
Vol 16 ◽  
Author(s):  
Wen Li ◽  
Xiao-Bo Wang ◽  
Yan Xu

Background: The CRISPR system can quickly edit different gene loci by changing a short sequence on a single guide RNA. However, off-target events limit the further development of the CRISPR system. Improving the efficiency and specificity of this technology while minimizing the risk of off-target effects has always been a challenge. For genome-wide prediction of CRISPR off-target cleavage sites (OTS), an important issue is data imbalance: the number of true OTS identified is much smaller than the number of all possible nucleotide mismatch loci. Method: In this work, a sequence generative adversarial network (SeqGAN) was used to generate positive off-target sequences and thereby enlarge the OTS dataset for Cpf1. A deep convolutional neural network (CNN) was then trained on the balanced data to obtain a predictor with stronger generalization ability and better performance. Results: In 10-fold cross-validation, the AUC of the CNN classifier after SeqGAN balancing was 0.941, higher than that obtained with the original data (0.863) and with over-sampling (0.929). In independent testing, the AUC of the CNN classifier after SeqGAN balancing was 0.841, higher than that obtained with the original data (0.833) and with over-sampling (0.836). The PR value was 0.722 after SeqGAN balancing, about 0.16 higher than with the original data and about 0.03 higher than with over-sampling. Conclusion: This is the first use of the sequence generative adversarial network SeqGAN to handle data imbalance in CRISPR data. All the results showed that SeqGAN can effectively generate positive data for CRISPR off-target sites.
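The over-sampling baseline that the abstract compares against can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_oversample`, the toy sequences, and the choice of plain random duplication are all assumptions made for illustration. SeqGAN would instead generate new positive sequences rather than repeat existing ones.

```python
import random
from collections import Counter

def random_oversample(sequences, labels, seed=0):
    """Duplicate examples of smaller classes at random until every class
    matches the size of the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_label, majority_n = counts.most_common(1)[0]
    out_seqs, out_labels = list(sequences), list(labels)
    for label, n in counts.items():
        if label == majority_label:
            continue
        # Pool of existing examples for this minority class.
        pool = [s for s, y in zip(sequences, labels) if y == label]
        extra = [rng.choice(pool) for _ in range(majority_n - n)]
        out_seqs.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_seqs, out_labels

# Toy data (made up): 2 positive sites versus 6 negative mismatch loci.
toy_seqs = ["ACGT", "ACGA", "TTTT", "GGGG", "CCCC", "AAAA", "TGCA", "CATG"]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
balanced_seqs, balanced_labels = random_oversample(toy_seqs, toy_labels)
print(Counter(balanced_labels))
```

Because the duplicated positives carry no new information, a classifier trained this way can overfit to them, which is one plausible reason a generative approach might score higher.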

2019 ◽  
Vol 21 (4) ◽  
pp. 1448-1454 ◽  
Author(s):  
Yuli Gao ◽  
Guohui Chuai ◽  
Weichuan Yu ◽  
Shen Qu ◽  
Qi Liu

Abstract For genome-wide CRISPR off-target cleavage site (OTS) prediction, an important issue is data imbalance: the number of true OTS recognized by whole-genome off-target detection techniques is much smaller than the number of all possible nucleotide mismatch loci, which makes training machine learning models very challenging. Therefore, computational models proposed for OTS prediction and scoring should be carefully designed and properly evaluated in order to avoid bias. In our study, two tools are taken as examples to further emphasize the data imbalance issue in CRISPR off-target prediction, with the aim of achieving better sensitivity and specificity for optimized CRISPR gene editing. We would like to indicate that (1) benchmarks of CRISPR off-target prediction should be properly evaluated, and not overestimated, by taking the data imbalance issue into account; (2) incorporation of efficient computational techniques (including ensemble learning and data synthesis techniques) can help to address the data imbalance issue and improve the performance of CRISPR off-target prediction. Taken together, we call for more efforts to address the data imbalance issue in CRISPR off-target prediction to facilitate the clinical utility of CRISPR-based gene editing techniques.
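A small worked example may help show why benchmarks can be overestimated under imbalance. The sketch below uses made-up numbers (10 true sites among 1000 candidates) and hypothetical helper names. It only illustrates the general point that accuracy alone can look excellent for a predictor that never finds a true site.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    """Accuracy, precision and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up imbalanced benchmark: 10 true off-target sites among 1000 candidates.
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000  # a predictor that never calls a site positive
report = summarize(y_true, always_negative)
print(report)
```

Here the trivial predictor reaches 99% accuracy with zero recall, which is why precision-recall style metrics are usually reported alongside accuracy or ROC-based scores on data like this.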


2020 ◽  
Vol 85 (4) ◽  
pp. 895-901
Author(s):  
Takamitsu Amai ◽  
Tomoka Tsuji ◽  
Mitsuyoshi Ueda ◽  
Kouichi Kuroda

ABSTRACT Mitochondrial dysfunction can occur in a variety of ways, most often due to the deletion or mutation of mitochondrial DNA (mtDNA). The easy generation of yeasts with mtDNA deletion is attractive for analyzing the functions of the mtDNA gene. Treatment of yeasts with ethidium bromide is a well-known method for generating ρ° cells with complete deletion of mtDNA from Saccharomyces cerevisiae. However, the mutagenic effects of ethidium bromide on the nuclear genome cannot be excluded. In this study, we developed a “mito-CRISPR system” that specifically generates ρ° cells of yeasts. This system enabled the specific cleavage of mtDNA by introducing Cas9 fused with the mitochondrial target sequence at the N-terminus and guide RNA into mitochondria, resulting in the specific generation of ρ° cells in yeasts. The mito-CRISPR system provides a concise technology for deleting mtDNA in yeasts.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yingxi Yang ◽  
Hui Wang ◽  
Wen Li ◽  
Xiaobo Wang ◽  
Shizhao Wei ◽  
...  

Abstract Background Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of in-depth study and analysis of PTMs in proteins. Method We proposed a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modification sites. Features were constructed with eight sequence-based and five structure-based methods, and 1497 valid features remained after filtering by Pearson correlation coefficient. To solve the data imbalance problem, two influential deep generative methods, the Conditional Generative Adversarial Network (CGAN) and the Conditional Wasserstein Generative Adversarial Network (CWGAN), were leveraged and compared to generate new samples for the types with fewer samples. Finally, the random forest algorithm was used to predict the seven categories. Results In tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better solved the existing data imbalance and stabilized the training error. In addition, an accumulated feature importance analysis reported that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. Conclusions The CWGAN greatly improved the predictive performance in all experiments. Features derived from the CKSAAP, PWM and structure schemes are the most informative and contributed most to the prediction of PTM.
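The abstract does not say how the Pearson correlation filter was applied. One common variant, shown here purely as an assumption, keeps a feature only if it is not strongly correlated with any feature already kept. The function names, the 0.9 threshold and the toy columns are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated in this sketch
    return cov / math.sqrt(var_x * var_y)

def filter_correlated(columns, threshold=0.9):
    """Greedy filter (assumed criterion): keep a feature only if its absolute
    correlation with every previously kept feature is at most `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy feature table (made up): f2 is an exact multiple of f1.
toy_features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],
    "f3": [1.0, -1.0, 1.0, -1.0],
}
kept_features = filter_correlated(toy_features)
print(kept_features)
```

The greedy order matters: an earlier feature always wins over a later redundant one, so a real pipeline might rank features by importance before filtering.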


2021 ◽  
Vol 11 (5) ◽  
pp. 2166
Author(s):  
Van Bui ◽  
Tung Lam Pham ◽  
Huy Nguyen ◽  
Yeong Min Jang

In the last decade, predictive maintenance has attracted a lot of attention in industrial factories because of its wide use of the Internet of Things and artificial intelligence algorithms for data management. However, in the early phases, when abnormal and faulty machines rarely appeared in factories, only limited sets of machine fault samples were available. With limited fault samples, it is difficult to train a fault classifier because of the imbalance of the input data. Therefore, data augmentation was required to increase the accuracy of the learning model. However, there were few methods to generate and evaluate data for such analysis. In this paper, we introduce a method that uses a generative adversarial network for fault signal augmentation to enrich the dataset. The enhanced dataset could increase the accuracy of the machine fault detection model in the training process. We also performed fault detection using a variety of preprocessing approaches and classification models to evaluate the similarity between the generated data and the authentic data. The generated fault data has high similarity with the original data, and it significantly improves the accuracy of the model. The accuracy of faulty machine detection reaches 99.41% with 20% of the original fault machine dataset and 93.1% with 0% of the original fault machine dataset (generated data only). Based on this, we concluded that the generated data could be mixed with original data to improve the model performance.


Author(s):  
Eiru Kim ◽  
Traver Hart

Abstract Identifying essential genes in genome-wide loss-of-function screens is a critical step in functional genomics and cancer target finding. We previously described the Bayesian Analysis of Gene Essentiality (BAGEL) algorithm for accurate classification of gene essentiality from short hairpin RNA and CRISPR/Cas9 genome-wide genetic screens. Here, we introduce an updated version, BAGEL2, which employs an improved model that offers greater dynamic range of Bayes Factors, enabling detection of tumor suppressor genes, and a multi-target correction that reduces false positives from off-target CRISPR guide RNA. We also suggest a metric for screen quality at the replicate level and demonstrate how different algorithms handle lower-quality data in substantially different ways. BAGEL2 is written in Python 3, and the source code, along with all supporting files, is available on GitHub (https://github.com/hart-lab/bagel).


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Juan Xu ◽  
Yongfang Shi ◽  
Lei Shi ◽  
Zihui Ren ◽  
Yang Lu

In recent years, deep learning has become a popular topic in the intelligent fault diagnosis of industrial equipment. Under practical working conditions, although large volumes of vibration data are collected, most of the data are not labeled. Collecting and labeling sufficient fault data for each condition is unrealistic. Therefore, constructing a reliable fault diagnosis model with a small amount of labeled vibration data is a significant problem. In this paper, the time-domain vibration signal of the faulty bearing is transformed into a 2-dimensional image by wavelet transform to obtain the time-frequency domain information of the original data. A deep adversarial convolutional neural network based on semisupervised learning is proposed. A large amount of fake data generated by the generator and unlabeled true vibration data are used in the discriminator to learn the overall distribution of the data by judging the authenticity of the input. Three regularization terms for different loss functions are designed to constrain the parameters of the discriminator and improve the learning ability of the model. The proposed method is validated on two bearing fault diagnosis cases. The experimental results show that the proposed method has higher diagnostic accuracy than traditional deep models on multiple groups of small datasets of different sizes. The proposed method provides a new solution to the fault diagnosis problem with large vibration data but few labels.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Ida Höijer ◽  
Josefin Johansson ◽  
Sanna Gudmundsson ◽  
Chen-Shan Chin ◽  
Ignas Bunikis ◽  
...  

Abstract Background One ongoing concern about CRISPR-Cas9 genome editing is that unspecific guide RNA (gRNA) binding may induce off-target mutations. However, accurate prediction of CRISPR-Cas9 off-target activity is challenging. Here, we present SMRT-OTS and Nano-OTS, two novel, amplification-free, long-read sequencing protocols for detection of gRNA-driven digestion of genomic DNA by Cas9 in vitro. Results The methods are assessed using the human cell line HEK293, re-sequenced at 18x coverage using highly accurate HiFi SMRT reads. SMRT-OTS and Nano-OTS are first applied to three different gRNAs targeting HEK293 genomic DNA, resulting in a set of 55 high-confidence gRNA cleavage sites identified by both methods. Twenty-five of these sites are not reported by off-target prediction software, either because they contain four or more single nucleotide mismatches or insertion/deletion mismatches, as compared with the human reference. Additional experiments reveal that 85% of Cas9 cleavage sites are also found by other in vitro-based methods and that on- and off-target sites are detectable in gene bodies where short-reads fail to uniquely align. Even though SMRT-OTS and Nano-OTS identify several sites with previously validated off-target editing activity in cells, our own CRISPR-Cas9 editing experiments in human fibroblasts do not give rise to detectable off-target mutations at the in vitro-predicted sites. However, indel and structural variation events are enriched at the on-target sites. Conclusions Amplification-free long-read sequencing reveals Cas9 cleavage sites in vitro that would have been difficult to predict using computational tools, including in dark genomic regions inaccessible by short-read sequencing.
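The abstract notes that some cleavage sites were missed by prediction software because they contain four or more single nucleotide mismatches. A minimal sketch of counting substitution mismatches between a guide and a candidate site follows. The sequences and the function name are hypothetical, and this simple position-by-position comparison does not handle the insertion/deletion mismatches the abstract also mentions.

```python
def count_mismatches(guide, site):
    """Count positions where two equal-length DNA strings differ
    (substitutions only; indels are out of scope for this sketch)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)

# Hypothetical 20-nt guide and candidate genomic site (not from the study).
guide = "GACGTTACCGGATCAAGCTT"
site = "GACGTAACCGGTTCAAGGTA"
n_mismatches = count_mismatches(guide, site)
print(n_mismatches)

# A tool that only scans up to 3 mismatches (an assumed cutoff) would skip it.
would_be_missed = n_mismatches >= 4
print(would_be_missed)
```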


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 891-891
Author(s):  
Laura Hinze ◽  
Maren Pfirrmann ◽  
Salmaan Karim ◽  
James Degar ◽  
Connor McGuckin ◽  
...  

Abstract Asparaginase, a bacterial enzyme that depletes the nonessential amino acid asparagine, is an integral component of acute leukemia therapy. However, asparaginase resistance is a common clinical problem whose biologic basis is poorly understood. We hypothesized, based on the concept of synthetic lethality, that gain-of-fitness alterations in the drug-resistant cells had conferred a survival advantage that could be exploited therapeutically. To identify molecular pathways that promote fitness of leukemic cells upon treatment with asparaginase, we performed a genome-wide CRISPR/Cas9 loss-of-function screen in the asparaginase-resistant T-ALL cell line CCRF-CEM. Cas9-expressing CCRF-CEM cells were transduced with a genome-wide guide RNA library (Shalem et al. Science 343, 84-87, 2014), treated with either vehicle or asparaginase (10 U/L), and guide RNA representation was assessed. Our internal positive control, asparagine synthetase, was the gene most significantly depleted in asparaginase-treated cells (RRA significance score = 1.56 × 10⁻⁷), followed closely by two regulators of Wnt signaling, NKD2 and LGR6 (RRA score = 6 × 10⁻⁶ and 2.19 × 10⁻⁵, respectively). To test how these genes regulate Wnt signaling in T-ALL, we transduced CCRF-CEM cells with shRNAs targeting NKD2 or LGR6, or with an shLuciferase control. Knockdown of NKD2 or LGR6 increased levels of active β-catenin, as well as the activity of a TopFLASH reporter of canonical Wnt/β-catenin transcriptional activity (P < 0.0001), indicating that NKD2 and LGR6 are negative regulators of Wnt signaling in these cells. We then validated the screen results using shRNA knockdown of NKD2 or LGR6, which profoundly sensitized these cells to asparaginase (P < 0.0001) and potentiated asparaginase-induced apoptosis (P < 0.0001). Inhibition of glycogen synthase kinase 3 (GSK3) is a key event in Wnt-induced signal transduction. 
Thus, we tested whether CHIR99021, an ATP-competitive inhibitor of both GSK3 isoforms (GSK3α and GSK3β), could phenocopy the effect of Wnt pathway activation. Pharmacologic inhibition of GSK3 induced significant sensitization to asparaginase across a panel of cell lines representing distinct subtypes of treatment-resistant acute leukemia, including T-ALL, AML and hypodiploid B-ALL (Fig. 1a, b). Importantly, GSK3 inhibition did not sensitize normal hematopoietic progenitors to asparaginase, suggesting a leukemia-specific effect. Wnt-induced sensitization to asparaginase was independent of β-catenin and mTOR activation, because genetic and pharmacologic manipulation of these Wnt targets had no effect on asparaginase response. Instead, it was mediated by Wnt-dependent stabilization of proteins (Wnt/STOP), which inhibits GSK3-dependent protein ubiquitination and proteasomal degradation (Acebron et al. Mol Cell 54, 663-674, 2014; Taelman et al. Cell 143, 1136-1148, 2010). Indeed, Wnt-induced sensitization to asparaginase was completely blocked by the transduction of leukemia cells with FBXW7 (P < 0.0001), whose overexpression can reverse Wnt/STOP (Acebron et al. Mol Cell 54, 663-674, 2014), or by expression of a hyperactive proteasomal subunit ΔN-PSMA4 (P < 0.0001), which globally increases protein degradation (Choi et al. Nat Commun 7, 10963, 2016). Although GSK3α and GSK3β are redundant for many of their biologic functions, genetic or pharmacologic inhibition of GSK3α fully phenocopied Wnt-induced sensitization to asparaginase (P < 0.0001), whereas selective inhibition of GSK3β had no effect. We then leveraged the recently developed GSK3α-selective small molecule inhibitor BRD0705 (Wagner et al. Sci Transl Med 10, 2018) to test the in vivo therapeutic potential of our findings. 
Immunodeficient NRG mice were injected with leukemic cells from a primary asparaginase-resistant T-ALL patient derived xenograft, and treatment was begun after confirmation of leukemic engraftment (n=16 mice per group). In vivo, this PDX proved completely resistant to asparaginase or BRD0705 monotherapy, whereas the combination was highly efficacious (median survival of 17 days in vehicle, vs. median not reached at 60 days in combo-treated mice; P < 0.0001; Fig. 2a, b). The combination was also well-tolerated, with no appreciable weight changes or increases in serum bilirubin levels. Our findings provide a molecular rationale for activating Wnt/STOP signaling to improve the therapeutic index of asparaginase. Disclosures No relevant conflicts of interest to declare.


2017 ◽  
Author(s):  
Kathleen M. Chen ◽  
Jie Tan ◽  
Gregory P. Way ◽  
Georgia Doing ◽  
Deborah A. Hogan ◽  
...  

Abstract Background Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments. Individual experiments can highlight condition-specific pathway-pathway relationships; however, constructing a complete network of such relationships across many conditions requires analyzing results from many studies. Results We developed the PathCORE-T framework by implementing existing methods to identify pathway-pathway transcriptional relationships evident across a broad data compendium. PathCORE-T is applied to the output of feature construction algorithms; it identifies pairs of pathways observed in features more than expected by chance as functionally co-occurring. We demonstrate PathCORE-T by analyzing an existing eADAGE model of a microbial compendium and building and analyzing NMF features from the TCGA dataset of 33 cancer types. The PathCORE-T framework includes a demonstration web interface, with source code, that users can launch to (1) visualize the network and (2) review the expression levels of associated genes in the original data. PathCORE-T creates and displays the network of globally co-occurring pathways based on features observed in a machine learning analysis of gene expression data. Conclusions The PathCORE-T framework identifies transcriptionally co-occurring pathways from the results of unsupervised analysis of gene expression data and visualizes the relationships between pathways as a network. PathCORE-T recapitulated previously described pathway-pathway relationships and suggested experimentally testable additional hypotheses that remain to be explored.


2021 ◽  
Author(s):  
Dipankar Baisya ◽  
Adithya Ramesh ◽  
Cory Schwartz ◽  
Stefano Lonardi ◽  
Ian Wheeldon

Abstract Genome-wide functional genetic screens have been successful in discovering genotype-phenotype relationships and in engineering new phenotypes. While broadly applied in mammalian cell lines and in E. coli, use in non-conventional microorganisms has been limited, in part, due to the inability to accurately design high-activity CRISPR guides in such species. Here, we develop an experimental-computational approach to sgRNA design that is specific to an organism of choice, in this case the oleaginous yeast Yarrowia lipolytica. A negative selection screen in the absence of non-homologous end-joining, the dominant DNA repair mechanism, was used to generate single guide RNA (sgRNA) activity profiles for both SpCas9 and LbCas12a. This genome-wide data served as input to a deep learning algorithm, DeepGuide, that is able to accurately predict guide activity. DeepGuide uses unsupervised learning to obtain a compressed representation of the genome, followed by supervised learning to map sgRNA sequence, genomic context, and epigenetic features to guide activity. Experimental validation, both genome-wide and with a subset of selected genes, confirms DeepGuide’s ability to accurately predict high-activity sgRNAs. DeepGuide provides an organism-specific predictor of CRISPR guide activity that could be broadly applied to fungal species, prokaryotes, and other non-conventional organisms.

