Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tanglong Yuan ◽  
Nana Yan ◽  
Tianyi Fei ◽  
Jitan Zheng ◽  
Juan Meng ◽  
...  

Abstract: Efficient and precise base editors (BEs) for C-to-G transversion are highly desirable. However, how sequence context affects the editing outcome remains largely unclear. Here we report engineered C-to-G BEs of high efficiency and fidelity whose sequence-context preferences are predictable by machine-learning methods. By changing the species origin and relative position of uracil-DNA glycosylase and deaminase, together with codon optimization, we obtain optimized C-to-G BEs (OPTI-CGBEs) for efficient C-to-G transversion. The motif preference of OPTI-CGBEs is determined by editing 100 endogenous sites in HEK293T cells. Using an sgRNA library comprising 41,388 sequences, we develop a deep-learning model that accurately predicts the OPTI-CGBE editing outcome for targeted sites with a specific sequence context. These OPTI-CGBEs are further shown to be capable of efficient base editing in mouse embryos, generating Tyr-edited offspring. Thus, these engineered CGBEs are useful for efficient and precise base editing, with outcomes predictable from the sequence context of targeted sites.
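The abstract does not describe how sequence context is presented to the deep-learning model. A common input representation for sgRNA target sequences is one-hot encoding, in which each base becomes a row with a single 1. The sketch below assumes that representation; the function names, the (A, C, G, T) column order, and the example protospacer are hypothetical, and how much flanking sequence the authors actually used is not stated here.

```python
# Hypothetical helpers; the (A, C, G, T) column order is an assumption.
BASES = "ACGT"


def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix with a single 1 per row marking the base."""
    matrix = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        row = [0] * len(BASES)
        row[BASES.index(base)] = 1
        matrix.append(row)
    return matrix


def cytosine_positions(seq):
    """1-based positions of every C, i.e. candidate C-to-G edit sites."""
    return [i + 1 for i, base in enumerate(seq.upper()) if base == "C"]


if __name__ == "__main__":
    protospacer = "GACCTGCAGTTCAAGGTAGA"  # hypothetical 20-nt target
    encoded = one_hot_encode(protospacer)
    print(len(encoded), len(encoded[0]))  # 20 4
    print(cytosine_positions(protospacer))  # [3, 4, 7, 12]
```

A matrix like this, optionally widened with flanking bases, is the kind of input a convolutional or recurrent model can use to learn which neighboring motifs favor editing at each C.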

2018 ◽  
Author(s):  
Parth Patel ◽  
Sandra Mathioni ◽  
Atul Kakrana ◽  
Hagit Shatkay ◽  
Blake C. Meyers

Summary: Little is known about the characteristics and function of reproductive phased, secondary, small interfering RNAs (phasiRNAs) in the Poaceae, despite the availability of significant genomic resources, experimental data, and a growing number of computational tools. We used machine-learning methods to identify sequence-based and structural features that distinguish phasiRNAs in rice and maize from other small RNAs (sRNAs). We developed Random Forest classifiers that can distinguish reproductive phasiRNAs from other sRNAs in complex sets of sequencing data, using sequence-based features (k-mers) and features describing position-specific sequence biases. The classifiers attained >80% accuracy, sensitivity, specificity, and positive predictive value. Feature selection identified important features at both ends of phasiRNAs. We demonstrated that phasiRNAs have strand specificity and position-specific nucleotide biases potentially influencing AGO sorting; we also predicted targets to infer functions of phasiRNAs, and computationally assessed their sequence characteristics relative to other sRNAs. Our results demonstrate that machine-learning methods effectively identify phasiRNAs despite the lack of characteristic features typically present in precursor loci of other small RNAs, such as sequence conservation or structural motifs. The 5′-end features we identified provide insights into AGO-phasiRNA interactions; we describe a hypothetical model of competition for AGO loading between phasiRNAs of different nucleotide compositions.
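A minimal sketch of the two feature families named above (k-mer counts and position-specific end features), assuming reads are mapped to the DNA alphabet (U to T) and using an illustrative k and end-window size. The function names are hypothetical and this is not the authors' pipeline. Enumerating all 4^k possible k-mers gives every sRNA a fixed-length feature vector regardless of its length, which is what a Random Forest expects as input.

```python
from collections import Counter
from itertools import product


def normalize(seq):
    """Upper-case and map RNA U to DNA T so all reads share one alphabet."""
    return seq.upper().replace("U", "T")


def kmer_counts(seq, k):
    """Counts of all 4**k possible overlapping k-mers, zero-filled."""
    seq = normalize(seq)
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): observed["".join(p)] for p in product("ACGT", repeat=k)}


def end_features(seq, n=1):
    """One-hot indicators for the first and last n positions (5' and 3' ends)."""
    seq = normalize(seq)
    feats = {}
    for i in range(n):
        for base in "ACGT":
            feats[f"5p_{i + 1}_{base}"] = int(seq[i] == base)
            feats[f"3p_{i + 1}_{base}"] = int(seq[-(i + 1)] == base)
    return feats


def feature_vector(seq, k=2, n=1):
    """Merge k-mer and positional features into one fixed-size dict."""
    feats = kmer_counts(seq, k)
    feats.update(end_features(seq, n))
    return feats


if __name__ == "__main__":
    read = "UCGGAUCCAGAUCGAUUCGAG"  # hypothetical 21-nt sRNA
    features = feature_vector(read)
    print(len(features), features["5p_1_T"])
```

With k = 2 and n = 1 every read maps to 16 + 8 = 24 features, so rows built this way could be stacked into a matrix for any Random Forest implementation.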


2020 ◽  
Vol 31 (10) ◽  
pp. 1222-1235
Author(s):  
Abhishek Sheetal ◽  
Zhiyu Feng ◽  
Krishna Savani

How can we nudge people to not engage in unethical behaviors, such as hoarding and violating social-distancing guidelines, during the COVID-19 pandemic? Because past research on antecedents of unethical behavior has not provided a clear answer, we turned to machine learning to generate novel hypotheses. We trained a deep-learning model to predict whether or not World Values Survey respondents perceived unethical behaviors as justifiable, on the basis of their responses to 708 other items. The model identified optimism about the future of humanity as one of the top predictors of unethicality. A preregistered correlational study (N = 218 U.S. residents) conceptually replicated this finding. A preregistered experiment (N = 294 U.S. residents) provided causal support: Participants who read a scenario conveying optimism about the COVID-19 pandemic were less willing to justify hoarding and violating social-distancing guidelines than participants who read a scenario conveying pessimism. The findings suggest that optimism can help reduce unethicality, and they document the utility of machine-learning methods for generating novel hypotheses.
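The abstract does not say how the trained model ranked its predictors. One generic, model-agnostic technique is permutation importance: shuffle a single input column and measure how much accuracy drops. The sketch below assumes that technique and uses a hypothetical toy model on synthetic data; it is not the authors' network or the World Values Survey data.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where model(row) equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed=0):
    """Accuracy drop after shuffling each feature column independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild rows with only column j permuted; other columns are untouched.
        X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        importances.append(baseline - accuracy(model, X_perm, y))
    return importances


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Toy data: two features, but the label depends only on feature 0.
    X = [[data_rng.random(), data_rng.random()] for _ in range(200)]

    def toy_model(row):
        return int(row[0] > 0.5)

    y = [toy_model(row) for row in X]
    print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores feature 1, shuffling that column leaves accuracy unchanged (importance 0), while shuffling feature 0 lowers it. With 708 survey items, the same loop would yield a ranking from which top predictors can be read off.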


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Shixiang Zhang ◽  
Shuaiqi Huang ◽  
Hongkai Wu ◽  
Zicong Yang ◽  
Yinda Chen

Melanoma is considered one of the most dangerous human malignancies; it is diagnosed visually or by dermoscopic analysis and histopathological examination. However, because these traditional methods depend on human experience and are performed manually, their general usability in current clinical practice is severely limited. In this paper, a novel hybrid machine learning approach is proposed to identify melanoma for skin healthcare in various cases. The proposed approach combines established machine learning methods, including convolutional neural networks (CNNs), EfficientNet, and XGBoost supervised learning. First, a deep learning model is trained directly from raw pixels and image labels to classify skin lesions. Then, an XGBoost model is trained solely on various patient features to predict skin cancer. Finally, a diagnostic system composed of the deep learning model and the XGBoost model is developed to further improve prediction efficiency and accuracy. Unlike experience-based methods and purely image-based machine learning methods, the proposed approach builds on both deep learning and feature engineering. Experiments show that the hybrid model outperforms single models such as a standalone deep learning model or XGBoost model. Moreover, the data-driven nature of the approach can help establish guidelines for image analysis in other medical applications.
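The abstract does not specify how the image model and the patient-feature model are combined into one diagnostic system. A simple option is late fusion: take a weighted average of the two predicted probabilities, with the weight tuned on validation data. The sketch below assumes that scheme; the function names, weight grid, and threshold are hypothetical, and each model is represented only by its output probabilities.

```python
def fuse_probabilities(p_image, p_tabular, w_image=0.5):
    """Late fusion: weighted average of two models' melanoma probabilities."""
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must lie in [0, 1]")
    return w_image * p_image + (1.0 - w_image) * p_tabular


def choose_weight(p_image, p_tabular, labels, threshold=0.5, steps=10):
    """Grid-search the image-model weight that maximizes validation accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        preds = [int(fuse_probabilities(a, b, w) >= threshold)
                 for a, b in zip(p_image, p_tabular)]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # keep the first weight reaching the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc


if __name__ == "__main__":
    # Hypothetical validation outputs: CNN probabilities, XGBoost probabilities, labels.
    cnn = [0.9, 0.1, 0.8, 0.2]
    xgb = [0.1, 0.9, 0.2, 0.8]
    print(choose_weight(cnn, xgb, [1, 0, 1, 0]))
```

Averaging probabilities keeps the two models independent, so either can be retrained without touching the other; a learned stacking model is a common alternative when more validation data is available.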


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kai-Yao Huang ◽  
Justin Bo-Kai Hsu ◽  
Tzong-Yi Lee

Abstract: Succinylation is a type of protein post-translational modification (PTM) that can play important roles in a variety of cellular processes. Owing to the increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylation sites on proteins. However, most of these tools predict succinylation sites using traditional machine learning methods. Hence, this work aimed to predict succinylation sites with a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of the substrate site specificity of succinylation through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and the position-specific scoring matrix (PSSM). Additionally, maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. In ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes achieved the best predictive performance and also outperformed traditional machine-learning methods. Moreover, an independent testing dataset with no overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model achieved 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and a Matthews correlation coefficient (MCC) of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.
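The four reported test-set figures all derive from one confusion matrix. Working backwards from 218 positives and 2621 negatives, 184 true positives and 2280 true negatives reproduce the stated sensitivity, specificity, and accuracy; these counts are an inference from the percentages, not numbers given in the abstract. The sketch recomputes the metrics, including MCC, from those inferred counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on positive (succinylated) sites
    specificity = tn / (tn + fp)  # recall on negative sites
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Counts inferred from the reported rates (218 positives, 2621 negatives);
    # the abstract itself does not list them.
    metrics = binary_metrics(tp=184, fn=34, tn=2280, fp=341)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Sensitivity, specificity, and accuracy match the reported 84.40%, 86.99%, and 86.79%. MCC comes out at about 0.4896, so the reported 0.489 presumably reflects truncation rather than rounding. The moderate MCC alongside high accuracy reflects the roughly 1:12 class imbalance, which accuracy alone hides.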


JAMIA Open ◽  
2020 ◽  
Vol 3 (2) ◽  
pp. 252-260 ◽  
Author(s):  
Armando D Bedoya ◽  
Joseph Futoma ◽  
Meredith E Clement ◽  
Kristin Corey ◽  
Nathan Brajer ◽  
...  

Abstract
Objective: Determine whether deep learning detects sepsis earlier and more accurately than other models, and evaluate model performance using implementation-oriented metrics that simulate clinical practice.
Materials and Methods: We trained, and internally and temporally validated, a deep learning model (multi-output Gaussian process and recurrent neural network [MGP–RNN]) to detect sepsis using encounters from adult hospitalized patients at a large tertiary academic center. Sepsis was defined as the presence of 2 or more systemic inflammatory response syndrome (SIRS) criteria, a blood culture order, and at least one element of end-organ failure. The training dataset included demographics, comorbidities, vital signs, medication administrations, and labs from October 1, 2014 to December 1, 2015, while the temporal validation dataset covered March 1, 2018 to August 31, 2018. Comparisons were made to 3 machine learning methods (random forest [RF], Cox regression [CR], and penalized logistic regression [PLR]) and 3 clinical scores used to detect sepsis (SIRS, quick Sequential Organ Failure Assessment [qSOFA], and National Early Warning Score [NEWS]). Traditional discrimination statistics such as the C-statistic, as well as metrics aligned with operational implementation, were assessed.
Results: The training set and internal validation included 42 979 encounters, while the temporal validation set included 39 786 encounters. The C-statistic for predicting sepsis within 4 h of onset was 0.88 for the MGP–RNN, compared with 0.836 for RF, 0.849 for CR, 0.822 for PLR, 0.756 for SIRS, 0.619 for NEWS, and 0.481 for qSOFA. The MGP–RNN detected sepsis a median of 5 h in advance. Temporal validation continued to show the MGP–RNN outperforming all 7 clinical risk score and machine learning comparisons.
Conclusions: We developed and validated a novel deep learning model to detect sepsis. Using our data elements and feature set, our modeling approach outperformed other machine learning methods and clinical scores.
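The C-statistic reported above is the area under the ROC curve: the probability that a randomly chosen sepsis encounter receives a higher risk score than a randomly chosen non-sepsis encounter, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name and toy scores are hypothetical, and the quadratic loop is meant for illustration, not as the authors' evaluation code.

```python
def c_statistic(scores, labels):
    """Concordance (ROC AUC): P(random positive outscores random negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0  # correctly ordered pair
            elif p == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy risk scores; 1 = sepsis, 0 = no sepsis (hypothetical data).
    print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Under this definition 0.5 is chance level, so the 0.481 reported for qSOFA means it ranked encounters slightly worse than chance in this setting. For tens of thousands of encounters, a rank-based (Mann–Whitney U) formulation or a library routine such as scikit-learn's `roc_auc_score` avoids the quadratic loop.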

