Predicting efficiency of writing short sequences into the genome using prime editing

Mapping Intimacies ◽

10.1101/2021.11.10.468024 ◽

2021 ◽

Author(s):

Jonas Koeppel ◽

Elin Madli Peets ◽

Juliane Weller ◽

Ananth Pallaseni ◽

Fabio Liberante ◽

...

Keyword(s):

Machine Learning ◽

Insertion Sequence ◽

Nucleotide Composition ◽

Sequence Length ◽

Human Cell Lines ◽

Short Sequence ◽

Machine Learning Model ◽

DNA Insertion ◽

Protein Tagging ◽

Insertion Frequency

Any short sequence can be precisely written into a selected genomic target using prime editing. This ability facilitates protein tagging, correction of pathogenic deletions, and many other exciting applications. However, it remains unclear what types of sequences prime editors can efficiently insert, and how to choose optimal reagents for a desired outcome. To characterize features that influence insertion efficiency, we designed a library of 2,666 sequences up to 69 nt in length and measured the frequency of their insertion into four genomic sites in three human cell lines, using different prime editor systems. We discover that insertion sequence length, nucleotide composition and secondary structure all affect insertion rates, and that mismatch repair proficiency is a strong determinant for the shortest insertions. Combining the sequence and repair features into a machine learning model, we can predict insertion frequency for new sequences with R = 0.69. The tools we provide allow users to choose optimal constructs for DNA insertion using prime editing.
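
As an illustration of the simplest features named in this abstract (insertion length and nucleotide composition), the sketch below computes them for one insertion sequence. The function name `sequence_features` and the feature names are hypothetical and not taken from the paper; secondary structure and mismatch repair status, which the abstract also mentions, are not modelled here.

```python
from collections import Counter


def sequence_features(insert_seq: str) -> dict:
    """Length and nucleotide-composition features for one insertion sequence."""
    seq = insert_seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty DNA sequence over A, C, G, T")
    counts = Counter(seq)
    n = len(seq)
    features = {"length": n}
    # Fraction of each base in the insertion.
    for base in "ACGT":
        features[f"frac_{base}"] = counts[base] / n
    # GC content is a common one-number summary of composition.
    features["gc_content"] = (counts["G"] + counts["C"]) / n
    return features
```

Feature dictionaries like this could be fed to any regression model; which model and features the authors actually used is not stated in the abstract.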

Design of Machine Learning Model for Urban Planning and Management Improvement

International Journal of Performability Engineering ◽

10.23940/ijpe.20.06.p14.958967 ◽

2020 ◽

Vol 16 (6) ◽

pp. 958 ◽

Cited By ~ 1

Author(s):

Zhou Jiafeng ◽

Liu Tian ◽

Zou Lin

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Learning Model ◽

Planning And Management ◽

Machine Learning Model ◽

Urban Planning And Management ◽

Management Improvement

A Novel Machine Learning Model for Early Operational Anomaly Detection Using LWD/MWD Data

10.2523/iptc-19230-ms ◽

2019 ◽

Author(s):

Mohammed Al-Ghazal ◽

Viranchi Vedpathak

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Learning Model ◽

Machine Learning Model

Machine Learning Accelerated Genetic Algorithms for Computational Materials Search

10.26434/chemrxiv.7411172 ◽

2018 ◽

Author(s):

Steen Lysgaard ◽

Paul C. Jennings ◽

Jens Strabo Hummelshøj ◽

Thomas Bligaard ◽

Tejs Vegge

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Genetic Algorithms ◽

Au Nanoparticles ◽

Learning Model ◽

Energy Calculations ◽

Atomic Distribution ◽

Machine Learning Model ◽

Fold Reduction ◽

Computational Materials

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction of required energy calculations compared to a traditional GA.
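
The general idea of a surrogate fitness evaluator can be sketched on a toy problem: many offspring are ranked with a cheap model, and only the most promising ones receive the expensive evaluation. Everything below is an assumption made for illustration: the bit-string objective stands in for an energy calculation, and the 1-nearest-neighbour surrogate is a deliberately simple stand-in, not the model used in MLaGA.

```python
import random


def expensive_energy(bits):
    """Toy stand-in for a costly energy calculation (lower is better)."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a == b)


def surrogate_energy(bits, evaluated):
    """Cheap estimate: energy of the closest already-evaluated candidate."""
    nearest = min(evaluated, key=lambda known: sum(x != y for x, y in zip(bits, known)))
    return evaluated[nearest]


def make_child(parents, rng):
    """One-point crossover of two parents followed by a single bit flip."""
    a, b = rng.choice(parents), rng.choice(parents)
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    flip = rng.randrange(len(child))
    child[flip] = 1 - child[flip]
    return tuple(child)


def surrogate_ga(n_bits=12, pop_size=4, generations=20, offspring=30,
                 evals_per_gen=3, seed=0):
    rng = random.Random(seed)
    evaluated = {}  # candidate -> energy from the expensive function
    for _ in range(pop_size):
        individual = tuple(rng.randint(0, 1) for _ in range(n_bits))
        evaluated[individual] = expensive_energy(individual)
    for _ in range(generations):
        parents = sorted(evaluated, key=evaluated.get)[:pop_size]
        candidates = {make_child(parents, rng) for _ in range(offspring)}
        candidates.difference_update(evaluated)  # skip known candidates
        # Rank with the surrogate; spend expensive calls only on the top few.
        ranked = sorted(candidates, key=lambda c: surrogate_energy(c, evaluated))
        for candidate in ranked[:evals_per_gen]:
            evaluated[candidate] = expensive_energy(candidate)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)
```

With these defaults, each generation proposes up to 30 offspring but makes at most 3 expensive evaluations, which is where the saving in energy calculations would come from.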

BAND NN: A Deep Learning Framework For Energy Prediction and Geometry Optimization of Organic Small Molecules

10.26434/chemrxiv.9763094 ◽

2019 ◽

Author(s):

Siddhartha Laghuvarapu ◽

Yashaswi Pathak ◽

U. Deva Priyakumar

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Geometry Optimization ◽

DFT Methods ◽

Energy Prediction ◽

Machine Learning Model ◽

Equilibrium Structures ◽

High Level ◽

Non Equilibrium

Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require atomic positions obtained from geometry optimizations using high-level QM/DFT methods as input in order to predict the energies, and do not allow for geometry optimization. In this paper, a transferable and molecule-size-independent machine learning model (BAND NN) based on a chemically intuitive representation inspired by molecular mechanics force fields is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N) and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations that span the conformational, configurational and reaction space. The transferability of this model to systems larger than the ones in the dataset is demonstrated by performing calculations on select large molecules. Importantly, with the BAND NN model it is possible to perform geometry optimizations starting from non-equilibrium structures while also predicting their energies.
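
The energy decomposition described in this abstract can be written schematically as follows. The symbols are assumptions introduced here for illustration (the abstract names only the four contribution types), and the functional form of each term is not given in the abstract.

```latex
\[
E_{\text{atomization}} \approx
  \sum_{b \in \text{bonds}} E_B(b)
  + \sum_{a \in \text{angles}} E_A(a)
  + \sum_{n \in \text{nonbonds}} E_N(n)
  + \sum_{d \in \text{dihedrals}} E_D(d)
\]
```

Because each term depends only on a local structural unit, a sum of this kind does not depend on the size of the molecule, which is consistent with the transferability claim in the abstract.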

A Novel Amino Acid Sequence-based Computational Approach to Predicting Cell-penetrating Peptides

Current Computer-Aided Drug Design ◽

10.2174/1573409914666180925100355 ◽

2019 ◽

Vol 15 (3) ◽

pp. 206-211 ◽

Cited By ~ 2

Author(s):

Jihui Tang ◽

Jie Ning ◽

Xiaoyan Liu ◽

Baoming Wu ◽

Rongfeng Hu

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Position ◽

Cell Penetrating Peptides ◽

Support Vector ◽

Cell Penetration ◽

Drug Candidates ◽

Machine Learning Model ◽

Cell Penetrating ◽

Novel Method

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of cell-penetrating peptides (CPPs). We used orthogonal encoding to encode the amino acids, treating each amino acid position as one variable. IBM SPSS Modeler and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that a Support Vector Machine (SVM) model was suitable for predicting membrane-penetrating capability. To improve the model, the three longest CPPs were used to predict CPPs. The penetration capability can be predicted with an accuracy of close to 95%. Conclusion: The results indicate that using amino acid position as a variable can be a promising method for predicting the membrane-penetrating capability of CPPs.
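
One plausible reading of the orthogonal encoding described above is a one-hot vector per sequence position, concatenated over the whole peptide. The sketch below follows that reading; the function name `orthogonal_encode`, the zero padding for short peptides, and the 20-letter alphabet order are assumptions, not details from the paper.

```python
# The 20 standard amino acids in one-letter code (order chosen arbitrarily).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_encode(peptide: str, max_len: int) -> list:
    """One-hot encode each position; positions past the peptide end stay all zero."""
    peptide = peptide.upper()
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")
    vector = []
    for pos in range(max_len):
        block = [0] * len(AMINO_ACIDS)
        if pos < len(peptide):
            # str.index raises ValueError for a non-standard residue.
            block[AMINO_ACIDS.index(peptide[pos])] = 1
        vector.extend(block)
    return vector
```

Each peptide then becomes a fixed-length vector of `20 * max_len` values that a classifier such as an SVM could take as input.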

An Efficient Machine Learning Model for Prediction of Acute Myocardial Infarction

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will help medical professionals predict myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data are used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree), and the remaining 30% are used to evaluate the output of the machine learning model using the confusion matrix, classifier accuracy, precision, sensitivity, F1-score and AUC-ROC curve as performance metrics. Results: The outputs of the classifiers were evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves in Figures 4 and 5 show that logistic regression achieves a good AUC-ROC score of around 70% compared with the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that the proposed machine learning model can act as a decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models, and that it can predict the presence of acute myocardial infarction in humans using heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
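
Two of the preprocessing steps named in the Methods, mean imputation and a 70/30 train/test partition, can be sketched in plain Python as below. The function names and the use of `None` to mark a missing value are assumptions made for this example; the PCA step and the four classifiers are omitted, and the paper's actual implementation is not described in the abstract.

```python
import random
from statistics import fmean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(fmean(observed))  # raises if a column has no observed values
    return [[means[j] if value is None else value for j, value in enumerate(row)]
            for row in rows]


def split_train_test(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the sample indices, then cut them into training and test parts."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(round(train_fraction * len(rows)))
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

In this sketch the column means are computed on the full data set before splitting, as the abstract's ordering suggests; computing them on the training part only is the usual way to avoid leaking test information.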

Smart-ML: A System for Machine Learning Model Exploration using Pipeline Graph

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378082 ◽

2020 ◽

Author(s):

Dhaval Patel ◽

Shrey Shrivastava ◽

Wesley Gifford ◽

Stuart Siegel ◽

Jayant Kalagnanam ◽

...

Keyword(s):

Machine Learning ◽

Learning Model ◽

Machine Learning Model

Machine Learning Prediction of SARS-CoV-2 Polymerase Chain Reaction Results with Routine Blood Tests

Laboratory Medicine ◽

10.1093/labmed/lmaa111 ◽

2020 ◽

Author(s):

Thomas Tschoellitsch ◽

Martin Dünser ◽

Carl Böck ◽

Karin Schwarzbauer ◽

Jens Meier

Keyword(s):

Machine Learning ◽

Polymerase Chain Reaction ◽

Characteristic Curve ◽

Cohort Analysis ◽

RT-PCR ◽

Chain Reaction ◽

Blood Tests ◽

Routine Blood ◽

Machine Learning Model ◽

Polymerase Chain

Objective: The diagnosis of COVID-19 is based on the detection of SARS-CoV-2 in respiratory secretions, blood, or stool. Currently, reverse transcription polymerase chain reaction (RT-PCR) is the most commonly used method to test for SARS-CoV-2. Methods: In this retrospective cohort analysis, we evaluated whether machine learning could exclude SARS-CoV-2 infection using routinely available laboratory values. A Random Forests algorithm with 1353 unique features was trained to predict the RT-PCR results. Results: Out of 12,848 patients undergoing SARS-CoV-2 testing, routine blood tests were simultaneously performed in 1528 patients. The machine learning model could predict SARS-CoV-2 test results with an accuracy of 86% and an area under the receiver operating characteristic curve of 0.90. Conclusion: Machine learning methods can reliably predict a negative SARS-CoV-2 RT-PCR test result using standard blood tests.
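
The two metrics reported in this abstract, accuracy and area under the receiver operating characteristic curve (AUC), can be computed as in the minimal sketch below. The function names and the toy data in the usage note are illustrative assumptions; nothing here reproduces the study's data or model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, because three of the four positive-negative pairs are ranked correctly.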

An efficient machine learning model for malicious activities recognition in water‐based industrial internet of things

Security and Privacy ◽

10.1002/spy2.154 ◽

2021 ◽

Author(s):

Gamal E. I. Selim ◽

Ezz El‐Din Hemdan ◽

Ahmed M. Shehata ◽

Nawal A. El‐Fishawy

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Learning Model ◽

Industrial Internet Of Things ◽

Industrial Internet ◽

Machine Learning Model ◽

Water Based ◽

Efficient Machine

Predicting Fraud Victimization Using Classical Machine Learning

Entropy ◽

10.3390/e23030300 ◽

2021 ◽

Vol 23 (3) ◽

pp. 300

Author(s):

Mark Lokanan ◽

Susan Liu

Keyword(s):

Machine Learning ◽

Financial Literacy ◽

Learning Algorithm ◽

Demographic Characteristics ◽

Financial Knowledge ◽

The Past ◽

Machine Learning Model ◽

Long Time ◽

Regulatory Organization ◽

Investment Fraud

Protecting financial consumers from investment fraud has been a recurring problem in Canada. The purpose of this paper is to predict the demographic characteristics of investors who are likely to be victims of investment fraud. Data for this paper came from the Investment Industry Regulatory Organization of Canada’s (IIROC) database between January of 2009 and December of 2019. In total, 4575 investors were coded as victims of investment fraud. The study employed a machine-learning algorithm to predict the probability of fraud victimization. The machine learning model deployed in this paper predicted the typical demographic profile of fraud victims as investors who classify as female, have poor financial knowledge, know the advisor from the past, and are retired. Investors who are characterized as having limited financial literacy but a long-time relationship with their advisor have reduced probabilities of being victimized. However, male investors with low or moderate-level investment knowledge were more likely to be preyed upon by their investment advisors. While not statistically significant, older adults, in general, are at greater risk of being victimized. The findings from this paper can be used by Canadian self-regulatory organizations and securities commissions to inform their investors’ protection mandates.