Directed Evolution of a Selective and Sensitive Serotonin Biosensor Via Machine Learning

Author(s):  
Elizabeth Unger ◽  
Jacob Pearson Keller ◽  
Michael Altermatt ◽  
Ruqiang Liang ◽  
Zi Yao ◽  
...  
2021 ◽  
Author(s):  
Yutaka Saito ◽  
Misaki Oikawa ◽  
Takumi Sato ◽  
Hikaru Nakazawa ◽  
Tomoyuki Ito ◽  
...  

Machine learning (ML) is becoming an attractive tool in mutagenesis-based protein engineering because of its ability to design a variant library containing proteins with a desired function. However, it remains unclear how ML guides directed evolution in sequence space depending on the composition of training data. Here, we present an ML-guided directed evolution study of an enzyme to investigate the effects of a known "highly positive" variant (i.e., a variant known to have high enzyme activity) in training data. We performed two separate series of ML-guided directed evolution of Sortase A with and without a known highly positive variant called 5M in training data. In each series, two rounds of ML were conducted: variants predicted by the first round were experimentally evaluated and used as additional training data for the second-round prediction. The improvements in enzyme activity were comparable between the two series, both achieving enzyme activity 2.2-2.5 times higher than 5M. Intriguingly, the sequences of the improved variants were largely different between the two series, indicating that ML guided the directed evolution to distinct regions of sequence space depending on the presence/absence of the highly positive variant in the training data. This suggests that the sequence diversity of improved variants can be expanded not only by conventional ML using the whole training data, but also by ML using a subset of the training data even when it lacks highly positive variants. In summary, this study demonstrates the importance of regulating the composition of training data in ML-guided directed evolution.
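The two-round procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the `assay` function is a hypothetical stand-in for the wet-lab activity measurement, the per-site additive model in `fit_additive` is a deliberately simple surrogate chosen for brevity, and the alphabet, sequence length, and batch size are invented toy values.

```python
import itertools
import random

ALPHABET = "ACDE"   # toy alphabet (assumption, not from the study)
LENGTH = 3          # toy sequence length (assumption)
LIBRARY = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]


def assay(seq):
    """Hypothetical stand-in for an experimental activity measurement."""
    target = "DEC"  # arbitrary toy optimum
    return sum(1.0 for a, b in zip(seq, target) if a == b)


def fit_additive(train):
    """Fit a simple surrogate: mean measured activity per (site, residue)."""
    sums, counts = {}, {}
    for seq, activity in train.items():
        for site, aa in enumerate(seq):
            sums[(site, aa)] = sums.get((site, aa), 0.0) + activity
            counts[(site, aa)] = counts.get((site, aa), 0) + 1
    overall = sum(train.values()) / len(train)

    def predict(seq):
        # Unseen (site, residue) pairs fall back to the overall mean.
        return sum(
            sums[(site, aa)] / counts[(site, aa)] if (site, aa) in counts else overall
            for site, aa in enumerate(seq)
        )

    return predict


def run_series(initial_train, rounds=2, batch=5):
    """Each round: fit, pick top predicted unseen variants, assay, add to training."""
    train = dict(initial_train)
    for _ in range(rounds):
        predict = fit_additive(train)
        candidates = [s for s in LIBRARY if s not in train]
        top = sorted(candidates, key=lambda s: (-predict(s), s))[:batch]
        for seq in top:
            train[seq] = assay(seq)
    return train


rng = random.Random(0)
initial = {seq: assay(seq) for seq in rng.sample(LIBRARY, 10)}
result = run_series(initial)
```

Running `run_series` twice, once on a training set that includes a known high-activity variant and once on a set with it removed, would mirror the two series compared in the abstract.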


2020 ◽  
Author(s):  
Bruce J. Wittmann ◽  
Yisong Yue ◽  
Frances H. Arnold

Due to screening limitations, in directed evolution (DE) of proteins it is rarely feasible to fully evaluate combinatorial mutant libraries made by mutagenesis at multiple sites. Instead, DE often involves a single-step greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. However, because the effects of a mutation can depend on the presence or absence of other mutations, the efficiency and effectiveness of a single-step greedy walk is influenced by both the starting variant and the order in which beneficial mutations are identified: the process is path-dependent. We recently demonstrated a path-independent machine learning-assisted approach to directed evolution (MLDE) that allows in silico screening of full combinatorial libraries made by simultaneous saturation mutagenesis, thus explicitly capturing the effects of cooperative mutations and bypassing the path-dependence that can limit greedy optimization. Here, we thoroughly investigate and optimize an MLDE workflow by testing a number of design considerations of the MLDE pipeline. Specifically, we (1) test the effects of different encoding strategies on MLDE efficiency, (2) integrate new models and a training procedure more amenable to protein engineering tasks, and (3) incorporate training set design strategies to avoid information-poor low-fitness protein variants ("holes") in the training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape of protein G domain B1 (GB1), the resulting focused training MLDE (ftMLDE) protocol achieved the global fitness maximum up to 92% of the time at a total screening burden of 470 variants.
In contrast, minimal-screening-burden single-step greedy optimization over the GB1 fitness landscape reached the global maximum just 1.2% of the time; ftMLDE matching this minimal screening burden (80 total variants) achieved the global optimum up to 9.6% of the time with a 49% higher expected maximum fitness achieved. To facilitate further development of MLDE, we present the MLDE software package (https://github.com/fhalab/MLDE), which is designed for use by protein engineers without computational or machine learning expertise.
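The path dependence of single-step greedy optimization can be illustrated with a small sketch. The two-site, three-letter landscape below is invented purely for illustration (it is not the GB1 data), and `exhaustive_best` looks up true fitness values directly, whereas MLDE would rank the combinatorial library using model predictions.

```python
ALPHABET = "ABC"

# Toy epistatic landscape (hypothetical values): the best residue at one
# site depends on which residue is present at the other site.
FITNESS = {
    "AA": 1.0, "BA": 2.0, "CA": 1.5,
    "AB": 0.5, "BB": 1.0, "CB": 0.3,
    "AC": 1.2, "BC": 2.1, "CC": 5.0,
}


def greedy_walk(start, site_order):
    """Single-step greedy walk: at each site, fix the best single mutation."""
    current = start
    for site in site_order:
        variants = [current[:site] + aa + current[site + 1:] for aa in ALPHABET]
        current = max(variants, key=FITNESS.__getitem__)
    return current


def exhaustive_best():
    """Evaluate the full combinatorial library at once (path-independent)."""
    return max(FITNESS, key=FITNESS.get)


walk_site0_first = greedy_walk("AA", (0, 1))  # ends at "BC", a local optimum
walk_site1_first = greedy_walk("AA", (1, 0))  # ends at "CC", the global optimum
best_overall = exhaustive_best()              # "CC"
```

Starting from the same variant, the two site orders end at different sequences, which is the path dependence the abstract describes; scoring the whole library at once removes the dependence on order.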


2018 ◽  
Vol 7 (9) ◽  
pp. 2014-2022 ◽  
Author(s):  
Yutaka Saito ◽  
Misaki Oikawa ◽  
Hikaru Nakazawa ◽  
Teppei Niide ◽  
Tomoshi Kameda ◽  
...  

2020 ◽  
Vol 118 (3) ◽  
pp. 339a
Author(s):  
Yutaka Saito ◽  
Misaki Oikawa ◽  
Hikaru Nakazawa ◽  
Takumi Sato ◽  
Tomoshi Kameda ◽  
...  

2021 ◽  
Vol 69 ◽  
pp. 11-18 ◽  
Author(s):  
Bruce J Wittmann ◽  
Kadina E Johnston ◽  
Zachary Wu ◽  
Frances H Arnold

2022 ◽  
Author(s):  
Arjun Gupta ◽  
Sangeeta Agrawal

Globally, nearly a million plastic bottles are produced every minute (1). These non-biodegradable plastic products are composed of polyethylene terephthalate (PET). In 2016, researchers discovered PETase, an enzyme from the bacterium Ideonella sakaiensis which breaks down PET and nonbiodegradable plastic. However, PETase has low efficiency at high temperatures. In this project, we optimized the rate of PET degradation by PETase by designing new mutant enzymes which could break down PET much faster than PETase, which is currently the gold standard. We used machine learning (ML) guided directed evolution to modify the PETase enzyme to have a higher optimal temperature (Topt), which would allow the enzyme to degrade PET more efficiently. First, we trained three machine learning models to predict Topt with high performance, including Logistic Regression, Linear Regression and Random Forest. We then used Random Forest to perform ML-guided directed evolution. Our algorithm generated hundreds of mutants of PETase and screened them using Random Forest to select mutants with the highest Topt, and then used the top mutants as the enzyme being mutated. After 1000 iterations, we produced a new mutant of PETase with Topt of 71.38 °C. We also produced a new mutant enzyme after 29 iterations with Topt of 61.3 °C. To ensure these mutant enzymes would remain stable, we predicted their melting temperatures using an external predictor and found the 29-iteration mutant had improved thermostability over PETase. Our research is significant because using our approach and algorithm, scientists can optimize additional enzymes for improved efficiency.
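The mutate-and-screen loop described in this abstract might look roughly like the sketch below. Everything here is an assumption made for illustration: `surrogate_topt` is a toy scoring function standing in for the trained Random Forest predictor, the starting sequence is an arbitrary short peptide rather than PETase, and the parameter names and values are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def surrogate_topt(seq):
    """Toy stand-in for a trained Topt predictor (not biophysically meaningful)."""
    return 40.0 + sum(0.5 for aa in seq if aa in "PRE")


def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen position substituted."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(start, iterations=29, n_mutants=200, top_k=5, seed=0):
    """Generate mutants, screen with the surrogate, and reuse the top ones as parents."""
    rng = random.Random(seed)
    parents = [start]
    for _ in range(iterations):
        mutants = [mutate(rng.choice(parents), rng) for _ in range(n_mutants)]
        pool = set(parents + mutants)
        # Rank by predicted Topt; break ties alphabetically for determinism.
        ranked = sorted(pool, key=lambda s: (-surrogate_topt(s), s))
        parents = ranked[:top_k]
    return parents[0]


start_seq = "MKTAYIAKQR"  # arbitrary toy sequence (assumption)
best_seq = evolve(start_seq)
```

Because the current parents stay in the candidate pool at every iteration, the best predicted score never decreases from one iteration to the next. As the abstract notes, predicted gains still need an independent stability check, since the loop only optimizes what the surrogate model scores.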

