CLADE: Cluster learning-assisted directed evolution

Author(s):  
Yuchi Qiu ◽  
Jian Hu ◽  
Guo-Wei Wei

Abstract Directed evolution (DE), a strategy for protein engineering, optimizes protein properties (i.e. fitness) through expensive and time-consuming screening or selection of a large combinatorial sequence space. Machine learning-assisted directed evolution (MLDE), which screens variant properties in silico, can reduce the experimental burden. However, MLDE trained on small, randomly sampled sets of experimentally labeled data yields low hitting rates for the global maximal fitness. This work introduces a cluster learning-assisted directed evolution (CLADE) framework, designed particularly for systems without high-throughput screening assays, that combines sampling through hierarchical unsupervised clustering with supervised learning to guide protein engineering. Based on general biological information, CLADE splits the genetic combinatorial space into subspaces with heterogeneous evolutionary traits, which guides the selection of experimental sampling sets and the subsequent construction of supervised learning training sets. By virtually screening two four-site combinatorial fitness landscapes from protein G domain B1 (GB1) and PhoQ, CLADE consistently achieved a nearly 3-fold improvement in the global maximal fitness hitting rate over randomly sampled training data. CLADE can be easily applied to various biological systems and customized for systems with different throughput levels to maximize its accuracy and efficiency. It promises a significant impact on protein engineering.
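
The core loop described above, unsupervised clustering to stratify the sequence space followed by supervised in silico screening, can be summarized in a short sketch. The clustering algorithm, sampling budget, and regressor below are illustrative assumptions, not the authors' exact CLADE configuration.

```python
# Sketch of cluster-guided sampling for directed evolution (assumed setup,
# not the authors' exact CLADE configuration).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def clade_like_screen(encodings, measure_fitness, n_clusters=16,
                      samples_per_cluster=4, top_k=96, seed=0):
    """encodings: (n_variants, d) numeric encoding of the combinatorial library;
    measure_fitness: callable returning experimental fitness for given indices."""
    rng = np.random.default_rng(seed)
    # 1) Split the sequence space into subspaces by unsupervised clustering.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(encodings)
    # 2) Draw a small experimental sampling set from every cluster.
    train_idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c),
                   size=min(samples_per_cluster, int(np.sum(labels == c))),
                   replace=False)
        for c in range(n_clusters)
    ])
    y_train = measure_fitness(train_idx)
    # 3) Train a supervised model on the labeled subset and screen all variants in silico.
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(encodings[train_idx], y_train)
    scores = model.predict(encodings)
    # 4) Propose the top-ranked variants for the next experimental round.
    return np.argsort(scores)[::-1][:top_k]
```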

2014 ◽  
Vol 24 (38) ◽  
pp. 97
Author(s):  
Antonio Rico-Sulayes

<p align="justify">This article proposes the architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in the construction of dictionaries, unstructured data bases have been especially useful in providing information about lexical items frequencies and examples in use. However, when building specialized dictionaries, whose selection of lexical items does not rely on frequency, the use of these data bases gets restricted to a simple provider of examples. Even in this task, the information unstructured data bases provide may not be very useful when looking for specialized uses of lexical items with various meanings and very long lists of results. In the face of this problem, long lists of hits can be rescored based on a supervised learning model that relies on previously helpful results. The allocation of a vast set of high quality training data for this rescoring system is reported here. Finally, the architecture of sucha system,an unprecedented tool in specialized lexicography, is proposed.</p>


Micromachines ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 734 ◽  
Author(s):  
Lindong Weng ◽  
James E. Spoonamore

Protein engineering—the process of developing useful or valuable proteins—has successfully created a wide range of proteins tailored to specific agricultural, industrial, and biomedical applications. Protein engineering may rely on rational techniques informed by structural models, phylogenetic information, or computational methods, or on random techniques such as chemical mutation, DNA shuffling, error-prone polymerase chain reaction (PCR), etc. The increasing capabilities of rational protein design, coupled with the rapid production of large variant libraries, have seriously challenged the capacity of traditional screening and selection techniques. Similarly, random approaches based on directed evolution, which relies on the Darwinian principles of mutation and selection to steer proteins toward desired traits, also require the screening of very large libraries of mutants to be truly effective. For either rational or random approaches, the highest possible screening throughput facilitates efficient protein engineering strategies. In the last decade, high-throughput screening (HTS) for protein engineering has increasingly leveraged the emerging technology of droplet microfluidics. Droplet microfluidics, featuring controlled formation and manipulation of nano- to femtoliter droplets of one fluid phase in another, has presented a new paradigm for screening, providing increased throughput, reduced reagent volume, and scalability. We review here the recent droplet microfluidics-based HTS systems developed for protein engineering, particularly directed evolution. This review can also serve as a tutorial guide for protein engineers and molecular biologists who need a droplet microfluidics-based HTS system for their specific applications but may not have prior knowledge of microfluidics. Finally, several challenges and opportunities are identified to motivate the continued innovation of microfluidics with implications for protein engineering.


2021 ◽  
Vol 12 ◽  
Author(s):  
Deniz Akdemir ◽  
Simon Rio ◽  
Julio Isidro y Sánchez

A major barrier to the wider use of supervised learning in emerging applications, such as genomic selection, is the lack of sufficient and representative labeled data to train prediction models. The amount and quality of labeled training data in many applications is usually limited, and therefore careful selection of the training examples to be labeled can improve accuracy in predictive learning tasks. In this paper, we present an R package, TrainSel, which provides flexible, efficient, and easy-to-use tools for the selection of training populations (STP). We illustrate its use, performance, and potential in four different supervised learning applications within and outside of the plant breeding area.
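
TrainSel itself is an R package, so the snippet below is only a Python illustration of the underlying idea, choosing a training set that spreads across the candidate population; the greedy maximin criterion is an assumption and does not reflect TrainSel's actual API or optimization method.

```python
# Illustration of training-population selection via a greedy maximin design
# (an assumed criterion for illustration only; not TrainSel's API or algorithm).
import numpy as np

def greedy_training_set(X, n_train):
    """Pick n_train candidates that stay far apart in feature space."""
    # Start from the candidate farthest from the population mean.
    chosen = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    for _ in range(n_train - 1):
        # Distance of every candidate to its nearest already-selected candidate.
        d = np.min(np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2), axis=1)
        d[chosen] = -np.inf                  # never re-pick a selected candidate
        chosen.append(int(np.argmax(d)))     # add the candidate farthest from the set
    return np.array(chosen)
```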


2019 ◽  
Author(s):  
Huifang Xu ◽  
Weinan Liang ◽  
Linlin Ning ◽  
Yuanyuan Jiang ◽  
Wenxia Yang ◽  
...  

P450 fatty acid decarboxylases (FADCs) have recently been attracting considerable attention owing to their one-step direct production of industrially important 1-alkenes from biologically abundant feedstock free fatty acids under mild conditions. However, attempts to improve the catalytic activity of FADCs have met with little success. Protein engineering has been limited to selected residues and small mutant libraries due to the lack of an effective high-throughput screening (HTS) method. Here, we devise a catalase-deficient Escherichia coli host strain and report an HTS approach based on colorimetric detection of the H₂O₂-consumption activity of FADCs. Directed evolution enabled by this method has, for the first time, led to the effective identification of improved FADC variants for medium-chain 1-alkene production from both DNA shuffling and random mutagenesis libraries. Advantageously, this screening method can be extended to other enzymes that stoichiometrically utilize H₂O₂ as a co-substrate.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. Recent SSL methods have in common a strong reliance on the augmentation of unannotated data, which remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications consistently outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only on the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
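
For readers unfamiliar with FixMatch, the sketch below shows its unlabeled-data loss: a pseudo-label from a weakly augmented view supervises a strongly augmented view whenever the prediction is confident. The augmentation functions and confidence threshold are placeholders; the paper's audio-specific augmentation selection is not reproduced here.

```python
# Minimal sketch of the FixMatch unlabeled-data loss (augmentations and threshold
# are placeholders; the paper's audio-specific augmentation selection is not shown).
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_unlabeled, weak_aug, strong_aug, threshold=0.95):
    with torch.no_grad():
        # Pseudo-label from a weakly augmented view.
        probs = F.softmax(model(weak_aug(x_unlabeled)), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()       # keep only confident pseudo-labels
    # Enforce consistency on the strongly augmented view.
    logits_strong = model(strong_aug(x_unlabeled))
    per_example = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (per_example * mask).mean()
```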


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Aliaksei Vasilevich ◽  
Aurélie Carlier ◽  
David A. Winkler ◽  
Shantanu Singh ◽  
Jan de Boer

Abstract Natural evolution tackles optimization by producing many genetic variants and exposing these variants to selective pressure, resulting in the survival of the fittest. We use high-throughput screening of large libraries of materials with differing surface topographies to probe the interactions of implantable device coatings with cells and tissues. However, the vast size of the possible design parameter space precludes a brute-force approach to screening all topographical possibilities. Here, we took inspiration from nature to optimize material surface topographies using evolutionary algorithms. We show that successive cycles of material design, production, fitness assessment, selection, and mutation result in the optimization of biomaterial designs. Starting from a small selection of topographically designed surfaces that upregulate expression of an osteogenic marker, we used genetic crossover and random mutagenesis to generate new generations of topographies.
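
A toy sketch of one evolutionary cycle (fitness assessment, selection, crossover, mutation) is given below; the real-valued encoding, operators, and population sizes are generic placeholders rather than the study's actual topography parameterization.

```python
# Toy sketch of an evolutionary design loop (encoding and operators are generic
# placeholders, not the study's actual surface topography parameters).
import numpy as np

def evolve_designs(fitness, n_params, pop_size=30, n_generations=10,
                   mutation_rate=0.1, seed=0):
    """fitness: callable scoring one design, e.g. measured osteogenic marker expression."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_params))          # initial design parameters in [0, 1)
    for _ in range(n_generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Selection: keep the fittest half of the population as parents.
        parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]
        # Genetic crossover: mix parameters from two randomly chosen parents.
        i, j = rng.integers(len(parents), size=(2, pop_size))
        mask = rng.random((pop_size, n_params)) < 0.5
        pop = np.where(mask, parents[i], parents[j])
        # Random mutagenesis: perturb a small fraction of parameters.
        mutate = rng.random(pop.shape) < mutation_rate
        pop[mutate] = rng.random(int(mutate.sum()))
    best = int(np.argmax([fitness(ind) for ind in pop]))
    return pop[best]
```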


Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

For the past few years, deep learning (DL) robustness (i.e. the ability to maintain the same decision when inputs are subject to perturbations) has become a question of paramount importance, in particular in settings where misclassification can have dramatic consequences. To address this question, authors have proposed different approaches, such as adding regularizers or training with noisy examples. In this paper we introduce a regularizer based on the Laplacian of similarity graphs obtained from the representation of training data at each layer of the DL architecture. This regularizer penalizes large changes (across consecutive layers of the architecture) in the distance between examples of different classes, and as such enforces smooth variations of the class boundaries. We provide theoretical justification for this regularizer and demonstrate its effectiveness in improving robustness on classical supervised learning vision datasets for various types of perturbations. We also show that it can be combined with existing methods to increase overall robustness.
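
A rough sketch of one way to implement such a regularizer follows; the Gaussian similarity graph and the exact penalty form are assumptions for illustration, not the paper's precise formulation.

```python
# Sketch of a Laplacian-smoothness regularizer across consecutive layers
# (Gaussian similarity graph and penalty form are assumptions, not the paper's exact recipe).
import torch
import torch.nn.functional as F

def laplacian_smoothness(features, labels, n_classes, sigma=1.0):
    """Smoothness of one-hot class signals on a similarity graph built from one layer's
    (batch, dim) activations."""
    d = torch.cdist(features, features)                 # pairwise distances in this layer
    W = torch.exp(-d.pow(2) / (2 * sigma ** 2))         # Gaussian similarity graph
    L = torch.diag(W.sum(dim=1)) - W                    # graph Laplacian
    Y = F.one_hot(labels, n_classes).float()            # class indicator signals
    return torch.trace(Y.t() @ L @ Y) / features.shape[0]

def laplacian_regularizer(per_layer_features, labels, n_classes):
    """Penalize large changes in smoothness between consecutive layers."""
    s = [laplacian_smoothness(f, labels, n_classes) for f in per_layer_features]
    return sum((s[k + 1] - s[k]).abs() for k in range(len(s) - 1))
```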

