Expanding functional protein sequence space using generative adversarial networks

2019
Author(s): Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Jan Zrimec, ...

Abstract: De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering, owing to its broad spectrum of technological, scientific and medical applications. Currently, however, mapping protein sequence to protein function is neither computationally nor experimentally tangible [1,2]. Here we developed ProteinGAN, a specialised variant of the generative adversarial network [3] that is able to ‘learn’ natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase as a template enzyme, we show that 24% of the ProteinGAN-generated and experimentally tested sequences are soluble and display wild-type level catalytic activity in the tested conditions in vitro, even in highly mutated (>100 mutations) sequences. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse novel functional proteins within the allowed biological constraints of the sequence space.
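To make the ">100 mutations" figure concrete, the sketch below counts how many positions of a generated sequence differ from its closest natural sequence. It is only an illustration: the function names, the toy sequences, and the assumption that all sequences are already aligned to equal length are hypothetical and are not taken from the ProteinGAN paper.

```python
def count_mutations(a: str, b: str) -> int:
    """Number of positions at which two equal-length, pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_natural(generated: str, naturals: list[str]) -> tuple[str, int, float]:
    """Return the nearest natural sequence, the mutation count, and the identity."""
    best = min(naturals, key=lambda s: count_mutations(generated, s))
    n_mut = count_mutations(generated, best)
    identity = 1.0 - n_mut / len(generated)
    return best, n_mut, identity


# Toy sequences (hypothetical, far shorter than a real enzyme).
naturals = ["MKVAVLGA", "MKIAVLGS"]
best, n_mut, identity = closest_natural("MKIAVIGS", naturals)
print(best, n_mut, identity)
```

For real, unaligned sequences a pairwise or multiple sequence alignment would be needed before counting differences.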

2019
Author(s): Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen

Abstract. Motivation: With data on protein sequence and structure accumulating quickly, this study addresses the following question: to what extent can current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds? Results: We have developed novel deep generative models, constructed a low-dimensional and generalizable representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor as an oracle providing feedback. The resulting semi-supervised gcWGAN is assessed with the oracle over 100 novel folds not in the training set and found to achieve higher yields and cover 3.6 times more target folds than a competing data-driven method (cVAE). Assessed with a structure predictor over representative novel folds (including one not even part of the basis folds), gcWGAN designs are found to have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. gcWGAN explores uncharted sequence space to design proteins by learning from current sequence-structure data. The ultra-fast data-driven model can be a powerful addition to principle-driven design methods by generating seed designs or tailoring sequence space. Availability: Data and source codes will be available upon request. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s): Tileli Amimeur, Jeremy M. Shaver, Randal R. Ketchem, J. Alex Taylor, Rutilio H. Clark, ...

Abstract: We demonstrate the use of a Generative Adversarial Network (GAN), trained on a set of over 400,000 light and heavy chain human antibody sequences, to learn the rules of human antibody formation. The resulting model surpasses common in silico techniques by capturing residue diversity throughout the variable region, and is capable of generating extremely large, diverse libraries of novel antibodies that mimic somatically hypermutated human repertoire response. This method permits us to rationally design de novo humanoid antibody libraries with explicit control over various properties of our discovery library. Through transfer learning, we are able to bias the GAN to generate molecules with key properties of interest such as improved stability and developability, lower predicted MHC Class II binding, and specific complementarity-determining region (CDR) characteristics. These approaches also provide a mechanism to better study the complex relationships between antibody sequence and molecular behavior, both in vitro and in vivo. We validate our method by successfully expressing a proof-of-concept library of nearly 100,000 GAN-generated antibodies via phage display. We present the sequences and homology-model structures of example generated antibodies expressed in stable CHO pools and evaluated across multiple biophysical properties. The creation of discovery libraries using our in silico approach allows for the control of pharmaceutical properties such that these therapeutic antibodies can provide a more rapid and cost-effective response to biological threats.


2021, Vol 17 (11), pp. e1009555
Author(s): Nina G. Bozhanova, Joel M. Harp, Brian J. Bender, Alexey S. Gavrikov, Dmitry A. Gorbachev, ...

The use of unnatural fluorogenic molecules widely expands the palette of available genetically encoded fluorescent imaging tools through the design of fluorogen activating proteins (FAPs). While a handful of such probes are already available, each of them went through laborious cycles of in vitro screening and selection. Computational modeling approaches are advancing rapidly and have shown strong results in many applications, including de novo protein design. This suggests that the easier task of fine-tuning the fluorogen-binding properties of an already functional protein in silico should be readily achievable. To test this hypothesis, we used Rosetta for computational ligand docking followed by protein binding pocket redesign to further improve the previously described FAP DiB1, which is capable of binding to a BODIPY-like dye, M739. Despite an inaccurate initial docking of the chromophore, the incorporated mutations nevertheless improved multiple photophysical parameters as well as the overall performance of the tag. The designed protein, DiB-RM, shows higher brightness, localization precision, and apparent photostability in protein-PAINT super-resolution imaging compared to its parental variant DiB1. Moreover, DiB-RM can be cleaved to obtain an efficient split system with enhanced performance compared to a parental DiB-split system. The possible reasons for the inaccurate ligand binding pose prediction and its consequences for the outcome of the design experiment are further discussed.


2018, Vol 35 (14), pp. 2492-2494
Author(s): Tania Cuppens, Thomas E Ludwig, Pascal Trouvé, Emmanuelle Genin

Abstract. Summary: When analyzing sequence data, genetic variants are considered one by one, taking no account of whether or not they are found in the same individual. However, variant combinations might be key players in some diseases, as variants that are neutral on their own can become deleterious when combined. GEMPROT is a new analysis tool that takes a phased VCF file and visualizes the consequences of the genetic variants on the protein. At the level of an individual, the program shows the variants on each of the two protein sequences and the Pfam functional protein domains. When data on several individuals are available, GEMPROT lists the haplotypes found in the sample and can compare the haplotype distributions between different sub-groups of individuals. By offering a global visualization of the gene with the genetic variants present, GEMPROT makes it possible to better understand the impact of combinations of genetic variants on the protein sequence. Availability and implementation: GEMPROT is freely available at https://github.com/TaniaCuppens/GEMPROT. An on-line version is also available at http://med-laennec.univ-brest.fr/GEMPROT/. Supplementary information: Supplementary data are available at Bioinformatics online.
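The idea of reading variants per haplotype rather than one by one can be sketched as follows. This is not GEMPROT's code or API: the function names, the tuple layout for variants, and the toy coding sequence are assumptions made for illustration, and only single-nucleotide substitutions with phased genotypes such as `0|1` are handled.

```python
from itertools import product

# Standard genetic code, built from the usual compact TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon (trailing partial codon ignored)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def haplotype_proteins(ref_cds: str, variants) -> tuple[str, str]:
    """Apply phased SNVs to two copies of the reference CDS and translate both.

    Each variant is (0-based position, ref base, alt base, phased genotype).
    """
    haps = [list(ref_cds), list(ref_cds)]
    for pos, ref, alt, genotype in variants:
        if ref_cds[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for h, allele in enumerate(genotype.split("|")):
            if allele == "1":
                haps[h][pos] = alt
    return translate("".join(haps[0])), translate("".join(haps[1]))


# Toy example: two variants carried on different haplotypes.
ref = "ATGGCTGAAAAA"  # translates to M A E K
variants = [(3, "G", "T", "1|0"), (7, "A", "G", "0|1")]
print(haplotype_proteins(ref, variants))
```

Because the two substitutions sit on different haplotypes, each protein copy carries only one amino acid change; with genotypes `1|0` for both, a single copy would carry the two changes together, which is the kind of combination the abstract describes.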


2017
Author(s): Sean A. Higgins, Sorel Ouonkap, David F. Savage

Abstract: Comprehensive and programmable protein mutagenesis is critical for understanding structure-function relationships and improving protein function. However, current techniques enabling comprehensive protein mutagenesis are based on PCR and require in vitro reactions involving specialized protocols and reagents. This has complicated efforts to rapidly and reliably produce desired comprehensive protein libraries. Here we demonstrate that plasmid recombineering is a simple and robust in vivo method for the generation of protein mutants for both comprehensive library generation as well as programmable targeting of sequence space. Using the fluorescent protein iLOV as a model target, we build a complete mutagenesis library and find it to be specific and unbiased, detecting 99.8% of our intended mutations. We then develop a thermostability screen and utilize our comprehensive mutation data to rapidly construct a targeted and multiplexed library that identifies significantly improved variants, thus demonstrating rapid protein engineering in a simple one-pot protocol.


Science, 2020, Vol 370 (6521), pp. 1208-1214
Author(s): Thomas W. Linsky, Renan Vergara, Nuria Codina, Jorgen W. Nelson, Matthew J. Walker, ...

We developed a de novo protein design strategy to swiftly engineer decoys for neutralizing pathogens that exploit extracellular host proteins to infect the cell. Our pipeline allowed the design, validation, and optimization of de novo human angiotensin-converting enzyme 2 (hACE2) decoys to neutralize severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The best monovalent decoy, CTC-445.2, bound with low nanomolar affinity and high specificity to the receptor-binding domain (RBD) of the spike protein. Cryo–electron microscopy (cryo-EM) showed that the design is accurate and can simultaneously bind to all three RBDs of a single spike protein. Because the decoy replicates the spike protein target interface in hACE2, it is intrinsically resilient to viral mutational escape. A bivalent decoy, CTC-445.2d, showed ~10-fold improvement in binding. CTC-445.2d potently neutralized SARS-CoV-2 infection of cells in vitro, and a single intranasal prophylactic dose of decoy protected Syrian hamsters from a subsequent lethal SARS-CoV-2 challenge.


2018
Author(s): Patrick Willems, Alison Horne, Sofie Goormachtig, Ive De Smet, Alexander Botzki, ...

Summary: Posttranslational modifications (PTMs) of proteins are central in any kind of cellular signaling. Modern mass spectrometry technologies enable comprehensive identification and quantification of various PTMs. Given the increased number and types of mapped protein modifications, a database is necessary that simultaneously integrates and compares site-specific information for different PTMs, especially in plants, for which the available PTM data are poorly catalogued. Here, we present the Plant PTM Viewer (http://www.psb.ugent.be/PlantPTMViewer), an integrative PTM resource that comprises approximately 200,000 PTM sites for 17 types of protein modifications in plant proteins from five different species. The Plant PTM Viewer provides the user with a protein sequence overview in which the experimentally evidenced PTMs are highlighted together with functional protein domains or active site residues. The PTM sequence search tool can query PTM combinations in specific protein sequences, whereas the PTM BLAST tool searches for modified protein sequences to detect conserved PTMs in homologous sequences. Taken together, these tools make it easier to hypothesize the role and potential interplay of PTMs in specific proteins or within a broader systems biology context. The Plant PTM Viewer is an open repository that allows submission of mass spectrometry-based PTM data to keep pace with future plant PTM studies.


Molecules, 2020, Vol 25 (14), pp. 3250
Author(s): Eugene Lin, Chieh-Hsin Lin, Hsien-Yuan Lane

A growing body of evidence now suggests that artificial intelligence and machine learning techniques can serve as an indispensable foundation for the process of drug design and discovery. In light of the latest advancements in computing technologies, deep learning algorithms are being created during the development of clinically useful drugs for the treatment of a number of diseases. In this review, we focus on the latest developments in three particular arenas of drug design and discovery research using deep learning approaches, such as generative adversarial network (GAN) frameworks. Firstly, we review drug design and discovery studies that leverage various GAN techniques for one main application, de novo molecular design. In addition, we describe various GAN models used for the dimension reduction of single-cell data in the preclinical stage of the drug development pipeline. Furthermore, we describe several studies in de novo peptide and protein design using GAN frameworks. Moreover, we outline the limitations of previous drug design and discovery studies using GAN models. Finally, we present a discussion of directions and challenges for future research.


2020, Vol 74 (9), pp. 704-709
Author(s): Michael A. Nash

Protein sequences inhabit a discrete set in macromolecular space with incredible capacity to treat human disease. Despite our ability to program and manipulate protein sequences, the vast majority of protein development efforts are still done heuristically without a unified set of guiding principles. This article highlights work in understanding biophysical stability and function of proteins, developing new biophysical measurement tools and building high-throughput screening platforms to explore functional protein sequences. We highlight two primary areas. First, molecular biomechanics is a subfield concerned with the response of proteins to mechanical forces, and how we can leverage mechanical force to control protein function. The second subfield investigates the use of polymers and hydrogels in protein engineering and directed evolution in pursuit of new molecular systems with therapeutic applications. These two subdisciplines complement each other by shedding light onto sequence and structural features that can be used to impart stability into therapeutic proteins.

