RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network

The ability to perform de novo protein design will allow researchers to expand the variety of available proteins. By designing synthetic structures computationally, they can utilise more structures than those available in the Protein Data Bank, design structures that are not found in nature, or direct the design of proteins to acquire a specific desired structure. While some researchers attempt to design proteins from first physical and thermodynamic principles, we tested whether de novo helical protein design of just the backbone can be performed statistically using machine learning, by building a model that uses a long short-term memory (LSTM) architecture. The LSTM model used only the φ and ψ angles of each residue from an augmented dataset of only helical protein structures. Though the network’s generated backbone structures were not perfect, they were idealised and evaluated after generation: the non-ideal structures were filtered out and the adequate structures kept. The results were successful in developing a logical, rigid, compact, helical protein backbone topology. This paper is a proof of concept showing that a novel helical backbone topology can be generated with an LSTM neural network architecture using only the φ and ψ angles as features. The next step is to take these backbone topologies and sequence-design them to form complete protein structures.
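
The abstract describes each residue by its φ and ψ angles and mentions a post-generation step that discards non-ideal structures. As a rough illustration only (not the authors' filter, whose criteria are not given here), the sketch below represents a backbone as a list of (φ, ψ) pairs in degrees and keeps it when most residues fall inside an approximate right-handed α-helical region of the Ramachandran plot. The region bounds, the function names and the `min_fraction` threshold are all assumptions chosen for the example.

```python
# Approximate bounds (degrees) for the right-handed alpha-helical region of the
# Ramachandran plot. These are illustrative assumptions, not values from the paper.
HELIX_PHI = (-100.0, -30.0)
HELIX_PSI = (-80.0, -5.0)


def is_helical(phi, psi):
    """Return True if a single (phi, psi) pair lies inside the assumed helical box."""
    return (HELIX_PHI[0] <= phi <= HELIX_PHI[1]
            and HELIX_PSI[0] <= psi <= HELIX_PSI[1])


def helical_fraction(angles):
    """Fraction of residues whose (phi, psi) pair is helical; 0.0 for an empty backbone."""
    if not angles:
        return 0.0
    return sum(is_helical(phi, psi) for phi, psi in angles) / len(angles)


def keep_backbone(angles, min_fraction=0.8):
    """Hypothetical filter: keep a generated backbone if enough residues are helical."""
    return helical_fraction(angles) >= min_fraction


# Example: eight residues near ideal alpha-helix angles followed by two loop residues.
backbone = [(-57.0, -47.0)] * 8 + [(60.0, 45.0)] * 2
print(helical_fraction(backbone), keep_backbone(backbone))
```

A real evaluation would also need to look at the three-dimensional fold (compactness, clashes), which a per-residue angle check like this cannot capture.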

F1000Research ◽

10.12688/f1000research.22907.3 ◽

2020 ◽

Vol 9 ◽

pp. 298 ◽

Cited By ~ 1

Author(s):

Sari Sabban ◽

Mikhail Markovsky

Keyword(s):

Neural Network ◽

Protein Design ◽

Short Term Memory ◽

De Novo ◽

Protein Structures ◽

Protein Backbone ◽

Short Term ◽

Term Memory ◽

Helical Protein ◽

Long Short Term Memory

RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative adversarial neural network

F1000Research ◽

10.12688/f1000research.22907.1 ◽

2020 ◽

Vol 9 ◽

pp. 298 ◽

Cited By ~ 1

Author(s):

Sari Sabban ◽

Mikhail Markovsky

Keyword(s):

Protein Design ◽

Short Term Memory ◽

De Novo ◽

Protein Structures ◽

Protein Backbone ◽

Short Term ◽

Generative Adversarial Network ◽

Term Memory ◽

Helical Protein ◽

Long Short Term Memory

The ability to perform de novo protein design will allow researchers to expand the variety of available proteins. By designing synthetic structures computationally, they can utilise more structures than those available in the Protein Data Bank, design structures that are not found in nature, or direct the design of proteins to acquire a specific desired structure. While some researchers attempt to design proteins from first physical and thermodynamic principles, we tested whether de novo helical protein design of just the backbone can be performed statistically using machine learning, by building a model that uses a long short-term memory (LSTM) generative adversarial network (GAN) architecture. The LSTM-based GAN model used only the φ and ψ angles of each residue from an augmented dataset of only helical protein structures. Though the network’s generated backbone structures were not perfect, they were idealised and evaluated after generation: the non-ideal structures were filtered out and the adequate structures kept. The results were successful in developing a logical, rigid, compact, helical protein backbone topology. This paper is a proof of concept showing that a novel helical backbone topology can be generated with an LSTM-GAN architecture using only the φ and ψ angles as features. The next step is to take these backbone topologies and sequence-design them to form complete protein structures.

DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

10.26434/chemrxiv.11626098 ◽

2020 ◽

Author(s):

Yifei Qi ◽

John Z.H. Zhang

Keyword(s):

Neural Network ◽

Protein Design ◽

Protein Sequence ◽

Protein Structures ◽

Three Dimensional ◽

Search Space ◽

Computational Protein Design ◽

Data Sets ◽

Protein Backbone ◽

Natural Amino Acids

Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, using deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure is becoming increasingly feasible. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probability of 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 51.56±0.20% in a 5-fold cross validation on the training set and 54.45% and 50.06% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using the cutoff of accumulative probability had a smaller sequence search space compared to that of the approach that simply uses the top-k predictions and therefore enables higher sequence identity in redesigning three proteins with Rosetta. The network and the data sets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
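
The difference between the two selection approaches mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not DenseCPD's code: the per-residue probabilities are made up, each is given as a dict from amino-acid letter to probability, and the helper names (`top_k`, `cumulative_cutoff`, `search_space_size`) are hypothetical. Top-k always keeps k candidates per residue, while a cumulative-probability cutoff keeps only as many as are needed to reach the cutoff, so confident positions contribute fewer candidates.

```python
import math


def ranked(probs):
    """Amino acids sorted from most to least probable."""
    return sorted(probs, key=probs.get, reverse=True)


def top_k(probs, k):
    """Keep the k most probable amino acids for one residue."""
    return ranked(probs)[:k]


def cumulative_cutoff(probs, cutoff):
    """Keep the shortest ranked prefix whose probabilities sum to at least `cutoff`."""
    chosen, total = [], 0.0
    for aa in ranked(probs):
        chosen.append(aa)
        total += probs[aa]
        if total >= cutoff:
            break
    return chosen


def search_space_size(candidates_per_residue):
    """Number of sequences obtainable by picking one candidate at each residue."""
    return math.prod(len(c) for c in candidates_per_residue)


# Made-up predictions for two residues: one confident, one ambiguous.
predictions = [
    {"A": 0.9, "G": 0.05, "S": 0.05},
    {"L": 0.4, "I": 0.3, "V": 0.2, "M": 0.1},
]
print(search_space_size([top_k(p, 3) for p in predictions]))
print(search_space_size([cumulative_cutoff(p, 0.8) for p in predictions]))
```

With these toy numbers the cutoff keeps a single candidate at the confident residue, which is the mechanism by which the sequence search space shrinks.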

DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

10.26434/chemrxiv.11626098.v1 ◽

2020 ◽

Author(s):

Yifei Qi ◽

John Z.H. Zhang

Keyword(s):

Neural Network ◽

Protein Design ◽

Protein Sequence ◽

Protein Structures ◽

Three Dimensional ◽

Search Space ◽

Computational Protein Design ◽

Data Sets ◽

Protein Backbone ◽

Natural Amino Acids

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

10.1101/032821 ◽

2015 ◽

Cited By ~ 4

Author(s):

Daniel Quang ◽

Xiaohui Xie

Keyword(s):

Neural Network ◽

Dna Sequences ◽

Short Term Memory ◽

De Novo ◽

Short Term ◽

Noncoding Dna ◽

Regulatory Motifs ◽

Precision Recall Curve ◽

Long Short Term Memory

Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for noncoding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of noncoding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is noncoding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting noncoding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory "grammar" to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models.
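
To make the role of the convolution layer concrete, the sketch below slides a single hand-made filter over a one-hot encoded DNA sequence and reports a match score at each position. It is an illustration of how a one-dimensional convolution can act as a motif scanner, not the DanQ implementation: the filter here is simply the one-hot encoding of an assumed motif ("TATA"), whereas a trained network learns real-valued filters, and the recurrent layer that relates motifs to each other is not shown. All function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif):
    """Hand-made filter for illustration: weight 1 for the motif base at each position, 0 otherwise."""
    return one_hot(motif)


def scan(seq, weights):
    """Slide the filter along the sequence and return the dot-product score at each start position."""
    x = one_hot(seq)
    width = len(weights)
    scores = []
    for start in range(len(x) - width + 1):
        score = sum(x[start + i][j] * weights[i][j]
                    for i in range(width) for j in range(len(BASES)))
        scores.append(score)
    return scores


scores = scan("GGTATAGG", motif_filter("TATA"))
best = scores.index(max(scores))
print(scores, best)
```

The highest score marks the window where the sequence matches the filter best; a downstream layer would then consume these per-position scores.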

Membrane-spanning α-helical barrels as tractable protein-design targets

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2016.0213 ◽

2017 ◽

Vol 372 (1726) ◽

pp. 20160213 ◽

Cited By ~ 11

Author(s):

Ai Niitsu ◽

Jack W. Heal ◽

Kerstin Fauland ◽

Andrew R. Thomson ◽

Derek N. Woolfson

Keyword(s):

Protein Structure ◽

Protein Design ◽

De Novo ◽

Protein Structures ◽

Coiled Coil ◽

Deep Understanding ◽

Data Bank ◽

Water Soluble ◽

Limiting Factor ◽

Membrane Spanning

The rational (de novo) design of membrane-spanning proteins lags behind that for water-soluble globular proteins. This is due to gaps in our knowledge of membrane-protein structure, and experimental difficulties in studying such proteins compared to water-soluble counterparts. One limiting factor is the small number of experimentally determined three-dimensional structures for transmembrane proteins. By contrast, many tens of thousands of globular protein structures provide a rich source of ‘scaffolds’ for protein design, and the means to garner sequence-to-structure relationships to guide the design process. The α-helical coiled coil is a protein-structure element found in both globular and membrane proteins, where it cements a variety of helix–helix interactions and helical bundles. Our deep understanding of coiled coils has enabled a large number of successful de novo designs. For one class, the α-helical barrels (that is, symmetric bundles of five or more helices with central accessible channels), there are both water-soluble and membrane-spanning examples. Recent computational designs of water-soluble α-helical barrels with five to seven helices have advanced the design field considerably. Here we identify and classify analogous and more complicated membrane-spanning α-helical barrels from the Protein Data Bank. These provide tantalizing but tractable targets for protein engineering and de novo protein design. This article is part of the themed issue ‘Membrane pores: from structure and assembly, to medicine and technology’.

Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design

10.26434/chemrxiv.7990910.v2 ◽

2019 ◽

Author(s):

Niclas Ståhl ◽

Göran Falkman ◽

Alexander Karlsson ◽

Gunnar Mathiason ◽

Jonas Boström

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

De Novo ◽

De Novo Drug Design ◽

Generative Process ◽

New Methods ◽

Multiparameter Optimization ◽

Long Short Term Memory ◽

New Compounds

In medicinal chemistry programs it is key to design and make compounds that are efficacious and safe. This is a long, complex and difficult multi-parameter optimization process, often including several properties with orthogonal trends. New methods for the automated design of compounds against profiles of multiple properties are thus of great value. Here we present a fragment-based reinforcement learning approach, based on an actor-critic model, for the generation of novel molecules with optimal properties. The actor and the critic are both modelled with bidirectional long short-term memory (LSTM) networks. The AI method learns how to generate new compounds with desired properties by starting from an initial set of lead molecules and then improving these by replacing some of their fragments. A balanced binary tree based on the similarity of fragments is used in the generative process to bias the output towards structurally similar molecules. The method is demonstrated by a case study showing that 93% of the generated molecules are chemically valid, and a third satisfy the targeted objectives, while there were none in the initial set.
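
One way to picture how a balanced binary tree over fragments can bias generation towards similar fragments is to give every fragment a bit string that records its path from the root. The sketch below is an assumption-laden illustration, not the authors' code: it assumes the fragments are already arranged so that neighbours in the list are structurally similar (the similarity measure itself is not shown), and the fragment names and helper functions are hypothetical. Fragments that share a long code prefix sit close together in the tree, so changing only the trailing bits of a code swaps a fragment for a near neighbour.

```python
def assign_codes(ordered_fragments):
    """Build a balanced binary tree by recursive halving and return each leaf's path as a bit string."""
    codes = {}

    def split(items, prefix):
        if not items:
            return
        if len(items) == 1:
            codes[items[0]] = prefix
            return
        mid = (len(items) + 1) // 2  # left half gets the extra item when the count is odd
        split(items[:mid], prefix + "0")
        split(items[mid:], prefix + "1")

    split(list(ordered_fragments), "")
    return codes


def shared_prefix(a, b):
    """Length of the common prefix of two codes; a longer prefix means a closer position in the tree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical fragment labels, assumed to be pre-sorted by similarity.
codes = assign_codes(["f0", "f1", "f2", "f3"])
print(codes)
print(shared_prefix(codes["f0"], codes["f1"]), shared_prefix(codes["f0"], codes["f3"]))
```

Under this encoding, an agent that is only allowed to edit the last few bits of a fragment's code can only move to fragments in the same subtree.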

Estimation of municipal solid waste amount based on one-dimension convolutional neural network and long short-term memory with attention mechanism model: A case study of Shanghai

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.148088 ◽

2021 ◽

Vol 791 ◽

pp. 148088

Author(s):

Kunsen Lin ◽

Youcai Zhao ◽

Lu Tian ◽

Chunlong Zhao ◽

Meilan Zhang ◽

...

Keyword(s):

Neural Network ◽

Municipal Solid Waste ◽

Convolutional Neural Network ◽

Short Term Memory ◽

One Dimension ◽

Short Term ◽

Term Memory ◽

Mechanism Model ◽

Long Short Term Memory

Electricity Consumption Forecasting Based on a Bidirectional Long-Short-Term Memory Artificial Neural Network

Sustainability ◽

10.3390/su13010104 ◽

2020 ◽

Vol 13 (1) ◽

pp. 104

Author(s):

Dana-Mihaela Petroșanu ◽

Alexandru Pîrjan

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Short Term Memory ◽

Electricity Consumption ◽

Short Term ◽

Term Memory ◽

Storage Room ◽

Long Short Term Memory ◽

Commercial Center ◽

Business And Management

Accurate forecasting of hourly month-ahead electricity consumption is very important for non-household electricity consumers and system operators, and is a key factor for energy efficiency and for achieving sustainable economic, business, and management operations. In this context, we have devised, developed, and validated an hourly month-ahead electricity consumption forecasting method. This method is based on a bidirectional long-short-term memory (BiLSTM) artificial neural network (ANN) enhanced with a multiple simultaneously decreasing delays approach coupled with function fitting neural networks (FITNETs). The developed method targets the hourly month-ahead total electricity consumption of a commercial center-type consumer and the hourly month-ahead consumption of its refrigerator storage room. The approach offers excellent forecasting results, as shown by the validation stage and the registered performance metrics: a root mean square error (RMSE) of 0.0495 for the total hourly month-ahead electricity consumption and 0.0284 for the refrigerator storage room. We aimed for and attained an hourly month-ahead electricity consumption prediction without the significant drop in forecasting accuracy that usually tends to occur after the first two weeks, achieving a reliable method that satisfies the contractor’s needs and can enhance the contractor’s activity from the economic, business, and management perspectives. Although the forecasting solution targets a commercial center-type consumer, its accuracy and generalization capability mean it can also be a useful tool for other non-household electricity consumers.
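
For readers unfamiliar with the metric quoted above, the root mean square error is the square root of the mean squared difference between observed and forecast values. The sketch below is a generic implementation with made-up numbers; it does not reproduce the paper's data, and the abstract does not state the scale or normalisation of the series behind the reported values.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up hourly consumption values and forecasts, for illustration only.
observed = [0.52, 0.48, 0.61, 0.70]
forecast = [0.50, 0.50, 0.60, 0.65]
print(rmse(observed, forecast))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.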
