A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction

2021
Author(s):
Philippe Auguste Robert
Rahmad Akbar
Robert Frank
Milena Pavlović
Michael Widrich
...  

Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is unfeasible with existing experimental data. Furthermore, we found that in silico investigated conditions, predicted to increase antibody specificity prediction accuracy, align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.

2021
Vol 15 (8)
pp. 878-888
Author(s):
Yang Liu
Xia-hui Ouyang
Zhi-Xiong Xiao
Le Zhang
Yang Cao

Background: T lymphocytes mount an immune response by recognizing antigen peptides (also known as T cell epitopes) presented by major histocompatibility complex (MHC) molecules. The immunogenicity of T cell epitopes depends on their source and on their stability in complex with MHC molecules. Binding of the peptide to MHC is the most selective step, so predicting peptide-MHC binding affinity is the principal step in predicting T cell epitopes. Identifying epitopes is of great significance for vaccine design and for research on the T cell immune response. Objective: The traditional way to identify epitopes is to synthesize peptides and test their binding activity experimentally, which is both time-consuming and expensive. In silico methods for predicting peptide-MHC binding have emerged to pre-select candidate peptides for experimental testing, which greatly saves time and cost. By summarizing and analyzing these methods, we hope to gain better insight and provide guidance for future directions. Methods: To date, a number of methods based on various principles have been developed to predict the binding of peptides to MHC. Some employ matrix models or machine learning models based on sequence characteristics of the peptides or MHC. Others use three-dimensional structural information of peptides or MHC, for example by extracting structural features to construct a feature matrix or machine learning model, or by directly applying protein structure prediction or molecular docking to predict the binding mode of peptides and MHC. Results: Although methods based on a feature matrix or machine learning model can achieve high-throughput prediction, their accuracy depends heavily on the sequence characteristics of confirmed binding peptides. In addition, they cannot provide insights into the mechanism of antigen specificity. Such methods therefore have certain limitations in practical applications. Methods based on structure prediction or molecular docking are computationally intensive compared with those based on a feature matrix or machine learning model, and the challenge is how to predict a reliable structural model. Conclusion: This paper reviews the principles, advantages, and disadvantages of peptide-MHC binding prediction methods and discusses future directions for achieving more accurate predictions.


2021
Vol 11 (1)
Author(s):
Rudolf A. Römer
Navodya S. Römer
A. Katrine Wallis

Abstract The worldwide COVID-19 pandemic has led to an unprecedented push across the whole of the scientific community to develop a potent antiviral drug and vaccine as soon as possible. Existing academic, governmental and industrial institutions and companies have engaged in large-scale screening of existing drugs, in vitro, in vivo and in silico. Here, we use in silico modelling of possible SARS-CoV-2 drug targets, as deposited in the Protein Data Bank (PDB), to ascertain their dynamics, flexibility and rigidity. For example, for the SARS-CoV-2 spike protein, using its complete homo-trimer configuration with 2905 residues, our method identifies a large-scale opening and closing of the S1 subunit through movement of the S$^{\text{B}}$ domain. We compute the full structural information of this process, allowing for docking studies with possible drug structures. In a dedicated database, we present similarly detailed results for the nearly 300 further SARS-CoV-2-related protein structures resolved thus far in the PDB.


2017
Vol 5 (2)
pp. 216-236
Author(s):
Jie Jiang
Lele Yu
Jiawei Jiang
Yuhong Liu
Bin Cui

Abstract Machine Learning (ML) techniques are now ubiquitous tools for extracting structural information from data collections. With the increasing volume of data, large-scale ML applications require an efficient implementation to accelerate performance. Existing systems parallelize algorithms through either data parallelism or model parallelism. However, data parallelism cannot obtain good statistical efficiency because of conflicting updates to parameters, while the performance of model-parallel methods is damaged by global barriers. In this paper, we propose a new system, named Angel, to facilitate the development of large-scale ML applications in production environments. By allowing concurrent updates to the model across different groups and scheduling the updates within each group, Angel can achieve a good balance between hardware efficiency and statistical efficiency. In addition, Angel reduces network latency by overlapping parameter pulling with update computation, and it exploits the sparseness of data to avoid pulling unnecessary parameters. We also enhance the usability of Angel by providing a set of efficient tools for integration with application pipelines and by providing efficient fault tolerance mechanisms. We conduct extensive experiments to demonstrate the superiority of Angel.


2021
Vol 22 (1)
Author(s):
Shitao Zhao
Michiaki Hamada

Abstract Background: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power because of its weak neural network architecture. Results: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset, with increases in both the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. Conclusions: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.


2021
Author(s):
Chao Ye
Wenxing Hu
Bruno Gaeta

DNA sequencing technologies are providing new insights into the immune response by allowing large-scale sequencing of the rearranged immunoglobulin genes present in an individual. However, the applications of this approach are limited by the lack of methods for determining the antigen(s) that an immunoglobulin encoded by a given sequence binds to. Computational methods for predicting antibody-antigen interactions that leverage structure prediction and docking have been proposed, but these methods require knowledge of the 3D structures. As a step towards the development of a machine learning method suitable for predicting antibody-antigen binding affinities from sequence data, a weighted nearest neighbor machine learning approach was applied to the problem. A prediction program was coded in Python and evaluated using cross-validation on a dataset of 600 antibodies interacting with 50 antigens. The classification accuracy was around 76% for this dataset. These results provide a useful frame of reference as well as protocols and considerations for machine learning and dataset creation in this area. Both the dataset (in CSV format) and the machine learning program (coded in Python) are freely available for download.
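The abstract above names a weighted nearest neighbor classifier evaluated by cross-validation but does not describe its features, distance measure, or weighting scheme. The following is only a minimal sketch of the general technique under assumed choices: a position-wise mismatch fraction as the distance, inverse-distance vote weights, leave-one-out cross-validation, and a made-up toy dataset. The function names and sequences are hypothetical and are not taken from the authors' program or data.

```python
from collections import defaultdict


def mismatch_fraction(a: str, b: str) -> float:
    """Assumed distance: fraction of positions that differ.

    Positions beyond the shorter sequence count as mismatches.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / longest


def weighted_knn_predict(query, training, k=3, eps=1e-9):
    """Predict a label by an inverse-distance-weighted vote of the k nearest neighbors.

    training is a list of (sequence, label) pairs; eps avoids division by zero.
    """
    neighbors = sorted(
        ((mismatch_fraction(query, seq), label) for seq, label in training),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


def leave_one_out_accuracy(data, k=3):
    """Hold out each example once and predict it from the remaining examples."""
    hits = 0
    for i, (seq, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += weighted_knn_predict(seq, rest, k=k) == label
    return hits / len(data)


# Hypothetical toy sequences and labels, for illustration only.
TOY = [
    ("ARDYYGSS", "binder"),
    ("ARDYYGST", "binder"),
    ("ARDYWGSS", "binder"),
    ("GKTLLPQN", "non-binder"),
    ("GKTLLPQM", "non-binder"),
    ("GKSLLPQN", "non-binder"),
]

if __name__ == "__main__":
    print(weighted_knn_predict("ARDYYGSA", TOY))
    print(leave_one_out_accuracy(TOY))
```

Inverse-distance weighting lets close neighbors dominate the vote, so a single distant neighbor among the k nearest has little influence on the predicted label.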


2021
Vol 14 (10)
pp. 968
Author(s):
Chunlai Tam
Ashutosh Kumar
Kam Y. J. Zhang

Modeling the binding pose of an antibody is a prerequisite to structure-based affinity maturation and design. Without a reliable binding pose, the subsequent structural simulation is largely futile. In this study, we have developed a method for machine learning-guided re-ranking of antigen binding poses of nanobodies, the single-domain antibodies that have recently drawn much interest in antibody drug development. We performed a large-scale self-docking experiment of nanobody–antigen complexes. By training a decision tree classifier that maps a feature set consisting of energy, contact and interface property descriptors to a measure of the docking quality of the refined poses, we achieved a significant, eightfold improvement in the median ranking of native-like nanobody poses compared with ClusPro and an established deep 3D CNN classifier of native protein–protein interaction. We further interpreted our model by identifying the features that contributed most to prediction performance. This study demonstrates a useful method for improving our current ability to predict nanobody poses.


2020
Author(s):
Rudolf A. Römer
Navodya S. Römer
A. Katrine Wallis

Abstract The worldwide COVID-19 pandemic has led to an unprecedented push across the whole of the scientific community to develop a potent antiviral drug and vaccine as soon as possible. Existing academic, governmental and industrial institutions and companies have engaged in large-scale screening of existing drugs, in vitro, in vivo and in silico. Here, we use in silico modelling of SARS-CoV-2 drug targets, i.e. SARS-CoV-2 protein structures as deposited in the Protein Data Bank (PDB). We study their flexibility, rigidity and mobility, an important first step in trying to ascertain their dynamics for further drug-related docking studies. We use a recent protein flexibility modelling approach that combines protein structural rigidity with possible motion consistent with chemical bonds and sterics. For example, for the SARS-CoV-2 spike protein in the open configuration, our method identifies a possible further opening and closing of the S1 subunit through movement of the SB domain. With full structural information of this process available, docking studies with possible drug structures can then be carried out in silico. In our study, we present full results for the more than 200 SARS-CoV-2-related protein structures published thus far in the PDB.

