Modeling the trajectory of SARS-CoV-2 spike protein evolution in continuous latent space using a neural network and Gaussian process

2021 ◽  
Author(s):  
Samuel King ◽  
Xinyi E. Chen ◽  
Sarah W. S. Ng ◽  
Kimia Rostin ◽  
Tylo Roberts ◽  
...  

Abstract: Viral vaccines can lose their efficacy as the genomes of targeted viruses rapidly evolve, resulting in new variants that may evade vaccine-induced immunity. This process is apparent in the emergence of new SARS-CoV-2 variants, which have the potential to undermine vaccination efforts and cause further outbreaks. Predictive vaccinology points to a future of pandemic preparedness in which vaccines can be developed preemptively, based in part on predictive models of viral evolution. Thus, modeling the trajectory of SARS-CoV-2 spike protein evolution could have value for mRNA vaccine development. Traditionally, in silico sequence evolution has been modeled discretely, while continuous models have received limited investigation. Here we present the Viral Predictor for mRNA Evolution (VPRE), an open-source software tool that learns from mutational patterns in viral proteins and models their most statistically likely evolutionary trajectories. We trained a variational autoencoder on real-time and simulated SARS-CoV-2 genome data from Australia to encode discrete spike protein sequences into continuous numerical variables. To simulate evolution along a phylogenetic path, we trained a Gaussian process model on these numerical variables to project spike protein evolution up to five months in advance. Our predictions mapped primarily to a sequence that differed by a single amino acid from the most frequently reported spike protein sequence in Australia within the prediction timeframe, indicating the utility of deep learning and continuous latent spaces for modeling viral protein evolution. VPRE can be readily adapted to investigate and predict the evolution of viruses other than SARS-CoV-2 along temporal, geographic, and lineage-specific pathways.
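The forecasting step described above, fitting a Gaussian process to latent coordinates as a function of time and extrapolating forward, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not VPRE's code: the 2D latent trajectory is synthetic with a hypothetical monthly resolution, each latent dimension gets an independent GP with a hand-picked squared-exponential kernel, and decoding the forecast back into a spike sequence (the VAE decoder's job) is omitted.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=3.0, variance=1.0):
    """Squared-exponential covariance between two 1D arrays of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_forecast(t_train, y_train, t_new, noise=1e-2, **kernel_args):
    """Posterior mean and std of a GP for one latent dimension.

    The training mean is subtracted first, so far-future predictions relax
    toward that mean rather than toward zero.
    """
    offset = y_train.mean()
    K = rbf_kernel(t_train, t_train, **kernel_args) + noise * np.eye(len(t_train))
    K_star = rbf_kernel(t_train, t_new, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - offset))
    mean = offset + K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = np.diag(rbf_kernel(t_new, t_new, **kernel_args)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical monthly centroids of encoded spike sequences in a 2D latent space.
rng = np.random.default_rng(0)
months = np.arange(12, dtype=float)
latent = np.column_stack([0.3 * months, np.sin(months / 3.0)])
latent += 0.05 * rng.standard_normal(latent.shape)

future = np.arange(12, 17, dtype=float)  # five months past the last observation
results = [gp_forecast(months, latent[:, d], future) for d in range(latent.shape[1])]
forecast_mean = np.column_stack([m for m, _ in results])
forecast_std = np.column_stack([s for _, s in results])
print(forecast_mean.shape)  # (5, 2)
```

The predictive standard deviation widens as the horizon moves away from the last observation, which is one reason a GP is a natural fit for short-range projections of this kind.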

2018 ◽  
Author(s):  
Caitlin C. Bannan ◽  
David Mobley ◽  
A. Geoff Skillman

A variety of fields would benefit from accurate pKa predictions, especially drug design, because a change in ionization state can affect a molecule's physicochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for the microscopic and macroscopic pKas of 24 drug-like small molecules. We recently built a general model for predicting pKas using Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic pKas, which are then used to analytically determine macroscopic pKas. Our Gaussian process is trained on a set of 2,700 macroscopic pKas from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared with other participants, but our fairly good agreement with experiment is still promising, considering that the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic. Of particular importance to us when building this model was including an uncertainty estimate, based on the chemistry of the molecule, that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for molecules that appear to have chemistry outside our domain of applicability, and its quantile-quantile plots show good agreement, indicating that it can predict its own accuracy. The challenge highlighted several ways to improve our model, including adding more polyprotic molecules to our training set and more carefully considering which functional groups we do or do not identify as ionizable.
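The abstract does not say exactly how macroscopic pKas are derived from the predicted microscopic ones. The sketch below shows the standard textbook relation for the simplest polyprotic case, a diprotic molecule with two ionizable sites that can lose their protons in either order; it is an illustration, not the authors' pipeline, and the function name is hypothetical. The first macroscopic acid constant is the sum of the two parallel first-deprotonation constants, and the reciprocal of the second is the sum of the reciprocals of the two second-deprotonation constants.

```python
import math

def macro_pkas_diprotic(pk1, pk2, pk12, pk21, tol=1e-6):
    """Macroscopic pKa1, pKa2 of a diprotic molecule from four microscopic pKas.

    pk1:  fully protonated -> site 1 deprotonated
    pk2:  fully protonated -> site 2 deprotonated
    pk12: site 1 already deprotonated -> site 2 deprotonates (fully deprotonated)
    pk21: site 2 already deprotonated -> site 1 deprotonates (fully deprotonated)
    """
    # Both paths around the cycle must give the same overall free energy.
    if abs((pk1 + pk12) - (pk2 + pk21)) > tol:
        raise ValueError("microscopic pKas violate the thermodynamic cycle")
    # Ka1 = k1 + k2: either site can lose the first proton.
    macro_ka1 = 10 ** -pk1 + 10 ** -pk2
    # 1/Ka2 = 1/k12 + 1/k21: two singly deprotonated tautomers feed the final state.
    inv_macro_ka2 = 10 ** pk12 + 10 ** pk21
    return -math.log10(macro_ka1), math.log10(inv_macro_ka2)

# Hypothetical microscopic values that satisfy the cycle (4 + 9 == 5 + 8).
pka1, pka2 = macro_pkas_diprotic(4.0, 5.0, 9.0, 8.0)
print(round(pka1, 3), round(pka2, 3))  # 3.959 9.041
```

A useful check is that the macroscopic pKas always sum to the pKa sum along either microscopic path, because the overall two-proton equilibrium does not depend on which site deprotonates first.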


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Mikail Dogan ◽  
Lina Kozhaya ◽  
Lindsey Placek ◽  
Courtney Gunter ◽  
Mesut Yigit ◽  
...  

Abstract: Development of antibody protection during SARS-CoV-2 infection is a pressing question for public health and for vaccine development. We developed highly sensitive SARS-CoV-2-specific antibody and neutralization assays. IgG antibodies specific for SARS-CoV-2 Spike or Nucleocapsid protein, at titers greater than 1:100,000, were detectable in all PCR+ subjects (n = 115) and were absent in the negative controls. Antibodies of other isotypes and subclasses (IgA, IgG1-4) were also detected. SARS-CoV-2 neutralization was measured in plasma from COVID-19 patients and in convalescent plasma at up to 10,000-fold dilution using Spike protein-pseudotyped lentiviruses, which were also blocked by neutralizing antibodies (NAbs). Hospitalized patients had up to 3000-fold higher antibody and neutralization titers than outpatients or convalescent plasma donors. Interestingly, some COVID-19 patients also possessed NAbs against SARS-CoV Spike protein pseudovirus. Together, these results demonstrate the high specificity and sensitivity of our assays, which may help in understanding the quality or duration of the antibody response during COVID-19 and in determining the effectiveness of potential vaccines.
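Neutralization titers from a dilution series are often summarized as a single number, such as the reciprocal dilution giving 50% neutralization (NT50). The abstract does not state how its titers were computed, so the sketch below only shows one simple, common approach: linear interpolation of percent neutralization against log10 dilution. The function name and the data are hypothetical.

```python
import math

def nt50(dilutions, neutralization):
    """Reciprocal dilution at which neutralization crosses 50%.

    dilutions: increasing reciprocal dilutions (e.g. 100 for 1:100)
    neutralization: percent neutralization at each dilution
    Returns None if the series never crosses 50%.
    """
    logs = [math.log10(d) for d in dilutions]
    for i in range(len(logs) - 1):
        y0, y1 = neutralization[i], neutralization[i + 1]
        if y0 >= 50.0 > y1:
            # Interpolate linearly in log-dilution space between the bracketing points.
            frac = (y0 - 50.0) / (y0 - y1)
            return 10 ** (logs[i] + frac * (logs[i + 1] - logs[i]))
    return None

# Hypothetical plasma sample tested from 1:100 to 1:100,000.
titer = nt50([100, 1000, 10000, 100000], [98, 80, 20, 5])
print(round(titer))  # 3162
```

Interpolating in log space matches the way serial dilutions are spaced; curve-fitting approaches (for example a four-parameter logistic) are a common alternative when more points are available.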


Energies ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4392
Author(s):  
Jia Zhou ◽  
Hany Abdel-Khalik ◽  
Paul Talbot ◽  
Cristian Rabiti

This manuscript develops a workflow, driven by data analytics algorithms, to support the optimization of the economic performance of an Integrated Energy System (IES). The goal is to determine the optimum mix of capacities from a set of different energy producers (e.g., nuclear, gas, wind and solar). The optimizer is stochastic and based on Gaussian Process modeling, which requires numerous samples for training. Each sample represents a time series describing the demand, load, or other operational and economic profiles for various types of energy producers. These samples are synthetically generated using a reduced order modeling algorithm that reads a limited set of historical data, such as demand and load data from past years. Several data analysis methods are employed to construct the reduced order models, including, for example, the Auto Regressive Moving Average, Fourier series decomposition, and a peak detection algorithm. Together, these algorithms detrend the data and extract features that can be used to generate synthetic time histories preserving the statistical properties of the limited historical data. The optimization cost function is based on an economic model that assesses the effective cost of energy using two figures of merit: the specific cash flow stream for each energy producer and the total Net Present Value. An initial guess for the optimal capacities is obtained using the screening curve method. The results of the Gaussian Process model-based optimization are checked against an exhaustive Monte Carlo search, which indicates that they are reasonable. The workflow has been implemented inside the Idaho National Laboratory’s Risk Analysis and Virtual Environment (RAVEN) framework.
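The reduced order modeling idea described above, removing periodic structure with a Fourier fit and modeling what remains as an autoregressive process, can be sketched in NumPy. This is a simplified illustration, not the RAVEN implementation: it assumes an hourly series with known daily and weekly periods, fits the Fourier terms by least squares, uses AR(1) (the simplest special case of ARMA) for the residual, and omits peak detection. The function names and the demand data are hypothetical.

```python
import numpy as np

def fourier_basis(t, periods):
    """Design matrix: an intercept plus a sin/cos pair for each period."""
    cols = [np.ones_like(t)]
    for p in periods:
        cols += [np.sin(2 * np.pi * t / p), np.cos(2 * np.pi * t / p)]
    return np.column_stack(cols)

def fit_rom(signal, periods):
    """Fit Fourier coefficients, then an AR(1) model to the detrended residual."""
    t = np.arange(len(signal), dtype=float)
    X = fourier_basis(t, periods)
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    resid = signal - X @ coef
    # Least-squares AR(1) estimate: resid[k] ~ phi * resid[k-1] + innovation.
    phi = np.dot(resid[1:], resid[:-1]) / np.dot(resid[:-1], resid[:-1])
    sigma = np.std(resid[1:] - phi * resid[:-1])
    return coef, phi, sigma

def synthesize(coef, phi, sigma, periods, n, rng):
    """One synthetic history: the fitted Fourier trend plus a fresh AR(1) residual."""
    t = np.arange(n, dtype=float)
    trend = fourier_basis(t, periods) @ coef
    resid = np.zeros(n)  # the residual process starts at zero
    for k in range(1, n):
        resid[k] = phi * resid[k - 1] + sigma * rng.standard_normal()
    return trend + resid

# Hypothetical eight weeks of hourly demand with daily and weekly cycles.
rng = np.random.default_rng(42)
hours = np.arange(24 * 7 * 8, dtype=float)
periods = [24.0, 24.0 * 7]
noise = np.zeros(len(hours))
for k in range(1, len(hours)):
    noise[k] = 0.8 * noise[k - 1] + rng.standard_normal()
history = 100 + 10 * np.sin(2 * np.pi * hours / 24) + 5 * np.cos(2 * np.pi * hours / 168) + noise

coef, phi, sigma = fit_rom(history, periods)
samples = [synthesize(coef, phi, sigma, periods, len(hours), rng) for _ in range(3)]
```

Each call to `synthesize` yields a new series with the same periodic structure and residual autocorrelation as the history, which is what makes it possible to draw the many training samples a Gaussian Process optimizer needs from limited data.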
The main contributions of this study address several challenges in current methods for optimizing IES energy portfolios: first, the feasibility of generating synthetic time series of periodic peak data; second, the computational burden of conventional stochastic optimization of the energy portfolio, which requires repeated executions of system models; and third, the inadequate comparison in previous studies of the impact of economic parameters. The proposed workflow can provide a scientifically defensible strategy to support decision-making in the electricity market and to help energy distributors better understand the performance of integrated energy systems.


2021 ◽  
Vol 147 (2) ◽  
pp. AB152
Author(s):  
Crystal Richardson ◽  
Mayuresh Abhyankar ◽  
Jillian Bracaglia ◽  
Sayeh Agah ◽  
Zachary Schuhmacher ◽  
...  

Author(s):  
Daniel Blatter ◽  
Anandaroop Ray ◽  
Kerry Key

Summary: Bayesian inversion of electromagnetic data produces crucial uncertainty information on inferred subsurface resistivity. Due to their high computational cost, however, Bayesian inverse methods have largely been restricted to computationally expedient 1D resistivity models. In this study, we successfully demonstrate, for the first time, a fully 2D, trans-dimensional Bayesian inversion of magnetotelluric data. We render this problem computationally tractable by using a stochastic interpolation algorithm known as a Gaussian process to achieve a parametrization of the model that is parsimonious relative to the dense parameter grids used in numerical forward modeling codes. The Gaussian process links a trans-dimensional, parallel tempered Markov chain Monte Carlo sampler, which explores the parsimonious model space, to MARE2DEM, an adaptive finite element forward solver. MARE2DEM computes the model response using a dense parameter mesh with resistivity assigned via the Gaussian process model. We demonstrate the new trans-dimensional Gaussian process sampler by inverting both synthetic and field magnetotelluric data for 2D models of electrical resistivity, with the field data example converging within 10 days on 148 cores, a non-negligible but tractable computational cost. For the field data inversion, our algorithm achieves a parameter reduction of over 32x compared with the fixed parameter grid used for the MARE2DEM regularized inversion. Resistivity probability distributions computed from the ensemble of models produced by the inversion yield credible intervals and interquartile plots that quantitatively show the non-linear 2D uncertainty in model structure. This uncertainty could then be propagated to other physical properties that affect resistivity, including bulk composition, porosity and pore-fluid content.
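The role the Gaussian process plays here, turning a handful of sampled nodes into a resistivity value for every cell of a dense forward-modeling mesh, can be illustrated with the GP posterior mean (kriging-style interpolation). This is a minimal sketch under stated assumptions: log10 resistivity values at a few hypothetical 2D node locations, a squared-exponential kernel with a hand-picked length scale, and a constant background value. It does not reproduce the trans-dimensional sampler or MARE2DEM, and the function names are hypothetical.

```python
import numpy as np

def sq_exp(a, b, length_scale):
    """Squared-exponential covariance between point sets a (n, 2) and b (m, 2)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def interpolate_log_rho(nodes, log_rho, cells, length_scale=2.0, background=2.0, jitter=1e-8):
    """GP posterior mean of log10 resistivity at mesh cell centres."""
    K = sq_exp(nodes, nodes, length_scale) + jitter * np.eye(len(nodes))
    weights = np.linalg.solve(K, log_rho - background)
    # Far from every node the field relaxes to the background value.
    return background + sq_exp(cells, nodes, length_scale) @ weights

# Hypothetical nodes (x, z in km) and log10 resistivity values proposed by a sampler.
nodes = np.array([[0.0, 1.0], [5.0, 2.0], [8.0, 6.0]])
log_rho = np.array([1.0, 3.0, 0.5])
xs, zs = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 8, 40))
cells = np.column_stack([xs.ravel(), zs.ravel()])
field = interpolate_log_rho(nodes, log_rho, cells)
print(field.shape)  # (2000,)
```

Three sampled values here define resistivity on 2,000 mesh cells, which shows in miniature why this parametrization shrinks the space the MCMC sampler has to explore.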


Science ◽  
2021 ◽  
Vol 371 (6526) ◽  
pp. 284-288 ◽  
Author(s):  
Brian Hie ◽  
Ellen D. Zhong ◽  
Bonnie Berger ◽  
Bryan Bryson

The ability of viruses to mutate, evade the human immune system, and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.
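The escape criterion above asks for two properties at once: a mutation should change the model's representation of the protein (semantic change) while remaining probable under the model (grammaticality). One simple way to favor candidates that score well on both is to rank each score separately and add the ranks. In the sketch below, the scores are hypothetical numbers standing in for language-model outputs, the mutation labels are placeholders, and the weight `beta` is an illustrative assumption rather than something stated in the abstract.

```python
def ranks(values):
    """Rank position of each value (0 = lowest); ties keep index order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order):
        r[i] = position
    return r

def escape_priority(mutations, semantic_change, grammaticality, beta=1.0):
    """Order mutations by rank(semantic change) + beta * rank(grammaticality), highest first."""
    sc_rank = ranks(semantic_change)
    gr_rank = ranks(grammaticality)
    scores = [s + beta * g for s, g in zip(sc_rank, gr_rank)]
    order = sorted(range(len(mutations)), key=lambda i: scores[i], reverse=True)
    return [mutations[i] for i in order]

# Hypothetical candidates: A changes meaning and stays grammatical,
# C changes meaning but is improbable, D does neither.
mutations = ["A", "B", "C", "D"]
semantic_change = [0.9, 0.2, 0.8, 0.1]
grammaticality = [0.7, 0.9, 0.1, 0.05]
print(escape_priority(mutations, semantic_change, grammaticality))  # ['A', 'B', 'C', 'D']
```

Combining ranks rather than raw values avoids having to put the two scores on a common scale, since semantic change and model probability are measured in very different units.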


2011 ◽  
Vol 328-330 ◽  
pp. 524-529
Author(s):  
Jun Yan Ma ◽  
Xiao Ping Liao ◽  
Wei Xia ◽  
Xue Lian Yan

Gaussian process (GP) modeling is a powerful tool that takes a Bayesian statistical approach to highly nonlinear regression in general scientific and engineering tasks. The first step in constructing a GP model is to estimate the best values of the hyperparameters, which are then used in the second step to fit a satisfactory nonlinear model. In this paper, we propose a modified Wolfe line search, used within a conjugate gradient method, for estimating the hyperparameters by maximizing the marginal likelihood. We then analyze parameter correlations based on the hyperparameter values in order to control warpage, a major defect in thin-shell injection-molded parts.
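Hyperparameter estimation by maximizing the marginal likelihood can be sketched with NumPy and SciPy. This sketch assumes a squared-exponential kernel whose three hyperparameters (length scale, signal standard deviation, noise standard deviation) are optimized in log space, and it uses SciPy's standard nonlinear conjugate gradient, `scipy.optimize.minimize(method="CG")`, whose built-in line search enforces Wolfe conditions. That is a swapped-in stand-in, not the modified Wolfe line search proposed in the paper, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, x, y):
    """-log p(y | x, theta) for a zero-mean GP with a squared-exponential kernel."""
    length_scale, signal_std, noise_std = np.exp(log_params)
    d2 = (x[:, None] - x[None, :]) ** 2
    K = signal_std ** 2 * np.exp(-0.5 * d2 / length_scale ** 2)
    K += (noise_std ** 2 + 1e-9) * np.eye(len(x))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10  # large penalty if a line-search trial makes K numerically singular
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # Data-fit term + complexity term (log det K / 2) + normalization constant.
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2 * np.pi)

# Hypothetical noisy observations of a smooth response.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))

start = np.log([1.0, 1.0, 0.5])
result = minimize(neg_log_marginal_likelihood, x0=start, args=(x, y), method="CG")
length_scale, signal_std, noise_std = np.exp(result.x)
print(length_scale, signal_std, noise_std)
```

Optimizing the logarithms keeps all three hyperparameters positive without constraints. With one length scale per input, the fitted values also indicate how strongly each input influences the output, which is the kind of hyperparameter-based parameter analysis the abstract describes.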

