VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder

Molecular similarity is an elusive but core “unsupervised” cheminformatics concept, yet different “fingerprint” encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none is “better” than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the conditional distribution p(z|x), where z is a latent vector and x are the (same) input/output data. It takes the form of a “bowtie”-shaped artificial neural network. In the middle is a “bottleneck layer” or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
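The idea of using distances between latent vectors as a similarity measure can be sketched in a few lines. The abstract does not say which distance is used, so the Euclidean distance and cosine similarity below are assumptions, and the four-dimensional vectors and molecule names (`mol_A`, `mol_B`, `mol_C`) are hypothetical stand-ins for the output of a trained encoder.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two latent vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero latent vectors."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_distance(query, library):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    scored = ((name, euclidean_distance(query, vec)) for name, vec in library.items())
    return sorted(scored, key=lambda item: item[1])


# Hypothetical latent vectors; real ones would come from a trained VAE encoder.
latents = {
    "mol_A": [0.10, -0.40, 0.80, 0.05],
    "mol_B": [0.12, -0.35, 0.75, 0.00],
    "mol_C": [-0.90, 0.60, -0.20, 0.70],
}

query = latents["mol_A"]
candidates = {name: vec for name, vec in latents.items() if name != "mol_A"}
ranking = rank_by_distance(query, candidates)
```

Once the vectors exist, ranking a library against a query costs only one distance computation per molecule, which is consistent with the abstract's claim that the metric is rapidly calculated.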


VAE-Sim: a novel molecular similarity measure based on a variational autoencoder

10.1101/2020.06.26.172908 ◽

2020 ◽

Author(s):

Soumitra Samanta ◽

Steve O’Hagan ◽

Neil Swainston ◽

Timothy J. Roberts ◽

Douglas B. Kell

Keyword(s):

Molecular Similarity ◽

A Priori ◽

Molecular Structures ◽

Maximum Common Substructure ◽

Latent Vector ◽

Novel Approach ◽

Variational Autoencoder ◽

Common Substructure ◽

Similarity Problem ◽

Better Than


From Isovist to Spatial Perception: Wayfinding in Historic Quarter

Environment-Behaviour Proceedings Journal ◽

10.21834/e-bpj.v1i3.317 ◽

2016 ◽

Vol 1 (3) ◽

Author(s):

Chih-Hung Chen ◽

Ting-Ju Lin ◽

Chih-Yu Chen

Keyword(s):

Social Interactions ◽

Spatial Perception ◽

Publishing House ◽

A Priori ◽

Relative Area ◽

Social Stimuli ◽

Historical District ◽

International Publishing ◽

A Priori Analysis ◽

Better Than

Based on the assumption that human behaviours are mainly affected by physical and animate environments, this empirical research observes wayfinding behaviours in the changeable and complex historical district of Tainan. An a priori analysis of the isovist fields is conducted to identify spatial characteristics. Three measures (relative area, convexity, and circularity) are applied to scrutinize the possible stopping points, changes of speed, and route choices. Accordingly, an experiment is carried out to observe spatial behaviours and the different influences of social stimuli. Results show that social interactions enable groups and pairs to perform better than individual observers in wayfinding.

© 2016. The Authors. Published for AMER ABRA by e-International Publishing House, Ltd., UK. Peer-review under responsibility of AMER (Association of Malaysian Environment-Behaviour Researchers), ABRA (Association of Behavioural Researchers on Asians) and cE-Bs (Centre for Environment-Behaviour Studies, Faculty of Architecture, Planning & Surveying, Universiti Teknologi MARA, Malaysia).

Keywords: wayfinding; isovist; spatial perception and social stimuli; historic quarter


Deep Learning Methods for Classification of Certain Abnormalities in Echocardiography

Electronics ◽

10.3390/electronics10040495 ◽

2021 ◽

Vol 10 (4) ◽

pp. 495

Author(s):

Imayanmosha Wahlang ◽

Arnab Kumar Maji ◽

Goutam Saha ◽

Prasun Chakrabarti ◽

Michal Jasinski ◽

...

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Support Vector ◽

Variational Autoencoder ◽

Different Types ◽

Static Images ◽

Long Short Term Memory ◽

2D And 3D ◽

Better Than

This article applies deep learning methodologies to echocardiography (echo), a promising and vigorously researched technique in the preponderance field. This paper involves two different kinds of classification in echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) is performed using 2D echo images, 3D Doppler images, and videographic images. Secondly, different types of regurgitation, namely Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), and a combination of the three, are classified using videographic echo images. Two deep-learning methodologies are used for these purposes: a Recurrent Neural Network (RNN) based methodology (Long Short Term Memory (LSTM)) and an autoencoder-based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguishes this work from existing work using SVM (Support Vector Machine), and the application of deep-learning methodologies is the first of many in this particular field. It was found that deep-learning methodologies perform better than the SVM methodology in normal or abnormal classification. Overall, VAE performs better on 2D and 3D Doppler images (static images), while LSTM performs better on videographic images.


Enhancing quantum annealing performance for the molecular similarity problem

Quantum Information Processing ◽

10.1007/s11128-017-1586-y ◽

2017 ◽

Vol 16 (5) ◽

Cited By ~ 15

Author(s):

Maritza Hernandez ◽

Maliheh Aramon

Keyword(s):

Molecular Similarity ◽

Quantum Annealing ◽

Similarity Problem


Sensorimotor priors in nonstationary environments

Journal of Neurophysiology ◽

10.1152/jn.00605.2012 ◽

2013 ◽

Vol 109 (5) ◽

pp. 1259-1267 ◽

Cited By ~ 15

Author(s):

Devika Narain ◽

Robert J. van Beers ◽

Jeroen B. J. Smeets ◽

Eli Brenner

Keyword(s):

Environmental Variables ◽

A Priori ◽

Past Research ◽

Systematic Change ◽

Recent Experience ◽

Environmental Variance ◽

Adaptive Models ◽

Priori Information ◽

Human Nervous System ◽

Better Than

In the course of its interaction with the world, the human nervous system must constantly estimate various variables in the surrounding environment. Past research indicates that environmental variables may be represented as probabilistic distributions of a priori information (priors). Priors for environmental variables that do not change much over time have been widely studied. Little is known, however, about how priors develop in environments with nonstationary statistics. We examine whether humans change their reliance on the prior based on recent changes in environmental variance. Through experimentation, we obtain an online estimate of the human sensorimotor prior (prediction) and then compare it to similar online predictions made by various nonadaptive and adaptive models. Simulations show that models that rapidly adapt to nonstationary components in the environments predict the stimuli better than models that do not take the changing statistics of the environment into consideration. We found that adaptive models best predict participants' responses in most cases. However, we find no support for the idea that this is a consequence of increased reliance on recent experience just after the occurrence of a systematic change in the environment.
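The simulation claim, that models which adapt to nonstationary statistics predict stimuli better than models which do not, can be illustrated with a toy comparison. This is only a sketch: the two predictors below (a running mean and an exponentially weighted estimate) and the step-change stimulus sequence are illustrative assumptions, not the models or data used in the study.

```python
def running_mean_predictions(stimuli, initial=0.0):
    """Nonadaptive model: predict each stimulus as the mean of all earlier ones."""
    predictions, total = [], 0.0
    for count, stimulus in enumerate(stimuli):
        predictions.append(total / count if count else initial)
        total += stimulus
    return predictions


def exp_weighted_predictions(stimuli, rate=0.5, initial=0.0):
    """Adaptive model: move the estimate toward each new stimulus by `rate`."""
    predictions, estimate = [], initial
    for stimulus in stimuli:
        predictions.append(estimate)
        estimate += rate * (stimulus - estimate)
    return predictions


def mean_squared_error(predictions, stimuli):
    """Average squared difference between predictions and observed stimuli."""
    return sum((p - s) ** 2 for p, s in zip(predictions, stimuli)) / len(stimuli)


# Toy nonstationary environment: the stimulus level jumps from 0 to 10 halfway.
stimuli = [0.0] * 20 + [10.0] * 20

nonadaptive_error = mean_squared_error(running_mean_predictions(stimuli), stimuli)
adaptive_error = mean_squared_error(exp_weighted_predictions(stimuli), stimuli)
```

In this toy sequence the adaptive estimate recovers within a few trials of the jump, while the running mean is dragged down by the earlier trials, so its error stays larger. The sketch says nothing about the separate behavioural finding on participants' reliance on recent experience.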


A regional CO2 observing system simulation experiment for the ASCENDS satellite mission

Atmospheric Chemistry and Physics ◽

10.5194/acp-14-12897-2014 ◽

2014 ◽

Vol 14 (23) ◽

pp. 12897-12914 ◽

Cited By ~ 5

Author(s):

J. S. Wang ◽

S. R. Kawa ◽

J. Eluszkiewicz ◽

D. F. Baker ◽

M. Mountain ◽

...

Keyword(s):

Measurement Errors ◽

Dispersion Model ◽

A Priori ◽

Wrf Model ◽

Particle Dispersion ◽

Instrument Design ◽

Satellite Mission ◽

Model Framework ◽

Satellite Systems ◽

Novel Approach

Abstract. Top–down estimates of the spatiotemporal variations in emissions and uptake of CO2 will benefit from the increasing measurement density brought by recent and future additions to the suite of in situ and remote CO2 measurement platforms. In particular, the planned NASA Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS) satellite mission will provide greater coverage in cloudy regions, at high latitudes, and at night than passive satellite systems, as well as high precision and accuracy. In a novel approach to quantifying the ability of satellite column measurements to constrain CO2 fluxes, we use a portable library of footprints (surface influence functions) generated by the Stochastic Time-Inverted Lagrangian Transport (STILT) model in combination with the Weather Research and Forecasting (WRF) model in a regional Bayesian synthesis inversion. The regional Lagrangian particle dispersion model framework is well suited to make use of ASCENDS observations to constrain weekly fluxes in North America at a high resolution, in this case at 1° latitude × 1° longitude. We consider random measurement errors only, modeled as a function of the mission and instrument design specifications along with realistic atmospheric and surface conditions. We find that the ASCENDS observations could potentially reduce flux uncertainties substantially at biome and finer scales. At the grid scale and weekly resolution, the largest uncertainty reductions, on the order of 50%, occur where and when there is good coverage by observations with low measurement errors and the a priori uncertainties are large. Uncertainty reductions are smaller for a 1.57 μm candidate wavelength than for a 2.05 μm wavelength, and are smaller for the higher of the two measurement error levels that we consider (1.0 ppm vs. 0.5 ppm clear-sky error at Railroad Valley, Nevada). 
Uncertainty reductions at the annual biome scale range from ~40% to ~75% across our four instrument design cases and from ~65% to ~85% for the continent as a whole. Tests suggest that the quantitative results are moderately sensitive to assumptions regarding a priori uncertainties and boundary conditions. The a posteriori flux uncertainties we obtain, ranging from 0.01 to 0.06 Pg C yr−1 across the biomes, would meet requirements for improved understanding of long-term carbon sinks suggested by a previous study.
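Why uncertainty reductions are largest where measurement errors are low and a priori uncertainties are large can be seen in a scalar linear-Gaussian toy case. The sketch below assumes a single flux observed by `n_obs` independent measurements with a common error and sensitivity, and defines uncertainty reduction as one minus the ratio of posterior to prior standard deviation. These are simplifying assumptions for illustration; the study itself uses STILT footprints with WRF in a full regional inversion.

```python
import math


def posterior_sigma(prior_sigma, obs_sigma, sensitivity=1.0, n_obs=1):
    """Posterior standard deviation of a scalar flux in a linear-Gaussian update."""
    if prior_sigma <= 0 or obs_sigma <= 0 or n_obs < 0:
        raise ValueError("sigmas must be positive and n_obs non-negative")
    prior_precision = 1.0 / prior_sigma ** 2
    # Independent observations add their precisions, scaled by sensitivity squared.
    obs_precision = n_obs * sensitivity ** 2 / obs_sigma ** 2
    return math.sqrt(1.0 / (prior_precision + obs_precision))


def uncertainty_reduction(prior_sigma, obs_sigma, sensitivity=1.0, n_obs=1):
    """Fractional reduction: 1 - sigma_posterior / sigma_prior."""
    post = posterior_sigma(prior_sigma, obs_sigma, sensitivity, n_obs)
    return 1.0 - post / prior_sigma


baseline = uncertainty_reduction(prior_sigma=1.0, obs_sigma=1.0)
lower_obs_error = uncertainty_reduction(prior_sigma=1.0, obs_sigma=0.5)
larger_prior = uncertainty_reduction(prior_sigma=2.0, obs_sigma=1.0)
more_coverage = uncertainty_reduction(prior_sigma=1.0, obs_sigma=1.0, n_obs=10)
```

Halving the observation error, doubling the prior uncertainty, or adding observations each increases the reduction relative to the baseline, which matches the qualitative pattern the abstract reports.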


Monitoring distributed computing beyond the traditional time-series histogram

EPJ Web of Conferences ◽

10.1051/epjconf/202024503036 ◽

2020 ◽

Vol 245 ◽

pp. 03036

Author(s):

M S Doidge ◽

P. A. Love ◽

J Thornton

Keyword(s):

Time Series ◽

Distributed Computing ◽

Real Time ◽

A Priori ◽

Time Data ◽

Use Of Time ◽

Computing Services ◽

Monitoring Tools ◽

Novel Approach ◽

Current Monitoring

In this work we describe a novel approach to monitoring the operation of distributed computing services. Current monitoring tools are dominated by the use of time-series histograms showing the evolution of various metrics. These can quickly overwhelm or confuse the viewer due to the large number of similar-looking graphs. We propose a supplementary approach through the sonification of real-time data streamed directly from a variety of distributed computing services. The real-time nature of this method allows operations staff to quickly detect problems and identify that a problem is still ongoing, avoiding the case of investigating an issue a priori when it may already have been resolved. In this paper we present details of the system architecture and provide a recipe for deployment suitable for both site and experiment teams.


Dissimilar Ligands Bind in a Similar Fashion: A Guide to Ligand Binding-Mode Prediction with Application to CELPP Studies

International Journal of Molecular Sciences ◽

10.3390/ijms222212320 ◽

2021 ◽

Vol 22 (22) ◽

pp. 12320

Author(s):

Xianjin Xu ◽

Xiaoqin Zou

Keyword(s):

Molecular Similarity ◽

Complex Structure ◽

Binding Mode ◽

Molecular Structures ◽

Complex Structures ◽

Binding Modes ◽

Ligand Complex ◽

Systematic Analysis ◽

Binding Mode Prediction ◽

Similarity Principle

The molecular similarity principle has achieved great successes in the field of drug design/discovery. Existing studies have focused on similar ligands, while the behaviors of dissimilar ligands remain unknown. In this study, we developed an intercomparison strategy in order to compare the binding modes of ligands with different molecular structures. A systematic analysis of a newly constructed protein–ligand complex structure dataset showed that ligands with similar structures tended to share a similar binding mode, which is consistent with the Molecular Similarity Principle. More importantly, the results revealed that dissimilar ligands can also bind in a similar fashion. This finding may open another avenue for drug discovery. Furthermore, a template-guiding method was introduced for predicting protein–ligand complex structures. With the use of dissimilar ligands as templates, our method significantly outperformed the traditional molecular docking methods. The newly developed template-guiding method was further applied to recent CELPP studies.


Design of chemical libraries with potentially bioactive molecules applying a maximum common substructure concept

Molecular Diversity ◽

10.1007/s11030-009-9187-z ◽

2009 ◽

Vol 14 (2) ◽

pp. 401-408 ◽

Cited By ~ 47

Author(s):

Michael Lisurek ◽

Bernd Rupp ◽

Jörg Wichard ◽

Martin Neuenschwander ◽

Jens Peter von Kries ◽

...

Keyword(s):

Bioactive Molecules ◽

Chemical Libraries ◽

Maximum Common Substructure ◽

Common Substructure


Comparative Study of the Grid-Scale and Subgrid-Scale Velocity Fields - A Priori Test

GANIT Journal of Bangladesh Mathematical Society ◽

10.3329/ganit.v30i0.8499 ◽

2010 ◽

Vol 30 ◽

pp. 19-31

Author(s):

M Ashraf Uddin ◽

M Matiar Rahman ◽

M Saiful Islam Mallik

Keyword(s):

Isotropic Turbulence ◽

A Priori ◽

Velocity Fields ◽

Subgrid Scale ◽

Homogeneous Isotropic Turbulence ◽

Eddy Simulation ◽

Large Eddy ◽

Spatial Spectra ◽

Sharp Cutoff ◽

Better Than

Grid-scale (GS) and subgrid-scale (SGS) velocity fields are generated by direct filtering of DNS (Direct Numerical Simulation) data at a low Reynolds number in homogeneous isotropic turbulence, in order to assess the spectral accuracy and the performance of filter functions for LES (Large Eddy Simulation). The filtering is performed using three classical filter functions (Gaussian, top-hat, and sharp cutoff), and in all three cases the results are compared for three different filter widths for LES. Comparing the distributions of GS and SGS velocities and the decay of turbulence with those from the DNS fields throughout the whole calculation, we found that the sharp cutoff filter performs better than the other two filter functions in terms of both spatial spectra and the distribution of velocities. Furthermore, it is shown that the accuracy of the filtering approach depends not only on the filter functions but also on the filter widths for LES.

GANIT J. Bangladesh Math. Soc. (ISSN 1606-3694) 30 (2010) 19-31. DOI: http://dx.doi.org/10.3329/ganit.v30i0.8499
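The decomposition of a velocity field into GS and SGS parts by direct filtering can be sketched in one dimension. The example below uses a discrete top-hat (moving-average) filter on a periodic synthetic signal; the signal, the grid size, and the filter width of 5 points are assumptions chosen for illustration, and the Gaussian and sharp cutoff filters and the DNS data from the study are not reproduced.

```python
import math


def tophat_filter(u, width):
    """Periodic moving average over `width` grid points (width must be odd)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    n = len(u)
    half = width // 2
    return [
        sum(u[(i + k) % n] for k in range(-half, half + 1)) / width
        for i in range(n)
    ]


def split_scales(u, width):
    """Return (grid-scale, subgrid-scale) parts so that u = GS + SGS pointwise."""
    gs = tophat_filter(u, width)
    sgs = [full - resolved for full, resolved in zip(u, gs)]
    return gs, sgs


def energy(u):
    """Mean of the squared signal, a simple proxy for kinetic energy."""
    return sum(x * x for x in u) / len(u)


# Synthetic periodic "velocity": one large-scale mode plus a small-scale mode.
n = 64
u = [
    math.sin(2 * math.pi * i / n) + 0.3 * math.sin(2 * math.pi * 16 * i / n)
    for i in range(n)
]

gs, sgs = split_scales(u, width=5)
```

Widening the filter moves more of the signal's energy from the GS part into the SGS part, which is one way to see why the abstract stresses that accuracy depends on the filter width as well as on the filter function.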
