Tanimoto similarity
Recently Published Documents


TOTAL DOCUMENTS

37
(FIVE YEARS 16)

H-INDEX

8
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Jiazhen He ◽  
Eva Nittinger ◽  
Christian Tyrchan ◽  
Werngard Czechtizky ◽  
Atanas Patronov ◽  
...  

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the MMP approach is a typical strategy widely used by medicinal chemists, it offers limited capability for exploring the space of solutions. There are more options for modifying a starting molecule to achieve desirable properties, e.g. one can simultaneously modify the molecule at different places, including changing the scaffold. This study trains the same Transformer architecture on different datasets. Each dataset consists of a set of molecular pairs that reflects a different type of transformation. Beyond MMP transformations, datasets reflecting general transformations are constructed from ChEMBL using two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant). We investigate how the model behavior can be altered by tailoring the dataset while keeping the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and allow chemists to pursue different options for improving a starting molecule.
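As an illustration of how molecular pairs could be collected by Tanimoto similarity, here is a minimal sketch. It is not the authors' pipeline: fingerprints are represented as plain sets of "on" bit indices, the names `tanimoto` and `similar_pairs` are hypothetical, and the 0.5 threshold is an assumption chosen only for the toy data.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_pairs(fingerprints: dict, threshold: float = 0.5) -> list:
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = []
    for (id_a, fp_a), (id_b, fp_b) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    return pairs


# Hypothetical toy fingerprints (assumption: real ones would come from a
# cheminformatics toolkit, not hand-written bit sets).
toy = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({7, 8, 9}),
}
pairs = similar_pairs(toy, threshold=0.5)
```

With the toy data, only `mol_a` and `mol_b` share enough bits (3 of 5 in the union) to pass the threshold.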


2021 ◽  
Author(s):  
Hemn Barzan Abdalla

The increasing demand for information and the rapid growth of big data have dramatically increased the volume of textual data. The variety of data has led to information overload. To obtain useful text information, text classification is considered an imperative task. This paper develops a technique for text classification in big data using the MapReduce model. The goal is to design a hybrid optimization algorithm for classifying text. Here, pre-processing is done with stemming and stop word removal. In addition, extraction of important features is performed, wherein SentiWordNet features, contextual features, and thematic features are generated. Furthermore, the selection of optimal features is performed using Tanimoto similarity. The Tanimoto similarity method estimates the similarity between the features and selects the relevant features with higher feature selection accuracy. After that, a deep residual network is utilized for dynamic text classification. The Adam algorithm trains the deep residual network. In addition, dynamic learning is performed with the proposed Rider invasive weed optimization (RIWO)-based deep residual network along with fuzzy theory. The proposed RIWO algorithm combines Invasive weed optimization (IWO) and the Rider optimization algorithm (ROA). The method is implemented under the MapReduce framework. The proposed RIWO-based deep residual network outperformed other techniques with the highest True positive rate (TPR) of 85%, True negative rate (TNR) of 94%, and accuracy of 88.7%.
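The abstract does not say how Tanimoto similarity is applied to feature selection. One plausible reading, shown here purely as a sketch, scores each feature column against the label vector with the continuous Tanimoto coefficient, T(x, y) = x·y / (|x|² + |y|² − x·y), and keeps the top-scoring features. The function names, the toy data, and the ranking-against-labels strategy are all assumptions.

```python
def tanimoto_continuous(x, y):
    """Continuous Tanimoto coefficient for two equal-length real vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom else 0.0


def select_features(features, labels, k):
    """Rank feature columns by similarity to the labels and keep the top k.

    Assumption: this ranking rule is illustrative, not the paper's method.
    """
    ranked = sorted(
        features,
        key=lambda name: tanimoto_continuous(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: three feature columns over four documents.
features = {
    "f1": [1, 0, 1, 1],
    "f2": [0, 1, 0, 0],
    "f3": [1, 0, 1, 0],
}
labels = [1, 0, 1, 1]
selected = select_features(features, labels, k=2)
```

Here `f1` matches the labels exactly (score 1.0), `f3` overlaps partially, and `f2` does not overlap at all, so the first two are kept.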


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Kohulan Rajan ◽  
Achim Zielesny ◽  
Christoph Steinbeck

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this ruleset, correct chemical name assignment remains challenging for human beings, and there are only a few rule-based cheminformatics toolkits available that support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string as well as the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system is able to predict with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Even incorrect predictions show a remarkable similarity between true and predicted compounds.


Marine Drugs ◽  
2020 ◽  
Vol 18 (11) ◽  
pp. 582
Author(s):  
Steve O’Hagan ◽  
Douglas B. Kell

It is known that at least some fluorophores can act as ‘surrogate’ substrates for solute carriers (SLCs) involved in pharmaceutical drug uptake, and this promiscuity is taken to reflect at least a certain structural similarity. As part of a comprehensive study seeking the ‘natural’ substrates of ‘orphan’ transporters that also serve to take up pharmaceutical drugs into cells, we have noted that many drugs bear structural similarities to natural products. A cursory inspection of common fluorophores indicates that they too are surprisingly ‘drug-like’, and they also enter at least some cells. Some are also known to be substrates of efflux transporters. Consequently, we sought to assess the structural similarity of common fluorophores to marketed drugs, endogenous mammalian metabolites, and natural products. We used a set of some 150 fluorophores along with standard fingerprinting methods and the Tanimoto similarity metric. Results: The great majority of fluorophores tested exhibited significant similarity (Tanimoto similarity > 0.75) to at least one drug, as judged via descriptor properties (especially their aromaticity, for identifiable reasons that we explain), by molecular fingerprints, by visual inspection, and via the “quantitative estimate of drug likeness” technique. It is concluded that this set of fluorophores does overlap with a significant part of both the drug space and natural products space. Consequently, fluorophores do indeed offer a much wider opportunity than had possibly been realised to be used as surrogate uptake molecules in the competitive or trans-stimulation assay of membrane transporter activities.


Coronaviruses ◽  
2020 ◽  
Vol 01 ◽  
Author(s):  
Abhishek Sengupta ◽  
Pooja Vijayaraghavan ◽  
Priyansh Srivastava ◽  
Lovely Gupta ◽  
Chaitanya Chandwani ◽  
...  

Background: Several therapeutic possibilities have been explored against Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), such as convalescent plasma (CP), intravenous immunoglobulin (IVIG) and monoclonal antibodies. Compounds such as hydroxychloroquine have also been found to have fatal drawbacks. Repurposing of existing antiviral drugs can be an effective strategy, which could speed up the process of drug discovery. Objective: The present study is designed to computationally predict the efficacy of pre-existing antiviral drugs as inhibitors of the Nsp10-Nsp16 complex protein of SARS-CoV-2. Method: Twenty-six known antiviral drugs, along with structurally similar compounds identified by Tanimoto similarity, were screened against the active site of the Nsp10-Nsp16 complex. Result: Our study reports competitive binding of 1-[3-[2-(2-Ethoxyphenoxy)ethylamino]-2-hydroxypropyl]-9H-carbazol-4-ol at the AdoMet binding site in the Nsp10-Nsp16 complex. Formation of a stable ligand-receptor complex with 1-[3-[2-(2-Ethoxyphenoxy)ethylamino]-2-hydroxypropyl]-9H-carbazol-4-ol could functionally inhibit the Nsp10-Nsp16 complex, thereby making SARS-CoV-2 vulnerable to host immuno-surveillance mechanisms. Conclusion: We conclude that these computational hits can display positive results in in-vitro trials against SARS-CoV-2.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 100
Author(s):  
Martin Vogt ◽  
Jürgen Bajorath

The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
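To make the idea of a significance score for a Tanimoto coefficient concrete, the sketch below computes an empirical p-value: the fraction of background similarity values at least as large as the observed one. This is a deliberately simpler stand-in and does not use or reproduce the ccbmlib API, which models the distributions statistically; the function name and the background values are hypothetical.

```python
def empirical_p_value(observed: float, background: list) -> float:
    """Fraction of background similarities that are >= the observed value."""
    if not background:
        raise ValueError("background must not be empty")
    hits = sum(1 for s in background if s >= observed)
    return hits / len(background)


# Hypothetical background: Tanimoto coefficients between a reference
# compound and ten database compounds (values invented for illustration).
background = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.70, 0.90]
p_value = empirical_p_value(0.70, background)
```

A small p-value means few background compounds are as similar as the test compound, so it would rank near the top of a similarity search against that reference.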


2020 ◽  
Author(s):  
Sanghyeok Lee ◽  
Sangjin Ahn ◽  
Mi-hyun Kim

Background: 3D similarity is useful for predicting the profiles of unprecedented molecular frameworks that are dissimilar in 2D to known compounds. When comparing compound pairs, the 3D similarity of a pair depends on the conformational sampling of the compounds, the alignment method, the chosen descriptors, and the metric, and it can show limited discriminative power. In addition to these four factors, 3D chemocentric target prediction of an unknown compound requires compound-target associations. The associations for the target prediction replace compound-to-compound comparison with compound-to-target comparison. Results: Quantitative comparison of query compounds to target classes (one-to-group) could be achieved using two types of similarity distributions: one from maximum likelihood (ML) estimation of queries and another from a Gaussian mixture model (GMM) of target classes. While the Jaccard-Tanimoto similarity of query-to-ligand pairs could be transformed into a query distribution through ML estimation, the similarity of ligand pairs within each target class could be transformed into the representative distribution of a target class through a GMM, hyperparameterized through the expectation-maximization (EM) algorithm. To quantify the discriminativeness of a query ligand against target classes, the Kullback-Leibler (K-L) divergence was calculated between the two distributions. Conclusions: A stratified sample of 14K ligands from four target classes, estrogen receptor alpha (ESR), vitamin D receptor (VDR), cyclooxygenase-2 (COX2), and cathepsin D (CTSD), showed through comparison of K-L divergence values whether or not each query can be a representative ligand of each target. The feasibility index, Fm, and the probability derived from the K-L divergence could summarize the 3D chemocentric relationship between target classes.
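The comparison of a query distribution with a target-class distribution can be sketched in a simplified form. The example below fits a single Gaussian to a list of similarity values by maximum likelihood and evaluates the closed-form K-L divergence between two univariate Gaussians. Replacing the paper's Gaussian mixture model with a single Gaussian is an assumption made to keep the formula closed-form; all names and values are hypothetical.

```python
import math


def fit_gaussian_ml(values):
    """Maximum-likelihood mean and standard deviation (population variance)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


def kl_gaussian(m1, s1, m2, s2):
    """K-L divergence KL(N(m1, s1^2) || N(m2, s2^2)) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


# Hypothetical similarity values: a query against ligands of one target class,
# and ligand pairs within that class (a single Gaussian stands in for the GMM).
query_sims = [0.42, 0.47, 0.51, 0.55, 0.60]
class_sims = [0.50, 0.58, 0.62, 0.66, 0.74]

q_mean, q_std = fit_gaussian_ml(query_sims)
c_mean, c_std = fit_gaussian_ml(class_sims)
divergence = kl_gaussian(q_mean, q_std, c_mean, c_std)
```

A divergence near zero would indicate that the query's similarity profile resembles that of the target class; larger values indicate a poorer match. Note that the K-L divergence is asymmetric, so the order of the two distributions matters.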

