Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance

The generated database GDB17 enumerates 166.4 billion possible molecules of up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset evenly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules.

This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains higher sp3-carbon fraction and natural product likeness scores compared to known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.

A Similarity Searching System for Biological Phenotype Images Using Deep Convolutional Encoder-decoder Architecture

Current Bioinformatics ◽

10.2174/1574893614666190204150109 ◽

2019 ◽

Vol 14 (7) ◽

pp. 628-639 ◽

Cited By ~ 10

Author(s):

Bizhi Wu ◽

Hangxiao Zhang ◽

Limei Lin ◽

Huiyuan Wang ◽

Yubang Gao ◽

...

Keyword(s):

Neural Network ◽

Retrieval System ◽

Sequence Similarity ◽

Local Alignment ◽

Similarity Searching ◽

Loss Of Function ◽

Biological Images ◽

The Neural Network ◽

Convolutional Autoencoder ◽

Biological Phenotype

Background: The BLAST (Basic Local Alignment Search Tool) algorithm has been widely used for sequence similarity searching. Analogously, public phenotype images should be efficiently retrievable by using biological images as queries to identify phenotypes with high similarity. Despite the accumulation of genotype-phenotype mapping data, no system for searching similar phenotypes is available, owing to the bottleneck of image processing. Objective: In this study, we focus on identifying phenotypic images similar to a query by searching a biological phenotype database that includes information about loss-of-function and gain-of-function. Methods: We propose a deep convolutional autoencoder architecture to segment the biological phenotypic images and develop a phenotype retrieval system to enable a better understanding of genotype–phenotype correlation. Results: This study shows how a deep convolutional autoencoder architecture can be trained on images of biological phenotypes to achieve state-of-the-art performance in a phenotypic image retrieval system. Conclusion: Taken together, the phenotype analysis system can provide further information on the correlation between genotype and phenotype. Additionally, the neural network model for image segmentation and the phenotype retrieval system are equally suitable for any species that has enough phenotype images to train the neural network.

LINGO-DL: a text-based approach for molecular similarity searching

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-021-00383-9 ◽

2021 ◽

Author(s):

Ammar Abdo ◽

Maude Pupin

Keyword(s):

Molecular Similarity ◽

Similarity Searching

Group Fusion Among Wild Toque Macaques: an Extreme Case of Inter-Group Resource Competition

Behaviour ◽

10.1163/156853987x00152 ◽

1987 ◽

Vol 100 (1-4) ◽

pp. 247-289 ◽

Cited By ~ 16

Author(s):

Wolfgang P.J. Dittus

Keyword(s):

Resource Competition ◽

Extreme Case ◽

Group Fusion

Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

Nucleic Acids Research ◽

10.1093/nar/gkw1207 ◽

2016 ◽

Vol 45 (7) ◽

pp. e46-e46 ◽

Cited By ~ 9

Author(s):

William R. Pearson ◽

Weizhong Li ◽

Rodrigo Lopez

Keyword(s):

Sequence Similarity ◽

Iterative Sequence ◽

Similarity Searching

Similarity Searching in Files of Three-Dimensional Structures: Evaluation of Similarity Coefficients and Standardisation Methods for Field-Based Similarity Searching

SAR and QSAR in Environmental Research ◽

10.1080/10629369508233998 ◽

1995 ◽

Vol 3 (2) ◽

pp. 101-130 ◽

Cited By ~ 17

Author(s):

D. B. Turner ◽

P. Willett ◽

A. M. Ferguson ◽

T. W. Heritage

Keyword(s):

Three Dimensional ◽

Similarity Searching ◽

Similarity Coefficients

Application of 3D Zernike descriptors to shape-based ligand similarity searching

Journal of Cheminformatics ◽

10.1186/1758-2946-1-19 ◽

2009 ◽

Vol 1 (1) ◽

Cited By ~ 36

Author(s):

Vishwesh Venkatraman ◽

Padmasini Ramji Chakravarthy ◽

Daisuke Kihara

Keyword(s):

Similarity Searching ◽

Ligand Similarity

Protein domain identification and improved sequence similarity searching using PSI-BLAST

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.10175 ◽

2002 ◽

Vol 48 (4) ◽

pp. 672-681 ◽

Cited By ~ 37

Author(s):

Richard A. George ◽

Jaap Heringa

Keyword(s):

Sequence Similarity ◽

Protein Domain ◽

Similarity Searching ◽

Domain Identification

Mapping the Space of Chemical Reactions using Attention-Based Neural Networks

10.26434/chemrxiv.9897365.v3 ◽

2020 ◽

Author(s):

Philippe Schwaller ◽

Daniel Probst ◽

Alain C. Vaucher ◽

Vishnu H Nair ◽

David Kreutter ◽

...

Keyword(s):

Neural Networks ◽

Reaction Center ◽

Chemical Reactions ◽

Chemical Reaction ◽

Classification Accuracy ◽

Similarity Searching ◽

Reaction Space ◽

Fine Grained ◽

Visual Clustering ◽

Better Than

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. Reaction classes facilitate communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task, requiring the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints which capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The unprecedented insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.

Code: https://github.com/rxn4chemistry/rxnfp
Tutorials: https://rxn4chemistry.github.io/rxnfp/
Interactive reaction atlas: https://rxn4chemistry.github.io/rxnfp//tmaps/tmap_ft_10k.html

Activity-relevant similarity values for fingerprints and implications for similarity searching

F1000Research ◽

10.12688/f1000research.8357.2 ◽

2016 ◽

Vol 5 ◽

pp. 591 ◽

Cited By ~ 11

Author(s):

Swarit Jasial ◽

Ye Hu ◽

Martin Vogt ◽

Jürgen Bajorath

Keyword(s):

Characteristic Feature ◽

Success Rate ◽

Similarity Search ◽

Unsolved Problem ◽

Biological Activities ◽

Similarity Searching ◽

Search Performance ◽

General Activity ◽

Active Compounds ◽

Scientific Foundation

A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance.
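
To make the idea of a similarity threshold in a fingerprint search concrete, the following is a minimal sketch and not the authors' protocol. It assumes binary fingerprints stored as sets of on-bit indices and uses the Tanimoto coefficient, a common choice that the abstract itself does not name. The function names, the toy compounds and the 0.5 cutoff are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data, for illustration only.
query = frozenset({1, 4, 7, 9})
database = {
    "cpd_a": frozenset({1, 4, 7, 9, 12}),  # shares 4 of 5 distinct bits
    "cpd_b": frozenset({1, 4, 20, 21}),    # shares 2 of 6 distinct bits
    "cpd_c": frozenset({30, 31}),          # shares no bits
}
hits = similarity_search(query, database, threshold=0.5)
print(hits)
```

With these toy inputs only `cpd_a` passes the cutoff. The study's finding that activity-relevant similarity values are not reliable as practical thresholds is a caution against reading too much into any fixed cutoff such as the one used here.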
