One Molecular Fingerprint to Rule them All: Drugs, Biomolecules, and the Metabolome

10.26434/chemrxiv.11994630.v1 ◽

2020 ◽

Author(s):

Alice Capecchi ◽

Daniel Probst ◽

Jean-Louis Reymond

Keyword(s):

Small Molecules ◽

Similarity Search ◽

Nearest Neighbor ◽

Chemical Space ◽

Source Code ◽

Atom Pair ◽

Molecular Fingerprint ◽

Molecular Fingerprints ◽

Large Molecules ◽

Different Types

Background: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules. Results: Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints. Conclusion: MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space.
The source code is available at https://github.com/reymond-group/map4, and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/.
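
The shingle-then-MinHash step described in this abstract can be illustrated with a minimal sketch. It assumes the circular-substructure SMILES around each atom have already been generated by a cheminformatics toolkit and are passed in as plain strings. The shingle layout (`smiles|distance|smiles`), the SHA-1 based hash, and the linear permutation scheme are illustrative assumptions, not the reference MAP4 implementation; all function names are hypothetical.

```python
import hashlib
import random

PRIME = (1 << 61) - 1      # large prime for the linear "permutations" (assumption)
MAX_HASH = (1 << 32) - 1   # keep fingerprint entries within 32 bits


def make_shingles(env_a, env_b, distance):
    """Build atom-pair shingles for one pair of atoms.

    env_a / env_b map a radius (1, 2) to the SMILES of the circular
    substructure around atom a / atom b. The two SMILES are sorted so the
    shingle does not depend on the order of the atoms in the pair.
    """
    shingles = set()
    for radius in sorted(env_a):
        first, second = sorted((env_a[radius], env_b[radius]))
        shingles.add(f"{first}|{distance}|{second}")
    return shingles


def hash_shingle(shingle):
    """Map a shingle string to a 32-bit integer (hash choice is an assumption)."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def minhash(shingles, n_perm=128, seed=42):
    """Return the MinHash signature of a non-empty set of shingles."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(n_perm)]
    hashes = [hash_shingle(s) for s in shingles]
    return [min(((a * h + b) % PRIME) & MAX_HASH for h in hashes)
            for a, b in params]


def estimated_jaccard(fp1, fp2):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(fp1, fp2)) / len(fp1)
```

In practice the shingles from all atom pairs of a molecule would be merged into one set before calling `minhash`, and two molecules would be compared with `estimated_jaccard` on their signatures.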

grünifai: interactive multiparameter optimization of molecules in a continuous vector space

Bioinformatics ◽

10.1093/bioinformatics/btaa271 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4093-4094

Author(s):

Robin Winter ◽

Joren Retel ◽

Frank Noé ◽

Djork-Arné Clevert ◽

Andreas Steffen

Keyword(s):

Small Molecules ◽

In Silico ◽

Particle Swarm Optimization Algorithm ◽

Chemical Space ◽

Source Code ◽

Optimization Method ◽

Swarm Optimization ◽

Multiparameter Optimization ◽

In Silico Models ◽

Discovery Project

Summary: Optimizing small molecules in a drug discovery project is a notoriously difficult task as multiple molecular properties have to be considered and balanced at the same time. In this work, we present our novel interactive in silico compound optimization platform termed grünifai to support the ideation of the next generation of compounds under the constraints of a multiparameter objective. grünifai integrates adjustable in silico models, a continuous representation of the chemical space, a scalable particle swarm optimization algorithm and the possibility to actively steer the compound optimization through providing feedback on generated intermediate structures. Availability and implementation: Source code and documentation are freely available on GitHub under an MIT license (https://github.com/jrwnter/gruenifai). The backend, including the optimization method and its distribution across multiple GPU nodes, is written in Python 3. The frontend is written in ReactJS.
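
To make the idea of particle swarm optimization in a continuous vector space concrete, here is a minimal, generic sketch. It is not the grünifai implementation: the inertia and acceleration constants, the box bounds, and the toy two-term weighted objective are all assumptions chosen for illustration. In the real platform the objective would come from in silico models applied to decoded molecules.

```python
import random


def pso(objective, dim, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Maximize `objective` over a box in R^dim with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # per-particle best position
    best_val = [objective(p) for p in pos]         # per-particle best score
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # move and clamp to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_objective(x, weights=(1.0, 2.0), targets=(0.5, -0.25)):
    """Hypothetical multiparameter score: weighted closeness to two targets."""
    return -sum(wt * (xi - t) ** 2 for wt, xi, t in zip(weights, x, targets))
```

Calling `pso(toy_objective, dim=2)` returns the best position found and its score. Adjusting the weights of the objective changes how the competing terms are balanced, which is the kind of trade-off a multiparameter objective encodes.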

A Probabilistic Molecular Fingerprint for Big Data Settings

10.26434/chemrxiv.7176350.v1 ◽

2018 ◽

Author(s):

Daniel Probst ◽

Jean-Louis Reymond

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Molecular Fingerprint ◽

Molecular Fingerprints ◽

Approximate Nearest Neighbor ◽

Neighbor Search ◽

Large Databases ◽

Nearest Neighbor Searches ◽

Extended Connectivity

Background: Among the various molecular fingerprints available to describe small organic molecules, ECFP4 (extended connectivity fingerprint, up to four bonds) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥1,024D) to perform well, so ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC are very slow due to the curse of dimensionality. Results: Herein we report a new fingerprint, called MHFP6 (MinHash fingerprint, up to six bonds), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. Furthermore, MHFP6 outperforms ECFP4 in approximate nearest neighbor searches by two orders of magnitude in terms of speed, while decreasing the error rate. Conclusion: MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. The source code for MHFP6 is available on GitHub (https://github.com/reymond-group/mhfp).
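
Why a MinHash signature lends itself to locality sensitive hashing can be shown with a small banding index. This is a generic sketch of the standard banding technique, not the MHFP6 code: the class name, the band layout, and the use of in-memory dictionaries are assumptions. Signatures are taken as given lists of integers.

```python
from collections import defaultdict


class LSHIndex:
    """Banded LSH over fixed-length MinHash signatures (illustrative sketch)."""

    def __init__(self, n_bands, rows_per_band):
        self.n_bands = n_bands
        self.rows = rows_per_band
        # one hash table per band: band contents -> keys stored in that bucket
        self.tables = [defaultdict(set) for _ in range(n_bands)]

    def _bands(self, signature):
        if len(signature) != self.n_bands * self.rows:
            raise ValueError("signature length must equal n_bands * rows_per_band")
        for b in range(self.n_bands):
            yield b, tuple(signature[b * self.rows:(b + 1) * self.rows])

    def add(self, key, signature):
        """Store `key` in the bucket of every band of its signature."""
        for b, band in self._bands(signature):
            self.tables[b][band].add(key)

    def query(self, signature):
        """Return keys that share at least one identical band with the query."""
        candidates = set()
        for b, band in self._bands(signature):
            candidates |= self.tables[b].get(band, set())
        return candidates
```

If two signatures agree at each position with probability s, they share at least one of b bands of r rows with probability 1 - (1 - s^r)^b, so similar molecules are likely to become candidates while most dissimilar ones are never compared. Candidates are then typically re-ranked by an exact or estimated similarity.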

SimilarityLab: Molecular Similarity for SAR Exploration and Target Prediction on the Web

Processes ◽

10.3390/pr9091520 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1520

Author(s):

Steven Shave ◽

Manfred Auer

Keyword(s):

Drug Discovery ◽

Chemical Synthesis ◽

Small Molecules ◽

Similarity Measure ◽

Molecular Similarity ◽

Chemical Space ◽

Source Code ◽

Target Prediction ◽

Structure Activity ◽

The Web

Exploration of chemical space around hit, experimental, and known active compounds is an important step in the early stages of drug discovery. In academia, where access to chemical synthesis efforts is restricted in comparison to the pharmaceutical industry, hits from primary screens are typically followed up through purchase and testing of similar compounds, before further funding is sought to begin medicinal chemistry efforts. Rapid exploration of druglike similars and structure–activity relationship profiles can be achieved through our new webservice SimilarityLab. In addition to searching for commercially available molecules similar to a query compound, SimilarityLab also enables the search of compounds with recorded activities, generating consensus counts of activities, which enables target and off-target prediction. In contrast to other online offerings utilizing the USRCAT similarity measure, SimilarityLab’s set of commercially available small molecules is consistently updated, currently contains over 12.7 million unique small molecules, and does not rely on published databases which may be many years out of date. This ensures researchers have access to up-to-date chemistries and synthetic processes enabling greater diversity and access to a wider area of commercial chemical space. All source code is available in the SimilarityLab source repository.

PubChem and ChEMBL Beyond Lipinski

10.26434/chemrxiv.7650071.v1 ◽

2019 ◽

Author(s):

Jean-Louis Reymond ◽

Mahendra Awale ◽

Daniel Probst ◽

Alice Capecchi

Keyword(s):

Nearest Neighbor ◽

Molecular Shape ◽

Biological Properties ◽

Web Based ◽

Large Molecules ◽

Interactive Tools ◽

Small Molecule Drugs ◽

Pubchem Database ◽

Nearest Neighbor Searches ◽

Insight Into

Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
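
The notion of an entry that "breaks at least one of the four Lipinski constraints" can be made concrete with a short filter. The sketch below uses the commonly cited rule-of-five thresholds (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The abstract does not state the exact thresholds or descriptor definitions the authors used, so these, along with the class and function names, are assumptions. Descriptor values are expected to be precomputed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (how they are computed is out of scope)."""
    mol_weight: float   # in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Return the list of rule-of-five constraints that `d` breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def is_non_lipinski(d):
    """True if at least one of the four constraints is broken."""
    return len(lipinski_violations(d)) >= 1
```

Applied to a database, `is_non_lipinski` would select a subset analogous to the non-Lipinski sets described above, given descriptors for each entry.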

THE EXACT SOLUTION OF SOME LATTICE STATISTICS MODELS WITH FOUR STATES PER SITE

Canadian Journal of Physics ◽

10.1139/p64-142 ◽

1964 ◽

Vol 42 (8) ◽

pp. 1564-1572 ◽

Cited By ~ 32

Author(s):

D. D. Betts

Keyword(s):

Partition Function ◽

Nearest Neighbor ◽

Quaternary Alloy ◽

Interesting Property ◽

Two Dimensions ◽

Lattice Statistics ◽

Critical Temperatures ◽

Different Types ◽

Interacting Systems ◽

Statistical Mechanical

Statistical mechanical ensembles of interacting systems localized at the sites of a regular lattice and each having four possible states are considered. A set of lattice functions is introduced which permits a considerable simplification of the partition function for general nearest-neighbor interactions. The particular case of the Potts four-state ferromagnet model is solved exactly in two dimensions. The order–disorder problem for a certain quaternary alloy model is also solved exactly on a square net. The quaternary alloy model has the interesting property that it has two critical temperatures and exhibits two different types of long-range order. The partition function for the spin-3/2 Ising model on a square net is expressed in terms of graphs without odd vertices, but has not been solved exactly.

Effect of combination of pectin substances on viscosity of their aqueous solutions

Proceedings of the Voronezh State University of Engineering Technologies ◽

10.20914/2310-1202-2019-2-133-138 ◽

2019 ◽

Vol 81 (2) ◽

pp. 133-138

Author(s):

Z. N. Khatko ◽

S. A. Titov ◽

A. A. Ashinova ◽

E. M. Kolodina

Keyword(s):

Internal Friction ◽

Aqueous Solutions ◽

Dynamic Viscosity ◽

Food Systems ◽

Large Molecules ◽

Related Substances ◽

Shear Loads ◽

Different Types ◽

The Given ◽

Pectin Substances

Viscosity is one of the characteristic properties of pectin substances, as of other lyophilic colloids. Pectin molecules readily associate with each other or with large molecules of related substances. This article reports a study of the dynamic viscosity, internal friction, and thixotropic index of aqueous solutions (1% and 4%) of various types of pectin substances and their combinations. The dynamic viscosity and friction force are analyzed as a function of the type of pectin substance and their combinations. It is established that where information on dissipative processes in pectin structures at low speeds and shear loads is required, one should rely on the internal friction data, and in other cases on the viscosity data. The thixotropic index is calculated. The internal friction of pectin solutions and their dynamic viscosity depend on the type of pectin substances and their combinations. Among the pectin solutions, internal friction is highest for apple pectin, and dynamic viscosity is highest for a combination of citrus and beet pectins. Among the combinations, both indicators are highest for the combination of citrus with beet. The obtained data on the viscosity, internal friction and thixotropic index of solutions of different types and combinations of pectins make it possible to regulate the rheological properties of food systems with the addition of pectin substances.

Peptide Similarity Search Based and Virtual Screening Based Strategies to Identify Small Molecules to Inhibit CarD–RNAP Interaction in M. tuberculosis

International Journal of Peptide Research and Therapeutics ◽

10.1007/s10989-018-9716-7 ◽

2018 ◽

Vol 25 (2) ◽

pp. 697-709 ◽

Cited By ~ 4

Author(s):

V. G. Shanmuga Priya ◽

Priya Swaminathan ◽

Uday M. Muddapur ◽

Prayagraj M. Fandilolu ◽

Rishikesh S. Parulekar ◽

...

Keyword(s):

Virtual Screening ◽

Small Molecules ◽

Similarity Search

Rapid approach to complex boronic acids

Science Advances ◽

10.1126/sciadv.aaw4607 ◽

2019 ◽

Vol 5 (7) ◽

pp. eaaw4607 ◽

Cited By ~ 5

Author(s):

Constantinos G. Neochoritis ◽

Shabnam Shaabani ◽

Maryam Ahmadianmoghaddam ◽

Tryfon Zarganes-Tzitzikas ◽

Li Gao ◽

...

Keyword(s):

Small Molecules ◽

Boronic Acid ◽

Multicomponent Reactions ◽

Chemical Space ◽

Building Blocks ◽

Acid Synthesis ◽

Success Rates ◽

Boronic Acid Derivatives ◽

Low Coverage ◽

Target Synthesis

The compatibility of free boronic acid building blocks in multicomponent reactions to readily create large libraries of diverse and complex small molecules was investigated. Traditionally, boronic acid synthesis is sequential, synthetically demanding, and time-consuming, which leads to high target synthesis times and low coverage of the boronic acid chemical space. We have performed the synthesis of large libraries of boronic acid derivatives based on multiple chemistries and building blocks using acoustic dispensing technology. The synthesis was performed on a nanomole scale with high synthesis success rates. The discovery of a protease inhibitor underscores the usefulness of the approach. Our acoustic dispensing–enabled chemistry paves the way to highly accelerated synthesis and miniaturized reaction scouting, allowing access to unprecedented boronic acid libraries.
