Molecular Diversity Assessment using Chemotypes

Author(s):  
Hugo O. Villar ◽  
Raghav Mandayan ◽  
Mark R. Hansen

Background: Many techniques for designing chemical screening libraries have been proposed over time. General-use libraries remain important when screening against novel targets; their design has relied on molecular descriptors, while chemotype or scaffold analysis has been used less often. Objective: We describe a simple method for assessing chemical diversity based on chemotype counts, which offers an alternative to modeling chemical diversity from computed molecular properties. We show how chemotype counts can be used to evaluate the diversity of a library and to compare diversity selection algorithms. We demonstrate an efficient compound selection algorithm based on chemotype analysis. Methods: We use automated chemotype perception algorithms and compare them with traditional techniques for diversity analysis to check their effectiveness in designing diverse libraries for screening. Results: The best molecular fingerprints for diversity selection in our analysis are extended circular fingerprints, but they can be outperformed by a chemotype diversity algorithm, which can be more intuitive than traditional techniques based on molecular descriptors. Chemotype-based algorithms retrieve a larger share of the chemotypes contained in a library when picking a subset of the chemicals in a collection. Conclusions: Chemotype analysis offers an alternative for generating a general-purpose screening library, as it maximizes the number of chemotypes present in a subset with the smallest number of compounds. The application of methods based on chemotype analysis that do not resort to molecular descriptors is a very promising but seldom explored area of chemoinformatics.
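The idea of selecting compounds by chemotype counts can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give its details, so the code below assumes that each compound has already been assigned a chemotype label (the `library` mapping, its compound IDs, and the function names are all hypothetical), and it simply picks one representative per chemotype, visiting the most populated chemotypes first.

```python
from collections import defaultdict

def pick_by_chemotype(chemotype_of, n_picks):
    """Pick up to n_picks compounds, at most one per chemotype."""
    members = defaultdict(list)
    for compound, chemotype in chemotype_of.items():
        members[chemotype].append(compound)
    # Visit chemotypes from most to least populated; ties broken by name
    # so that the result is deterministic.
    ordered = sorted(members, key=lambda c: (-len(members[c]), c))
    # Take the alphabetically first compound as the representative.
    return [sorted(members[c])[0] for c in ordered[:n_picks]]

def chemotype_coverage(chemotype_of, picks):
    """Fraction of the library's chemotypes represented in picks."""
    all_types = set(chemotype_of.values())
    covered = {chemotype_of[c] for c in picks}
    return len(covered) / len(all_types)

# Hypothetical toy library: compound ID -> chemotype label.
library = {
    "cpd1": "indole", "cpd2": "indole", "cpd3": "indole",
    "cpd4": "quinoline", "cpd5": "quinoline",
    "cpd6": "biphenyl",
}
picks = pick_by_chemotype(library, 2)
coverage = chemotype_coverage(library, picks)
```

The `chemotype_coverage` function mirrors the evaluation described in the abstract, namely the share of a library's chemotypes retrieved by a subset, and could be applied to subsets chosen by any other selection method for comparison.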

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sofia Kapsiani ◽  
Brendan J. Howlin

Abstract: Ageing is a major risk factor for many conditions, including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on data from the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best-performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts and to topological and partial charge properties. The model was applied to predict the class of compounds in an external database consisting of 1738 small molecules. The compounds of the screening database with a predicted probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
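To make the AUC score and the ≥ 0.80 probability cut-off concrete, here is a small sketch in plain Python. It does not reproduce the study's random forest model; the labels and scores are made-up illustrative values, and `roc_auc` is a hypothetical helper that computes the AUC as the probability that a randomly chosen positive compound is scored above a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))

# Made-up values: 1 = extends lifespan, 0 = does not.
labels = [1, 0, 1, 0, 1]
scores = [0.91, 0.40, 0.62, 0.70, 0.85]

auc = roc_auc(labels, scores)
# Indices of compounds passing the 0.80 predicted-probability threshold.
selected = [i for i, s in enumerate(scores) if s >= 0.80]
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for a small illustration; library implementations typically use a rank-based formulation instead.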


Author(s):  
Will Schreiber ◽  
John Kuo

Abstract: The current paper describes a computer model designed to analyze moisture transport in the unmelted, porous soil neighboring a convecting melt. The time-dependent fluid and heat flow in the soil melt is simulated implicitly using the SIMPLE method, generalized to predict viscous fluid motion and heat transfer on boundary-fitted, non-orthogonal coordinates that adapt with time. TOUGH2, a general-purpose computer code for multiphase fluid and heat flow developed by K. Pruess at Lawrence Berkeley Laboratory, has been modified for use on time-adaptive, boundary-fitted coordinates to predict heat transfer, moisture and air transport, and pressure distribution in the porous, unmelted soil. The soil melt model is coupled with the modified TOUGH2 model via an interface (moving boundary) whose shape is determined implicitly with the progression of time. The computer model’s utility is demonstrated in the present study with a special two-dimensional case. A soil initially at 20°C and partially saturated, with a relative liquid saturation of either 0.2 or 0.5, is contained in a box two meters wide by ten meters high with impermeable bottom and sides. The upper surface of the soil is exposed to a 20°C atmosphere to which vapor and air can escape. Computation begins when the soil, which melts at 1700°C, is heated from one side (maintained at constant temperatures ranging from 1700°C to 4000°C). Heat from the hot wall causes the melt to circulate in such a way that the melt interface grows more rapidly at the top of the box than at the bottom. As the upper portion of the melt approaches the impermeable wall, it creates a bottleneck for moisture release from the soil’s lower regions. The pressure history of the trapped moisture is examined as a means of predicting the potential for moisture penetration into the melt. The melt’s interface movement and moisture transport in the unmelted, porous soil are also examined.


2019 ◽  
Vol 20 (17) ◽  
pp. 4106 ◽  
Author(s):  
Wang ◽  
Xiao ◽  
Chen ◽  
Wang

Drug-induced liver injury (DILI) is a major concern in drug development and drug safety. If DILI cannot be effectively predicted during development, it can cause the drug to be withdrawn from the market. Therefore, predicting DILI is crucial at the early stages of drug research. This work presents a 2-class ensemble classifier model for predicting DILI, using 2D molecular descriptors and fingerprints on a dataset of 450 compounds. The purpose of our study is to investigate which key molecular fingerprints may contribute to DILI risk, and then to obtain a reliable ensemble model that predicts DILI risk from these key factors. Experimental results suggested that 8 molecular fingerprints are critical for predicting DILI, and also yielded the best ratio of molecular fingerprints to molecular descriptors. In 5-fold cross-validation, the ensemble vote classifier obtained an accuracy of 77.25%, and its accuracy on the test set was 81.67%. This model could be used for drug-induced liver injury prediction.
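The voting step of an ensemble classifier such as the one described above can be sketched as follows. This is only an illustration of majority voting and accuracy scoring: the base classifiers, their predictions, and the labels are made-up values, the helper names are hypothetical, and the 5-fold cross-validation loop is not shown.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for sample_votes in zip(*predictions):
        # Keep the label that received the most votes for this sample.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

# Made-up predictions from three base classifiers (1 = DILI risk, 0 = none).
votes = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
y_true = [1, 0, 1, 0]

ensemble = majority_vote(votes)
acc = accuracy(y_true, ensemble)
```

Using an odd number of base classifiers avoids tied votes in the two-class case.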


ChemMedChem ◽  
2014 ◽  
Vol 9 (10) ◽  
pp. 2309-2326 ◽  
Author(s):  
Dragos Horvath ◽  
Michael Lisurek ◽  
Bernd Rupp ◽  
Ronald Kühne ◽  
Edgar Specker ◽  
...  

1976 ◽  
Vol 3 (4) ◽  
pp. 449-452
Author(s):  
Janet M. Bradbury ◽  
Christine A. Oriel ◽  
F. T. W. Jordan

General purpose rubber stoppers proved to be a simple, rapid, and satisfactory method of transferring mycoplasma colonies from agar to microscope slides for immunofluorescence.


2019 ◽  
Author(s):  
Matheus Souza ◽  
Henrique Cota Freitas ◽  
Frédéric Pétrot

Because of their impact on program execution performance, cache replacement policies in set-associative caches have been studied in great depth. Most general-purpose processors are now multi-core, yet, much to our surprise, within the very large corpus of research we could not find any replacement policy that takes into account the sharing state of a cache way. Therefore, in this paper we propose to complement the classical time-based way-selection algorithms with information about the sharing state and the number of sharers of each way. We propose several approaches to take this information into account, and our simulations show that LRU-based replacement policies can be slightly improved by them. A much simpler policy, MRU, can also be improved by our strategies, achieving up to 3.5× more IPC than the baseline and up to 82% fewer cache misses.
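One way such sharing information could enter victim selection is sketched below. The abstract does not say how the authors combine sharing state with recency, so this is an assumed variant rather than their policy: it evicts the way with the fewest sharers and falls back to LRU order on ties. The `Way` record and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Way:
    tag: int
    last_used: int  # timestamp of the last access; smaller means older
    sharers: int    # assumed count of cores currently sharing this line

def pick_victim_lru(ways):
    """Classical LRU: evict the least recently used way."""
    return min(range(len(ways)), key=lambda i: ways[i].last_used)

def pick_victim_sharing_aware(ways):
    """Evict the way with the fewest sharers; break ties with LRU."""
    return min(
        range(len(ways)),
        key=lambda i: (ways[i].sharers, ways[i].last_used),
    )

# One 4-way set with made-up state.
cache_set = [
    Way(tag=0xA, last_used=3, sharers=4),
    Way(tag=0xB, last_used=7, sharers=1),
    Way(tag=0xC, last_used=5, sharers=1),
    Way(tag=0xD, last_used=9, sharers=2),
]
lru_victim = pick_victim_lru(cache_set)
aware_victim = pick_victim_sharing_aware(cache_set)
```

In this example plain LRU would evict the oldest way even though four cores share it, while the sharing-aware variant keeps it and evicts the older of the two singly shared ways.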


Author(s):  
Kevin Yang ◽  
Kyle Swanson ◽  
Wengong Jin ◽  
Connor Coley ◽  
Philipp Eiden ◽  
...  

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 15 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.


1996 ◽  
Vol 1 (3) ◽  
pp. 123-130 ◽  
Author(s):  
Kevin R. Oldenburg ◽  
Kham T. Vo ◽  
Beatrice Ruhland ◽  
Peter J. Schatz ◽  
Zhengyu Yuan

Combinatorial chemistry has opened a new realm of chemical diversity in the search for useful therapeutics, as well as the ability to generate chemical libraries of hundreds of thousands to millions of discrete compounds. For the biologist, the goal is to screen these large libraries quickly and to obtain as much information in the primary screen as possible. Ideally, a primary screen would not only identify potential lead compounds but also yield information about the specificity, toxicity, and potency of each compound. Toward this end, a primary screen has been developed in which two organisms (bacteria, yeast, or mammalian cells) are cocultured in the presence of a combinatorial library. For example, bacteria and yeast are cocultured either in liquid or in agar. When the coculture is exposed to compounds from the combinatorial library, individual compounds are found that inhibit bacterial growth (antibacterials), inhibit yeast growth (antifungals), or inhibit both (potential toxins). This screening method is simple and rapid, and it eliminates many of the false positives usually encountered in antimicrobial screening.


2005 ◽  
Vol 12 (1) ◽  
pp. 9-23 ◽  
Author(s):  
Goangseup Zi ◽  
Hao Chen ◽  
Jingxiao Xu ◽  
Ted Belytschko

A method for modelling arbitrary growth of dynamic cracks without remeshing is presented. The method is based on a local partition of unity. It is combined with level sets, so that the discontinuities can be represented entirely in terms of nodal data. This leads to a simple method with clean data structures that can easily be incorporated in general purpose software. Results for a mixed-mode dynamic fracture problem are given to demonstrate the method.

