Machine learning and molecular design of self-assembling π-conjugated oligopeptides

Abstract The accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM↔ML) have been very effective in delivering the precision of QM at the high speed of ML. In this study, we show that by integrating well-known signal processing (SP) techniques (i.e. short-time Fourier transform, continuous wavelet analysis and Wigner-Ville distribution) into the QM↔ML pipeline, we obtain a powerful machinery (QM↔SP↔ML) that can be used for representation, visualization and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic and thermodynamic properties. This is demonstrated by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 16 properties), the new QM↔SP↔ML model predicts the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e. MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, we show that the new QM↔SP↔ML model represents a powerful technique for molecular forward design. All code and data generated and used in this study are available as supporting materials. The QM↔SP↔ML is also housed at the following website: https://github.com/TABeau/QM-SP-ML.
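The chemical-accuracy criterion quoted in the abstract can be checked with a few lines of Python. This is only a minimal sketch: the function names and the sample energies are hypothetical and made up for illustration, and only the two thresholds (1 kcal/mol and 0.1 eV) come from the abstract.

```python
from typing import Sequence

# Thresholds quoted in the abstract.
TOTAL_ENERGY_THRESHOLD_KCAL_MOL = 1.0
ORBITAL_ENERGY_THRESHOLD_EV = 0.1


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between predictions and reference values."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def within_chemical_accuracy(predicted: Sequence[float],
                             reference: Sequence[float],
                             threshold: float) -> bool:
    """True when the MAE is strictly below the given threshold."""
    return mean_absolute_error(predicted, reference) < threshold


# Made-up illustrative energies in kcal/mol (not taken from QM9).
reference_energies = [-40.5, -76.4, -56.6]
predicted_energies = [-40.3, -76.9, -56.8]

mae = mean_absolute_error(predicted_energies, reference_energies)
print(f"MAE = {mae:.3f} kcal/mol")
print("within chemical accuracy:",
      within_chemical_accuracy(predicted_energies, reference_energies,
                               TOTAL_ENERGY_THRESHOLD_KCAL_MOL))
```

The same helper applies to orbital energies by passing values in eV together with `ORBITAL_ENERGY_THRESHOLD_EV`.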

V-dock: fast generation of novel drug-like molecules using machine-learning-based docking score and molecular optimization

10.33774/chemrxiv-2021-75t5k ◽ 2021

Author(s): Jieun Choi ◽ Juyong Lee

Keyword(s): Machine Learning ◽ Molecular Design ◽ Biological Data ◽ Ligand Docking ◽ Molecular Property ◽ Optimization Approach ◽ Generation Process ◽ Activity Data ◽ Docking Score ◽ Novel Drug

In this work, we propose a novel workflow for designing drug-like molecules that combines efficient global molecular property optimization, protein-ligand molecular docking, and machine learning. Computational drug design algorithms aim to find novel molecules that satisfy various drug-like properties and bind strongly to a target protein. To accomplish this goal, various computational molecular generation methods have been developed, building on recent advances in deep learning and the growth of biological data. However, most existing methods depend heavily on experimental activity data, which are not available for many targets. Thus, when the amount of available activity data is limited, protein-ligand docking calculations should be used. However, performing docking calculations on the fly during molecular generation requires considerable computational resources. To address this problem, we used machine-learning models that predict docking energy to accelerate the molecular generation process. We combined this ML-assisted docking score prediction model with the efficient global molecular property optimization approach, MolFinder. We call this design approach V-dock. Using the V-dock approach, we quickly generated many molecules with high docking scores for a target protein and desirable drug-like and bespoke properties, such as similarity to a reference molecule.

Self-assembling Biomaterials: Molecular Design, Characterization and Application in Biology and Medicine

Focus on Catalysts ◽ 10.1016/j.focat.2018.08.055 ◽ 2018 ◽ Vol 2018 (9) ◽ pp. 7 ◽ Cited By ~ 1

Keyword(s): Molecular Design ◽ Self Assembling

Intelligent Molecular Identification for High Performance Organosulfide Capture Using Active Machine Learning Algorithm

10.26434/chemrxiv-2021-hczsl ◽ 2021

Author(s): Yuxiang Chen ◽ Chuanlei Liu ◽ Yang An ◽ Yue Lou ◽ Yang Zhao ◽ ...

Keyword(s): Machine Learning ◽ Similarity Search ◽ Data Science ◽ Molecular Similarity ◽ Molecular Design ◽ Supervised Machine Learning ◽ Training Dataset ◽ Methyl Mercaptan ◽ Model Framework ◽ Modeling Framework

Machine learning and computer-aided approaches significantly accelerate molecular design and discovery in scientific and industrial fields that increasingly rely on data science for efficiency. The typical method is supervised learning, which needs huge labeled datasets. Semi-supervised machine learning approaches can exploit unlabeled data to improve modeling performance, but they are limited by the accumulation of prediction errors. Here, to screen solvents for the removal of methyl mercaptan, an organosulfur impurity in natural gas, we constructed a computational framework that integrates molecular similarity search and active learning methods, namely, molecular active selection machine learning (MASML). This new model framework identifies the optimal set of molecules through molecular similarity search and iterative addition to the training dataset. Among all 126,068 compounds in the initial dataset, 3 molecules were identified as promising for methyl mercaptan (MeSH) capture: benzylamine (BZA), p-methoxybenzylamine (PZM), and N,N-diethyltrimethylenediamine (DEAPA). Further experiments confirmed the effectiveness of our modeling framework in efficient molecular design and identification for capturing methyl mercaptan, in which DEAPA presents a Henry's law constant 89.4% lower than that of methyl diethanolamine (MDEA).
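The similarity-search step mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which similarity measure MASML uses, so the Tanimoto coefficient on fingerprint bit sets is an assumption here, and the function names, molecule labels and fingerprints are hypothetical toy values rather than real chemistry.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| between two sets of 'on' bits.

    Returns 0.0 when both sets are empty (a convention chosen for this sketch).
    """
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def rank_by_similarity(query: Set[int],
                       library: Dict[str, Set[int]],
                       top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy fingerprints: each set lists the indices of bits that are on.
query_bits = {1, 2, 3, 4}
toy_library = {
    "mol_a": {1, 2, 3, 4, 5},
    "mol_b": {1, 2},
    "mol_c": {7, 8},
}

for name, score in rank_by_similarity(query_bits, toy_library, top_k=2):
    print(name, round(score, 2))
```

In an active-learning loop of the kind the abstract describes, the top-ranked candidates would be the ones labeled next and added to the training set; that loop is not shown here.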

Inverse molecular design using machine learning: Generative models for matter engineering

Science ◽ 10.1126/science.aat2663 ◽ 2018 ◽ Vol 361 (6400) ◽ pp. 360-365 ◽ Cited By ~ 297

Author(s): Benjamin Sanchez-Lengeling ◽ Alán Aspuru-Guzik

Keyword(s): Machine Learning ◽ Technological Progress ◽ Rational Design ◽ Molecular Design ◽ Generative Models ◽ New Materials ◽ Large Space ◽ Starting Point ◽ Rapid Pace ◽ Synthetic Routes

The discovery of new materials can bring enormous societal and technological progress. In this context, exploring completely the large space of potential materials is computationally intractable. Here, we review methods for achieving inverse design, which aims to discover tailored materials from the starting point of a particular desired functionality. Recent advances from the rapidly growing field of artificial intelligence, mostly from the subfield of machine learning, have resulted in a fertile exchange of ideas, where approaches to inverse molecular design are being proposed and employed at a rapid pace. Among these, deep generative models have been applied to numerous classes of materials: rational design of prospective drugs, synthetic routes to organic compounds, and optimization of photovoltaics and redox flow batteries, as well as a variety of other solid-state materials.

Molecular Design and Applications of Self-Assembling Surfactant-Like Peptides

Journal of Nanomaterials ◽ 10.1155/2013/469261 ◽ 2013 ◽ Vol 2013 ◽ pp. 1-9 ◽ Cited By ~ 9

Author(s): Chengkang Tang ◽ Feng Qiu ◽ Xiaojun Zhao

Keyword(s): Drug Delivery ◽ Tissue Engineering ◽ Molecular Design ◽ Structural Characteristics ◽ Protein Stabilization ◽ The Self ◽ Geometrical Shape ◽ Self Assembling ◽ Hydrophobic Moiety ◽ Potential Applications

Self-assembling surfactant-like peptides have been explored as emerging nanobiomaterials in recent years. These peptides are usually amphiphilic, typically possessing a hydrophobic moiety and a hydrophilic moiety. These structural characteristics can drive many peptide molecules to self-assemble into various nanostructures. Furthermore, properties of the peptide molecules such as charge distribution and geometrical shape can also alter the formation of the self-assembled nanostructures. Owing to their diverse self-assembling behaviours and nanostructures, self-assembling surfactant-like peptides show great potential in many fields, including membrane protein stabilization, drug delivery, and tissue engineering. This review focuses mainly on recent advances in the study of self-assembling surfactant-like peptides, introducing their designs and potential applications in nanobiotechnology.

Explaining and avoiding failure modes in goal-directed generation

10.33774/chemrxiv-2021-4m6b3-v2 ◽ 2021

Author(s): Maxime Langevin ◽ Rodolphe Vuilleumier ◽ Marc Bianciotto

Keyword(s): Machine Learning ◽ Predictive Models ◽ Optimization Model ◽ In Silico ◽ Molecular Design ◽ Data Distribution ◽ Learning Models ◽ Control Models ◽ Machine Learning Models

Despite growing interest and success in automated in-silico molecular design, doubts remain regarding the ability of goal-directed generation algorithms to perform unbiased exploration of novel chemical spaces. A specific phenomenon has recently been highlighted: goal-directed generation guided by machine learning models produces molecules with high scores according to the optimization model but low scores according to control models, even when the control models are trained on the same data distribution and the same target. In this work, we show that this worrisome behavior is actually due to issues with the predictive models and not with the goal-directed generation algorithms. We show that with appropriate predictive models this issue can be resolved, and the generated molecules have high scores according to both the optimization and the control models.

Self-assembling Peptide Discovery: Overcoming Human Bias With Machine Learning

10.21203/rs.3.rs-505801/v1 ◽ 2021

Author(s): Rohit Batra ◽ Troy Loeffler ◽ Henry Chan ◽ Srilok Sriniva ◽ Honggang Cui ◽ ...

Keyword(s): Machine Learning ◽ Self Assembly ◽ Search Space ◽ Sequence Length ◽ Monte Carlo Tree Search ◽ Self Assembling ◽ Naturally Occurring ◽ Peptide Materials ◽ Dynamics Simulations ◽ Better Than

Abstract Peptide materials have a wide array of functions, from tissue engineering and surface coatings to catalysis and sensing. This class of biopolymer is composed of a sequence of amino acids, drawn from the 20 that occur naturally, whose arrangement dictates the peptide functionality. While it is highly desirable to tailor the amino acid sequence, a small increase in sequence length leads to a dramatic increase in the number of possible candidates (e.g., from 20^3 = 8,000 tripeptides to 20^5 = 3.2 M pentapeptides). Traditionally, peptide design is guided by structural propensity tables, hydrophobicity scales, or other desired properties and typically yields <10 peptides per study, barely scratching the surface of the search space. These approaches, driven by human expertise and intuition, are not easily scalable and are riddled with human bias. Here, we introduce a machine learning workflow that combines Monte Carlo tree search and random forest with molecular dynamics simulations to develop a fully autonomous computational search engine (named AI-expert) to discover peptide sequences with high potential for self-assembly (as a representative target functionality). We demonstrate that the AI-expert efficiently searches large spaces of tripeptides and pentapeptides. Subsequent experiments on the proposed peptide sequences are performed to compare the predictive ability of the AI-expert with that of human experts. The AI performs on par with or better than human experts and suggests several non-intuitive sequences with high self-assembly propensity, outlining its potential to overcome human bias and accelerate peptide discovery.
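The growth of the peptide search space quoted in this abstract follows directly from counting sequences over a 20-letter alphabet. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the alphabet size of 20 comes from the abstract.

```python
NUM_AMINO_ACIDS = 20  # naturally occurring amino acids, as stated in the abstract


def sequence_space_size(length: int, alphabet_size: int = NUM_AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


for label, length in [("tripeptide", 3), ("pentapeptide", 5)]:
    print(f"{label}: {sequence_space_size(length):,} possible sequences")
```

Each added residue multiplies the number of candidates by 20, which is why a study that tests fewer than 10 peptides covers only a tiny fraction of even the tripeptide space.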

Reorganization energies of flexible organic molecules as a challenging target for machine learning enhanced virtual screening

10.26434/chemrxiv-2021-qn823 ◽ 2021

Author(s): Ke Chen ◽ Christian Kunkel ◽ Karsten Reuter ◽ Johannes T. Margraf

Keyword(s): Machine Learning ◽ Virtual Screening ◽ Organic Semiconductors ◽ Organic Molecules ◽ Predictive Accuracy ◽ Molecular Design ◽ Reorganization Energy ◽ Charge Carrier Mobility ◽ Design Rule ◽ Reorganization Energies

The molecular reorganization energy $\lambda$ strongly influences the charge carrier mobility of organic semiconductors and is therefore an important target for molecular design. Machine learning (ML) models generally have the potential to strongly accelerate this design process (e.g. in virtual screening studies) by providing fast and accurate estimates of molecular properties. While such models are well established for simple properties (e.g. the atomization energy), $\lambda$ poses a significant challenge in this context. In this paper, we address the questions of how ML models for $\lambda$ can be improved and what their benefit is in high-throughput virtual screening (HTVS) studies. We find that, while improved predictive accuracy can be obtained relative to a semiempirical baseline model, the improvement in molecular discovery is somewhat marginal. In particular, the ML enhanced screenings are more effective in identifying promising candidates but lead to a less diverse sample. We further use substructure analysis to derive a general design rule for organic molecules with low $\lambda$ from the HTVS results.
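For orientation, one widely used way to compute the intramolecular reorganization energy is the four-point scheme, which needs the energies of the neutral and charged molecule at each other's relaxed geometries. The abstract does not state which definition the authors use, so the expression below is an assumption, and the symbols $E_{\mathrm{neutral}}$, $E_{\mathrm{ion}}$, $R_{\mathrm{neutral}}$ and $R_{\mathrm{ion}}$ are introduced here only for illustration:

$$
\lambda = \left[ E_{\mathrm{ion}}(R_{\mathrm{neutral}}) - E_{\mathrm{ion}}(R_{\mathrm{ion}}) \right] + \left[ E_{\mathrm{neutral}}(R_{\mathrm{ion}}) - E_{\mathrm{neutral}}(R_{\mathrm{neutral}}) \right]
$$

Here $E_{X}(R_{Y})$ is the energy of charge state $X$ evaluated at the relaxed geometry of charge state $Y$. Because both relaxed geometries enter the expression, conformationally flexible molecules make $\lambda$ sensitive to which conformer is used, which is consistent with the abstract describing it as a challenging target.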

Self-Assembling Insurance Claim Models Using Regularized Regression and Machine Learning

SSRN Electronic Journal ◽ 10.2139/ssrn.3241906 ◽ 2018 ◽ Cited By ~ 1

Author(s): Gráinne McGuire ◽ Greg Taylor ◽ Hugh Miller

Keyword(s): Machine Learning ◽ Regularized Regression ◽ Insurance Claim ◽ Self Assembling