Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space

2015 ◽  
Vol 6 (12) ◽  
pp. 2326-2331 ◽  
Author(s):  
Katja Hansen ◽  
Franziska Biegler ◽  
Raghunathan Ramakrishnan ◽  
Wiktor Pronobis ◽  
O. Anatole von Lilienfeld ◽  
...  
2019 ◽  
Author(s):  
Seoin Back ◽  
Kevin Tran ◽  
Zachary Ulissi

Developing active and stable oxygen evolution reaction (OER) catalysts is key to enabling various future energy technologies, and the state-of-the-art catalysts are Ir-containing oxides. Understanding oxygen chemistry on oxide materials is significantly more complicated than on transition metal catalysts for two reasons: the most stable surface coverage under reaction conditions is extremely important but difficult to determine without many detailed calculations, and there are many possible active sites and configurations on O*- or OH*-covered surfaces. We have developed an automated, high-throughput approach that addresses this problem and predicts OER overpotentials for arbitrary oxide surfaces. We demonstrate it on a number of previously unstudied IrO2 and IrO3 polymorphs and their facets. We discovered that low-index surfaces of IrO2 other than rutile (110) are more active than the most stable rutile (110) surface, and we identified promising active sites on IrO2 and IrO3 that outperform rutile (110) by 0.2 V in theoretical overpotential. Based on these DFT findings, we provide design strategies to improve the catalytic activity of Ir-based catalysts and demonstrate a machine learning model capable of predicting surface coverages and site activity. This work highlights the importance of investigating unexplored chemical space to design promising catalysts.
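The abstract's figure of merit is the theoretical overpotential. As a minimal sketch, assuming the widely used four-step OER mechanism in the computational hydrogen electrode picture (not necessarily the exact workflow of this paper), η follows from the adsorption free energies of OH*, O* and OOH* as the largest step free energy minus the 1.23 V equilibrium potential. The function names and the example energies are hypothetical.

```python
# Sketch of the standard four-step OER scheme (computational hydrogen electrode).
# Assumption: inputs are adsorption free energies in eV referenced to H2O and H2;
# this is not necessarily the exact workflow used in the paper.

EQ_POTENTIAL = 1.23                   # V, equilibrium potential of water oxidation
TOTAL_FREE_ENERGY = 4 * EQ_POTENTIAL  # eV for 2 H2O -> O2 + 4 (H+ + e-)


def step_free_energies(dg_oh: float, dg_o: float, dg_ooh: float) -> list[float]:
    """Free-energy change (eV) of each proton-electron transfer step at U = 0 V."""
    return [
        dg_oh,                       # H2O + *  -> OH*  + (H+ + e-)
        dg_o - dg_oh,                # OH*      -> O*   + (H+ + e-)
        dg_ooh - dg_o,               # O* + H2O -> OOH* + (H+ + e-)
        TOTAL_FREE_ENERGY - dg_ooh,  # OOH*     -> * + O2 + (H+ + e-)
    ]


def theoretical_overpotential(dg_oh: float, dg_o: float, dg_ooh: float) -> float:
    """eta = max_i(dG_i)/e - 1.23 V: the largest step is potential-determining."""
    return max(step_free_energies(dg_oh, dg_o, dg_ooh)) - EQ_POTENTIAL


# Hypothetical adsorption energies (eV) for an illustrative site, not values from the paper.
print(round(theoretical_overpotential(0.9, 2.5, 3.8), 2))  # 0.37
```

An ideal site with all four steps at 1.23 eV gives η = 0. Under this formula, lowering η by 0.2 V means shrinking the largest step by 0.2 eV.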


2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have been registered with the Chemical Abstracts Service so far. According to a recent study, there are currently around 10⁶⁰ molecules that could be classified as new drug-like molecules. This library of molecules is now referred to as 'dark chemical space' or 'dark chemistry.' To explore these hidden molecules scientifically, a good number of live and regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The integration of three disciplines (genomics, proteomics, and in silico simulation) will revolutionize the drug discovery process. Screening such a large number of drug-like molecules is a challenge that must be handled efficiently. Virtual screening (VS) is an important computational tool in drug discovery; however, experimental verification of the candidate drugs is equally important for drug development. Quantitative structure-activity relationship (QSAR) analysis is a machine learning technique that is used extensively in VS. QSAR is well known for fast, high-throughput screening with a satisfactory hit rate. Building a QSAR model involves (i) collecting chemogenomics data from a database or the literature, (ii) calculating suitable descriptors from the molecular representation, (iii) establishing a relationship (model) between the biological activity and the selected descriptors, and (iv) applying the QSAR model to predict the biological property of new molecules. All hits obtained by VS need to be experimentally verified. This mini-review highlights web-based machine learning tools, the role of QSAR in VS, successful applications of QSAR-based VS in drug discovery, and the advantages and challenges of QSAR.
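Steps (ii) to (iv) of the QSAR workflow described above can be sketched in a few lines of Python. This is an illustration under loud assumptions: `toy_descriptors` is a hypothetical, deliberately crude descriptor (atom counts read off a SMILES string, whereas real studies use a cheminformatics toolkit), the model is plain ordinary least squares, and the training activities are synthetic values generated from a made-up linear rule so that the fit can be checked, not measured data.

```python
# Minimal QSAR sketch: toy descriptors (ii), an OLS model (iii), prediction (iv).
# All names are hypothetical; the activities below are synthetic, not real data.


def toy_descriptors(smiles: str) -> list[float]:
    """Step (ii), hypothetical descriptors: [carbon atom count, N + O atom count]."""
    carbons = hetero = 0
    for i, ch in enumerate(smiles):
        if ch == "C" and smiles[i + 1 : i + 2] == "l":
            continue  # this 'C' starts a chlorine atom, not a carbon
        if ch in "Cc":  # aliphatic or aromatic carbon
            carbons += 1
        elif ch in "NnOo":  # aliphatic or aromatic nitrogen/oxygen
            hetero += 1
    return [float(carbons), float(hetero)]


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Step (iii): least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + x for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef: list[float], x: list[float]) -> float:
    """Step (iv): apply the fitted model to a new molecule's descriptors."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Synthetic activities from a made-up rule (2.0 + 0.5*C - 1.0*hetero), not real data.
train_smiles = ["CCO", "CCCC", "c1ccccc1O", "OCCO", "CCCl"]
X_train = [toy_descriptors(s) for s in train_smiles]
y_train = [2.0 + 0.5 * c - 1.0 * h for c, h in X_train]

coefficients = fit_ols(X_train, y_train)
print([round(c, 3) for c in coefficients])  # [2.0, 0.5, -1.0]
```

Because the synthetic activities follow the rule exactly, the fit recovers its coefficients; with real assay data, step (iv) predictions would still need the experimental verification the review stresses.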


Metabolites ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 445
Author(s):  
Morena M. Tinte ◽  
Kekeletso H. Chele ◽  
Justin J. J. van der Hooft ◽  
Fidele Tugizimana

Plants are constantly challenged by changing environmental conditions, including abiotic stresses. These stresses limit plant development and productivity and consequently threaten our food security, especially given the pressure of a growing global population. Thus, there is an urgent need for a next generation of crops with high productivity and resilience to climate change. The dawn of a new era characterized by the emergence of fourth industrial revolution (4IR) technologies has redefined the ideological boundaries of research and applications in plant sciences. Recent technological advances, together with machine learning (ML)-based computational tools and omics data analysis approaches, are allowing scientists to derive comprehensive metabolic descriptions and models of target plant species under specific conditions. Such accurate metabolic descriptions are essential for devising a roadmap toward next-generation crops that are resilient to environmental deterioration. By synthesizing the recent literature and collating data from metabolomics studies of plant responses to abiotic stresses, in the context of the 4IR era, we point out the opportunities and challenges offered by omics science, analytical intelligence, computational tools and big data analytics. Specifically, we highlight technological advancements in (plant) metabolomics workflows and the use of machine learning and computational tools to decipher the dynamics in the chemical space that define plant responses to abiotic stress conditions.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Janna Hastings ◽  
Martin Glauer ◽  
Adel Memariani ◽  
Fabian Neuhaus ◽  
Till Mossakowski

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
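One practical consequence of framing the task as hierarchical classification is that flat per-class predictions can contradict the ontology (for example, a molecule predicted to be an amino acid but not a carboxylic acid). The sketch below shows one common remedy, assuming a hypothetical toy `is_a` graph rather than ChEBI's real structure and hypothetical per-class scores: threshold each class independently, then close the predicted set upward over `is_a` edges. It is not the paper's method; it only illustrates the hierarchy constraint.

```python
# Toy is_a hierarchy (child -> parents). Class names are illustrative only,
# not actual ChEBI identifiers or ChEBI's real graph structure.
IS_A = {
    "amino acid": ["carboxylic acid", "amino compound"],
    "carboxylic acid": ["organic acid"],
    "organic acid": ["organic molecule"],
    "amino compound": ["organic molecule"],
    "alcohol": ["organic molecule"],
}


def predict_labels(scores, threshold=0.5):
    """Multi-label decision: keep every class whose own score passes the threshold."""
    return {cls for cls, score in scores.items() if score >= threshold}


def ancestors(cls, is_a):
    """All superclasses reachable from `cls` by following is_a edges."""
    found = set()
    stack = list(is_a.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


def close_upwards(labels, is_a):
    """Make a label set consistent: membership in a class implies all its ancestors."""
    closed = set(labels)
    for cls in labels:
        closed |= ancestors(cls, is_a)
    return closed


# Hypothetical per-class scores, e.g. from one binary classifier per class.
scores = {"amino acid": 0.91, "carboxylic acid": 0.42, "amino compound": 0.30, "alcohol": 0.05}
flat = predict_labels(scores)
consistent = close_upwards(flat, IS_A)
print(sorted(consistent))
# ['amino acid', 'amino compound', 'carboxylic acid', 'organic acid', 'organic molecule']
```

Only "amino acid" passes the threshold on its own, but the closed set also contains all of its superclasses, including "carboxylic acid", whose own score fell below the threshold.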


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Weishun Zhong ◽  
Jacob M. Gold ◽  
Sarah Marzen ◽  
Jeremy L. England ◽  
Nicole Yunger Halpern

Diverse many-body systems, from soap bubbles to suspensions to polymers, learn and remember patterns in the drives that push them far from equilibrium. This learning may be leveraged for computation, memory, and engineering. Until now, many-body learning has been detected with thermodynamic properties, such as work absorption and strain. We progress beyond these macroscopic properties first defined for equilibrium contexts: We quantify statistical mechanical learning using representation learning, a machine-learning model in which information squeezes through a bottleneck. By calculating properties of the bottleneck, we measure four facets of many-body systems’ learning: classification ability, memory capacity, discrimination ability, and novelty detection. Numerical simulations of a classical spin glass illustrate our technique. This toolkit exposes self-organization that eludes detection by thermodynamic measures: Our toolkit more reliably and more precisely detects and quantifies learning by matter while providing a unifying framework for many-body learning.
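The abstract's test system is a classical spin glass. As a hedged sketch of that substrate only, the code below assumes a generic Ising-type spin glass with Sherrington-Kirkpatrick-style Gaussian couplings and samples it with single-spin-flip Metropolis updates. The authors' driven model and their bottleneck-based learning metrics are not reproduced, and all function names are hypothetical.

```python
import math
import random


def energy(spins, couplings, fields):
    """E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i for spins s_i in {-1, +1}."""
    e = 0.0
    n = len(spins)
    for i in range(n):
        for j in range(i + 1, n):
            e -= couplings[i][j] * spins[i] * spins[j]
        e -= fields[i] * spins[i]
    return e


def random_couplings(n, rng):
    """Symmetric Gaussian couplings with variance 1/n (SK-style), zero diagonal."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return J


def metropolis_sweep(spins, couplings, fields, temperature, rng):
    """One sweep of n single-spin-flip Metropolis updates, applied in place."""
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        local = fields[i] + sum(couplings[i][j] * spins[j] for j in range(n) if j != i)
        delta = 2.0 * spins[i] * local  # energy change if spin i is flipped
        if delta <= 0 or (temperature > 0 and rng.random() < math.exp(-delta / temperature)):
            spins[i] = -spins[i]
    return spins


rng = random.Random(42)
n = 16
J = random_couplings(n, rng)
h = [0.0] * n
spins = [rng.choice([-1, 1]) for _ in range(n)]
for _ in range(100):
    metropolis_sweep(spins, J, h, temperature=0.5, rng=rng)
print(energy(spins, J, h))
```

At temperature 0 only flips with ΔE ≤ 0 are accepted, so a sweep can never raise the energy; at finite temperature uphill flips are accepted with probability exp(-ΔE/T).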


2018 ◽  
Author(s):  
Khader Shameer ◽  
Kipp W. Johnson ◽  
Benjamin S. Glicksberg ◽  
Rachel Hodos ◽  
Ben Readhead ◽  
...  

Drug repositioning, i.e. identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that continues to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already been proven safe in humans. In previous work, we showed that drugs that have been successfully repositioned have different chemical properties than those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec features were used to train a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines, and an ensemble model combining the three). Models were created using different datasets: a Connectivity Map-based model, a DrugBank approved-compounds model, and a model based on the full DrugBank compound set. Of these, Random Forest trained on Connectivity Map data performed best (AUC = 0.674). In brief, our study presents a novel approach to evaluating small molecules for drug repositioning opportunities and may further improve the discovery of pleiotropic drugs, i.e., those that treat multiple indications.
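The reported AUC = 0.674 has a concrete reading: it is the probability that the classifier scores a randomly chosen positive example (here, a successfully repositioned drug) above a randomly chosen negative one, with 0.5 corresponding to chance. A minimal pairwise sketch of that definition follows; `roc_auc` is a hypothetical helper, not the authors' evaluation code, and sorting-based library implementations compute the same value more efficiently.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for positive, 0 for negative; ties between a positive and a
    negative score count as half a win (the Mann-Whitney convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but not for large screening sets.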


2020 ◽  
Vol 1 (2) ◽  
Author(s):  
Jacob M. Remington ◽  
Jonathon B. Ferrell ◽  
Marlo Zorman ◽  
Adam Petrucci ◽  
Severin T. Schneebeli ◽  
...  

Recent advances in computer hardware and software, particularly the availability of machine learning (ML) libraries, allow the introduction of data-based topics such as ML into the biophysical curriculum at the undergraduate and graduate levels. However, there are many practical challenges in teaching ML to advanced students in biophysics majors, who often do not have a rich computational background. Aiming to overcome such challenges, we present an educational study, including the design of course topics, pedagogic tools, and assessments of student learning, that develops a new methodology for incorporating the basics of ML into an existing biophysical elective course and engages students in exercises to solve problems in an interdisciplinary field. In general, we observed that students had ample curiosity to learn and apply ML algorithms to predict molecular properties. Notably, student feedback suggests that care must be taken to ensure students are prepared to understand the data-driven concepts and fundamental coding aspects required for using ML algorithms. This work establishes a framework for future teaching approaches that unite ML with any existing course in the biophysical curriculum, while also pinpointing the critical challenges that educators and students will likely face.


2019 ◽  
Vol 123 (20) ◽  
pp. 4500-4511 ◽  
Author(s):  
Benjamin G. Peyton ◽  
T. Daniel Crawford

2019 ◽  
Vol 100 (3) ◽  
Author(s):  
Christian Hoffmann ◽  
Roberto Menichetti ◽  
Kiran H. Kanekal ◽  
Tristan Bereau
