Are Learned Molecular Representations Ready for Prime Time?

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely compared these new models against the models already employed in industrial research settings. In this paper, we benchmark models extensively on 19 public and 15 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
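
To make "operating on the graph structure of the molecule" concrete, the following is a minimal sketch of generic message passing, the mechanism graph convolutional models share: each atom repeatedly aggregates the states of its bonded neighbours, and the atom states are then pooled into one molecule-level value. It is a hypothetical toy (scalar atom features, a fixed `self_weight`, no learned parameters or nonlinearities), not the model proposed in this paper.

```python
# Hypothetical toy example of generic message passing on a molecular graph.
# Scalar atom features and the fixed self_weight are illustrative assumptions;
# a real graph convolutional model uses learned weights over feature vectors.

def message_passing(features, bonds, steps=2, self_weight=0.5):
    """Update each atom's state with the sum of its neighbours' states."""
    neighbours = {i: [] for i in range(len(features))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    h = list(features)
    for _ in range(steps):
        # All atoms are updated from the previous step's states.
        h = [self_weight * h[i] + sum(h[j] for j in neighbours[i])
             for i in range(len(h))]
    return h

def readout(atom_states):
    """Sum-pool atom states into a single molecule-level value."""
    return sum(atom_states)

# Ethanol heavy atoms C-C-O, using atomic numbers as toy scalar features.
atoms = [6.0, 6.0, 8.0]
bonds = [(0, 1), (1, 2)]
states = message_passing(atoms, bonds, steps=1)
print(states)           # [9.0, 17.0, 10.0]
print(readout(states))  # 36.0
```

In a trained model, the aggregation would apply learned weight matrices to vector-valued atom and bond features, so the pooled vector becomes a learned representation rather than a fixed fingerprint.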

Are Learned Molecular Representations Ready for Prime Time?

10.26434/chemrxiv.7940594.v2 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kevin Yang ◽

Kyle Swanson ◽

Wengong Jin ◽

Connor Coley ◽

Philipp Eiden ◽

...

Keyword(s):

Neural Networks ◽

Molecular Descriptors ◽

Recent Literature ◽

Chemical Space ◽

Prime Time ◽

Graph Structure ◽

Molecular Fingerprints ◽

Industry Research ◽

Proposed Model ◽

Wide Range

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely compared these new models against the models already employed in industrial research settings. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.

Analyzing Learned Molecular Representations for Property Prediction

10.26434/chemrxiv.7940594 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kevin Yang ◽

Kyle Swanson ◽

Wengong Jin ◽

Connor Coley ◽

Philipp Eiden ◽

...

Keyword(s):

Neural Networks ◽

Molecular Descriptors ◽

Recent Literature ◽

Chemical Space ◽

Graph Structure ◽

Molecular Fingerprints ◽

Property Prediction ◽

Industry Research ◽

Proposed Model ◽

Wide Range

Analyzing Learned Molecular Representations for Property Prediction

10.26434/chemrxiv.7940594.v3 ◽

2019 ◽

Author(s):

Kevin Yang ◽

Kyle Swanson ◽

Wengong Jin ◽

Connor Coley ◽

Philipp Eiden ◽

...

Keyword(s):

Neural Networks ◽

Molecular Descriptors ◽

Recent Literature ◽

Chemical Space ◽

Graph Structure ◽

Molecular Fingerprints ◽

Property Prediction ◽

Industry Research ◽

Proposed Model ◽

Wide Range

Object Detection with Low Capacity GPU Systems Using Improved Faster R-CNN

Applied Sciences ◽

10.3390/app10010083 ◽

2019 ◽

Vol 10 (1) ◽

pp. 83 ◽

Cited By ~ 5

Author(s):

Atakan Körez ◽

Necaattin Barışçı

Keyword(s):

Neural Networks ◽

Object Detection ◽

Convolutional Neural Networks ◽

Graphics Processing Unit ◽

Satellite Image ◽

Traffic Monitoring ◽

Batch Size ◽

Processing Unit ◽

Proposed Model ◽

Wide Range

Object detection in remote sensing images has been used frequently in a wide range of areas such as land planning, city monitoring, traffic monitoring, and agricultural applications. It is essential in aerial and satellite image analysis, but it remains challenging. Many object detection models based on convolutional neural networks (CNNs) have been proposed to address this problem. Deformable convolution was introduced to overcome the limitation of the fixed grid structure of standard convolutional neural networks. In this study, a multi-scale Faster R-CNN method based on deformable convolution is proposed for systems with a single or low-capacity graphics processing unit (GPU). Weight standardization (WS) is used instead of batch normalization (BN) to make the proposed model more efficient at a small batch size (1 image per GPU) on single-GPU systems. Experiments were conducted on the publicly available 10-class geospatial object detection (NWPU-VHR 10) dataset to evaluate the object detection performance of the proposed model. Experimental results show that the model achieved 92.3 mAP, a 1.7% mAP increase over the best results reported for models using the same dataset.
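
Weight standardization normalizes each convolution filter's weights to zero mean and unit variance over its fan-in, so, unlike batch normalization, it does not rely on batch statistics and behaves the same at a batch size of one. A minimal pure-Python sketch, assuming filters are flattened into lists and a hypothetical `eps` for numerical stability; a real implementation would standardize framework tensors inside the convolution layer on every forward pass.

```python
import math

def standardize_weights(filters, eps=1e-5):
    """Weight standardization: for each output filter, subtract the mean and
    divide by the standard deviation computed over that filter's fan-in."""
    out = []
    for w in filters:
        mean = sum(w) / len(w)
        var = sum((x - mean) ** 2 for x in w) / len(w)
        std = math.sqrt(var + eps)  # eps keeps constant filters finite
        out.append([(x - mean) / std for x in w])
    return out

# Two hypothetical 2x2 filters with one input channel, flattened to length 4.
filters = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
ws = standardize_weights(filters)
print(ws[1])  # [0.0, 0.0, 0.0, 0.0]
```

Because the statistics come from the weights rather than the activations, the standardized filters are identical whether a batch holds one image or many.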

Intrinsic Stacking Interactions of Natural and Artificial Nucleobases

10.26434/chemrxiv.11400405 ◽

2019 ◽

Author(s):

Drew P. Harding ◽

Laura J. Kingsley ◽

Glen Spraggon ◽

Steven Wheeler

Keyword(s):

Gas Phase ◽

Electrostatic Interactions ◽

Density Functional ◽

Molecular Descriptors ◽

Stacking Interactions ◽

Interaction Energies ◽

Functional Theory ◽

Binding Partner ◽

Heavy Atoms ◽

Wide Range

The intrinsic (gas-phase) stacking energies of natural and artificial nucleobases were explored using density functional theory (DFT) and correlated ab initio methods. Ranking the stacking strength of natural nucleobase dimers revealed a binding-partner preference similar to that observed experimentally, namely G > C > A > T > U. Decomposition of these interaction energies using symmetry-adapted perturbation theory (SAPT) showed that these dispersion-dominated interactions are modulated by electrostatics. Artificial nucleobases showed a similar stacking preference for natural nucleobases, and their interactions were also modulated by electrostatics. A robust predictive multivariate model was developed that quantitatively predicts the maximum stacking interaction between natural nucleobases and a wide range of artificial nucleobases using molecular descriptors based on computed electrostatic potentials (ESPs) and the number of heavy atoms. This model should find utility in designing artificial nucleobase analogs that exhibit stacking interactions comparable to those of natural nucleobases. Further analysis of the descriptors in this model reveals the origin of the superior stacking abilities of certain nucleobases, including cytosine and guanine.

General Cyclopropane Assembly via Enantioselective Redox-Active Carbene Transfer to Aliphatic Olefins

10.26434/chemrxiv.7436795 ◽

2018 ◽

Author(s):

Marc Montesinos-Magraner ◽

Matteo Costantini ◽

Rodrigo Ramirez-Contreras ◽

Michael E. Muratore ◽

Magnus J. Johansson ◽

...

Keyword(s):

Total Synthesis ◽

Asymmetric Synthesis ◽

Chemical Space ◽

Synthetic Approach ◽

Asymmetric Cyclopropanation ◽

Redox Active ◽

Wide Range ◽

Leaving Group ◽

Stereoelectronic Properties ◽

Carbene Transfer

Asymmetric cyclopropane synthesis currently requires bespoke strategies, methods, substrates and reagents, even when targeting similar compounds. This limits the speed and chemical space available for discovery campaigns. Here we introduce a practical and versatile diazocompound, and we demonstrate its performance in the first unified asymmetric synthesis of functionalized cyclopropanes. We found that the redox-active leaving group in this reagent enhances the reactivity and selectivity of geminal carbene transfer. This effect enabled the asymmetric cyclopropanation of a wide range of olefins including unactivated aliphatic alkenes, enabling the 3-step total synthesis of (–)-dictyopterene A. This unified synthetic approach delivers high enantioselectivities that are independent of the stereoelectronic properties of the functional groups transferred. Our results demonstrate that orthogonally-differentiated diazocompounds are viable and advantageous equivalents of single-carbon chirons.

Studies on the EC50 of Natural Monoterpenes as Fungal Inhibitors with Quantitative Structure-Activity Relationships (QSARs)

The Natural Products Journal ◽

10.2174/2210315509666190117150153 ◽

2020 ◽

Vol 10 (1) ◽

pp. 44-60

Author(s):

Mohamed E.I. Badawy ◽

Entsar I. Rabea ◽

Samir A.M. Abdelgaleil

Keyword(s):

Biological Activity ◽

Plant Pathogens ◽

Molecular Descriptors ◽

Microbial Pathogens ◽

Quantitative Structure ◽

QSAR Study ◽

QSAR Analysis ◽

Pharmacophore Modelling ◽

Structure Activity ◽

Wide Range

Background: Monoterpenes are the main constituents of the essential oils obtained from plants. These natural products exhibit a wide spectrum of biological activities and have been extensively tested against microbial pathogens and other agricultural pests.

Methods: The antifungal activity of 10 monoterpenes, including two hydrocarbons (camphene and (S)-limonene) and eight oxygenated hydrocarbons ((R)-camphor, (R)-carvone, (S)-fenchone, geraniol, (R)-linalool, (+)-menthol, menthone, and thymol), was determined against Alternaria alternata, Botrytis cinerea, Botryodiplodia theobromae, Fusarium graminearum, Phoma exigua, Phytophthora infestans, and Sclerotinia sclerotiorum by the mycelial radial growth technique. Subsequently, Quantitative Structure-Activity Relationship (QSAR) analysis was performed using different molecular descriptors with multiple regression analysis based on a systematic search and the leave-one-out cross-validation (LOOCV) technique. Moreover, pharmacophore modelling was carried out using LigandScout software to evaluate the common features essential for the activity and the hypothetical geometries adopted by these ligands in their most active forms.

Results: The antifungal activities were high but depended on the chemical structure and the type of microorganism. Thymol showed the highest activity against all fungi tested, with EC50 values in the range of 10-86 mg/L. The QSAR study showed that the molecular descriptors HBA, MR, Pz, tPSA, and Vp were positively correlated with the biological activity in all of the best models, with a correlation coefficient (r) ≥ 0.98 and cross-validated values (Q²) ≥ 0.77.

Conclusion: The results of this work offer the opportunity to choose monoterpenes with preferential antimicrobial activity against a wide range of plant pathogens.
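
The cross-validated Q² values come from leave-one-out cross-validation: each compound is predicted by a regression model fitted without it, and, in the common definition, Q² = 1 - PRESS/SS_tot, where PRESS is the sum of squared leave-one-out prediction errors. A minimal pure-Python sketch of that definition for ordinary multiple linear regression, assuming hypothetical descriptor values; it does not reproduce the authors' systematic descriptor search.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, x):
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))

def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_tot."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        # Refit without compound i, then predict compound i.
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot

# Hypothetical two-descriptor data where activity is an exact linear
# function of the descriptors, so Q^2 is 1 up to rounding.
X = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [6, 5]]
y = [3, 4, 6, 6, 9, 8]
print(round(loo_q2(X, y), 6))  # 1.0
```

Unlike the fitted correlation coefficient r, Q² only credits a model for compounds it did not see during fitting, which is why it is typically lower than r.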

What If the ‘Anthropocene’ Is Not Formalized as a New Geological Series/Epoch?

Quaternary ◽

10.3390/quat1030024 ◽

2018 ◽

Vol 1 (3) ◽

pp. 24 ◽

Cited By ~ 5

Author(s):

Valentí Rull

Keyword(s):

Present State ◽

Executive Committee ◽

Working Group ◽

Recent Literature ◽

Special Issue ◽

Environmental Sciences ◽

International Union ◽

Quaternary Stratigraphy ◽

Wide Range ◽

Geological Sciences

In the coming years, the Anthropocene Working Group (AWG) will submit its proposal on the ‘Anthropocene’ to the Subcommission of Quaternary Stratigraphy (SQS) and the International Commission on Stratigraphy (ICS) for approval. If approved, the proposal will be sent to the Executive Committee of the International Union of Geological Sciences (IUGS) for ratification. If the proposal is approved and ratified, then the ‘Anthropocene’ will be formalized. Currently, the ‘Anthropocene’ is a broadly used term and concept in a wide range of scientific and non-scientific situations, and, for many, the official acceptance of this term is only a matter of time. However, the AWG proposal, in its present state, does not seem to fully meet the requirements for a new chronostratigraphic unit. This essay asks what could happen if the current ‘Anthropocene’ proposal is not formalized by the ICS/IUGS. The possible stratigraphic alternatives are evaluated on the basis of recent literature and the personal opinions of distinguished AWG, SQS, and ICS members. The eventual impact on environmental sciences and on non-scientific sectors, where the ‘Anthropocene’ seems already firmly rooted and de facto accepted as a new geological epoch, is also discussed. This essay is intended as the editorial introduction to a Quaternary special issue on the topic.

Data augmentation for computed tomography angiography via synthetic image generation and neural domain adaptation

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2020-0015 ◽

2020 ◽

Vol 6 (1) ◽

Author(s):

Malte Seemann ◽

Lennart Bargsten ◽

Alexander Schlaefer

Keyword(s):

Computed Tomography ◽

Neural Networks ◽

Deep Learning ◽

Medical Imaging ◽

Computed Tomography Angiography ◽

Data Augmentation ◽

Domain Adaptation ◽

Synthetic Image ◽

Wide Range ◽

The Impact

Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of the artery lumen in computed tomography angiography (CTA) data. However, to perform well, neural networks have to be trained on large amounts of high-quality annotated data. In medical imaging, annotations are not only scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of Dice coefficient and 20% in terms of Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of the artery lumen in CTA images.
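
The two metrics reported above compare a predicted segmentation with the reference: the Dice coefficient 2|A ∩ B| / (|A| + |B|) measures overlap (higher is better), and the Hausdorff distance measures the largest boundary disagreement (lower is better). A minimal sketch on hypothetical 2D binary masks stored as sets of foreground pixel coordinates; the abstract does not say whether the authors computed the metrics in 2D or 3D, or which Hausdorff variant they used.

```python
import math

def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for sets of foreground pixels."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Worst case over p of the distance to the nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

# Hypothetical masks: the prediction misses (1, 1) and adds a stray pixel.
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred = {(0, 0), (0, 1), (1, 0), (3, 0)}
print(dice(pred, truth))       # 0.75
print(hausdorff(pred, truth))  # 2.0
```

The stray pixel barely affects Dice but dominates the Hausdorff distance, which is why the two metrics are usually reported together.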

Neural Network Approach for Global Solar Irradiance Prediction at Extremely Short-Time-Intervals Using Particle Swarm Optimization Algorithm

Energies ◽

10.3390/en14041213 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1213

Author(s):

Ahmed Aljanad ◽

Nadia M. L. Tan ◽

Vassilios G. Agelidis ◽

Hussain Shareef

Keyword(s):

Neural Networks ◽

Particle Swarm Optimization ◽

Solar Irradiance ◽

Particle Swarm ◽

Time Interval ◽

Time Intervals ◽

Swarm Optimization ◽

Backpropagation Neural Networks ◽

Proposed Model ◽

Short Time

Hourly global solar irradiance (GSR) data are required for sizing, planning, and modeling solar photovoltaic farms. However, operating and controlling such farms under varying environmental conditions, such as fast-passing clouds, requires GSR data at very short time intervals. Classical backpropagation neural networks (BPNNs) do not perform satisfactorily when predicting parameters over such short intervals. This paper proposes a hybrid backpropagation neural network based on particle swarm optimization (BPNN-PSO). The particle swarm algorithm is used within the backpropagation neural network to optimize the number of hidden layers, the number of neurons, and the learning rate. The proposed model can be used as a reliable model for predicting changes in solar irradiance over short time intervals in tropical regions such as Malaysia, as well as in other regions. Actual global solar irradiance data at 5-s and 1-min intervals, recorded by weather stations, are used to train and test the proposed algorithm. Moreover, to ensure the adaptability and robustness of the proposed technique, two cases are evaluated using 1-day and 3-day profiles, each at the two time intervals of 1 min and 5 s. A set of statistical error indices is used to evaluate the performance of the proposed algorithm. For the 3-day profile, the BPNN-PSO achieved an RMSE of 1.7078, an MAE of 0.7537, an MSE of 0.0292, and a MAPE of 31.4348% at the 5-s interval, and an RMSE of 0.6566, an MAE of 0.2754, an MSE of 0.0043, and a MAPE of 1.4732% at the 1-min interval. The results show that the proposed model outperformed the standalone backpropagation neural network in predicting global solar irradiance at extremely short time intervals. In addition, the proposed model exhibited a high level of predictability compared with other existing models.
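
The PSO step can be sketched as a swarm of candidate hyperparameter vectors moving through the search space, each particle pulled toward its own best position and toward the swarm's best. This is a minimal global-best PSO in pure Python under stated assumptions: the coefficients `w`, `c1`, `c2` are commonly used constriction-derived values, not the paper's settings, and `toy_validation_error` is a hypothetical stand-in for training a BPNN with given hyperparameters and measuring its validation error.

```python
import random

def pso(objective, bounds, n_particles=20, iterations=200,
        w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_validation_error(p):
    """Hypothetical error surface over (hidden neurons, log10 learning rate)."""
    neurons, log_lr = p
    return (neurons - 12) ** 2 / 100 + (log_lr + 2) ** 2

best, err = pso(toy_validation_error, bounds=[(1, 50), (-5, 0)])
print(best, err)
```

In a real BPNN-PSO setting, each objective evaluation would train a network, so the swarm size and iteration count dominate the cost; discrete hyperparameters such as the neuron count would also need rounding before each evaluation.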