V-dock: fast generation of novel drug-like molecules using machine-learning-based docking score and molecular optimization

Author(s):  
Jieun Choi ◽  
Juyong Lee

In this work, we propose a novel workflow for designing drug-like molecules that combines efficient global molecular property optimization, protein-ligand molecular docking, and machine learning. Computational drug design algorithms aim to find novel molecules that satisfy various drug-like properties and bind strongly to a target protein. To accomplish this goal, various computational molecular generation methods have been developed, driven by recent advances in deep learning and the growth of biological data. However, most existing methods depend heavily on experimental activity data, which are not available for many targets. Thus, when the available activity data are limited, protein-ligand docking calculations should be used instead. However, performing docking calculations on the fly during molecular generation requires considerable computational resources. To address this problem, we used machine-learning models that predict docking energy to accelerate the molecular generation process. We combined this ML-assisted docking score prediction model with the efficient global molecular property optimization approach MolFinder. We call this design approach V-dock. Using the V-dock approach, we quickly generated many molecules with high docking scores for a target protein and desirable drug-like and bespoke properties, such as similarity to a reference molecule.

2021 ◽  
Vol 22 (21) ◽  
pp. 11635
Author(s):  
Jieun Choi ◽  
Juyong Lee

We propose a computational workflow to design novel drug-like molecules by combining the global optimization of molecular properties and protein-ligand docking with machine learning. Most existing molecular design methods depend heavily on experimental data, and many targets do not have sufficient data to train reliable activity prediction models. To overcome this limitation, protein-ligand docking calculations must be performed using the limited data available. Such docking calculations during molecular generation require considerable computational time, preventing extensive exploration of the chemical space. To address this problem, we trained a machine-learning-based model that predicts the docking energy from SMILES to accelerate the molecular generation process. Docking scores could be accurately predicted using only a SMILES string. We combined this docking score prediction model with the global molecular property optimization approach MolFinder to find novel molecules that exhibit the desired properties and have high predicted docking scores. We named this design approach V-dock. Using V-dock, we efficiently generated many novel molecules with high docking scores for a target protein, similarity to a reference molecule, and desirable drug-like and bespoke properties, such as QED. The predicted docking scores of the generated molecules were verified by correlating them with the actual docking scores.
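
As a rough illustration of the kind of multi-property objective described in the abstract above, the following Python sketch combines a predicted docking score, similarity to a reference molecule, and QED into one value to maximize. Everything in it is an assumption made for illustration: the function names, the weights, the `dock_scale` normalization, the use of Tanimoto similarity on fingerprint bit sets, and the convention that a higher docking score is better. It is not the authors' V-dock or MolFinder implementation.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def combined_objective(pred_dock: float, similarity: float, qed: float,
                       w_dock: float = 1.0, w_sim: float = 1.0,
                       w_qed: float = 1.0, dock_scale: float = 10.0) -> float:
    """Hypothetical weighted sum of the three properties.

    pred_dock stands for a surrogate model's predicted docking score
    (assumed here: higher is better). It is divided by dock_scale so that
    it sits on a scale comparable to similarity and QED, which lie in [0, 1].
    """
    return w_dock * (pred_dock / dock_scale) + w_sim * similarity + w_qed * qed


# Toy values only, not real fingerprints or docking results.
reference_bits = {1, 2, 3}
candidate_bits = {2, 3, 4}
sim = tanimoto(candidate_bits, reference_bits)
score = combined_objective(pred_dock=8.0, similarity=sim, qed=0.7)
```

In a sketch like this, an optimizer would propose candidate molecules and keep those with the highest `score`; replacing the expensive docking run with a fast prediction is what makes evaluating many candidates affordable.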


2021 ◽  
Author(s):  
Austė Kanapeckaitė ◽  
Neringa Burokienė

At present, heart failure (HF) treatment only targets the symptoms based on the left ventricle dysfunction severity; however, the lack of systemic ‘omics’ studies and available biological data to uncover the heterogeneous underlying mechanisms signifies the need to shift the analytical paradigm towards network-centric and data mining approaches. This study, for the first time, aimed to investigate how bulk and single cell RNA-sequencing as well as the proteomics analysis of the human heart tissue can be integrated to uncover HF-specific networks and potential therapeutic targets or biomarkers. We also aimed to address the issue of dealing with a limited number of samples and to show how appropriate statistical models, enrichment with other datasets as well as machine learning-guided analysis can aid in such cases. Furthermore, we elucidated specific gene expression profiles using transcriptomic and mined data from public databases. This was achieved using a two-step machine learning algorithm to predict the likelihood of therapeutic target or biomarker tractability based on a novel scoring system, which is also introduced in this study. The described methodology could be very useful for target or biomarker selection and evaluation during the pre-clinical therapeutics development stage as well as for disease progression monitoring. In addition, the present study sheds new light on the complex aetiology of HF, differentiating between subtle changes in dilated cardiomyopathies (DCs) and ischemic cardiomyopathies (ICs) at the single cell, proteome and whole transcriptome level, demonstrating that HF might depend on the involvement of not only the cardiomyocytes but also other cell populations. The identified tissue remodelling and inflammatory processes can be beneficial when selecting targeted pharmacological management for DCs or ICs, respectively.


Author(s):  
Moretti Emilio ◽  
Tappia Elena ◽  
Limère Veronique ◽  
Melacini Marco

As a large number of companies are resorting to increased product variety and customization, growing attention is being paid to the design and management of part feeding systems. Recent works have proved the effectiveness of hybrid feeding policies, which consist of using multiple feeding policies in the same assembly system. In this context, the assembly line feeding problem (ALFP) refers to the selection of a suitable feeding policy for each part. In the literature, the ALFP is addressed either by developing optimization models or by categorizing the parts and assigning these categories to policies based on some characteristics of both the parts and the assembly system. This paper presents a new approach for selecting a suitable feeding policy for each part, based on supervised machine learning. The developed approach is applied to an industrial case and its performance is compared with that of an optimization approach. The application to the industrial case allows a deeper examination of the trade-off between efficiency (i.e., amount of data to be collected and dedicated resources) and quality of the ALFP solution (i.e., closeness to the optimal solution), discussing the managerial implications of different ALFP solution approaches and showing the potential value of applying machine learning.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Margot Gunning ◽  
Paul Pavlidis

Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, such as the number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a level comparable to generic measures of the likelihood that a gene is involved in any condition, and do not outperform genetic association studies. Future efforts to discover disease genes should focus on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than on developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.


Algorithms ◽  
2019 ◽  
Vol 12 (5) ◽  
pp. 99 ◽  
Author(s):  
Kleopatra Pirpinia ◽  
Peter A. N. Bosman ◽  
Jan-Jakob Sonke ◽  
Marcel van Herk ◽  
Tanja Alderliesten

Current state-of-the-art medical deformable image registration (DIR) methods optimize a weighted sum of key objectives of interest. Having a pre-determined weight combination that leads to high-quality results for any instance of a specific DIR problem (i.e., a class solution) would facilitate clinical application of DIR. However, such a combination can vary widely for each instance and is currently often manually determined. A multi-objective optimization approach for DIR removes the need for manual tuning, providing a set of high-quality trade-off solutions. Here, we investigate machine learning for a multi-objective class solution, i.e., not a single weight combination, but a set thereof, that, when used on any instance of a specific DIR problem, approximates such a set of trade-off solutions. To this end, we employed a multi-objective evolutionary algorithm to learn sets of weight combinations for three breast DIR problems of increasing difficulty: 10 prone-prone cases, 4 prone-supine cases with limited deformations and 6 prone-supine cases with larger deformations and image artefacts. Clinically acceptable results were obtained for the first two problems. Therefore, for DIR problems with limited deformations, a multi-objective class solution can be machine learned and used to straightforwardly compute multiple high-quality DIR outcomes, potentially leading to more efficient use of DIR in clinical practice.


Information ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 233 ◽  
Author(s):  
Zuleika Nascimento ◽  
Djamel Sadok

Network traffic classification aims to identify categories of traffic or applications of network packets or flows. It is an area that continues to gain attention from researchers because understanding the composition of network traffic, which changes over time, is necessary to ensure the network Quality of Service (QoS). Among the different methods of network traffic classification, the payload-based one (DPI) is the most accurate, but it has some drawbacks, such as the inability to classify encrypted data, concerns regarding users’ privacy, high computational costs, and ambiguity when multiple signatures might match. For that reason, machine learning methods have been proposed to overcome these issues. This work proposes a Multi-Objective Divide and Conquer (MODC) model for network traffic classification that combines supervised and unsupervised machine learning algorithms into a hybrid model based on the divide and conquer strategy. Additionally, it is a flexible model, since it allows network administrators to choose among a set of parameters (Pareto-optimal solutions) produced by a multi-objective optimization process, prioritizing either flow or byte accuracy. Our method achieved an average flow accuracy of 94.14% for the analyzed dataset, outperforming the six DPI-based tools investigated, including two commercial ones, and other machine learning-based methods.
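
To make the idea of choosing among Pareto-optimal parameter sets more concrete, here is a minimal Python sketch of a two-objective Pareto filter over flow accuracy and byte accuracy. The configuration names and accuracy values are invented for illustration, and the code only shows the general notion of Pareto dominance; it is not the MODC model or its optimization process.

```python
from typing import List, Tuple

# (name, flow accuracy, byte accuracy); both objectives are maximized.
Config = Tuple[str, float, float]


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical candidate parameter sets with made-up accuracies.
candidates = [
    ("A", 0.94, 0.80),
    ("B", 0.90, 0.88),
    ("C", 0.89, 0.85),  # dominated by B on both objectives
    ("D", 0.85, 0.91),
]

front = pareto_front(candidates)

# An administrator could then pick from the front according to priority.
best_for_flow = max(front, key=lambda c: c[1])
best_for_bytes = max(front, key=lambda c: c[2])
```

With these toy values the front contains A, B, and D: none of them can improve one accuracy without giving up some of the other, which is the trade-off an administrator would weigh.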


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Mohammad Nahid Hossain ◽  
Mohammad Helal Uddin ◽  
K. Thapa ◽  
Md Abdullah Al Zubaer ◽  
Md Shafiqul Islam ◽  
...  

Cognitive impairment has a significant negative impact on global healthcare and the community. Older adults are unlikely to retain their full cognition and memory as they age. Early detection of cognitive impairment can reduce the risk that prolonged disease progresses to permanent mental damage. This paper aims to develop a machine learning model to detect and differentiate cognitive impairment categories (severe, moderate, mild, and normal) by analyzing neurophysical and physical data. Keystroke and smartwatch recordings were used to extract individuals’ neurophysical and physical data, respectively. An ensemble learning algorithm, the Gradient Boosting Machine (GBM), is proposed to classify the cognitive severity level (absence, mild, moderate, and severe) based on Standardised Mini-Mental State Examination (SMMSE) questionnaire scores. Pearson’s correlation and a wrapper feature selection technique were used to analyze and select the best features. We then applied the proposed GBM algorithm to those features, and it achieved an accuracy of more than 94%. This paper adds a new dimension to the state of the art in predicting cognitive impairment by using neurophysical and physical data together.
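
The correlation-based part of the feature analysis mentioned above can be sketched in a few lines of Python. The example below ranks features by the absolute value of their Pearson correlation with a target score. The feature names (`typing_speed`, `step_count`) and all numbers are hypothetical, and the sketch covers only the correlation step, not the wrapper selection or the GBM classifier used in the paper.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance; a constant sequence
    would cause a division by zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features: Dict[str, List[float]], target: List[float]) -> List[str]:
    """Feature names sorted by |correlation with the target|, strongest first."""
    return sorted(features,
                  key=lambda name: abs(pearson(features[name], target)),
                  reverse=True)


# Toy data only: four hypothetical participants and their questionnaire scores.
scores = [30.0, 25.0, 20.0, 10.0]
features = {
    "step_count": [5.0, 1.0, 4.0, 2.0],
    "typing_speed": [60.0, 50.0, 40.0, 20.0],
}
ranking = rank_features(features, scores)
```

A filter like this only measures linear association one feature at a time, which is presumably why it is paired with a wrapper method that evaluates feature subsets against the classifier itself.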

