A Machine Learning Approach to Identify Specific Small Molecule Inhibitors of Secondary Nucleation in alpha-Synuclein Aggregation

2021 ◽  
Author(s):  
Robert I. Horne ◽  
Andrea Possenti ◽  
Sean Chia ◽  
Z. Faidon Brotzakis ◽  
Roxine Staats ◽  
...  

Drug development is an increasingly active area of machine learning application, due to the high attrition rates of conventional drug discovery pipelines. This issue is especially pressing for neurodegenerative diseases, where very few disease-modifying drugs have been approved, demonstrating a need for novel and efficient approaches to drug discovery in this area. However, whether or not machine learning methods can fulfil this role remains to be demonstrated. To explore this possibility, we describe a machine learning approach to identify specific inhibitors of the proliferation of alpha-synuclein aggregates through secondary nucleation, a process that has been implicated in Parkinson's disease and related synucleinopathies. We use a combination of docking simulations followed by machine learning to first identify initial hit compounds and then explore the chemical space around these compounds. Our results demonstrate that this approach leads to the identification of novel chemical matter with an improved hit rate and potency over conventional similarity search approaches.


2021 ◽  
Author(s):  
George Chang ◽  
Nathaniel Woody ◽  
Christopher Keefer

Lipophilicity is a fundamental structural property that influences almost every aspect of drug discovery. Within Pfizer, we have two complementary high-throughput screens for measuring lipophilicity as a distribution coefficient (LogD) – a miniaturized shake-flask method (SFLogD) and a chromatographic method (ELogD). The results from these two assays are not the same (see Figure 1), with each assay being applicable or more reliable in particular chemical spaces. In addition to LogD assays, the ability to predict the LogD value for virtual compounds is equally vital. Here we present an in-silico LogD model, applicable to all chemical spaces, based on the integration of the LogD data from both assays. We developed two approaches towards a single LogD model – a Rule-based and a Machine Learning approach. Ultimately, the Machine Learning LogD model was found to be superior to both internally developed and commercial LogD models.



2010 ◽  
Vol 50 (5) ◽  
pp. 716-731 ◽  
Author(s):  
Shivani Agarwal ◽  
Deepak Dugar ◽  
Shiladitya Sengupta


2020 ◽  
Vol 21 (10) ◽  
pp. 3585 ◽  
Author(s):  
Neann Mathai ◽  
Johannes Kirchmair

Computational methods for predicting the macromolecular targets of drugs and drug-like compounds have evolved as a key technology in drug discovery. However, the established validation protocols leave several key questions regarding the performance and scope of methods unaddressed. For example, prediction success rates are commonly reported as averages over all compounds of a test set and do not consider the structural relationship between the individual test compounds and the training instances. In order to obtain a better understanding of the value of ligand-based methods for target prediction, we benchmarked a similarity-based method and a random forest-based machine learning approach (both employing 2D molecular fingerprints) under three testing scenarios: a standard testing scenario with external data, a standard time-split scenario, and a scenario that is designed to most closely resemble real-world conditions. In addition, we deconvoluted the results based on the distances of the individual test molecules from the training data. We found that, surprisingly, the similarity-based approach generally outperformed the machine learning approach in all testing scenarios, even in cases where queries were structurally clearly distinct from the instances in the training (or reference) data, and despite a much higher coverage of the known target space.
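As a rough illustration of what a similarity-based, ligand-based target prediction looks like, the following minimal sketch ranks candidate targets by the highest Tanimoto similarity between a query fingerprint and the reference compounds annotated with each target. It is not the benchmarked method from the abstract: the fingerprints are assumed to be given as sets of on-bit indices, and the function names, target labels, and toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a 2D fingerprint is represented as the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_targets(
    query: Fingerprint,
    reference: List[Tuple[Fingerprint, str]],
) -> List[Tuple[str, float]]:
    """Score each target by its most similar reference ligand, best first."""
    best: Dict[str, float] = {}
    for fingerprint, target in reference:
        score = tanimoto(query, fingerprint)
        if score > best.get(target, -1.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Toy reference set with hypothetical target labels.
reference_set = [
    (frozenset({1, 2, 3, 4}), "kinase_A"),
    (frozenset({1, 2, 9}), "protease_B"),
    (frozenset({7, 8}), "kinase_A"),
]
ranking = rank_targets(frozenset({1, 2, 3}), reference_set)
print(ranking)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the maximum similarity to the reference set can also serve as the distance measure used to stratify test compounds.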



2021 ◽  
Author(s):  
Moses P Cook ◽  
Bessi Qorri ◽  
Amruth Baskar ◽  
Jalal Ziauddin ◽  
Luca Pani ◽  
...  

There are many small datasets of significant value in the medical space that are being underutilized. Due to the heterogeneity of complex disorders found in oncology, systems capable of discovering patient subpopulations while elucidating etiologies are of great value, as they can indicate leads for innovative drug discovery and development. Here, we report on a machine intelligence-based study that utilized a combination of two small non-small cell lung cancer (NSCLC) datasets consisting of 58 samples of adenocarcinoma (ADC) and squamous cell carcinoma (SCC) and 45 samples from the gene expression analysis of human lung cancer and control samples series (GSE18842). Utilizing a novel machine learning approach, we were able to uncover subpopulations of ADC and SCC while simultaneously extracting which genes, in combination, were significantly involved in defining the subpopulations. An interactive hypothesis-generating interface designed to work with machine learning methods allowed us to explore the hypotheses generated by the unsupervised components of the system. Using these methods, we were able to uncover genes implicated by other methods and accurately discover known subpopulations without being asked, such as different levels of aggressiveness within the SCC and ADC subtypes. Furthermore, PIGX was a novel gene implicated in this study that warrants further investigation due to its role in breast cancer proliferation. Here we demonstrate the ability to learn from small datasets and reveal well-established properties of NSCLC. These machine learning techniques can reveal the driving factors behind subpopulations of patients, altering the approach to drug discovery and development by making precision medicine a reality.



2019 ◽  
Vol 17 (05) ◽  
pp. 1950026
Author(s):  
Pietro Bongini ◽  
Neri Niccolai ◽  
Monica Bianchini

Nowadays, it is well established that most human diseases not related to pathogen infections originate from DNA disorders. Thus, DNA mutations, pending the availability of CRISPR-like remedies, will propagate into proteomics, offering the possibility to select natural or synthetic molecules to fight against the effects of malfunctioning proteins. Drug discovery, indeed, is a flourishing field of biotechnological research to improve human health, even though the development of a new drug is increasingly expensive despite the massive use of informatics in Medicinal Chemistry. CRISPR technology adds new alternatives to cure diseases by removing DNA defects responsible for genome-related pathologies. In principle, however, the same technology could also be exploited to induce protein mutations whose effects are controlled by the presence of suitable ligands. In this paper, a new idea is proposed for the realization of mutated proteins whose surfaces form more spacious transient pockets that are, therefore, more suitable for hosting drugs. In particular, new allosteric sites are obtained by replacing amino acids that have bulky side chains with glycine (Gly), the smallest natural amino acid. We also present a machine learning approach to evaluate the druggability score of new (or enlarged) pockets. Preliminary experimental results are very promising, showing that 10% of the sites created by the Gly-pipe software are druggable.
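The substitution step described above can be sketched at the sequence level. The snippet below enumerates single-point variants in which one bulky residue is replaced by glycine. It is only an illustration and does not reproduce the Gly-pipe software: the set of residues treated as bulky, the function name, and the example sequence are assumptions, and no structural modelling or druggability scoring is attempted.

```python
from typing import FrozenSet, List, Tuple

# Assumption: one-letter codes treated as "bulky" for this illustration only.
BULKY_RESIDUES: FrozenSet[str] = frozenset("FWYRKLIMH")


def glycine_variants(
    sequence: str,
    bulky: FrozenSet[str] = BULKY_RESIDUES,
) -> List[Tuple[str, str]]:
    """Return (mutation label, mutated sequence) for each bulky-to-Gly swap."""
    sequence = sequence.upper()
    variants = []
    for index, residue in enumerate(sequence):
        if residue in bulky:
            mutated = sequence[:index] + "G" + sequence[index + 1:]
            # Conventional label: original residue, 1-based position, new residue.
            variants.append((f"{residue}{index + 1}G", mutated))
    return variants


# Hypothetical four-residue peptide used only to show the output shape.
variants = glycine_variants("MAWG")
for label, mutated in variants:
    print(label, mutated)
```

Each variant would then need a structural model and pocket detection before any druggability score could be assigned.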






