Co-evolution based machine-learning for predicting functional interactions between human genes

Abstract: Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and answer evolutionary questions. Here, we develop a machine-learning approach for utilizing phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach showing a 14% performance increase (auROC) compared to previous methods. Using this approach, we predict functional annotations for less-studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il.
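
The abstract does not spell out how co-evolution is turned into model inputs, so the following is only a minimal sketch of one plausible reading: score how similar two genes' phylogenetic profiles are within each clade, and treat the per-clade scores as features for a downstream classifier. The gene names, clade names, toy profiles, and the choice of Pearson correlation are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def clade_features(profile_a, profile_b, clades):
    """Per-clade profile similarity for one gene pair.

    profile_a, profile_b: presence/conservation values, one per species.
    clades: dict mapping a clade name to the species (column) indices in it.
    """
    return {
        name: pearson([profile_a[i] for i in idx], [profile_b[i] for i in idx])
        for name, idx in clades.items()
    }


# Hypothetical toy data: 2 genes profiled across 8 species in 2 clades.
profiles = {
    "GENE_A": [1, 1, 0, 0, 1, 0, 1, 1],
    "GENE_B": [1, 1, 0, 0, 0, 1, 1, 0],
}
clades = {"clade_1": [0, 1, 2, 3], "clade_2": [4, 5, 6, 7]}

features = clade_features(profiles["GENE_A"], profiles["GENE_B"], clades)
for clade, score in features.items():
    print(f"{clade}: {score:+.3f}")
```

In this toy case the two genes are perfectly correlated in the first clade and negatively correlated in the second, which is the kind of clade-specific signal that a single whole-tree correlation would average away.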

Providing the ‘Best’ Lipophilicity Assessment in a Drug Discovery Environment

10.26434/chemrxiv.14292485 ◽

2021 ◽

Author(s):

George Chang ◽

Nathaniel Woody ◽

Christopher Keefer

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

High Throughput ◽

In Silico ◽

Shake Flask ◽

Chromatographic Method ◽

Learning Approach ◽

Rule Based ◽

Machine Learning Approach ◽

High Throughput Screens

Lipophilicity is a fundamental structural property that influences almost every aspect of drug discovery. Within Pfizer, we have two complementary high-throughput screens for measuring lipophilicity as a distribution coefficient (LogD) – a miniaturized shake-flask method (SFLogD) and a chromatographic method (ELogD). The results from these two assays are not the same (see Figure 1), with each assay being applicable or more reliable in particular chemical spaces. In addition to LogD assays, the ability to predict the LogD value for virtual compounds is equally vital. Here we present an in-silico LogD model, applicable to all chemical spaces, based on the integration of the LogD data from both assays. We developed two approaches towards a single LogD model – a Rule-based and a Machine Learning approach. Ultimately, the Machine Learning LogD model was found to be superior to both internally developed and commercial LogD models.

A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data

BMC Genomics ◽

10.1186/1471-2164-11-s5-s9 ◽

2010 ◽

Vol 11 (Suppl 5) ◽

pp. S9 ◽

Cited By ~ 25

Author(s):

Pedro R Costa ◽

Marcio L Acencio ◽

Ney Lemke

Keyword(s):

Machine Learning ◽

Learning Approach ◽

Genome Wide ◽

Human Genes ◽

Level Data ◽

Machine Learning Approach

Beyond modularity: Fine-scale mechanisms and rules for brain network reconfiguration

10.1101/097691 ◽

2017 ◽

Cited By ~ 2

Author(s):

Ankit N. Khambhati ◽

Marcelo G. Mattar ◽

Danielle S. Bassett

Keyword(s):

Machine Learning ◽

Brain Network ◽

Modular Organization ◽

Functional Networks ◽

Learning Approach ◽

Functional Brain ◽

Functional Interactions ◽

Functional Brain Network ◽

Machine Learning Approach ◽

Over Time

Abstract: The human brain is in constant flux, as distinct areas engage in transient communication to support basic behaviors as well as complex cognition. The collection of interactions between cortical and subcortical areas forms a functional brain network whose topology evolves with time. Despite the nontrivial dynamics that are germane to this networked system, experimental evidence demonstrates that functional interactions organize into putative brain systems that facilitate different facets of cognitive computation. We hypothesize that such dynamic functional networks are organized around a set of rules that constrain their spatial architecture – which brain regions may functionally interact – and their temporal architecture – how these interactions fluctuate over time. To objectively uncover these organizing principles, we apply an unsupervised machine learning approach called nonnegative matrix factorization to time-evolving, resting state functional networks in 20 healthy subjects. This machine-learning approach automatically clusters temporally co-varying functional interactions into subgraphs that represent putative topological modes of dynamic functional architecture. We find that subgraphs are stratified based on both the underlying modular organization and the topographical distance of their strongest interactions: while many subgraphs are largely contained within modules, others span between modules and are expressed differently over time. The relationship between dynamic subgraphs and modular architecture is further highlighted by the ability of time-varying subgraph expression to explain inter-individual differences in module reorganization. Collectively, these results point to the critical role subgraphs play in constraining the topography and topology of functional brain networks. More broadly, this machine learning approach opens a new door for understanding the architecture of dynamic functional networks during both task and rest states, and for probing alterations of that architecture in disease.
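
As a rough illustration of the factorization step named in the abstract, the sketch below applies nonnegative matrix factorization to a matrix whose rows are network edges and whose columns are time windows, so that each component pairs a subgraph (edge weights) with its expression over time. The edges-by-windows layout, the multiplicative-update solver, the function name `nmf`, and the synthetic data are all assumptions chosen for illustration; the abstract does not state which solver, rank, or regularization the authors used.

```python
import numpy as np


def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Factor nonnegative A (edges x windows) as W @ H, with W, H >= 0.

    W (edges x k) holds the subgraphs; H (k x windows) holds their
    time-varying expression. Uses multiplicative updates that minimize
    the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_edges, n_windows = A.shape
    W = rng.random((n_edges, k)) + eps
    H = rng.random((k, n_windows)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio, so W and H
        # stay nonnegative throughout.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic example: 6 edges and 8 time windows built from two subgraphs,
# the first active in windows 0-3 and the second in windows 4-7.
true_W = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
true_H = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
A = true_W @ true_H

W, H = nmf(A, k=2)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because both factors are constrained to be nonnegative, each recovered subgraph can be read as an additive part of the network, which is what makes the components interpretable as co-varying groups of interactions.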
