A Machine Learning Approach for the Discovery of Ligand-Specific Functional Mechanisms of GPCRs

G protein-coupled receptors (GPCRs) play a key role in many cellular signaling mechanisms, and must select among multiple coupling possibilities in a ligand-specific manner in order to carry out a myriad of functions in diverse cellular contexts. Much has been learned about the molecular mechanisms of ligand-GPCR complexes from Molecular Dynamics (MD) simulations. However, to explore ligand-specific differences in the response of a GPCR to diverse ligands, as is required to understand ligand bias and functional selectivity, necessitates creating very large amounts of data from the needed large-scale simulations. This becomes a Big Data problem for the high dimensionality analysis of the accumulated trajectories. Here we describe a new machine learning (ML) approach to the problem that is based on transforming the analysis of GPCR function-related, ligand-specific differences encoded in the MD simulation trajectories into a representation recognizable by state-of-the-art deep learning object recognition technology. We illustrate this method by applying it to recognize the pharmacological classification of ligands bound to the 5-HT2A and D2 subtypes of class-A GPCRs from the serotonin and dopamine families. The ML-based approach is shown to perform the classification task with high accuracy, and we identify the molecular determinants of the classifications in the context of GPCR structure and function. This study builds a framework for the efficient computational analysis of MD Big Data collected for the purpose of understanding ligand-specific GPCR activity.

Download Full-text

A Machine Learning Approach for the Discovery of Ligand-specific Functional Mechanisms of GPCRs

10.20944/preprints201904.0232.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ambrose Plante ◽

Derek M. Shore ◽

Giulia Morra ◽

George Khelashvili ◽

Harel Weinstein

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Molecular Mechanisms ◽

Md Simulations ◽

Machine Learning Approach ◽

And Function ◽

Functional Mechanisms ◽

G Protein Coupled ◽

Large Scale Simulations

G protein-coupled receptors (GPCRs) play a key role in many cellular signaling mechanisms, and must select among multiple coupling possibilities in a ligand-specific manner in order to carry out a myriad of functions in diverse cellular contexts. Much has been learned about the molecular mechanisms of ligand-GPCR complexes from Molecular Dynamics (MD) simulations. However, to explore ligand-specific differences in the response of a GPCR to diverse ligands, as is required to understand ligand bias and functional selectivity, necessitates creating very large amounts of data from the needed large-scale simulations. This becomes a Big Data problem for the high dimensionality analysis of the accumulated trajectories. Here we describe a new machine learning (ML) approach to the problem that is based on transforming the analysis of GPCR function-related, ligand-specific differences encoded in the MD simulation trajectories into a representation recognizable by state-of-the-art deep learning object recognition technology. We illustrate this method by applying it to recognize the pharmacological classification of ligands bound to the 5-HT2A and D2 subtypes of class A GPCRs from the serotonin and dopamine families. The ML-based approach is shown to perform the classification task with high accuracy, and we identify the molecular determinants of the classifications in the context of GPCR structure and function. This study builds a framework for the efficient computational analysis of MD Big Data collected for the purpose of understanding ligand-specific GPCR activity.

Download Full-text

Comparing the utility of in vivo transposon mutagenesis approaches in yeast species to infer gene essentiality

10.1101/732552 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anton Levitan ◽

Andrew N. Gale ◽

Emma K. Dallon ◽

Darby W. Kozan ◽

Kyle W. Cunningham ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Yeast Species ◽

Transposon Mutagenesis ◽

Learning Approach ◽

Long Distance ◽

Gene Essentiality ◽

Genome Wide ◽

Machine Learning Approach

ABSTRACTIn vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccaromyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine-learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some bias in insertion site preference because of jackpot events, and preferences for specific insertion sequences and short-distance vs long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.

Download Full-text

Support Vector Machines in Big Data Classification: A Systematic Literature Review

10.21203/rs.3.rs-663359/v1 ◽

2021 ◽

Author(s):

Mohammad Hassan Almaspoor ◽

Ali Safaei ◽

Afshin Salajegheh ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Support Vector ◽

Research Areas ◽

Large Scale Data ◽

Training Samples ◽

Big Data Classification ◽

Scale Data

Abstract Classification is one of the most important and widely used issues in machine learning, the purpose of which is to create a rule for grouping data to sets of pre-existing categories is based on a set of training sets. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising methods of classification in machine learning. With the advent of big data, many of the machine learning methods have been challenged by big data characteristics. The standard SVM has been proposed for batch learning in which all data are available at the same time. The SVM has a high time complexity, i.e., increasing the number of training samples will intensify the need for computational resources and memory. Hence, many attempts have been made at SVM compatibility with online learning conditions and use of large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for SVM compatibility with online conditions and large-scale data. These methods might be employed to classify big data and propose research areas for future studies. Considering its advantages, the SVM can be among the first options for compatibility with big data and classification of big data. For this purpose, appropriate techniques should be developed for data preprocessing in order to covert data into an appropriate form for learning. The existing frameworks should also be employed for parallel and distributed processes so that SVMs can be made scalable and properly online to be able to handle big data.

Download Full-text

Essentiality of Machine Learning Algorithms for Big Data Computation

Advances in Data Mining and Database Management - Managing and Processing Big Data in Cloud Computing ◽

10.4018/978-1-4666-9767-6.ch011 ◽

2016 ◽

pp. 156-167

Author(s):

Manjunath Thimmasandra Narayanapppa ◽

T. P. Puneeth Kumar ◽

Ravindra S. Hegadi

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Learning Algorithms ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Real Time Analysis ◽

Large Scale Data ◽

Computational Environment ◽

Large Scale Data Processing

Recent technological advancements have led to generation of huge volume of data from distinctive domains (scientific sensors, health care, user-generated data, finical companies and internet and supply chain systems) over the past decade. To capture the meaning of this emerging trend the term big data was coined. In addition to its huge volume, big data also exhibits several unique characteristics as compared with traditional data. For instance, big data is generally unstructured and require more real-time analysis. This development calls for new system platforms for data acquisition, storage, transmission and large-scale data processing mechanisms. In recent years analytics industries interest expanding towards the big data analytics to uncover potentials concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and computational environment including hardware and software that is required to perform analytics on big data.

Download Full-text

Machine Learning for Fluid Mechanics

Annual Review of Fluid Mechanics ◽

10.1146/annurev-fluid-010719-060214 ◽

2020 ◽

Vol 52 (1) ◽

pp. 477-508 ◽

Cited By ~ 122

Author(s):

Steven L. Brunton ◽

Bernd R. Noack ◽

Petros Koumoutsakos

Keyword(s):

Machine Learning ◽

Fluid Mechanics ◽

Domain Knowledge ◽

Large Scale ◽

Past History ◽

Fluid Flows ◽

Field Measurements ◽

Industrial Applications ◽

Current Lines ◽

Large Scale Simulations

The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from experiments, field measurements, and large-scale simulations at multiple spatiotemporal scales. Machine learning (ML) offers a wealth of techniques to extract information from data that can be translated into knowledge about the underlying fluid mechanics. Moreover, ML algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of ML for fluid mechanics. We outline fundamental ML methodologies and discuss their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experiments, and simulations. ML provides a powerful information-processing framework that can augment, and possibly even transform, current lines of fluid mechanics research and industrial applications.

Download Full-text

Big Data Prediction in Location-Aware Wireless Caching: A Machine Learning Approach

2019 IEEE Global Communications Conference (GLOBECOM) ◽

10.1109/globecom38437.2019.9014068 ◽

2019 ◽

Author(s):

Yunzhe Qi ◽

Zhong Yang ◽

Zhijin Qin ◽

Yuanwei Liu ◽

Yue Chen

Keyword(s):

Machine Learning ◽

Big Data ◽

Learning Approach ◽

Location Aware ◽

Data Prediction ◽

Machine Learning Approach

Download Full-text

Toward Robust Anxiety Biomarkers: A Machine Learning Approach in a Large-Scale Sample

Biological Psychiatry Cognitive Neuroscience and Neuroimaging ◽

10.1016/j.bpsc.2019.05.018 ◽

2020 ◽

Vol 5 (8) ◽

pp. 799-807 ◽

Cited By ~ 7

Author(s):

Emily A. Boeke ◽

Avram J. Holmes ◽

Elizabeth A. Phelps

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning Approach ◽

Machine Learning Approach

Download Full-text

Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach

Journal of Chemical Theory and Computation ◽

10.1021/acs.jctc.5b00099 ◽

2015 ◽

Vol 11 (5) ◽

pp. 2087-2096 ◽

Cited By ~ 229

Author(s):

Raghunathan Ramakrishnan ◽

Pavlo O. Dral ◽

Matthias Rupp ◽

O. Anatole von Lilienfeld

Keyword(s):

Machine Learning ◽

Big Data ◽

Quantum Chemistry ◽

Learning Approach ◽

Machine Learning Approach

Download Full-text

Big Data’s Role in Health and Risk Messaging

Oxford Research Encyclopedia of Communication ◽

10.1093/acrefore/9780190228613.013.359 ◽

2017 ◽

Author(s):

Bradford William Hesse

Keyword(s):

Machine Learning ◽

Big Data ◽

Risk Communication ◽

Large Scale ◽

Protein Identification ◽

Machine Learning Algorithms ◽

National Committee ◽

Learning Approaches ◽

Road Map ◽

Data Flows

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.

Download Full-text