Prediction of Compound-Protein Interactions with Machine Learning Methods

Chemoinformatics and Advanced Machine Learning Perspectives

In silico prediction of compound-protein interactions from heterogeneous biological data is critical to drug development. In this chapter, the authors review several supervised machine learning methods that predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. They examine kernel-based algorithms from two viewpoints: binary classification and dimension reduction. In their experiments, they demonstrate the usefulness of these methods for predicting drug-target interactions and ligand-protein interactions from chemical structure and genomic sequence data.
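
One common way to let a single kernel method use chemical and genomic information at once is a tensor-product pairwise kernel, in which the similarity of two compound-protein pairs is the product of a compound kernel and a protein kernel. The sketch below illustrates that general idea under stated assumptions; it is not presented as the chapter's exact algorithm, and the toy fingerprints, protein descriptors, and linear base kernels are hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Linear base kernel: inner products between the rows of A and B."""
    return A @ B.T

def pairwise_tensor_kernel(comp_feats, prot_feats, pairs_a, pairs_b):
    """Tensor-product kernel between two lists of (compound, protein) index pairs.

    K((c, p), (c', p')) = Kc(c, c') * Kp(p, p')
    """
    Kc = linear_kernel(comp_feats, comp_feats)  # compound-compound similarities
    Kp = linear_kernel(prot_feats, prot_feats)  # protein-protein similarities
    ca, pa = np.asarray(pairs_a).T
    cb, pb = np.asarray(pairs_b).T
    # Look up both base kernels for every pair combination and multiply them
    return Kc[np.ix_(ca, cb)] * Kp[np.ix_(pa, pb)]

# Hypothetical toy data: 3 compounds (binary fingerprints), 2 proteins (descriptors)
compounds = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]], dtype=float)
proteins = np.array([[0.2, 0.8], [0.9, 0.1]])
pairs = [(0, 0), (1, 1), (2, 0)]  # (compound index, protein index)
K = pairwise_tensor_kernel(compounds, proteins, pairs, pairs)
print(K.shape)  # (3, 3)
```

A matrix like `K` can be handed to any learner that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`) to classify pairs as interacting or not; in practice the linear base kernels would be replaced by chemical-structure and sequence kernels.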

Graph Mining in Chemoinformatics

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch006

2011

pp. 95-128

Cited By ~ 1

Author(s):

Hiroto Saigo

Koji Tsuda

Keyword(s):

Graph Mining

Chemical Compounds

Quantitative Structure Activity Relationship

Support Vector

Quantitative Structure

Least Squares Regression

Lead Compounds

Least Angle Regression

Attributed Graphs

Vector Machines

In standard QSAR (Quantitative Structure-Activity Relationship) approaches, chemical compounds are represented as a set of physicochemical property descriptors, which are then used as numerical features for classification or regression. However, standard descriptors such as structural keys and fingerprints are often not comprehensive enough. Since chemical compounds are naturally represented as attributed graphs, graph mining techniques can generate subgraph patterns (i.e., structural motifs) to serve as additional descriptors. In this chapter, the authors present theoretically motivated QSAR algorithms that automatically identify informative subgraph patterns. A graph mining subroutine is embedded in the main ("mother") learning algorithm and is called repeatedly to collect patterns progressively. The authors present three variations, built on support vector machines (SVM), partial least squares regression (PLS), and least angle regression (LARS). Compared with graph kernels, these methods are more interpretable, allowing chemists to identify salient subgraph features that can guide improvements to the drug-likeness of lead compounds.
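
The "call the miner repeatedly and collect patterns progressively" loop can be illustrated with a simplified greedy selector. This sketch assumes the candidate patterns have already been enumerated into a binary occurrence matrix; a real graph mining subroutine would instead search the subgraph space on the fly and prune it, which is omitted here. The correlation-with-residual criterion is a stand-in in the spirit of the LARS/PLS variants, not the authors' exact gain function.

```python
import numpy as np

def select_patterns(X, y, n_iter=3):
    """Greedy selection of subgraph patterns from a precomputed occurrence matrix.

    X[i, j] = 1 if (hypothetical) pattern j occurs in molecule i, else 0.
    Each iteration adds the pattern whose column is most correlated with the
    current residual, then refits least squares on the selected patterns.
    """
    residual = y - y.mean()
    selected, coef = [], np.array([y.mean()])
    for _ in range(n_iter):
        scores = np.abs(X.T @ residual)  # gain of each candidate pattern
        scores[selected] = -np.inf       # never pick the same pattern twice
        selected.append(int(np.argmax(scores)))
        A = np.column_stack([np.ones(len(y)), X[:, selected]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return selected, coef

# Toy data: 4 molecules x 3 candidate patterns; activity follows pattern 1
X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])
selected, coef = select_patterns(X, y, n_iter=1)
print(selected)  # [1]
```

Because the model is a short list of explicit substructures with coefficients, a chemist can read off which motif drives the prediction, which is the interpretability advantage the abstract contrasts with graph kernels.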

Chemoinformatics on Metabolic Pathways

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch017

2011

pp. 318-339

Author(s):

Masahiro Hattori

Masaaki Kotera

Keyword(s):

Metabolic Pathway

Chemical Space

Research Area

Chemical Information

Prediction System

Future Directions

Heterogeneous Information

Research Areas

Living Organisms

Product Pair

Chemical genomics is one of the cutting-edge research areas of the post-genomic era; it requires sophisticated integration of heterogeneous information, i.e., genomic and chemical information. Enzymes play key roles in the dynamic behavior of living organisms, linking information in chemical space and genomic space. In this chapter, the authors report their recent efforts in this area, including the development of a similarity measure between two chemical compounds, a system that predicts a plausible enzyme for a given substrate-product pair, and two different approaches to predicting the fate of a given compound in a metabolic pathway. General problems and possible future directions are also discussed, in the hope of attracting more researchers to this area.
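
For readers unfamiliar with compound similarity measures, the sketch below shows the simplest widely used baseline, the Tanimoto (Jaccard) coefficient over sets of structural features. It is a swapped-in illustration, not the authors' graph-based measure, and the substructure labels are hypothetical. Printing the features that differ hints at why substrate-product pairs carry information about the reaction an enzyme performs.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(a | b)

# Hypothetical substructure labels for a substrate and its product
substrate = {"C-OH", "C=O", "ring6", "C-N"}
product = {"C-OH", "C=O", "ring6", "C-O-P"}
print(tanimoto(substrate, product))  # 0.6
# Features lost and gained in the (hypothetical) transformation
print(sorted(substrate - product), sorted(product - substrate))  # ['C-N'] ['C-O-P']
```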

Learning Methodologies for Detection and Classification of Mutagens

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch014

2011

pp. 274-288

Author(s):

Huma Lodhi

Keyword(s):

Machine Learning

Ames Test

Computational Techniques

Biological Method

Challenging Problem

Learning Methods

Machine Learning Methods

Learning Techniques

Dynamic Growth

Predicting mutagenicity is a complex and challenging problem in chemoinformatics. The Ames test is a biological method for assessing the mutagenicity of molecules. The rapid growth of molecular repositories creates a need to develop and apply effective and efficient computational techniques to chemoinformatics problems such as the identification and classification of mutagens. Machine learning methods provide effective solutions to such problems. This chapter presents an overview of the learning techniques that have been developed and applied to the identification and classification of mutagens.

Application of Machine Learning in Drug Discovery and Development

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch012

2011

pp. 235-256

Cited By ~ 2

Author(s):

Shuxing Zhang

Keyword(s):

Machine Learning

Drug Discovery

Model Building

Pharmaceutical Research

Machine Learning Techniques

High Dimensional

Lead Discovery

Drug Discovery And Development

Learning Techniques

Machine learning techniques have been widely used in drug discovery and development, particularly in cheminformatics, bioinformatics, and other areas of pharmaceutical research. They have been shown to be suitable for large, high-dimensional data, and the models built with them can be used for robust external predictions. However, various problems and challenges remain, and new approaches are greatly needed. In this chapter, the authors review the current development of machine learning techniques, focusing especially on several techniques they developed and their application to model building, lead discovery via virtual screening, integration with molecular docking, and prediction of off-target properties. The authors also suggest several potential avenues for unifying disciplines such as cheminformatics, bioinformatics, and systems biology in order to develop integrated in silico drug discovery and development approaches.

Advanced PLS Techniques in Chemometrics and Their Applications to Molecular Design

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch008

2011

pp. 145-168

Cited By ~ 5

Author(s):

Kiyoshi Hasegawa

Kimito Funatsu

Keyword(s):

Least Squares

Partial Least Squares

Molecular Design

Review Article

Complex Data

Quantitative Structure

Multivariate Statistical

Diagnostic Plots

Statistical Measures

Prediction Regions

In quantitative structure-activity/property relationships (QSAR and QSPR), multivariate statistical methods are commonly used for analysis. Partial least squares (PLS) is of particular interest because it can analyze data with strongly collinear, noisy, and numerous X variables, and can also model several response variables Y simultaneously. Furthermore, PLS provides prediction regions and diagnostic plots as statistical measures. PLS has evolved to cope with the severe demands posed by complex X and Y data structures. In this review article, the authors select four advanced PLS techniques and outline their algorithms with representative examples. In particular, they describe how to uncover the inner relations embedded in the data and how to use that information for molecular design.
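
The four advanced variants are not reproduced here. As a reference point, the sketch below is the classical NIPALS algorithm for PLS1 (a single response y); PLS2 generalizes it to several responses. The toy data, with two nearly collinear descriptors, are synthetic and only illustrate the collinear-X setting the abstract describes.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a PLS1 model (single response) with the NIPALS algorithm.

    X and y are centered internally; returns regression coefficients for the
    centered data together with the means needed for prediction.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)       # weight vector
        t = Xk @ w                   # X scores (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        c = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - c * t              # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return beta, x_mean, y_mean

def pls1_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean

# Synthetic data: columns 0 and 1 are nearly collinear
rng = np.random.default_rng(0)
x1 = rng.normal(size=30)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=30), rng.normal(size=30)])
y = 1.5 * x1 - 0.5 * X[:, 2] + 0.05 * rng.normal(size=30)
beta, xm, ym = pls1_nipals(X, y, n_components=2)
print(np.round(pls1_predict(X[:3], beta, xm, ym), 2))
```

With as many components as (full-rank) X has columns, PLS1 reproduces ordinary least squares; using fewer components is what stabilizes the fit when X is collinear.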

Optimal Assignment Kernels for ADME in Silico Prediction

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch002

2011

pp. 16-34

Cited By ~ 1

Author(s):

Holger Fröhlich

Keyword(s):

Similarity Measure

Prediction Models

Molecular Graph

Chemical Properties

Machine Learning Techniques

Single Atom

Optimal Assignment

Single Node

Bipartite Matching

Graph Representations

Prediction models for the absorption, distribution, metabolism, and excretion (ADME) properties of chemical compounds play a crucial role in the drug discovery process. Such models are often derived via machine learning techniques. Kernel-based learning algorithms, such as the well-known support vector machine (SVM), have attracted growing interest for this purpose in recent years. A key concept of SVMs is the kernel function, which can be thought of as a special similarity measure. In this chapter, the author describes optimal assignment kernels for multi-labeled molecular graphs. The optimal assignment kernel is based on a maximum weighted bipartite matching of the atoms of a pair of molecules. Both the physico-chemical properties of each atom and its neighborhood in the molecular graph are taken into account. The similarity measure is then extended to reduced graph representations, in which structural elements such as rings, donors, or acceptors are condensed into a single node of the graph. Comparisons of the optimal assignment kernel with other graph kernels and with classical descriptor-based models show a significant improvement in prediction accuracy.
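
The core matching step can be sketched with SciPy's Hungarian-algorithm solver. This is a simplified illustration under assumptions: atoms are described by hypothetical two-dimensional property vectors, atom similarity is an RBF, normalization by the larger molecule's size is one possible choice, and the neighborhood term of the chapter's kernel is omitted. Note also that assignment-based similarity scores are not guaranteed to be positive semidefinite in general.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def atom_similarity(A, B, gamma=1.0):
    """RBF similarity between atom feature vectors (rows of A and B)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def optimal_assignment_kernel(A, B, gamma=1.0):
    """Score of the maximum-weight bipartite matching of atoms in two molecules.

    Each atom of the smaller molecule is matched to exactly one atom of the
    larger one; the matched similarities are summed and divided by the size
    of the larger molecule (an assumed normalization).
    """
    S = atom_similarity(A, B, gamma)
    rows, cols = linear_sum_assignment(S, maximize=True)
    return S[rows, cols].sum() / max(len(A), len(B))

# Hypothetical atom descriptors (e.g., electronegativity, partial charge)
mol1 = np.array([[2.5, 0.0], [3.4, -0.4], [2.5, 0.1]])
mol2 = np.array([[2.5, 0.0], [3.4, -0.4]])
print(round(optimal_assignment_kernel(mol1, mol1), 3))  # 1.0
print(round(optimal_assignment_kernel(mol1, mol2), 3))  # 0.667
```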

Brain-like Processing and Classification of Chemical Data

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch015

2011

pp. 289-303

Author(s):

Michael Schmuker

Gisbert Schneider

Keyword(s):

Olfactory System

Chemical Space

Effective Means

Principal Component

Chemical Data

Chemical Stimulus

Second Stage

Learning Classifier

Component Scores

The purpose of the olfactory system is to encode and classify odorants. Hence, its circuits have likely evolved to perform this task in an efficient, quasi-optimal manner. In this chapter, the authors present a three-step approach that emulates neurocomputational principles of the olfactory system to encode, transform, and classify chemical data. In the first step, the original chemical stimulus space is encoded by virtual receptors. In the second step, the signals from these receptors are decorrelated by correlation-dependent lateral inhibition. The third step mimics olfactory scent perception with a machine learning classifier. The authors observed that decorrelation in the second step significantly improves the accuracy of scent prediction. Moreover, they found that although their proposed data transformation is suited for dimensionality reduction, it is more robust against overdetermined data than principal component scores. The authors successfully used their method to predict the bioactivity of drug-like compounds, demonstrating that it can provide an effective means of connecting chemical space with biological activity.
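
The three steps can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: receptors respond with a Gaussian around hypothetical center points in descriptor space, inhibition subtracts the responses of positively correlated receptors with a fixed strength `alpha`, and a nearest-centroid rule stands in for the machine learning classifier of step three. The chapter's actual receptor model and inhibition rule may differ.

```python
import numpy as np

def virtual_receptors(X, centers, sigma=1.0):
    """Step 1: Gaussian response of each virtual receptor (center) to each compound."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lateral_inhibition(R, alpha=0.5):
    """Step 2: correlation-dependent lateral inhibition (simplified form).

    Each receptor's response is reduced by the other receptors' responses,
    weighted by how positively correlated they are with it across compounds.
    """
    C = np.corrcoef(R, rowvar=False)  # receptor-by-receptor correlations
    np.fill_diagonal(C, 0.0)          # no self-inhibition
    W = np.clip(C, 0.0, None)         # only positive correlation inhibits
    n_other = R.shape[1] - 1
    return R - alpha * R @ W.T / n_other

def nearest_centroid_predict(train, labels, test):
    """Step 3 stand-in: assign each test sample to the closest class mean."""
    classes = np.unique(labels)
    centroids = np.array([train[labels == c].mean(axis=0) for c in classes])
    d = ((test[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

# Synthetic two-class data in a 2-D descriptor space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(10, 2)), rng.normal(2, 0.3, size=(10, 2))])
labels = np.array([0] * 10 + [1] * 10)
centers = X[rng.choice(len(X), size=5, replace=False)]  # hypothetical receptor centers
R = virtual_receptors(X, centers, sigma=0.8)
Z = lateral_inhibition(R)
pred = nearest_centroid_predict(Z, labels, Z)
print((pred == labels).mean())  # training accuracy
```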

Learning and Prediction of Complex Molecular Structure-Property Relationships

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch013

2011

pp. 257-273

Author(s):

Rahul Singh

Keyword(s):

Learning Strategy

Molecular Descriptors

Complex Structure

Complex Nature

Structure Property

Structural Variations

Drug Molecules

Structure Property Relationships

Error Measures

Experimental Process

The problem of modeling and predicting complex structure-property relationships, such as the absorption, distribution, metabolism, and excretion of putative drug molecules, is a fundamental one in contemporary drug discovery. An accurate model can be used not only to predict the behavior of a molecule and understand how structural variations may influence molecular properties, but also to identify regions of molecular space that hold promise in the context of a specific investigation. However, a variety of factors make it difficult to construct robust structure-activity models for such complex properties. These include conceptual issues related to how well the formulation of the specific learning strategy accounts for the true biochemical property; algorithmic issues associated with determining the proper molecular descriptors; access to only small quantities of data, possibly on only tens of molecules, due to the high cost and complexity of the experimental process; and the complex nature of the biochemical phenomena underlying the data. This chapter attempts to address this problem from the rudiments: the authors first identify and discuss the salient computational issues that span (and complicate) structure-property modeling formulations and present a brief review of the state of the art. They then consider a specific problem, modeling intestinal drug absorption, where many of the aforementioned factors play a role. To address them, their solution uses a novel characterization of molecular space based on the notion of surface-based molecular similarity. This is followed by the identification of a statistically relevant set of molecular descriptors, which, along with an appropriate machine learning technique, is used to build the structure-property model. The authors propose the simultaneous use of both ratio and ordinal error measures for model construction and validation. The applicability of the approach is demonstrated in a real-world case study.
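
Why use ratio and ordinal error measures together? The abstract does not name the specific measures, so this sketch assumes root-mean-square error as the ratio-scale measure and Kendall's tau (tau-a) as the ordinal one. The hypothetical absorption values show two models that each win on one measure: one predicts values closely but swaps a ranking, the other ranks perfectly but is numerically further off.

```python
import itertools
import math

def rmse(y_true, y_pred):
    """Ratio-scale error: root-mean-square deviation of predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def kendall_tau(y_true, y_pred):
    """Ordinal agreement (tau-a): (concordant - discordant) pairs / all pairs."""
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(y_true)), 2):
        s = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(y_true) * (len(y_true) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical % absorbed for four molecules and two models' predictions
y_true = [20, 45, 50, 95]
pred_x = [22, 49, 46, 93]   # small errors, but molecules 1 and 2 swapped
pred_y = [10, 30, 60, 99]   # larger errors, ranking fully preserved
print(round(rmse(y_true, pred_x), 2), round(kendall_tau(y_true, pred_x), 2))  # 3.16 0.67
print(round(rmse(y_true, pred_y), 2), kendall_tau(y_true, pred_y))            # 10.5 1.0
```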

Protein Homology Analysis for Function Prediction with Parallel Sub-Graph Isomorphism

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch007

2011

pp. 129-144

Author(s):

Alper Küçükural

Andras Szilagyi

O. Ugur Sezerman

Yang Zhang

Keyword(s):

Protein Function

Graph Matching

Protein Function Prediction

3D Structure

Graph Isomorphism

Protein Molecule

Function Prediction

Homologous Protein

NP-Hard Problem

Speed Up

To annotate the biological function of a protein molecule, it is essential to have information on its 3D structure. Many successful function prediction methods are based on determining structurally conserved regions, because functional residues have been shown to be more conserved than other residues during protein evolution. Since the 3D conformation of a protein can be represented by a contact map graph, graph matching algorithms are often employed to identify the conserved residues in weakly homologous protein pairs. However, general graph matching is computationally expensive because graph similarity searching is essentially an NP-hard problem. Parallel implementations of graph matching are therefore often exploited to speed up the process. In this chapter, the authors review theoretical and computational approaches from graph theory and recently developed graph matching algorithms for protein function prediction.
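
The contact map graph mentioned above can be built directly from residue coordinates. This sketch assumes one representative atom per residue (for example, C-alpha) and an 8 Å distance cutoff, a common but not universal choice; the toy coordinates place four residues on a line 3.8 Å apart.

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Adjacency matrix of a residue contact graph.

    coords: (n_residues, 3) array of representative atom coordinates in angstroms.
    Two residues are in contact if their distance is below `cutoff`;
    self-contacts are excluded.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = dist < cutoff
    np.fill_diagonal(adj, False)
    return adj

def edges(adj):
    """List of undirected edges (i < j) of the contact graph."""
    i, j = np.nonzero(np.triu(adj, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Toy chain: four residues on a line, 3.8 angstroms apart
coords = np.array([[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]])
print(edges(contact_map(coords)))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

Two such graphs could then be compared with a (sub)graph isomorphism routine, for example NetworkX's `GraphMatcher` after converting each adjacency matrix with `networkx.from_numpy_array`. Exact matching scales poorly, which is why the abstract emphasizes parallel implementations.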

Chemoinformatics and Advanced Machine Learning Perspectives
Latest Publications

Published By IGI Global

Prediction of Compound-Protein Interactions with Machine Learning Methods

Graph Mining in Chemoinformatics

Chemoinformatics on Metabolic Pathways

Learning Methodologies for Detection and Classification of Mutagens

Application of Machine Learning in Drug Discovery and Development

Advanced PLS Techniques in Chemometrics and Their Applications to Molecular Design

Optimal Assignment Kernels for ADME in Silico Prediction

Brain-like Processing and Classification of Chemical Data

Learning and Prediction of Complex Molecular Structure-Property Relationships

Protein Homology Analysis for Function Prediction with Parallel Sub-Graph Isomorphism