Advanced Data Mining Technologies in Bioinformatics

Comparative Genome Annotation Systems

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch016 ◽

2011 ◽

pp. 296-313

Author(s):

Kwangmin Choi ◽

Sun Kim

Keyword(s):

Data Mining ◽

Sequence Analysis ◽

Genome Annotation ◽

Genome Comparison ◽

Comparative Genome ◽

A Genome ◽

Multiple Genomes

Understanding the genetic content of a genome is an important but challenging task. One of the most effective ways to annotate a genome is to compare it with genomes that have already been sequenced and annotated. This chapter surveys systems that annotate genomes by comparing multiple genomes, and discusses important issues in designing genome comparison systems, such as extensibility, scalability, reconfigurability, flexibility, usability, and data mining functionality. We also briefly discuss further issues in developing genome comparison systems that let users perform genome comparison flexibly at the sequence analysis level.

Differential Association Rules

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch014 ◽

2011 ◽

pp. 269-282

Author(s):

Christopher Besemann ◽

Anne Denton ◽

Ajay Yekkirala ◽

Ron Hutchison ◽

Marc Anderson

Keyword(s):

Data Mining ◽

Association Rules ◽

Interaction Networks ◽

Differential Association ◽

Interacting Proteins ◽

Data Mining Techniques

In this chapter, we discuss the use of differential association rules to study the annotations of proteins in one or more interaction networks. Using this technique, we find the differences in the annotations of interacting proteins in a network. We extend the concept to compare annotations of interacting proteins across different definitions of interaction networks. Both cases reveal instances of rules that explain known and unknown characteristics of the network(s). By taking advantage of such data mining techniques, a large number of interesting patterns can be effectively explored that otherwise would not be.
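The idea of looking at how annotations differ between interacting proteins can be illustrated with a small counting step. The sketch below is not the chapter's algorithm; it is a simplified illustration that assumes annotations are plain sets of labels and that an interaction network is a list of protein pairs. The function name `differential_pair_counts` and the sample labels are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple


def differential_pair_counts(
    annotations: Dict[str, Set[str]],
    interactions: Iterable[Tuple[str, str]],
) -> Counter:
    """Count label pairs (x, y) where x annotates only one protein of an
    interacting pair and y annotates only the other protein.

    Assumption: this is an illustrative counting step, not a full
    rule-mining procedure (no support or confidence thresholds).
    """
    counts: Counter = Counter()
    for a, b in interactions:
        # Consider both orientations so that the counts are symmetric.
        for p, q in ((a, b), (b, a)):
            ann_p = annotations.get(p, set())
            ann_q = annotations.get(q, set())
            only_p = ann_p - ann_q  # labels present on p but not on q
            only_q = ann_q - ann_p  # labels present on q but not on p
            for x, y in product(only_p, only_q):
                counts[(x, y)] += 1
    return counts


# Hypothetical toy data, for illustration only.
annotations = {
    "P1": {"nucleus", "kinase"},
    "P2": {"nucleus", "transport"},
    "P3": {"kinase"},
    "P4": {"transport"},
}
interactions = [("P1", "P2"), ("P3", "P4")]
counts = differential_pair_counts(annotations, interactions)
```

Labels shared by both proteins (here "nucleus") never appear in the counts, since only the differences between interacting partners are tallied.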

In Silico Recognition of Protein-Protein Interaction

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch013 ◽

2011 ◽

pp. 248-268

Author(s):

Byung-Hoon Park ◽

Phuongan Dam ◽

Chongle Pan ◽

Ying Xu ◽

Al Geist ◽

...

Keyword(s):

Protein Interactions ◽

Regulatory Networks ◽

Machine Learning Techniques ◽

Translation Regulation ◽

Future Research ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Protein Levels ◽

Protein Functions ◽

Model Protein

Protein-protein interactions are fundamental to cellular processes. They are responsible for phenomena such as DNA replication, gene transcription, protein translation, regulation of metabolic pathways, immunologic recognition, and signal transduction. The identification of interacting proteins is therefore an important prerequisite for understanding their physiological functions. Because these interactions underlie so many biophysical activities, reliable computational methods to infer protein-protein interactions from either structural data or genome sequences are in heavy demand. Successful predictions, for instance, will facilitate drug design and the reconstruction of metabolic or regulatory networks. In this chapter, we review: (a) high-throughput experimental methods for identification of protein-protein interactions, (b) existing databases of protein-protein interactions, (c) computational approaches to predicting protein-protein interactions at both residue and protein levels, (d) various statistical and machine learning techniques to model protein-protein interactions, and (e) applications of protein-protein interactions in predicting protein functions. We also discuss intrinsic drawbacks of the existing approaches and future research directions.

A Haplotype Analysis System for Genes Discovery of Common Diseases

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch011 ◽

2011 ◽

pp. 214-230

Author(s):

Takashi Kido

Keyword(s):

Haplotype Analysis ◽

Complex Disease ◽

Association Studies ◽

Genetic Association Studies ◽

Current Theory ◽

Theory And Practice ◽

Discovery Research ◽

Common Diseases ◽

The Common ◽

Analysis System

This chapter introduces computational methods for detecting complex disease loci with haplotype analysis. It argues that haplotype analysis, which plays a major role in the study of population genetics, can be computationally modeled and systematically implemented as a means of detecting causative genes of complex diseases. The author reviews issues in haplotype analysis and proposes an analysis system that integrates a comprehensive spectrum of haplotype analysis functions to support disease association studies. The explanation of the system, together with some real examples of haplotype analysis, will not only give researchers a better understanding of the current theory and practice of genetic association studies, but also present a computational perspective on gene discovery research for common diseases.

Hierarchical Profiling, Scoring and Applications in Bioinformatics

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch002 ◽

2011 ◽

pp. 13-31

Author(s):

Li Liao

Keyword(s):

Biological Data ◽

Classification Methods ◽

Homology Detection ◽

Domain Specific ◽

Wide Range ◽

Recent Developments ◽

Potential Applications ◽

Domain Specific Knowledge ◽

Clustering And Classification ◽

Biological Entities

Recently, clustering and classification methods have seen many applications in bioinformatics. Some are straightforward applications of existing techniques, but most have been adapted to cope with peculiar features of biological data. Many biological data take the form of vectors, whose components correspond to attributes characterizing the biological entities being studied. Comparing these vectors, also known as profiles, is a crucial step for most clustering and classification methods. We review recent developments related to hierarchical profiling, where the attributes are not independent but are correlated in a hierarchy. Hierarchical profiling arises in a wide range of bioinformatics problems, including protein homology detection, protein family classification, and metabolic pathway clustering. We discuss in detail several clustering and classification methods in which hierarchical correlations are handled effectively and efficiently by incorporating domain-specific knowledge. Relations to other statistical learning methods and further potential applications are also discussed.

Mining BioLiterature

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch015 ◽

2011 ◽

pp. 283-295

Author(s):

Francisco M. Couto ◽

Mario J. Silva

Keyword(s):

Text Mining ◽

Domain Knowledge ◽

Text Processing ◽

Biological Research ◽

Special Focus ◽

Biological Databases ◽

Protein Annotation ◽

Open Problems ◽

Efficient Integration ◽

Mining Tools

This chapter introduces the use of Text Mining in scientific literature for biological research, with a special focus on automatic gene and protein annotation. This field has recently become a major topic in Bioinformatics, motivated by the opportunity to tap the BioLiterature with automatic text processing software. The chapter describes the main approaches adopted and analyzes systems that have been developed for automatically annotating genes or proteins. To illustrate how text-mining tools fit into the curation processes of biological databases, the chapter presents a tool that assists protein annotation. Despite the promising advances in Text Mining of BioLiterature, many problems still need to be addressed. This chapter presents the main open problems in using text-mining tools for automatic annotation of genes and proteins, and discusses how a more efficient integration of existing domain knowledge can improve the performance of these tools.

A Bayesian Framework for Improving Clustering Accuracy of Protein Sequences Based on Association Rules

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch012 ◽

2011 ◽

pp. 231-247

Author(s):

Peng-Yeng Yin ◽

Shyong-Jian Shyu ◽

Guan-Shieng Huang ◽

Shuang-Te Liao

Keyword(s):

Data Mining ◽

Association Rules ◽

Phylogenetic Analyses ◽

Clustering Algorithms ◽

Protein Sequences ◽

Bayesian Framework ◽

Biological Data ◽

Data Mining Techniques ◽

Analysis Task ◽

Using Data

With the advent of new sequencing technology for biological data, the number of sequenced proteins stored in public databases has exploded. The structural, functional, and phylogenetic analyses of proteins would benefit from exploring these databases with data mining techniques. Clustering algorithms can assign proteins to clusters such that proteins in the same cluster are more similar in homology than those in different clusters. This procedure not only simplifies the analysis task but also enhances the accuracy of the results. Most of the existing protein-clustering algorithms compute the similarity between proteins based on one-to-one pairwise sequence

Parameterless Clustering Techniques for Gene Expression Analysis

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch009 ◽

2011 ◽

pp. 155-173

Author(s):

Vincent S. Tseng ◽

Ching-Pin Kao

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Analysis ◽

Gene Expression Analysis ◽

In Silico Analysis ◽

Data Sets ◽

Expression Data ◽

Clustering Methods ◽

High Quality ◽

Or Gene

In recent years, clustering analysis has become a valuable tool for in-silico analysis of microarray or gene expression data. Although a number of clustering methods have been proposed, they have difficulty meeting the requirements of automation, high quality, and high efficiency at the same time. In this chapter, we discuss parameterless clustering techniques for gene expression analysis. We introduce two novel, parameterless, and efficient clustering methods suited to the analysis of gene expression data. The unique feature of our methods is that they incorporate validation techniques into the clustering process so that high-quality results can be obtained. Through experimental evaluation, these methods are shown to greatly outperform other clustering methods in terms of clustering quality, efficiency, and automation on both synthetic and real data sets.

Algorithmic Aspects of Protein Threading

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch007 ◽

2011 ◽

pp. 118-135

Author(s):

Tatsuya Akutsu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Three Dimensional ◽

Structure Alignment ◽

Protein Structure Alignment ◽

Amino Acid Residues ◽

Optimal Solutions ◽

Score Functions ◽

Protein Threading

This chapter provides an overview of computational problems and techniques for protein threading. Protein threading is one of the most powerful approaches to protein structure prediction, where the goal is to infer the three-dimensional (3-D) structure of a given protein sequence. Protein threading can be modeled as an optimization problem. Optimal solutions can be obtained in polynomial time using simple dynamic programming algorithms if profile-type score functions are employed. However, the problem is computationally hard (NP-hard) if score functions include pairwise interaction preferences between amino acid residues. Therefore, various algorithms have been developed for finding optimal or near-optimal solutions. This chapter explains the ideas employed in these algorithms. It also gives brief explanations of related problems: protein threading with constraints, comparison of RNA secondary structures, and protein structure alignment.
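The polynomial-time case mentioned above, threading with a profile-type score function, can be sketched as a standard global alignment between a sequence and a list of structural positions. The following is a minimal illustration, not the chapter's own formulation: the function name `thread_score`, the dictionary-based profile, and the linear gap penalty are all assumptions made for the example.

```python
from typing import Dict, List


def thread_score(seq: str, profile: List[Dict[str, float]], gap: float = -1.0) -> float:
    """Best global alignment score of `seq` against a position-specific profile.

    Assumptions (hypothetical, for illustration): profile[j] maps a residue
    letter to its score at structural position j (missing residues score 0.0),
    and every gap costs the same linear penalty `gap`.
    """
    n, m = len(seq), len(profile)
    # dp[i][j] = best score for aligning seq[:i] with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # residues left unaligned
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # structural positions left empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Place residue i at position j, or open a gap on either side.
            place = dp[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0)
            skip_residue = dp[i - 1][j] + gap
            skip_position = dp[i][j - 1] + gap
            dp[i][j] = max(place, skip_residue, skip_position)
    return dp[n][m]


# Hypothetical three-position profile, for illustration only.
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
full = thread_score("ACD", toy_profile)
with_gap = thread_score("AD", toy_profile)
```

The table has (n + 1) × (m + 1) cells and each is filled in constant time, which is why this case is polynomial. Each cell depends only on the residue and the position it is placed at; a score that also depends on which residues occupy other positions (pairwise interaction preferences) breaks this independence, which is the source of the hardness noted in the abstract.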

DNA Sequence Visualization

Advanced Data Mining Technologies in Bioinformatics ◽

10.4018/978-1-59140-863-5.ch004 ◽

2011 ◽

pp. 63-84

Author(s):

Hsuan T. Chang

Keyword(s):

Signal Processing ◽

Feature Extraction ◽

DNA Sequence ◽

Sequence Alignment ◽

DNA Sequences ◽

Graphical Representation ◽

Visualization Process ◽

Potential Applications ◽

Signal Processing Algorithms ◽

Processing Algorithms

This chapter introduces various visualization (i.e., graphical representation) schemes for symbolic DNA sequences, which are represented as character strings in conventional sequence databases. Several visualization schemes are reviewed and their characteristics are summarized for comparison. Further potential applications based on the visualized sequences are also discussed. By understanding the visualization process, researchers will be able to analyze DNA sequences by designing signal processing algorithms for specific purposes such as sequence alignment, feature extraction, and sequence clustering.
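To make the idea of turning a character string into a graphical representation concrete, here is a minimal sketch of one simple scheme, a two-dimensional walk in which each nucleotide moves a point one unit in a fixed direction. It is not necessarily one of the schemes reviewed in the chapter; the direction assignments in `STEPS` and the function name `dna_walk` are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical direction assignment: A/T move along x, G/C move along y.
STEPS: Dict[str, Tuple[int, int]] = {
    "A": (-1, 0),
    "T": (1, 0),
    "G": (0, 1),
    "C": (0, -1),
}


def dna_walk(seq: str) -> List[Tuple[int, int]]:
    """Convert a DNA string into the list of 2-D points visited by the walk.

    The walk starts at the origin. Symbols outside A, C, G, T (for example
    the ambiguity code N) leave the position unchanged.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


path = dna_walk("ATGC")
```

The resulting coordinate lists can be plotted directly, or the x and y series can be treated as numeric signals and passed to signal processing routines.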

Published By IGI Global
