Advanced Data Mining Technologies in Bioinformatics
Latest Publications


Total documents: 31 (five years: 0)

H-index: 3 (five years: 0)

Published By IGI Global

9781591408635, 9781591408659

Author(s):  
Kwangmin Choi ◽  
Sun Kim

Understanding the genetic content of a genome is an important but challenging task. One of the most effective ways to annotate a genome is to compare it with genomes that have already been sequenced and annotated. This chapter surveys systems that annotate genomes by comparing multiple genomes and discusses important issues in designing genome comparison systems, such as extensibility, scalability, reconfigurability, flexibility, usability, and data mining functionality. We also briefly discuss further issues in developing genome comparison systems in which users can flexibly perform genome comparison at the sequence analysis level.


Author(s):  
Christopher Besemann ◽  
Anne Denton ◽  
Ajay Yekkirala ◽  
Ron Hutchison ◽  
Marc Anderson

In this chapter, we discuss the use of differential association rules to study the annotations of proteins in one or more interaction networks. Using this technique, we find the differences in the annotations of interacting proteins in a network. We extend the concept to compare annotations of interacting proteins across different definitions of interaction networks. Both cases reveal instances of rules that explain known and unknown characteristics of the network(s). Such data mining techniques make it possible to explore a large number of interesting patterns that would otherwise go unexamined.


Author(s):  
Byung-Hoon Park ◽  
Phuongan Dam ◽  
Chongle Pan ◽  
Ying Xu ◽  
Al Geist ◽  
...  

Protein-protein interactions are fundamental to cellular processes. They underlie phenomena such as DNA replication, gene transcription, protein translation, regulation of metabolic pathways, immunologic recognition, and signal transduction. Identifying interacting proteins is therefore an important prerequisite for understanding their physiological functions. Because these interactions are central to so many biophysical activities, reliable computational methods for inferring protein-protein interactions from either structural or genome sequence data are in heavy demand. Successful predictions will, for instance, facilitate drug design and the reconstruction of metabolic or regulatory networks. In this chapter, we review: (a) high-throughput experimental methods for identification of protein-protein interactions, (b) existing databases of protein-protein interactions, (c) computational approaches to predicting protein-protein interactions at both residue and protein levels, (d) various statistical and machine learning techniques to model protein-protein interactions, and (e) applications of protein-protein interactions in predicting protein functions. We also discuss intrinsic drawbacks of the existing approaches and future research directions.


Author(s):  
Takashi Kido

This chapter introduces computational methods for detecting complex disease loci with haplotype analysis. It argues that haplotype analysis, which plays a major role in the study of population genetics, can be computationally modeled and systematically implemented as a means of detecting the causative genes of complex diseases. The author reviews issues in haplotype analysis and proposes an analysis system that integrates a comprehensive spectrum of haplotype analysis functions to support disease association studies. The explanation of the system and some real examples of haplotype analysis will not only give researchers a better understanding of the current theory and practice of genetic association studies, but also present a computational perspective on gene discovery research for common diseases.


Author(s):  
Li Liao

Recently, clustering and classification methods have seen many applications in bioinformatics. Some are straightforward applications of existing techniques, but most have been adapted to cope with peculiar features of biological data. Many biological data take the form of vectors whose components correspond to attributes characterizing the biological entities being studied. Comparing these vectors, also known as profiles, is a crucial step for most clustering and classification methods. We review recent developments related to hierarchical profiling, where the attributes are not independent but are correlated in a hierarchy. Hierarchical profiling arises in a wide range of bioinformatics problems, including protein homology detection, protein family classification, and metabolic pathway clustering. We discuss in detail several clustering and classification methods in which hierarchical correlations are handled effectively and efficiently by incorporating domain-specific knowledge. Relations to other statistical learning methods and further potential applications are also discussed.
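
To illustrate why a hierarchy among attributes can matter when comparing profiles, the following is a minimal sketch and not a method taken from the chapter. It assumes a hypothetical attribute tree (`children`) and extends each leaf-level profile with one value per internal node, computed as the mean of that node's children, before taking a cosine similarity. The names `extend_profile` and `cosine` and the toy tree are assumptions made for this example.

```python
import math


def extend_profile(leaf_profile, children):
    """Add one value per internal node: the mean of its children's values.

    `children` maps each internal node to its list of child nodes
    (hypothetical toy hierarchy); leaves are the keys of `leaf_profile`.
    """
    extended = dict(leaf_profile)

    def value(node):
        if node not in extended:
            kids = children[node]
            extended[node] = sum(value(k) for k in kids) / len(kids)
        return extended[node]

    for node in children:
        value(node)
    return extended


def cosine(p, q):
    """Cosine similarity between two profiles stored as dicts."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Toy hierarchy: root -> x, y; x -> a, b; y -> c, d (illustrative only).
children = {"root": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}
p = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
q = {"a": 0.0, "b": 1.0, "c": 0.0, "d": 0.0}  # differs from p, same subtree
r = {"a": 0.0, "b": 0.0, "c": 1.0, "d": 0.0}  # differs from p, other subtree

flat_pq = cosine(p, q)
hier_pq = cosine(extend_profile(p, children), extend_profile(q, children))
hier_pr = cosine(extend_profile(p, children), extend_profile(r, children))
```

With flat vectors, `p` is equally dissimilar to `q` and `r`. Once the internal nodes are included, `p` scores closer to `q`, whose only present attribute lies in the same subtree.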


Author(s):  
Francisco M. Couto ◽  
Mario J. Silva

This chapter introduces the use of Text Mining in scientific literature for biological research, with a special focus on automatic gene and protein annotation. This field has recently become a major topic in Bioinformatics, motivated by the opportunity to tap the BioLiterature with automatic text processing software. The chapter describes the main approaches adopted and analyzes systems that have been developed for automatically annotating genes or proteins. To illustrate how text-mining tools fit into the curation processes of biological databases, the chapter presents a tool that assists protein annotation. Despite the promising advances in Text Mining of BioLiterature, many problems remain to be addressed. This chapter presents the main open problems in using text-mining tools for automatic annotation of genes and proteins, and discusses how a more efficient integration of existing domain knowledge can improve the performance of these tools.


Author(s):  
Peng-Yeng Yin ◽  
Shyong-Jian Shyu ◽  
Guan-Shieng Huang ◽  
Shuang-Te Liao

With the advent of new sequencing technology for biological data, the number of sequenced proteins stored in public databases has grown explosively. The structural, functional, and phylogenetic analyses of proteins would benefit from exploring these databases with data mining techniques. Clustering algorithms can assign proteins to clusters such that proteins in the same cluster are more similar in homology than those in different clusters. This procedure not only simplifies the analysis task but also enhances the accuracy of the results. Most of the existing protein-clustering algorithms compute the similarity between proteins based on one-to-one pairwise sequence


Author(s):  
Vincent S. Tseng ◽  
Ching-Pin Kao

In recent years, clustering analysis has become a valuable tool for in-silico analysis of microarray or gene expression data. Although a number of clustering methods have been proposed, they have difficulty meeting the requirements of automation, high quality, and high efficiency at the same time. In this chapter, we discuss parameterless clustering techniques for gene expression analysis. We introduce two novel, parameterless, and efficient clustering methods suited to the analysis of gene expression data. The unique feature of our methods is that they incorporate validation techniques into the clustering process so that high-quality results can be obtained. Experimental evaluation shows that these methods greatly outperform other clustering methods in terms of clustering quality, efficiency, and automation on both synthetic and real data sets.


Author(s):  
Tatsuya Akutsu

This chapter provides an overview of computational problems and techniques for protein threading. Protein threading is one of the most powerful approaches to protein structure prediction, the task of inferring the three-dimensional (3-D) structure of a protein from its sequence. Protein threading can be modeled as an optimization problem. Optimal solutions can be obtained in polynomial time using simple dynamic programming algorithms if profile-type score functions are employed. However, the problem is computationally hard (NP-hard) if score functions include pairwise interaction preferences between amino acid residues. Therefore, various algorithms have been developed for finding optimal or near-optimal solutions. This chapter explains the ideas employed in these algorithms. It also gives brief explanations of related problems: protein threading with constraints, comparison of RNA secondary structures, and protein structure alignment.
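
The polynomial-time case can be sketched as a global alignment of a sequence against a template profile. This is a minimal illustration under stated assumptions, not the chapter's algorithm: the profile is assumed to be a list of per-position score tables, gaps carry a fixed linear penalty, and the function name `thread_score` and the toy scores are hypothetical.

```python
def thread_score(sequence, profile, gap=-2.0):
    """Best score for aligning `sequence` to a template `profile`.

    `profile[j]` is a dict mapping an amino acid to its score at template
    position j (assumed profile-type score; unlisted residues score 0).
    `gap` is an assumed linear penalty per skipped residue or position.
    """
    n, m = len(sequence), len(profile)
    # dp[i][j]: best score aligning the first i residues
    # with the first j template positions.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            place = dp[i - 1][j - 1] + profile[j - 1].get(sequence[i - 1], 0.0)
            skip_residue = dp[i - 1][j] + gap
            skip_position = dp[i][j - 1] + gap
            dp[i][j] = max(place, skip_residue, skip_position)
    return dp[n][m]


# Toy template with three positions (illustrative scores only).
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
full = thread_score("ACD", toy_profile)
partial = thread_score("AD", toy_profile)
```

The table has (n + 1) × (m + 1) cells and each is filled in constant time, so the running time is O(nm). This works because each position's score depends only on the residue placed there; a score with pairwise interaction terms no longer decomposes this way.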


Author(s):  
Hsuan T. Chang

This chapter introduces various visualization (i.e., graphical representation) schemes for symbolic DNA sequences, which are conventionally stored as character strings in sequence databases. Several visualization schemes are reviewed and their characteristics are summarized for comparison. Further potential applications based on the visualized sequences are also discussed. By understanding the visualization process, researchers will be able to analyze DNA sequences by designing signal processing algorithms for specific purposes such as sequence alignment, feature extraction, and sequence clustering.
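
As one concrete example of turning a character string into a plottable signal, the sketch below computes a one-dimensional "DNA walk": each purine (A, G) steps up by one and each pyrimidine (C, T) steps down by one. Whether the chapter covers this particular scheme is an assumption; the function name `dna_walk` and the choice to treat any other symbol (such as N) as a zero step are choices made for this example.

```python
def dna_walk(sequence):
    """Return the cumulative purine/pyrimidine walk of a DNA string.

    A and G step +1, C and T step -1; any other symbol (e.g. N) is
    treated as a zero step, which is an assumption of this sketch.
    """
    steps = {"A": 1, "G": 1, "C": -1, "T": -1}
    position = 0
    walk = []
    for base in sequence.upper():
        position += steps.get(base, 0)
        walk.append(position)
    return walk


walk = dna_walk("AGCT")
```

The resulting list of integers can be plotted against sequence position or passed to ordinary signal processing routines.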

