Some Applications to Computational Biology
This chapter considers some applications of Markov processes and hidden Markov processes to computational biology. It introduces three important problems: sequence alignment, gene finding, and protein classification. After providing an overview of some relevant aspects of biology, the chapter examines the problem of optimal gapped alignment between two sequences. This is a way to detect similarity between two sequences over a common alphabet, such as the four-symbol alphabet of nucleotides or the 20-symbol alphabet of amino acids. The chapter then discusses some widely used algorithms for finding genes in DNA sequences (genomes), including the GLIMMER algorithm and the GENSCAN algorithm. Finally, it describes a special type of hidden Markov model, termed the profile hidden Markov model, which is commonly used to classify proteins into a small number of groups.
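To make the notion of optimal gapped alignment concrete, the following is a minimal sketch of a standard dynamic-programming recursion for global alignment (in the style of Needleman and Wunsch). It is not necessarily the formulation used in the chapter. The function name gapped_alignment_score and the scoring values (match, mismatch, and a linear gap penalty) are illustrative assumptions, not taken from the text.

```python
def gapped_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Best global gapped-alignment score of x and y.

    The scoring values are hypothetical defaults chosen for illustration;
    a linear gap penalty is assumed.
    """
    n, m = len(x), len(y)
    # score[i][j] holds the best score for aligning x[:i] with y[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align x[i-1] with y[j-1]
                score[i - 1][j] + gap,      # x[i-1] aligned with a gap
                score[i][j - 1] + gap,      # y[j-1] aligned with a gap
            )
    return score[n][m]


if __name__ == "__main__":
    # Two nucleotide sequences over the four-symbol alphabet {A, C, G, T}.
    print(gapped_alignment_score("ACGT", "AGT"))
```

The same recursion applies unchanged to the 20-symbol amino acid alphabet, although in practice a substitution matrix would typically replace the single match and mismatch values assumed here.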