Text about genes can be effectively leveraged to enhance sequence analysis (MacCallum, Kelley et al. 2000; Chang, Raychaudhuri et al. 2001; McCallum and Ganesh 2003; Eskin and Agichtein 2004; Tu, Tang et al. 2004). Most of the emerging methods utilize textual representations similar to the one we introduced in the previous chapter. To analyze sequences, a numeric vector containing the counts of different words in the references about a sequence can be used in conjunction with the actual sequence information.

Experienced biologists understand the value of scientific text during sequence searches, and commonly use text and annotations to guide their intuition. For example, after a quick BLAST search, a trained expert might look over the hits and their associated annotations and literature references to assess the validity of the hits. The apparently valid sequence hits can then be used to draw conclusions about the query sequence by transferring information from the hits.

In most cases, the text serves as a proxy for structured functional information. High-quality functional annotations that succinctly and thoroughly describe the function of a protein are often unavailable. Defining appropriate keywords for a protein requires considerable effort and expertise, and in most cases the results are incomplete, as there is an ever-growing collection of knowledge about proteins. One option, therefore, is to use text instead to compare the biological function of different sequences.

There are different ways in which the functional information in text could be used in the context of sequence analysis. One possibility is to first run a sequence analysis algorithm, and then to use text profiles to summarize or organize the results. Functional keywords can be assigned to the whole group of hit sequences. Additionally, a series of sequences can be grouped according to like function. In either case, quick assessment of the content of text associated with sequences offers insight about exactly what we are seeing. These approaches are particularly useful if we are querying a large database of sequences with a novel sequence that we have very little information about.
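To make the idea of summarizing and organizing hits with text profiles concrete, the following is a minimal sketch rather than any published method. It assumes the text associated with each hit has already been collected as a plain string. The function names (`word_counts`, `cosine`, `group_keywords`, `group_by_text`), the short stop-word list, and the similarity threshold of 0.3 are all hypothetical choices made for illustration. Each hit is represented by a word-count vector, keywords for the whole group are taken to be the words shared by the most hits, and hits are grouped greedily whenever their word vectors are sufficiently similar.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "that", "with"}


def word_counts(text):
    """Return a word-count vector (a Counter) for the text about one sequence."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_keywords(profiles, top_n=3):
    """Crude group summary: the words that appear in the text of the most hits."""
    doc_freq = Counter()
    for profile in profiles.values():
        doc_freq.update(profile.keys())
    # Sort by descending hit count, then alphabetically, so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


def group_by_text(profiles, threshold=0.3):
    """Greedily place each hit in the first group containing a textually similar hit."""
    groups = []
    for hit_id, profile in profiles.items():
        for group in groups:
            if any(cosine(profile, profiles[other]) >= threshold for other in group):
                group.append(hit_id)
                break
        else:
            groups.append([hit_id])
    return groups


if __name__ == "__main__":
    # Toy text invented for illustration; these are not real database records.
    hit_text = {
        "hit_A": "protein kinase that phosphorylates serine residues in signal transduction",
        "hit_B": "serine threonine kinase involved in signal transduction",
        "hit_C": "membrane transporter that moves glucose across the membrane",
    }
    profiles = {hit_id: word_counts(text) for hit_id, text in hit_text.items()}
    print(group_keywords(profiles))
    print(group_by_text(profiles))
```

Raw counts and a fixed threshold keep the sketch short. In practice one would more likely weight words, for instance by inverse document frequency, and use a proper clustering method, but the overall flow of running the sequence search first and then organizing hits by their text remains the same.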