A FAST TECHNIQUE FOR DERIVING FREQUENT STRUCTURED PATTERNS FROM BIOLOGICAL DATA SETS
In the last years, the completion of the human genome sequencing showed a wide range of new challenging issues involving raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns frequently occurring in the sequences. Because of biological observations, a specific class of patterns is becoming particularly interesting: frequent structured patterns. In this respect, it is biologically meaningful to look at both "exact" and "approximate" repetitions of pattens within the available sequences. This paper gives a contribution in this setting by providing algorithms which allow to discover frequent structured patterns, both in "exact" and "approximate" form, present in a collection of input biological sequences.