A New Approach Using Hidden Markov Model and Bayesian Method for Estimate of Word Types in Text Mining
Determining the structure of words in a text is essential for operations such as automated information extraction and text summarization. In computational text analysis, the ability to identify the type of each word is a significant advantage. Identifying word types provides an estimate of the sequence of words in a sentence. This article presents a new model, based on a Hidden Markov Model and a Bayesian method, for estimating the types of Turkish words. As part of the model, an algorithm is developed that separates the suffixes from words and groups the words by the number of characters in their suffixes. A text of 584 Turkish words is used to test the reliability of the model. The model achieved a high success rate in predicting the types of Turkish words.
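
The suffix-length grouping and the Bayesian estimate can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix list, the labelled words, and the function names (suffix_length, train, predict) are hypothetical and invented for illustration, and the Hidden Markov Model component, which would add dependence on neighbouring word types, is left out.

```python
from collections import Counter, defaultdict

# Hypothetical suffix list for illustration only; not the paper's inventory.
SUFFIXES = ["lar", "ler", "yor", "di", "de", "da"]


def suffix_length(word):
    """Return the character count of the longest listed suffix ending the word, or 0."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require a non-empty stem so a word is never treated as all suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return len(suffix)
    return 0


def train(samples):
    """Count word types and (type, suffix length) pairs from labelled words."""
    type_counts = Counter()
    length_counts = defaultdict(Counter)
    for word, word_type in samples:
        type_counts[word_type] += 1
        length_counts[word_type][suffix_length(word)] += 1
    return type_counts, length_counts


def predict(word, type_counts, length_counts):
    """Pick the type maximising P(type) * P(suffix length | type), add-one smoothed."""
    n = suffix_length(word)
    total = sum(type_counts.values())
    n_bins = max(len(s) for s in SUFFIXES) + 1  # possible lengths 0..max

    def score(word_type):
        prior = type_counts[word_type] / total
        likelihood = (length_counts[word_type][n] + 1) / (type_counts[word_type] + n_bins)
        return prior * likelihood

    return max(type_counts, key=score)


# Toy labelled words, invented for this sketch (not the paper's 584-word text).
TRAINING = [
    ("evler", "noun"), ("kitaplar", "noun"), ("okullar", "noun"),
    ("geldi", "verb"), ("sevdi", "verb"), ("bildi", "verb"),
]

if __name__ == "__main__":
    type_counts, length_counts = train(TRAINING)
    for word in ["masalar", "verdi"]:
        print(word, suffix_length(word), predict(word, type_counts, length_counts))
```

Suffix length alone is a weak feature, so a sketch like this only separates types whose suffixes differ in length; combining it with transition probabilities between types, as a Hidden Markov Model does, is what would let context resolve the remaining ambiguity.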