Abstract
Sophisticated search mechanisms are required to meet the needs of search engine users who probe the voluminous web database. Probabilistic matching of query keywords is attractive in many application areas, such as spell checking and data cleaning, because it supports approximate search. A probabilistic approach with maximum likelihood estimation is used to handle such real-world problems; however, it tends to overfit the data. This paper presents a rule-based approach to keyword searching. The process consists of two phases: a rule generation phase and a learning phase. The rule generation phase uses a new technique, N-Gram based Edit distance (NGE), to build a rule dictionary, and a Turing machine model is implemented to describe rule generation with NGE. In the learning phase, a log model with maximum a posteriori (MAP) estimation selects the best rule. In real-time evaluation, the system produces the best results in terms of both efficiency and accuracy.
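The abstract names the NGE technique but does not define it. The Python below is a purely illustrative sketch, not the authors' NGE. It assumes the name refers to combining two standard building blocks: character n-gram overlap as a cheap candidate filter, and Levenshtein edit distance for ranking. The function names `char_ngrams`, `edit_distance` and `candidates`, the parameters `min_shared` and `max_dist`, and the `#` padding are hypothetical choices made only for this example.

```python
from typing import List, Set


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams of `word`, padded with '#' at both ends.

    The padding scheme is an illustrative assumption, not taken from the paper.
    """
    padded = "#" * (n - 1) + word + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def candidates(query: str, dictionary: List[str],
               min_shared: int = 1, max_dist: int = 2) -> List[str]:
    """Hypothetical candidate search: filter by shared bigrams, then rank by edit distance.

    Words sharing at least `min_shared` bigrams with `query` and lying within
    `max_dist` edits are returned, sorted by distance (ties broken alphabetically).
    """
    q_grams = char_ngrams(query)
    scored = []
    for word in dictionary:
        if len(q_grams & char_ngrams(word)) >= min_shared:
            d = edit_distance(query, word)
            if d <= max_dist:
                scored.append((d, word))
    return [word for _, word in sorted(scored)]
```

For example, `candidates("serch", ["search", "perch", "starch", "church"])` returns `["perch", "search", "starch"]`: "church" passes the bigram filter but is three edits away, so it is dropped. The n-gram filter avoids computing the quadratic edit distance for every dictionary entry. The paper's own rule generation and MAP-based rule selection may differ substantially from this sketch.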