A computational analysis of short sentences based on an ensemble similarity model
<p>The rapid development of the Internet, along with the widespread use of social media applications, produces a huge volume of unstructured data in short-text form, such as tweets, text snippets and instant messages. Such data rarely contains repeated words. This poses a challenge for sentence similarity analysis, because standard text similarity models rely merely on word occurrence counts and therefore often yield unreliable similarity values. In addition, the use of abbreviations, acronyms, slang, smileys, jargon, symbols and other non-standard short forms further complicates similarity analysis. To address these problems, an approach based on an extended ensemble similarity model is proposed. An experimental study has been conducted using datasets of English short sentences. The findings are very encouraging, showing improved similarity values for short sentences.</p>
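<p>To illustrate why count-based measures struggle with short text, the sketch below computes cosine similarity over raw word-count vectors, one common instance of the standard occurrence-based models described above. It is a minimal illustration assuming simple lowercase whitespace tokenization. The function names are hypothetical, and this is not the proposed ensemble model. Because short sentences rarely repeat words, each count vector is essentially binary, so sentences that share no surface tokens (for example, an abbreviated message and its full form) score zero regardless of meaning.</p>

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace only
    # (no punctuation stripping, stemming or abbreviation expansion).
    return text.lower().split()


def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency (word-count) vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the words the two sentences share.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    # Product of the Euclidean norms of the two count vectors.
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Same meaning, no shared tokens: the count-based score collapses to zero.
    print(tf_cosine("see you tomorrow", "cu tmrw"))
    # One shared word out of two in each sentence.
    print(tf_cosine("good movie", "good film"))
```

<p>For example, <code>tf_cosine("see you tomorrow", "cu tmrw")</code> returns 0.0 even though both messages convey the same thing, which is the kind of unreliable value that motivates combining several similarity measures.</p>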