Comparative study of clustering techniques for short text documents

Author(s):  
Aniket Rangrej ◽  
Sayali Kulkarni ◽  
Ashish V. Tendulkar
2011 ◽  
Vol 268-270 ◽  
pp. 697-700
Author(s):  
Rui Xue Duan ◽  
Xiao Jie Wang ◽  
Wen Feng Li

As the volume of online short text documents grow tremendously on the Internet, it is much more urgent to solve the task of organizing the short texts well. However, the traditional feature selection methods cannot suitable for the short text. In this paper, we proposed a method to incorporate syntactic information for the short text. It emphasizes the feature which has more dependency relations with other words. The classifier SVM and machine learning environment Weka are involved in our experiments. The experiment results show that incorporate syntactic information in the short text, we can get more powerful features than traditional feature selection methods, such as DF, CHI. The precision of short text classification improved from 86.2% to 90.8%.


In Present situation, a huge quantity of data is recorded in variety of forms like text, image, video, and audio and is estimated to enhance in future. The major tasks related to text are entity extraction, information extraction, entity relation modeling, document summarization are performed by using text mining. This paper main focus is on document clustering, a sub task of text mining and to measure the performance of different clustering techniques. In this paper we are using an enhanced features selection for clustering of text documents to prove that it produces better results compared to traditional feature selection.


2021 ◽  
Vol 1 (1) ◽  
pp. 1-12
Author(s):  
Aytuğ Onan ◽  

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals can express their opinions, grievances, feelings, and attitudes about a variety of topics. Through microblogging platforms, they can express their opinions on current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to define the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users daily publishing over one hundred million text messages (referred to as tweets). Choosing an appropriate term representation scheme for short text messages is critical. Term weighting schemes are critical representation schemes for text documents in the vector space model. We present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes in this paper. The predictive efficiency of term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical evidence suggests that supervised term weighting models can outperform unsupervised term weighting models.


Author(s):  
Supakpong Jinarat ◽  
Choochart Haruechaiyasak ◽  
Arnon Rungsawang

A search engine usually returns a long list of web search results corresponding to a query from the user. Users must spend a lot of time for browsing and navigating the search results for the relevant results. Many research works applied the text clustering techniques, called web search results clustering, to handle the problem. Unfortunately, search result document returned from search engine is a very short text. It is difficult to cluster related documents into the same group because a short document has low informative content. In this paper, we proposed a method to cluster the web search results with high clustering quality using graph-based clustering with concept which extract from the external knowledge source. The main idea is to expand the original search results with some related concept terms. We applied the Wikipedia as the external knowledge source for concept extraction. We compared the clustering results of our proposed method with two well-known search results clustering techniques, Suffix Tree Clustering and Lingo. The experimental results showed that our proposed method significantly outperforms over the well-known clustering techniques.


Sign in / Sign up

Export Citation Format

Share Document