Comparative study of clustering techniques for short text documents

As the volume of short text documents on the Internet grows rapidly, organizing them well becomes increasingly urgent. However, traditional feature selection methods are not well suited to short text. In this paper, we propose a method that incorporates syntactic information for short text. It emphasizes features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The experimental results show that by incorporating syntactic information into short text, we obtain more powerful features than with traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
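
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of document frequency (DF) and the chi-square statistic (CHI) as they are commonly defined for feature selection. It is not the authors' syntactic method, and the toy documents, labels, and function names are assumptions made purely for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count how many documents each term appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return df


def chi_square(docs, labels, term, target):
    """Chi-square statistic between a term and a class, from a 2x2 table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document outside class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document outside class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


# Hypothetical toy corpus: tokenized short texts with class labels.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["football", "match", "news"],
    ["football", "score"],
]
labels = ["travel", "travel", "sport", "sport"]

df = document_frequency(docs)
chi_flight = chi_square(docs, labels, "flight", "travel")
chi_news = chi_square(docs, labels, "news", "travel")
print(df["flight"], chi_flight, chi_news)
```

In this toy corpus "flight" and "news" have the same document frequency, yet only "flight" is associated with a class, which is the kind of distinction CHI captures and DF does not.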

Wind turbine power output very short-term forecast: A comparative study of data clustering techniques in a PSO-ANFIS model

Journal of Cleaner Production ◽

10.1016/j.jclepro.2020.120135 ◽

2020 ◽

Vol 254 ◽

pp. 120135 ◽

Cited By ~ 15

Author(s):

Paul A. Adedeji ◽

Stephen Akinlabi ◽

Nkosinathi Madushele ◽

Obafemi O. Olatunji

Keyword(s):

Comparative Study ◽

Wind Turbine ◽

Power Output ◽

Data Clustering ◽

Short Term ◽

Anfis Model ◽

Clustering Techniques ◽

Short Term Forecast ◽

Term Forecast

Research of Clustering Algorithms using Enhanced Feature Selection

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b5115.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4612-4615

Keyword(s):

Feature Selection ◽

Text Mining ◽

Clustering Algorithms ◽

Present Situation ◽

Features Selection ◽

Entity Extraction ◽

Text Documents ◽

Clustering Techniques ◽

Selection For ◽

Video And Audio

At present, a huge quantity of data is recorded in a variety of forms such as text, image, video, and audio, and this quantity is expected to grow in the future. The major text-related tasks, namely entity extraction, information extraction, entity relation modeling, and document summarization, are performed using text mining. This paper focuses on document clustering, a subtask of text mining, and on measuring the performance of different clustering techniques. We use an enhanced feature selection for clustering text documents to show that it produces better results than traditional feature selection.

A Comparative Study of KBS, ANN and Statistical Clustering Techniques for Unattended Stellar Classification

Lecture Notes in Computer Science - Progress in Pattern Recognition, Image Analysis and Applications ◽

10.1007/11578079_59 ◽

2005 ◽

pp. 566-577 ◽

Cited By ~ 5

Author(s):

Carlos Dafonte ◽

Alejandra Rodríguez ◽

Bernardino Arcay ◽

Iciar Carricajo ◽

Minia Manteiga

Keyword(s):

Comparative Study ◽

Clustering Techniques ◽

Statistical Clustering

Ensemble of Classifiers and Term Weighting Schemes for Sentiment Analysis in Turkish

10.52460/src.2021.004 ◽

2021 ◽

Vol 1 (1) ◽

pp. 1-12

Author(s):

Aytuğ Onan

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Text Messages ◽

Support Vector ◽

K Nearest Neighbor ◽

Term Weighting ◽

Text Documents ◽

Weighting Schemes ◽

Short Text

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals can express their opinions, grievances, feelings, and attitudes about a variety of topics. Through microblogging platforms, they can express their opinions on current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to define the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users daily publishing over one hundred million text messages (referred to as tweets). Choosing an appropriate term representation scheme for short text messages is critical. Term weighting schemes are critical representation schemes for text documents in the vector space model. We present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes in this paper. The predictive efficiency of term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical evidence suggests that supervised term weighting models can outperform unsupervised term weighting models.
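
As a rough illustration of what an unsupervised term weighting scheme looks like in the vector space model, here is a minimal sketch of TF-IDF. The abstract does not list its nine schemes, so treating TF-IDF as a representative example is an assumption, and the tokens and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term by raw term frequency times log inverse document frequency."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency of each term
    weighted = []
    for tokens in docs:
        tf = Counter(tokens)
        weighted.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weighted


# Hypothetical tokenized short messages.
docs = [["good", "film", "good"], ["bad", "film"]]
weights = tf_idf(docs)
print(weights)
```

A term that occurs in every message, such as "film" here, receives zero weight. A supervised scheme would additionally use the class labels when computing the weights, which this sketch does not attempt.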

Semi-supervised clustering techniques for categorization of text documents

10.32657/10356/65400 ◽

2015

Author(s):

Yang Yan

Keyword(s):

Text Documents ◽

Clustering Techniques ◽

Supervised Clustering

Applications of Clustering Techniques in Data Mining: A Comparative Study

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2020.0111218 ◽

2020 ◽

Vol 11 (12)

Author(s):

Muhammad Faizan ◽

Megat F. ◽

Shahrinaz Ismail ◽

Sara Sultan

Keyword(s):

Data Mining ◽

Comparative Study ◽

Clustering Techniques

Graph-Based Concept Clustering for Web Search Results

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v5i6.pp1536-1544 ◽

2015 ◽

Vol 5 (6) ◽

pp. 1536 ◽

Cited By ~ 1

Author(s):

Supakpong Jinarat ◽

Choochart Haruechaiyasak ◽

Arnon Rungsawang

Keyword(s):

Search Engine ◽

Web Search ◽

Knowledge Source ◽

Short Text ◽

Clustering Techniques ◽

External Knowledge ◽

Search Results ◽

Clustering Quality ◽

Informative Content ◽

Search Results Clustering

A search engine usually returns a long list of web search results for a user's query. Users must spend a lot of time browsing and navigating the results to find the relevant ones. Many research works have applied text clustering techniques, called web search results clustering, to handle this problem. Unfortunately, a search result document returned by a search engine is a very short text. It is difficult to cluster related documents into the same group because a short document has low informative content. In this paper, we propose a method to cluster web search results with high clustering quality using graph-based clustering with concepts extracted from an external knowledge source. The main idea is to expand the original search results with related concept terms. We used Wikipedia as the external knowledge source for concept extraction. We compared the clustering results of our proposed method with two well-known search results clustering techniques, Suffix Tree Clustering and Lingo. The experimental results showed that our proposed method significantly outperforms these well-known clustering techniques.
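
The idea of expanding short snippets with concept terms before clustering them on a graph can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: a small hand-written `concept_map` stands in for Wikipedia, edges are drawn by Jaccard similarity above a hypothetical threshold, and clusters are taken to be the connected components of the resulting graph.

```python
def expand(tokens, concept_map):
    """Add related concept terms to a snippet's own terms."""
    expanded = set(tokens)
    for token in tokens:
        expanded.update(concept_map.get(token, []))
    return expanded


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(snippets, concept_map, threshold=0.15):
    """Link similar expanded snippets and return connected components."""
    sets = [expand(s, concept_map) for s in snippets]
    n = len(sets)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sets[i], sets[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical snippets and concept map (a stand-in for an external knowledge source).
snippets = [["python", "tutorial"], ["java", "course"],
            ["cobra", "habitat"], ["viper", "venom"]]
concept_map = {
    "python": ["programming language"],
    "java": ["programming language"],
    "cobra": ["snake"],
    "viper": ["snake"],
}

clusters = cluster(snippets, concept_map)
unexpanded = cluster(snippets, {})
print(clusters, unexpanded)
```

Without expansion the four snippets share no terms and each ends up in its own cluster; with the added concept terms, related snippets become connected, which mirrors the motivation given in the abstract.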
