text clustering Latest Research Papers

Arabic text clustering technique to improve information retrieval

10.1063/5.0066837 ◽

2022 ◽

Author(s):

Ahmed H. Aliwy ◽

Kadhim B. S. Aljanabi ◽

Huda A. Alameen

Keyword(s):

Information Retrieval ◽

Text Clustering ◽

Arabic Text ◽

Clustering Technique

TEXT CLUSTERING APPROACH TOWARD COMMUNITY EXPECTATIONS TO THE BUS RAPID TRANSIT (BRT) TRANSJATENG PURWOKERTO-PURBALINGGA OPERATIONS

Jurnal Sosioteknologi ◽

10.5614/sostek.itbj.2021.20.3.1 ◽

2021 ◽

Vol 20 (3) ◽

pp. 288-298

Author(s):

Famila Dwi Winati ◽

Fauzan Romadlon

Keyword(s):

Public Opinion ◽

Public Transportation ◽

Urban Areas ◽

Social Economic ◽

Text Clustering ◽

Environmental Benefits ◽

Bus Rapid Transit ◽

Rapid Transit ◽

Public Expectations ◽

Clustering Approach

Bus Rapid Transit (BRT) is an alternative form of public transportation in urban areas that has begun to be implemented in several Indonesian cities. To assess the effectiveness of BRT as a mass transportation system, it is necessary to study the expectations of users and non-users of the Trans Jateng Purwokerto-Purbalingga BRT regarding its perceived social, economic, and environmental impacts. This study uses text clustering to group public opinions by similarity so that they can be analyzed further for policymaking. The results show that the majority of the community holds positive expectations of the perceived social, economic, and environmental benefits of BRT implementation. However, public opinion on the presence of BRT is not always positive and has a significant impact. Improvements are needed in several aspects that are considered not to meet public expectations, in order to maximize the function of BRT as a public-transportation substitute for private vehicles.

Mini-Batch k-Means versus k-Means to Cluster English Tafseer Text: View of Al-Baqarah Chapter

Journal of Quranic Sciences and Research ◽

10.30880/jqsr.2021.02.02.006 ◽

2021 ◽

Vol 02 (02) ◽

Author(s):

Mohammed A. Ahmed ◽

Hanif Baharin ◽

Puteri N. E. Nohuddin ◽

...

Keyword(s):

Three Dimensional ◽

Principal Component ◽

Text Clustering ◽

Mining Method ◽

Inverse Document Frequency ◽

Reference Guide ◽

Document Frequency ◽

The World ◽

Implementation Time ◽

Cluster Categories

Al-Quran is the primary text of Muslims' religion and practice. Millions of Muslims around the world use Al-Quran as their reference guide, and Muslims and Islamic scholars in general draw knowledge from it. Al-Quran has been translated into various languages, for example English, by several translators. Each translator brings ideas, comments, and statements to the interpretation (Tafseer) of the verses. This paper therefore clusters the translated Tafseer using text clustering, a text mining method that groups related documents into the same cluster. The study adapted the mini-batch k-means and k-means clustering algorithms to explain and define the links between keywords, known as features or concepts, for the Al-Baqarah chapter of 286 verses. The dataset was preprocessed, and features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF) and Principal Component Analysis (PCA). The results were plotted as two- and three-dimensional clusterings that assign seven cluster categories (k = 7) to the Tafseer. The mini-batch k-means algorithm ran faster (0.05485 s) than the k-means algorithm (0.23334 s). Finally, 'god', 'people', and 'believe' were the most frequent features.
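The pipeline this abstract describes (TF-IDF features followed by mini-batch k-means) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names `tfidf` and `mini_batch_kmeans`, the unsmoothed IDF formula, the batch size, the iteration count, and the toy documents are all assumptions made for illustration, and the PCA step is omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) for a list of non-empty tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Relative term frequency times unsmoothed log inverse document frequency.
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size=4, n_iter=50, seed=0):
    """Mini-batch k-means: update centers from small random batches."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch against the current centers first.
        assignments = [nearest(p, centers) for p in batch]
        # Then move each center toward its points with a decaying step size.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers, [nearest(p, centers) for p in points]


if __name__ == "__main__":
    # Made-up toy documents, not taken from the paper's dataset.
    docs = [
        "red apple sweet fruit".split(),
        "green apple sour fruit".split(),
        "fast car loud engine".split(),
        "slow car quiet engine".split(),
    ]
    vocab, X = tfidf(docs)
    centers, labels = mini_batch_kmeans(X, k=2)
    print(labels)
```

Each iteration touches only `batch_size` points instead of the full dataset, which is the usual reason mini-batch k-means runs faster than standard k-means, at the cost of a noisier final solution.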

Machine learning in systematic reviews: comparing automated text clustering with Lingo3G and human researcher categorization in a rapid review

Research Synthesis Methods ◽

10.1002/jrsm.1541 ◽

2021 ◽

Author(s):

Ashley Elizabeth Muller ◽

Heather Melanie R. Ames ◽

Patricia Sofia Jacobsen Jardim ◽

Christopher James Rose

Keyword(s):

Machine Learning ◽

Systematic Reviews ◽

Text Clustering ◽

Rapid Review

Grey wolf optimization algorithm for hierarchical document clustering

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i3.pp1744-1758 ◽

2021 ◽

Vol 24 (3) ◽

pp. 1744

Author(s):

Ayad Mohammed Jabbar ◽

Ku Ruhana Ku-Mahamud

Keyword(s):

Document Clustering ◽

Text Clustering ◽

Search Space ◽

Learning Approaches ◽

Grey Wolf ◽

External Evaluation ◽

Grey Wolf Optimization ◽

Noise Data ◽

F Measure ◽

Better Than

In data mining, the grey wolf optimization (GWO) algorithm has been used in several learning approaches because it adapts easily to different application domains. Most recent work on unsupervised learning has focused on text clustering, where GWO shows promising results. Although GWO has great potential for text clustering, it is limited in dealing with outlier documents and noisy data. This research introduces the medoid GWO (M-GWO) algorithm, which incorporates a medoid recalculation process to share medoid information among the three best wolves and the rest of the population. This improvement aims to find the best set of medoids during the run and increases exploitation to find more local regions in the search space. M-GWO was compared experimentally with well-known algorithms, such as the genetic, firefly, GWO, and k-means algorithms, on four benchmarks. The external evaluation metrics, such as Rand, purity, F-measure, and entropy, indicate that the proposed M-GWO algorithm achieves better document clustering than all other algorithms (i.e., 75% better using the Rand metric, 50% better based on the purity metric, 75% better using the F-measure metric, and 100% better based on the entropy metric).
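To make the external evaluation metrics named in this abstract concrete, here is a small standard-library sketch of purity, cluster entropy, and the Rand index. The abstract does not give the exact formulas the authors used, so these follow common textbook definitions; the function names are hypothetical, and F-measure is left out for brevity.

```python
import math
from collections import Counter
from itertools import combinations


def _class_counts_per_cluster(labels_true, labels_pred):
    """Map each predicted cluster to a Counter of the true classes it contains."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that match the majority class of their cluster."""
    clusters = _class_counts_per_cluster(labels_true, labels_pred)
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def cluster_entropy(labels_true, labels_pred):
    """Size-weighted mean class entropy (bits) over clusters; lower is better."""
    clusters = _class_counts_per_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for c in clusters.values():
        size = sum(c.values())
        h = -sum((v / size) * math.log2(v / size) for v in c.values())
        total += (size / n) * h
    return total


def rand_index(labels_true, labels_pred):
    """Share of document pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

All three functions take the ground-truth class labels and the predicted cluster labels as equal-length sequences. Purity and the Rand index are better when higher, while entropy is better when lower, which is worth keeping in mind when reading "percent better" comparisons across metrics.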

Assessment of text clustering approaches for legal documents

10.5753/eniac.2021.18239 ◽

2021 ◽

Author(s):

Ingrid L. A. da Silva ◽

Rafael Ferreira Mello ◽

Péricles B. C. Miranda ◽

André C. A. Nascimento ◽

Isabel W. S. Maldonado ◽

...

Keyword(s):

Text Clustering ◽

Legal Documents

The judicial system comprises countless documents related to legal proceedings. These documents may contain relevant information that supports decision-making in future cases. However, collecting this information is not a trivial task. This paper proposes the use of clustering to group similar cases and facilitate information gathering. To this end, different approaches were evaluated in order to identify the most suitable one for this task. The approaches were applied to a dataset of 1515 texts describing the facts of initial petitions. They were evaluated using internal evaluation metrics and the texts of the clustered cases. The results indicate that the best approach for clustering legal cases combines the K-Means algorithm with the TF-IDF representation technique together with PCA.
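Internal evaluation metrics score a clustering using only the data and the assigned labels, with no ground truth. The abstract does not say which internal metrics were used, so the sketch below assumes the silhouette coefficient, a common choice, with Euclidean distance. The function names are hypothetical, and the point vectors stand in for whatever representation (for example TF-IDF reduced by PCA) is being evaluated.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores close to 1 indicate compact, well-separated clusters, and negative scores suggest that points sit closer to another cluster than to their own. Comparing this score across candidate pipelines is one way to rank approaches when no labels are available.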

Improving Clustering Methods By Exploiting Richness Of Text Data

10.26686/wgtn.17019287.v1 ◽

2021 ◽

Author(s):

Abdul Wahid

Keyword(s):

Evolutionary Algorithm ◽

State Of The Art ◽

Ensemble Methods ◽

Text Clustering ◽

Clustering Methods ◽

Clustering Method ◽

Clustering Ensemble ◽

Text Data ◽

Multi Objective ◽

User Queries

Clustering is an unsupervised machine learning technique that involves discovering clusters (groups) of similar objects in unlabeled data, and it is generally considered to be an NP-hard problem. Clustering methods are widely used in a variety of disciplines for analyzing different types of data, and a small improvement in a clustering method can cause a ripple effect that advances research in multiple fields. Clustering any type of data is challenging, and there are many open research questions. The clustering problem is exacerbated for text data because of additional challenges such as capturing the semantics of a document, handling the rich features of text data, and dealing with the well-known curse of dimensionality. In this thesis, we investigate the limitations of existing text clustering methods and address them by providing five new text clustering methods: Query Sense Clustering (QSC), Dirichlet Weighted K-means (DWKM), Multi-View Multi-Objective Evolutionary Algorithm (MMOEA), Multi-objective Document Clustering (MDC) and Multi-Objective Multi-View Ensemble Clustering (MOMVEC). These five methods show that using rich features in text clustering can outperform existing state-of-the-art text clustering methods. The first method, QSC, exploits user queries (one of the rich features in text data) to generate better-quality clusters and cluster labels. The second method, DWKM, uses a probability-based weighting scheme to formulate a semantically weighted distance measure that improves the clustering results. The third method, MMOEA, is based on a multi-objective evolutionary algorithm. MMOEA exploits rich features to generate a diverse set of candidate clustering solutions and forms a better clustering solution using a cluster-oriented approach.
The fourth and fifth methods, MDC and MOMVEC, address the limitations of MMOEA. MDC and MOMVEC differ in the implementation of their multi-objective evolutionary approaches. All five methods are compared with existing state-of-the-art methods. The comparisons show that the newly developed text clustering methods outperform existing methods, achieving up to 16% improvement in some comparisons. In general, almost all of the newly developed clustering algorithms showed statistically significant improvements over existing methods. The key ideas of the thesis highlight that exploiting user queries improves Search Result Clustering (SRC); utilizing rich features in weighting schemes and distance measures improves soft subspace clustering; utilizing multiple views and a multi-objective cluster-oriented method improves clustering ensemble methods; and better evolutionary operators and objective functions improve multi-objective evolutionary clustering ensemble methods. The new text clustering methods introduced in this thesis can be widely applied in various domains that involve analysis of text data. The contributions of this thesis, which include five new text clustering methods, will help not only researchers in the data mining field but also a wide range of researchers in other fields.

Improving Clustering Methods By Exploiting Richness Of Text Data

10.26686/wgtn.17019287 ◽

2021 ◽

Author(s):

Abdul Wahid

Keyword(s):

Evolutionary Algorithm ◽

State Of The Art ◽

Ensemble Methods ◽

Text Clustering ◽

Clustering Methods ◽

Clustering Method ◽

Clustering Ensemble ◽

Text Data ◽

Multi Objective ◽

User Queries

A novel regularized asymmetric non-negative matrix factorization for text clustering

Information Processing & Management ◽

10.1016/j.ipm.2021.102694 ◽

2021 ◽

Vol 58 (6) ◽

pp. 102694

Author(s):

Mehdi Hosseinzadeh Aghdam ◽

Mohammad Daryaie Zanjani

Keyword(s):

Matrix Factorization ◽

Text Clustering ◽

Non Negative Matrix Factorization

A Scalable Short-Text Clustering Algorithm Using Apache Spark

10.1109/ictai52525.2021.00149 ◽

2021 ◽

Author(s):

Leonidas Akritidis ◽

Miltiadis Alamaniotis ◽

Athanasios Fevgas ◽

Panayiotis Bozanis

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Apache Spark ◽

Short Text ◽

Short Text Clustering
