TEXT CLUSTERING IN CONCEPT BASED MINING

Author(s):  
PRADNYA S. RANDIVE ◽  
NITIN N. PISE

In text mining, most techniques depend on statistical analysis of terms. Statistical analysis captures important terms within the document only, whereas the concept-based mining model analyzes terms at the sentence, document, and corpus levels. The model consists of sentence-based concept analysis, document-based and corpus-based concept analysis, and a concept-based similarity measure. Experimental results show that the sentence, document, and corpus levels of concept analysis, as well as their combination, enhance text clustering quality.
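A minimal sketch of how sentence-, document-, and corpus-level concept statistics might feed a concept-based similarity measure is given below. The particular weighting (average conceptual term frequency times document frequency times an IDF factor) and the cosine-style combination are illustrative assumptions, not the exact formulas of the model.

```python
# Hedged sketch: combining sentence-, document-, and corpus-level concept
# statistics into a document similarity score. The weighting scheme below
# (avg ctf * tf * idf) is illustrative, not the paper's exact formula.
import math
from collections import Counter

def concept_weights(doc_sentences, corpus_doc_freq, num_docs):
    """doc_sentences: list of sentences, each a list of concept strings.
    corpus_doc_freq: concept -> number of documents containing it."""
    ctf = Counter()        # total occurrences, sentence level
    sent_hits = Counter()  # number of sentences mentioning the concept
    for sent in doc_sentences:
        for c, n in Counter(sent).items():
            ctf[c] += n
            sent_hits[c] += 1
    tf = Counter(c for sent in doc_sentences for c in sent)  # document level
    weights = {}
    for c in tf:
        avg_ctf = ctf[c] / sent_hits[c]                              # sentence level
        idf = math.log(num_docs / (1 + corpus_doc_freq.get(c, 0)))   # corpus level
        weights[c] = avg_ctf * tf[c] * idf
    return weights

def concept_similarity(w1, w2):
    """Cosine-style similarity over shared concepts."""
    shared = set(w1) & set(w2)
    num = sum(w1[c] * w2[c] for c in shared)
    norm = math.sqrt(sum(v * v for v in w1.values())) * \
           math.sqrt(sum(v * v for v in w2.values()))
    return num / norm if norm else 0.0
```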

Author(s):  
Shady Shehata ◽  
Fakhri Karray ◽  
Mohamed Kamel

Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one term contributes more to the meaning of its sentences than the other. Thus, the underlying model should identify terms that capture the semantics of the text. In this case, the model can capture terms that present the concepts of the sentence, which leads to discovering the topic of the document. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The concept-based model can effectively discriminate between terms that are unimportant to the sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, and a concept extractor. Each term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight, and the concepts with the maximum combined weights are selected by the concept extractor. The concept-based model is used to enhance the quality of text clustering, categorization, and retrieval significantly.
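The following is a small illustrative sketch of the final stage described above: merging the two per-term weights and letting a concept extractor keep the highest-ranked terms. The combination rule (a simple sum) and the cutoff parameter top_k are assumptions made for illustration only.

```python
# Hedged sketch: merging the two per-term weights and keeping the top-ranked
# concepts. The combination rule (sum) and the cutoff are illustrative.
def extract_concepts(stat_weights, cog_weights, top_k=10):
    """stat_weights / cog_weights: term -> weight from the statistical
    analyzer and the conceptual ontological graph, respectively."""
    combined = {}
    for term in set(stat_weights) | set(cog_weights):
        combined[term] = stat_weights.get(term, 0.0) + cog_weights.get(term, 0.0)
    # keep the terms with the largest combined weight as the document's concepts
    return sorted(combined, key=combined.get, reverse=True)[:top_k]

# usage with toy weights
print(extract_concepts({"engine": 0.8, "fuel": 0.3}, {"engine": 0.5, "car": 0.6}))
```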


2010 ◽  
Vol 22 (10) ◽  
pp. 1360-1371 ◽  
Author(s):  
Shady Shehata ◽  
Fakhri Karray ◽  
Mohamed Kamel

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ali A. Amer ◽  
Hassan I. Abdalla

Abstract Similarity measures have long been utilized in the information retrieval and machine learning domains for multiple purposes, including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. However, the problem with these measures is that, until recently, no single measure has been recorded to be both highly effective and highly efficient. Thus, the quest for an efficient and effective similarity measure remains an open challenge. This study therefore introduces a new highly effective and time-efficient similarity measure for text clustering and classification. Furthermore, the study aims to provide a comprehensive examination of seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. Using the K-nearest neighbor algorithm (KNN) for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are carefully examined in detail. The experimental evaluation has been performed on two of the most popular datasets, namely Reuters-21 and Web-KB. The obtained results confirm that the proposed set theory-based similarity measure (STB-SM), as a pre-eminent measure, significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.
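As a rough illustration of the experimental setup, the sketch below builds binary BoW vectors, compares a generic set-theoretic measure (Jaccard, standing in for the STB-SM formula, which is not reproduced here) against cosine similarity, and classifies a query with a 1-nearest-neighbour rule.

```python
# Hedged sketch of the evaluation setup: BoW features, a set-based similarity
# (Jaccard as a generic stand-in; the paper's STB-SM formula is not reproduced)
# versus cosine, and a 1-nearest-neighbour classifier.
import numpy as np

def jaccard(a, b):
    """Set-theoretic similarity over binary term-presence vectors."""
    inter = np.sum(np.logical_and(a, b))
    union = np.sum(np.logical_or(a, b))
    return inter / union if union else 0.0

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def knn_predict(x, train_X, train_y, sim):
    """Label of the most similar training document (k = 1)."""
    scores = [sim(x, t) for t in train_X]
    return train_y[int(np.argmax(scores))]

# toy usage: three training docs over a 5-term vocabulary
train_X = np.array([[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 1]])
train_y = ["sports", "sports", "finance"]
query = np.array([1, 0, 0, 1, 1])
print(knn_predict(query, train_X, train_y, jaccard))
print(knn_predict(query, train_X, train_y, cosine))
```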


2018 ◽  
Vol 7 (2.18) ◽  
pp. 102
Author(s):  
Harsha Patil ◽  
Ramjeevan Singh Thakur

Document clustering is an unsupervised method for grouping documents into clusters on the basis of their similarity. A document is placed in a specific cluster on the basis of a membership score, which is calculated through a membership function. However, many traditional clustering algorithms are based only on BOW (Bag of Words), which ignores the semantic similarity between a document and a cluster. In this research we consider the semantic association between a cluster and a text document when calculating the membership score of a document for a specific cluster. Several researchers are working on semantic aspects of document clustering to improve clustering performance, and external knowledge bases such as WordNet, Wikipedia, and Lucene are utilized for this purpose. The proposed approach exploits WordNet to improve the cluster membership function. The experimental results show that clustering quality improves significantly when the proposed semantic framework is used.
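A minimal sketch of how a WordNet measure could contribute a semantic term to the membership score is shown below. It uses NLTK's WordNet interface and Wu-Palmer similarity as a representative measure; the averaging scheme is an assumption rather than the paper's exact membership function.

```python
# Hedged sketch: scoring a document's semantic affinity to a cluster via
# WordNet similarity between document terms and cluster keywords. Wu-Palmer
# similarity is used as a representative WordNet measure; the paper's exact
# membership function is not reproduced.
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def wordnet_sim(term1, term2):
    """Wu-Palmer similarity between the first synsets of two terms."""
    s1, s2 = wn.synsets(term1), wn.synsets(term2)
    if not s1 or not s2:
        return 0.0
    score = s1[0].wup_similarity(s2[0])
    return score or 0.0   # wup_similarity can return None

def semantic_membership(doc_terms, cluster_keywords):
    """Average best-match similarity of document terms to cluster keywords."""
    if not doc_terms or not cluster_keywords:
        return 0.0
    total = sum(max(wordnet_sim(t, k) for k in cluster_keywords) for t in doc_terms)
    return total / len(doc_terms)

# usage
print(semantic_membership(["car", "engine"], ["automobile", "vehicle"]))
```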


2012 ◽  
Vol 532-533 ◽  
pp. 1716-1720 ◽  
Author(s):  
Chun Xia Jin ◽  
Hai Yan Zhou ◽  
Qiu Chan Bai

To address the problems of sparse keywords and similarity drift in short text segments, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method expands feature keywords based on HowNet and combines the K-means algorithm with a density-based algorithm to cluster short texts. Feature keyword expansion increases the number of keywords in a text and enriches its semantic features, enabling effective short text clustering. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.
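The sketch below illustrates the general idea of keyword expansion followed by K-means clustering. The related_terms dictionary is a hypothetical stand-in for a HowNet lookup, and the density-based component of STCAFKE is omitted; only plain K-means over TF-IDF features of the expanded texts is shown.

```python
# Hedged sketch: expanding short texts with semantically related keywords
# before K-means clustering. `related_terms` is a placeholder standing in for
# a HowNet lookup; the paper's density-based seeding step is omitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

related_terms = {          # hypothetical HowNet-style expansions
    "laptop": ["computer", "notebook"],
    "phone": ["mobile", "telephone"],
    "loan": ["credit", "finance"],
}

def expand(text):
    """Append related keywords to a sparse short text."""
    words = text.split()
    extra = [t for w in words for t in related_terms.get(w, [])]
    return " ".join(words + extra)

docs = ["cheap laptop deals", "phone battery issue", "bank loan rates"]
expanded = [expand(d) for d in docs]

X = TfidfVectorizer().fit_transform(expanded)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```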


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Ravindra E. Chaudhari ◽  
Sanjay B. Dhok

A fast normalized-covariance-based similarity measure for fractal video compression with quadtree partitioning is proposed in this paper. To increase the speed of fractal encoding, a simplified expression for the covariance between range and overlapped domain blocks within a search window is implemented in the frequency domain. All covariance coefficients are normalized by the standard deviation of the overlapped domain blocks, and these are efficiently calculated in one computation using two different approaches, namely FFT based and sum table based. The results of the two approaches are compared and are almost identical in all respects except memory requirements. Based on the proposed simplified similarity measure, the gray-level transformation parameters are computed in a modified form, and isometry transformations are performed using the rotation/reflection properties of the IFFT. Quadtree decomposition is used to partition the larger range blocks, that is, 16 × 16, based on a target level of motion-compensated prediction error. Experimental results show that the proposed method increases the encoding speed and compression ratio by 66.49% and 9.58%, respectively, compared to the NHEXS method, with an increase in PSNR of 0.41 dB. Compared to H.264, the proposed method saves 20% of the compression time with marginal variation in PSNR and compression ratio.
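The sketch below illustrates the core computation: a normalized covariance (normalized cross-correlation) map between a range block and every overlapped domain position in a search window, with the correlation term obtained via FFT and the local means/variances via sum tables. It follows the classic fast normalized cross-correlation idea; the paper's specific simplifications, isometry handling, and quadtree logic are not reproduced.

```python
# Hedged sketch: normalized covariance between a range block and every
# overlapped domain position in a search window. The correlation term is
# computed via FFT; local sums and sums of squares come from sum tables
# (integral images). Classic fast normalized cross-correlation, not the
# paper's exact simplified expression.
import numpy as np

def normalized_covariance_map(search_window, range_block):
    H, W = search_window.shape
    h, w = range_block.shape
    n = h * w
    r = range_block - range_block.mean()        # zero-mean range block

    # correlation of the window with the zero-mean block via FFT
    F = np.fft.rfft2(search_window, s=(H, W))
    K = np.fft.rfft2(r[::-1, ::-1], s=(H, W))
    corr = np.fft.irfft2(F * K, s=(H, W))[h - 1:H, w - 1:W]   # valid region

    # sum tables for local sums and sums of squares of the window
    s1 = np.pad(search_window, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s2 = np.pad(search_window ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    def box(tbl):
        return tbl[h:, w:] - tbl[:-h, w:] - tbl[h:, :-w] + tbl[:-h, :-w]
    local_sum, local_sq = box(s1), box(s2)

    local_var = local_sq - local_sum ** 2 / n   # n * variance per position
    denom = np.sqrt(np.maximum(local_var, 1e-12)) * np.sqrt((r ** 2).sum())
    return corr / denom                         # one score per domain position

# usage: locate a 4x4 range block inside a 16x16 search window
rng = np.random.default_rng(0)
win = rng.random((16, 16))
blk = win[5:9, 3:7].copy()
scores = normalized_covariance_map(win, blk)
print(np.unravel_index(scores.argmax(), scores.shape))   # expected (5, 3)
```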

