Significant Term List Based Metadata Conceptual Mining Model for Effective Text Clustering

In text mining, most techniques depend on statistical analysis of terms. Statistical analysis traces important terms within a document only. However, this concept-based mining model analyses terms at the sentence, document, and corpus levels. The mining model consists of sentence-based concept analysis, document-based and corpus-based concept analysis, and a concept-based similarity measure. Experimental results show that text clustering quality improves when using the sentence, document, corpus, and combined approaches of concept analysis.
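
The three levels of analysis named in this abstract can be illustrated with a minimal sketch. It is not the authors' method: raw term counts stand in for concepts, and the tokenizer, the function name `term_levels`, and the toy corpus are assumptions made for the example.

```python
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation (a simplification)."""
    return [w.strip(".,;:!?()").lower() for w in text.split() if w.strip(".,;:!?()")]

def term_levels(corpus, term):
    """Count a term per sentence, per document, and across the corpus.

    corpus: list of documents, each a list of sentence strings (assumed layout).
    Returns (sentence_counts, document_counts, documents_containing).
    """
    term = term.lower()
    sentence_counts = []   # one list per document, one count per sentence
    document_counts = []   # total occurrences in each document
    for doc in corpus:
        per_sentence = [Counter(tokenize(s))[term] for s in doc]
        sentence_counts.append(per_sentence)
        document_counts.append(sum(per_sentence))
    # Corpus level: number of documents in which the term appears at all.
    documents_containing = sum(1 for c in document_counts if c > 0)
    return sentence_counts, document_counts, documents_containing

# Hypothetical toy corpus: two documents, split into sentences.
corpus = [
    ["Clustering groups similar documents.", "Good clustering needs good features."],
    ["Retrieval ranks documents by relevance."],
]
sentence_counts, document_counts, documents_containing = term_levels(corpus, "clustering")
```

A real concept-based model would replace the raw term with a concept identified from sentence semantics; the sketch only shows how one quantity can be tracked at all three levels at once.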

An Efficient Concept-Based Mining Model for Enhancing Text Clustering

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2009.174 ◽

2010 ◽

Vol 22 (10) ◽

pp. 1360-1371 ◽

Cited By ~ 54

Author(s):

Shady Shehata ◽

Fakhri Karray ◽

Mohamed Kamel

Keyword(s):

Text Clustering ◽

Concept Based Mining ◽

Mining Model

Concept-Based Mining Model

Dynamic and Advanced Data Mining for Progressing Technological Development ◽

10.4018/978-1-60566-908-3.ch004 ◽

2010 ◽

pp. 57-69 ◽

Cited By ~ 1

Author(s):

Shady Shehata ◽

Fakhri Karray ◽

Mohamed Kamel

Keyword(s):

Statistical Analysis ◽

Text Mining ◽

Text Clustering ◽

The Other ◽

Graph Representation ◽

Sentence Meaning ◽

Proposed Model ◽

Concept Based Mining ◽

Mining Model

Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should indicate terms that capture the semantics of the text. In this case, the model can capture terms that present the concepts of the sentence, which leads to discovering the topic of the document. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, and a concept extractor. A term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight. The concepts that have the maximum combined weights are selected by the concept extractor. The concept-based model is used to significantly enhance the quality of text clustering, categorization, and retrieval.
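
The last steps of this abstract (two weights per term, combined into one, then selection of the highest-weighted concepts) can be sketched as follows. The abstract does not state how the weights are combined, so the plain sum used here is an assumption, as are the function name `extract_concepts`, the `top_k` cutoff, and the sample weights.

```python
def extract_concepts(stat_weights, graph_weights, top_k=2):
    """Combine two per-term weights and keep the top_k highest-scoring terms.

    The combination rule (a plain sum) and the top_k cutoff are assumptions
    made for illustration; the source only says the weights are combined and
    the concepts with maximum combined weight are selected.
    """
    combined = {
        term: stat_weights[term] + graph_weights.get(term, 0.0)
        for term in stat_weights
    }
    # Rank terms by combined weight, highest first.
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# Hypothetical weights from the two components described in the abstract.
stat_weights = {"model": 0.5, "cluster": 0.25, "the": 0.125}
graph_weights = {"model": 0.5, "cluster": 0.5, "the": 0.0}
top = extract_concepts(stat_weights, graph_weights, top_k=2)
# top == [("model", 1.0), ("cluster", 0.75)]
```

Any monotone combination (a product or a weighted sum, for instance) would fit the same shape; only the line that builds `combined` would change.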

A Novel Educational Data Mining Model using Classification Algorithm for evaluating Students E-learning Performance

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.616624 ◽

2019 ◽

Vol 7 (5) ◽

pp. 616-624

Author(s):

S. Arumugam ◽

A. Kovalan ◽

A.E. Narayanan

Keyword(s):

Data Mining ◽

Educational Data Mining ◽

Classification Algorithm ◽

Learning Performance ◽

E Learning ◽

Mining Model

K-means text clustering algorithm based on density and nearest neighbor

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01933 ◽

2010 ◽

Vol 30 (7) ◽

pp. 1933-1935 ◽

Cited By ~ 6

Author(s):

Wen-ming ZHANG ◽

Jiang WU ◽

Xiao-jiao YUAN

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Text Clustering

An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405614666180903112541 ◽

2020 ◽

Vol 16 (4) ◽

pp. 296-306 ◽

Cited By ~ 3

Author(s):

Laith Mohammad Abualigah ◽

Essam Said Hanandeh ◽

Ahamad Tajudin Khader ◽

Mohammed Abdallh Otair ◽

Shishir Kumar Shandilya

Keyword(s):

Optimization Technique ◽

Document Clustering ◽

Text Clustering ◽

Hill Climbing ◽

Text Documents ◽

Clustering Problem ◽

Text Document ◽

Text Information ◽

Amount Of Knowledge ◽

The Hill

Background: Given the increasing volume of text document information on Internet pages, dealing with such a tremendous amount of knowledge becomes complex due to its sheer size. Text clustering is a common optimization problem used to organize a large amount of text information into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely β-hill climbing, to solve the text document clustering problem by modeling the β-hill climbing technique to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It has been introduced in order to strike a balance between local and global search. Local search methods such as k-medoid and k-means have been successfully applied to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: The performance of text clustering is improved by adding the β operator to hill climbing.
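
A minimal sketch of β-hill climbing applied to clustering may help make the role of the β parameter concrete. It is not the paper's implementation: the objective (within-cluster sum of squared distances), the 2-D toy points standing in for document vectors, and all names and parameter values are assumptions made for the example.

```python
import random

def cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    """Hill climbing over cluster labels with an extra random-reset (beta) step."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    best = cost(points, labels, k)
    for _ in range(iterations):
        candidate = labels[:]
        # Neighbourhood step (local search): move one random point to a random cluster.
        candidate[rng.randrange(len(points))] = rng.randrange(k)
        # Beta step (global search): each label is reset at random with probability beta.
        for i in range(len(candidate)):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        c = cost(points, candidate, k)
        if c <= best:  # accept only candidates that are no worse
            labels, best = candidate, c
    return labels, best

# Hypothetical data: two well-separated groups of 2-D points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, best = beta_hill_climbing(points, k=2)
```

Setting `beta=0` reduces the loop to plain hill climbing with a single-move neighbourhood, which is the baseline the abstract compares against; larger values of `beta` make each candidate more random and so more exploratory.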

Design of GA and Ontology based NLP Frameworks for Online Opinion Mining

Recent Patents on Engineering ◽

10.2174/1872212112666180115162726 ◽

2019 ◽

Vol 13 (2) ◽

pp. 159-165

Author(s):

Manik Sharma ◽

Gurvinder Singh ◽

Rajinder Singh

Keyword(s):

Genetic Algorithm ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Opinion Mining ◽

Hybrid Genetic Algorithm ◽

Online Reviews ◽

Middle Tier ◽

Complete Set ◽

Mining Model

Background: For almost every domain, a tremendous amount of data is accessible online and offline. Billions of users post their views or opinions daily using online applications such as WhatsApp, Facebook, Twitter, blogs, and Instagram. Objective: These reviews are constructive for the progress of the venture, civilization, state, and even nation. However, this large amount of information is useful only if it is collectively and effectively mined. Methodology: Opinion mining is used to extract the thoughts, expressions, emotions, criticism, and appraisal from the data posted by different persons. It is one of the prevailing research techniques that combine and employ features from natural language processing. Here, an amalgamated approach has been employed to mine online reviews. Results: To improve the results of the genetic algorithm based opinion mining patent, a hybrid genetic algorithm and ontology based 3-tier natural language processing framework named GAO_NLP_OM has been designed. The first tier is used for preprocessing and corrosion of the sentences. The middle tier is composed of a genetic algorithm based searching module, an ontology for English sentences, base words for the review, and a complete set of English words with items and their features. The genetic algorithm is used to expedite the polarity mining process. The last tier is responsible for semantic, discourse, and feature summarization. Furthermore, the use of an ontology helps build a more accurate opinion mining model. Conclusion: GAO_NLP_OM is expected to improve the performance of the genetic algorithm based opinion mining patent. The amalgamation of genetic algorithm, ontology, and natural language processing seems to produce faster and more precise results. The proposed framework is able to mine simple as well as compound sentences. However, affirmative-preceded interrogative, hidden-feature, and mixed-language sentences remain a challenge for the proposed framework.
