scholarly journals Significant Term List Based Metadata Conceptual Mining Model for Effective Text Clustering

2012 ◽  
Vol 8 (10) ◽  
pp. 1660-1666
Author(s):  
T.
Author(s):  
PRADNYA S. RANDIVE ◽  
NITIN N. PISE

In text mining, most techniques depend on the statistical analysis of terms. Statistical analysis identifies important terms within a document only. In contrast, this concept-based mining model analyzes terms at the sentence, document, and corpus levels. The model consists of sentence-based, document-based, and corpus-based concept analysis, together with a concept-based similarity measure. Experimental results show that text clustering quality is enhanced by using sentence-based, document-based, corpus-based, and combined concept analysis.
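To make the three analysis levels concrete, here is a minimal sketch, assuming each word is treated as a "concept", that the sentence-level statistic is simply the number of sentences mentioning the term, and that the three levels are combined as (term frequency + sentence count) × IDF. The helper names (`concept_weights`, `cosine`) and the combination rule are hypothetical illustrations, not the paper's exact formulas.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on sentence punctuation (assumption: a real system would parse sentences)
    return [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]


def tokens(sentence):
    # Lowercase alphabetic tokens; each token is treated as a "concept" here
    return re.findall(r"[a-z]+", sentence.lower())


def concept_weights(doc, corpus):
    sentences = split_sentences(doc)
    # Sentence level: how many sentences mention the term (hypothetical stand-in)
    sent_freq = Counter(t for s in sentences for t in set(tokens(s)))
    # Document level: raw term frequency within the document
    tf = Counter(t for s in sentences for t in tokens(s))
    n_docs = len(corpus)
    weights = {}
    for term in tf:
        # Corpus level: document frequency across the whole corpus
        df = sum(1 for d in corpus if term in set(tokens(d)))
        idf = math.log(n_docs / df) if df else 0.0
        weights[term] = (tf[term] + sent_freq[term]) * idf
    return weights


def cosine(w1, w2):
    # Cosine similarity between two sparse weight dictionaries
    dot = sum(w1[t] * w2[t] for t in set(w1) & set(w2))
    n1 = math.sqrt(sum(v * v for v in w1.values()))
    n2 = math.sqrt(sum(v * v for v in w2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


corpus = [
    "Cats chase mice. Cats sleep a lot.",
    "Cats chase birds. Mice hide from cats.",
    "Stock markets fell sharply today.",
]
w = [concept_weights(d, corpus) for d in corpus]
print(cosine(w[0], w[1]), cosine(w[0], w[2]))
```

The two documents about cats share weighted concepts and get a positive similarity, while the finance document shares none and scores zero; a clustering algorithm would then group documents using this similarity.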


2010 ◽  
Vol 22 (10) ◽  
pp. 1360-1371 ◽  
Author(s):  
Shady Shehata ◽  
Fakhri Karray ◽  
Mohamed Kamel

Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one term contributes more to the meaning of its sentences than the other. Thus, the underlying model should identify terms that capture the semantics of the text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. This paper introduces a new concept-based mining model that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, and a concept extractor. Each term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight, and the concept extractor selects the concepts with the maximum combined weights. The concept-based model significantly enhances the quality of text clustering, categorization, and retrieval.
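The weight-combination and concept-extraction step can be sketched as follows. This assumes both weight tables are already computed (the statistical analyzer and ontological graph are not reproduced here), and it assumes the combination is max-normalization of each table followed by a sum; `combine_and_extract` is a hypothetical name, and the paper's actual combination formula may differ.

```python
def normalize(weights):
    # Scale weights into [0, 1] by dividing by the maximum (assumed normalization)
    top = max(weights.values(), default=0.0)
    return {t: v / top for t, v in weights.items()} if top else dict(weights)


def combine_and_extract(stat_weights, graph_weights, k):
    # Combine the two per-term weights into one, then keep the k highest
    s, g = normalize(stat_weights), normalize(graph_weights)
    combined = {t: s.get(t, 0.0) + g.get(t, 0.0) for t in set(s) | set(g)}
    # Sort by descending combined weight; ties broken alphabetically
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical weights: "the" is frequent but carries no sentence semantics
stat = {"clustering": 4, "text": 2, "the": 8}
graph = {"clustering": 3, "text": 3, "the": 0}
print(combine_and_extract(stat, graph, 2))
```

Because the graph-based weight is zero for "the", its high raw frequency alone is not enough to put it among the extracted concepts, which mirrors the abstract's point that frequency by itself does not capture sentence meaning.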


2010 ◽  
Vol 30 (7) ◽  
pp. 1933-1935 ◽  
Author(s):  
Wen-ming ZHANG ◽  
Jiang WU ◽  
Xiao-jiao YUAN

Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: Given the increasing volume of text document information on Internet pages, handling such a large amount of information becomes highly complex. Text clustering is a common optimization problem used to organize a large amount of text information into subsets of similar, coherent clusters. Aims: This paper presents a novel local search clustering technique, namely β-hill climbing, which solves the text document clustering problem by partitioning similar documents into the same cluster. Methods: The β parameter is the primary innovation of the β-hill climbing technique; it was introduced to balance local and global search. Local search methods, such as the k-medoids and k-means techniques, have been successfully applied to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
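A minimal sketch of β-hill climbing applied to clustering is shown below. It assumes a solution is a vector of cluster labels (one per document vector), that the objective is the within-cluster sum of squared distances, that the neighborhood move reassigns one random document to a randomly chosen cluster, and that the β operator re-draws each label at random with probability `beta`. These encoding choices and the parameter values are illustrative assumptions, not the paper's settings.

```python
import random


def sse(points, labels, k):
    # Objective (assumed): within-cluster sum of squared distances to centroids
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        total += sum(sum((m[i] - centroid[i]) ** 2 for i in range(dim)) for m in members)
    return total


def beta_hill_climbing(points, k, beta=0.05, iters=2000, seed=0):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    best = sse(points, labels, k)
    for _ in range(iters):
        cand = labels[:]
        # Neighborhood move (local search): reassign one document to a random cluster
        i = rng.randrange(len(cand))
        cand[i] = rng.randrange(k)
        # Beta operator (global exploration): re-draw each label with probability beta
        for j in range(len(cand)):
            if rng.random() < beta:
                cand[j] = rng.randrange(k)
        score = sse(points, cand, k)
        # Hill-climbing acceptance: keep the candidate if it is no worse
        if score <= best:
            labels, best = cand, score
    return labels, best


# Toy 2-D "document vectors" forming two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, best = beta_hill_climbing(points, 2)
print(labels, best)
```

Setting `beta=0` reduces the loop to plain hill climbing with single-document moves; a positive β occasionally perturbs several labels at once, which is how the β operator lets the search escape local optima.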


2019 ◽  
Vol 13 (2) ◽  
pp. 159-165
Author(s):  
Manik Sharma ◽  
Gurvinder Singh ◽  
Rajinder Singh

Background: In almost every domain, a tremendous amount of data is available both online and offline. Billions of users post their views or opinions daily through online applications such as WhatsApp, Facebook, Twitter, blogs, and Instagram. Objective: These reviews can be constructive for the progress of ventures, society, states, and even nations. However, this enormous amount of information is useful only if it is collectively and effectively mined. Methodology: Opinion mining is used to extract thoughts, expressions, emotions, criticisms, and appraisals from the data posted by different people. It is a prevalent research technique that combines and employs features from natural language processing. Here, a hybrid approach is employed to mine online reviews. Results: To improve the results of a genetic algorithm based opinion mining patent, a hybrid genetic algorithm and ontology based three-tier natural language processing framework named GAO_NLP_OM has been designed. The first tier is used for preprocessing and corrosion of the sentences. The middle tier is composed of a genetic algorithm based search module, an ontology for English sentences, base words for the reviews, and a complete set of English words with items and their features. The genetic algorithm is used to expedite the polarity mining process. The last tier is responsible for semantic, discourse, and feature summarization. Furthermore, the use of an ontology helps produce a more accurate opinion mining model. Conclusion: GAO_NLP_OM is expected to improve the performance of the genetic algorithm based opinion mining patent. The combination of genetic algorithm, ontology, and natural language processing appears to produce faster and more precise results. The proposed framework can mine simple as well as compound sentences. However, affirmative-preceded interrogative sentences, hidden-feature sentences, and mixed-language sentences remain a challenge for the proposed framework.
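The three-tier structure can be illustrated with a heavily simplified pipeline sketch. It does not reproduce GAO_NLP_OM: the genetic algorithm search and the ontology are replaced by a small hypothetical polarity lexicon with a negation rule, and the third tier only aggregates sentence scores. All names (`LEXICON`, `mine_opinion`, and the tier functions) are illustrative assumptions; the sketch only shows how the tiers hand data to one another.

```python
import re

# Hypothetical lexicon standing in for the paper's ontology and base words
LEXICON = {"good": 1, "great": 2, "fast": 1, "bad": -1, "slow": -1, "terrible": -2}
NEGATIONS = {"not", "never", "no"}


def tier1_preprocess(review):
    # Tier 1: split the review into sentences and lowercase tokens
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    return [re.findall(r"[a-z']+", s.lower()) for s in sentences]


def tier2_polarity(tokens):
    # Tier 2: lexicon lookup (replaces the GA search); a negation word
    # flips the polarity of the next lexicon word it precedes
    score, flip = 0, 1
    for tok in tokens:
        if tok in NEGATIONS:
            flip = -1
        elif tok in LEXICON:
            score += flip * LEXICON[tok]
            flip = 1
    return score


def tier3_summarize(scores):
    # Tier 3: aggregate sentence scores into an overall label
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return {"sentence_scores": scores, "overall": label}


def mine_opinion(review):
    return tier3_summarize([tier2_polarity(t) for t in tier1_preprocess(review)])


print(mine_opinion("The battery is great. Delivery was not fast."))
```

Keeping each tier as a separate function means the lexicon lookup in the middle tier could be swapped for a search-based module without changing the preprocessing or summarization tiers.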

