document categorization
Recently Published Documents


Total documents: 189 (five years: 24)
H-index: 18 (five years: 2)

2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

Algorithmic search approaches are ineffective at addressing the problem of multi-dimensional feature selection for document categorization. This study proposes a meta-heuristic search approach for optimal feature selection. Elephant Optimization (EO) and Ant Colony Optimization (ACO) algorithms, coupled with Naïve Bayes (NB), Support Vector Machine (SVM), and J48 classifiers, were used to highlight the optimization capability of meta-heuristic search for the multi-dimensional feature selection problem in document categorization. In addition, the feature selection performance of the two meta-heuristic approaches (EO and ACO) was compared with the conventional Best First Search (BFS) and Greedy Stepwise (GS) algorithms on news document categorization. The comparative results showed that globally optimal feature subsets were attained using adaptive parameter tuning in the meta-heuristic feature selection optimization scheme. In addition, the number of selected features was reduced dramatically for document classification.
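
For orientation, the Greedy Stepwise baseline named in this abstract can be sketched as forward selection: starting from an empty subset, repeatedly add the single feature that most improves a score, and stop when no addition helps. This is a minimal illustration and not the authors' implementation; `greedy_stepwise` and `toy_score` are hypothetical names, and `toy_score` stands in for what would presumably be a classifier-based evaluation (for example NB, SVM, or J48 accuracy) in a real wrapper setup.

```python
from typing import Callable, FrozenSet, List


def greedy_stepwise(n_features: int,
                    score: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Forward greedy selection: add the best single feature until no gain."""
    selected: FrozenSet[int] = frozenset()
    best = score(selected)
    while True:
        # Every subset reachable by adding one not-yet-selected feature.
        candidates = [selected | {f} for f in range(n_features)
                      if f not in selected]
        if not candidates:
            break
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:
            break  # local optimum: no single addition improves the score
        selected, best = top, top_score
    return sorted(selected)


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical score: features 0 and 2 are useful, each feature costs 0.1."""
    useful = {0, 2}
    return len(subset & useful) - 0.1 * len(subset)


print(greedy_stepwise(4, toy_score))
```

Because the search only ever looks one feature ahead, it can stop at a local optimum, which is the limitation that motivates population-based searches such as ACO.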


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Keyword retrieval from text has attracted researchers for a long time, as it forms a base for many natural language applications such as information retrieval, text summarization, and document categorization. A text is a collection of words that naturally represent its theme, and bringing that naturalness under fixed rules is itself a challenging task. In the present paper, the authors evaluate different spatial-distribution-based keyword extraction methods available in the literature on three standard scientific texts. To reduce complexity, the authors choose the first few high-frequency words for evaluation, since all the methods are in some way based on frequency. The authors find that the methods do not provide good results, particularly for the first few retrieved words. Thus, the authors propose a new measure based on frequency, inverse document frequency, variance, and Tsallis entropy. The methods are evaluated on the basis of precision, recall, and F-measure. Results show that the proposed method provides improved results.
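
Two of the ingredients named in this abstract have standard definitions that can be sketched briefly: the Tsallis entropy S_q = (1 - Σ p_i^q) / (q - 1) of a word's spatial distribution, and precision, recall, and F-measure over extracted keywords. The abstract does not say how the authors combine frequency, inverse document frequency, variance, and entropy, so no combined measure is attempted here. Splitting the text into equal-width segments, the choice q = 2, and all function names are assumptions made for illustration.

```python
def tsallis_entropy(probs, q=2.0):
    """Standard Tsallis entropy S_q = (1 - sum(p_i^q)) / (q - 1), for q != 1."""
    if q == 1.0:
        raise ValueError("q = 1 is the Shannon limit; use a value other than 1")
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def segment_distribution(tokens, word, n_segments=4):
    """Share of the word's occurrences in each equal-width text segment."""
    counts = [0] * n_segments
    for i, tok in enumerate(tokens):
        if tok == word:
            counts[i * n_segments // len(tokens)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else []


def precision_recall_f(extracted, gold):
    """Set-based precision, recall, and F-measure (harmonic mean)."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


spread = "a b a b a b a b".split()      # 'a' occurs in every segment
clustered = "a a a a b b b b".split()   # 'a' is confined to the first half
print(tsallis_entropy(segment_distribution(spread, "a")))
print(tsallis_entropy(segment_distribution(clustered, "a")))
print(precision_recall_f(["a", "b", "c", "d"], ["a", "b", "e"]))
```

In this toy setup the evenly spread word gets the higher entropy and the clustered word the lower one; how such a signal is weighted in the proposed measure is left to the paper itself.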


2021 ◽  
pp. 137-147
Author(s):  
Mostaq Ahmed ◽  
Partha Chakraborty ◽  
Tanupriya Choudhury

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0255127
Author(s):  
Hend Alrasheed

Keyword extraction refers to the process of detecting the most relevant terms and expressions in a given text in a timely manner. In the information explosion era, keyword extraction has attracted increasing attention. The importance of keyword extraction in text summarization, text comparisons, and document categorization has led to an emphasis on graph-based keyword extraction techniques because they can capture more structural information compared to other classic text analysis methods. In this paper, we propose a simple unsupervised text mining approach that aims to extract a set of keywords from a given text and analyze its topic diversity using graph analysis tools. Initially, the text is represented as a directed graph using synonym relationships. Then, community detection and other measures are used to identify keywords in the text. The set of extracted keywords is used to assess topic diversity within the text and analyze its sentiment. The proposed approach relies on grouping semantically similar candidate words. This approach ensures that the set of extracted keywords is comprehensive. Differing from other graph-based keyword extraction approaches, the proposed method does not require user parameters during graph construction and word scoring. The proposed approach achieved significant results compared to other keyword extraction techniques.
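
The general idea of grouping semantically similar candidate words in a graph can be sketched in a few lines. This is a loose illustration under stated simplifications, not the method of the paper: the graph here is undirected rather than directed, plain connected components stand in for community detection, the most frequent word of each group is taken as its keyword, and `SYNONYMS` is a hypothetical hand-written table where a real system would consult a lexical resource.

```python
from collections import Counter, defaultdict

# Hypothetical synonym pairs, for illustration only.
SYNONYMS = {("car", "automobile"), ("car", "vehicle"), ("fast", "quick")}


def build_graph(tokens, synonyms):
    """Link two words when both occur in the text and are listed as synonyms."""
    vocab = set(tokens)
    graph = defaultdict(set)
    for a, b in synonyms:
        if a in vocab and b in vocab:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Connected components via depth-first search (stand-in for communities)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


def keywords(tokens, synonyms):
    """Pick the most frequent word of each synonym group as its keyword."""
    freq = Counter(tokens)
    groups = components(build_graph(tokens, synonyms))
    return [max(sorted(g), key=lambda w: freq[w]) for g in groups]


text = "the car is fast the automobile is quick and the car is a vehicle quick"
print(keywords(text.split(), SYNONYMS))
```

Each group contributes one keyword, which mirrors the abstract's point that grouping similar candidates helps the extracted set cover the text's distinct topics.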


2021 ◽  
pp. 40-51
Author(s):  
Antonio M. Rinaldi ◽  
Cristiano Russo ◽  
Cristian Tommasino
