keyword extraction
Recently Published Documents


TOTAL DOCUMENTS

522
(FIVE YEARS 178)

H-INDEX

22
(FIVE YEARS 5)

2022 ◽  
Vol 16 (4) ◽  
pp. 1-30
Author(s):  
Muhammad Abulaish ◽  
Mohd Fazil ◽  
Mohammed J. Zaki

Domain-specific keyword extraction is a vital task in the field of text mining. There are various research tasks, such as spam e-mail classification, abusive language detection, sentiment analysis, and emotion mining, where a set of domain-specific keywords (a.k.a. a lexicon) is highly effective. Existing works on keyword extraction list all keywords, rather than domain-specific keywords, from a document corpus. Moreover, most of the existing approaches perform well on formal document corpora but fail on noisy and informal user-generated content in online social media. In this article, we present a hybrid approach that jointly models the local and global contextual semantics of words, utilizing the strength of distributional word representations and a contrasting-domain corpus for domain-specific keyword extraction. Starting with a seed set of a few domain-specific keywords, we model the text corpus as a weighted word-graph. In this graph, the initial weight of a node (word) represents its semantic association with the target domain, calculated as a linear combination of three semantic association metrics, and the weight of an edge connecting a pair of nodes represents the co-occurrence count of the respective words. Thereafter, a modified PageRank method is applied to the word-graph to identify the most relevant words for expanding the initial set of domain-specific keywords. We evaluate our method over both formal and informal text corpora (comprising six datasets) and show that it performs significantly better than state-of-the-art methods. Furthermore, we generalize our approach to the language-agnostic case and show that it outperforms existing language-agnostic approaches.
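The abstract's core mechanism, a weighted co-occurrence word-graph ranked by a seed-biased PageRank, can be illustrated with a minimal sketch. This is not the authors' implementation: their node weights combine three semantic association metrics, whereas the sketch below uses only a personalized teleport distribution over the seed keywords; the function names and the sentence-level co-occurrence window are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences):
    """Weighted word-graph: edge weight = sentence-level co-occurrence count."""
    weights = defaultdict(int)
    for sent in sentences:
        for a, b in combinations(sorted(set(sent.lower().split())), 2):
            weights[(a, b)] += 1
    return weights

def personalized_pagerank(weights, seeds, damping=0.85, iters=50):
    """PageRank whose teleport distribution is concentrated on seed words,
    so rank mass flows outward from the domain-specific seed set."""
    nbrs = defaultdict(dict)
    for (a, b), w in weights.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    nodes = sorted(nbrs)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        rank = {
            n: (1 - damping) * teleport[n]
            + damping * sum(rank[m] * w / sum(nbrs[m].values())
                            for m, w in nbrs[n].items())
            for n in nodes
        }
    return sorted(rank.items(), key=lambda kv: -kv[1])
```

Words connected to the seeds (directly or transitively) retain rank mass, while unrelated components decay toward zero, which is how the seed set gets expanded with in-domain words.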


2022 ◽  
Vol 97 ◽  
pp. 107639
Author(s):  
Tiantian Ding ◽  
Wenzhong Yang ◽  
Fuyuan Wei ◽  
Chao Ding ◽  
Peng Kang ◽  
...  

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Retrieving keywords from a text has attracted researchers for a long time, as it forms the basis for many natural language applications such as information retrieval, text summarization, and document categorization. A text is a collection of words that naturally represents its theme, and bringing this naturalness under formal rules is itself a challenging task. In the present paper, the authors evaluate different spatial-distribution-based keyword extraction methods available in the literature on three standard scientific texts. The authors choose the first few high-frequency words for evaluation to reduce the complexity, as all the methods are to some degree frequency-based. The authors find that the methods do not provide good results, particularly for the first few retrieved words. Thus, the authors propose a new measure based on frequency, inverse document frequency, variance, and Tsallis entropy. The different methods are evaluated on the basis of precision, recall, and F-measure. Results show that the proposed method provides improved results.
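The abstract names the four ingredients of the proposed measure but not how they are combined. As a hedged illustration only, the sketch below multiplies term frequency, inverse document frequency, the variance of inter-occurrence gaps (the spatial-distribution component), and the Tsallis entropy of the normalized gap distribution; the actual combination in the paper may differ.

```python
import math

def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1)."""
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def keyword_score(word, doc, corpus, q=2.0):
    """Illustrative combined score for `word` in tokenized document `doc`,
    where `corpus` is a list of tokenized documents (including `doc`)."""
    positions = [i for i, w in enumerate(doc) if w == word]
    tf = len(positions)
    if tf == 0:
        return 0.0
    df = sum(1 for d in corpus if word in d)
    idf = math.log(len(corpus) / df)
    # Spatial statistics over the gaps between successive occurrences;
    # a single occurrence falls back to one document-length gap.
    gaps = [b - a for a, b in zip(positions, positions[1:])] or [len(doc)]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    probs = [g / sum(gaps) for g in gaps]
    return tf * idf * (1.0 + var) * (1.0 + tsallis_entropy(probs, q))
```

The intuition behind the spatial terms is that genuine keywords cluster in bursts (irregular gaps), while function words are spread uniformly through the text.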


Author(s):  
Niladri Chatterjee ◽  
Aayush Singha Roy ◽  
Nidhika Yadav

The present work proposes an application of Soft Rough Sets and their span to unsupervised keyword extraction. In recent times, Soft Rough Sets have been applied in various domains, though none of these applications is in the area of keyword extraction. On the other hand, the concept of a Rough Set based span has been developed for improved efficiency in the domain of extractive text summarization. In this work we amalgamate these two techniques into Soft Rough Set based Span (SRS) to provide an effective solution for keyword extraction from texts. The universe of the Soft Rough Set is taken to be the collection of words from the input texts. SRS provides an ideal platform for identifying the set of keywords of an input text, which cannot always be defined clearly and unambiguously. The proposed technique uses a greedy algorithm for computing spanning sets. The experimental results suggest that keyword extraction using the proposed scheme gives consistent results across different domains. It has also been found to be more efficient than several existing unsupervised techniques.
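The abstract does not define the span computation, but a greedy spanning-set procedure over a word universe is structurally similar to greedy set cover. The sketch below is a hypothetical analogy, not the SRS method itself: it picks words until every sentence contains at least one chosen word, whereas the paper's span is defined over Soft Rough Set approximations.

```python
def greedy_span(sentences):
    """Greedy spanning-set sketch (set-cover style): repeatedly pick the
    word covering the most still-uncovered sentences."""
    coverage = {}  # word -> set of sentence indices containing it
    for i, sent in enumerate(sentences):
        for w in set(sent.lower().split()):
            coverage.setdefault(w, set()).add(i)
    uncovered = set(range(len(sentences)))
    span = []
    while uncovered:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(coverage), key=lambda w: len(coverage[w] & uncovered))
        if not coverage[best] & uncovered:
            break
        span.append(best)
        uncovered -= coverage[best]
    return span
```

The greedy choice gives the usual logarithmic approximation guarantee of set cover, which is presumably why a greedy algorithm is an attractive way to compute spanning sets.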


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
M. Saef Ullah Miah ◽  
Junaida Sulaiman ◽  
Talha Bin Sarwar ◽  
Kamal Z. Zamli ◽  
Rajan Jose

Keywords play a significant role in selecting topic-related documents easily. Topics or keywords assigned by humans or experts provide accurate information; however, this practice is quite expensive in terms of resources and time. Hence, it is preferable to utilize automated keyword extraction techniques. Nevertheless, before adopting the automated process, it is necessary to check how similar expert-provided and algorithm-generated keywords are. This paper presents an experimental analysis of the similarity scores of keywords generated by different supervised and unsupervised automated keyword extraction algorithms against expert-provided keywords from the electric double layer capacitor (EDLC) domain. The paper also analyses which texts provide better keywords: positive sentences only, or all sentences of the document. Among the unsupervised algorithms, YAKE, TopicRank, MultipartiteRank, and KPMiner are employed for keyword extraction. Among the supervised algorithms, KEA and WINGNUS are employed. To assess the similarity of the extracted keywords with the expert-provided keywords, the Jaccard, cosine, and cosine-with-word-vector similarity indexes are employed in this study. The experiment shows that the MultipartiteRank keyword extraction technique, measured with the cosine-with-word-vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working in the EDLC domain or on recommender systems to select more suitable keyword extraction and similarity index calculation techniques.
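Two of the three similarity indexes named above are standard and easy to sketch. Below, Jaccard is computed over the keyword phrases as sets, and cosine over bag-of-words token counts of the keyword lists; the third index, cosine with word vectors, would replace the token counts with pretrained embeddings and is omitted here. The example keyword lists are invented for illustration.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity of two keyword sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a, b):
    """Cosine similarity over bag-of-words token counts of keyword lists."""
    va, vb = Counter(" ".join(a).split()), Counter(" ".join(b).split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Note the two indexes can disagree: Jaccard rewards only exact phrase matches, while token-level cosine gives partial credit when multi-word keywords share words.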


2021 ◽  
Vol 26 (6) ◽  
pp. 886-893
Author(s):  
Ao Xiong ◽  
Derong Liu ◽  
Hongkang Tian ◽  
Zhengyuan Liu ◽  
Peng Yu ◽  
...  

2021 ◽  
pp. 863-871
Author(s):  
S. Vijaya Shetty ◽  
S. Akshay ◽  
B. S. Shritej Reddy ◽  
Hemant Rakesh ◽  
M. Mihir ◽  
...  

2021 ◽  
Author(s):  
G. Vijay Kumar ◽  
Arvind Yadav ◽  
B. Vishnupriya ◽  
M. Naga Lahari ◽  
J. Smriti ◽  
...  

In this digital era, a large amount of data for different purposes can be found on the internet, and it is very hard to summarize this data manually. Automatic Text Summarization (ATS) is the natural next step: it can summarize source data into a short version that preserves the content and overall meaning. Although work on ATS started long back, in the 1950s, the field is still striving to produce the best and most efficient summaries. ATS proceeds via two methods, extractive and abstractive summarization, and both have established processes for improving text summarization. Text summarization is commonly implemented with NLP packages and methods in Python, and several algorithms are available. TextRank is an unsupervised extractive summarization algorithm; it operates on weighted, undirected graphs and supports both keyword extraction and sentence extraction. So, in this paper, a model is built to obtain better text summarization results using the Gensim library for NLP. This method improves the overall coherence of the summary, so that the person reading it can understand it better.
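TextRank's sentence-extraction side can be sketched without any library: build a sentence graph weighted by word overlap, run PageRank on it, and keep the top-ranked sentences. This is a minimal standalone sketch, not Gensim's implementation (Gensim 3.x shipped a TextRank-based `summarize` function, removed in Gensim 4.0); the `len + 1` inside the logarithmic length normalization is a small assumption to avoid a zero denominator for one-word sentences.

```python
import math

def textrank_summary(sentences, top_n=1, damping=0.85, iters=50):
    """Minimal TextRank sketch: rank sentences on a word-overlap graph."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Edge weight: shared words, normalized by log sentence lengths.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                denom = math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1)
                sim[i][j] = len(tokens[i] & tokens[j]) / denom
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                total = sum(sim[j])
                if sim[j][i] and total:
                    s += sim[j][i] * rank[j] / total
            new.append((1 - damping) / n + damping * s)
        rank = new
    best = sorted(range(n), key=lambda i: -rank[i])[:top_n]
    return [sentences[i] for i in sorted(best)]  # preserve document order
```

Sentences that share vocabulary with many others accumulate rank and are selected, while off-topic sentences are left out of the summary.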

