keyword extraction
Recently Published Documents


TOTAL DOCUMENTS

522
(FIVE YEARS 178)

H-INDEX

22
(FIVE YEARS 5)

2022 ◽  
Vol 16 (4) ◽  
pp. 1-30
Author(s):  
Muhammad Abulaish ◽  
Mohd Fazil ◽  
Mohammed J. Zaki

Domain-specific keyword extraction is a vital task in the field of text mining. There are various research tasks, such as spam e-mail classification, abusive language detection, sentiment analysis, and emotion mining, where a set of domain-specific keywords (a.k.a. a lexicon) is highly effective. Existing works on keyword extraction list all keywords, rather than domain-specific keywords, from a document corpus. Moreover, most of the existing approaches perform well on formal document corpora but fail on noisy and informal user-generated content in online social media. In this article, we present a hybrid approach that jointly models the local and global contextual semantics of words, utilizing the strength of distributional word representations and a contrasting-domain corpus for domain-specific keyword extraction. Starting with a seed set of a few domain-specific keywords, we model the text corpus as a weighted word-graph. In this graph, the initial weight of a node (word) represents its semantic association with the target domain, calculated as a linear combination of three semantic association metrics, and the weight of an edge connecting a pair of nodes represents the co-occurrence count of the respective words. Thereafter, a modified PageRank method is applied to the word-graph to identify the most relevant words for expanding the initial set of domain-specific keywords. We evaluate our method over both formal and informal text corpora (comprising six datasets) and show that it performs significantly better than state-of-the-art methods. Furthermore, we generalize our approach to the language-agnostic case and show that it outperforms existing language-agnostic approaches.
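The abstract's core mechanism, a weighted co-occurrence word-graph ranked by a seed-biased PageRank, can be illustrated with a minimal sketch. This is not the authors' implementation: their node weights combine three semantic association metrics, whereas the sketch below uses only a personalized teleport distribution over the seed keywords; the function names and the sentence-level co-occurrence window are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences):
    """Weighted word-graph: edge weight = sentence-level co-occurrence count."""
    weights = defaultdict(int)
    for sent in sentences:
        for a, b in combinations(sorted(set(sent.lower().split())), 2):
            weights[(a, b)] += 1
    return weights

def personalized_pagerank(weights, seeds, damping=0.85, iters=50):
    """PageRank whose teleport distribution is concentrated on seed words,
    so rank mass flows outward from the domain-specific seed set."""
    nbrs = defaultdict(dict)
    for (a, b), w in weights.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    nodes = sorted(nbrs)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        rank = {
            n: (1 - damping) * teleport[n]
            + damping * sum(rank[m] * w / sum(nbrs[m].values())
                            for m, w in nbrs[n].items())
            for n in nodes
        }
    return sorted(rank.items(), key=lambda kv: -kv[1])
```

Words connected to the seeds (directly or transitively) retain rank mass, while unrelated components decay toward zero, which is how the seed set gets expanded with in-domain words.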


2022 ◽  
Vol 97 ◽  
pp. 107639
Author(s):  
Tiantian Ding ◽  
Wenzhong Yang ◽  
Fuyuan Wei ◽  
Chao Ding ◽  
Peng Kang ◽  
...  

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Retrieving keywords from a text has attracted researchers for a long time, as it forms the basis for many natural language applications such as information retrieval, text summarization, and document categorization. A text is a collection of words that naturally represents its theme, and bringing this naturalness under formal rules is itself a challenging task. In the present paper, the authors evaluate different spatial-distribution-based keyword extraction methods available in the literature on three standard scientific texts. The authors choose the first few high-frequency words for evaluation to reduce the complexity, as all the methods are to some degree frequency-based. The authors find that the methods do not provide good results, particularly for the first few retrieved words. Thus, the authors propose a new measure based on frequency, inverse document frequency, variance, and Tsallis entropy. The different methods are evaluated on the basis of precision, recall, and F-measure. Results show that the proposed method provides improved results.
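The abstract names the four ingredients of the proposed measure but not how they are combined. As a hedged illustration only, the sketch below multiplies term frequency, inverse document frequency, the variance of inter-occurrence gaps (the spatial-distribution component), and the Tsallis entropy of the normalized gap distribution; the actual combination in the paper may differ.

```python
import math

def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1)."""
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def keyword_score(word, doc, corpus, q=2.0):
    """Illustrative combined score for `word` in tokenized document `doc`,
    where `corpus` is a list of tokenized documents (including `doc`)."""
    positions = [i for i, w in enumerate(doc) if w == word]
    tf = len(positions)
    if tf == 0:
        return 0.0
    df = sum(1 for d in corpus if word in d)
    idf = math.log(len(corpus) / df)
    # Spatial statistics over the gaps between successive occurrences;
    # a single occurrence falls back to one document-length gap.
    gaps = [b - a for a, b in zip(positions, positions[1:])] or [len(doc)]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    probs = [g / sum(gaps) for g in gaps]
    return tf * idf * (1.0 + var) * (1.0 + tsallis_entropy(probs, q))
```

The intuition behind the spatial terms is that genuine keywords cluster in bursts (irregular gaps), while function words are spread uniformly through the text.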


Author(s):  
Niladri Chatterjee ◽  
Aayush Singha Roy ◽  
Nidhika Yadav

The present work proposes an application of Soft Rough Sets and their span to unsupervised keyword extraction. In recent times, Soft Rough Sets have been applied in various domains, though none of these applications is in the area of keyword extraction. On the other hand, the concept of a Rough Set based span has been developed for improved efficiency in the domain of extractive text summarization. In this work we amalgamate these two techniques into Soft Rough Set based Span (SRS) to provide an effective solution for keyword extraction from texts. The universe of the Soft Rough Set is taken to be the collection of words from the input texts. SRS provides an ideal platform for identifying the set of keywords of an input text, which cannot always be defined clearly and unambiguously. The proposed technique uses a greedy algorithm for computing spanning sets. The experimental results suggest that keyword extraction using the proposed scheme gives consistent results across different domains. It has also been found to be more efficient than several existing unsupervised techniques.
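The abstract does not define the span computation, but a greedy spanning-set procedure over a word universe is structurally similar to greedy set cover. The sketch below is a hypothetical analogy, not the SRS method itself: it picks words until every sentence contains at least one chosen word, whereas the paper's span is defined over Soft Rough Set approximations.

```python
def greedy_span(sentences):
    """Greedy spanning-set sketch (set-cover style): repeatedly pick the
    word covering the most still-uncovered sentences."""
    coverage = {}  # word -> set of sentence indices containing it
    for i, sent in enumerate(sentences):
        for w in set(sent.lower().split()):
            coverage.setdefault(w, set()).add(i)
    uncovered = set(range(len(sentences)))
    span = []
    while uncovered:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(coverage), key=lambda w: len(coverage[w] & uncovered))
        if not coverage[best] & uncovered:
            break
        span.append(best)
        uncovered -= coverage[best]
    return span
```

The greedy choice gives the usual logarithmic approximation guarantee of set cover, which is presumably why a greedy algorithm is an attractive way to compute spanning sets.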


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
M. Saef Ullah Miah ◽  
Junaida Sulaiman ◽  
Talha Bin Sarwar ◽  
Kamal Z. Zamli ◽  
Rajan Jose

Keywords play a significant role in selecting topic-related documents easily. Topics or keywords assigned by humans or experts provide accurate information; however, this practice is quite expensive in terms of resources and time. Hence, it is preferable to utilize automated keyword extraction techniques. Nevertheless, before adopting the automated process, it is necessary to check how similar expert-provided and algorithm-generated keywords are. This paper presents an experimental analysis of the similarity scores of keywords generated by different supervised and unsupervised automated keyword extraction algorithms against expert-provided keywords from the electric double layer capacitor (EDLC) domain. The paper also analyses which texts provide better keywords: positive sentences only, or all sentences of the document. Among the unsupervised algorithms, YAKE, TopicRank, MultipartiteRank, and KPMiner are employed for keyword extraction. Among the supervised algorithms, KEA and WINGNUS are employed. To assess the similarity of the extracted keywords with the expert-provided keywords, the Jaccard, cosine, and cosine-with-word-vector similarity indexes are employed in this study. The experiment shows that the MultipartiteRank keyword extraction technique, measured with the cosine-with-word-vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working in the EDLC domain or on recommender systems to select more suitable keyword extraction and similarity index calculation techniques.
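Two of the three similarity indexes named above are standard and easy to sketch. Below, Jaccard is computed over the keyword phrases as sets, and cosine over bag-of-words token counts of the keyword lists; the third index, cosine with word vectors, would replace the token counts with pretrained embeddings and is omitted here. The example keyword lists are invented for illustration.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity of two keyword sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a, b):
    """Cosine similarity over bag-of-words token counts of keyword lists."""
    va, vb = Counter(" ".join(a).split()), Counter(" ".join(b).split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Note the two indexes can disagree: Jaccard rewards only exact phrase matches, while token-level cosine gives partial credit when multi-word keywords share words.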


2021 ◽  
Vol 26 (6) ◽  
pp. 886-893
Author(s):  
Ao Xiong ◽  
Derong Liu ◽  
Hongkang Tian ◽  
Zhengyuan Liu ◽  
Peng Yu ◽  
...  

2021 ◽  
pp. 863-871
Author(s):  
S. Vijaya Shetty ◽  
S. Akshay ◽  
B. S. Shritej Reddy ◽  
Hemant Rakesh ◽  
M. Mihir ◽  
...  

2021 ◽  
Author(s):  
G. Vijay Kumar ◽  
Arvind Yadav ◽  
B. Vishnupriya ◽  
M. Naga Lahari ◽  
J. Smriti ◽  
...  

In this digital era, a large amount of data for different purposes can be found on the internet, and it is very hard to summarize this data manually. Automatic Text Summarization (ATS) is the natural next step: it can summarize source data into a short version that preserves the content and overall meaning. Although work on ATS started long back, in the 1950s, the field is still striving to produce the best and most efficient summaries. ATS proceeds via two methods, extractive and abstractive summarization, and both have established processes for improving text summarization. Text summarization is commonly implemented with NLP packages and methods in Python, and several algorithms are available. TextRank is an unsupervised extractive summarization algorithm; it operates on weighted, undirected graphs and supports both keyword extraction and sentence extraction. So, in this paper, a model is built to obtain better text summarization results using the Gensim library for NLP. This method improves the overall coherence of the summary, so that the person reading it can understand it better.
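TextRank's sentence-extraction side can be sketched without any library: build a sentence graph weighted by word overlap, run PageRank on it, and keep the top-ranked sentences. This is a minimal standalone sketch, not Gensim's implementation (Gensim 3.x shipped a TextRank-based `summarize` function, removed in Gensim 4.0); the `len + 1` inside the logarithmic length normalization is a small assumption to avoid a zero denominator for one-word sentences.

```python
import math

def textrank_summary(sentences, top_n=1, damping=0.85, iters=50):
    """Minimal TextRank sketch: rank sentences on a word-overlap graph."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Edge weight: shared words, normalized by log sentence lengths.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                denom = math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1)
                sim[i][j] = len(tokens[i] & tokens[j]) / denom
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                total = sum(sim[j])
                if sim[j][i] and total:
                    s += sim[j][i] * rank[j] / total
            new.append((1 - damping) / n + damping * s)
        rank = new
    best = sorted(range(n), key=lambda i: -rank[i])[:top_n]
    return [sentences[i] for i in sorted(best)]  # preserve document order
```

Sentences that share vocabulary with many others accumulate rank and are selected, while off-topic sentences are left out of the summary.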

