scholarly journals News keyword extraction algorithm based on semantic clustering and word graph model

2021 ◽  
Vol 26 (6) ◽  
pp. 886-893
Author(s):  
Ao Xiong ◽  
Derong Liu ◽  
Hongkang Tian ◽  
Zhengyuan Liu ◽  
Peng Yu ◽  
...  
2010 ◽  
Vol 44-47 ◽  
pp. 4041-4049 ◽  
Author(s):  
Hong Zhao ◽  
Chen Sheng Bai ◽  
Song Zhu

Search engines can bring a lot of benefit to the website. For a site, each page’s search engine ranking is very important. To make web page ranking in search engine ahead, Search engine optimization (SEO) make effect on the ranking. Web page needs to set the keywords as “keywords" to use SEO. The paper focuses on the content of a given word, and extracts the keywords of each page by calculating the word frequency. The algorithm is implemented by C # language. Keywords setting of webpage are of great importance on the information and products


2021 ◽  
Vol 2078 (1) ◽  
pp. 012021
Author(s):  
Hongyang Zhao ◽  
Qiang Xie

Abstract In view of the fact that the traditional graph model method which only considers statistical features or general semantic features when extracting keywords from existing massive educational resources, lacks the function of mining and utilizing multi-factor semantic features, this paper proposes an improved TextRank-based algorithm for keyword extraction of educational resources. According to the characteristics of Chinese text and the shortcomings of traditional TextRank algorithm, the improved algorithm featuring multi-feature fusion is developed using the importance of words in the corpus, the location information in the text and the attributes of words. Experimental results show that this method has higher accuracy, recall rate, and F-measure value than traditional algorithms in the process of keyword extraction of educational resources, which improves the quality of keyword extraction and is beneficial to better utilization and management of educational resources.


2017 ◽  
Vol 6 (1) ◽  
pp. 36-52
Author(s):  
Urmila Shrawankar ◽  
Kranti Wankhede

A considerable amount of time is required to interpret whole news article to get the gist of it. Therefore, in order to reduce the reading and interpretation time, headlines are necessary. The available techniques for news headline construction mainly includes extractive and abstractive headline generation techniques. In this paper, context based news headline is formed from long news article by using techniques of core Natural Language Processing (NLP) and key terms of news article. Key terms are retrieved from lengthy news article by using various approaches of keyword extraction. The keyphrases are picked out using Keyphrase Extraction Algorithm (KEA) which helps to construct headline syntax along with NLP's parsing technique. Sentence compression algorithm helps to generate compressed sentences from generated parse tree of leading sentences. Headline helps user for reducing cognitive burden of reader by reflecting important contents of news. The objective is to frame headline using key terms for reducing reading time and efforts of reader.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yoojoong Kim ◽  
Jeong Hyeon Lee ◽  
Sunho Choi ◽  
Jeong Moon Lee ◽  
Jong-Ho Kim ◽  
...  

AbstractPathology reports contain the essential data for both clinical and research purposes. However, the extraction of meaningful, qualitative data from the original document is difficult due to the narrative and complex nature of such reports. Keyword extraction for pathology reports is necessary to summarize the informative text and reduce intensive time consumption. In this study, we employed a deep learning model for the natural language process to extract keywords from pathology reports and presented the supervised keyword extraction algorithm. We considered three types of pathological keywords, namely specimen, procedure, and pathology types. We compared the performance of the present algorithm with the conventional keyword extraction methods on the 3115 pathology reports that were manually labeled by professional pathologists. Additionally, we applied the present algorithm to 36,014 unlabeled pathology reports and analysed the extracted keywords with biomedical vocabulary sets. The results demonstrated the suitability of our model for practical application in extracting important data from pathology reports.


Author(s):  
Bahare Hashemzahde ◽  
Majid Abdolrazzagh-Nezhad

The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the available information of all languages is applied to improve a traditional keyword extraction algorithm from a multilingual text. The proposed keywork extraction procedure is an unsupervise algorithm and designed based on selecting a word as a keyword of a given text, if in addition to that language holds a high rank based on the keywords criteria in other languages, as well. To achieve to this aim, the average TF-IDF of the candidate words were calculated for the same and the other languages. Then the words with the higher averages TF-IDF were chosen as the extracted keywords. The obtained results indicat that the algorithms’ accuracis of the multilingual texts in term frequency-inverse document frequency (TF-IDF) algorithm, graph-based algorithm, and the improved proposed algorithm are 80%, 60.65%, and 91.3%, respectively.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Kun Zhang

In order to shorten the time for users to query news on the Internet, this paper studies and designs a network news data extraction technology, which can obtain the main news information through the extraction of news text keywords. Firstly, the TF-IDF keyword extraction algorithm, TextRank keyword extraction algorithm, and LDA keyword extraction algorithm are analyzed to understand the keyword extraction process, and the TF-IDF algorithm is optimized by Zipf’s law. By introducing the idea of model fusion, five schemes based on waterfall fusion and parallel combination fusion are designed, and the effects of the five schemes are verified by experiments. It is found that the designed extraction technology has a good effect on network news data extraction. News keyword extraction has a great application prospect, which can provide the basis for the research fields of news key phrases, news abstracts, and so on.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Miao Teng

In this paper, we conduct an in-depth study of Japanese keyword extraction from news reports, train external computer document word sets from text preprocessing into word vectors using the Ship-gram model in the deep learning tool Word2Vec, and calculate the cosine distance between word vectors. In this paper, the sliding window in TextRank is designed to connect internal document information to improve the in-text semantic coherence. The main idea is to use not only the statistical and structural features of words but also the semantic features of words extracted through word-embedding techniques, i.e., multifeature fusion, to obtain the importance weights of words themselves and the attraction weights between words and then iteratively calculate the final weight of each word through the graph model algorithm to determine the extracted keywords. To verify the performance of the algorithm, extensive simulation experimental studies were conducted on three different types of datasets. The experimental results show that the proposed keyword extraction algorithm can improve the performance by a maximum of 6.45% and 20.36% compared with the existing word frequency statistics and graph model methods, respectively; MF-Rank can achieve a maximum performance improvement of 1.76% compared with PW-TF.


Sign in / Sign up

Export Citation Format

Share Document