News keyword extraction algorithm based on semantic clustering and word graph model

Search engines can bring a website considerable benefit, so the search engine ranking of each page is very important. Search engine optimization (SEO) is used to move a web page higher in the rankings, and to use SEO a web page needs to declare its keywords (for example, in a “keywords” field). This paper focuses on the content of a given page and extracts the keywords of each page by calculating word frequency. The algorithm is implemented in C#. Setting keywords for a web page is of great importance for the information and products it presents.
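The word-frequency step can be sketched in Python (the paper itself uses C#). This is a minimal illustration, not the paper's code: the regex tokenizer, the short stopword list, the three-letter minimum token length, and the function name `top_keywords` are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def top_keywords(text, k=5):
    """Return the k most frequent content words in text as (word, count) pairs."""
    # Lowercase and keep alphabetic runs only (a simplifying tokenizer assumption).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords and very short tokens before counting.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # most_common orders equal counts by first appearance in the text.
    return counts.most_common(k)


page = ("Search engine optimization helps a page rank. "
        "Keyword extraction finds the keyword of each page, "
        "and the keyword list is written to the page header.")
print(top_keywords(page, 3))  # -> [('page', 3), ('keyword', 3), ('search', 1)]
```

Raw frequency favors words that recur on one page; it does not discount words that are common across all pages, which is what IDF-style weighting adds.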

An Improved TextRank Multi-feature Fusion Algorithm For Keyword Extraction of Educational Resources

Journal of Physics: Conference Series, Vol 2078 (1), pp. 012021, 2021. DOI: 10.1088/1742-6596/2078/1/012021

Author(s): Hongyang Zhao, Qiang Xie

Keyword(s): Chinese Text, Feature Fusion, Graph Model, Recall Rate, Educational Resources, Keyword Extraction, Semantic Features, Fusion Algorithm, General Semantic

Traditional graph-model methods for extracting keywords from massive existing educational resources consider only statistical features or general semantic features and lack the ability to mine and exploit multi-factor semantic features. In view of this, this paper proposes an improved TextRank-based algorithm for keyword extraction from educational resources. Based on the characteristics of Chinese text and the shortcomings of the traditional TextRank algorithm, the improved algorithm fuses multiple features: the importance of words in the corpus, their location in the text, and their attributes. Experimental results show that this method achieves higher accuracy, recall, and F-measure than traditional algorithms when extracting keywords from educational resources, which improves the quality of keyword extraction and supports better utilization and management of educational resources.
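One way to realize the feature fusion described above is a biased TextRank, in which fused per-word feature scores steer the random-jump step while the co-occurrence graph supplies the link structure. The sketch below is an illustration under assumptions, not the authors' algorithm: the three feature scores per word, the fusion weights in `fuse_features`, and the names `fuse_features` and `fused_textrank` are hypothetical.

```python
from collections import defaultdict


def fuse_features(features, alpha=(0.4, 0.3, 0.3)):
    """Weighted sum of hypothetical (corpus, position, attribute) scores per word."""
    return {w: sum(a * f for a, f in zip(alpha, feats)) for w, feats in features.items()}


def fused_textrank(tokens, priors, window=2, d=0.85, iters=200, tol=1e-10):
    """Biased TextRank: co-occurrence graph plus a feature-based jump distribution.

    tokens: candidate words in document order (already filtered).
    priors: word -> non-negative fused feature score; missing words get 0.
    """
    if not tokens:
        return []
    # Undirected co-occurrence edges between words at most window-1 positions apart.
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != w:
                weights[w][u] += 1.0
                weights[u][w] += 1.0
    nodes = sorted(set(tokens))
    # Normalize priors into a jump distribution (uniform if every prior is 0).
    total = sum(priors.get(n, 0.0) for n in nodes)
    jump = {n: priors.get(n, 0.0) / total if total else 1.0 / len(nodes) for n in nodes}
    out_sum = {n: sum(weights[n].values()) for n in nodes}
    score = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(iters):
        new = {
            v: (1 - d) * jump[v]
            + d * sum(weights[u][v] / out_sum[u] * score[u] for u in weights[v])
            for v in nodes
        }
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


tokens = ["graph", "model", "keyword", "graph", "extraction"]
features = {  # hypothetical (corpus importance, position, attribute) scores
    "graph": (0.2, 1.0, 1.0),
    "model": (0.5, 0.8, 1.0),
    "keyword": (0.6, 0.6, 1.0),
    "extraction": (0.9, 0.2, 1.0),
}
print(fused_textrank(tokens, fuse_features(features)))
```

Passing empty priors reduces the update to plain TextRank with a uniform jump; shifting prior mass toward a word raises its score without changing the graph.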

News Headline Building using Hybrid Headline Generation Technique for Quick Gist

International Journal of Natural Computing Research, Vol 6 (1), pp. 36-52, 2017. DOI: 10.4018/ijncr.2017010103

Author(s): Urmila Shrawankar, Kranti Wankhede

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Reading Time, News Article, Keyword Extraction, Keyphrase Extraction, Generation Technique, Extraction Algorithm, Key Terms

Interpreting a whole news article to get its gist takes considerable time, so headlines are needed to reduce reading and interpretation time. The available techniques for constructing news headlines are mainly extractive or abstractive. In this paper, a context-based news headline is formed from a long news article using core Natural Language Processing (NLP) techniques and the key terms of the article. Key terms are retrieved from the lengthy article using various keyword extraction approaches. Keyphrases are selected with the Keyphrase Extraction Algorithm (KEA), which, together with an NLP parsing technique, helps construct the headline syntax. A sentence compression algorithm generates compressed sentences from the parse trees of the leading sentences. The headline reduces the reader's cognitive burden by reflecting the important content of the news. The objective is to frame headlines from key terms to reduce the reader's reading time and effort.
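KEA, as originally described, scores each candidate phrase with two features, TF×IDF and first occurrence (the fraction of the document that precedes the phrase's first appearance), and feeds them to a Naive Bayes model trained on documents with known keyphrases. The sketch below computes those two features for single words only; the +1 smoothing in the IDF, the tokenizer, the function names, and the simple sort that replaces the trained classifier are assumptions made to keep it short.

```python
import math
import re


def tokenize(text):
    """Lowercase alphabetic tokens; no stemming or stopword removal (simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def kea_features(doc, corpus):
    """Map each single-word candidate in doc to (tf_idf, first_occurrence).

    tf_idf: frequency in doc / doc length, times -log2((df + 1) / (N + 1)),
    where df counts corpus documents containing the word (+1 smoothing assumed).
    first_occurrence: words preceding the first appearance / doc length.
    """
    tokens = tokenize(doc)
    n = len(tokens)
    corpus_sets = [set(tokenize(d)) for d in corpus]
    feats = {}
    for pos, w in enumerate(tokens):
        if w in feats:
            continue  # first occurrence already recorded
        df = sum(w in s for s in corpus_sets)
        idf = -math.log2((df + 1) / (len(corpus) + 1))
        feats[w] = (tokens.count(w) / n * idf, pos / n)
    return feats


def rank_candidates(feats, k=3):
    """Stand-in for KEA's Naive Bayes step: sort by tf_idf, then earlier first occurrence."""
    ranked = sorted(feats.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    return [w for w, _ in ranked[:k]]


corpus = ["city council meets", "warning about weather", "city news"]
feats = kea_features("Flood hits city, flood warning issued", corpus)
print(rank_candidates(feats))  # -> ['flood', 'hits', 'issued']
```

Here `hits` and `issued` tie on TF×IDF, and the earlier first occurrence breaks the tie, which mirrors KEA's observation that keyphrases tend to appear early in a document.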

Research on Keyword Extraction Algorithm Using PMI and TextRank

2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT), 2019. DOI: 10.1109/infoct.2019.8711099. Cited by: ~2

Author(s): Yang Tao, Zhu Cui, Zhang Jiazhe

Keyword(s): Keyword Extraction, Extraction Algorithm

Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records

Scientific Reports, Vol 10 (1), 2020. DOI: 10.1038/s41598-020-77258-w

Author(s): Yoojoong Kim, Jeong Hyeon Lee, Sunho Choi, Jeong Moon Lee, Jong-Ho Kim, ...

Keyword(s): Deep Learning, Natural Language, Language Processing, Extraction Methods, Complex Nature, Keyword Extraction, Extraction Algorithm, Pathology Reports, Natural Language Process, Present Algorithm

Pathology reports contain essential data for both clinical and research purposes. However, extracting meaningful, qualitative data from the original documents is difficult because of the narrative and complex nature of such reports. Keyword extraction from pathology reports is needed to summarize the informative text and reduce the considerable time spent reading it. In this study, we employed a deep learning model for natural language processing to extract keywords from pathology reports and presented a supervised keyword extraction algorithm. We considered three types of pathological keywords: specimen, procedure, and pathology type. We compared the performance of the present algorithm with conventional keyword extraction methods on 3,115 pathology reports manually labeled by professional pathologists. Additionally, we applied the algorithm to 36,014 unlabeled pathology reports and analysed the extracted keywords against biomedical vocabulary sets. The results demonstrated the suitability of our model for practical use in extracting important data from pathology reports.
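Comparisons against manually labeled reports, like the one described above, are commonly scored with precision, recall, and F1 over the sets of extracted and labeled keywords. The sketch below shows micro-averaged scoring with exact string matching; the matching rule, the averaging scheme, the function name `micro_prf`, and the sample keywords are assumptions, since the abstract does not state how the study scored its comparison.

```python
def micro_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 over parallel lists of keyword sets."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same reports")
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        tp += len(pred & ref)  # extracted and labeled
        fp += len(pred - ref)  # extracted but not labeled
        fn += len(ref - pred)  # labeled but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical reports: extracted keywords vs. pathologist labels.
predicted = [{"stomach", "biopsy"}, {"colon"}]
gold = [{"stomach", "biopsy", "adenocarcinoma"}, {"colon", "polypectomy"}]
print(micro_prf(predicted, gold))
```

In this toy case every extracted keyword is correct (precision 1.0) but two labeled keywords are missed (recall 0.6), giving an F1 of about 0.75.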

Micro-blog Keyword Extraction Method Based on Graph Model and Semantic Space

Journal of Multimedia, Vol 8 (5), 2013. DOI: 10.4304/jmm.8.5.611-617. Cited by: ~1

Author(s): Hua Zhao, Qingtian Zeng

Keyword(s): Extraction Method, Graph Model, Semantic Space, Keyword Extraction

Chinese Document Keyword Extraction Algorithm Based on FP-growth

2016 International Conference on Smart City and Systems Engineering (ICSCSE), 2016. DOI: 10.1109/icscse.2016.0062. Cited by: ~2

Author(s): Meng Zhao, Wanjun Yu, Wenjing Lu, Quan Liu, Jinxiao Li

Keyword(s): Keyword Extraction, Extraction Algorithm

Improving keyword extraction in multilingual texts

International Journal of Electrical and Computer Engineering (IJECE), Vol 10 (6), pp. 5909, 2020. DOI: 10.11591/ijece.v10i6.pp5909-5916

Author(s): Bahare Hashemzahde, Majid Abdolrazzagh-Nezhad

Keyword(s): Extraction Procedure, The Other, Keyword Extraction, Inverse Document Frequency, Retrieval Systems, Document Frequency, Extraction Algorithm, Information Retrieval Systems, Multilingual Text, Available Information

The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and drawing on information from different languages improves the accuracy of keyword extraction. In this paper, the information available in all languages is used to improve a traditional keyword extraction algorithm for multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm: it selects a word as a keyword of a given text if, in addition to ranking highly in that text's language, the word also ranks highly under the keyword criteria in the other languages. To achieve this, the average TF-IDF of each candidate word was calculated over its own language and the other languages, and the words with the highest average TF-IDF were chosen as the extracted keywords. The results indicate that, on multilingual texts, the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, a graph-based algorithm, and the proposed improved algorithm are 80%, 60.65%, and 91.3%, respectively.
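The averaging rule described above can be sketched as follows. The abstract does not say how words are matched across languages or which TF-IDF variant is used, so this sketch assumes a hypothetical bilingual dictionary `align`, relative term frequency, and a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1; a candidate with no known translation contributes 0 for that language. The function names are also hypothetical.

```python
import math
from collections import Counter


def tfidf(word, doc, corpus):
    """Relative term frequency in doc times a smoothed IDF over corpus (lists of tokens)."""
    tf = Counter(doc)[word] / len(doc)
    df = sum(word in d for d in corpus)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf


def multilingual_keywords(src, docs, corpora, align, k=3):
    """Rank words of docs[src] by their TF-IDF averaged over every language version.

    docs: lang -> tokens of the same text in that language.
    corpora: lang -> list of token lists used for document frequencies.
    align: lang -> {source word: translation}; a hypothetical bilingual dictionary.
    """
    scores = {}
    for w in set(docs[src]):
        vals = []
        for lang in docs:
            t = w if lang == src else align[lang].get(w)
            # A candidate with no known translation contributes 0 for that language.
            vals.append(tfidf(t, docs[lang], corpora[lang]) if t is not None else 0.0)
        scores[w] = sum(vals) / len(vals)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


docs = {
    "en": ["bank", "loan", "bank", "rate"],
    "de": ["bank", "kredit", "zins", "zins", "zins"],
}
align = {"de": {"bank": "bank", "loan": "kredit", "rate": "zins"}}
print(multilingual_keywords("en", docs, {"en": [], "de": []}, align))
# -> ['rate', 'bank', 'loan']
```

Empty corpora make every IDF equal to 1, so the example isolates the averaging step: with English alone, `bank` would rank first (relative frequency 0.5 versus 0.25), but averaging in the German version, where `zins` is frequent, moves `rate` to the top.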

Web News Data Extraction Technology Based on Text Keywords

Complexity, Vol 2021, pp. 1-11, 2021. DOI: 10.1155/2021/5529447

Author(s): Kun Zhang

Keyword(s): Data Extraction, Extraction Process, Network News, Good Effect, Keyword Extraction, Extraction Technology, Research Fields, Extraction Algorithm, Key Phrases, Web News

To shorten the time users spend querying news on the Internet, this paper studies and designs a network news data extraction technique that obtains the main news information by extracting keywords from news text. First, the TF-IDF, TextRank, and LDA keyword extraction algorithms are analyzed to understand the keyword extraction process, and the TF-IDF algorithm is optimized using Zipf’s law. Then, introducing the idea of model fusion, five schemes based on waterfall fusion and parallel combination fusion are designed and verified experimentally. The experiments show that the designed extraction technique performs well on network news data. News keyword extraction has broad application prospects and can provide a basis for research on news key phrases, news abstracts, and related fields.
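The abstract names waterfall fusion and parallel combination fusion but does not define its five schemes or the Zipf's-law adjustment, so the sketch below only illustrates the two general patterns under explicit assumptions: waterfall fusion is taken to mean that one model shortlists candidates and a second re-ranks them, and parallel fusion a weighted sum of min-max-normalized scores. The score dictionaries stand in for TF-IDF, TextRank, or LDA output; the numbers, weights, and function names are hypothetical.

```python
def minmax(scores):
    """Rescale a {word: score} dict to [0, 1]; constant scores map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    return {w: (s - lo) / (hi - lo) if hi > lo else 1.0 for w, s in scores.items()}


def waterfall(first, second, shortlist=5, k=3):
    """Keep the top `shortlist` words by `first`, then re-rank them by `second`."""
    stage1 = sorted(first, key=lambda w: (-first[w], w))[:shortlist]
    return sorted(stage1, key=lambda w: (-second.get(w, 0.0), w))[:k]


def parallel(score_dicts, weights, k=3):
    """Weighted sum of normalized scores; a word missing from a model contributes 0."""
    combined = {}
    for scores, wt in zip(score_dicts, weights):
        for w, s in minmax(scores).items():
            combined[w] = combined.get(w, 0.0) + wt * s
    return sorted(combined, key=lambda w: (-combined[w], w))[:k]


tfidf_scores = {"flood": 0.9, "city": 0.4, "rain": 0.7, "mayor": 0.1}
textrank_scores = {"flood": 0.2, "city": 0.5, "rain": 0.3, "mayor": 0.9}
print(waterfall(tfidf_scores, textrank_scores, shortlist=2, k=2))  # -> ['rain', 'flood']
print(parallel([tfidf_scores, textrank_scores], [0.6, 0.4], k=2))  # -> ['flood', 'rain']
```

The two patterns trade off differently: a waterfall can never recover a word the first model filtered out (`mayor` here), while a parallel sum lets a strong score from either model lift a word.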

Using the Ship-Gram Model for Japanese Keyword Extraction Based on News Reports

Complexity, Vol 2021, pp. 1-9, 2021. DOI: 10.1155/2021/9965843

Author(s): Miao Teng

Keyword(s): Experimental Studies, Main Idea, Graph Model, Structural Features, Keyword Extraction, Semantic Features, News Reports, Semantic Coherence, Depth Study, Cosine Distance

In this paper, we conduct an in-depth study of Japanese keyword extraction from news reports. After text preprocessing, word sets from external documents are trained into word vectors using the Skip-gram model of the deep learning tool Word2Vec, and the cosine distance between word vectors is calculated. The sliding window in TextRank is designed to connect information within the document and improve in-text semantic coherence. The main idea is to use not only the statistical and structural features of words but also their semantic features, extracted through word-embedding techniques (i.e., multi-feature fusion), to obtain the importance weight of each word and the attraction weights between words, and then to iteratively calculate the final weight of each word with the graph model algorithm to determine the extracted keywords. To verify the algorithm's performance, extensive simulation experiments were conducted on three different types of datasets. The results show that the proposed keyword extraction algorithm improves performance by up to 6.45% and 20.36% compared with existing word-frequency statistics and graph model methods, respectively; MF-Rank achieves a maximum improvement of 1.76% compared with PW-TF.
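The use of word-vector similarity inside TextRank's sliding window can be sketched as a weighted TextRank whose co-occurrence edges are weighted by the cosine similarity of the two words' vectors. This is a simplified illustration, not MF-Rank: the 2-d vectors and romanized tokens are hypothetical stand-ins for trained Skip-gram embeddings of Japanese words, clipping similarities to a small positive floor is an assumption, and the paper's separate word-importance weights and attraction formula are not reproduced.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_textrank(tokens, vectors, window=3, d=0.85, iters=200, floor=1e-3):
    """Weighted TextRank whose window co-occurrence edges carry vector similarity."""
    nodes = sorted(set(tokens))
    w = {n: {} for n in nodes}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                # Clip low or negative similarity to a small floor so weights stay positive.
                sim = max(cosine(vectors[a], vectors[b]), floor)
                w[a][b] = w[a].get(b, 0.0) + sim
                w[b][a] = w[b].get(a, 0.0) + sim
    out = {n: sum(w[n].values()) for n in nodes}
    score = dict.fromkeys(nodes, 1.0)
    for _ in range(iters):
        # Original TextRank update: S(v) = (1 - d) + d * sum_u w_uv / out_u * S(u).
        score = {v: (1 - d) + d * sum(w[u][v] / out[u] * score[u] for u in w[v])
                 for v in nodes}
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 2-d vectors standing in for Skip-gram embeddings of romanized tokens.
vectors = {"jishin": (0.9, 0.1), "tsunami": (0.8, 0.3), "keihou": (0.7, 0.4), "tenki": (0.1, 0.9)}
tokens = ["jishin", "tsunami", "keihou", "jishin", "tenki"]
print(embedding_textrank(tokens, vectors))
```

With similarity-weighted edges, a word passes more of its score to semantically close neighbors, so an off-topic word that merely sits inside the window (here `tenki`) receives less support than it would in unweighted TextRank.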
