Keyphrase Extraction from Chinese News Web Pages Based on Semantic Relations

Author(s): Fei Xie, Xindong Wu, Xue-Gang Hu, Fei-Yue Wang

2019, Vol 3 (3), pp. 58
Author(s): Tim Haarman, Bastiaan Zijlema, Marco Wiering

Keyphrase extraction is an important part of natural language processing (NLP) research, although little research has been done on web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet these pages remain largely unexplored in scientific research. Current research is often applied only to clean corpora, such as abstracts and articles from academic journals, or to sets of scraped texts from a single domain. However, textual data from web pages differs from ordinary text documents: it is structured using HTML elements and often consists of many small fragments. Moreover, these elements are used highly inconsistently and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can robustly exploit the structural information in web pages. We compared this novel method with other baselines and state-of-the-art methods on a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
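The abstract does not specify how WebEmbedRank scores phrases. The following is therefore only a minimal sketch of the general idea of structure-aware keyphrase ranking, not the authors' method. It assumes an embedding-style ranking in which each candidate phrase is scored by cosine similarity to a document vector. Bag-of-words count vectors stand in for real embeddings, and each HTML fragment is scaled by a weight for its element. The weights in `ELEMENT_WEIGHTS`, the sample fragments, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical weights per HTML element (not taken from the paper).
ELEMENT_WEIGHTS = {"title": 3.0, "h1": 2.0, "p": 1.0}


def vectorize(text):
    """Bag-of-words count vector, standing in for a real phrase or document embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_vector(fragments):
    """Sum the vectors of (element, text) fragments, scaling each by its element weight."""
    doc = Counter()
    for element, text in fragments:
        weight = ELEMENT_WEIGHTS.get(element, 1.0)  # unknown elements default to 1.0
        for term, count in vectorize(text).items():
            doc[term] += weight * count
    return doc


def rank_keyphrases(fragments, candidates, top_n=3):
    """Return the top_n candidates by similarity to the structure-weighted document vector."""
    doc = document_vector(fragments)
    scored = [(c, cosine(vectorize(c), doc)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


fragments = [
    ("title", "keyphrase extraction for web pages"),
    ("p", "web pages contain noisy html fragments"),
    ("p", "we rank candidate phrases"),
]
candidates = ["keyphrase extraction", "web pages", "html fragments", "candidate phrases"]
result = rank_keyphrases(fragments, candidates)
for phrase, score in result:
    print(f"{phrase}: {score:.3f}")
```

Up-weighting the `title` fragment pulls the document vector toward title terms. As a result, "web pages" and "keyphrase extraction" outrank phrases that appear only in body text. A practical system would also need candidate generation and noise filtering, which this sketch omits.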


2015, Vol 21 (5), pp. 661-664
Author(s): Zornitsa Kozareva, Vivi Nastase, Rada Mihalcea

Graph structures naturally model connections. In natural language processing (NLP), connections are ubiquitous at every scale, from small structures up to the web. We find them between words, as grammatical, collocation, or semantic relations that contribute to the overall meaning and maintain the cohesive structure of the text and the unity of the discourse. We find them between concepts in ontologies or other knowledge repositories. Since the early days of artificial intelligence, associative or semantic networks have been proposed and used as knowledge stores. They naturally capture language units and the relations between them, and they allow for a variety of inference and reasoning processes that simulate some of the functions of the human mind. We find them between complete texts or web pages, and between entities in a social network, where they model relations at the web scale. Beyond the more commonly encountered ‘regular’ graphs, hypergraphs have also appeared in our field to model relations among more than two units.


IEEE Access, 2019, Vol 7, pp. 167507-167520
Author(s): Huiting Liu, Lili Wang, Peng Zhao, Xindong Wu

Author(s): Ali I. El-Dsouky, Hesham A. Ali, Rabab Samy Rashed

With the rapid growth of the World Wide Web comes the need for a fast and accurate way to reach the required information. Search engines play an important role in retrieving that information for users. Ranking algorithms are an important step in search engines because they allow users to retrieve the pages most relevant to their queries. In this work, the authors present a method that uses genealogical information from an ontology to find suitable hierarchical concepts for query expansion. The method then ranks web pages based on the semantic relations of the hierarchical concepts related to the query terms. The hierarchical relations of the searched domain (siblings, synonyms, and hyponyms) receive different weights derived with the Analytic Hierarchy Process (AHP). Compared with three common ranking methods, the approach ranks documents more accurately.
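The abstract states only that the relation types are weighted using AHP. The sketch below is a hedged illustration of that idea, not the authors' ranking function. It derives relation weights from a pairwise comparison matrix using the row geometric-mean approximation of the AHP priority vector. It then scores pages by weighted term frequency. The pairwise judgments, the weight of 1.0 for original query terms, the scoring formula, the sample pages, and all names are hypothetical.

```python
import math
from collections import Counter

RELATIONS = ["synonym", "hyponym", "sibling"]

# Hypothetical pairwise judgments on Saaty's 1-9 scale (not from the paper):
# PAIRWISE[i][j] states how much more important RELATIONS[i] is than RELATIONS[j].
PAIRWISE = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]


def ahp_weights(matrix):
    """Approximate the AHP priority vector by normalizing the row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def score_page(page_text, query_terms, expansions, weights):
    """Weighted term-frequency score: original query terms count 1.0 each (an assumption),
    expansion terms count the AHP weight of their relation to the query concept."""
    tf = Counter(page_text.lower().split())
    score = float(sum(tf[t] for t in query_terms))
    for term, relation in expansions:
        score += weights[relation] * tf[term]
    return score


weights = dict(zip(RELATIONS, ahp_weights(PAIRWISE)))
query = ["car"]
# Hypothetical expansion terms, as if looked up in an ontology.
expansions = [("automobile", "synonym"), ("sedan", "hyponym"), ("truck", "sibling")]
pages = {
    "a": "car dealers sell every automobile and sedan",
    "b": "truck and truck parts",
    "c": "used car sedan sedan",
}
ranked = sorted(pages, key=lambda p: score_page(pages[p], query, expansions, weights), reverse=True)
print({r: round(weights[r], 3) for r in RELATIONS})
print(ranked)
```

With these judgments, the synonym relation receives about 0.65 of the total weight, the hyponym relation about 0.23, and the sibling relation about 0.12. A full AHP application would also check the consistency ratio of the comparison matrix before using the weights.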


Author(s): Chandrakala Arya, Sanjay K. Dwivedi

Author(s): Emre Tolga Ayan, Rabia Arslan, Muhammed Said Zengin, Haci Ali Duru, Sedat Salman, ...
