SCOPAS — SEMANTIC COMPUTATION OF PAGE SCORE

2013 ◽  
Vol 12 (06) ◽  
pp. 1309-1331
Author(s):  
K. S. KUPPUSAMY ◽  
G. AGHILA

This paper presents a novel model for scoring web pages, entitled SCOPAS (Semantic COmputation of PAge Score). With the prolific growth in the number of users of the World Wide Web and the heterogeneity of their information needs, it becomes mandatory to evaluate the relevance of a web page in terms of user-specific requirements. SCOPAS models web pages to facilitate efficient evaluation by harnessing the inherent features of a page in terms of its content and structure. The proposed model further enriches the scoring procedure by fine-graining the evaluation to the micro level through segmentation of the page. A variable-magnitude, multi-dimensional approach is proposed for evaluating each segment by incorporating the relevance of intra-segment components. User interest is captured with the help of the FOAF (Friend Of A Friend) ontology to achieve personalized page scoring. The generic SCOPAS model is extended to SCOPAS-Rank, which explores the use of the model in improving a web search engine's result ordering. A prototype of the proposed SCOPAS-Rank model was implemented and experiments were conducted on it. The results of the experiments validate the effectiveness of the proposed model.
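The segment-level, multi-dimensional evaluation described above can be sketched as a weighted aggregation. The segment names, dimensions, and weights below are illustrative assumptions, not the authors' actual formulation:

```python
# Hypothetical sketch of segment-level page scoring in the spirit of SCOPAS.
# Dimension choices and weights are invented for illustration.

def segment_score(dimension_scores, weights):
    """Weighted sum of per-dimension relevance scores for one segment."""
    return sum(w * s for w, s in zip(weights, dimension_scores))

def page_score(segments, weights):
    """Aggregate segment scores, weighting each segment by its size share."""
    total_size = sum(seg["size"] for seg in segments)
    return sum(
        (seg["size"] / total_size) * segment_score(seg["dims"], weights)
        for seg in segments
    )

# Two segments scored on three assumed dimensions: content match, structural
# prominence, and user-interest overlap (e.g., derived from a FOAF profile).
segments = [
    {"size": 600, "dims": [0.8, 0.6, 0.9]},
    {"size": 200, "dims": [0.2, 0.9, 0.1]},
]
weights = [0.5, 0.2, 0.3]
print(round(page_score(segments, weights), 3))  # → 0.67
```

Weighting segments by their size share is one plausible way to keep a large relevant segment from being diluted by small boilerplate segments.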

2017 ◽  
pp. 030-050
Author(s):  
J.V. Rogushina

Problems associated with the improvement of information retrieval for the open environment are considered, and the need for its semantization is grounded. The current state and development prospects of semantic search engines focused on processing Web information resources are analysed, and criteria for the classification of such systems are reviewed. In this analysis, significant attention is paid to the use in semantic search of ontologies that contain knowledge about the subject area and the search users. The sources of ontological knowledge and methods of processing them to improve search procedures are considered. Examples of semantic search systems that use structured query languages (e.g., SPARQL), lists of keywords, and natural-language queries are given. Criteria for the classification of semantic search engines, such as architecture, coupling, transparency, user context, query modification, ontology structure, etc., are considered. Different ways of supporting semantic, ontology-based modification of user queries that improve the completeness and accuracy of search are analyzed. Based on an analysis of the properties of existing semantic search engines in terms of these criteria, areas for further improvement of these systems are identified: the development of metasearch systems, semantic modification of user queries, the determination of a user-acceptable level of transparency of the search procedures, flexible domain-knowledge management tools, and increased productivity and scalability. In addition, semantic Web search tools need to use an external knowledge base that contains knowledge about the domain of the user's information needs, and to provide users with the ability to independently select the knowledge used in the search process.
It is also necessary to take into account the history of the user's interaction with the retrieval system and the search context, in order to personalize query results and order them in accordance with the user's information needs. All these aspects were taken into account in the design and implementation of the semantic search engine "MAIPS", which is based on an ontological model of the cooperation of users and resources on the Web.
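One of the semantic query-modification techniques surveyed above can be sketched as ontology-driven query expansion. The tiny ontology fragment, term names, and interest profile below are invented for the example; MAIPS itself uses a much richer ontological model of users and resources:

```python
# Illustrative sketch of ontology-based query expansion: each narrower term
# is tagged with a domain, and only refinements matching the user's interest
# profile are added. All data here is hypothetical.

ONTOLOGY = {
    "python": [("python language", "programming"), ("python snake", "reptile")],
    "jaguar": [("jaguar car", "vehicle"), ("jaguar cat", "animal")],
}

def expand_query(terms, user_interests):
    """Add narrower ontology terms whose domain matches the user's interests."""
    expanded = list(terms)
    for term in terms:
        for narrower, domain in ONTOLOGY.get(term, []):
            if domain in user_interests:
                expanded.append(narrower)
    return expanded

print(expand_query(["python"], {"programming"}))  # → ['python', 'python language']
```

Filtering refinements through the user profile is what turns plain synonym expansion into the personalized modification the survey emphasizes.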


Author(s):  
Li Weigang ◽  
Wu Man Qi

This chapter presents a study applying Ant Colony Optimization (ACO) to the Interlegis Web portal, the Brazilian legislation Website. The AntWeb approach is inspired by the foraging behavior of ant colonies: it adaptively marks the most significant link by means of the shortest route to reach the target pages. The system treats the users of the Web portal as artificial ants and the links among the Web pages as the search network. To identify groups of visitors, Web mining is applied to extract knowledge from preprocessed Web log files. The chapter describes the theory, model, main utilities, and implementation of the AntWeb prototype in the Interlegis Web portal. The case study covers off-line Web mining; simulations with and without AntWeb; and testing under modified parameters. The results demonstrate the sensitivity and accessibility of AntWeb and its benefits for Interlegis Web users.
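The core ACO mechanic, visitor trails depositing "pheromone" on links while older trails evaporate, can be sketched as follows. The evaporation and deposit rates and the toy link graph are assumptions for illustration, not AntWeb's actual parameters:

```python
# Toy sketch of the AntWeb idea: each completed visitor trail reinforces the
# links it used, and evaporation lets stale routes fade over time.

EVAPORATION = 0.1   # fraction of pheromone lost per round (assumed)
DEPOSIT = 1.0       # pheromone added per traversal on a trail (assumed)

def update_pheromone(pheromone, trails):
    """One round: evaporate on every link, then reinforce traversed links."""
    for link in pheromone:
        pheromone[link] *= (1 - EVAPORATION)
    for trail in trails:
        for link in zip(trail, trail[1:]):
            pheromone[link] = pheromone.get(link, 0.0) + DEPOSIT
    return pheromone

# Links among portal pages; two visitor trails reach the target page via "laws".
pheromone = {("home", "laws"): 0.0, ("home", "news"): 0.0, ("laws", "target"): 0.0}
trails = [["home", "laws", "target"], ["home", "laws", "target"]]
update_pheromone(pheromone, trails)
print(max(pheromone, key=pheromone.get))  # → ('home', 'laws')
```

The link with the most pheromone is the one the portal would then mark as most significant for the next visitors.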


Author(s):  
Ji-Rong Wen

The Web is an open and free environment for people to publish and get information. Everyone on the Web can be an author, a reader, or both. The language of the Web, HTML (Hypertext Markup Language), is mainly designed for information display, not for semantic representation. Therefore, current Web search engines usually treat Web pages as unstructured documents, and traditional information retrieval (IR) technologies are employed for Web page parsing, indexing, and searching. The unstructured essence of Web pages seriously blocks more accurate search and advanced applications on the Web. For example, many sites contain structured information about various products. Extracting and integrating product information from multiple Web sites could lead to powerful search functions, such as comparison shopping and business intelligence. However, these structured data are embedded in Web pages, and there are no proper traditional methods to extract and integrate them. Another example is the link structure of the Web. If used properly, the information hidden in the links can be exploited to improve search performance and take Web search beyond traditional information retrieval (Page, Brin, Motwani, & Winograd, 1998; Kleinberg, 1998).
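The link-structure signal the passage refers to is the one made famous by PageRank (Page et al., 1998). A minimal power-iteration sketch over a toy link graph, with the graph itself invented for the example:

```python
# Minimal power-iteration PageRank over a toy link graph.

def pagerank(links, damping=0.85, iters=50):
    """links: dict page -> list of outgoing links. Returns page -> score."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
print(max(rank, key=rank.get))  # the most-linked-into page scores highest → c
```

Page "c" is pointed to by both "a" and "b", so it accumulates the most rank, which is exactly the kind of signal invisible to purely content-based IR.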


2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

The background of this article is the issue of how to survey the knowledge around a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs associated with a query keyword are collected through search engine suggestions. Given a query keyword, the authors collect up to around 1,000 suggestions, many of which are redundant. They classify redundant suggestions based on a topic model. However, one limitation of the topic-model-based classification of suggestions is that the granularity of the topics, i.e., the clusters of suggestions, is too coarse. To overcome this coarse-grained classification, the article further applies a word embedding technique to the web pages used during the training of the topic model, in addition to the text of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between suggestions and classify the suggestions within a single topic into finer-grained subtopics based on this similarity. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggestions.
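The finer-grained step can be sketched as similarity-based grouping of suggestions within one topic. The 3-dimensional vectors below stand in for real word embeddings, and the suggestion strings, clustering strategy, and threshold are all illustrative assumptions:

```python
# Sketch: group suggestions within a single topic into subtopics by the
# cosine similarity of their (here, fake) embedding vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def subtopic_clusters(vectors, threshold=0.9):
    """Greedy single pass: join the first cluster whose first member is
    similar enough, else start a new subtopic."""
    clusters = []
    for name, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

vectors = {
    "python install": (0.9, 0.1, 0.0),
    "python setup":   (0.85, 0.15, 0.05),
    "python salary":  (0.1, 0.9, 0.2),
}
print(subtopic_clusters(vectors))
# → [['python install', 'python setup'], ['python salary']]
```

The two installation-related suggestions end up in one subtopic while the career-related one is separated, the kind of split a topic model alone would be too coarse to make.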


Author(s):  
Shengwei Gu ◽  
Xiangfeng Luo ◽  
Hao Wang ◽  
Jing Huang ◽  
Subin Huang

In different contexts, one abstract concept (e.g., fruit) may be mapped onto different concrete instance sets, which is called abstract concept instantiation. It is widely applied in applications such as web search and intelligent recommendation. However, most abstract concept instantiation models have the following problems: (1) they neglect incorrect labels and label incompleteness in the category structure on which instance selection relies; (2) the instance profile used to calculate the relevance between an instance and a contextual constraint is designed subjectively. These problems lead to false predictions in abstract concept instantiation. To tackle them, we propose a novel model for instantiating abstract concepts. First, to alleviate incorrect labels and remedy label incompleteness in the category structure, an improved random-walk algorithm called InstanceRank is proposed, which utilizes not only the category information but also the association information to infer the right instances of an abstract concept. Second, to better measure the relevance between instances and a contextual constraint, we learn the proper instance profile from profiles of different granularities, designed from the text surrounding the instance. Finally, noise reduction and instance filtering are introduced to further enhance the model's performance. Experiments on a Chinese food abstract concept set show that the proposed model effectively reduces the false positives and false negatives of the instantiation results.
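A random walk that combines category and association evidence can be sketched as a walk with restart from the abstract-concept node. The graph, edge weights, and restart rate below are invented; this is the general technique, not the paper's exact InstanceRank formulation:

```python
# Hedged sketch of an InstanceRank-style random walk with restart: the walk
# starts at the abstract concept and follows both category and association
# edges, so instances supported by either kind of evidence rank higher.

def instance_rank(edges, seed, restart=0.3, iters=60):
    """edges: dict node -> {neighbor: weight}. Walk restarts at `seed`."""
    nodes = set(edges) | {n for nbrs in edges.values() for n in nbrs}
    score = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (restart if n == seed else 0.0) for n in nodes}
        for n, nbrs in edges.items():
            total = sum(nbrs.values())
            for m, w in nbrs.items():
                new[m] += (1 - restart) * score[n] * w / total
        score = new
    return score

# "fruit" links to candidates via category edges (weight 1.0) and weaker
# association edges; "truck" has only a noisy, low-weight category edge.
edges = {
    "fruit": {"apple": 1.0, "banana": 1.0, "truck": 0.2},
    "apple": {"fruit": 1.0, "banana": 0.5},
    "banana": {"fruit": 1.0, "apple": 0.5},
    "truck": {"fruit": 0.1},
}
score = instance_rank(edges, seed="fruit")
print(score["apple"] > score["truck"])  # → True
```

The mutually associated instances (apple, banana) reinforce each other, while the mislabeled "truck" receives little mass, which is how the walk compensates for incorrect labels in the category structure.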


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1772
Author(s):  
Amit Kumar Nandanwar ◽  
Jaytrilok Choudhary

Internet technologies are evolving very fast nowadays, and as a result web pages are being generated at an exponential rate. Web page categorization is required for searching and exploring relevant web pages based on users’ queries, and it is a tedious task. The majority of web page categorization techniques ignore the semantic features and contextual knowledge of the web page. This paper proposes a web page categorization method that categorizes web pages based on semantic features and contextual knowledge. Initially, the GloVe model is applied to capture the semantic features of the web pages. Thereafter, a stacked Bidirectional Long Short-Term Memory (BiLSTM) network with a symmetric structure is applied to extract the contextual and latent symmetry information from the semantic features for web page categorization. The performance of the proposed model has been evaluated on the publicly available WebKB dataset. The proposed model shows superiority over the existing state-of-the-art machine learning and deep learning methods.


Author(s):  
GAURAV AGARWAL ◽  
SACHI GUPTA ◽  
SAURABH MUKHERJEE

Today, web servers are the key repositories of information, and the Internet is the means of accessing it. There is a mammoth amount of data on the Internet, and finding the relevant data is a difficult job. A search engine plays a vital role in finding relevant data. A search engine follows three steps: web crawling by a crawler, indexing by an indexer, and searching by a searcher. The web crawler retrieves information about web pages by following every link on a site; this is stored by the search engine, and the content of each web page is then indexed by the indexer. The main role of the indexer is to organize the data so that it can be retrieved quickly, as per user requirements. When a client issues a query, the search engine looks up the results corresponding to that query to provide the best output. The ambition here is to design an algorithm for a search engine that returns the most desirable results for the user's requirements. A ranking method is used by the search engine to rank the web pages. Various ranking approaches are discussed in the literature; in this paper, a ranking algorithm is proposed that is based on the parent-child relationship. The proposed ranking algorithm is based on the priority-assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which was designed for multiprocessor task scheduling. The proposed algorithm works on three variables: the density of keywords, the number of successors of a node, and the age of the web page. Density denotes the occurrence of the keyword on a particular web page. The number of successors represents the outgoing links of a single web page. Age is the freshness value of the web page: the page modified most recently is the freshest and has the smallest age, or the largest freshness value. The proposed technique requires the priority of each page to be set using downward rank values, and the pages are arranged in ascending or descending order of their rank values. Experiments show that the algorithm is valuable: in a comparison with Google, the proposed algorithm performed better on 70% of the test problems.
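A scoring function over the three variables named above (keyword density, successor count, and age) can be sketched as follows. The combination weights, normalizations, and page data are assumptions; the paper adapts HEFT's priority phase rather than this exact weighted sum:

```python
# Illustrative combination of keyword density, outgoing-link count, and
# page freshness into a single rank value. Weights are invented.

def freshness(age_days, max_age_days=365):
    """Recently modified pages get values near 1, the oldest near 0."""
    return max(0.0, 1.0 - age_days / max_age_days)

def page_rank_value(keyword_hits, total_words, successors, age_days,
                    w_density=0.5, w_links=0.2, w_age=0.3):
    density = keyword_hits / total_words
    # squash successor count into [0, 1) so one factor cannot dominate
    link_factor = successors / (successors + 1)
    return w_density * density + w_links * link_factor + w_age * freshness(age_days)

pages = {
    "fresh_dense": page_rank_value(30, 300, 5, age_days=10),
    "stale_sparse": page_rank_value(5, 300, 20, age_days=300),
}
ranked = sorted(pages, key=pages.get, reverse=True)
print(ranked)  # → ['fresh_dense', 'stale_sparse']
```

Sorting by the resulting values gives the ascending/descending ordering of pages that the proposed technique calls for.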


Author(s):  
Nils Pharo

Several studies of Web information searching (Agosto, 2002; Pharo & Järvelin, 2006; Prabha et al., 2007) have pointed out that searchers tend to satisfice. This means that, instead of planning for optimal search outcomes based on the best available knowledge and choosing the best information sources for their purpose, they aim at obtaining satisfactory results with a minimum of effort. It is therefore necessary to study factors other than information needs and sources to explain Web search behaviour. Web information search processes are influenced by the interplay of factors at the micro level, and we need to understand how search-process-related factors, such as the actions performed by the searcher on the system, are influenced by various other factors, e.g. those related to the searcher’s work task, search task, or knowledge about the work task or about searching. The Search Situation Transition (SST) method schema provides a framework for such analysis.


Author(s):  
Jon Atle Gulla ◽  
Hans Olaf Borch ◽  
Jon Espen Ingvaldsen

Due to the large amount of information on the web and the difficulty of relating users’ expressed information needs to document content, large-scale web search engines tend to return thousands of ranked documents. This chapter discusses the use of clustering to help users navigate through the result sets and explore the domain. A newly developed system, HOBSearch, makes use of suffix tree clustering to overcome many of the weaknesses of traditional clustering approaches. Using result snippets rather than full documents, HOBSearch both speeds up clustering substantially and manages to tailor the clustering to the topics indicated in the user’s query. An inherent problem with clustering, though, is the choice of cluster labels. Our experiments with HOBSearch show that cluster labels of acceptable quality can be generated without supervision or predefined structures, and within the constraints imposed by large-scale web search.
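The snippet-level, label-producing idea behind suffix tree clustering can be illustrated with a simplified stand-in: group snippets by a shared word phrase and let the phrase double as the cluster label. Real STC builds a generalized suffix tree; this brute-force n-gram version only sketches the principle, and the snippets are invented:

```python
# Simplified stand-in for suffix tree clustering: snippets sharing a phrase
# form a cluster, and the shared phrase becomes the cluster label.
from collections import defaultdict

def phrases(snippet, n=2):
    """All word n-grams of a snippet, lowercased."""
    words = snippet.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def cluster_by_phrase(snippets, n=2, min_size=2):
    groups = defaultdict(list)
    for snippet in snippets:
        for phrase in phrases(snippet, n):
            groups[phrase].append(snippet)
    # keep only phrases shared by enough snippets
    return {label: docs for label, docs in groups.items() if len(docs) >= min_size}

snippets = [
    "download python tutorial for beginners",
    "best python tutorial online",
    "jaguar car prices",
]
clusters = cluster_by_phrase(snippets)
print(sorted(clusters))  # → ['python tutorial']
```

Because clusters are induced by phrases the documents actually share, the labels come for free, which is the property the chapter's experiments exploit.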


Author(s):  
Constanta-Nicoleta Bodea ◽  
Maria-Iuliana Dascalu ◽  
Adina Lipai

This chapter presents a meta-search approach meant to deliver bibliography from the Internet according to the results trainees obtain on an e-assessment task. The bibliography consists of web pages related to the trainees’ knowledge gaps. The meta-search engine is part of an education recommender system attached to an e-assessment application for project management knowledge. Meta-search means that, for a specific query (or mistake made by the trainee), several search mechanisms for suitable bibliography (further reading) can be applied. The lists of results delivered by the standard search mechanisms are used to build thematically homogeneous groups using an ontology-based clustering algorithm. The clustering process uses an educational ontology and the WordNet lexical database to create its categories. The research is presented in the context of recommender systems and their various applications to the education domain.
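The grouping step can be sketched as bucketing merged results by category terms drawn from an educational ontology. The ontology fragment and result titles below are invented, and the chapter's system also consults WordNet, which is omitted here:

```python
# Sketch: assign meta-search result titles to thematic groups via the
# category terms of a (hypothetical) educational ontology fragment.

ONTOLOGY_CATEGORIES = {
    "risk management": ["risk", "mitigation", "contingency"],
    "scheduling": ["schedule", "gantt", "critical path"],
}

def cluster_results(results):
    """Assign each title to every ontology category whose terms it mentions."""
    groups = {cat: [] for cat in ONTOLOGY_CATEGORIES}
    for title in results:
        lowered = title.lower()
        for cat, terms in ONTOLOGY_CATEGORIES.items():
            if any(term in lowered for term in terms):
                groups[cat].append(title)
    return groups

results = [
    "Risk mitigation strategies in IT projects",
    "Building a Gantt chart step by step",
    "Critical path method explained",
]
groups = cluster_results(results)
print([len(groups[c]) for c in ("risk management", "scheduling")])  # → [1, 2]
```

Each group then maps directly to a knowledge gap revealed by the assessment, so the recommender can serve further reading per topic rather than one flat result list.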

