A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings

2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

The background of this article is the issue of how to overview the knowledge of a given query keyword. Especially, the authors focus on concerns of those who search for web pages with a given query keyword. The Web search information needs of a given query keyword is collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, while many of them are redundant. They classify redundant search engine suggests based on a topic model. However, one limitation of the topic model based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. In order to overcome the problem of the coarse-grained classification of search engine suggests, this article further applies the word embedding technique to the webpages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. Then, the authors examine the word embedding based similarity between search engines suggests and further classify search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results prove that the proposed approach performs well in the task of subtopic classification of search engine suggests.

Author(s):  
Rizwan Ur Rahman ◽  
Rishu Verma ◽  
Himani Bansal ◽  
Deepak Singh Tomar

With the explosive expansion of information on the world wide web, search engines are becoming more significant in the day-to-day lives of humans. Even though a search engine generally gives huge number of results for certain query, the majority of the search engine users simply view the first few web pages in result lists. Consequently, the ranking position has become a most important concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. In the first part, the article explores the spamming types and detailed section on vulnerabilities. In the next part, an attack scenario of form spamming is presented, and defense approach is presented. Consequently, the aim of this article is to provide review of vulnerabilities, threats of spamming associated with blogging websites, and effective measures to counter them.


2017 ◽  
pp. 030-050
Author(s):  
J.V. Rogushina ◽  

Problems associated with the improve ment of information retrieval for open environment are considered and the need for it’s semantization is grounded. Thecurrent state and prospects of development of semantic search engines that are focused on the Web information resources processing are analysed, the criteria for the classification of such systems are reviewed. In this analysis the significant attention is paid to the semantic search use of ontologies that contain knowledge about the subject area and the search users. The sources of ontological knowledge and methods of their processing for the improvement of the search procedures are considered. Examples of semantic search systems that use structured query languages (eg, SPARQL), lists of keywords and queries in natural language are proposed. Such criteria for the classification of semantic search engines like architecture, coupling, transparency, user context, modification requests, ontology structure, etc. are considered. Different ways of support of semantic and otology based modification of user queries that improve the completeness and accuracy of the search are analyzed. On base of analysis of the properties of existing semantic search engines in terms of these criteria, the areas for further improvement of these systems are selected: the development of metasearch systems, semantic modification of user requests, the determination of an user-acceptable transparency level of the search procedures, flexibility of domain knowledge management tools, increasing productivity and scalability. In addition, the development of means of semantic Web search needs in use of some external knowledge base which contains knowledge about the domain of user information needs, and in providing the users with the ability to independent selection of knowledge that is used in the search process. There is necessary to take into account the history of user interaction with the retrieval system and the search context for personalization of the query results and their ordering in accordance with the user information needs. All these aspects were taken into account in the design and implementation of semantic search engine "MAIPS" that is based on an ontological model of users and resources cooperation into the Web.


2016 ◽  
Vol 6 (2) ◽  
pp. 41-65 ◽  
Author(s):  
Sheetal A. Takale ◽  
Prakash J. Kulkarni ◽  
Sahil K. Shah

Information available on the internet is huge, diverse and dynamic. Current Search Engine is doing the task of intelligent help to the users of the internet. For a query, it provides a listing of best matching or relevant web pages. However, information for the query is often spread across multiple pages which are returned by the search engine. This degrades the quality of search results. So, the search engines are drowning in information, but starving for knowledge. Here, we present a query focused extractive summarization of search engine results. We propose a two level summarization process: identification of relevant theme clusters, and selection of top ranking sentences to form summarized result for user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is effectively achieved by application of MDL principle and sentence clustering and ranking is done by using SNMF. Experiments conducted demonstrate the effectiveness of system in semantic text understanding, document clustering and summarization.


2017 ◽  
Author(s):  
Xi Zhu ◽  
Xiangmiao Qiu ◽  
Dingwang Wu ◽  
Shidong Chen ◽  
Jiwen Xiong ◽  
...  

BACKGROUND All electronic health practices like app/software are involved in web search engine due to its convenience for receiving information. The success of electronic health has link with the success of web search engines in field of health. Yet information reliability from search engine results remains to be evaluated. A detail analysis can find out setbacks and bring inspiration. OBJECTIVE Find out reliability of women epilepsy related information from the searching results of main search engines in China. METHODS Six physicians conducted the search work every week. Search key words are one kind of AEDs (valproate acid/oxcarbazepine/levetiracetam/ lamotrigine) plus "huaiyun"/"renshen", both of which means pregnancy in Chinese. The search were conducted in different devices (computer/cellphone), different engines (Baidu/Sogou/360). Top ten results of every search result page were included. Two physicians classified every results into 9 categories according to their contents and also evaluated the reliability. RESULTS A total of 16411 searching results were included. 85.1% of web pages were with advertisement. 55% were categorized into question and answers according to their contents. Only 9% of the searching results are reliable, 50.7% are partly reliable, 40.3% unreliable. With the ranking of the searching results higher, advertisement up and the proportion of those unreliable increase. All contents from hospital websites are unreliable at all and all from academic publishing are reliable. CONCLUSIONS Several first principles must be emphasized to further the use of web search engines in field of healthcare. First, identification of registered physicians and development of an efficient system to guide the patients to physicians guarantee the quality of information provided. Second, corresponding department should restrict the excessive advertisement sale trades in healthcare area by specific regulations to avoid negative impact on patients. Third, information from hospital websites should be carefully judged before embracing them wholeheartedly.


Author(s):  
GAURAV AGARWAL ◽  
SACHI GUPTA ◽  
SAURABH MUKHERJEE

Today, web servers, are the key repositories of the information & internet is the source of getting this information. There is a mammoth data on the Internet. It becomes a difficult job to search out the accordant data. Search Engine plays a vital role in searching the accordant data. A search engine follows these steps: Web crawling by crawler, Indexing by Indexer and Searching by Searcher. Web crawler retrieves information of the web pages by following every link on the site. Which is stored by web search engine then the content of the web page is indexed by the indexer. The main role of indexer is how data can be catch soon as per user requirements. As the client gives a query, Search Engine searches the results corresponding to this query to provide excellent output. Here ambition is to enroot an algorithm for search engine which may response most desirable result as per user requirement. In this a ranking method is used by the search engine to rank the web pages. Various ranking approaches are discussed in literature but in this paper, ranking algorithm is proposed which is based on parent-child relationship. Proposed ranking algorithm is based on priority assignment phase of Heterogeneous Earliest Finish Time (HEFT) Algorithm which is designed for multiprocessor task scheduling. Proposed algorithm works on three on range variable its means the density of keywords, number of successors to the nodes and the age of the web page. Density shows the occurrence of the keyword on the particular web page. Numbers of successors represent the outgoing link to a single web page. Age is the freshness value of the web page. The page which is modified recently is the freshest page and having the smallest age or largest freshness value. Proposed Technique requires that the priorities of each page to be set with the downward rank values & pages are arranged in ascending/ Descending order of their rank values. Experiments show that our algorithm is valuable. After the comparison with Google we find that our Algorithm is performing better. For 70% problems our algorithm is working better than Google.


2012 ◽  
pp. 217-238 ◽  
Author(s):  
Orland Hoeber

People commonly experience difficulties when searching the Web, arising from an incomplete knowledge regarding their information needs, an inability to formulate accurate queries, and a low tolerance for considering the relevance of the search results. While simple and easy to use interfaces have made Web search universally accessible, they provide little assistance for people to overcome the difficulties they experience when their information needs are more complex than simple fact-verification. In human-centred Web search, the purpose of the search engine expands from a simple information retrieval engine to a decision support system. People are empowered to take an active role in the search process, with the search engine supporting them in developing a deeper understanding of their information needs, assisting them in crafting and refining their queries, and aiding them in evaluating and exploring the search results. In this chapter, recent research in this domain is outlined and discussed.


Author(s):  
Yasufumi Takama ◽  
Takuya Tezuka ◽  
Hiroki Shibata ◽  
Lieu-Hen Chen ◽  
◽  
...  

This paper estimates users’ search intents when using the context search engine (CSE) by analyzing submitted queries. Recently, due to the increase in the amount of information on the Web and the diversification of information needs, the gap between user’s information needs and a basic search function provided by existing web search engines becomes larger. As a solution to this problem, the CSE that limits its tasks to answer questions about temporal trends has been proposed. It provides three primitive search functions, which users can use in accordance with their purposes. Furthermore, if the system can estimate users’ search intents, it can provide more user-friendly services that contribute the improvement of search efficiency. Aiming at estimating users’ search intents only from submitted queries, this paper analyzes the characteristics of queries in terms of typical search intents when using CSE, and defines classification rules. To show the potential use of the estimated search intents, this paper introduces a learning to rank into CSE. Experimental results show that MAP (mean average precision) is improved by learning rank models separately for different search intents.


2014 ◽  
Vol 971-973 ◽  
pp. 1870-1873
Author(s):  
Xiao Gang Dong

Web search engine based on DNS, the standard proposed solution of IETF for public web search system, is introduced in this paper. Now no web search engine can cover more than 60 percent of all the pages on Internet. The update interval of most pages database is almost one month. This condition hasn't changed for many years. Converge and recency problems have become the bottleneck problem of current web search engine. To solve these problems, a new system, search engine based on DNS is proposed in this paper. This system adopts the hierarchical distributed architecture like DNS, which is different from any current commercial search engine. In theory, this system can cover all the web pages on Internet. Its update interval could even be one day. The original idea, detailed content and implementation of this system all are introduced in this paper.


Our existing society is totally dependent on web search to fulfill our daily requirements. Therefore millions of web pages are accessed every day. To fulfill user need number of websites and webpages are added .The growing size of web data results to the difficulty in attaining useful information with a minimum clicks. This results to the acquisition of personalization a major place in Web search. But the use of personalization breaches privacy in searching. Personalization with privacy is leading issue in current web environment. This paper aims at user satisfaction by using user identification based personalization approach in web search engine. Beside personalization the proposed model creates privacy during personalization. The proposed system will prove to be user friendly with less efforts and privacy concern.


Sign in / Sign up

Export Citation Format

Share Document