An Intelligent Web Search Using Multi-Document Summarization

2016 ◽  
Vol 6 (2) ◽  
pp. 41-65 ◽  
Author(s):  
Sheetal A. Takale ◽  
Prakash J. Kulkarni ◽  
Sahil K. Shah

Information available on the internet is huge, diverse, and dynamic. Current search engines provide intelligent help to internet users: for a query, they return a list of the best-matching or most relevant web pages. However, the information for a query is often spread across multiple pages returned by the search engine, which degrades the quality of the search results; the search engines are drowning in information but starving for knowledge. Here, we present a query-focused extractive summarization of search engine results. We propose a two-level summarization process: identification of relevant theme clusters, and selection of top-ranking sentences to form a summarized result for the user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is achieved by applying the MDL principle, and sentence clustering and ranking are done using symmetric non-negative matrix factorization (SNMF). The experiments conducted demonstrate the effectiveness of the system in semantic text understanding, document clustering, and summarization.
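
As a rough illustration of the sentence clustering and ranking step, the sketch below factorizes a toy sentence-similarity matrix with symmetric non-negative matrix factorization (W ≈ HHᵀ) using a damped multiplicative update; the similarity values, cluster count, and ranking proxy are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def snmf(W, k, iters=200, eps=1e-9, seed=0):
    """Symmetric NMF: approximate a similarity matrix W (n x n) by H @ H.T."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(iters):
        # Damped multiplicative update; keeps H non-negative.
        H *= 0.5 + 0.5 * (W @ H) / (H @ (H.T @ H) + eps)
    return H

# Toy symmetric, non-negative sentence-similarity matrix.
W = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
H = snmf(W, k=2)
clusters = H.argmax(axis=1)   # each sentence goes to its strongest cluster
strength = H.max(axis=1)      # within-cluster strength as a simple ranking proxy
print(clusters, strength)
```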

2021 ◽  
Author(s):  
Xiangyi Chen

Text, link, and usage information are the most commonly used sources in the ranking algorithm of a web search engine. In this thesis, we argue that the quality of a web page, such as the performance of page delivery (e.g., reliability and response time), should also play an important role in ranking, especially for users with a slow Internet connection or mobile users. Based on this principle, if two pages have the same level of relevancy to a query, the one with the higher delivery quality (e.g., a faster response) should be ranked higher. We define several important Quality of Service (QoS) attributes and explain how we rank web pages based on them. In addition, we have tested and compared different aggregation algorithms for combining these QoS attributes. The experimental results show that the proposed algorithms promote pages with higher delivery quality to higher positions in the result list, which helps users improve their overall experience of using the search engine, and that the QoS-based re-ranking algorithm consistently achieves the best performance.
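
A minimal sketch of the QoS re-ranking idea under simplifying assumptions: relevance is combined with a normalized delivery-quality score by a weighted sum, which is only one possible aggregation scheme; the attribute names, weights, and normalization are illustrative, not the thesis's actual algorithms.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    relevance: float      # score from the text/link/usage ranker, in [0, 1]
    response_time: float  # seconds, lower is better
    reliability: float    # fraction of successful fetches, in [0, 1]

def qos_score(p: Page, max_rt: float = 5.0) -> float:
    # Normalize response time so that faster pages score closer to 1.
    rt_score = max(0.0, 1.0 - p.response_time / max_rt)
    return 0.5 * rt_score + 0.5 * p.reliability

def rerank(pages, w_rel=0.7, w_qos=0.3):
    # Weighted-sum aggregation of relevance and delivery quality.
    return sorted(pages, key=lambda p: w_rel * p.relevance + w_qos * qos_score(p), reverse=True)

results = [Page("a.example", 0.90, 4.0, 0.70),
           Page("b.example", 0.90, 0.3, 0.99)]
print([p.url for p in rerank(results)])  # b.example outranks a.example at equal relevance
```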


2012 ◽  
Vol 532-533 ◽  
pp. 1752-1756 ◽  
Author(s):  
Jun Ya Yan ◽  
Xiao Hui Ma ◽  
Wen Juan Zhao

The development of the internet and the exponential growth of network information produce a large number of duplicated pages on the network, reducing the recall and precision of retrieval and affecting retrieval efficiency. The accuracy of duplicate elimination therefore influences the quality of a search engine. On the basis of a structural text description, this paper proposes an improved duplicate-elimination algorithm based on MD5 fingerprints of near-replicas. Experiments show that the method is effective in improving both recall and precision.
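
A rough sketch of MD5-based near-replica detection, assuming that pages are first reduced to structural text blocks and that two pages sharing most block fingerprints count as near-replicas; the block extraction and overlap threshold are illustrative assumptions, not the paper's exact algorithm.

```python
import hashlib

def block_fingerprints(text: str) -> set[str]:
    """Hash each structural text block (here: non-empty paragraph) with MD5."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return {hashlib.md5(b.encode("utf-8")).hexdigest() for b in blocks}

def is_near_replica(page_a: str, page_b: str, threshold: float = 0.8) -> bool:
    fa, fb = block_fingerprints(page_a), block_fingerprints(page_b)
    if not fa or not fb:
        return False
    overlap = len(fa & fb) / min(len(fa), len(fb))  # shared-block ratio
    return overlap >= threshold

a = "Breaking news about search engines.\n\nDetails of the story follow."
b = "Breaking news about search engines.\n\nDetails of the story follow.\n\nAds here."
print(is_near_replica(a, b))  # True: the pages share most content blocks
```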


2013 ◽  
Vol 25 ◽  
pp. 189-203 ◽  
Author(s):  
Dominik Schlosser

This paper attempts to give an overview of the different representations of the pilgrimage to Mecca found in the ‘liminal space’ of the internet. For that purpose, it examines a handful of emblematic examples of how the hajj is presented and discussed in cyberspace. Special attention is paid to the question of how far issues of religious authority are manifest on these websites: whether the content providers of web pages appoint themselves as authorities by scrutinizing established views of the fifth pillar of Islam, whether they upload already printed texts onto their sites in order to reiterate normative notions of the pilgrimage to Mecca, or whether they make use of search engine optimisation techniques, thus heightening the visibility of their online presence and increasing the possibility of becoming authoritative in shaping internet surfers’ perceptions of the hajj.


2012 ◽  
Vol 170-173 ◽  
pp. 3431-3435
Author(s):  
Yan Chyuan Shiau ◽  
Lian Ting Lu ◽  
Tai Yu Chen ◽  
Chih Ying Lee

As citizens' quality of life gradually improves, traveling has become an important recreational activity. The internet quickly provides information related to tour sites; however, web pages generally present only words and pictures, which are not impressive enough to viewers, and spatial concepts, distance calculation, and tools for vacation planning are often not provided by these websites. This study combines 3dSpace, GoogleMap, the ER Model, Windows Mobile, and SuperPad. It gathers tour-site information for HsinChu City, such as local restaurants, famous attractions, and highly rated hotels in the area, and develops a search interface integrated with the Google Map engine. After a category is selected and specific keywords are entered, the related information for a specific location and a 360° satellite image can be shown in the browser. Route calculation for trips between local attractions is also provided. The investigation adds a GPS function to the smart phone, helping users arrive at their destinations correctly in the minimum time.
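
The distance calculation between attractions could, for example, rest on a great-circle computation; the sketch below is a generic haversine distance with hypothetical HsinChu coordinates, not the system's actual implementation.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS-84 coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical coordinates for two HsinChu attractions.
print(round(haversine_km(24.8138, 120.9675, 24.8045, 120.9718), 2), "km")
```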


Author(s):  
Rizwan Ur Rahman ◽  
Rishu Verma ◽  
Himani Bansal ◽  
Deepak Singh Tomar

With the explosive expansion of information on the world wide web, search engines are becoming more significant in the day-to-day lives of humans. Even though a search engine generally returns a huge number of results for a given query, the majority of search engine users simply view the first few web pages in the result list. Consequently, ranking position has become a major concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. The first part explores the types of spamming and includes a detailed section on vulnerabilities. The next part presents an attack scenario for form spamming together with a defense approach. The aim of this article is thus to provide a review of the vulnerabilities and spamming threats associated with blogging websites, and of effective measures to counter them.
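
As an illustration of one common defense against form spamming (not necessarily the approach presented in the article), the sketch below combines a hidden honeypot field with a minimum fill-time check; the field name and threshold are assumptions.

```python
import time

MIN_FILL_SECONDS = 3.0  # humans rarely submit a comment form faster than this

def looks_like_form_spam(form_data: dict, form_rendered_at: float) -> bool:
    """Reject submissions that fill the hidden honeypot field or arrive too quickly."""
    if form_data.get("website_url_confirm", ""):   # hidden field: humans leave it empty
        return True
    if time.time() - form_rendered_at < MIN_FILL_SECONDS:
        return True
    return False

rendered = time.time() - 0.5  # form rendered half a second ago
bot_post = {"comment": "Buy now!", "website_url_confirm": "http://spam.example"}
print(looks_like_form_spam(bot_post, rendered))  # True
```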


Author(s):  
Ricardo Barros ◽  
Geraldo Xexéo ◽  
Wallace A. Pinheiro ◽  
Jano de Souza

Currently, in the Web environment, users have to deal with an enormous amount of information. In a Web search, they often receive useless, replicated, outdated, or false data, which, at first, they have no means to assess. Web search engines provide good examples of these problems: in the replies from these mechanisms, users usually find links to replicated or conflicting information. Moreover, in these cases, information is spread out among heterogeneous and unrelated data sources that normally adopt different approaches to information quality. This chapter addresses those issues by proposing a Web Metadata-Based Model to evaluate and recommend Web pages based on their information quality, as predicted by their metadata. We adopt a fuzzy theory approach to obtain the values of quality dimensions from metadata values and to evaluate the quality of information, taking advantage of fuzzy logic’s ability to capture humans’ imprecise knowledge and deal with different concepts.
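
A minimal sketch of how a fuzzy approach can turn a metadata value into a quality-dimension score, assuming a "freshness" dimension derived from page age with triangular membership functions; the dimension, breakpoints, and defuzzification weights are illustrative assumptions, not the chapter's actual model.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def freshness_quality(age_days: float) -> float:
    """Fuzzify page age into fresh/aging/stale sets and defuzzify to a [0, 1] score."""
    fresh = triangular(age_days, -1, 0, 90)
    aging = triangular(age_days, 30, 180, 365)
    stale = triangular(age_days, 180, 730, 10_000)
    total = fresh + aging + stale or 1.0
    # Weighted average of the quality levels each fuzzy set stands for.
    return (fresh * 1.0 + aging * 0.5 + stale * 0.1) / total

print(round(freshness_quality(15), 2), round(freshness_quality(400), 2))
```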


2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

The background of this article is the issue of how to give an overview of the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify redundant search engine suggests based on a topic model. However, one limitation of the topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. In order to overcome this problem of coarse-grained classification, the article further applies a word embedding technique to the web pages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and further classify the suggests within a single topic into finer-grained subtopics based on the similarity of their word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
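
A rough sketch of the finer-grained subtopic step under simplifying assumptions: each suggest is represented by a word-embedding vector, and suggests within one topic are grouped greedily by cosine similarity to cluster centroids; the toy vectors, threshold, and grouping rule are illustrative, not the authors' exact procedure.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def subtopic_clusters(suggests, vectors, threshold=0.7):
    """Greedily group suggests of one topic: join the first cluster whose centroid is similar enough."""
    clusters = []  # list of (member-index list, centroid vector)
    for i, vec in enumerate(vectors):
        for members, centroid in clusters:
            if cosine(vec, centroid) >= threshold:
                members.append(i)
                centroid += (vec - centroid) / len(members)  # running-mean update
                break
        else:
            clusters.append(([i], vec.copy()))
    return [[suggests[i] for i in members] for members, _ in clusters]

# Toy example with hypothetical 3-d "embeddings" for suggests of one topic.
suggests = ["price", "cost", "side effects"]
vectors = [np.array([0.9, 0.1, 0.0]), np.array([0.85, 0.2, 0.0]), np.array([0.0, 0.1, 0.95])]
print(subtopic_clusters(suggests, vectors))  # [['price', 'cost'], ['side effects']]
```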


2014 ◽  
Vol 687-691 ◽  
pp. 1908-1911
Author(s):  
Wei Zhong Huang

The universal search engine, which is now widely used, has significantly improved the efficiency of retrieving information. According to the 26th Internet survey by CNNIC (China Internet Network Information Center), search, with a 76.30% share, holds an absolute advantage as a major way for users to obtain information from the Internet. In almost all surveys of Internet use around the world, the search engine is second only to e-mail service. But with the growth of such a wide range of information, universal search engines cannot meet people's needs in either retrieval precision or retrieval efficiency when retrieving information on a particular subject or topic. That is because, as long as users enter the same keywords, the universal search engine returns the same results; it does not take into account the differences in interests and needs that often exist between users. For example, dentists and ceramics enthusiasts would hold different concerns about the term "ceramic". In order to retrieve information on a particular subject or theme more rapidly, accurately, and efficiently, it is essential to develop information retrieval systems for specific areas, that is, domain-specific search engines.


2017 ◽  
Author(s):  
Xi Zhu ◽  
Xiangmiao Qiu ◽  
Dingwang Wu ◽  
Shidong Chen ◽  
Jiwen Xiong ◽  
...  

BACKGROUND: Electronic health practices such as apps and software all rely on web search engines because of their convenience for obtaining information, so the success of electronic health is linked to the success of web search engines in the field of health. Yet the reliability of the information in search engine results remains to be evaluated, and a detailed analysis can reveal shortcomings and provide inspiration. OBJECTIVE: To assess the reliability of information related to women with epilepsy in the results of the main search engines in China. METHODS: Six physicians conducted the searches every week. The search keywords were one of the anti-epileptic drugs (AEDs) valproic acid, oxcarbazepine, levetiracetam, or lamotrigine, plus "huaiyun" or "renshen", both of which mean pregnancy in Chinese. The searches were conducted on different devices (computer/cellphone) and different engines (Baidu/Sogou/360). The top ten results of every search result page were included. Two physicians classified every result into one of nine categories according to its content and also evaluated its reliability. RESULTS: A total of 16,411 search results were included. 85.1% of the web pages carried advertisements, and 55% were categorized as questions and answers according to their contents. Only 9% of the search results were reliable, 50.7% were partly reliable, and 40.3% were unreliable. The higher a search result was ranked, the more advertisements appeared and the larger the proportion of unreliable results became. All content from hospital websites was unreliable, while all content from academic publishing was reliable. CONCLUSIONS: Several principles must be emphasized to further the use of web search engines in the field of healthcare. First, identifying registered physicians and developing an efficient system to guide patients to physicians would guarantee the quality of the information provided. Second, the relevant authorities should restrict excessive advertisement sales in the healthcare area through specific regulations to avoid a negative impact on patients. Third, information from hospital websites should be carefully judged before being embraced wholeheartedly.

