Users’ Click and Bookmark Based Personalization Using Modified Agglomerative Clustering for Web Search Engine

T. Dhiliphan Rajkumar; S. P. Raja; A. Suruliandi

doi:10.1142/s0218213017300022

Research and Improvement on Content-Based Web Search Engine

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1282 ◽

2012 ◽

Vol 532-533 ◽

pp. 1282-1286

Author(s):

Zhi Chao Lin ◽

Lei Sun ◽

Xiao Liu

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Full Text ◽

Search Engines ◽

Query Expansion ◽

Web Search ◽

Text Search ◽

Web Search Engine ◽

Query Word ◽

The Web

The World Wide Web contains a vast amount of information, and obtaining the required resources quickly and accurately from the web through content-based search engines has become a research focus. Most current full-text web search tools, such as Lucene, a widely used open-source retrieval library in the information retrieval field, are purely keyword based. This may not be sufficient for users searching the web. In this paper, we employ a method to overcome the limitations of current full-text search engines, taking Lucene as the representative example. We propose a Query Expansion and Information Retrieval approach that can help users acquire more accurate content from the web. The Query Expansion component finds candidate expansion words for the query word through WordNet, which contains synonyms in several different senses; in the Information Retrieval component, the query word and its candidate words are used together as the input of the search module to get the result items. Furthermore, the result items can be put into different classes based on the expansion. Some experiments and their results are described in the later part of this paper.
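The expand, search, and classify pipeline described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the hand-written `SENSES` table is a hypothetical stand-in for WordNet, and plain token matching stands in for Lucene's search module.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: each query word maps to its senses,
# and each sense lists candidate (synonym) words. A real system would look
# these up in WordNet instead of a hand-written table.
SENSES: dict[str, dict[str, list[str]]] = {
    "bank": {
        "financial_institution": ["depository", "lender"],
        "river_side": ["shore", "riverbank"],
    },
}


def expand_query(word: str) -> dict[str, list[str]]:
    """Return the candidate expansion words for `word`, grouped by sense."""
    return SENSES.get(word, {})


def search_with_expansion(word: str, docs: dict[str, str]) -> dict[str, list[str]]:
    """Match documents against the query word plus its candidate words and
    put each hit into the class (sense) whose candidates it contains.
    Documents matching only the bare query word are left 'unclassified'."""
    results: dict[str, list[str]] = defaultdict(list)
    senses = expand_query(word)
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        hit_senses = [s for s, cands in senses.items() if tokens & set(cands)]
        for sense in hit_senses:
            results[sense].append(doc_id)
        if not hit_senses and word in tokens:
            results["unclassified"].append(doc_id)
    return dict(results)


docs = {
    "d1": "the lender approved the loan",
    "d2": "we walked along the riverbank",
    "d3": "the bank was closed",
    "d4": "weather report",
}
print(search_with_expansion("bank", docs))
# {'financial_institution': ['d1'], 'river_side': ['d2'], 'unclassified': ['d3']}
```

Grouping hits by the sense of the candidate word that matched is one simple way to realize "classes based on the expansion"; a production system would also rank hits within each class.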

A Study on Web Searching

Intelligent Agents for Data Mining and Information Retrieval ◽

10.4018/978-1-59140-194-0.ch014 ◽

2004 ◽

pp. 208-225

Author(s):

Shanfeng Zhu ◽

Xiaotie Deng ◽

Qizhi Fang ◽

Weimin Zhang

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Experimental Results ◽

Web Searching ◽

Search Results ◽

Total Index ◽

Depth Study ◽

Web Search Engines ◽

The Web

Web search engines are one of the most popular services for helping users find useful information on the Web. Although many studies have estimated the size and overlap of general web search engines, such estimates may not benefit ordinary web searchers, who care more about the overlap of the top N (N=10, 20 or 50) search results for concrete queries than about the overlap of the total index databases. In this study, we present experimental results comparing the overlap of the top N (N=10, 20 or 50) search results from AlltheWeb, Google, AltaVista and WiseNut for the 58 most popular queries, as well as the distance of the overlapped results. These 58 queries were chosen from the WordTracker service, which records the most popular queries submitted to well-known metasearch engines such as MetaCrawler and Dogpile. We divide the 58 queries into three categories for further investigation. Through in-depth study, we observe a number of interesting results: the overlap of the top N results retrieved by different search engines is very small; the search results for queries in different categories behave in dramatically different ways; Google, on average, has the highest overlap among these four search engines; and each search engine tends to adopt a different ranking algorithm independently.
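The abstract does not give exact formulas, so the sketch below assumes common definitions: top-N overlap as the number of shared URLs divided by N, "distance" as the mean absolute difference in rank positions of the shared URLs, and an engine's average overlap as its mean overlap with every other engine. All function names are hypothetical.

```python
from itertools import combinations
from typing import Optional


def top_n_overlap(a: list[str], b: list[str], n: int) -> float:
    """Share of the top-n result URLs that two engines have in common."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(set(a[:n]) & set(b[:n])) / n


def mean_rank_distance(a: list[str], b: list[str], n: int) -> Optional[float]:
    """Average |rank in A - rank in B| over URLs both engines place in their
    top n; None when the two top-n lists share nothing."""
    rank_a = {url: i for i, url in enumerate(a[:n])}
    rank_b = {url: i for i, url in enumerate(b[:n])}
    shared = rank_a.keys() & rank_b.keys()
    if not shared:
        return None
    return sum(abs(rank_a[u] - rank_b[u]) for u in shared) / len(shared)


def average_overlap(results: dict[str, list[str]], n: int) -> dict[str, float]:
    """For each engine, the mean top-n overlap with every other engine."""
    if len(results) < 2:
        raise ValueError("need results from at least two engines")
    totals = {engine: 0.0 for engine in results}
    for e1, e2 in combinations(results, 2):
        o = top_n_overlap(results[e1], results[e2], n)
        totals[e1] += o
        totals[e2] += o
    others = len(results) - 1
    return {engine: total / others for engine, total in totals.items()}
```

Computing these per query and then averaging within each query category would mirror the category-level comparison the study reports.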

A Study on Web Searching

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch115 ◽

2008 ◽

pp. 1926-1937

Author(s):

Shanfeng Zhu ◽

Xiaotie Deng ◽

Qizhi Fang ◽

Weimin Zhang

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Experimental Results ◽

Web Searching ◽

Search Results ◽

Total Index ◽

Depth Study ◽

Web Search Engines ◽

The Web

Web search engines are one of the most popular services for helping users find useful information on the Web. Although many studies have estimated the size and overlap of general web search engines, such estimates may not benefit ordinary web searchers, who care more about the overlap of the top N (N=10, 20 or 50) search results for concrete queries than about the overlap of the total index databases. In this study, we present experimental results comparing the overlap of the top N (N=10, 20 or 50) search results from AlltheWeb, Google, AltaVista and WiseNut for the 58 most popular queries, as well as the distance of the overlapped results. These 58 queries were chosen from the WordTracker service, which records the most popular queries submitted to well-known metasearch engines such as MetaCrawler and Dogpile. We divide the 58 queries into three categories for further investigation. Through in-depth study, we observe a number of interesting results: the overlap of the top N results retrieved by different search engines is very small; the search results for queries in different categories behave in dramatically different ways; Google, on average, has the highest overlap among these four search engines; and each search engine tends to adopt a different ranking algorithm independently.

A Roadmap to Integrate Document Clustering in Information Retrieval

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch003 ◽

2013 ◽

pp. 31-45

Author(s):

R. Subhashini ◽

V.Jawahar Senthil Kumar

Keyword(s):

Information Retrieval ◽

Search Engines ◽

World Wide ◽

Clustering Algorithm ◽

Web Search ◽

Full Potential ◽

Digital Information ◽

Search Results ◽

The World ◽

The Web

The World Wide Web is a large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps users browse the results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results using a phrase-based clustering algorithm, as an alternative to the single ranked list of results returned by search engines. The approach presents the user with a list of clusters. Experimental results verify the method’s feasibility and effectiveness.
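The abstract does not specify the phrase-based algorithm, so the sketch below uses a deliberately simplified stand-in: every word bigram shared by at least `min_size` result snippets becomes a cluster, and the shared phrase itself serves as the readable cluster name. Function names and parameters are hypothetical; a fuller approach such as suffix-tree clustering would also merge overlapping phrase clusters and drop stopword-only phrases.

```python
from collections import defaultdict


def phrases(text: str, n: int = 2) -> set[str]:
    """All word n-grams (bigrams by default) in a lower-cased snippet."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_clusters(snippets: dict[str, str], min_size: int = 2) -> dict[str, list[str]]:
    """Group result snippets by shared phrases. Each qualifying phrase labels
    one cluster; a snippet may appear in several clusters."""
    index: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in snippets.items():
        for p in phrases(text):
            index[p].append(doc_id)
    clusters = {p: sorted(ids) for p, ids in index.items() if len(ids) >= min_size}
    # Larger clusters first; ties broken alphabetically by label.
    return dict(sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0])))


results = {
    "r1": "jaguar car dealer prices",
    "r2": "used jaguar car listings",
    "r3": "jaguar habitat in the rainforest",
    "r4": "the rainforest habitat of big cats",
}
print(phrase_clusters(results))
# {'jaguar car': ['r1', 'r2'], 'the rainforest': ['r3', 'r4']}
```

Using the shared phrase as the label is what gives phrase-based methods their readable cluster names, which is the advantage the abstract claims over traditional clustering.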

Associating Searching on Search Engines to Subsequent Searching on Sites

International Journal of Information Systems in the Service Sector ◽

10.4018/ijisss.2016040103 ◽

2016 ◽

Vol 8 (2) ◽

pp. 30-43

Author(s):

Adan Ortiz-Cordova ◽

Bernard J. Jansen

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Research Study ◽

Search Queries ◽

Web Search Engine ◽

Search Patterns ◽

Search Information

In this research study, the authors investigate the association between external searching (searching on a web search engine) and internal searching (searching on a website). They classify 295,571 external-internal searches, where each search consists of a query submitted to a web search engine followed by one or more queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, then cluster the searching episodes into major searching configurations and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The implications of this study are that external and internal sessions must be considered parts of a continuous search episode, and that online businesses can leverage external search information to target potential consumers more effectively.
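The abstract does not say how queries are classified into states. One common approach in query-log research labels each query by how its terms relate to the previous query's terms; the sketch below assumes that scheme (the rule set and state names are hypothetical, not necessarily the authors') and then counts the most frequent state sequences across episodes.

```python
from collections import Counter
from typing import Optional


def reformulation_state(prev: Optional[str], curr: str) -> str:
    """Label `curr` by comparing its terms with the previous query's terms
    (hypothetical rule set for illustration)."""
    if prev is None:
        return "initial"
    p, c = set(prev.lower().split()), set(curr.lower().split())
    if c == p:
        return "repeat"
    if p < c:
        return "specialization"  # terms were added
    if c < p:
        return "generalization"  # terms were removed
    if p & c:
        return "reformulation"   # some terms kept, some changed
    return "new"                 # no terms in common


def episode_pattern(queries: list[str]) -> tuple[str, ...]:
    """State sequence for one episode: the external (search engine) query
    first, then the internal (on-site) queries in order."""
    states: list[str] = []
    prev: Optional[str] = None
    for q in queries:
        states.append(reformulation_state(prev, q))
        prev = q
    return tuple(states)


def top_patterns(episodes: list[list[str]], k: int = 3) -> list[tuple[tuple[str, ...], int]]:
    """The k most common state sequences across all episodes."""
    return Counter(episode_pattern(e) for e in episodes).most_common(k)
```

Counting whole sequences is the simplest notion of a "search pattern"; clustering episodes into broader configurations, as the study does, would group similar sequences rather than require exact matches.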

Web Search Engine Architectures and their Performance Analysis

Handbook of Research on Web Information Systems Quality ◽

10.4018/978-1-59904-847-5.ch028 ◽

2011 ◽

pp. 491-509

Author(s):

Xiannong Meng

Keyword(s):

Performance Analysis ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

General Purpose ◽

Performance Measurements ◽

Web Documents ◽

System Architectures ◽

Web Search Engine ◽

And Performance

This chapter surveys various technologies involved in a Web search engine with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and help us understand how Web search engines work and how to use the search engines more effectively and efficiently.

Deep Web

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch062 ◽

2009 ◽

pp. 581-588 ◽

Cited By ~ 5

Author(s):

Denis Shestakov

Keyword(s):

Search Engines ◽

Large Scale ◽

Web Search ◽

Web Database ◽

Web Search Engine ◽

Search Form ◽

Complete Set ◽

Web Crawlers ◽

Pass Through ◽

The Web

Finding information on the Web using a web search engine is one of the primary activities of today’s web users. For a majority of users, the results returned by conventional search engines represent an essentially complete set of links to all pages on the Web relevant to their queries. However, current-day search engines do not crawl and index a significant portion of the Web, and hence web users relying only on search engines are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters that a user provides via web search forms are not indexed by search engines and cannot be found in their results. Such search interfaces give web users online access to myriad databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale.

Enhancing Web Search through Query Log Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch083 ◽

2011 ◽

pp. 438-442

Author(s):

Ji-Rong Wen

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Web Mining ◽

Web Search ◽

Information Source ◽

Query Log ◽

Additional Information ◽

Query Logs ◽

Query Log Mining ◽

The Web

A Web query log is a file that records the activities of users of a search engine. Compared with the traditional information retrieval setting, in which documents are the only available information source, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering and personalized search, can be employed to improve the performance of Web search.
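Log-based query clustering, one of the techniques listed above, is often based on the intuition that two queries are related when users click the same URLs for both. The sketch below assumes that click-through signal, Jaccard similarity, and single-link grouping with a hypothetical `threshold`; it illustrates the idea rather than any specific published system.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Single-link clustering of queries: two queries end up in the same
    cluster when the Jaccard similarity of their clicked-URL sets reaches
    `threshold`, directly or through a chain of such pairs."""
    parent = {q: q for q in clicks}

    def find(q: str) -> str:
        # Follow parent links to the cluster representative, halving paths.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for q1, q2 in combinations(clicks, 2):
        if jaccard(clicks[q1], clicks[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    groups: dict[str, set[str]] = {}
    for q in clicks:
        groups.setdefault(find(q), set()).add(q)
    return list(groups.values())


log = {
    "cheap flights": {"a.com", "b.com"},
    "low cost airfare": {"a.com", "b.com", "c.com"},
    "python tutorial": {"d.org"},
}
print(sorted(sorted(g) for g in cluster_queries(log)))
# [['cheap flights', 'low cost airfare'], ['python tutorial']]
```

Queries that share no words, like "cheap flights" and "low cost airfare", can still be grouped this way, which is why query logs add information that document text alone does not provide.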

Eysenbach, Tuische and Diepgen’s Evaluation of Web Searching for Identifying Unpublished Studies for Systematic Reviews: An Innovative Study Which is Still Relevant Today

Evidence Based Library and Information Practice ◽

10.18438/b8f049 ◽

2016 ◽

Vol 11 (3) ◽

pp. 108

Author(s):

Simon Briscoe

Keyword(s):

Systematic Review ◽

Clinical Trials ◽

Search Engine ◽

Systematic Reviews ◽

Search Engines ◽

Web Search ◽

Web Searching ◽

Web Searches ◽

Web Search Engine ◽

Unpublished Studies

A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001). Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3), 203-218. http://dx.doi.org/10.1080/14639230110075459

Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews.

Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies (specifically, clinical trials) for systematic reviews that did not use a web search engine.

Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany.

Subjects – n/a

Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies.

Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Of the 14 studies, the systematic review authors considered 2 irrelevant to the corresponding systematic review. The relevance of a further three studies could not be clearly ascertained. This left nine studies which were considered relevant to a systematic review. In addition to this main finding, the pilot study to identify suitable search engines found that AltaVista was the only search engine able to handle the complex searches required to search for unpublished studies.

Conclusion – Web searches using a search engine have the potential to identify studies for systematic reviews. However, web search engines have considerable limitations which impede the identification of studies.
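The Methods describe adapting a systematic-review strategy by combining terms for the intervention, problem, and study type. A minimal sketch, assuming generic Boolean syntax (OR within a concept, AND between concepts, `*` for truncation, quotes for phrases) rather than AltaVista's exact operators; the helper names and the example terms are hypothetical, and proximity operators are not modelled.

```python
def or_block(terms: list[str]) -> str:
    """OR together the synonyms for one concept; quote multi-word phrases."""
    if not terms:
        raise ValueError("each concept needs at least one search term")
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(intervention: list[str], problem: list[str], study_type: list[str]) -> str:
    """AND the intervention, problem and study-type blocks into one Boolean query."""
    return " AND ".join(or_block(b) for b in (intervention, problem, study_type))


print(build_query(["acupuncture"], ["migraine", "headache*"], ["randomized", "controlled trial"]))
# (acupuncture) AND (migraine OR headache*) AND (randomized OR "controlled trial")
```

Keeping each concept in its own parenthesized OR block means a hit must mention all three concepts, which is what makes the query specific enough to surface trial reports rather than general pages.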

An investigation of biases in web search engine query suggestions

Online Information Review ◽

10.1108/oir-11-2018-0341 ◽

2019 ◽

Vol 44 (2) ◽

pp. 365-381 ◽

Cited By ~ 1

Author(s):

Malte Bonart ◽

Anastasiia Samokhina ◽

Gernot Heisenberg ◽

Philipp Schaer

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Query Suggestion ◽

Data Set ◽

Content Type ◽

Web Search Engine ◽

The Stability ◽

Query Suggestions ◽

Over Time

Purpose – Survey-based studies suggest that search engines are trusted more than social media or even traditional news, although cases of false information or defamation are known. The purpose of this paper is to analyze the query suggestion features of three search engines to see whether these features introduce bias into the query and search process that might compromise this trust. The authors test the approach on person-related search suggestions by querying the names of politicians from the German Bundestag before the German federal election of 2017.

Design/methodology/approach – This study introduces a framework to systematically examine and automatically analyze the variety of query suggestions for person names offered by major search engines. To test the framework, the authors collected data from the Google, Bing and DuckDuckGo query suggestion APIs over a period of four months for 629 different names of German politicians. The suggestions were clustered and statistically analyzed with regard to different biases, such as gender, party or age, and with regard to the stability of the suggestions over time.

Findings – Using the framework, the authors located three semantic clusters within the data set: suggestions related to politics and economics, location information, and personal and other miscellaneous topics. Among other effects, the results show a small bias in that male politicians receive slightly fewer suggestions on “personal and misc” topics. The stability analysis of the suggested terms over time shows that some suggestions are prevalent most of the time, while others fluctuate more often.

Originality/value – This study proposes a novel framework to automatically identify biases in web search engine query suggestions for person-related searches. Applying this framework to a set of person-related query suggestions provides first insights into the influence search engines can have on the query process of users who seek information on politicians.
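The abstract does not define its stability measure. One simple reading, assumed here, is the share of collection snapshots (for example, daily API calls for one politician's name) in which a suggestion appears; suggestions above a hypothetical `cutoff` count as prevalent and the rest as fluctuating. The example suggestion strings are invented for illustration.

```python
from collections import Counter


def suggestion_stability(snapshots: list[list[str]]) -> dict[str, float]:
    """Share of snapshots in which each suggestion appeared at least once."""
    if not snapshots:
        return {}
    counts = Counter(s for snap in snapshots for s in set(snap))
    return {s: c / len(snapshots) for s, c in counts.items()}


def split_by_stability(stability: dict[str, float], cutoff: float = 0.5) -> tuple[list[str], list[str]]:
    """Separate prevalent suggestions (present in at least `cutoff` of the
    snapshots) from fluctuating ones."""
    prevalent = sorted(s for s, f in stability.items() if f >= cutoff)
    fluctuating = sorted(s for s, f in stability.items() if f < cutoff)
    return prevalent, fluctuating


snaps = [["party", "age"], ["party", "wife"], ["party", "age"], ["party"]]
stab = suggestion_stability(snaps)
print(split_by_stability(stab))
# (['age', 'party'], ['wife'])
```

Converting each snapshot to a set first ensures a suggestion repeated within one API response is counted only once for that snapshot.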