Research and Improvement on Content-Based Web Search Engine

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1282 ◽

2012 ◽

Vol 532-533 ◽

pp. 1282-1286

Author(s):

Zhi Chao Lin ◽

Lei Sun ◽

Xiao Liu

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Full Text ◽

Search Engines ◽

Query Expansion ◽

Web Search ◽

Text Search ◽

Web Search Engine ◽

Query Word ◽

The Web

The World Wide Web contains a vast amount of information. Retrieving the required resources quickly and accurately through content-based search engines has become a research focus. Most current full-text web search tools, such as Lucene, a widely used open-source retrieval library in the information retrieval field, are purely keyword based. This may not be sufficient for users searching the web. In this paper, we employ a method to overcome the limitations of current full-text search engines, as represented by Lucene. We propose a Query Expansion and Information Retrieval approach that can help users acquire more accurate content from the web. The Query Expansion component finds expanded candidate words for the query word through WordNet, which contains synonyms in several different senses. In the Information Retrieval component, the query word and its candidate words are used together as the input of the search module to obtain the result items. Furthermore, the result items can be put into different classes based on the expansion. Experiments and their results are described in the latter part of this paper.
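The expand-then-search flow described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SYNONYMS` dictionary is a hand-written, hypothetical stand-in for WordNet, `DOCUMENTS` is toy data, and the function names are invented for illustration. A plain token match stands in for the Lucene search module.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: each query word maps to senses,
# and each sense lists its synonyms. Real WordNet data would replace this.
SYNONYMS = {
    "car": {
        "vehicle": ["automobile", "auto"],
        "rail": ["railcar", "carriage"],
    },
}

# Toy document collection (assumption, for illustration only).
DOCUMENTS = {
    1: "a used automobile for sale",
    2: "the railcar left the station",
    3: "car rental prices",
    4: "weather forecast for today",
}


def expand_query(word):
    """Return {sense: [candidate words]} for the query word."""
    return SYNONYMS.get(word, {})


def search(word):
    """Match the query word and its candidates, grouping hits by sense."""
    results = defaultdict(list)
    expansion = expand_query(word)
    for doc_id, text in DOCUMENTS.items():
        tokens = set(text.lower().split())
        if word in tokens:
            results["original"].append(doc_id)
        for sense, candidates in expansion.items():
            # A document joins a sense class if it contains any candidate.
            if tokens & set(candidates):
                results[sense].append(doc_id)
    return dict(results)
```

Calling `search("car")` groups the hits into one class per sense plus a class for the original word, which mirrors the idea of putting result items into classes based on the expansion.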

Users’ Click and Bookmark Based Personalization Using Modified Agglomerative Clustering for Web Search Engine

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213017300022 ◽

2017 ◽

Vol 26 (06) ◽

pp. 1730002 ◽

Cited By ~ 3

Author(s):

T. Dhiliphan Rajkumar ◽

S. P. Raja ◽

A. Suruliandi

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

Irrelevant Information ◽

Experimental Results ◽

Agglomerative Clustering ◽

Semantic Level ◽

Web Search Engine ◽

The Web

Short and ambiguous queries are major problems for search engines, leading to the retrieval of information irrelevant to the user’s input. The growing volume of information on the web also makes it difficult for a search engine to provide the results users need. Web search engines suffer from ambiguity because queries are examined on a rational level rather than the semantic level. In this paper, to improve search engine performance with respect to users’ interests, personalization based on users’ clicks and bookmarks is proposed. Modified agglomerative clustering is used in this work for clustering the results. The experimental results show that the proposed work achieves better precision, recall and F-score.
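The abstract does not say how the agglomerative clustering is modified, so the sketch below shows only plain single-link agglomerative clustering as a point of reference. The Jaccard similarity on word sets, the `threshold` stopping rule, and all names are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    a, b = set(a.split()), set(b.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(items, threshold):
    """Merge the two most similar clusters until none reach the threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: similarity of the closest pair of members.
                sim = max(
                    jaccard(items[i], items[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        if best[0] < threshold:
            break  # no remaining pair is similar enough to merge
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]
    return [sorted(c) for c in clusters]
```

With result snippets for an ambiguous query such as "jaguar", the car-related and animal-related snippets end up in separate clusters as long as the threshold sits between the within-topic and cross-topic similarities.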

Quantification of competitive value of documents

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun200957050285 ◽

2009 ◽

Vol 57 (5) ◽

pp. 285-290

Author(s):

Pavel Šimek ◽

Jiří Vaněk ◽

Jan Jarolímek

Keyword(s):

Search Engine ◽

Market Share ◽

Full Text ◽

Search Engines ◽

Web Site ◽

Optimization Techniques ◽

Text Search ◽

Full Text Search ◽

Google Search ◽

The Web

The majority of Internet users search the global network for information using full-text search engines such as Google, Yahoo!, or Seznam. Web presentation operators try, with the help of various optimization techniques, to reach the top places in the results of full-text search engines. This is where Search Engine Optimization and Search Engine Marketing become very important, because ordinary users usually follow links only on the first few result pages for given keywords, and in catalogs they primarily use the hierarchically higher-placed links in each category. The key to success is applying optimization methods that deal with keywords, the structure and quality of content, domain names, individual sites, and the quantity and reliability of backward links. The process is demanding, long-lasting, and without a guaranteed outcome. Without advanced analytical tools, a website operator cannot identify the contribution of the individual documents of which the entire web site consists. If web presentation operators want an overview of their documents and of the web site as a whole, it is appropriate to quantify these positions in a specific way, depending on specific keywords. The quantification of the competitive value of documents serves this purpose, and in turn sets the global competitive value of a web site. Quantification of competitive values is performed on a specific full-text search engine; each full-text search engine can give, and often does give, different results. According to published reports by the ClickZ agency or Market Share, Google is the search engine most widely used by English-speaking users, measured by the number of searches, with a market share of more than 80%. The overall procedure for quantifying competitive values is common to all engines; however, the initial step, the analysis of keywords, depends on the choice of full-text search engine.

A Roadmap to Integrate Document Clustering in Information Retrieval

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch003 ◽

2013 ◽

pp. 31-45

Author(s):

R. Subhashini ◽

V. Jawahar Senthil Kumar

Keyword(s):

Information Retrieval ◽

Search Engines ◽

World Wide ◽

Clustering Algorithm ◽

Web Search ◽

Full Potential ◽

Digital Information ◽

Search Results ◽

The World ◽

The Web

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user’s quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm. It is an alternative to the single ordered result list of search engines. This approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.
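One way to see why phrase-based clusters get readable names is to group result snippets by the phrases they share, so that the shared phrase itself becomes the cluster label. The sketch below does this with two-word phrases only. It is an illustrative assumption, not the algorithm from the chapter, and the function names and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of two-word phrases in a snippet."""
    words = text.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}


def phrase_clusters(snippets, min_size=2):
    """Group snippet indices by shared phrase; the phrase labels the cluster."""
    by_phrase = defaultdict(list)
    for idx, snippet in enumerate(snippets):
        for phrase in bigrams(snippet):
            by_phrase[phrase].append(idx)
    # Keep only phrases shared by at least min_size snippets.
    return {p: ids for p, ids in by_phrase.items() if len(ids) >= min_size}
```

The returned dictionary maps each label to the snippets it covers, which could be presented to the user as a list of named clusters instead of a single ranked list.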

Enhancing Web Search through Query Expansion

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch116 ◽

2011 ◽

pp. 752-757 ◽

Cited By ~ 2

Author(s):

Daniel Crabtree

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Query Expansion ◽

Web Search ◽

User Involvement ◽

Semantic Knowledge ◽

Web Pages ◽

Search Performance ◽

Interactive Query ◽

Web Search Engines

Web search engines help users find relevant web pages by returning a result set containing the pages that best match the user’s query. When the identified pages have low relevance, the query must be refined to capture the search goal more effectively. However, finding appropriate refinement terms is difficult and time consuming for users, so researchers developed query expansion approaches to identify refinement terms automatically. There are two broad approaches to query expansion, automatic query expansion (AQE) and interactive query expansion (IQE) (Ruthven et al., 2003). AQE has no user involvement, which is simpler for the user, but limits its performance. IQE has user involvement, which is more complex for the user, but means it can tackle more problems such as ambiguous queries. Searches fail by finding too many irrelevant pages (low precision) or by finding too few relevant pages (low recall). AQE has a long history in the field of information retrieval, where the focus has been on improving recall (Velez et al., 1997). Unfortunately, AQE often decreased precision as the terms used to expand a query often changed the query’s meaning (Croft and Harper (1979) identified this effect and named it query drift). The problem is that users typically consider just the first few results (Jansen et al., 2005), which makes precision vital to web search performance. In contrast, IQE has historically balanced precision and recall, leading to an earlier uptake within web search. However, like AQE, the precision of IQE approaches needs improvement. Most recently, approaches have started to improve precision by incorporating semantic knowledge.
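The two failure modes named in this abstract, low precision and low recall, follow the standard set-based definitions, which can be written down in a few lines. The sketch below uses those standard definitions plus the F1 score; the function name and the sample document IDs are illustrative assumptions.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved pages that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant pages that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, retrieving pages 1 to 4 when pages 2, 4, and 5 are relevant gives two hits, so precision is 2/4 and recall is 2/3. Adding expansion terms tends to enlarge the retrieved set, which can raise recall while lowering precision, the trade-off the abstract attributes to AQE.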

Associating Searching on Search Engines to Subsequent Searching on Sites

International Journal of Information Systems in the Service Sector ◽

10.4018/ijisss.2016040103 ◽

2016 ◽

Vol 8 (2) ◽

pp. 30-43

Author(s):

Adan Ortiz-Cordova ◽

Bernard J. Jansen

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Research Study ◽

Search Queries ◽

Web Search Engine ◽

Search Patterns ◽

Search Information

In this research study, the authors investigate the association between external searching, which is searching on a web search engine, and internal searching, which is searching on a website. They classify 295,571 external – internal searches where each search is composed of a search engine query that is submitted to a web search engine and then one or more subsequent queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, then cluster the searching episodes into major searching configurations and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The research implications of this study are that external sessions and internal sessions must be considered as part of a continuous search episode and that online businesses can leverage external search information to more effectively target potential consumers.

Web Search Engine Architectures and their Performance Analysis

Handbook of Research on Web Information Systems Quality ◽

10.4018/978-1-59904-847-5.ch028 ◽

2011 ◽

pp. 491-509

Author(s):

Xiannong Meng

Keyword(s):

Performance Analysis ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

General Purpose ◽

Performance Measurements ◽

Web Documents ◽

System Architectures ◽

Web Search Engine ◽

And Performance

This chapter surveys various technologies involved in a Web search engine with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and help us understand how Web search engines work and how to use the search engines more effectively and efficiently.

Deep Web

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch062 ◽

2009 ◽

pp. 581-588 ◽

Cited By ~ 5

Author(s):

Denis Shestakov

Keyword(s):

Search Engines ◽

Large Scale ◽

Web Search ◽

Web Database ◽

Web Search Engine ◽

Search Form ◽

Complete Set ◽

Web Crawlers ◽

Pass Through ◽

The Web

Finding information on the Web using a web search engine is one of the primary activities of today’s web users. For a majority of users, the results returned by conventional search engines are an essentially complete set of links to all pages on the Web relevant to their queries. However, current-day search engines do not crawl and index a significant portion of the Web and, hence, web users relying only on search engines are unable to discover and access a large amount of information from the non-indexable part of the Web. Specifically, dynamic pages generated from parameters provided by a user via web search forms are not indexed by search engines and cannot be found in their results. Such search interfaces provide web users with online access to myriads of databases on the Web. In order to obtain some information from a web database of interest, a user issues his/her query by specifying query terms in a search form and receives the query results, a set of dynamic pages which embed the required information from a database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale.

Enhancing Web Search through Query Log Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch083 ◽

2011 ◽

pp. 438-442

Author(s):

Ji-Rong Wen

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Web Mining ◽

Web Search ◽

Information Source ◽

Query Log ◽

Additional Information ◽

Query Logs ◽

Query Log Mining ◽

The Web

A web query log is a file that keeps track of the activities of the users of a search engine. Compared with the traditional information retrieval setting, in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering and personalized search, could be employed to improve the performance of Web search.
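As a rough illustration of how a query log can act as an extra information source, the sketch below treats two queries as related when users clicked the same URL after issuing them. This is a simplified assumption about log-based query clustering, not the method from the chapter; the `LOG` records, the example domains, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical log records: (query, clicked URL). Real logs carry more fields.
LOG = [
    ("cheap flights", "flights.example"),
    ("airline tickets", "flights.example"),
    ("python tutorial", "docs.example"),
    ("learn python", "docs.example"),
    ("cheap flights", "deals.example"),
]


def clicks_by_query(log):
    """Collect the set of clicked URLs for each distinct query."""
    clicks = defaultdict(set)
    for query, url in log:
        clicks[query].add(url)
    return clicks


def related_queries(log, query):
    """Return other queries that share at least one clicked URL."""
    clicks = clicks_by_query(log)
    target = clicks.get(query, set())
    return sorted(
        other for other, urls in clicks.items()
        if other != query and urls & target
    )
```

The related queries found this way could serve either as a cluster of equivalent queries or as candidate expansion terms, two of the techniques the abstract lists.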

Full-Text Search Engine using MySQL

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2010.5.2233 ◽

2010 ◽

Vol 5 (5) ◽

pp. 735

Author(s):

Cornelia Gyorodi ◽

Robert Gyorodi ◽

George Pecherle ◽

George Mihai Cornea

Keyword(s):

Search Engine ◽

Full Text ◽

Bag Of Words ◽

Text Search ◽

Full Text Search ◽

Medium Scale ◽

The Web ◽

Spelling Mistake

In this article we explain how to create a search engine using the powerful MySQL full-text search. The ever-increasing demands of the web require cheap and elaborate search options. One of the most important requirements for a search engine is the capacity to order its result set by relevance and to provide the user with suggestions in the case of a spelling mistake or a small result set. To fulfill this requirement, we use the MySQL full-text search. This option is suitable for small to medium scale websites. To provide sounds-like capabilities, a second table is created that contains a bag of words from the main table together with the corresponding metaphone. When a suggestion is needed, this table is queried for the metaphone of the searched word, and a suggestion is computed from the result set.
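The suggestion table described in this abstract pairs each indexed word with a phonetic code, so a misspelled query can be looked up by how it sounds. The sketch below imitates that lookup in memory. It makes two loud assumptions: `phonetic_key` is a crude, hypothetical stand-in and not the real Metaphone algorithm, and a Python dictionary replaces the second MySQL table.

```python
from collections import defaultdict


def phonetic_key(word):
    """Crude sound key (NOT real Metaphone): keep the first letter,
    drop later vowels plus y, h, w, and collapse repeated letters."""
    word = word.lower()
    key = word[:1]
    for ch in word[1:]:
        if ch in "aeiouyhw":
            continue
        if ch != key[-1]:
            key += ch
    return key


def build_table(words):
    """Map each sound key to the indexed words that produce it."""
    table = defaultdict(list)
    for word in words:
        table[phonetic_key(word)].append(word)
    return table


def suggest(table, word):
    """Return indexed words that sound like the (possibly misspelled) word."""
    return [w for w in table.get(phonetic_key(word), []) if w != word]
```

Building the table from the words `search`, `engine`, and `database` and then asking for a suggestion for `serch` returns `search`, because both spellings reduce to the same key. A production system would use a proper phonetic algorithm and rank several candidates.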

Eysenbach, Tuische and Diepgen’s Evaluation of Web Searching for Identifying Unpublished Studies for Systematic Reviews: An Innovative Study Which is Still Relevant Today

Evidence Based Library and Information Practice ◽

10.18438/b8f049 ◽

2016 ◽

Vol 11 (3) ◽

pp. 108

Author(s):

Simon Briscoe

Keyword(s):

Systematic Review ◽

Clinical Trials ◽

Search Engine ◽

Systematic Reviews ◽

Search Engines ◽

Web Search ◽

Web Searching ◽

Web Searches ◽

Web Search Engine ◽

Unpublished Studies

A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001). Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3), 203-218. http://dx.doi.org/10.1080/14639230110075459

Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews.

Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies – specifically, clinical trials – for systematic reviews which did not use a web search engine.

Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany.

Subjects – n/a

Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine, AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies.

Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Out of the 14 studies, 2 were considered irrelevant to the corresponding systematic review by the systematic review authors. The relevance of a further three studies could not be clearly ascertained. This left nine studies which were considered relevant to a systematic review. In addition to this main finding, the pilot study to identify suitable search engines found that AltaVista was the only search engine able to handle the complex searches required to search for unpublished studies.

Conclusion – Web searches using a search engine have the potential to identify studies for systematic reviews. Web search engines have considerable limitations which impede the identification of studies.