Using a Graph-Based Data Mining System to Perform Web Search

Author(s):  
Diane J. Cook ◽  
Nitish Manocha ◽  
Lawrence B. Holder

The World Wide Web provides an immense source of information. Accessing information of interest presents a challenge to scientists and analysts, particularly if the desired information is structural in nature. Our goal is to design a structural search engine that uses the hyperlink structure of the Web, in addition to textual information, to search for sites of interest. Our structural search engine, called WebSUBDUE, searches not only for particular words or topics but also for a desired hyperlink structure. Enhanced by WordNet text functions, our search engine retrieves sites corresponding to structures formed by graph-based user queries. We hypothesize that this system can form the heart of a structural query engine, and demonstrate the approach on a number of structural web queries.
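As an illustration of what a structural query over hyperlinks might look like, the following minimal sketch matches a small labeled query graph against a toy set of pages. It is a brute-force matcher written for this example, not the WebSUBDUE algorithm; the function `find_matches`, the page names, and the data layout are all hypothetical.

```python
from itertools import permutations

def find_matches(pages, links, query_labels, query_edges):
    """Return tuples of page ids that match the query structure.

    pages: dict mapping page id -> set of words on the page (toy data)
    links: set of (source, target) hyperlink pairs
    query_labels: one required word per query vertex
    query_edges: (i, j) index pairs meaning vertex i must link to vertex j
    """
    matches = []
    # Try every ordered assignment of distinct pages to query vertices.
    for candidate in permutations(sorted(pages), len(query_labels)):
        labels_ok = all(word in pages[page]
                        for page, word in zip(candidate, query_labels))
        edges_ok = all((candidate[i], candidate[j]) in links
                       for i, j in query_edges)
        if labels_ok and edges_ok:
            matches.append(candidate)
    return matches

# Hypothetical pages and hyperlinks.
pages = {
    "home": {"department", "research"},
    "faculty": {"professor", "research"},
    "course": {"syllabus", "professor"},
    "news": {"sports"},
}
links = {("home", "faculty"), ("faculty", "course"), ("home", "news")}

# Query: a "department" page linking to a "professor" page,
# which in turn links to a "syllabus" page.
result = find_matches(pages, links,
                      ["department", "professor", "syllabus"],
                      [(0, 1), (1, 2)])
print(result)
```

Exhaustive matching like this only works on tiny graphs; a real system would need a far more efficient search.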

2013 ◽  
Vol 8 (3) ◽  
pp. 913-921 ◽  
Author(s):  
Noryusliza Abdullah ◽  
Rosziati Ibrahim

Semantic Web approaches assisted by ontologies are widely used to make applications that retrieve information and knowledge more reliable. Such approaches can discover World Wide Web (WWW) content that is presented in natural-language text. Previous research has shown that combining categorization with the ontology concept gives better results. However, a hybrid search engine that adds a further technique, user profiling, has promising potential to enhance the search process. The contributions of this research are more efficient use of search time and more relevant results. The proposed hybrid technique integrates ontologies, categorization, and user profiling. In user profiling, a similarity measure is adopted to compare two different ontologies. WordNet and UTHM Onto are the independent ontologies used in this process. Preliminary experiments have given interesting results in terms of data arrangement and time usage.


2006 ◽  
Vol 1 (3) ◽  
pp. 67
Author(s):  
David Hook

A review of: Jansen, Bernard J., and Amanda Spink. “How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs.” Information Processing & Management 42.1 (2006): 248-263.
Objective – To examine the interactions between users and search engines, and how they have changed over time.
Design – Comparative analysis of search engine transaction logs.
Setting – Nine major analyses of search engine transaction logs.
Subjects – Nine web search engine studies (4 European, 5 American) over a seven-year period, covering the search engines Excite, Fireball, AltaVista, BWIE and AllTheWeb.
Methods – The results from individual studies are compared by year of study for percentages of single query sessions, one-term queries, operator (and, or, not, etc.) usage and single result page viewing. As well, the authors group the search queries into eleven different topical categories and compare how the breakdown has changed over time.
Main Results – Based on the percentage of single query sessions, it does not appear that the complexity of interactions has changed significantly for either the U.S.-based or the European-based search engines. As well, there was little change observed in the percentage of one-term queries over the years of study for either the U.S.-based or the European-based search engines. Few users (generally less than 20%) use Boolean or other operators in their queries, and these percentages have remained relatively stable. One area of noticeable change is in the percentage of users viewing only one results page, which has increased over the years of study. Based on the studies of the U.S.-based search engines, the topical categories of ‘People, Place or Things’ and ‘Commerce, Travel, Employment or Economy’ are becoming more popular, while the categories of ‘Sex and Pornography’ and ‘Entertainment or Recreation’ are declining.
Conclusions – The percentage of users viewing only one results page increased during the years of the study, while the percentages of single query sessions, one-term sessions and operator usage remained stable. The increase in single result page viewing implies that users are tending to view fewer results per web query. There was also a significant difference in the percentage of queries using Boolean operators between the US-based and the European-based search engines. One of the study’s findings was that results from a study of a particular search engine cannot necessarily be applied to all search engines. Finally, web search topics show a trend towards information or commerce searching rather than entertainment.


Author(s):  
Abhishek Das ◽  
Ankit Jain

In this chapter, the authors describe the key indexing components of today’s web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. They highlight techniques that improve the relevance of results, discuss trade-offs to best utilize machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for the newly emerging data-forms conclude the chapter.
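To make two of the topics above concrete, here is a minimal sketch of indexing two-word phrases instead of single terms, with documents partitioned across shards by document id. The function names, the modulo sharding rule, and the sample documents are assumptions made for this example and are not taken from the chapter.

```python
from collections import defaultdict

def build_phrase_index(docs, num_shards=2):
    """Build one phrase index per shard, assigning documents by id modulo num_shards."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        words = text.lower().split()
        shard = shards[doc_id % num_shards]
        # Index every pair of adjacent words as a phrase key.
        for first, second in zip(words, words[1:]):
            shard[(first, second)].add(doc_id)
    return shards

def search_phrase(shards, phrase):
    """Look up a two-word phrase in every shard and merge the hits."""
    key = tuple(phrase.lower().split())
    hits = set()
    for shard in shards:
        hits |= shard.get(key, set())
    return sorted(hits)

# Hypothetical documents keyed by integer id.
docs = {
    0: "web search engine indexing",
    1: "search engine results ranking",
    2: "engine search is not a phrase match",
}
shards = build_phrase_index(docs)
print(search_phrase(shards, "search engine"))
```

Because word order is part of the key, document 2 is not returned for "search engine" even though it contains both words. Partitioning by document means every shard must be consulted for each query.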


NASKO ◽  
2011 ◽  
Vol 3 (1) ◽  
pp. 33
Author(s):  
Elizabeth Milonas

The World Wide Web has grown exponentially in the last few years. The popularity of Web search engines has also grown in a similar manner. The task of a Web search engine is to provide the Web searcher with accurate and targeted information from the plethora of information available on the Web. This is a daunting task that requires the careful usage of language to ensure accuracy. As a result, the importance of the usage and meaning of language in the Web domain has become the focus of recent research. In this paper, the author will explore Wittgenstein’s later philosophy of language as it applies to the language used in the search result pages of a Web search engine in an effort to broaden the understanding of language usage within this domain.


2008 ◽  
Vol 1 (3) ◽  
pp. 273-285 ◽  
Author(s):  
Yair Galily

From its explosive development in the last decade of the 20th century, the World Wide Web has become an ideal medium for dedicated sports fanatics and a useful resource for casual fans, as well. Its accessibility, interactivity, speed, and multimedia content have triggered a fundamental change in the delivery of mediated sports, a change for which no one can yet predict the outcome (Real, 2006). This commentary sheds light on a process in which the talk-back mechanism, which enables readers to comment on Web-published articles, is (re)shaping the sport realm in Israeli media. The study on which this commentary is based involved the comparative analysis of over 3,000 talk-backs from the sports sections of 3 daily Web news sites (Ynet, nrg, and Walla!). The argument is made that talkbacks serve not only as an extension of the journalistic sphere but also as a new source of information and debate.


1998 ◽  
Vol 21 (3) ◽  
pp. 163-185 ◽  
Author(s):  
Johnny S.K. Wong ◽  
Rishi Nayar ◽  
Armin R. Mikler

Author(s):  
Rizwan Ur Rahman ◽  
Rishu Verma ◽  
Himani Bansal ◽  
Deepak Singh Tomar

With the explosive expansion of information on the World Wide Web, search engines are becoming more significant in people's day-to-day lives. Even though a search engine generally returns a huge number of results for a given query, the majority of search engine users view only the first few web pages in the result list. Consequently, ranking position has become a major concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. In the first part, the article explores the types of spamming and includes a detailed section on vulnerabilities. In the next part, an attack scenario of form spamming is presented, along with a defense approach. Consequently, the aim of this article is to provide a review of the vulnerabilities and spamming threats associated with blogging websites, and of effective measures to counter them.


Author(s):  
R. Subhashini ◽  
V. Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps the user browse search results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm. It is an alternative to the single ordered result list of search engines: the approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.
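The idea of naming clusters by shared phrases can be sketched as follows. This is a simplified illustration that groups result snippets by common two-word phrases; it is not the algorithm proposed in the paper, and the function `cluster_by_phrase` and the sample snippets are hypothetical.

```python
from collections import defaultdict

def cluster_by_phrase(snippets, min_size=2):
    """Group snippet indices under each two-word phrase shared by at least min_size snippets."""
    phrase_to_snippets = defaultdict(set)
    for index, snippet in enumerate(snippets):
        words = snippet.lower().split()
        # Record every pair of adjacent words as a candidate cluster label.
        for first, second in zip(words, words[1:]):
            phrase_to_snippets[f"{first} {second}"].add(index)
    # Keep only phrases that occur in enough snippets to form a cluster.
    return {
        phrase: sorted(members)
        for phrase, members in phrase_to_snippets.items()
        if len(members) >= min_size
    }

# Hypothetical search result snippets for an ambiguous query.
snippets = [
    "jaguar car dealers near you",
    "used jaguar car prices",
    "jaguar cat habitat facts",
    "big cat habitat loss",
]
clusters = cluster_by_phrase(snippets)
print(clusters)
```

Each cluster label is the shared phrase itself, which is what makes the cluster names readable compared with labels derived from isolated keywords.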


2011 ◽  
pp. 178-184
Author(s):  
David Parry

The World Wide Web (WWW) is a critical source of information for healthcare. Because of this, systems that increase the efficiency and effectiveness of information retrieval and discovery are critical. Increased intelligence in web pages will allow information sharing and discovery to become vastly more efficient. The Semantic Web is an umbrella term for a series of standards and technologies that will support this development.


Author(s):  
Esharenana E. Adomi

The World Wide Web (WWW) has led to the advent of the information age. With increased demand for information from various quarters, the Web has turned out to be a veritable resource. Web surfers in the early days were frustrated by the delay in finding the information they needed. The first major leap for information retrieval came from the deployment of Web search engines such as Lycos, Excite, and AltaVista. The rapid growth in the popularity of the Web during the past few years has led to a precipitous pronouncement of death for the online services that preceded the Web in the wired world.

