Web Image Classification Using an Optimized Feature Set

2005 ◽  
Vol 277-279 ◽  
pp. 361-368
Author(s):  
Soo Sun Cho ◽  
Dong Won Han ◽  
Chi Jung Hwang

Redundant images, currently abundant in World Wide Web pages, need to be removed in order to transform or simplify Web pages for suitable display on small-screened devices. Classifying removable images on Web pages according to the uniqueness of their content allows a simpler representation of the pages. For such classification, machine learning based methods can be used to categorize images into two groups: eliminable and non-eliminable. We use two representative learning methods, the Naïve Bayesian classifier and C4.5 decision trees. For our Web image classification, we propose new features with strong expressive power for the Web images to be classified. We apply image samples to the two classifiers and analyze the results. In addition, we propose an algorithm to construct an optimized subset of the whole feature set, which includes the most influential features for the purposes of classification. By using the optimized feature set, the accuracy of classification is found to improve markedly.
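As an illustration of this kind of two-class classification with feature subset selection, the sketch below trains a Naïve Bayes classifier and a decision tree (scikit-learn's DecisionTreeClassifier standing in for C4.5) and greedily builds a feature subset by cross-validated accuracy. The feature values and labels are randomly generated placeholders, not the paper's actual Web image features.

```python
# Hypothetical sketch: two-class Web image classification (eliminable vs. non-eliminable)
# with a greedy forward search for an optimized feature subset.
# The feature matrix and labels are illustrative only, not the paper's feature set.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier  # stands in for C4.5
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 6))             # assumed per-image features (size, aspect ratio, etc.)
y = rng.integers(0, 2, 200)          # 1 = eliminable, 0 = non-eliminable

def forward_select(estimator, X, y):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [(cross_val_score(estimator, X[:, selected + [f]], y, cv=5).mean(), f)
                  for f in remaining]
        score, feat = max(scores)
        if score <= best_score:
            break                    # stop when no feature improves accuracy further
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score

for clf in (GaussianNB(), DecisionTreeClassifier(max_depth=5)):
    subset, acc = forward_select(clf, X, y)
    print(type(clf).__name__, "optimized feature subset:", subset, "accuracy: %.2f" % acc)
```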

Author(s):  
Anuradha T ◽  
Tayyaba Nousheen

The Web is a vast and ever-growing collection of data. Search engines are used to retrieve information from the World Wide Web (WWW): they match user keywords and return relevant results in a fraction of a second. This paper proposes a machine learning based search engine that returns more relevant web pages for user searches. The search engine acts as the basic interface through which the user's entered query is answered. Every site consists of many web pages that are created and deployed on a server.


Author(s):  
Adélia Gouveia ◽  
Jorge Cardoso

The World Wide Web (WWW) emerged in 1989, developed by Tim Berners-Lee, who proposed to build a system for sharing information among physicists of CERN (Conseil Européen pour la Recherche Nucléaire), the world’s largest particle physics laboratory. Currently, the WWW is primarily composed of documents written in HTML (hypertext markup language), a language that is useful for visual presentation (Cardoso & Sheth, 2005). HTML is a set of “markup” symbols contained in a Web page intended for display on a Web browser. Most of the information on the Web is designed only for human consumption. Humans can read Web pages and understand them, but their inherent meaning is not shown in a way that allows interpretation by computers (Cardoso & Sheth, 2006). Since the visual Web does not allow computers to understand the meaning of Web pages (Cardoso, 2007), the W3C (World Wide Web Consortium) started to work on the concept of the Semantic Web, with the objective of developing approaches and solutions for data integration and interoperability. The goal was to develop ways to allow computers to understand Web information. The aim of this chapter is to present the Web Ontology Language (OWL), which can be used to develop Semantic Web applications that understand information and data on the Web. This language was proposed by the W3C and was designed for publishing and sharing data, and for enabling data to be understood and processed automatically by computers, using ontologies. To fully comprehend OWL we first need to study its origin and the basic building blocks of the language. Therefore, we will start by briefly introducing XML (extensible markup language), RDF (resource description framework), and RDF Schema (RDFS). These concepts are important since OWL is written in XML and is an extension of RDF and RDFS.
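As a minimal sketch of how OWL statements are layered on top of RDF and RDFS, the following Python example uses the rdflib library to declare a hypothetical class and property and serialize the graph as RDF/XML. The ex: namespace, the Person class and the hasColleague property are invented for illustration and are not part of the chapter.

```python
# Minimal sketch of OWL-on-RDF layering using rdflib; the ex: namespace,
# Person class and hasColleague property are invented for illustration.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/ontology#")   # hypothetical namespace
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# An OWL class is itself described with ordinary RDF triples.
g.add((EX.Person, RDF.type, OWL.Class))
g.add((EX.Person, RDFS.label, Literal("Person")))

# An OWL object property linking two Persons, with RDFS domain and range.
g.add((EX.hasColleague, RDF.type, OWL.ObjectProperty))
g.add((EX.hasColleague, RDFS.domain, EX.Person))
g.add((EX.hasColleague, RDFS.range, EX.Person))

# Because OWL builds on XML/RDF, the same graph can be written out as RDF/XML.
print(g.serialize(format="xml"))
```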


Author(s):  
Kevin Curran ◽  
Gary Gumbleton

Tim Berners-Lee, director of the World Wide Web Consortium (W3C), states that, “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Berners-Lee, 2001). The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents, roaming from page to page, can readily carry out sophisticated tasks for users. The Semantic Web (SW) is a vision of the Web in which information is linked up in such a way that machines can process it more easily. It is generating interest not just because Tim Berners-Lee is advocating it, but because it aims to solve the problem of information being hidden away in HTML documents, which are easy for humans to extract information from but difficult for machines to process. We will discuss the Semantic Web here.


Author(s):  
Deepak Mayal

The World Wide Web (WWW), also referred to simply as the Web, is a vital source of information, and searching it has become easy thanks to search engines such as Google and Yahoo. A search engine is a complex program that allows users to find information available on the Web, and for that purpose it relies on web crawlers. A web crawler systematically browses the World Wide Web. Effective search requires avoiding the downloading and visiting of irrelevant web pages, and to achieve this, web crawlers use different search algorithms. This paper reviews the web crawling algorithms that determine the fate of a search system.
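As a baseline for the crawling strategies such a review covers, the following sketch implements a plain breadth-first crawler with the requests and BeautifulSoup libraries; the seed URL and page limit are placeholders, and politeness rules (robots.txt, rate limiting) are deliberately omitted.

```python
# Minimal breadth-first crawler sketch; the seed URL and page limit are placeholders,
# and politeness (robots.txt, rate limiting) is deliberately omitted.
from collections import deque
from urllib.parse import urljoin, urldefrag

import requests
from bs4 import BeautifulSoup

def bfs_crawl(seed, max_pages=20):
    """Visit pages in breadth-first order, skipping URLs that were already queued."""
    frontier, queued, visited = deque([seed]), {seed}, 0
    while frontier and visited < max_pages:
        url = frontier.popleft()
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue                                   # skip unreachable pages
        visited += 1
        yield url
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link, _ = urldefrag(urljoin(url, a["href"]))   # resolve relative links, drop fragments
            if link.startswith("http") and link not in queued:
                queued.add(link)
                frontier.append(link)

if __name__ == "__main__":
    for page in bfs_crawl("http://example.org/"):
        print("visited:", page)
```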


1999 ◽  
Vol 34 (12) ◽  
pp. 37-46 ◽  
Author(s):  
Kurt Nørmark

Author(s):  
Anthony D. Andre

This paper provides an overview of the various human factors and ergonomics (HF/E) resources on the World Wide Web (WWW). A list of the most popular and useful HF/E sites will be provided, along with several critical guidelines relevant to using the WWW. The reader will gain a clear understanding of how to find HF/E information on the Web and how to successfully use the Web towards various HF/E professional consulting activities. Finally, we consider the ergonomic implications of surfing the Web.


2002 ◽  
Vol 7 (1) ◽  
pp. 9-25 ◽  
Author(s):  
Moses Boudourides ◽  
Gerasimos Antypas

In this paper we present a simple simulation of the World-Wide Web, in which one observes the appearance of web pages that belong to different web sites, cover a number of different thematic topics, and possess links to other web pages. The goal of our simulation is to reproduce the form of the observed World-Wide Web and of its growth using a small number of simple assumptions. In our simulation, existing web pages may generate new ones as follows: first, each web page is equipped with a topic concerning its contents; second, links between web pages are established according to common topics; next, new web pages may be randomly generated and subsequently equipped with a topic and assigned to web sites. By repeated iteration of these rules, our simulation appears to exhibit the observed structure of the World-Wide Web and, in particular, a power law type of growth. In order to visualise the network of web pages, we have followed N. Gilbert's (1997) methodology of scientometric simulation, assuming that web pages can be represented by points in the plane. Furthermore, the simulated graph is found to possess the small-world property, as is the case with a large number of other complex networks.
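A toy version of such a topic-driven growth model can be sketched with the networkx library as follows; the number of topics, the number of iterations and the preferential linking rule are illustrative assumptions, not the authors' exact model.

```python
# Toy sketch of topic-driven Web growth; topic count, iteration count and the
# preferential linking rule are illustrative assumptions, not the authors' model.
import random
import networkx as nx

TOPICS = 5        # assumed number of thematic topics
STEPS = 500       # assumed number of growth iterations

g = nx.DiGraph()
g.add_node(0, topic=random.randrange(TOPICS))

for new_page in range(1, STEPS):
    topic = random.randrange(TOPICS)                  # each new page is given a topic
    g.add_node(new_page, topic=topic)
    same_topic = [n for n in g if n != new_page and g.nodes[n]["topic"] == topic]
    if same_topic:
        # prefer already well-linked pages of the same topic (rich-get-richer growth)
        weights = [g.in_degree(n) + 1 for n in same_topic]
        targets = set(random.choices(same_topic, weights=weights, k=min(3, len(same_topic))))
        for target in targets:
            g.add_edge(new_page, target)

# Inspect the in-degree distribution and the small-world-style clustering.
in_degrees = sorted((d for _, d in g.in_degree()), reverse=True)
print("top in-degrees:", in_degrees[:10])
print("average clustering (undirected):", nx.average_clustering(g.to_undirected()))
```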


2005 ◽  
Vol 11 (3) ◽  
pp. 278-281 ◽  

Following is a list of microscopy-related meetings and courses. The editors would greatly appreciate input to this list via the electronic submission form found in the MSA World-Wide Web page at http://www.msa.microscopy.com. We will gladly add hypertext links to the notice on the web and insert a listing of the meeting in the next issue of the Journal. Send comments and questions to JoAn Hudson, [email protected] or Nestor Zaluzec, [email protected]. Please furnish the following information (any additional information provided will be edited as required and printed on a space-available basis):


2017 ◽  
Vol 4 (1) ◽  
pp. 95-110 ◽  
Author(s):  
Deepika Punj ◽  
Ashutosh Dixit

In order to manage the vast information available on the web, the crawler plays a significant role. The working of the crawler should be optimized to get maximum and unique information from the World Wide Web. In this paper, an architecture for a migrating crawler is proposed which is based on URL ordering, URL scheduling and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the web efficiently. Scheduling ensures that each URL goes to the optimum agent for downloading; to ensure this, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed to keep the database unique. To reduce matching time, documents are matched on the basis of their meta information only. The agents of the proposed migrating crawler work more efficiently than a traditional single crawler by providing ordering and scheduling of URLs.
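To illustrate the meta-information-based duplicate elimination step, the following sketch hashes a few assumed meta fields of each document and discards documents whose digest has already been seen; the field names and in-memory storage are simplifications, not the proposed architecture itself.

```python
# Sketch of duplicate-document elimination keyed on meta information only
# (field names and in-memory storage are simplifying assumptions).
import hashlib

seen_digests = set()

def meta_digest(meta: dict) -> str:
    """Hash only the meta fields, so matching avoids comparing full document bodies."""
    key = "|".join(str(meta.get(f, "")) for f in ("title", "description", "content_length"))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def is_duplicate(meta: dict) -> bool:
    digest = meta_digest(meta)
    if digest in seen_digests:
        return True
    seen_digests.add(digest)
    return False

# Example: the second document repeats the first one's meta information and is dropped.
docs = [
    {"title": "Home", "description": "Welcome page", "content_length": 2048},
    {"title": "Home", "description": "Welcome page", "content_length": 2048},
    {"title": "News", "description": "Latest items", "content_length": 5120},
]
unique_docs = [d for d in docs if not is_duplicate(d)]
print(len(unique_docs), "unique documents kept")
```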

