WIEAS: Helping to Discover Web Information Sources and Extract Data from Them

Author(s):  
Liyu Li ◽  
Shiwei Tang ◽  
Dongqing Yang ◽  
Tengjiao Wang ◽  
Zhihong Deng ◽  
...  
2011 ◽  
pp. 325-346 ◽

Author(s):  
Donato Barbagallo ◽  
Cinzia Cappiello ◽  
Chiara Francalanci ◽  
Maristella Mate

Author(s):  
Iñaki Fernández de Viana ◽  
Inma Hernandez ◽  
Patricia Jiménez ◽  
Carlos R. Rivero ◽  
Hassan A. Sleiman

Author(s):  
Athena Vakali ◽  
George Pallis ◽  
Lefteris Angelis

The explosive growth of the Web has drastically increased the rates at which information circulates and is disseminated. As the numbers of both Web users and Web sources grow significantly every day, crucial data management issues, such as clustering on the Web, need to be addressed and analyzed. Clustering has been proposed to improve both information availability and Web users’ personalization. Clusters on the Web consist of either users’ sessions or Web information sources, which are managed in a variety of applications and implementation testbeds. This chapter focuses on clustering information over the Web, surveying the theoretical background and the practices adopted by the most popular, emerging, and challenging clustering research efforts. It gives an up-to-date survey of existing clustering schemes, intended for both researchers and practitioners interested in Web data mining.
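The abstract does not commit to any particular algorithm. As a minimal illustration of what clustering users’ sessions can mean, the sketch below assumes each session is the set of URLs a user visited, measures similarity with the Jaccard coefficient, and groups sessions with a simple single-pass "leader" clustering. The function names, example URLs, and the 0.5 threshold are hypothetical and are not taken from the surveyed chapter.

```python
from typing import FrozenSet, List

def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sessions count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_sessions(sessions: List[FrozenSet[str]], threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical single-pass leader clustering: each session joins the first
    cluster whose leader (first member) is at least `threshold` similar,
    otherwise it starts a new cluster. Returns clusters as lists of indices."""
    leaders: List[FrozenSet[str]] = []
    clusters: List[List[int]] = []
    for i, session in enumerate(sessions):
        for c, leader in enumerate(leaders):
            if jaccard(session, leader) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No sufficiently similar leader: this session founds a new cluster.
            leaders.append(session)
            clusters.append([i])
    return clusters

if __name__ == "__main__":
    # Hypothetical sessions: two news-reading visits and two shopping visits.
    demo = [
        frozenset({"/home", "/news", "/weather"}),
        frozenset({"/home", "/news"}),
        frozenset({"/shop", "/cart", "/checkout"}),
        frozenset({"/shop", "/cart"}),
    ]
    print(cluster_sessions(demo))  # [[0, 1], [2, 3]]
```

A leader-based scheme is used only because it fits in a few lines; the surveyed literature covers many other approaches (partitional, hierarchical, model-based), and the result of this one depends on session order.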


Author(s):  
Mu-Chun Su ◽  
Shao-Jui Wang ◽  
Chen-Ko Huang ◽  
Pa-Chun Wang ◽  
...  

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups (for example, groups of information related to an entity). Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm; records on a Web page are then detected efficiently by matching them against the generated templates. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.
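The abstract names the HBCC algorithm but does not define it. The sketch below is only an assumption-laden illustration of the general idea: each candidate record is represented by a histogram of HTML tag counts, histograms are compared with the Pearson correlation coefficient, templates are grown incrementally, and a candidate is accepted as a record when it correlates strongly with some template. The tag vocabulary, the 0.9 threshold, and all function names are hypothetical; this is not SIRAP's actual implementation.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical fixed tag vocabulary; a real parser would derive it from the page.
VOCAB = ["a", "b", "br", "div", "img", "span"]

def tag_histogram(tags: Sequence[str], vocab: Sequence[str] = VOCAB) -> List[float]:
    """Count occurrences of each vocabulary tag in a candidate record's tag sequence."""
    counts = Counter(tags)
    return [float(counts[t]) for t in vocab]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either histogram is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)

def build_templates(records: Sequence[Sequence[str]], threshold: float = 0.9) -> List[List[float]]:
    """Incrementally grow templates: a record whose histogram correlates below
    `threshold` with every existing template becomes a new template."""
    templates: List[List[float]] = []
    for tags in records:
        h = tag_histogram(tags)
        if all(pearson(h, t) < threshold for t in templates):
            templates.append(h)
    return templates

def is_record(tags: Sequence[str], templates: Sequence[Sequence[float]], threshold: float = 0.9) -> bool:
    """Accept a candidate if its histogram correlates strongly with any template."""
    h = tag_histogram(tags)
    return any(pearson(h, t) >= threshold for t in templates)
```

For example, two result entries with the same tag structure yield one template, while an entry built from images and line breaks correlates poorly with it and starts a second template. Comparing histograms rather than exact tag sequences is what lets records with small structural variations (such as an extra `span`) still match.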


Author(s):  
María R. Romagnano ◽  
Silvana V. Aciar ◽  
Martín G. Marchetta

We live in times of technological change that alter our daily activities: tasks such as reading the newspaper, following the weather, or scheduling a trip are usually carried out by perusing the gigantic repository of information commonly known as the World Wide Web. However, several problems are still associated with the information found in such a vast repository: heterogeneity, availability, distribution, quality, and a large quantity of irrelevant information. Recent work has suggested different ways of grouping similar information sources to address these problems. However, some domains are more complex than others. For example, a person looking for tourist information is generally overwhelmed by having to visit various websites. This paper proposes a method to retrieve and group web information sources according to the services they offer, allowing the user to get accurate answers and thus reducing the time and complexity of the search.
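The abstract does not describe how sources are grouped. One simple way to group sources by the services they offer, assumed here purely for illustration, is an inverted index from each service to the sources that provide it; the index can then answer a query for the sources offering every requested service. The site names, service labels, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def group_by_service(sources: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Hypothetical grouping: map each service to the sorted list of sources offering it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for source, services in sources.items():
        for service in services:
            groups[service].append(source)
    return {service: sorted(members) for service, members in groups.items()}

def sources_offering(groups: Dict[str, List[str]], wanted: Iterable[str]) -> List[str]:
    """Return the sources that offer every requested service (empty query -> empty list)."""
    wanted_list = list(wanted)
    if not wanted_list:
        return []
    result = set(groups.get(wanted_list[0], []))
    for service in wanted_list[1:]:
        result &= set(groups.get(service, []))
    return sorted(result)

if __name__ == "__main__":
    # Hypothetical tourism sources and the services each one offers.
    catalog = {
        "site-a.example": {"lodging", "tours"},
        "site-b.example": {"lodging"},
        "site-c.example": {"tours", "transport"},
    }
    index = group_by_service(catalog)
    print(sources_offering(index, ["lodging", "tours"]))  # ['site-a.example']
```

A user who needs both lodging and tours is pointed to a single source instead of visiting several sites, which is the kind of search-time reduction the abstract aims for.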


2013 ◽  
Vol 9 (1) ◽  
Author(s):  
Laerte Pereira da Silva Júnior

Abstract: The portal of the Centre for Humanities, Letters and Arts (CCHLA) of the Federal University of Paraíba (UFPB) is among the web information sources indicated by the university’s Carta de Serviços ao Cidadão. This study aims to help improve the quality of use of this information source through User Studies in the field of Information Science, drawing on Usability Engineering and, more specifically, on the concept of usability defined by Jakob Nielsen and its associated attributes: ease of learning, efficiency of use, ease of memorization, error incidence, and subjective satisfaction.

Keywords: User Studies, Information Science, Usability, Usability Engineering, CCHLA Portal.

