WIEAS: Helping to Discover Web Information Sources and Extract Data from Them

A Signal-Representation-Based Parser to Extract Text-Based Information from the Web

Journal of Advanced Computational Intelligence and Intelligent Informatics, 2010, Vol 14 (5), pp. 531-539

DOI: 10.20965/jaciii.2010.p0531

Author(s): Mu-Chun Su, Shao-Jui Wang, Chen-Ko Huang, Pa-Chun Wang, ...

Keyword(s): Web Services, World Wide, Information Sources, State Of The Art, Value Added, Web Pages, Web Page, Web Information, The World, The Web

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups, for example groups of information related to a single entity. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm; records on a Web page are then detected efficiently by matching against the generated templates. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.
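The abstract above does not spell out how HBCC is computed, so the following is only a rough sketch of the general idea of comparing record fragments by the correlation of their tag histograms. It is not the authors' algorithm. The function names, the use of a Pearson correlation over opening-tag counts, and the 0.9 threshold are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
from math import sqrt


class TagCollector(HTMLParser):
    """Collects the names of opening tags seen in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_histogram(fragment):
    """Return a Counter of opening-tag frequencies for an HTML fragment."""
    collector = TagCollector()
    collector.feed(fragment)
    collector.close()
    return Counter(collector.tags)


def histogram_correlation(h1, h2):
    """Pearson correlation of two tag histograms over their combined tag set."""
    tags = sorted(set(h1) | set(h2))
    if len(tags) < 2:
        return 0.0
    x = [h1.get(t, 0) for t in tags]
    y = [h2.get(t, 0) for t in tags]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a flat histogram
    return cov / sqrt(var_x * var_y)


def matches_template(template, fragment, threshold=0.9):
    """Hypothetical matching rule: accept if the correlation reaches the threshold."""
    score = histogram_correlation(tag_histogram(template), tag_histogram(fragment))
    return score >= threshold
```

Under these assumptions, two fragments with the same tag structure but different text correlate strongly, while a `div`-based record and a `table`-based record correlate negatively, so a new template would be started for the latter.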

Clustering Web Information Sources

Personalized Information Retrieval and Access, 2011, pp. 98-117

DOI: 10.4018/978-1-59904-510-8.ch005

Author(s): Athena Vakali, George Pallis, Lefteris Angelis

Keyword(s): Data Mining, Data Management, Information Sources, Theoretical Background, Web Data, Information Availability, Web Information, Information Circulation, The Web, Management Issues

The explosive growth of the Web has drastically increased the rates at which information circulates and is disseminated. As the number of both Web users and Web sources grows significantly every day, crucial data management issues, such as clustering on the Web, should be addressed and analyzed. Clustering has been proposed as a way to improve both information availability and personalization for Web users. Clusters on the Web consist of either users' sessions or Web information sources, which are managed in a variety of applications and implementation testbeds. This chapter focuses on clustering information over the Web and surveys the theoretical background and adopted practices of the most popular emerging and challenging clustering research efforts. An up-to-date survey of existing clustering schemes is given, to be of use to both researchers and practitioners interested in Web data mining.

WIC: A General-Purpose Algorithm for Monitoring Web Information Sources

Proceedings 2004 VLDB Conference, 2004, pp. 360-371

DOI: 10.1016/b978-012088469-8/50034-6

Cited By ~ 5

Author(s): S. Pandey, K. Dhamdhere, C. Olston

Keyword(s): Information Sources, General Purpose, Web Information

Method to Reduce Complexity and Response Time in a Web Search

Information Retrieval and Management, 2018, pp. 97-113

DOI: 10.4018/978-1-5225-5191-1.ch006

Author(s): María R. Romagnano, Silvana V. Aciar, Martín G. Marchetta

Keyword(s): Response Time, World Wide, Web Search, Information Sources, Irrelevant Information, Daily Activities, Technological Changes, Web Information, The World, Tourist Information

We live in times of technological change that alters our daily activities: tasks such as reading the newspaper, following the weather, or scheduling a trip are usually carried out after perusing the gigantic repository of information commonly known as the World Wide Web. However, some problems are still associated with information drawn from such a vast repository: heterogeneity, availability, distribution, quality, and the quantity of irrelevant information. Recent work has suggested different ways of grouping similar information sources in an attempt to address these problems. Some domains, however, are more complex than others. For example, a person looking for tourist information is generally overwhelmed by having to visit various websites. This paper proposes a method to retrieve and group web information sources according to the services they offer, allowing the user to get accurate answers and reducing the time and complexity of the search.

REPUTATION-BASED SELECTION OF WEB INFORMATION SOURCES

Proceedings of the 12th International Conference on Enterprise Information Systems, 2010

DOI: 10.5220/0002908400300037

Keyword(s): Information Sources, Web Information, Selection Of

An integration system of Web information sources for mobile users

Proceedings 2000 International Database Engineering and Applications Symposium (Cat. No.PR00789), 2002

DOI: 10.1109/ideas.2000.880584

Author(s): Wisut Sae-Tung, T. Ohmori, M. Hoshi

Keyword(s): Information Sources, Mobile Users, Integration System, Web Information

Estudos híbridos de uso da informação no portal do Centro de Ciências Humanas, Letras e Artes da UFPB │ Hybrid Studies of information use in the homepage of the Centre for Humanities, Arts and Letters of the Federal University of Paraíba

Liinc em Revista, 2013, Vol 9 (1)

DOI: 10.18617/liinc.v9i1.532

Author(s): Laerte Pereira da Silva Júnior

Keyword(s): Information Sources, Information Science, Information Use, Usability Engineering, Web Information, Use Efficiency, Source Of Information, The Web

Abstract: The homepage of the Centre for Humanities, Arts and Letters (CCHLA) is among the web information sources listed in the Carta de Serviços ao Cidadão of the Federal University of Paraíba (UFPB). The present study aims to help improve the quality of use of this information source through User Studies in the field of Information Science, based on Usability Engineering and, more precisely, on the concept of usability defined by Jakob Nielsen, which is associated with the attributes of ease of learning, efficiency of use, ease of memorization, error incidence, and subjective satisfaction.

Keywords: User Studies, Information Science, Usability, Usability Engineering, CCHLA Homepage.
