An integration system of Web information sources for mobile users

The explosive growth of the Web has drastically increased information circulation and dissemination rates. As the number of both Web users and Web sources grows significantly every day, crucial data management issues, such as clustering on the Web, should be addressed and analyzed. Clustering has been proposed as a way to improve both information availability and the Web users’ personalization. Clusters on the Web are either users’ sessions or Web information sources, which are managed in a variety of applications and implementation testbeds. This chapter focuses on the topic of clustering information over the Web, surveying the theoretical background and the adopted practices of the most popular emerging and challenging clustering research efforts. An up-to-date survey of the existing clustering schemes is given, to be of use for both researchers and practitioners interested in the area of Web data mining.

THE USE OF CARIN LANGUAGE AND ALGORITHMS FOR INFORMATION INTEGRATION: THE PICSEL SYSTEM

International Journal of Cooperative Information Systems ◽

10.1142/s0218843000000181 ◽

2000 ◽

Vol 09 (04) ◽

pp. 383-401 ◽

Cited By ~ 54

Author(s):

FRANÇOIS GOASDOUÉ ◽

VÉRONIQUE LATTÈS ◽

MARIE-CHRISTINE ROUSSET

Keyword(s):

Information Integration ◽

Information Sources ◽

Expressive Power ◽

Real Case ◽

Travel Agency ◽

Integration System ◽

The Core ◽

Knowledge Based ◽

Information Server ◽

Logical Formalism

PICSEL is an information integration system over sources that are distributed and possibly heterogeneous. The approach chosen in PICSEL is to define an information server as a knowledge-based mediator in which CARIN is used as the core logical formalism to represent both the application domain and the contents of the information sources relevant to that domain. In this paper, we describe how the expressive power of the CARIN language is exploited in the PICSEL information integration system while maintaining the decidability of query answering. We illustrate this with examples from the tourism domain, which is the first real case that we have to consider in PICSEL, in collaboration with the travel agency Degriftour.

A Signal-Representation-Based Parser to Extract Text-Based Information from the Web

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2010.p0531 ◽

2010 ◽

Vol 14 (5) ◽

pp. 531-539

Author(s):

Mu-Chun Su ◽

Shao-Jui Wang ◽

Chen-Ko Huang ◽

Pa-Chun Wang ◽

...

Keyword(s):

Web Services ◽

World Wide ◽

Information Sources ◽

State Of The Art ◽

Value Added ◽

Web Pages ◽

Web Page ◽

Web Information ◽

The World ◽

The Web

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups, for example groups of information related to an entity. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm, then records on a Web page are detected efficiently using templates generated by matching. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.

Source Integration for Data Warehousing

Multidimensional Databases ◽

10.4018/978-1-59140-053-0.ch012 ◽

2003 ◽

pp. 361-392 ◽

Cited By ~ 3

Author(s):

Andrea Cali ◽

Domenico Lembo ◽

Maurizio Lenzerini ◽

Riccardo Rosati

Keyword(s):

Data Analysis ◽

Data Warehouse ◽

Information Sources ◽

Data Cleaning ◽

Relevant Information ◽

Fundamental Aspect ◽

Complex Task ◽

Schema Integration ◽

Operational Environment ◽

Integration System

While the main goal of a data warehouse is to support data analysis and management decisions, a fundamental aspect of the design of a data warehouse system is the process of acquiring the raw data from a set of relevant information sources. We call the component of a data warehouse system that deals with this process the source integration system. Its main goal is to transfer data from the set of sources constituting the application-oriented operational environment to the data warehouse. Since sources are typically autonomous, distributed, and heterogeneous, this task has to deal with the problem of cleaning, reconciling, and integrating data coming from the sources. The design of a source integration system is a very complex task, which comprises several different issues. The purpose of this chapter is to discuss the most important problems arising in the design of a source integration system, with special emphasis on schema integration, processing queries for data integration, and data cleaning and reconciliation.

Clustering Web Information Sources

Personalized Information Retrieval and Access ◽

10.4018/978-1-59904-510-8.ch005 ◽

2011 ◽

pp. 98-117

Author(s):

Athena Vakali ◽

George Pallis ◽

Lefteris Angelis

Keyword(s):

Data Mining ◽

Data Management ◽

Information Sources ◽

Theoretical Background ◽

Web Data ◽

Information Availability ◽

Web Information ◽

Information Circulation ◽

The Web ◽

Management Issues

The explosive growth of the Web has drastically increased information circulation and dissemination rates. As the number of both Web users and Web sources grows significantly every day, crucial data management issues, such as clustering on the Web, should be addressed and analyzed. Clustering has been proposed as a way to improve both information availability and the Web users’ personalization. Clusters on the Web are either users’ sessions or Web information sources, which are managed in a variety of applications and implementation testbeds. This chapter focuses on the topic of clustering information over the Web, surveying the theoretical background and the adopted practices of the most popular emerging and challenging clustering research efforts. An up-to-date survey of the existing clustering schemes is given, to be of use for both researchers and practitioners interested in the area of Web data mining.

Navigational integration of autonomous Web information sources by mobile users

Semantic Sentiment Analyses Based on Reputations of Web Information Sources

Integrating Deep-Web Information Sources

WIEAS: Helping to Discover Web Information Sources and Extract Data from Them

RDF-Based Web Information Integration System: A Travel System Use Case

Clustering Web Information Services
