Web Data Management Practices
Latest Publications


TOTAL DOCUMENTS

12
(FIVE YEARS 0)

H-INDEX

4
(FIVE YEARS 0)

Published By IGI Global

9781599042282, 9781599042305

Author(s):  
Athena Vakali ◽  
George Pallis ◽  
Lefteris Angelis

The explosive growth of the Web has drastically increased information circulation and dissemination rates. As the number of both Web users and Web sources grows significantly every day, crucial data management issues, such as clustering on the Web, should be addressed and analyzed. Clustering has been proposed as a way to improve both information availability and Web users’ personalization. Clusters on the Web are either users’ sessions or Web information sources, which are managed in a variety of applications and implementation testbeds. This chapter focuses on clustering information over the Web, surveying the theoretical background and the adopted practices of the most popular emerging and challenging clustering research efforts. An up-to-date survey of the existing clustering schemes is given, to be of use for both researchers and practitioners interested in the area of Web data mining.


Author(s):  
Dušan Husek ◽  
Jaroslav Pokorny ◽  
Hana Rezankova ◽  
Václav Snasel

Document and information retrieval (IR) is an important task for Web communities. In this chapter, we introduce some clustering methods and focus on their use for the clustering, classification, and retrieval of Web documents.


2007 ◽  
pp. 244-267
Author(s):  
Bernd Amann ◽  
Salima Benbernou ◽  
Benjamin Nguyen

Unlike traditional applications, which depend upon a tight interconnection of all program elements, Web service applications are composed of loosely coupled, autonomous and independent services published on the Web. In this chapter, we first introduce the concept of service-oriented computing (SOC) on the Web and the current standards enabling the definition and publication of Web services. This technology’s next evolution is to facilitate the creation and maintenance of Web applications. This can be achieved by exploiting the self-descriptive nature of Web services combined with more powerful models and languages for composing Web services. A second objective of this chapter is to illustrate the complexity of the Web service composition problem and to provide a representative overview of the existing approaches. The chapter concludes with a short presentation of two research projects exploiting and extending the Web service paradigm.


2007 ◽  
pp. 159-178 ◽  
Author(s):  
Dimitrios Katsaros

Discrete sequence modeling and prediction is an important goal and a challenge for Web environments, both wired and wireless. Forecasting Web clients’ data requests and tracking mobile locations in wireless cellular networks are characteristic application areas of sequence prediction in such environments. Accurate data request prediction results in effective data prefetching, which, combined with a caching mechanism, can reduce user-perceived latencies as well as server and network loads. Also, effective solutions to the mobility tracking/prediction problem can reduce the update and paging costs, freeing the network from excessive signaling traffic. Therefore, sequence prediction is a very important area of study and development. This article presents information-theoretic techniques for discrete sequence prediction. It surveys, classifies, and compares the state-of-the-art solutions, suggesting routes for further research by discussing the critical issues and challenges of prediction in wired and wireless networks.


2007 ◽  
pp. 124-158
Author(s):  
Mehregan Mahdavi ◽  
Boualem Benatallah

The World Wide Web provides a means for sharing data and applications among users. However, its performance, and in particular providing fast response times, is still an issue. Caching is a key technique that addresses some of the performance issues in today’s Web-enabled applications. Deploying dynamic data, especially in an emerging class of Web applications called Web portals, makes caching even more interesting. In this chapter, we study Web caching techniques with a focus on dynamic content. We also discuss the limitations of caching in Web portals and study a solution that addresses these limitations. The solution is based on collaboration between the portal and its providers.


2007 ◽  
pp. 199-219
Author(s):  
Angelo Brayner ◽  
Macelo Meireles ◽  
José de Aguiar Moraes Filho

Integrating data sources published on the Web requires an integration strategy that guarantees the autonomy of local data sources. The multidatabase system (MDBS) has been consolidated as an approach to integrating multiple heterogeneous and distributed data sources in flexible and dynamic environments such as the Web. A key property of MDBSs is that they guarantee a higher degree of local autonomy. In order to adopt the MDBS strategy, it is necessary to use a query language, called a multidatabase language (MDL), which provides the necessary constructs for jointly manipulating and accessing data in heterogeneous data sources. In other words, the MDL is responsible for resolving integration conflicts. This chapter describes an extension to the XQuery language, called MXQuery, which supports queries over several data sources and solves integration problems such as semantic heterogeneity and incomplete information.


2007 ◽  
pp. 179-198
Author(s):  
Rosa Meo ◽  
Maristella Matera

In this chapter we present the use of a modeling language, WebML, for the design and management of dynamic Web applications. WebML also makes it easier to analyze how users access the application contents, even if applications are dynamic. In fact, it makes use of special-purpose logs, called conceptual logs, generated by the application runtime engine. We report on a case study on the analysis of the conceptual logs, which attests to the effectiveness of WebML and of its conceptual modeling methods. The methodology for analyzing Web logs is based on the data mining paradigm of itemsets and frequent patterns and makes full use of constraints on the content of the conceptual logs. As a consequence, we could obtain many patterns of interest for application management, such as recurrent navigation paths, the most frequently visited page contents, and anomalies.


2007 ◽  
pp. 79-103 ◽  
Author(s):  
Laura Irina Rusu ◽  
Wenny Rahayu ◽  
David Taniar

This chapter presents some of the existing mining techniques for extracting association rules out of XML documents, in the context of rapid changes in the Web knowledge discovery area. This study was driven by the fast emergence of XML (eXtensible Markup Language) as a standard language for representing semi-structured data and as a new standard for exchanging information between different applications. The data exchanged as XML documents becomes richer every day, so the need not only to store these large volumes of XML data for later use, but also to mine them to discover interesting information, has become obvious. The hidden knowledge can be used in various ways, for example to decide on a business issue or to make predictions about future e-customer behaviour in a Web application. One type of knowledge that can be discovered in a collection of XML documents relates to association rules between parts of the document, and this chapter presents some of the top techniques for extracting them.


Author(s):  
Giovanna Guerrini ◽  
Marco Mesiti ◽  
Ismael Sanz

The large number and heterogeneity of XML documents on the Web require the development of clustering techniques to group together similar documents. Documents can be grouped together according to their content, their structure, and the links inside and among documents. For instance, grouping together documents with similar structures has interesting applications in the context of information extraction, heterogeneous data integration, personalized content delivery, access control definition, Web site structural analysis, and comparison of RNA secondary structures. Many approaches have been proposed for evaluating the structural and content similarity between tree-based and vector-based representations of XML documents. Link-based similarity approaches developed for Web data clustering have been adapted for XML documents. This chapter discusses and compares the most relevant similarity measures and their employment for XML document clustering.


2007 ◽  
pp. 104-123
Author(s):  
Stavros Papastavrou ◽  
George Samaras ◽  
Paraskevas Evripidou ◽  
Panos K. Chrysanthis

This chapter takes a tutorial approach to presenting the Web-related technologies and content middlewares that attempt to accelerate the generation and optimize the delivery of dynamic content. It covers the historical aspects of dynamic content and presents the reasoning behind its introduction while discussing early content middlewares such as CGI and FastCGI. It then presents the evolution of content middlewares along the lines of conducted research. The discussion focuses on popular techniques, mostly content caching and content fragmentation. It also discusses a variety of other research efforts such as hardware and low-level acceleration techniques, active caching, and delta encoding. Finally, the authors hope that this chapter will serve as an introductory tutorial for students and researchers in the field of dynamic Web content technology.

