Deep Web Mining through Web Services

Author(s):  
Monica Maceli ◽  
Min Song

With the increase in Web-based databases and dynamically-generated Web pages, the concept of the “deep Web” has arisen. The deep Web refers to Web content that, while it may be freely and publicly accessible, is stored, queried, and retrieved through a database and one or more search interfaces, rendering the content largely hidden from conventional search and spidering techniques. These techniques are adapted to the more static model of the “surface Web”, a collection of static, linked Web pages. The amount of deep Web data is truly staggering: a July 2000 study claimed 550 billion documents (Bergman, 2000), while a September 2004 study estimated 450,000 deep Web databases (Chang, He, Li, Patel, & Zhang, 2004). In pursuit of a truly searchable Web, it comes as no surprise that the deep Web is an important and increasingly studied area of research in the field of Web mining. The challenges include new crawling and Web mining techniques, query translation across multiple target databases, and the integration and discovery of often quite disparate interfaces and database structures (He, Chang, & Han, 2004; He, Zhang, & Chang, 2004; Liddle, Yau, & Embley, 2002; Zhang, He, & Chang, 2004). Similarly, as the Web platform continues to evolve to support applications more complex than the simple transfer of HTML documents over HTTP, there is a strong need for the interoperability of applications and data across a variety of platforms. From the client perspective, there is the need to encapsulate these interactions out of view of the end user (Balke & Wagner, 2004). Web services provide a robust, scalable, and increasingly commonplace solution to these needs. As identified in earlier research efforts, due to the inherent nature of the deep Web, dynamic and ad hoc information retrieval becomes a requirement for mining such sources (Chang, He, & Zhang, 2004; Chang, He, Li, Patel, & Zhang, 2004). The platform- and program-agnostic nature of Web services, combined with the power and simplicity of HTTP transport, makes Web services an ideal technique for application to the field of deep Web mining. We have identified, and will explore, specific areas in which Web services can offer solutions in the realm of deep Web mining, particularly when serving the need for dynamic, ad hoc information gathering.
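
As a rough illustration of this style of dynamic, ad hoc retrieval, the sketch below queries a hypothetical deep-Web database exposed as an HTTP web service instead of scraping its search form; the endpoint URL, parameter names, and JSON response shape are all assumptions for illustration only.

```python
# A hedged sketch: querying a hypothetical deep-Web database exposed as an
# HTTP web service rather than scraping its search form. The endpoint URL,
# parameter names, and JSON response shape are illustrative assumptions.
import json
import urllib.parse
import urllib.request

def query_deep_web_service(endpoint, keyword, max_results=10):
    """Send an ad hoc query to a deep-Web source via its service interface."""
    params = urllib.parse.urlencode({"q": keyword, "limit": max_results})
    with urllib.request.urlopen(f"{endpoint}?{params}") as response:
        # Records returned here would be invisible to surface-Web crawlers.
        return json.load(response)

# Hypothetical endpoint; a real deep-Web source would publish its own interface.
# records = query_deep_web_service("http://example.org/api/search", "web mining")
```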

2007 ◽  
Vol 16 (05) ◽  
pp. 793-828 ◽  
Author(s):  
JUAN D. VELÁSQUEZ ◽  
VASILE PALADE

Understanding web users' browsing behaviour in order to adapt a web site to the needs of a particular user is a key issue for many commercial companies that do their business over the Internet. This paper presents the implementation of a Knowledge Base (KB) for building web-based computerized recommender systems. The Knowledge Base consists of a Pattern Repository, which contains patterns extracted from web logs and web pages by applying various web mining tools, and a Rule Repository, which contains rules that describe how the discovered patterns are used to build navigation or web site modification recommendations. The paper also focuses on testing the effectiveness of the proposed online and offline recommendations. A comprehensive real-world experiment is carried out on the web site of a bank.
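
The two-repository design described above can be sketched in a few lines; the data structures, the session-matching criterion, and the support threshold below are illustrative assumptions rather than the paper's actual implementation.

```python
# A minimal sketch of a Knowledge Base split into a Pattern Repository
# (patterns mined from web logs/pages) and a Rule Repository (rules that turn
# matching patterns into recommendations). Names and thresholds are assumed.
from dataclasses import dataclass, field

@dataclass
class Pattern:
    pages: tuple          # navigation sequence discovered by web mining
    support: float        # how frequently the pattern occurred in the logs

@dataclass
class Rule:
    min_support: float
    recommendation: str   # e.g., a link or a site-modification suggestion

@dataclass
class KnowledgeBase:
    patterns: list = field(default_factory=list)   # Pattern Repository
    rules: list = field(default_factory=list)      # Rule Repository

    def recommend(self, current_session):
        """Return recommendations for patterns matching the current session."""
        hits = []
        for pattern in self.patterns:
            if current_session and current_session[-1] in pattern.pages:
                for rule in self.rules:
                    if pattern.support >= rule.min_support:
                        hits.append(rule.recommendation)
        return hits

kb = KnowledgeBase(
    patterns=[Pattern(pages=("home", "loans", "mortgage"), support=0.4)],
    rules=[Rule(min_support=0.3, recommendation="Suggest the mortgage calculator page")],
)
print(kb.recommend(["home", "loans"]))
```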


Author(s):  
Raghvendra Kumar ◽  
Priyanka Pandey ◽  
Prasant Kumar Pattnaik

The Web can be described as a repository of a wide range of information spread across millions of websites. Users often find it difficult to locate the information that fulfils their needs among this abundance of websites. Hence, a great deal of research has been conducted in the field of Web mining so as to present information matching a user's needs. Web mining can be defined as the application of data mining techniques to web usage, web content, or web structure data in order to extract useful knowledge, such as users' navigation patterns and overall website usage statistics. A key motivation for this work is to personalize the content of a website according to a user's preferences. New methods have been developed that model a Web site as a link hierarchy and a conceptual link hierarchy, respectively, based on how users have navigated the Web site's link structure.
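
A minimal sketch of the web usage mining step mentioned above, assuming a simplified log already grouped into per-user sessions: counting page-to-page transitions is one basic way to recover how users actually traverse a site's link structure.

```python
# A hedged sketch of basic web usage mining: count page-to-page transitions
# across user sessions. The session/log format is an illustrative assumption.
from collections import Counter

def transition_counts(sessions):
    """Count consecutive page pairs across user sessions."""
    counts = Counter()
    for pages in sessions:
        for a, b in zip(pages, pages[1:]):
            counts[(a, b)] += 1
    return counts

sessions = [
    ["home", "products", "contact"],
    ["home", "products", "pricing"],
    ["home", "about"],
]
for (src, dst), n in transition_counts(sessions).most_common(3):
    print(f"{src} -> {dst}: {n}")
```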


2016 ◽  
pp. 866-884
Author(s):  
Georgios Bouloukakis ◽  
Ioannis Basdekis ◽  
Constantine Stephanidis

Web services are an emerging technology that has attracted much attention from both the research and industry sectors in recent years. The exploitation of Web services as components in Web applications facilitates development and supports application interoperability, regardless of the programming language and platform used. However, existing Web services development standards do not take into account the fact that the provided content and the interactive functionality should be accessible to, and easily operable by, people with disabilities. This chapter presents a platform named myWebAccess, which provides a mechanism for the semi-automated “repair” of Web services' interaction characteristics in order to support the automatic generation of interface elements that conform to the de facto standard of the Web Content Accessibility Guidelines 2.0. myWebAccess enhances interaction quality for specific target user groups, including people with visual and motor disabilities, and supports the use of Web services on diverse platforms (e.g., mobile phones equipped with a browser). Web developers can build their own design templates, and the users of myWebAccess can create a personalized environment containing their favourite services. Thus, they can interact with them through interfaces appropriate to their specific individual characteristics.
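
A very rough sketch of the general idea of generating accessible interface elements for a Web service operation, under the assumption that each input parameter comes with a name and label; the parameter description format is hypothetical and is not the myWebAccess API.

```python
# A hedged sketch: render each input parameter of a (hypothetical) Web service
# operation as a labelled HTML control, so the generated form is operable by
# assistive technologies. This is an illustration, not the myWebAccess API.
def accessible_form(operation_name, parameters):
    rows = [f'<form aria-label="{operation_name}">']
    for param in parameters:
        pid = param["name"]
        rows.append(f'  <label for="{pid}">{param["label"]}</label>')
        rows.append(f'  <input id="{pid}" name="{pid}" type="{param.get("type", "text")}" />')
    rows.append('  <button type="submit">Submit</button>')
    rows.append("</form>")
    return "\n".join(rows)

print(accessible_form("CurrencyConversion",
                      [{"name": "amount", "label": "Amount", "type": "number"},
                       {"name": "currency", "label": "Target currency"}]))
```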


Author(s):  
Punam Bedi ◽  
Neha Gupta ◽  
Vinita Jindal

The World Wide Web is a part of the Internet that provides a data dissemination facility to people. The contents of the Web are crawled and indexed by search engines so that they can be retrieved, ranked, and displayed in response to users' search queries. These contents, which can be easily retrieved using Web browsers and search engines, comprise the Surface Web. All information that cannot be crawled by search engines' crawlers falls under the Deep Web. Deep Web content never appears in the results displayed by search engines. Though this part of the Web remains hidden, it can be reached using targeted search over normal Web browsers. Unlike the Deep Web, there exists a portion of the World Wide Web that cannot be accessed without special software. This is known as the Dark Web. This chapter describes how the Dark Web differs from the Deep Web and elaborates on the software commonly used to enter the Dark Web. It highlights the illegitimate and legitimate sides of the Dark Web and specifies the role played by cryptocurrencies in the expansion of the Dark Web's user base.


Internet technology continues to grow rapidly and has now become the dominant computing technology for developing software and computing applications. By taking full advantage of the rapid development of the service concept and of service modeling, Web services technology, as part of Internet technology, has evolved quickly and has had a dramatic impact on enterprise integration. A deployed Web-based service, relying on a suite of Internet-based standard protocols, is a self-contained, self-describing, and network-neutral computing component. It can be readily deployed, published, located, and invoked across heterogeneous networks. This chapter starts with a brief introduction to the concepts of services and enterprise service computing. The technical fundamentals of Web services are then fully explored. XML, SOAP, WSDL, and UDDI, as the core technologies, are further explained in detail. Implementation examples are finally used to demonstrate how Web services technology can typically be applied to integrate distributed applications across an organization.
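
As a hedged illustration of the SOAP-over-HTTP plumbing the chapter covers, the sketch below builds a raw SOAP 1.1 envelope and posts it with the usual Content-Type and SOAPAction headers; the endpoint, namespace, and operation are hypothetical, and a real service would be described by its WSDL.

```python
# A hedged sketch of a raw SOAP 1.1 call over HTTP: the XML envelope wraps the
# operation body, and the SOAPAction header identifies the intended operation.
# The endpoint, namespace, and operation name are illustrative assumptions.
import urllib.request

def call_soap_operation(endpoint, soap_action, body_xml):
    envelope = f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>{body_xml}</soap:Body>
</soap:Envelope>"""
    request = urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": soap_action},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")

# Hypothetical operation on a hypothetical enterprise service.
body = '<GetOrderStatus xmlns="http://example.org/orders"><orderId>42</orderId></GetOrderStatus>'
# print(call_soap_operation("http://example.org/OrderService",
#                           "http://example.org/orders/GetOrderStatus", body))
```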


Author(s):  
Li Weigang ◽  
Wu Man Qi

This chapter presents a study of applying Ant Colony Optimization (ACO) to the Interlegis Web portal, a Brazilian legislation Website. The AntWeb approach is inspired by the foraging behavior of ant colonies: it adaptively marks the most significant links by finding the shortest routes to the target pages. The system treats the users of the Web portal as artificial ants and the links among the Web pages as the search network. To identify groups of visitors, Web mining is applied to extract knowledge from preprocessed Web log files. The chapter describes the theory, model, main utilities, and implementation of the AntWeb prototype in the Interlegis Web portal. The case study covers off-line Web mining, simulations with and without the use of AntWeb, and tests with modified parameters. The results demonstrate the sensitivity and accessibility of AntWeb and its benefits for Interlegis Web users.
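
A minimal sketch of the ant-colony intuition behind AntWeb, assuming user sessions have already been extracted from the Web logs: each session deposits pheromone on the links it follows, evaporation lets stale links fade, and the strongest links become candidates for marking. The constants are illustrative, not the parameters of the AntWeb prototype.

```python
# A hedged sketch of the ant-colony idea: user navigation deposits pheromone
# on links, evaporation decays unused links, and the most-used routes to
# target pages stand out. Deposit/evaporation values are assumptions.
def reinforce(pheromone, session, deposit=1.0, evaporation=0.1):
    """Evaporate all link pheromone, then deposit along one user's route."""
    for link in pheromone:
        pheromone[link] *= (1.0 - evaporation)
    for link in zip(session, session[1:]):
        pheromone[link] = pheromone.get(link, 0.0) + deposit
    return pheromone

pheromone = {}
for session in [["home", "laws", "law-123"],
                ["home", "laws", "law-123"],
                ["home", "news"]]:
    reinforce(pheromone, session)

# Links with the most pheromone are the candidates to be marked as significant.
for link, level in sorted(pheromone.items(), key=lambda kv: -kv[1]):
    print(link, round(level, 2))
```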


Author(s):  
Marta Fernández de Arriba ◽  
Eugenia Díaz ◽  
Jesús Rodríguez Pérez

This chapter presents the structure of an index that serves as a support allowing the development team to create the specification of the context-of-use document for the development of Web applications, bearing in mind characteristics of usability and accessibility, with each point of the index explained in detail. Correct preparation of this document ensures the quality of the developed Web applications. The international rules and standards related to the identification of the context of use have been taken into account. The functionality limitations (sensorial, physical, or cognitive) that affect access to the Web are also described, as well as the technological environment used by disabled people (assistive technologies or alternative browsers) to facilitate their access to Web content. Therefore, by following the developed specification of the context of use, usable and accessible Web applications, with their corresponding benefits, can be created.


Author(s):  
John DiMarco

Web authoring is the process of developing Web pages. The Web development process requires you to use software to create functional pages that will work on the Internet. Adding Web functionality means creating specific components within a Web page that do something. Adding links, rollover graphics, and interactive multimedia items to a Web page are examples of enhanced functionality. This chapter demonstrates Web-based authoring techniques using Macromedia Dreamweaver. The focus is on adding Web functions to pages generated from Macromedia Fireworks and on providing an overview of creating Web pages from scratch using Dreamweaver. Dreamweaver and Fireworks are professional Web applications, and using professional Web software will benefit you tremendously. There are other ways to create Web pages using applications not specifically made for the purpose, such as Microsoft Word and Microsoft PowerPoint. The use of Microsoft applications for Web page development is not covered in this chapter; however, I do provide steps on how to use these applications for Web page authoring in the appendix of this text. If you feel more comfortable using the Microsoft applications, or the Macromedia applications simply aren't available to you yet, follow the same process for Web page conceptualization and content creation and use the programs available to you. You should try to gain Web page development skills using Macromedia Dreamweaver because it helps you expand your software skills beyond basic office applications. The ability to create a Web page using professional Web development software is important to building a high-end computer skill set. The main objectives of this chapter are to get you involved in some of the technical processes that you'll need to create the Web portfolio. The focus will be on guiding you through opening your sliced pages, adding links, using tables, creating pop-up windows for content, and using layers and timelines for dynamic HTML. The coverage does not try to provide a complete tutorial set for Macromedia Dreamweaver, but highlights essential techniques. Along the way you will get pieces of hand-coded ActionScript and JavaScript. You can decide which pieces you want to use in your own Web portfolio pages. The techniques provided are a concentrated workflow for creating Web pages. Let us begin to explore Web page authoring.


Author(s):  
Jie Zhao ◽  
Jianfei Wang ◽  
Jia Yang ◽  
Peiquan Jin

Company acquisition relations reflect a company's development intent and competitive strategies, and constitute an important type of enterprise competitive intelligence. In the traditional environment, the acquisition of competitive intelligence mainly relies on newspapers, internal reports, and so on, but the rapid development of the Web introduces a new way to extract company acquisition relations. In this paper, the authors study the problem of extracting company acquisition relations from huge amounts of Web pages, and propose a novel algorithm for company acquisition relation extraction. The algorithm considers the tense of Web content and uses semantic-strength classification when extracting company acquisition relations from Web pages. It first determines the tense of each sentence in a Web page, which is then used in sentence classification to evaluate how strongly candidate sentences describe company acquisition relations. After that, the authors rank the candidate acquisition relations and return the top-k company acquisition relations. They run experiments on 6144 pages crawled through Google, and measure the performance of their algorithm under different metrics. The experimental results show that the algorithm is effective in determining the tense of sentences as well as in extracting company acquisition relations.
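
A hedged sketch of the pipeline outlined above: estimate the tense of each sentence, score how strongly it describes an acquisition, and return the top-k candidates. The cue-word lists and weights are illustrative assumptions, not the authors' classifier.

```python
# A hedged sketch of tense-aware ranking of candidate acquisition sentences.
# Cue words and weights are illustrative assumptions for demonstration only.
import re

PAST_CUES = {"acquired", "bought", "purchased", "completed"}
FUTURE_CUES = {"will", "plans", "intends", "proposes"}
ACQ_CUES = {"acquire", "acquired", "acquisition", "buy", "bought", "purchase"}

def tense(sentence):
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if words & PAST_CUES:
        return "past"
    if words & FUTURE_CUES:
        return "future"
    return "present"

def semantic_strength(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    score = sum(1.0 for w in words if w in ACQ_CUES)
    # Completed (past-tense) statements are weighted higher than announced plans.
    return score * (1.5 if tense(sentence) == "past" else 1.0)

def top_k_relations(sentences, k=3):
    ranked = sorted(sentences, key=semantic_strength, reverse=True)
    return [(s, round(semantic_strength(s), 2)) for s in ranked[:k]]

sentences = [
    "Company A acquired Company B last year.",
    "Company C will buy Company D, analysts say.",
    "Company E reported strong quarterly earnings.",
]
for sentence, score in top_k_relations(sentences, k=2):
    print(score, sentence)
```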


2003 ◽  
Vol 29 (3) ◽  
pp. 381-419 ◽  
Author(s):  
Wessel Kraaij ◽  
Jian-Yun Nie ◽  
Michel Simard

Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open up the prospect of constructing a fully automatic query translation device for CLIR at a very low cost.
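
The following sketch shows one simple way a statistical translation model can be embedded in bag-of-words retrieval: each query term is expanded into weighted target-language terms, and documents are scored against the translated weights. The tiny probability table is an assumption for illustration, not a model mined from the Web.

```python
# A hedged sketch of query translation for CLIR using a toy p(target|source)
# table; real models would be trained from Web-mined parallel corpora.
from collections import Counter

translation_model = {
    "chat": {"cat": 0.8, "chat": 0.2},
    "noir": {"black": 0.9, "dark": 0.1},
}

def translate_query(query_terms):
    """Expand source-language query terms into weighted target-language terms."""
    weights = Counter()
    for term in query_terms:
        for target, prob in translation_model.get(term, {term: 1.0}).items():
            weights[target] += prob
    return weights

def score(document, query_weights):
    """Score a document by summing translated term weights over its tokens."""
    tokens = Counter(document.lower().split())
    return sum(weight * tokens[term] for term, weight in query_weights.items())

docs = ["the black cat sleeps", "a dark night in the city"]
weights = translate_query(["chat", "noir"])
for doc in docs:
    print(round(score(doc, weights), 2), doc)
```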

