XML Mining for Semantic Web

Data Mining ◽  
2013 ◽  
pp. 625-649
Author(s):  
Rafael Berlanga ◽  
Victoria Nebot

This chapter describes the convergence of two technologies that have been influential over the last decade, namely data mining (DM) and the Semantic Web (SW). The wide acceptance of new SW formats for describing semantics-aware and semistructured content has spurred the massive generation of semantic annotations and of large-scale domain ontologies that conceptualize the concepts they refer to. As a result, a huge amount of both knowledge and semantically annotated data is available on the Web. DM methods have been very successful in discovering interesting patterns hidden in very large amounts of data. However, DM methods have largely been based on simple, flat data formats that are far from those available in the SW. This chapter reviews and discusses the main DM approaches proposed so far to mine SW data, as well as those that have taken SW resources and tools into account to define semantics-aware methods.
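
The mismatch between flat DM inputs and SW data can be illustrated with a minimal sketch, assuming annotations are available as (subject, predicate, object) triples. One common workaround is to flatten each subject's triples into a transaction and then run an ordinary frequent-pattern count over those transactions. The data, the `predicate=object` item encoding and the function names are hypothetical, and this is not presented as a method from the chapter.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical semantic annotations as (subject, predicate, object) triples.
triples = [
    ("doc1", "mentions", "Diabetes"),
    ("doc1", "mentions", "Insulin"),
    ("doc2", "mentions", "Diabetes"),
    ("doc2", "mentions", "Insulin"),
    ("doc2", "mentions", "Obesity"),
    ("doc3", "mentions", "Obesity"),
]

def to_transactions(triples):
    """Flatten triples into one itemset per subject; each item is 'predicate=object'."""
    tx = defaultdict(set)
    for s, p, o in triples:
        tx[s].add(f"{p}={o}")
    return dict(tx)

def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for items in transactions.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

transactions = to_transactions(triples)
print(frequent_pairs(transactions, min_support=2))
```

The flattening step is exactly where semantics gets lost: any properties of the objects themselves (for example, that Insulin is a drug in some ontology) do not survive unless they are explicitly joined in, which is the kind of gap semantics-aware methods try to close.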


Author(s):  
Christopher Walton

At the start of this book we outlined the challenges of automatic, computer-based processing of information on the Web. These numerous challenges are generally referred to as the ‘vision’ of the Semantic Web. From the outset, we have attempted to take a realistic and pragmatic view of this vision. Our opinion is that the vision may never be fully realized, but that it is a useful goal on which to focus. Each step towards the vision has provided new insights into classical problems in knowledge representation, multi-agent systems (MASs), and Web-based techniques. Thus, we are presently in a significantly better position as a result of these efforts. It is sometimes difficult to see the purpose of the Semantic Web vision behind all of the different technologies and acronyms. However, the fundamental purpose of the Semantic Web is essentially large-scale, automated data integration. The Semantic Web is not just about providing a more intelligent kind of Web search, but also about taking the results of these searches and combining them in interesting and useful ways. As stated in Chapter 1, the possible applications for the Semantic Web include automated data mining, e-science experiments, e-learning systems, personalized newspapers and journals, and intelligent devices.

The current state of progress towards the Semantic Web vision is summarized in Figure 8.1. This figure shows a pyramid with the human-centric Web, sometimes termed the Syntactic Web, at the bottom and the envisioned Semantic Web at the top. Throughout this book, we have been moving upwards on this pyramid, and it should be clear that a great deal of progress has been made towards the goal. This progress is indicated by the various stages of the pyramid, which can be summarized as follows:

• The lowest stage on the pyramid is the basic Web that should be familiar to everyone. This Web of information is human-centric and contains very little automation. Nonetheless, the Web provides the basic protocols and technologies on which the Semantic Web is founded. Furthermore, the information represented on the Web will ultimately be the source of knowledge for the Semantic Web.


Author(s):  
Qiankun Zhao ◽  
Sourav Saha Bhowmick

Nowadays the Web presents itself as the largest data repository ever available in the history of humankind (Reis et al., 2004). However, the availability of a huge amount of Web data does not mean that users can more easily get whatever they want. On the contrary, the massive amount of data on the Web has overwhelmed their ability to find the desired information. It has been claimed that 99% of the data reachable on the Web is useless to 99% of users (Han & Kamber, 2000, p. 436). That is, an individual may be interested in only a tiny fragment of the Web data. However, the sheer size and diversity of Web data do mean that it is a rich and unprecedented source for data mining.


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1152 ◽  
Author(s):  
Sander Vanden Hautte ◽  
Pieter Moens ◽  
Joachim Van Herwegen ◽  
Dieter De Paepe ◽  
Bram Steenwinckel ◽  
...  

In industry, dashboards are often used to monitor fleets of assets, such as trains, machines or buildings. In such industrial fleets, the large set of sensors evolves continuously, new sensor data exchange protocols and data formats are introduced, new visualization types may need to be added, and existing dashboard visualizations may need to be updated in terms of the sensors they display. These requirements motivate the development of dynamic dashboarding applications. Unlike fixed-structure dashboard applications, these allow users to create visualizations at will and do not have hard-coded sensor bindings. The state of the art in dynamic dashboarding does not cope well with the frequent addition and removal of sensors that must be monitored; these changes must still be configured in the implementation or at runtime by a user. Also, the user is presented with an overload of sensors, aggregations and visualizations to select from, which may even lead to the creation of dashboard widgets that do not make sense. In this paper, we present a dynamic dashboard that overcomes these problems. Sensors, visualizations and aggregations can be discovered automatically, since they are provided as RESTful Web Things on a Web Thing Model compliant gateway. The gateway also provides semantic annotations of the Web Things, describing their capabilities. A semantic reasoner can derive visualization suggestions from the Thing annotations, logic rules and a custom dashboard ontology. The resulting dashboarding application automatically presents the available sensors, visualizations and aggregations that can be used, without requiring sensor configuration, and assists the user in building dashboards that make sense. This way, the user can concentrate on interpreting the sensor data and on detecting and solving operational problems early.
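
The suggestion step can be pictured with a minimal sketch in which plain Python predicates stand in for the semantic reasoner, logic rules and dashboard ontology described above. Each sensor carries annotations (a value type and an optional location), each visualization declares what it can display, and only compatible pairs are offered to the user. The classes, fields and widget names are hypothetical; the actual system reasons over semantic annotations of Web Things, not over Python objects.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Sensor:
    name: str
    value_type: str                 # hypothetical annotation: "numeric" or "boolean"
    location: Optional[str] = None  # hypothetical annotation: where the sensor is installed

@dataclass(frozen=True)
class Visualization:
    name: str
    accepts: frozenset              # value types this widget can display
    needs_location: bool = False    # e.g. a map widget needs a location

def suggest(sensors, visualizations):
    """Return (sensor, visualization) name pairs whose annotations satisfy the widget's requirements."""
    suggestions = []
    for s in sensors:
        for v in visualizations:
            if s.value_type not in v.accepts:
                continue  # rule 1: the widget must support the sensor's value type
            if v.needs_location and s.location is None:
                continue  # rule 2: location-based widgets need a located sensor
            suggestions.append((s.name, v.name))
    return suggestions

sensors = [
    Sensor("temp1", "numeric", location="Hall A"),
    Sensor("door1", "boolean"),
]
visualizations = [
    Visualization("LineChart", frozenset({"numeric"})),
    Visualization("StatusLight", frozenset({"boolean"})),
    Visualization("FloorMap", frozenset({"numeric", "boolean"}), needs_location=True),
]
for pair in suggest(sensors, visualizations):
    print(pair)
```

Because suggestions are derived from annotations rather than hard-coded bindings, a newly discovered sensor is handled without extra configuration, which is the property the paper aims for; a reasoner-based approach additionally lets the rules live in an ontology instead of code.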


Author(s):  
Adiraju Prasanth Rao

The Semantic Web is a set of standards for common data formats on the World Wide Web (WWW) that aims to convert the unstructured and semi-structured documents of the current Web into a common framework that allows data to be shared and reused across applications and enterprises. The main purpose of the Semantic Web is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily. Humans are capable of using the Web to carry out tasks such as searching for the lowest price for a laptop. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that can be readily interpreted by machines, so that machines can perform more of the tedious work involved in finding, combining, and acting upon information on the Web. The chapter presents the architecture of the Semantic Web, its challenging issues, and data quality principles. These principles support better decision-making within organizations and help maximize long-term data integration and interoperability.
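
The laptop example can be made concrete with a minimal sketch, assuming two shops publish their offers as triples using the same vocabulary terms (`type`, `product`, `price`, all hypothetical stand-ins for a real shared ontology). Because both sources use the same terms, combining them reduces to a union, and finding the lowest price becomes a simple query a machine can run without human direction.

```python
# Hypothetical offers from two shops, expressed with the same vocabulary terms.
shop_a = [
    ("offerA1", "type", "Offer"),
    ("offerA1", "product", "Laptop X"),
    ("offerA1", "price", 899.0),
]
shop_b = [
    ("offerB7", "type", "Offer"),
    ("offerB7", "product", "Laptop X"),
    ("offerB7", "price", 849.0),
    ("offerB9", "type", "Offer"),
    ("offerB9", "product", "Laptop Y"),
    ("offerB9", "price", 799.0),
]

def lowest_price(graph, product):
    """Return (offer, price) with the lowest price for the product, or None if there is no offer."""
    offers = {}
    for s, p, o in graph:
        offers.setdefault(s, {})[p] = o  # group properties by subject
    matches = [
        (s, props["price"])
        for s, props in offers.items()
        if props.get("type") == "Offer" and props.get("product") == product and "price" in props
    ]
    return min(matches, key=lambda m: m[1], default=None)

merged = shop_a + shop_b  # integration is a plain union because the vocabulary is shared
print(lowest_price(merged, "Laptop X"))
```

The plain union only works because the offer identifiers do not collide; on the Semantic Web, globally unique IRIs play that role.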


2009 ◽  
pp. 596-614 ◽  
Author(s):  
I. Koffina ◽  
G. Serfiotis ◽  
V. Christophides ◽  
V. Tannen

Semantic Web (SW) technology aims to facilitate the integration of legacy data sources spread worldwide. Despite the plethora of SW languages (e.g., RDF/S, OWL) recently proposed for supporting large-scale information interoperation, the vast majority of legacy sources still rely on relational databases (RDB) published on the Web or on corporate intranets as virtual XML. In this article, we advocate a first-order logic framework for mediating high-level queries to relational and/or XML sources using community ontologies expressed in an SW language such as RDF/S. We describe the architecture and reasoning services of our SW integration middleware, termed SWIM, and we present the main design choices and techniques for supporting powerful mappings between different data models, as well as for the reformulation and optimization of queries expressed against mediator ontologies and views.
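
The idea of mediating an ontology-level query to a relational source can be sketched minimally, assuming a single table and a one-to-one mapping from ontology terms to columns (a simple global-as-view style rewriting). The query is never answered against copied RDF data; instead it is reformulated into SQL and run on the source. `MAPPING`, `reformulate` and the `emp` table are hypothetical, and SWIM's first-order logic mappings, RDF/S reasoning, XML sources and query optimization are far richer than this.

```python
import sqlite3

# Hypothetical mapping: ontology class and properties -> relational table and columns.
MAPPING = {
    "Employee": {
        "table": "emp",
        "key": "id",
        "properties": {"name": "full_name", "worksFor": "company"},
    },
}

def reformulate(cls, select_prop, filters):
    """Rewrite a query over ontology terms into parameterized SQL over the mapped table."""
    m = MAPPING[cls]
    cols = m["properties"]
    sql = f"SELECT {cols[select_prop]} FROM {m['table']}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{cols[p]} = ?" for p in filters)
    sql += f" ORDER BY {m['key']}"  # deterministic result order
    return sql, list(filters.values())

# A toy relational source standing in for a legacy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, full_name TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO emp (full_name, company) VALUES (?, ?)",
    [("Ada", "ACME"), ("Bob", "Initech"), ("Cy", "ACME")],
)

# Ontology-level question: names of Employees who work for ACME.
sql, params = reformulate("Employee", "name", {"worksFor": "ACME"})
print(sql)
print(conn.execute(sql, params).fetchall())
```

Table and column names come only from the trusted mapping, while user-supplied values are passed as SQL parameters.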


Web Services ◽  
2019 ◽  
pp. 1907-1916
Author(s):  
Adiraju Prasanth Rao

The Semantic Web is a set of standards for common data formats on the World Wide Web (WWW) that aims to convert the unstructured and semi-structured documents of the current Web into a common framework that allows data to be shared and reused across applications and enterprises. The main purpose of the Semantic Web is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily. Humans are capable of using the Web to carry out tasks such as searching for the lowest price for a laptop. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that can be readily interpreted by machines, so that machines can perform more of the tedious work involved in finding, combining, and acting upon information on the Web. The chapter presents the architecture of the Semantic Web, its challenging issues, and data quality principles. These principles support better decision-making within organizations and help maximize long-term data integration and interoperability.


Author(s):  
R. Arabsheibani ◽  
S. Ariannamazi ◽  
F. Hakimpour

The Web and its capabilities can be employed as a tool for data and information integration if comprehensive datasets and appropriate technologies and standards enable the Web to interpret and easily align data and information. The Semantic Web, together with spatial functionality, enables the Web to deal with huge amounts of data and information. The present study investigates the advantages and limitations of the Spatial Semantic Web and compares its capabilities with those of relational models for building a spatial data infrastructure. An architecture is proposed, and a set of criteria is defined for evaluating efficiency. The results demonstrate that when the data have particular characteristics, such as schema dynamicity, sparseness, or existing relations between features, the Spatial Semantic Web and graph databases with spatial operations are preferable.
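
Why sparse, dynamic data favours a graph representation can be illustrated with a minimal sketch, assuming a few hypothetical features with heterogeneous attributes. Stored as triples, each feature carries only the properties it actually has; forced into a single relational table, every attribute used by any feature becomes a column, and the gaps become NULLs. Geometry and spatial operators are deliberately left out; this only illustrates the storage trade-off, not the architecture or evaluation criteria proposed in the study.

```python
# Hypothetical spatial features with sparse, heterogeneous attributes
# (geometry omitted for brevity). "adjacentTo" is a relation between features.
features = {
    "well_1": {"type": "Well", "depth_m": 120},
    "road_7": {"type": "Road", "lanes": 4},
    "park_2": {"type": "Park", "area_ha": 12.5, "adjacentTo": "road_7"},
}

def as_triples(features):
    """One (feature, property, value) triple per property a feature actually has."""
    return [(f, p, v) for f, props in features.items() for p, v in props.items()]

def as_table(features):
    """One wide table: a column for every property used anywhere, None where absent."""
    columns = sorted({p for props in features.values() for p in props})
    rows = [[props.get(c) for c in columns] for props in features.values()]
    return columns, rows

triples = as_triples(features)
columns, rows = as_table(features)
null_cells = sum(v is None for row in rows for v in row)
print(len(triples), len(columns) * len(rows), null_cells)
```

Adding a new attribute to a single feature is one more triple in the graph form, whereas the table form needs a schema change that adds a column for every row.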


2018 ◽  
Vol 25 (1) ◽  
pp. 174-200
Author(s):  
Daphné Kerremans ◽  
Jelena Prokić ◽  
Quirin Würschinger ◽  
Hans-Jörg Schmid

This paper presents the NeoCrawler, a tailor-made web crawler that identifies and retrieves neologisms from the Internet and systematically monitors the use of detected neologisms on the Web by means of weekly searches. It enables researchers to use the Web as a corpus in order to investigate the dynamics of lexical innovation on a large scale and in a systematic way. The NeoCrawler is an innovative web-mining tool that opens up new opportunities for linguists to tackle a number of unresolved and under-researched issues in the field of lexical innovation. The paper describes the design and the most important characteristics of its two modules, the Discoverer and the Observer, with regard to the usage-based study of lexical innovation and diffusion.
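
The division of labour between the two modules can be sketched minimally, assuming a dictionary-lookup approach: a discovery step flags tokens missing from a reference lexicon as candidate neologisms, and an observation step records how often each candidate occurs in each week's texts. The lexicon, the tokenizer and the function names are hypothetical, and the abstract does not say how the actual Discoverer filters candidates; a real pipeline would also have to weed out typos, names and other noise.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words, hyphenated compounds kept whole.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())

def discover(text, lexicon):
    """Return tokens absent from the reference lexicon, i.e. candidate neologisms."""
    return {t for t in tokenize(text) if t not in lexicon}

def observe(weekly_texts, candidates):
    """For each candidate, record its number of occurrences in each week's texts."""
    history = {c: [] for c in candidates}
    for text in weekly_texts:
        counts = Counter(tokenize(text))
        for c in candidates:
            history[c].append(counts[c])
    return history

# Hypothetical reference lexicon and web text.
lexicon = {"the", "new", "is", "a", "word", "for", "people", "who", "are", "into", "it"}
candidates = discover("Doomscrolling is a new word for people who are into it.", lexicon)
print(candidates)
print(observe(["doomscrolling again", "more doomscrolling, doomscrolling", "nothing here"], candidates))
```

Keeping discovery and observation separate mirrors the two-module design: candidates are found once, then tracked over time so that their diffusion can be studied.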

