Framework for Automatic Semantic Annotation of Arabic Websites

2016 ◽  
Vol 25 (01) ◽  
pp. 1650001
Author(s):  
Tarek Helmy ◽  
Saeed Al-Bukhitan

In order to achieve the vision of the semantic Web, a sufficient amount of semantic content must be available on Web sources. To produce semantic content on the existing Web, semantic annotation of the Web sources is required. Semantic annotation adds machine-readable content to Web sources. Because the Web is growing at an exponential rate, manual semantic annotation is not feasible. In this paper, we present an Automatic Semantic Annotation Framework (ASAF) for semantic annotation of Arabic Web sources based on domain ontologies. We present a learning approach that utilizes public Arabic resources, such as Wikipedia and WordNet, for building Arabic ontologies. Moreover, we present different approaches for extracting named entities and relationships from Arabic Web sources. As a case study, we have developed and expanded a set of Arabic ontologies related to food, health, and nutrition through a set of processes. We have also developed the ASAF prototype and showed how it can utilize these ontologies to extract health- and food-related named entities and relationships from Web sources in order to annotate and store them in the knowledge base. We conducted several experiments to test the capability of ASAF in recognizing named entities and relationships using different approaches. Empirical evaluations of ASAF show promising performance in terms of precision, recall, and F-measure. The outcome of the presented framework could be utilized by semantic Web search applications to retrieve precise answers to end users' queries. An important feature of ASAF is that it can be ported to other domains with minimal extension. ASAF also contributes to the vision of the semantic Web in the target domains on Arabic Web sources.
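
To make the annotation idea concrete, below is a minimal sketch of gazetteer-style, ontology-driven entity tagging together with the F-measure used in the evaluation. It illustrates the general technique only, not ASAF's actual implementation; the ontology terms, class names, and sample sentence are invented for the example.

```python
# Minimal sketch of ontology-driven (gazetteer-style) annotation, in the
# spirit of ASAF but not its actual implementation. The lexicon entries
# and class IRIs below are illustrative assumptions.

ontology_lexicon = {
    "dates": "food:Fruit",        # hypothetical ontology classes
    "anemia": "health:Disease",
    "iron": "nutrition:Nutrient",
}

def annotate(text):
    """Return the first occurrence of each ontology term as
    (surface form, ontology class, character offset)."""
    annotations = []
    lowered = text.lower()
    for term, cls in ontology_lexicon.items():
        start = lowered.find(term)
        if start != -1:
            annotations.append((text[start:start + len(term)], cls, start))
    return annotations

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

print(annotate("Dates are rich in iron and may help with anemia."))
print(round(f_measure(0.85, 0.78), 3))
```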

Author(s):  
Andrew Iliadis ◽  
Wesley Stevens ◽  
Jean-Christophe Plantin ◽  
Amelia Acker ◽  
Huw Davies ◽  
...  

This panel focuses on the way that platforms have become key players in the representation of knowledge. Recently, there have been calls to combine infrastructure- and platform-based frameworks to understand the nature of information exchange on the web through digital tools for knowledge sharing. The present panel builds on and extends work on platform and infrastructure studies in what has been referred to as "knowledge as programmable object" (Plantin et al., 2018), specifically focusing on how metadata and semantic information are shaped and exchanged in specific web contexts. As Bucher (2012; 2013) and Helmond (2015) show, data portability in the context of web platforms requires a certain level of semantic annotation. Semantic interoperability is the defining feature of so-called "Web 3.0", traditionally referred to as the semantic web (Antoniou et al., 2012; Szeredi et al., 2014). Since its inception, the semantic web has privileged the status of metadata for providing the fine-grained levels of contextual expressivity needed for machine-readable web data, an approach found in products as diverse as Google's Knowledge Graph, online research repositories like Figshare, and other sources that engage in platformizing knowledge. The first paper in this panel examines the international Schema.org collaboration. The second paper investigates the epistemological implications when platforms organize data sharing. The third paper argues for the use of patents to inform research methodologies for understanding knowledge graphs. The fourth paper discusses private platforms' extraction and collection of user metadata and the enclosure of data access.
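
As a concrete illustration of the kind of machine-readable metadata at issue, here is a small Schema.org description serialized as JSON-LD, the markup format used in the Schema.org collaboration the first paper examines. The bibliographic values are invented for the example.

```python
import json

# Illustrative Schema.org JSON-LD markup of the kind platforms exchange;
# the title, author, and date are invented placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Platforms and the Representation of Knowledge",
    "author": {"@type": "Person", "name": "A. Researcher"},
    "datePublished": "2019-10-02",
}

# Embedding this block in a page's <script type="application/ld+json">
# element is what makes the metadata machine-readable to crawlers.
print(json.dumps(article, indent=2))
```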


2012 ◽  
pp. 535-578
Author(s):  
Jie Tang ◽  
Duo Zhang ◽  
Limin Yao ◽  
Yi Li

This chapter aims to give a thorough investigation of the techniques for automatic semantic annotation. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. However, the lack of annotated semantic data is a bottleneck in making the Semantic Web vision a reality. Therefore, it is necessary to automate the process of semantic annotation. In the past few years, there has been a rapid expansion of activity in the semantic annotation area, and many methods have been proposed for automating the annotation process. However, due to the heterogeneity and lack of structure of Web data, automated discovery of targeted or unexpected knowledge still presents many challenging research problems. In this chapter, we study the problems of semantic annotation and introduce state-of-the-art methods for addressing them. We also give a brief survey of the systems developed based on these methods. Several real-world applications of semantic annotation are introduced as well. Finally, some emerging challenges in semantic annotation are discussed.
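
Whatever extraction method is used, the end product of semantic annotation is typically a set of RDF statements about a resource. The following sketch shows what that output can look like, assuming the rdflib library is available; the namespace and the "extracted" facts are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# A minimal sketch of the *output* of a semantic annotation step:
# statements about a web resource expressed as RDF triples.
EX = Namespace("http://example.org/annotation#")

g = Graph()
g.bind("ex", EX)

page = URIRef("http://example.org/pages/42")
g.add((page, RDF.type, EX.NewsArticle))          # class assignment
g.add((page, EX.mentions, EX.SemanticWeb))       # extracted entity
g.add((page, RDFS.label, Literal("Example page")))

print(g.serialize(format="turtle"))
```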


2021 ◽  
Vol 5 (1) ◽  
pp. 45-56
Author(s):  
Poonam Chahal ◽  
Manjeet Singh

In today's era, with the huge amount of dynamic information available on the world wide web (WWW), it is difficult for users to retrieve or search for relevant information. Clustering is one of the techniques used in information retrieval; the web documents are then ranked to provide users with the information matching their query. In this paper, the semantic similarity score of Semantic Web documents is computed using a semantics-based similarity feature that combines latent semantic analysis (LSA) and latent relational analysis (LRA). LSA and LRA help to determine the relevant concepts, and the relationships between those concepts, which in turn correspond to words and the relationships between those words. The extracted, interrelated concepts are represented as a graph that captures the semantic content of the web document. From this graph representation of each document, the HCS (highly connected subgraphs) clustering algorithm is used to extract the most connected subgraphs, constructing clusters in accordance with an information-theoretic approach. The clustered web documents are then ranked using the TextRank method in combination with the proposed method. The experimental analysis uses the OpinRank benchmark dataset. The approach to ranking web documents using semantics-based clustering shows promising results.
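
A condensed sketch of the general pipeline is shown below: LSA to project documents into a latent concept space, a similarity graph over the documents, and PageRank-style (TextRank-like) ranking. This is an illustration of the overall approach, not the authors' exact method, which additionally uses LRA and HCS clustering; the toy documents and the similarity threshold are assumptions.

```python
import networkx as nx
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: two "hotel" reviews and two "car" reviews.
docs = [
    "the hotel room was clean and the staff was friendly",
    "friendly staff and a very clean room at this hotel",
    "the car has excellent fuel economy and smooth handling",
    "smooth handling but poor fuel economy in this car",
]

tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
sim = cosine_similarity(lsa)                    # semantic similarity scores

G = nx.Graph()
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] > 0.5:                     # keep only strong edges,
            G.add_edge(i, j, weight=sim[i, j])  # approximating dense subgraphs

scores = nx.pagerank(G, weight="weight")        # TextRank-style ranking
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```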


Author(s):  
B. KAMALA ◽  
J. M. NANDHINI

Ontologies have become an effective modeling tool for various applications, and notably for the semantic web. The difficulty of extracting information from the web, which was created mainly for visualizing information, has driven the birth of the semantic web, which will contain far more resources than the web and will attach machine-readable semantic information to these resources. Ontological bootstrapping on a set of predefined sources, such as web services, must address the problem of multiple, largely unrelated concepts. Web services basically consist of two components: Web Services Description Language (WSDL) descriptors and free-text descriptors. The WSDL descriptor is evaluated using two methods, namely Term Frequency/Inverse Document Frequency (TF/IDF) and web context generation. The proposed bootstrapping ontological process integrates TF/IDF and web context generation, and applies validation using the free-text descriptor service, so that it offers a more accurate definition of ontologies. This paper uses a ranking adaptation model that predicts the rank of a collection of web service documents, which leads to the automatic construction, enrichment, and adaptation of ontologies.
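
The TF/IDF step can be illustrated with a few lines of code. Below is a toy sketch applied to token lists that stand in for terms extracted from WSDL descriptors; the descriptor contents are invented examples, not drawn from the paper.

```python
import math
from collections import Counter

# Invented token lists standing in for three WSDL descriptors.
descriptors = [
    ["get", "weather", "forecast", "city"],
    ["get", "stock", "quote", "symbol"],
    ["get", "city", "weather", "alert"],
]

def tf_idf(term, doc, corpus):
    """Classic TF/IDF: frequent-in-document, rare-in-corpus terms score high."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

# "weather" is distinctive for the first service; "get" is not.
print(tf_idf("weather", descriptors[0], descriptors))  # > 0
print(tf_idf("get", descriptors[0], descriptors))      # 0.0 (in every doc)
```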


2011 ◽  
Vol 6 (1) ◽  
pp. 165-182 ◽  
Author(s):  
David Tarrant ◽  
Steve Hitchcock ◽  
Leslie Carr

The Web is increasingly becoming a platform for linked data. This means making connections and adding value to data on the Web. As more data becomes openly available and more people are able to use it, the data becomes more powerful. An example is file format registries and the evaluation of format risks, where the information required now exceeds what any single institution can gather and collate on its own. Recognising that more is better, the creators of PRONOM, JHOVE, GDFR and others are joining to lead a new initiative: the Unified Digital Format Registry. Ahead of this effort, a new RDF-based framework for structuring and facilitating file format data from multiple sources, including PRONOM, has demonstrated that it is able to produce more links, and thus provide more answers to digital preservation questions (about format risks, applications, viewers and transformations) than the native data alone. This paper describes this registry, P2, and its services, shows how it can be used, and provides examples where it delivers more answers than the contributing resources. The P2 Registry is a reference platform to allow and encourage publication of preservation data, and also an exemplar of what can be achieved if more data is published openly online as simple machine-readable documents. This approach calls for the active participation of the digital preservation community in contributing data, by simply publishing it openly on the Web as linked data.
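
The kind of question such a registry answers can be sketched as a SPARQL query over linked format data, as below. The vocabulary and facts are invented for illustration and are not PRONOM's or P2's actual schema; the sketch assumes the rdflib library.

```python
from rdflib import Graph

# Toy linked data about file formats, loosely in the spirit of P2.
data = """
@prefix ex: <http://example.org/formats#> .
ex:TIFF  ex:rendersWith ex:ImageMagick ;
         ex:riskLevel   "low" .
ex:WP5   ex:migratesTo  ex:PDF ;
         ex:riskLevel   "high" .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Which formats carry a high preservation risk, and what can we do?
q = """
PREFIX ex: <http://example.org/formats#>
SELECT ?fmt ?action WHERE {
  ?fmt ex:riskLevel "high" .
  OPTIONAL { ?fmt ex:migratesTo ?action }
}
"""
for fmt, action in g.query(q):
    print(fmt, "->", action)
```

The value of linking shows up when several published sources contribute triples to the same graph: a risk assertion from one registry and a migration path from another can answer a question neither could alone.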


2008 ◽  
Vol 8 (3) ◽  
pp. 249-269 ◽  
Author(s):  
TIM BERNERS-LEE ◽  
DAN CONNOLLY ◽  
LALANA KAGAL ◽  
YOSI SCHARF ◽  
JIM HENDLER

The Semantic Web drives toward the use of the Web for interacting with logically interconnected data. Through knowledge models such as the Resource Description Framework (RDF), the Semantic Web provides a unifying representation of richly structured data. Adding logic to the Web implies the use of rules to make inferences, choose courses of action, and answer questions. This logic must be powerful enough to describe complex properties of objects, but not so powerful that agents can be tricked by being asked to consider a paradox. The Web has several characteristics that can lead to problems when existing logics are used; in particular, inconsistencies inevitably arise due to the openness of the Web, where anyone can assert anything. N3Logic is a logic that allows rules to be expressed in a Web environment. It extends RDF with syntax for nested graphs and quantified variables, with predicates for implication and for accessing resources on the Web, and with functions, including cryptographic, string, and math functions. The main goal of N3Logic is to be a minimal extension to the RDF data model such that the same language can be used for logic and data. In this paper, we describe N3Logic and illustrate through examples why it is an appropriate logic for the Web.
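
The sketch below shows an N3 rule of the general shape the paper describes, where the braces nest graphs and ?x, ?y are quantified variables; the family vocabulary is invented. rdflib can parse such formulas, but actually applying the rule requires an N3 reasoner such as cwm or EYE, which is beyond this sketch.

```python
from rdflib import Graph

# An illustrative N3 rule plus one fact. The { ... } => { ... } form
# nests graphs; ?x and ?y are quantified variables. Read: anyone with
# a parent has that parent as an ancestor.
n3_doc = """
@prefix fam: <http://example.org/family#> .

fam:alice fam:parent fam:bob .

{ ?x fam:parent ?y . } => { ?x fam:ancestor ?y . } .
"""

# Parsing demonstrates the syntax; inference is left to a reasoner.
g = Graph()
g.parse(data=n3_doc, format="n3")
print(len(g), "top-level statements parsed")
```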


2011 ◽  
pp. 1027-1049
Author(s):  
Danica Damljanovic ◽  
Vladan Devedžic

Traditional E-Tourism applications store data internally in a form that is not interoperable with similar systems. Hence, tourist agents spend a great deal of time updating data about vacation packages in order to provide good service to their clients. Their clients, in turn, spend a great deal of time searching for the 'perfect' vacation package, as the data about tourist offers are not integrated and are available from different spots on the Web. We developed Travel Guides, a prototype system for tourism management, to illustrate how semantic web technologies combined with traditional E-Tourism applications: (a) help integrate tourism sources dispersed on the Web, and (b) enable the creation of sophisticated user profiles. Maintaining quality user profiles enables system personalization and adaptation of the content shown to the user. The core of this system is its ontologies: they enable a machine-readable and machine-understandable representation of the data and, more importantly, reasoning.
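
The payoff of ontology-backed reasoning for profiles can be shown with a toy example: a user interested in a general concept is matched to packages tagged with its subconcepts. This is only a sketch of the idea, not the Travel Guides implementation; the taxonomy, packages, and profile are invented.

```python
# Minimal taxonomy: concept -> parent concept (invented for illustration).
subclass_of = {
    "ScubaDiving": "WaterSport",
    "Windsurfing": "WaterSport",
    "WaterSport": "Activity",
}

def ancestors(concept):
    """Walk the taxonomy upward so matching can generalize a concept."""
    while concept in subclass_of:
        concept = subclass_of[concept]
        yield concept

packages = {
    "Red Sea Week": {"ScubaDiving"},
    "City Museums": {"Sightseeing"},
}
profile_interest = "WaterSport"   # the user's stored preference

for name, features in packages.items():
    expanded = set(features) | {a for f in features for a in ancestors(f)}
    if profile_interest in expanded:
        print("Recommend:", name)   # -> Recommend: Red Sea Week
```

A plain keyword match on "WaterSport" would miss the scuba package entirely; the subsumption step is what the ontology buys.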


Author(s):  
Florence Amardeilh

This chapter deals with issues related to semantic annotation and ontology population within the framework defined by the Semantic Web (SW). The vision of the Semantic Web, initiated in 1998 by Sir Tim Berners-Lee, aims to structure the information available on the Web. To achieve that goal, resources, whether textual or multimedia, must be semantically tagged with metadata so that software agents can utilize them. The idea developed in this chapter is to combine information extraction (IE) tools with knowledge representation tools from the SW to achieve the two parallel tasks of semantic annotation and ontology population. The goal is to extract relevant information from the resources based on an ontology, then to populate that ontology with new instances according to the extracted information, and finally to use those instances to semantically annotate the resource. Despite all integration efforts, there is currently a gap between the representation formats of the linguistic tools used to extract information and those of the knowledge representation tools used to model the ontology and store the instances or the semantic annotations. The challenge consists in proposing a methodological reflection on the interoperability of these technologies, as well as in designing operational solutions for companies and, on a broader scale, for the Web.
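
The three-step loop (extract, populate, annotate) can be condensed into a short sketch. It assumes the rdflib library; the regex stands in for a real IE tool, and the namespace, pattern, and sample text are invented for illustration.

```python
import re
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)

doc = URIRef("http://example.org/docs/press-release")
text = "Acme Corp. appointed Jane Doe as CEO."

# 1. Information extraction: a crude pattern stands in for a real IE tool.
match = re.search(r"appointed (\w+ \w+) as (\w+)", text)
person_name, role = match.groups()

# 2. Ontology population: create a new instance from the extracted data.
person = EX[person_name.replace(" ", "_")]
g.add((person, RDF.type, EX.Person))
g.add((person, EX.holdsRole, Literal(role)))

# 3. Semantic annotation: link the source document to the new instance.
g.add((doc, EX.mentions, person))

print(g.serialize(format="turtle"))
```

Note how the same instance URI serves both tasks: it enriches the ontology's instance base and anchors the annotation on the document, which is exactly the interoperability the chapter argues for.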


Author(s):  
Gian Piero Zarri

As Web-based content becomes an increasingly important knowledge management resource, Web-based technologies are developing to help harness that resource more effectively. The current state of these technologies, the 'first generation' or 'syntactic' Web, gives rise to well-known, serious problems when trying to accomplish, in a non-trivial way, essential management tasks like indexing, searching, extracting, maintaining and generating information. These tasks would, in fact, require some sort of 'deep understanding' of the information dealt with; in a 'syntactic' Web context, on the contrary, computers are only used as tools for posting and rendering information by brute force. Faced with this situation, Tim Berners-Lee first proposed a sort of 'Semantic Web' (SW) where access to information is based mainly on the processing of the semantic properties of that information: "... the Semantic Web is an extension of the current Web in which information is given well-defined meaning (emphasis added), better enabling computers and people to work in co-operation" (Berners-Lee et al., 2001: 35). The Semantic Web's challenge, then, consists in managing information on the Web by 'understanding' its semantic content (its meaning), and not simply by matching keywords.


2021 ◽  
Author(s):  
Gillian Byrne ◽  
Lisa Goddard

Since 1999 the W3C has been working on a set of Semantic Web standards that have the potential to revolutionize web search. Also known as Linked Data, the Machine-Readable Web, the Web of Data, or Web 3.0, the Semantic Web relies on highly structured metadata that allow computers to understand the relationships between objects. Semantic web standards are complex and difficult to conceptualize, but they offer solutions to many of the issues that plague libraries, including precise web search, authority control, classification, data portability, and disambiguation. This article outlines some of the benefits that linked data could have for libraries, discusses some of the non-technical obstacles we face in moving forward, and offers suggestions for practical ways in which libraries can participate in the development of the semantic web.
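
The authority-control and disambiguation benefits come from identifying things with URIs rather than ambiguous name strings. The sketch below shows the idea using Dublin Core terms and the rdflib library; the catalogue and authority identifiers are illustrative placeholders, not real records.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
g.bind("dcterms", DCTERMS)

# Placeholder identifiers standing in for a catalogue record and an
# authority-file entry.
book = URIRef("http://example.org/catalogue/book/123")
author = URIRef("http://example.org/authorities/names/n790xxxxx")

g.add((book, DCTERMS.title, Literal("A Novel")))
g.add((book, DCTERMS.creator, author))  # a URI, not an ambiguous name string

print(g.serialize(format="turtle"))
```

Because every "John Smith" gets a distinct URI, records from different libraries that reuse the same identifier link up automatically, which is the disambiguation and data-portability argument in miniature.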

