Towards the Representation of Etymological Data on the Semantic Web

Information ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 304 ◽  
Author(s):  
Anas Khan

In this article, we look at the potential for a wide-coverage modelling of etymological information as linked data using the Resource Description Framework (RDF) data model. We begin with a discussion of some of the most typical features of etymological data and the challenges that these might pose to an RDF-based modelling. We then propose a new vocabulary for representing etymological data, the Ontolex-lemon Etymological Extension (lemonETY), based on the ontolex-lemon model. Each of the main elements of our new model is motivated with reference to the preceding discussion.
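To make the idea of etymology-as-linked-data concrete, the sketch below renders a one-step etymology (English "wine" from Latin "vinum") as RDF-style triples. The property and class names only approximate the lemonETY vocabulary and the IRIs are illustrative assumptions, not the normative ones:

```python
# Minimal sketch: an etymology chain as (subject, predicate, object) triples.
# Property/class names below are assumptions approximating lemonETY.
LEMONETY = "http://lari-datasets.ilc.cnr.it/lemonEty#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"

triples = [
    # The Modern English entry "wine" ...
    ("ex:wine", "rdf:type", ONTOLEX + "LexicalEntry"),
    # ... has an etymology ...
    ("ex:wine", LEMONETY + "etymology", "ex:wine_ety"),
    # ... whose single link goes from the Latin etymon to the English word.
    ("ex:wine_ety", LEMONETY + "startingLink", "ex:wine_etyLink1"),
    ("ex:wine_etyLink1", LEMONETY + "etySource", "ex:la_vinum"),
    ("ex:wine_etyLink1", LEMONETY + "etyTarget", "ex:wine"),
    ("ex:la_vinum", "rdf:type", LEMONETY + "Etymon"),
]

def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, _)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("ex:wine_etyLink1", LEMONETY + "etySource"))
```

Reifying each etymological link as its own resource, as above, is what lets hypotheses, sources, and uncertainty be attached to individual steps of a word history.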

2018 ◽  
Vol 37 (3) ◽  
pp. 29-49
Author(s):  
Kumar Sharma ◽  
Ujjal Marjit ◽  
Utpal Biswas

Resource Description Framework (RDF) is a commonly used data model in the Semantic Web environment. Libraries and various other communities have been using the RDF data model to store valuable data after it is extracted from traditional storage systems. However, because of the large volume of the data, processing and storing it is becoming a serious challenge for traditional data-management tools. This challenge demands a scalable and distributed system that can manage data in parallel. In this article, a distributed solution is proposed for efficiently processing and storing the large volume of library linked data extracted from traditional storage systems. Apache Spark is used for parallel processing of large data sets, and a column-oriented schema is proposed for storing RDF data. The storage system is built on top of the Hadoop Distributed File System (HDFS) and uses the Apache Parquet format to store data in compressed form. The experimental evaluation showed that storage requirements were reduced significantly compared to Jena TDB, Sesame, RDF/XML, and N-Triples file formats. SPARQL queries are processed using Spark SQL to query the compressed data. The experimental evaluation showed good query response times, which decrease significantly as the number of worker nodes increases.
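The column-oriented idea can be illustrated without a cluster: partition the triples by predicate so that each predicate gets its own narrow (subject, object) table, analogous to one Parquet file per predicate. This is a stdlib stand-in for the Spark/HDFS pipeline, with made-up example data:

```python
# Sketch of predicate-partitioned ("vertically partitioned") RDF storage.
from collections import defaultdict

triples = [
    ("ex:book1", "dc:title", "Linked Data"),
    ("ex:book1", "dc:creator", "ex:heath"),
    ("ex:book2", "dc:title", "Semantic Web Primer"),
]

# One (subject, object) table per predicate, like one Parquet file each.
tables = defaultdict(list)
for s, p, o in triples:
    tables[p].append((s, o))

# A SPARQL pattern such as { ?b dc:title ?t } becomes a scan of one narrow
# table, which is why the columnar layout compresses and queries well.
titles = dict(tables["dc:title"])
print(titles["ex:book1"])
```

Queries touching a single predicate read only that predicate's columns, which is the same locality property Spark SQL exploits over Parquet.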


Information ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 274 ◽  
Author(s):  
Frances Gillis-Webber

The English-Xhosa Dictionary for Nurses (EXDN) is a bilingual, unidirectional printed dictionary in the public domain, with English and isiXhosa as the language pair. By extending the digitisation efforts of EXDN from a human-readable digital object to a machine-readable state, using Resource Description Framework (RDF) as the data model, semantically interoperable structured data can be created, thus enabling EXDN’s data to be reused, aggregated and integrated with other language resources, where it can serve as a potential aid in the development of future language resources for isiXhosa, an under-resourced language in South Africa. The methodological guidelines for the construction of a Linguistic Linked Data framework (LLDF) for a lexicographic resource, as applied to EXDN, are described, where an LLDF can be defined as a framework: (1) which describes data in RDF, (2) using a model designed for the representation of linguistic information, (3) which adheres to Linked Data principles, and (4) which supports versioning, allowing for change. The result is a bidirectional lexicographic resource, previously bounded and static, now unbounded and evolving, with the ability to extend to multilingualism.
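A hedged sketch of the dictionary-to-RDF step described above: one EXDN-style bilingual entry becomes a pair of Ontolex-lemon-like lexical entries linked by a translation relation. The IRIs, the property names, and the entry values are illustrative assumptions, not the paper's actual pipeline or vocabulary:

```python
# Illustrative conversion of one bilingual entry to RDF-style triples.
# Names such as "vartrans:translatableAs" are assumptions for demonstration.
def entry_to_triples(entry_id, english, xhosa):
    en = f"ex:en_{entry_id}"
    xh = f"ex:xh_{entry_id}"
    return [
        (en, "rdf:type", "ontolex:LexicalEntry"),
        (en, "ontolex:canonicalForm", english),
        (en, "lime:language", "en"),
        (xh, "rdf:type", "ontolex:LexicalEntry"),
        (xh, "ontolex:canonicalForm", xhosa),
        (xh, "lime:language", "xh"),
        # Once in RDF the link is traversable in both directions, so the
        # printed dictionary's one-way ordering no longer constrains lookup.
        (en, "vartrans:translatableAs", xh),
    ]

triples = entry_to_triples("nurse", "nurse", "umongikazi")
print(len(triples))
```

The bidirectionality claimed in the abstract falls out of the graph model itself: the same translation triple answers both English-to-isiXhosa and isiXhosa-to-English lookups.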


2008 ◽  
Vol 8 (3) ◽  
pp. 249-269 ◽  
Author(s):  
TIM BERNERS-LEE ◽  
DAN CONNOLLY ◽  
LALANA KAGAL ◽  
YOSI SCHARF ◽  
JIM HENDLER

Abstract. The Semantic Web drives toward the use of the Web for interacting with logically interconnected data. Through knowledge models such as the Resource Description Framework (RDF), the Semantic Web provides a unifying representation of richly structured data. Adding logic to the Web implies the use of rules to make inferences, choose courses of action, and answer questions. This logic must be powerful enough to describe complex properties of objects, but not so powerful that agents can be tricked by being asked to consider a paradox. The Web has several characteristics that can lead to problems when existing logics are used, in particular the inconsistencies that inevitably arise due to the openness of the Web, where anyone can assert anything. N3Logic is a logic that allows rules to be expressed in a Web environment. It extends RDF with syntax for nested graphs and quantified variables, with predicates for implication and for accessing resources on the Web, and with built-in functions for cryptography, strings, and mathematics. The main goal of N3Logic is to be a minimal extension to the RDF data model such that the same language can be used for both logic and data. In this paper, we describe N3Logic and illustrate through examples why it is an appropriate logic for the Web.
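N3Logic writes rules as implications between graphs, e.g. `{ ?x :parent ?y. ?y :parent ?z } => { ?x :grandparent ?z }`. The stdlib sketch below applies that single rule by naive forward chaining over a toy fact set; real N3 engines such as cwm handle quantification and builtins generally, so this is only a miniature of the idea:

```python
# Toy forward chaining for one N3-style rule:
# { ?x parent ?y . ?y parent ?z } => { ?x grandparent ?z }
facts = {
    ("alice", "parent", "bob"),
    ("bob", "parent", "carol"),
}

def apply_grandparent_rule(facts):
    inferred = set(facts)
    for (x, p1, y) in facts:
        for (y2, p2, z) in facts:
            # Join the two body patterns on the shared variable ?y.
            if p1 == p2 == "parent" and y == y2:
                inferred.add((x, "grandparent", z))
    return inferred

closed = apply_grandparent_rule(facts)
print(("alice", "grandparent", "carol") in closed)
```

Because the rule's head and body are themselves graphs of triples, the same RDF data model carries both the facts and the logic, which is the minimality N3Logic aims for.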


Author(s):  
Manish Kumar Mehrotra ◽  
Suvendu Kanungo

Resource Description Framework (RDF) is the de facto standard language model for semantic data representation on the Semantic Web. Designing efficient management of high-volume RDF data and efficient querying techniques are primary research areas in the Semantic Web. So far, several RDF management methods have been offered, with data storage designs and query processing algorithms for data retrieval. We propose a Bio-inspired Holistic Matching based Linked Data Clustering (BHM-LDC) approach that works by storing RDF data, clustering the linked data, and discovering web services. Initially, the BHM-LDC algorithm stores the RDF dataset as graph-based linked data. Then, an Integrated Holistic Entity Matching based Distributed Genetic Algorithm (IHEM-DGA) is proposed to cluster the linked data. Finally, a modified sub-graph-matching-based Web Service Discovery Algorithm uses the clustered triples to find the best web services. The performance of the proposed web service discovery approach is evaluated on a business RDF dataset.


Semantic Web ◽  
2020 ◽  
pp. 1-25
Author(s):  
Andre Gomes Regino ◽  
Julio Cesar dos Reis ◽  
Rodrigo Bonacin ◽  
Ahsan Morshed ◽  
Timos Sellis

RDF data has been extensively deployed to describe various types of resources in a structured way. Links between data elements described by RDF models stand at the core of the Semantic Web. The rising amount of structured data published in public RDF repositories, also known as Linked Open Data, attests to the success of the global and unified dataset proposed by the vision of the Semantic Web. Nowadays, semi-automatic algorithms build connections among these datasets by exploring a variety of methods. Interconnected open data demands automatic methods and tools to maintain its consistency over time. The update of linked data is considered a key process due to the evolutionary character of such structured datasets. However, data-changing operations might affect well-formed links, which makes it difficult to maintain the consistency of connections over time. In this article, we present a thorough survey that provides a systematic review of the state of the art in link maintenance in the Linked Open Data evolution scenario. We conduct a detailed analysis of the literature to characterise and understand the methods and algorithms responsible for detecting, fixing and updating links between RDF data. Our investigation provides a categorisation of existing approaches as well as describes and discusses existing studies. The results reveal an absence of comprehensive solutions suited to fully detect, warn of and automatically maintain the consistency of linked data over time.
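The survey's core problem can be shown in miniature: after a dataset evolves, which interlinking statements now dangle? The sketch below flags `owl:sameAs`-style links whose local subject no longer exists; it is a stdlib illustration with made-up data, not any of the surveyed tools:

```python
# Detect links broken by dataset evolution.
old_resources = {"ex:paris", "ex:london", "ex:rome"}
new_resources = {"ex:paris", "ex:rome"}  # ex:london was deleted or renamed

links = [
    ("ex:paris", "owl:sameAs", "dbpedia:Paris"),
    ("ex:london", "owl:sameAs", "dbpedia:London"),
]

def broken_links(links, current):
    """Links whose local subject no longer exists must be fixed or retired."""
    return [l for l in links if l[0] not in current]

print(broken_links(links, new_resources))
```

Detection is the easy half; deciding whether a dangling link should be deleted, redirected to a successor resource, or flagged for a human is where, per the survey, comprehensive solutions are still missing.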


2011 ◽  
Vol 05 (04) ◽  
pp. 433-462 ◽  
Author(s):  
ANDRÉ FREITAS ◽  
EDWARD CURRY ◽  
JOÃO GABRIEL OLIVEIRA ◽  
SEÁN O'RIAIN

The vision of creating a Linked Data Web brings with it the challenge of allowing queries across highly heterogeneous and distributed datasets. In order to query Linked Data on the Web today, end users need to be aware of which datasets potentially contain the data and also which data model describes these datasets. Allowing users to expressively query relationships in RDF while abstracting away the underlying data model represents a fundamental problem for Web-scale Linked Data consumption. This article introduces a distributional structured semantic space which enables data-model-independent natural language queries over RDF data. The approach centers on a distributional semantic model, which provides the level of semantic interpretation needed for data model independence. The article analyzes the geometric aspects of the proposed space, describing it as a distributional structured vector space built upon the Generalized Vector Space Model (GVSM). The final semantic space proved to be flexible and precise under real-world query conditions, achieving mean reciprocal rank = 0.516, avg. precision = 0.482 and avg. recall = 0.491.
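The distributional idea in miniature: represent query terms and dataset vocabulary as vectors and rank candidates by cosine similarity, so a natural-language query can reach an RDF predicate without the user knowing the data model. This is a bag-of-words stdlib sketch with invented label data; the paper's space is a corpus-derived GVSM, not this toy:

```python
# Rank candidate RDF predicates against a natural-language query by
# cosine similarity of word-count vectors.
import math
from collections import Counter

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = vec("spouse of the president")
candidates = {"dbo:spouse": vec("spouse husband wife"),
              "dbo:birthPlace": vec("birth place born")}
best = max(candidates, key=lambda k: cosine(query, candidates[k]))
print(best)
```

In the actual approach the vector components come from distributional co-occurrence statistics rather than raw word counts, which is what lets "married to" also land near `dbo:spouse`.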


Author(s):  
Andra Waagmeester ◽  
Paul Braun ◽  
Manoj Karingamadathil ◽  
Jose Emilio Labra Gayo ◽  
Siobhan Leachman ◽  
...  

Moths form a diverse group of species that are predominantly active at night. They are colourful, have an ecological role, but are less well described compared to their closest relatives, the butterflies. Much remains to be understood about moths, which is shown by the many issues within their taxonomy, including being a paraphyletic group and the inability to clearly distinguish them from butterflies (Fig. 1). We present the Wikimedia architecture as a hub of knowledge on moths. This ecosystem consists of 312 language editions of Wikipedia and sister projects such as Wikimedia commons (a multimedia repository), and Wikidata (a public knowledge graph). Through Wikidata, external data repositories can be integrated into this knowledge landscape on moths. Wikidata contains links to (open) data repositories on biodiversity like iNaturalist, Global Biodiversity Information Facility (GBIF) and the Biodiversity Heritage Library (BHL) which in return contain detailed content like species occurrence data, images or publications on moths. We present a workflow that integrates crowd-sourced information and images from iNaturalist, with content from GBIF and BHL into the different language editions of Wikipedia. The Wikipedia articles in turn feed information to other sources. Taxon pages on iNaturalist, for example, have an "About" tab, which is fed by the Wikipedia article describing the respective taxon, where the current language of the (iNaturalist) interface fetches the appropriate language version from Wikipedia. This is a nice example of data reuse, which is one of the pillars of FAIR (Findable, Accessible, Interoperable and Reusable) (Wilkinson et al. 2016). Wikidata provides the linked data hub in this flow of knowledge. Since Wikidata is available in RDF, it aligns well with the data model of the semantic web. 
This allows rapid integration with other linked data sources, and provides an intuitive portal through which non-linked data can be integrated as linked data with the Semantic Web. Wikidata includes information on all sorts of things (e.g., people, species, locations, events). It can therefore structure data in a multitude of ways, which has led to more than 9,000 properties. Because all those different domains and communities use the same source for different things, it is important to have good structure and documentation for a specific topic so that we and others can interpret the data. We present a schema that describes data about moth taxa on Wikidata. Since 2019, Wikidata has had an EntitySchema namespace that allows contributors to specify applicable linked-data schemas. The schemas are expressed using Shape Expressions (ShEx) (Thornton et al. 2019), a formal modelling language for RDF, one of the data formats used on the Semantic Web. Since Wikidata is also rendered as RDF, it is possible to use ShEx to describe data models and user expectations in Wikidata (Waagmeester et al. 2021). These schemas can then be used to verify whether a subset of Wikidata conforms to an expected or described data model. Starting from a document that describes an expected schema on moths, we have developed an EntitySchema (E321) for moths in Wikidata. This schema provides unambiguous guidance for contributors who have data they are not sure how to model. For example, a user with data about a particular species of moth may be working from a scientific article that states that the species is only found in New Zealand, and may be unsure of how to model that fact as a statement in Wikidata.
After consulting Schema E321, the user will find out about Property P183 “endemic to” and can then use that property to state that the species is endemic to New Zealand. As more contributors follow the data model expressed in Schema E321, there will be structural consistency across items for moths in Wikidata. This reduces the risk of contributors using different combinations of properties and qualifiers to express the same meaning. If a contributor needs to express something that is not yet represented in Schema E321, they can extend the schema itself, as each schema can be edited. The multilingual affordances of the Wikidata platform allow users to edit in over 300 languages. In this way, contributors edit in their preferred language and see the structure of the data, as well as the schemas, in their language of choice. This broadens the range of people who can contribute to these data models and reduces the dominance of English. There are an estimated 160K+ moth species. This number is equal to the number of moths described in iNaturalist, while Wikidata contains 220K items on moths. As the biggest language edition, the English Wikipedia contains 65K moth articles; other language editions contain far fewer articles. The higher number of moth items in Wikidata can be partly explained by taxon synonyms being treated as distinct taxa. Wikidata, as a proxy of knowledge on moths, is instrumental in getting them better described in Wikipedia and other (FAIR) sources, while in return, curation in Wikidata is carried out by a large community. This approach to data modelling has the advantage of allowing multilingual collaboration and iterative extension and improvement over time.
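Schema conformance can be illustrated in miniature: check whether a Wikidata-style item uses the properties a moth schema expects. Real validation uses ShEx against EntitySchema E321; the property selection below (P31 "instance of", P225 "taxon name", P183 "endemic to") is a deliberately tiny assumed subset, and the item is hypothetical:

```python
# Ad-hoc stand-in for ShEx conformance checking of a Wikidata-style item.
REQUIRED = {"P31", "P225"}   # instance of, taxon name
OPTIONAL = {"P183"}          # endemic to

item = {  # hypothetical moth item: property -> value
    "P31": "Q16521",              # taxon
    "P225": "Declana atronivea",
    "P183": "Q664",               # New Zealand
}

def conforms(item):
    missing = REQUIRED - item.keys()
    unexpected = item.keys() - REQUIRED - OPTIONAL
    return (not missing and not unexpected), missing, unexpected

ok, missing, unexpected = conforms(item)
print(ok)
```

A ShEx shape expresses the same required/optional structure declaratively, plus value constraints (e.g. that P183 must point to a geographic item), and can be checked mechanically over any subset of Wikidata.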


Author(s):  
D. Ulutaş Karakol ◽  
G. Kara ◽  
C. Yılmaz ◽  
Ç. Cömert

<p><strong>Abstract.</strong> Large amounts of spatial data are held in relational databases. Spatial data in relational databases must be converted to RDF for Semantic Web applications, and this conversion is the key step in creating spatial RDF data. Linked Data is the preferred way for users to publish and share relational data on the Web. In order to define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking resource vocabulary data with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, provides automatic reasoning and prevents data duplication. The need to convert relational data to RDF arises from the semantic expressiveness of Semantic Web technologies. One of the key factors of the Semantic Web is ontologies. An ontology is an “explicit specification of a conceptualization”, and the semantics of spatial data relies on ontologies. Linking spatial data from relational databases to web data sources is not an easy task for sharing machine-readable interlinked data on the Web. Tim Berners-Lee, the inventor of the World Wide Web and the advocate of the Semantic Web and Linked Data, laid down the Linked Data design principles. Based on these rules, firstly, spatial data in relational databases must be converted to RDF with the use of supporting tools. Secondly, spatial RDF data must be linked to upper-level domain ontologies and related web data sources. Thirdly, external data sources (ontologies and web data sources) must be determined, and the spatial RDF data must be linked to those data sources. Finally, the spatial linked data must be published on the web. 
The main contribution of this study is to determine the requirements for finding RDF links and to identify the deficiencies in creating and publishing linked spatial data. To achieve this objective, this study surveys existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we have investigated the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, Geometry2RDF, TripleGeo, GeoTriples, Ontop, etc.) and web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The linking of spatial RDF data to web data sources is demonstrated with an example use case: road data has been linked to one of the most popular related web data sources, DBpedia. SILK, a tool for discovering relationships between data items within different Linked Data sources, is used as the link discovery framework. We also evaluated other link discovery tools, e.g. LIMES, and compared the results of the matching/linking task. As a result, the linked road data is shared and represented as an information resource on the web and enriched with definitions from related resources. In this way, road datasets are also linked by the related classes, individuals, spatial relations and properties they cover, such as construction date, road length, coordinates, etc.</p>
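Link discovery of the kind SILK and LIMES perform can be sketched in a few lines: compare local labels against remote (DBpedia-style) labels with a string-similarity measure and emit `owl:sameAs` candidates above a threshold. The road names and IRIs below are invented for illustration, and real frameworks express such comparisons declaratively rather than in code:

```python
# SILK/LIMES-style link discovery in miniature: string-similarity matching.
from difflib import SequenceMatcher

local_roads = ["Karadeniz Coastal Road", "Ankara Ring Road"]
remote_resources = {
    "dbpedia:Karadeniz_Sahil_Yolu": "Karadeniz Coastal Road (D010)",
    "dbpedia:Bosphorus_Bridge": "Bosphorus Bridge",
}

def discover_links(locals_, remotes, threshold=0.7):
    links = []
    for name in locals_:
        for iri, label in remotes.items():
            score = SequenceMatcher(None, name.lower(), label.lower()).ratio()
            if score >= threshold:
                links.append((name, "owl:sameAs", iri))
    return links

print(discover_links(local_roads, remote_resources))
```

Production link specifications typically combine several measures (string, geometric distance between geometries, shared classes) and tune the threshold, which is exactly what the SILK and LIMES configuration languages are for.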


2009 ◽  
Vol 28 (2) ◽  
pp. 55 ◽  
Author(s):  
Martha M. Yee

<span>This paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the Semantic Web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data. It is also a report on a work in progress—an experiment in building a Resource Description Framework (RDF) model of more FRBRized cataloging rules than those about to be introduced to the library community (Resource Description and Access) and in creating an RDF data model for the rules. I am now in the process of trying to model my cataloging rules in the form of an RDF model, which can also be inspected at </span><a href="http://myee.bol.ucla.edu/">http://myee.bol.ucla.edu/</a><span>. In the process of doing this, I have discovered a number of areas in which I am not sure that RDF is sophisticated enough yet to deal with our data. This article is an attempt to identify some of those areas and explore whether or not the problems I have encountered are soluble—in other words, whether or not our data might be able to live on the Semantic Web. In this paper, I am focusing on raising the questions about the suitability of RDF to our data that have come up in the course of my work.</span>

