Measuring vocabulary use in the Linked Data Cloud

2017, Vol. 41 (2), pp. 252-271
Author(s): Alberto Nogales, Miguel Angel Sicilia-Urban, Elena García-Barriocanal

Purpose: This paper reports on a quantitative study of data gathered from the Linked Open Vocabularies (LOV) catalogue, including the use of network analysis and metrics. The purpose of this paper is to gain insights into the structure of LOV and the use of vocabularies in the Web of Data, noting that not all vocabularies used in the Web of Data are registered in LOV. Given the decentralised and collaborative nature of the use and adoption of these vocabularies, the results of the study can be used to identify emergent important vocabularies that are shaping the Web of Data.
Design/methodology/approach: The methodology is based on an analytical approach to a data set that captures a complete snapshot of the LOV catalogue dated April 2014. An initial analysis of the data is presented in order to obtain insights into the characteristics of the vocabularies found in LOV. This is followed by an analysis of the use of Vocabulary of a Friend (VOAF) properties that describe relations among vocabularies. Finally, the study is complemented with an analysis of the usage of the different vocabularies, and concludes by proposing a number of metrics.
Findings: The most relevant insight is that, unsurprisingly, the vocabularies with the greatest presence are those used to model Semantic Web data, such as the Resource Description Framework (RDF), RDF Schema and OWL, as well as broadly used standards such as the Simple Knowledge Organization System (SKOS), DCTERMS and DCE. It was also found that the most widely used language is English, that the vocabularies are generally not highly specialised in a single field, and that no single scope dominates. Regarding the structural analysis, it is concluded that LOV is a heterogeneous network.
Originality/value: The paper provides an empirical analysis of the structure of LOV and the relations between its vocabularies, together with some metrics that may help determine the important vocabularies from a practical perspective. The results are of interest for a better understanding of the evolution and dynamics of the Web of Data, and for applications that attempt to retrieve data in the Linked Data Cloud. These applications can benefit from the insights into which vocabularies are important to support and the value added when mapping between and using the vocabularies.
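
As a rough illustration of the kind of network analysis described above (a minimal sketch, not the authors' implementation), the snippet below builds a toy directed graph of vocabulary-reuse relations and computes in-degree and PageRank with networkx; the vocabulary names and edges are invented and do not reproduce the April 2014 LOV snapshot.

# A minimal sketch, assuming networkx is installed; the dependency
# edges below are invented toy data, not the LOV catalogue.
import networkx as nx

# Directed edge (A, B) means "vocabulary A reuses/relies on vocabulary B".
edges = [
    ("foaf", "rdfs"), ("foaf", "owl"),
    ("dcterms", "rdfs"), ("skos", "rdfs"),
    ("voaf", "dcterms"), ("voaf", "foaf"),
    ("schema", "rdfs"), ("schema", "owl"),
]
G = nx.DiGraph(edges)

# In-degree counts how many vocabularies depend on each vocabulary;
# PageRank weights that reuse by the importance of the dependants.
in_degree = dict(G.in_degree())
pagerank = nx.pagerank(G)

for vocab in sorted(G.nodes(), key=pagerank.get, reverse=True):
    print(f"{vocab:8s} in-degree={in_degree[vocab]}  pagerank={pagerank[vocab]:.3f}")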

Author(s): Leila Zemmouchi-Ghomari

The data on the web is heterogeneous and distributed, which makes its integration a sine qua non condition for its effective exploitation within the context of the semantic web, or the so-called web of data. A promising solution for web data integration is the linked data initiative, which is based on four principles that aim to standardize the publication of structured data on the web. The objective of this chapter is to provide an overview of the essential aspects of this fairly recent and exciting field, including the linked data model, the Resource Description Framework (RDF); its query language, SPARQL (the SPARQL Protocol and RDF Query Language); the available means of publishing and consuming linked data; and the existing applications and the issues not yet addressed in research.
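
To make the RDF/SPARQL pairing mentioned above concrete, here is a minimal, self-contained sketch using rdflib; the example triples are invented for illustration and any real linked data source would of course be larger and dereferenceable.

# A minimal sketch of RDF + SPARQL, assuming rdflib is installed;
# the example triples are invented for illustration.
from rdflib import Graph

turtle_data = """
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice a foaf:Person ; foaf:name "Alice" ; foaf:knows ex:bob .
ex:bob   a foaf:Person ; foaf:name "Bob" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# SPARQL: who does each person know?
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?friendName WHERE {
    ?person foaf:name ?name ;
            foaf:knows ?friend .
    ?friend foaf:name ?friendName .
}
"""
for name, friend_name in g.query(query):
    print(f"{name} knows {friend_name}")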


2018, Vol. 52 (3), pp. 405-423
Author(s): Riccardo Albertoni, Monica De Martino, Paola Podestà

Purpose: The purpose of this paper is to focus on the quality of the connections (linksets) among thesauri published as Linked Data on the Web. It extends the cross-walking measures with two new measures able to evaluate the enrichment brought by the information reached through the linkset (lexical enrichment, browsing space enrichment). It fosters the adoption of cross-walking linkset quality measures alongside the well-known and widely deployed cardinality-based measures (linkset cardinality and linkset coverage).
Design/methodology/approach: The paper applies the linkset measures to the Linked Thesaurus fRamework for Environment (LusTRE). LusTRE is selected as a testbed because it is encoded using the Simple Knowledge Organisation System (SKOS), published as Linked Data, and it explicitly exploits the cross-walking measures on its validated linksets.
Findings: The application to LusTRE offers insight into the complementarities among the considered linkset measures. In particular, it shows that the cross-walking measures deepen the cardinality-based measures by analysing quality facets that were not previously considered. The actual value of LusTRE's linksets with regard to improving multilingualism and concept spaces is assessed.
Research limitations/implications: The paper considers skos:exactMatch linksets, which are a rather specific but quite common kind of linkset. The cross-walking measures explicitly assume correctness and completeness of linksets. Third-party approaches and tools can help to meet these assumptions.
Originality/value: This paper fulfils an identified need to study the quality of linksets. Several approaches formalise and evaluate Linked Data quality focusing on data set quality but disregarding the other essential component: the connections among data.
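
A hedged illustration of the cardinality-based measures named above (a sketch under my own reading of "cardinality" and "coverage", not the authors' code): the snippet counts skos:exactMatch links between two toy thesauri; the concept URIs and the total number of source concepts are invented.

# A minimal sketch of cardinality-based linkset measures, assuming rdflib;
# the thesauri, the exactMatch links and the concept count are invented.
from rdflib import Graph, Namespace

SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

linkset_ttl = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix a:    <http://example.org/thesaurusA/> .
@prefix b:    <http://example.org/thesaurusB/> .

a:water     skos:exactMatch b:water .
a:sea       skos:exactMatch b:ocean .
a:sediment  skos:exactMatch b:sediment .
"""

g = Graph()
g.parse(data=linkset_ttl, format="turtle")

links = list(g.subject_objects(SKOS.exactMatch))
source_concepts_linked = {s for s, _ in links}

# Linkset cardinality: number of skos:exactMatch statements.
cardinality = len(links)

# Linkset coverage: fraction of source-thesaurus concepts having at least
# one link (the total of 5 source concepts is an invented assumption).
TOTAL_SOURCE_CONCEPTS = 5
coverage = len(source_concepts_linked) / TOTAL_SOURCE_CONCEPTS

print(f"cardinality = {cardinality}, coverage = {coverage:.0%}")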


2019, Vol. 37 (3), pp. 513-524
Author(s): Thomas D. Steele

Purpose: The Bibliographic Framework Initiative (BIBFRAME) is a data model created by the Library of Congress with the long-term goal of replacing Machine-Readable Cataloging (MARC). The purpose of this paper is to inform catalogers and other library professionals why MARC no longer meets the needs of current users, and how BIBFRAME better meets those needs. It also explains linked data and the principles of the Resource Description Framework (RDF), so that catalogers will have a better understanding of BIBFRAME's basic goals.
Design/methodology/approach: A review of recent literature, in print and online, together with the use of the BIBFRAME editor to create a BIBFRAME record, forms the basis of this paper.
Findings: The paper concludes that the user experience with the library catalog has changed and requires more in-depth search capabilities using linked data, and that BIBFRAME is a first step toward meeting the user needs of the future.
Originality/value: The paper gives the reader an entry point into a complicated future that catalogers and other professionals may view with trepidation. With a systematic walkthrough of the creation of a BIBFRAME record, the reader should feel better informed about where cataloging is heading.
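
For readers who want a feel for what a BIBFRAME description looks like as RDF, here is a minimal hand-written sketch parsed with rdflib; the URIs and title are invented, and a record produced with the BIBFRAME editor would be considerably richer.

# A minimal sketch of a BIBFRAME Work/Instance pair, assuming rdflib;
# identifiers and the title are invented and far simpler than real records.
from rdflib import Graph

bibframe_ttl = """
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix ex: <http://example.org/> .

ex:work1 a bf:Work ;
    bf:title [ a bf:Title ; bf:mainTitle "An Invented Sample Title" ] .

ex:instance1 a bf:Instance ;
    bf:instanceOf ex:work1 .
"""

g = Graph()
g.parse(data=bibframe_ttl, format="turtle")

print(f"Parsed {len(g)} BIBFRAME triples:")
for s, p, o in g:
    print(s, p, o)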


2015, Vol. 8 (2)
Author(s): Jayalakshmi Srinivasan

In the last few years, the amount of structured data made available on the Web in semantic formats has grown by several orders of magnitude. On one side, the Linked Data effort has made hundreds of millions of entity descriptions, based on the Resource Description Framework (RDF), available online in data sets. On the other, the Web 2.0 community has increasingly embraced the idea of data portability, and the first efforts have already produced billions of RDF-equivalent triples, either embedded inside HTML pages using microformats or exposed directly using eRDF (embedded RDF) and RDFa (RDF in attributes). Meanwhile, cloud computing is offering utility-oriented IT services to users worldwide, enabling the hosting of applications from consumer, scientific and business domains. The beauty of cloud computing is its simplicity. This paper focuses on the process of transitioning from the IT architectures of today to a Semantic Cloud Architecture. The emphasis is on the collaborative work of business and enterprise architects to reduce operational costs and reach new heights.


Author(s): Alberto Nogales Moyano, Miguel Angel Sicilia, Elena Garcia Barriocanal

This article describes how the Web of Data has emerged as the realization of a machine-readable web, relying on the Resource Description Framework (RDF) language as a way to provide richer semantics to datasets. While the Web of Data is based on principles similar to those of the original Web, with interlinking being the principal mechanism for relating information, the differences in the structure of the information are evident. Several studies have analysed the graph structure of the Web, yielding important insights that were used in relevant applications. However, those findings cannot be transposed to the Web of Data, due to fundamental differences in how data is produced, linked and used. This article reports on a study of the graph structure of the Web of Data using methods and techniques from similar studies of the Web. Results show that the Web of Data also conforms to the bow-tie model. Other characteristics include the low distance between nodes and low closeness and degree centrality. Regarding the datasets, the largest one is Open Data Euskadi, but the one with the most connections to other datasets is DBpedia.
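
A rough sketch of the bow-tie decomposition mentioned in the findings, run on an invented toy graph rather than the Web of Data crawl studied in the article: the core is the largest strongly connected component, IN contains nodes that can reach the core, and OUT contains nodes reachable from it.

# A minimal sketch of a bow-tie decomposition, assuming networkx;
# the dataset-link edges below are invented toy data.
import networkx as nx

edges = [
    ("in1", "core1"), ("in2", "core2"),          # IN -> core
    ("core1", "core2"), ("core2", "core3"),      # core cycle
    ("core3", "core1"),
    ("core3", "out1"), ("core1", "out2"),        # core -> OUT
]
G = nx.DiGraph(edges)

# The core of the bow-tie is the largest strongly connected component.
core = max(nx.strongly_connected_components(G), key=len)
rep = next(iter(core))

# OUT: reachable from the core but not in it; IN: can reach the core.
out_part = (nx.descendants(G, rep) | {rep}) - core
in_part = (nx.ancestors(G, rep) | {rep}) - core
other = set(G) - core - out_part - in_part

print("core:", sorted(core))
print("IN:  ", sorted(in_part))
print("OUT: ", sorted(out_part))
print("other:", sorted(other))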


Author(s): Janailton Lopes Souza, Paulo George Miranda Martins, Rogério Aparecido Sá Ramalho

The term Big Data refers to the large volume of data produced and made available in digital environments. Over the last few years, new representation models have been proposed with the aim of improving the ways information is represented in digital environments. This work is part of an ongoing research project, funded by the FAPESP and CNPq agencies, and aims to analyse the principles underlying Big Data and their relationship with the new representation standards: the Resource Description Framework (RDF), the Simple Knowledge Organization System (SKOS) and the Web Ontology Language (OWL). The research is theoretical in nature and qualitative in approach, as it seeks to describe, understand and explain the relationships between Big Data and the new representation models. Based on the theoretical survey carried out, it was found that the representation models analysed contribute to interlinking large volumes of data without losing the context in which the data originate, favouring a better understanding of Big Data and of the new representation paradigms in digital environments.
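
To give a concrete, invented flavour of how the standards named above interlink data without losing context, the sketch below uses rdflib to describe one SKOS concept in a local dataset and link it to a concept published by another (hypothetical) dataset.

# A minimal sketch of interlinking with RDF/SKOS, assuming rdflib;
# the local concept and the external URI are invented examples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

# A local concept, with its label kept as context ...
g.add((EX.bigData, RDF.type, SKOS.Concept))
g.add((EX.bigData, SKOS.prefLabel, Literal("Big Data", lang="en")))

# ... interlinked with a concept from another (hypothetical) dataset.
g.add((EX.bigData, SKOS.exactMatch,
       URIRef("http://other.example.org/concepts/big-data")))

print(g.serialize(format="turtle"))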


2017, Vol. 44 (2), pp. 203-229
Author(s): Javier D Fernández, Miguel A Martínez-Prieto, Pablo de la Fuente Redondo, Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
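
As an illustrative sketch of the kind of structural statistics such work considers (not the metrics actually defined in the paper), the snippet below computes per-subject out-degree and per-predicate usage counts over a tiny invented graph; on real data, these distributions are crude proxies for redundancy and common structural patterns.

# A minimal sketch of simple structural statistics over RDF, assuming rdflib;
# the triples are invented and the statistics are only illustrative proxies
# for the metrics proposed in the paper.
from collections import Counter
from rdflib import Graph

data = """
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:a foaf:name "A" ; foaf:knows ex:b , ex:c .
ex:b foaf:name "B" ; foaf:knows ex:c .
ex:c foaf:name "C" .
"""

g = Graph()
g.parse(data=data, format="turtle")

subject_out_degree = Counter(s for s, _, _ in g)
predicate_usage = Counter(p for _, p, _ in g)

print("triples:", len(g))
print("distinct subjects:", len(subject_out_degree))
print("mean out-degree:", len(g) / len(subject_out_degree))
print("predicate usage:", dict(predicate_usage))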


2021, Vol. ahead-of-print (ahead-of-print)
Author(s): A. D'Amato

Purpose: The purpose of this paper is to analyze the relationship between intellectual capital and firm capital structure by exploring whether firm profitability and risk are drivers of this relationship.
Design/methodology/approach: Based on a comprehensive data set of Italian firms over the 2008–2017 period, this paper examines whether intellectual capital affects firm financial leverage. Moreover, it analyzes whether firm profitability and risk mediate the abovementioned relationship. Financial leverage is measured by the debt/equity ratio. Intellectual capital is measured via the value-added intellectual coefficient approach.
Findings: The findings show that firms with a high level of intellectual capital have lower financial leverage and are more profitable and riskier than firms with a low level of intellectual capital. Furthermore, this study finds that firm profitability and risk mediate the relationship between intellectual capital and financial leverage. Thus, the higher profitability and risk of intellectual capital-intensive firms help explain their lower financial leverage.
Research limitations/implications: The findings have several implications. From a theoretical standpoint, the paper presents and tests a mediating model of the relationship between intellectual capital and financial leverage and its underlying processes. In terms of the more general managerial implications, the results provide managers with a clear interpretation of the relationship between intellectual capital and financial leverage and point to the need to strengthen the capital structure of intangible-intensive firms.
Originality/value: Through a mediation framework, this study provides empirical evidence on the relationship between intellectual capital and firm financial leverage by exploring the underlying mechanisms behind that relationship, which is a novel approach in the literature.
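
A hedged numerical sketch of the two measures named above: the figures are invented, and the value-added intellectual coefficient is computed in its commonly cited form, which may differ in detail from the authors' operationalisation.

# A minimal sketch of the debt/equity ratio and a commonly cited form of the
# value-added intellectual coefficient (VAIC); all figures are invented and
# the formula may differ in detail from the paper's operationalisation.

total_debt = 400.0        # invented, in millions
total_equity = 600.0      # invented
financial_leverage = total_debt / total_equity

value_added = 250.0       # VA: output minus bought-in inputs (invented)
human_capital = 100.0     # HC: total employee costs (invented)
capital_employed = 800.0  # CE: book value of net assets (invented)

hce = value_added / human_capital                  # human capital efficiency
sce = (value_added - human_capital) / value_added  # structural capital efficiency
cee = value_added / capital_employed               # capital employed efficiency
vaic = hce + sce + cee

print(f"debt/equity = {financial_leverage:.2f}")
print(f"VAIC = {vaic:.2f} (HCE={hce:.2f}, SCE={sce:.2f}, CEE={cee:.2f})")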


2021, Vol. ahead-of-print (ahead-of-print)
Author(s): Ceri Binding, Claudio Gnoli, Douglas Tudhope

Purpose: The Integrative Levels Classification (ILC) is a comprehensive "freely faceted" knowledge organization system not previously expressed as SKOS (Simple Knowledge Organization System). This paper reports and reflects on work converting the ILC to a SKOS representation.
Design/methodology/approach: The design of the ILC representation and the various steps in the conversion to SKOS are described and located within the context of previous work on representing complex classification schemes in SKOS. Various issues and trade-offs emerging from the conversion are discussed. The conversion implementation employed the STELETO transformation tool.
Findings: The ILC conversion captures some of the ILC facet structure by a limited extension beyond the SKOS standard. SPARQL examples illustrate how this extension could be used to create faceted, compound descriptors when indexing or cataloguing. Basic query patterns are provided that might underpin search systems. Possible routes for reducing complexity are discussed.
Originality/value: Complex classification schemes, such as the ILC, have features which are not straightforward to represent in SKOS and which extend beyond the functionality of the SKOS standard. The ILC's facet indicators are modelled as rdf:Property sub-hierarchies that accompany the SKOS RDF statements. The ILC's top-level fundamental facet relationships are modelled as extensions of the associative relationship, i.e. specialised sub-properties of skos:related. An approach for representing faceted compound descriptions in ILC and other faceted classification schemes is proposed.
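
A minimal sketch of the modelling pattern described above, using invented placeholder property names rather than the actual ILC conversion: a facet indicator declared as an rdf:Property, and a fundamental facet relationship declared as a specialised sub-property of skos:related.

# A minimal sketch of the modelling pattern described above, assuming rdflib;
# the ILC-like property names are invented placeholders, not the real scheme.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, SKOS

ILCX = Namespace("http://example.org/ilc-extension/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ilcx", ILCX)

# A facet indicator modelled as an rdf:Property (hypothetical name).
g.add((ILCX.facetIndicator04, RDF.type, RDF.Property))

# A fundamental facet relationship modelled as a specialised
# sub-property of skos:related (hypothetical name).
g.add((ILCX.hasAgent, RDF.type, RDF.Property))
g.add((ILCX.hasAgent, RDFS.subPropertyOf, SKOS.related))

print(g.serialize(format="turtle"))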


Author(s): Zongmin Ma, Li Yan

The Resource Description Framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de facto standard recommended by the W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is proliferating and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Currently, much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL ("not only SQL") databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided, and the chapter concludes with suggestions for future research.
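
As a rough, database-agnostic sketch of one storage idea surveyed in such work (an invented layout, not tied to any particular NoSQL product), the snippet below indexes triples under SPO/POS/OSP permutations the way a key-value or wide-column store might, then answers two simple lookups.

# A minimal, database-agnostic sketch of triple indexing in a key-value style
# (SPO / POS / OSP permutations); real NoSQL-backed RDF stores are far more
# elaborate, and this layout is only an invented illustration.
from collections import defaultdict

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]

# Three permutation indexes, so any single bound term can drive a lookup.
spo = defaultdict(lambda: defaultdict(set))
pos = defaultdict(lambda: defaultdict(set))
osp = defaultdict(lambda: defaultdict(set))

for s, p, o in triples:
    spo[s][p].add(o)
    pos[p][o].add(s)
    osp[o][s].add(p)

# Lookup: everything known about ex:alice (uses the SPO index).
for p, objects in spo["ex:alice"].items():
    for o in objects:
        print("ex:alice", p, o)

# Lookup: who has the name "Bob"? (uses the POS index)
print(pos["foaf:name"].get('"Bob"'))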

