Thesaurus and subject heading lists as Linked Data

Abstract Most libraries put a lot of effort into developing subject headings or thesauri, which are used to index and retrieve information. In the library field, however, controlled vocabularies are associated with authority records in the form of authority files. In order to become findable by search engines, these authority files should be modelled on semantic vocabularies. This research proposes an authority-record conversion process for publishing thesauri and subject headings as linked data, using the Simple Knowledge Organization System (SKOS) data model. To this end, we carried out bibliographic and documentary research on the World Wide Web Consortium recommendation guidelines, which were used to produce a set of procedures and technologies to support the conversion proposal. This research provides evidence that controlled vocabularies are an important resource for improving information retrieval on the web. The proposed conversion process works as a quick guide for controlled vocabulary integration and reuse among users and systems in the linked data environment. Although the proposal was originally intended for a library setting, it can be applied and tested in other types of institutions, such as documentation centres, museums, or cultural heritage archives. It can also be used in other linked open data projects.
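To make the idea of converting an authority record to SKOS concrete, here is a minimal sketch. It is not the procedure from the paper: the record layout (`id`, `heading`, `see_from`, `broader`, `related`) and the base URI are hypothetical. It assumes the common mapping of the authorized heading to `skos:prefLabel`, "see from" variants to `skos:altLabel`, and "see also" references to `skos:broader` or `skos:related`.

```python
# Sketch only: the record fields and base URI below are illustrative assumptions.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def record_to_skos(record, base_uri, lang="en"):
    """Serialize one authority record as a skos:Concept in Turtle."""
    subject = f"<{base_uri}{record['id']}>"
    statements = ["a skos:Concept"]
    # Authorized heading becomes the preferred label.
    statements.append(f'skos:prefLabel "{escape_literal(record["heading"])}"@{lang}')
    # "See from" variants become alternative labels.
    for variant in record.get("see_from", []):
        statements.append(f'skos:altLabel "{escape_literal(variant)}"@{lang}')
    # "See also" references become links to other concepts.
    for broader_id in record.get("broader", []):
        statements.append(f"skos:broader <{base_uri}{broader_id}>")
    for related_id in record.get("related", []):
        statements.append(f"skos:related <{base_uri}{related_id}>")
    body = " ;\n    ".join(statements)
    return f"{subject} {body} ."


if __name__ == "__main__":
    example = {
        "id": "c42",
        "heading": "Thesauri",
        "see_from": ["Controlled vocabularies"],
        "broader": ["c1"],
    }
    print(SKOS_PREFIX)
    print()
    print(record_to_skos(example, "http://example.org/vocab/"))
```

A production conversion would more likely use an RDF library and handle notes, multiple languages, and concept schemes; string building is used here only to keep the sketch dependency-free.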

Mitigating Bias in Metadata

Information Technology and Libraries ◽

10.6017/ital.v40i3.13053 ◽

2021 ◽

Vol 40 (3) ◽

Author(s):

Juliet Hardesty ◽

Allison Nolan

Keyword(s):

Cultural Heritage ◽

Linked Data ◽

Controlled Vocabulary ◽

Use Case ◽

Library Of Congress ◽

Controlled Vocabularies ◽

Marginalized Groups ◽

Subject Headings ◽

Standardize Terminology ◽

Marginalized Community

Controlled vocabularies used in cultural heritage organizations (galleries, libraries, archives, and museums) are a helpful way to standardize terminology but can also result in misrepresentation or exclusion of systemically marginalized groups. Library of Congress Subject Headings (LCSH) is one example of a widely used yet problematic controlled vocabulary for subject headings. In some cases, systemically marginalized groups are creating controlled vocabularies that better reflect their terminology. When a widely used vocabulary like LCSH and a controlled vocabulary from a marginalized community are both available as linked data, it is possible to incorporate the terminology from the marginalized community as an overlay or replacement for outdated or absent terms from more widely used vocabularies. This paper provides a use case for examining how the Homosaurus, an LGBTQ+ linked data controlled vocabulary, can provide an augmented and updated search experience to mitigate bias within a system that only uses LCSH for subject headings.
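The overlay idea described above can be sketched as a lookup that sits in front of an LCSH-only index. This is an illustrative sketch, not the implementation from the paper. The two dictionaries stand in for mappings that would be derived from linked data, and every term in them is a placeholder rather than a real Homosaurus or LCSH heading.

```python
# Sketch only: both mappings are hypothetical placeholders.
# Community term (lower-cased) -> LCSH headings used in the existing index.
OVERLAY = {
    "community term a": ["LCSH heading X", "LCSH heading Y"],
}

# LCSH heading -> community-preferred label to show instead.
PREFERRED_LABELS = {
    "LCSH heading X": "Community term A",
}


def expand_query(term, overlay):
    """Return the set of headings to search for a user-entered term."""
    cleaned = term.strip()
    headings = {cleaned}
    # Add the LCSH headings that the community term maps to, if any.
    headings.update(overlay.get(cleaned.lower(), ()))
    return headings


def display_label(lcsh_heading, preferred):
    """Return the community-preferred label, falling back to the LCSH one."""
    return preferred.get(lcsh_heading, lcsh_heading)


if __name__ == "__main__":
    print(sorted(expand_query("Community term A", OVERLAY)))
    print(display_label("LCSH heading X", PREFERRED_LABELS))
```

Keeping query expansion and display replacement as separate functions lets a system add the community terminology as an overlay, a replacement, or both.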

27 pawns ready for action

Library Hi Tech ◽

10.1108/lht-11-2016-0123 ◽

2017 ◽

Vol 35 (1) ◽

pp. 99-119 ◽

Cited By ~ 1

Author(s):

Gonzalo Mochón ◽

Eva M. Méndez ◽

Gema Bueno de la Fuente

Keyword(s):

Data Model ◽

Evaluation System ◽

Evaluation Criteria ◽

Open Data ◽

Controlled Vocabulary ◽

Digital Information ◽

Management Tools ◽

Content Type ◽

Knowledge Organization System ◽

The Right

Purpose The purpose of this paper is to propose a methodology for assessing management tools for thesauri and other controlled vocabularies that can represent content using the Simple Knowledge Organization System (SKOS) data model, and their use in a Linked Open Data (LOD) paradigm. It analyses a selected set of tools in order to demonstrate the validity of the method. Design/methodology/approach A set of 27 criteria grouped in five evaluation indicators is proposed and applied to ten vocabulary management applications which are compliant with the SKOS data model. Previous studies of controlled vocabulary management software are gathered and analyzed, to compare the evaluation parameters used and the results obtained for each tool. Findings The results indicate that the tool that obtains the highest score in every indicator is PoolParty. The second and third tools are, respectively, TemaTres and Intelligent Theme Manager, although they score lower in most of the evaluation items. The use of a broad set of criteria to evaluate vocabulary management tools gives satisfactory results. The set of five indicators and 27 criteria proposed here represents a useful evaluation system for selecting current and future tools to manage vocabularies. Research limitations/implications The paper assesses only the ten most important and well-known software tools available for thesaurus and vocabulary management as of October 2016. However, the evaluation criteria could be applied to new software that may appear in the future to create and manage SKOS vocabularies in compliance with LOD standards. Originality/value The originality of this paper lies in the proposed indicators and criteria for evaluating vocabulary management tools. Those criteria and indicators can also be valuable for future software. The indicators are also applied to the most exhaustive and qualified list of tools of this kind. The paper will help designers, information architects, metadata librarians, and other staff involved in the design of digital information systems to choose the right tool to manage their vocabularies in a LOD/vocabulary scenario.

Controlled vocabularies and tags: An analysis of research methods

NASKO ◽

10.7152/nasko.v3i1.12787 ◽

2011 ◽

Vol 3 (1) ◽

pp. 23 ◽

Cited By ~ 2

Author(s):

Margaret E. I. Kipp

Keyword(s):

Research Methods ◽

Controlled Vocabulary ◽

The Other ◽

Social Tagging ◽

Controlled Vocabularies ◽

Preliminary Results ◽

Bibliographic Records ◽

Subject Headings ◽

Author Keywords ◽

Library Websites

Social tagging has become increasingly common and is now often found in library catalogues or at least on library websites and blogs. Tags have been compared to controlled vocabulary indexing terms and have been suggested as replacements or enhancements for traditional indexing. This paper explored tagging and controlled vocabulary studies in the context of earlier studies examining title keywords, author keywords and user indexing and applied these results to a set of bibliographic records from PubMed which are also tagged on CiteULike. Preliminary results show that author and title keywords and tags are more similar to each other than to subject headings, though some user or author supplied terms do match subject headings exactly. Author keywords tend to be more specific than the other terms and could serve an additional distinguishing function when browsing.

Information Sharing Pipeline

10.31219/osf.io/hbwf8 ◽

2018 ◽

Author(s):

Violeta Ilik ◽

Lukas Koster

Keyword(s):

Information Sharing ◽

Real Time ◽

Linked Data ◽

Information Channel ◽

Controlled Vocabularies ◽

Subject Headings ◽

Shared Information ◽

Real Time Information ◽

Time Information ◽

Exchange Data

In this paper we discuss a proposal for creating an information sharing pipeline/real-time information channel, where all stakeholders would be able to engage in exchange/verification of information about entities in real time. The entities in question include personal and organizational names as well as subject headings from different controlled vocabularies. The proposed solution is a shared information pipeline where all stakeholders/agents would be able to share and exchange data about entities in real time. Three W3C recommended protocols are considered as potential solutions: the Linked Data Notifications protocol, the ActivityPub protocol, and the WebSub protocol. We compare and explore the three protocols in order to identify the best way to create an information sharing pipeline that would give all stakeholders access to the most up-to-date information.
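As a rough illustration of the first of these protocols, the sketch below shows the sender side of Linked Data Notifications: it finds the receiver's inbox from a `Link` header with the relation `http://www.w3.org/ns/ldp#inbox`, then prepares a JSON-LD notification to POST there. The function names, the example URLs, and the choice of the ActivityStreams `Update` vocabulary for the payload are assumptions for illustration, not details from the paper. The `Link` header parsing is deliberately simplified, and the request is built but not sent.

```python
import json
import re
import urllib.request

LDP_INBOX_REL = "http://www.w3.org/ns/ldp#inbox"


def find_inbox(link_header):
    """Return the inbox URL advertised in a Link header value, or None.

    Simplified: handles single-valued rel parameters only.
    """
    for match in re.finditer(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?', link_header):
        if match.group(2).strip() == LDP_INBOX_REL:
            return match.group(1)
    return None


def build_notification(inbox_url, actor_uri, entity_uri, summary):
    """Build (but do not send) a POST request carrying a JSON-LD notification."""
    payload = {
        # Assumption: ActivityStreams 2.0 terms describe the change.
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Update",
        "actor": actor_uri,
        "object": entity_uri,
        "summary": summary,
    }
    return urllib.request.Request(
        inbox_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


if __name__ == "__main__":
    header = '<https://example.org/inbox/>; rel="http://www.w3.org/ns/ldp#inbox"'
    inbox = find_inbox(header)
    request = build_notification(
        inbox,
        "https://example.org/agents/library-a",
        "https://example.org/authorities/n123",
        "Preferred form of name changed",
    )
    print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would deliver it. A receiver would then expose the stored notification at its own URL for consumers to read.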

Europeana no Linked Open Data: conceitos de Web Semântica na dimensão aplicada das humanidades digitais

Pesquisa Brasileira em Ciência da Informação e Biblioteconomia ◽

10.22478/ufpb.1981-0695.2017v12n2.36529 ◽

2017 ◽

Vol 12 (2) ◽

Author(s):

Caio Saraiva Coneglian ◽

José Eduardo Santarem Segundo

Keyword(s):

Linked Data ◽

Open Data ◽

Linked Open Data

The emergence of new technologies has introduced more efficient means of disseminating information and making it available. One initiative, called Europeana, has been promoting this adaptation of information objects within the Web, and more specifically within Linked Data. The present study therefore aims to discuss the relationship between the Digital Humanities and Linked Open Data, as embodied by Europeana. To this end, we use an exploratory methodology that examines questions related to Europeana's data model, EDM, by means of SPARQL. As results, we came to understand the characteristics of EDM through the use of SPARQL. We also identified the importance that the concept of Digital Humanities holds within the context of Europeana. Keywords: Semantic web. Linked open data. Digital humanities. Europeana. EDM. Link: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n48p88/33031

An Interoperable BIM-Based Toolkit for Efficient Renovation in Buildings

Buildings ◽

10.3390/buildings11070271 ◽

2021 ◽

Vol 11 (7) ◽

pp. 271

Author(s):

Bruno Daniotti ◽

Cecilia Maria Bolognesi ◽

Sonia Lupica Spagnolo ◽

Alberto Pavan ◽

Martina Signorini ◽

...

Keyword(s):

Energy Consumption ◽

Management System ◽

Linked Data ◽

Research Question ◽

Digital Tools ◽

European Research ◽

Construction Sector ◽

Research Project ◽

One Step ◽

Data Environment

Since the buildings and construction sector is one of the main areas responsible for energy consumption and emissions, focusing on building refurbishment and promoting actions in this direction will help to achieve the EU Agenda objective of making Europe climate-neutral by 2050. One step towards renovation is the use of digital tools within a BIM framework. The aim of the research presented in this paper is to improve the management of information throughout the different stages of the renovation process, allowing an interoperable exchange of data among the stakeholders involved; the development of an innovative BIM-based toolkit is the answer to the research question. The research and results relate to the development of an interoperable BIM-based toolkit for efficient renovation in buildings within the European research project BIM4EEB. Specifically, the developed BIM management system allows data to be exchanged among the different tools, using open interoperable formats (such as IFC) and linked data, in a Common Data Environment, to be used by the different stakeholders. Additionally, the developed tools allow the stakeholders to manage different stages of the renovation process, saving time and improving the resulting quality. The validity of each tool with respect to existing practices is demonstrated here, and the strengths and weaknesses of the proposed tools are described in the workflow, detailing issues such as interoperability, collaboration, integration of different solutions, and time-consuming existing survey processes.

The use of thesauri in online retrieval

Journal of Information Science ◽

10.1177/016555158400800204 ◽

1984 ◽

Vol 8 (2) ◽

pp. 63-66 ◽

Cited By ~ 8

Author(s):

C.P.R. Dubois

Keyword(s):

Information Retrieval ◽

Data Base ◽

Case Studies ◽

Controlled Vocabulary ◽

Free Text ◽

Data Bases ◽

Online Data ◽

Controlled Vocabularies ◽

Semantic Maps ◽

Actual Use

The controlled vocabulary versus the free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online data bases and case studies. This is supported by two case studies on the Coffeeline data base. The differences and values of the two approaches are explored considering thesauri as semantic maps. It is suggested that the most appropriate evaluatory technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments. Such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed and suggestions are made for methods to secure revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is mentioned, although such development must be based on firm research of user behaviour and needs.

The read–write Linked Data Web

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2012.0513 ◽

2013 ◽

Vol 371 (1987) ◽

pp. 20120513 ◽

Cited By ~ 15

Author(s):

Tim Berners-Lee ◽

Kieron O’Hara

Keyword(s):

Future Development ◽

Linked Data ◽

Open Data ◽

Linked Open Data ◽

The Future ◽

The Web

This paper discusses issues that will affect the future development of the Web, either increasing its power and utility, or alternatively suppressing its development. It argues for the importance of the continued development of the Linked Data Web, and describes the use of linked open data as an important component of that. Second, the paper defends the Web as a read–write medium, and goes on to consider how the read–write Linked Data Web could be achieved.

A Rule-based Conversion of an EER Schema to Neo4j Schema Constraints

10.5753/sbbd.2021.17876 ◽

2021 ◽

Author(s):

Telmo Henrique Valverde da Silva ◽

Ronaldo dos Santos Mello

Keyword(s):

Data Model ◽

Design Methodology ◽

A Priori ◽

Database Management System ◽

Conversion Process ◽

Graph Database ◽

Graph Databases ◽

Rule Based ◽

Promising Solution ◽

Entity Relationship

Several application domains hold highly connected data, such as supply chains and social networks. In this context, NoSQL graph databases arise as a promising solution, since relationships are first-class citizens in their data model. Nevertheless, a traditional database design methodology initially defines a conceptual schema of the domain data, and the Enhanced Entity-Relationship (EER) model is a common tool for this. This paper presents a rule-based conversion process from an EER schema to Neo4j schema constraints, as Neo4j is the most representative NoSQL graph database management system with an expressive data model. Unlike related work, our conversion process deals with all EER model concepts and generates rules for ensuring schema constraints through a set of Cypher instructions ready to run in a Neo4j database instance, since Neo4j is a schemaless system and it is not possible to create a schema a priori. We also present an experimental evaluation that demonstrates the viability of our process in terms of performance.
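A rule-based conversion of this kind can be pictured as functions that take a piece of the conceptual schema and emit Cypher. The sketch below covers only two simple cases (key attributes and mandatory attributes of an entity type) and is not the rule set from the paper. The function name and the example entity are hypothetical. The output uses Neo4j's built-in constraint syntax from Neo4j 5 (`FOR ... REQUIRE`), and property existence constraints (`IS NOT NULL`) require Neo4j Enterprise Edition.

```python
def entity_constraints(label, key_attrs, mandatory_attrs=()):
    """Return Cypher constraint statements for one EER entity type.

    label           -- node label derived from the entity type name
    key_attrs       -- attributes that identify the entity (uniqueness)
    mandatory_attrs -- attributes that must always have a value
    """
    statements = []
    # Rule sketch 1: each key attribute becomes a uniqueness constraint.
    for attr in key_attrs:
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{attr}_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{attr} IS UNIQUE"
        )
    # Rule sketch 2: each mandatory attribute becomes an existence constraint.
    for attr in mandatory_attrs:
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{attr}_exists IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{attr} IS NOT NULL"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical entity type from a supply-chain schema.
    for statement in entity_constraints("Supplier", ["code"], ["name"]):
        print(statement + ";")
```

EER concepts such as relationship cardinalities or specialization hierarchies have no direct built-in constraint in Neo4j, so they would need additional generated instructions that this sketch does not attempt.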

Awareness of Linked Open Data Among the Employees of Polish Libraries, Archives, and Museums

Zagadnienia Informacji Naukowej - Studia Informacyjne ◽

10.36702/zin.826 ◽

2022 ◽

Vol 59 (2(118)) ◽

pp. 7-25

Author(s):

Dorota Siwecka

Keyword(s):

Linked Data ◽

Open Data ◽

Linked Open Data ◽

Survey Method ◽

Doctorate Degree ◽

Research Libraries ◽

The People ◽

Central Statistical ◽

Central Statistical Office ◽

The Subject

Purpose/Thesis: This article presents the results of a survey conducted in January 2021 among employees of Polish libraries, museums, and archives, examining their awareness of open linked data technologies. The research was a pilot study, and its results will be used to improve the questionnaire and to conduct research on a wider scale. Approach/Methods: The survey method was used in the study. Results and conclusions: On the basis of the answers received, it can be concluded that open linked data is not yet very well known among employees of Polish libraries, museums, and archives. Those most aware of technologies allowing for machine understanding of content shared on the Web are doctorate degree-holders employed in research libraries. Furthermore, awareness of projects using LOD technologies does not correlate with awareness of these technological solutions. Research limitations: The number of respondents (415) constitutes 1% of all the people employed in libraries, archives, and museums in Poland (based on data provided by the Central Statistical Office of Poland). This is not a large number, but considering the variety among the respondents, the sample can be considered representative. Originality/Value: The awareness of Linked Open Data among employees of Polish libraries, archives, and museums has not been the subject of any study so far. In fact, this type of research has not been conducted in other countries either.
