Examining Studies Comparing Tags and Controlled Vocabularies

Author(s):  
Margaret E. I. Kipp ◽  
Jihee Beak

Tags have been compared to controlled vocabulary terms and have been suggested as replacements or enhancements in indexing. This paper explores tagging and controlled vocabulary studies in the context of studies examining title and author keywords or user search terms to determine the methods used and impact of these studies.

Author(s):  
Margaret E. I. Kipp

Tags have been compared to controlled vocabulary terms and have been suggested as replacements or enhancements in indexing. This paper explores tagging and controlled vocabulary studies in the context of studies examining title and author keywords or user search terms and uses the results to analyse 236,000 PubMed records tagged in CiteULike.


2020 ◽  
Vol 125 (3) ◽  
pp. 2955-2969 ◽  
Author(s):  
Robin Haunschild ◽  
Werner Marx

Bibliometric information retrieval in databases can employ different strategies. Commonly, queries are performed by searching in title, abstract and/or author keywords (author vocabulary). More advanced queries employ database keywords to search in a controlled vocabulary. Queries based on search terms can be augmented with their citing papers if a research field cannot be curtailed by the search query alone. Here, we present another strategy to discover the most important papers of a research field. A marker paper is used to reveal the most important works for the relevant community. All papers co-cited with the marker paper are analyzed using reference publication year spectroscopy (RPYS). For demonstration of the marker paper approach, density functional theory is used as a research field. Comparisons between a prior RPYS on a publication set compiled using a keyword-based search in a controlled vocabulary and three different co-citation RPYS analyses show very similar results. Similarities and differences are discussed.
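
The marker paper approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the input layout (one reference list per citing paper, each reference a `(ref_id, year)` pair), the function names, and the sample data are all hypothetical. The second function computes the deviation from a five-year median, a smoothing step often used in RPYS; the abstract itself does not say whether it was applied.

```python
from collections import Counter
from statistics import median


def cocitation_rpys(citing_papers, marker_id):
    """Count references co-cited with the marker paper, per reference publication year.

    citing_papers: list of reference lists; each reference is a (ref_id, year) pair.
    Only papers whose reference list contains marker_id contribute.
    """
    counts = Counter()
    for refs in citing_papers:
        if marker_id not in {ref_id for ref_id, _ in refs}:
            continue  # this paper does not cite the marker paper
        for ref_id, year in refs:
            if ref_id != marker_id:
                counts[year] += 1
    return dict(sorted(counts.items()))


def median_deviation(spectrum, window=2):
    """Deviation of each year's count from the median of the surrounding years.

    With window=2 the median is taken over a five-year span (an assumption here).
    """
    deviation = {}
    for year in range(min(spectrum), max(spectrum) + 1):
        span = [spectrum.get(year + d, 0) for d in range(-window, window + 1)]
        deviation[year] = spectrum.get(year, 0) - median(span)
    return deviation


# Hypothetical data: "M" is the marker paper.
papers = [
    [("M", 1965), ("A", 1964), ("B", 1988)],
    [("M", 1965), ("A", 1964)],
    [("C", 1990), ("A", 1964)],  # does not cite the marker paper, so it is ignored
]
spectrum = cocitation_rpys(papers, "M")
peaks = median_deviation(spectrum)
```

Years with a large positive deviation are candidates for the most influential works in the co-citation set.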


NASKO ◽  
2011 ◽  
Vol 3 (1) ◽  
pp. 23 ◽  
Author(s):  
Margaret E. I. Kipp

Social tagging has become increasingly common and is now often found in library catalogues or at least on library websites and blogs. Tags have been compared to controlled vocabulary indexing terms and have been suggested as replacements or enhancements for traditional indexing. This paper explored tagging and controlled vocabulary studies in the context of earlier studies examining title keywords, author keywords and user indexing and applied these results to a set of bibliographic records from PubMed which are also tagged on CiteULike. Preliminary results show that author and title keywords and tags are more similar to each other than to subject headings, though some user or author supplied terms do match subject headings exactly. Author keywords tend to be more specific than the other terms and could serve an additional distinguishing function when browsing.
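
A comparison of this kind can be illustrated with simple set overlap. The sketch below is an assumption-laden example rather than the study's method: the normalisation rule, the use of Jaccard similarity, and the sample terms are all hypothetical.

```python
def normalise(term):
    """Lower-case a term, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def jaccard(a, b):
    """Share of terms common to both sets, relative to all terms in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_terms(tags, keywords, headings):
    """Return pairwise overlap scores and the tags that match a heading exactly."""
    t = {normalise(x) for x in tags}
    k = {normalise(x) for x in keywords}
    h = {normalise(x) for x in headings}
    return {
        "tags_vs_keywords": jaccard(t, k),
        "tags_vs_headings": jaccard(t, h),
        "keywords_vs_headings": jaccard(k, h),
        "exact_tag_heading_matches": t & h,
    }


# Hypothetical terms for one bibliographic record.
result = compare_terms(
    tags=["Gene-Expression", "cancer", "to read"],
    keywords=["gene expression", "oncology"],
    headings=["Gene Expression", "Neoplasms"],
)
```

Run over many records, scores like these would show whether user-supplied terms sit closer to author keywords or to subject headings.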


1984 ◽  
Vol 8 (2) ◽  
pp. 63-66 ◽  
Author(s):  
C.P.R. Dubois

The controlled vocabulary versus the free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online data bases and case studies. This is supported by two case studies on the Coffeeline data base. The differences and values of the two approaches are explored considering thesauri as semantic maps. It is suggested that the most appropriate evaluatory technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments. Such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed and suggestions are made for methods to secure revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is mentioned, although such development must be based on firm research of user behaviour and needs.


Author(s):  
Ioannis Papadakis ◽  
Konstantinos Kyprianos

One of the most important tasks of a librarian is the assignment of appropriate subject(s) to a resource within a library’s collection. The subjects usually belong to a controlled vocabulary that is specifically designed for such a task. The most widely adopted controlled vocabulary across libraries around the world is the Library of Congress Subject Headings (LCSH). However, there seems to be a shift from traditional LCSH to modern thesauri. In this paper, a methodology is proposed for incorporating thesauri into existing LCSH-based Information Retrieval (IR) systems. To achieve this, a mapping methodology is proposed that provides a common structure consisting of terms belonging to LCSH and/or a thesaurus. The structure is modeled as a Simple Knowledge Organization System (SKOS) ontology, which can be employed by appropriate subject-based IR systems. As a proof of concept, the proposed methodology is applied to the DSpace-based University of Piraeus digital library.


Author(s):  
Daniela Lucas da Silva ◽  
Renato Rocha Souza ◽  
Maurício Barcellos Almeida

This chapter presents an analytical study of methodologies and methods for building ontologies and controlled vocabularies, compiled from an analysis of the literature on such methodologies and of the international standards for software engineering. Through theoretical and empirical research, it was possible to build a comparative overview that can support the definition of methodological patterns for building ontologies, drawing on theories from computer science and information science.


Author(s):  
Dave Vieglais ◽  
Stephen Richard ◽  
Hong Cui ◽  
Neil Davies ◽  
John Deck ◽  
...  

Material samples form an important portion of the data infrastructure for many disciplines. Here, a material sample is a physical object, representative of some physical thing, on which observations can be made. Material samples may be collected for one project initially, but can also be valuable resources for other studies in other disciplines. Collecting and curating material samples can be a costly process. Integrating institutionally managed sample collections, along with those sitting in individual offices or labs, is necessary to facilitate large-scale evidence-based scientific research. Many have recognized the problems and are working to make data related to material samples FAIR: findable, accessible, interoperable, and reusable. The Internet of Samples (i.e., iSamples) is one of these projects. iSamples was funded by the United States National Science Foundation in 2020 with the following aims: enable previously impossible connections between diverse and disparate sample-based observations; support existing research programs and facilities that collect and manage diverse sample types; facilitate new interdisciplinary collaborations; and provide an efficient solution for FAIR samples, avoiding duplicate efforts in different domains (Davies et al. 2021). The initial sample collections that will make up the internet of samples include those from the System for Earth Sample Registration (SESAR), Open Context, the Genomic Observatories Meta-Database (GEOME), and the Smithsonian Institution National Museum of Natural History (NMNH), representing the disciplines of geoscience, archaeology/anthropology, and biology.
To achieve these aims, the proposed iSamples infrastructure (Fig. 1) has two key components: iSamples in a Box (iSB) and iSamples Central (iSC). The iSC component will be a permanent Internet service that preserves, indexes, and provides access to sample metadata aggregated from iSBs. It will also ensure that persistent identifiers and sample descriptions assigned and used by individual iSBs are synchronized with the records in iSC and with identifier authorities like International Geo Sample Number (IGSN) or Archival Resource Key (ARK). The iSBs create and maintain identifiers and metadata for their respective collections of samples. While providing access to the samples held locally, an iSB also allows iSC to harvest its metadata records. The metadata modeling strategy adopted by the iSamples project is a metadata profile-based approach, in which the metadata fields applicable to all samples form the core metadata schema for iSamples. Each participating collection is free to include additional metadata in its records, which will also be harvested by iSC and be discoverable through the iSC user interface or APIs (Application Programming Interfaces), just like the core. In-depth analysis of metadata profiles used by participating collections, including Darwin Core, has resulted in an iSamples core schema currently being tested and refined through use. See the current version of the iSamples core schema. A number of properties require a controlled vocabulary. Controlled vocabularies used by existing records are kept, while new vocabularies are also being developed to support high-level grouping with consistent semantics across collection types. Examples include vocabularies for Context Category, Material Category, and Specimen Type (Table 1). These vocabularies were also developed in a bottom-up manner, based on the terms used in the existing collections.
For each vocabulary, a decision tree graph was created to illustrate relations among the terms, and a card sorting exercise was conducted within the project team to collect feedback. Domain experts are invited to take part in this exercise. These terms will be used as upper-level terms to the existing category terms used in the participating collections and hence create connections among individual participating collections. iSamples project members are also active in the TDWG Material Sample Task Group and the global consultation on Digital Extended Specimens. Many members of the iSamples project also lead or participate in a sister research coordination network (RCN), Sampling Nature. The goal of this RCN is to develop and refine metadata standards and controlled vocabularies for iSamples and other projects focusing on material samples. We cordially invite you to participate in the Sampling Nature RCN and help shape the future standards for material samples. Contact Sarah Ramdeen ([email protected]) to engage with the RCN.
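
The profile-based approach described above, with a shared core, collection-specific extras, and vocabulary-constrained properties, can be sketched as follows. Every field name and vocabulary term here is hypothetical and chosen only for illustration; none is taken from the actual iSamples core schema.

```python
# Hypothetical core fields and controlled vocabulary (not the real iSamples schema).
CORE_FIELDS = {"sample_identifier", "label", "material_category"}
CONTROLLED = {"material_category": {"rock", "organic material", "water"}}


def harvest(record):
    """Split a collection's record into core and extra metadata, and list problems.

    Extra fields are kept rather than rejected, mirroring the idea that each
    collection may add its own metadata on top of the shared core.
    """
    core = {k: v for k, v in record.items() if k in CORE_FIELDS}
    extra = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    problems = [f"missing core field: {name}" for name in sorted(CORE_FIELDS - set(core))]
    for field, allowed in CONTROLLED.items():
        if field in core and core[field] not in allowed:
            problems.append(f"{field}: '{core[field]}' is not in the controlled vocabulary")
    return core, extra, problems


core, extra, problems = harvest({
    "sample_identifier": "x1",
    "label": "Basalt core",
    "material_category": "rock",
    "drill_depth_m": 12.5,  # collection-specific field, preserved as extra metadata
})
```

Keeping the extras in a separate mapping lets a central index search the core uniformly while still exposing collection-specific detail.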


2021 ◽  
Vol 40 (3) ◽  
Author(s):  
Juliet Hardesty ◽  
Allison Nolan

Controlled vocabularies used in cultural heritage organizations (galleries, libraries, archives, and museums) are a helpful way to standardize terminology but can also result in misrepresentation or exclusion of systemically marginalized groups. Library of Congress Subject Headings (LCSH) is one example of a widely used yet problematic controlled vocabulary for subject headings. In some cases, systemically marginalized groups are creating controlled vocabularies that better reflect their terminology. When a widely used vocabulary like LCSH and a controlled vocabulary from a marginalized community are both available as linked data, it is possible to incorporate the terminology from the marginalized community as an overlay or replacement for outdated or absent terms from more widely used vocabularies. This paper provides a use case for examining how the Homosaurus, an LGBTQ+ linked data controlled vocabulary, can provide an augmented and updated search experience to mitigate bias within a system that only uses LCSH for subject headings.
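
The overlay idea can be illustrated as query expansion: a search term from the community vocabulary is expanded to the headings actually present in the catalogue. This is a minimal sketch under assumptions, not the system described in the paper. The mapping, the record layout, and the function names are hypothetical, and the single term pair is illustrative only rather than an authoritative mapping between the two vocabularies.

```python
# Illustrative overlay: community term -> headings used in the catalogue.
OVERLAY = {"lgbtq+ people": ["Sexual minorities"]}


def expand_query(query, overlay):
    """Return the query plus any catalogue headings the overlay maps it to."""
    return [query] + overlay.get(query.casefold(), [])


def search(records, query, overlay):
    """Return titles of records whose subjects match the query or its expansions."""
    wanted = {term.casefold() for term in expand_query(query, overlay)}
    return [
        rec["title"]
        for rec in records
        if wanted & {subject.casefold() for subject in rec["subjects"]}
    ]


# Hypothetical catalogue records indexed only with the widely used vocabulary.
records = [
    {"title": "A", "subjects": ["Sexual minorities"]},
    {"title": "B", "subjects": ["Gardening"]},
]
with_overlay = search(records, "LGBTQ+ people", OVERLAY)
without_overlay = search(records, "LGBTQ+ people", {})
```

Comparing the two result lists shows what the overlay adds: without it, a search using the community's own term finds nothing in a catalogue indexed only with the other vocabulary.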


2011 ◽  
Vol 7 (3) ◽  
pp. 74-90 ◽  
Author(s):  
Ioannis Papadakis ◽  
Konstantinos Kyprianos

One of the most important tasks of a librarian is the assignment of appropriate subject(s) to a resource within a library’s collection. The subjects usually belong to a controlled vocabulary that is specifically designed for such a task. The most widely adopted controlled vocabulary across libraries around the world is the Library of Congress Subject Headings (LCSH). However, there seems to be a shift from traditional LCSH to modern thesauri. In this paper, a methodology is proposed for incorporating thesauri into existing LCSH-based Information Retrieval (IR) systems. To achieve this, a mapping methodology is proposed that provides a common structure consisting of terms belonging to LCSH and/or a thesaurus. The structure is modeled as a Simple Knowledge Organization System (SKOS) ontology, which can be employed by appropriate subject-based IR systems. As a proof of concept, the proposed methodology is applied to the DSpace-based University of Piraeus digital library.
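
A common structure of this kind can be sketched as a set of SKOS mapping triples. The example below is an illustration under assumptions, not the paper's mapping methodology: terms are paired by case-insensitive label equality, which is a deliberately naive stand-in, and all URIs under `example.org` are hypothetical. Because label equality alone does not guarantee identical meaning, the sketch emits `skos:closeMatch` rather than `skos:exactMatch`.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def map_terms(lcsh, thesaurus):
    """Pair LCSH and thesaurus concepts whose labels are equal ignoring case.

    Both arguments map a preferred label to a concept URI.
    Returns (subject, predicate, object) triples using skos:closeMatch.
    """
    by_label = {label.casefold(): uri for label, uri in thesaurus.items()}
    triples = []
    for label, uri in lcsh.items():
        match = by_label.get(label.casefold())
        if match is not None:
            triples.append((uri, SKOS + "closeMatch", match))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical concept URIs.
lcsh = {
    "Information retrieval": "http://example.org/lcsh/1",
    "Cataloging": "http://example.org/lcsh/2",
}
thesaurus = {"information retrieval": "http://example.org/thes/a"}
mapping = map_terms(lcsh, thesaurus)
serialised = to_ntriples(mapping)
```

A subject-based IR system could load such triples to treat linked concepts from either vocabulary as access points to the same resources.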


This chapter describes Concept Parsing Algorithms (CPA), a novel methodology that uses text analysis tools to discover the ‘building blocks' of concepts, through semantic searches of the full text of potentially relevant documents in relevant knowledge domains for the lexical labels of concepts in controlled vocabularies. The meaning of the lexical label of a super-ordinate concept C' in a sublanguage with a controlled vocabulary is encoded in a set that contains three sets of building blocks: Ci (set of co-occurring sub-ordinate concepts); Rj (set of relations); and Lk (set of linguistic elements/descriptors).
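
The encoding described above, a label whose meaning is carried by three sets of building blocks, maps naturally onto a small data structure. The sketch below is a hypothetical illustration, not the chapter's algorithm: the class, the substring check in `found_in`, and the sample values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptEncoding:
    """Lexical label of a super-ordinate concept C' with its three building-block sets."""
    label: str
    sub_concepts: frozenset  # Ci: co-occurring sub-ordinate concepts
    relations: frozenset     # Rj: relations
    descriptors: frozenset   # Lk: linguistic elements/descriptors

    def building_blocks(self):
        """Union of the three sets."""
        return self.sub_concepts | self.relations | self.descriptors

    def found_in(self, text):
        """Building blocks that occur in the text (naive substring check, an assumption)."""
        lowered = text.casefold()
        return {block for block in self.building_blocks() if block.casefold() in lowered}


# Hypothetical encoding for one concept.
concept = ConceptEncoding(
    label="controlled vocabulary",
    sub_concepts=frozenset({"thesaurus", "subject heading"}),
    relations=frozenset({"is a kind of"}),
    descriptors=frozenset({"indexing"}),
)
hits = concept.found_in("A thesaurus supports indexing.")
```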