Concept Parsing Algorithms (CPA)

This chapter describes Concept Parsing Algorithms (CPA), a novel methodology that combines text analysis tools with semantic searches of the full text of potentially relevant documents in relevant knowledge domains to discover the 'building blocks' of concepts named by lexical labels in controlled vocabularies. The meaning of the lexical label of a super-ordinate concept C' in a sublanguage with a controlled vocabulary is encoded in a set containing three sets of building blocks: Ci (the set of co-occurring sub-ordinate concepts), Rj (the set of relations), and Lk (the set of linguistic elements/descriptors).
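To make the decomposition concrete, here is a minimal sketch in Python, assuming a simple set-based representation; the class and field names are hypothetical and not drawn from any published CPA implementation.

```python
# Hypothetical sketch of the CPA building-block decomposition; names are
# illustrative, not from a published CPA implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptMeaning:
    """Meaning of a super-ordinate concept C' as three sets of building blocks."""
    label: str                             # lexical label of C' in the controlled vocabulary
    sub_concepts: frozenset = frozenset()  # Ci: co-occurring sub-ordinate concepts
    relations: frozenset = frozenset()     # Rj: relations among the building blocks
    descriptors: frozenset = frozenset()   # Lk: linguistic elements/descriptors

# Toy example: one possible decomposition of the label "photosynthesis"
meaning = ConceptMeaning(
    label="photosynthesis",
    sub_concepts=frozenset({"chlorophyll", "light energy", "glucose"}),
    relations=frozenset({"absorbs", "converts", "produces"}),
    descriptors=frozenset({"process", "plant cell"}),
)
print(sorted(meaning.sub_concepts))
```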

1984 · Vol 8 (2) · pp. 63-66
Author(s): C.P.R. Dubois

The controlled vocabulary versus free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online databases and case studies, here supported by two case studies on the Coffeeline database. The differences and relative merits of the two approaches are explored, considering thesauri as semantic maps. It is suggested that the most appropriate evaluative technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments; such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed, and methods are suggested for securing revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is noted, although such development must rest on firm research into user behaviour and needs.


2021
Author(s): César E. Montiel Olea · Leonardo R. Corral

Project Completion Reports (PCRs) are the main instrument through which multilateral organizations measure the success of a project once it closes. PCRs matter for development effectiveness because they capture achievements, failures, and challenges within the project cycle that can feed back into the design and execution of new projects. The aim of this paper is to introduce text analysis tools for the exploration of PCR documents. We describe and apply different text analysis tools to explore the content of a sample of PCRs, illustrating how PCRs can be summarized and analyzed using innovative tools applied to a unique dataset. We believe the methods presented in this investigation have numerous potential applications to other types of text documents routinely prepared within the Inter-American Development Bank (IDB).
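The abstract does not describe the paper's specific toolchain; as a hedged illustration of one common first step in such exploration, the sketch below extracts the top TF-IDF terms per document with scikit-learn. The two document strings are placeholders, not actual PCR text.

```python
# Illustrative TF-IDF keyword extraction; the documents are placeholders,
# not real PCR text, and this is not the paper's actual method.
from sklearn.feature_extraction.text import TfidfVectorizer

pcr_texts = [
    "The project achieved its rural water-supply targets despite delays.",
    "Execution challenges included procurement delays and staff turnover.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(pcr_texts)
terms = vectorizer.get_feature_names_out()

# Print the three highest-weighted terms per document as a crude summary.
for i, row in enumerate(tfidf.toarray()):
    top = sorted(zip(row, terms), reverse=True)[:3]
    print(f"PCR {i}:", [term for _, term in top])
```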


Author(s): Ioannis Papadakis · Konstantinos Kyprianos

One of the most important tasks of a librarian is the assignment of appropriate subject(s) to a resource within a library's collection. The subjects usually belong to a controlled vocabulary specifically designed for this task. The most widely adopted controlled vocabulary across libraries around the world is the Library of Congress Subject Headings (LCSH). However, there seems to be a shift from traditional LCSH towards modern thesauri. In this paper, a mapping methodology is proposed that incorporates thesauri into existing LCSH-based information retrieval (IR) systems by providing a common structure consisting of terms belonging to LCSH and/or a thesaurus. The structure is modeled as a Simple Knowledge Organization System (SKOS) ontology, which can be employed by appropriate subject-based IR systems. As a proof of concept, the proposed methodology is applied to the DSpace-based University of Piraeus digital library.
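As a rough sketch of the kind of SKOS structure such a mapping could yield, the snippet below uses rdflib to link a hypothetical thesaurus term to an LCSH heading via skos:closeMatch; the URIs and the choice of mapping property are assumptions, not the paper's actual rules.

```python
# Illustrative SKOS mapping between a thesaurus term and an LCSH heading;
# all URIs are placeholders, and closeMatch is one possible mapping choice.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/thesaurus/")  # hypothetical thesaurus namespace
lcsh_term = URIRef("http://id.loc.gov/authorities/subjects/shXXXXXXX")  # placeholder LCSH URI

g = Graph()
g.bind("skos", SKOS)

thesaurus_term = EX["libraries"]
g.add((thesaurus_term, SKOS.prefLabel, Literal("Libraries", lang="en")))
g.add((thesaurus_term, SKOS.closeMatch, lcsh_term))  # common structure spanning both vocabularies

print(g.serialize(format="turtle"))
```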


Author(s): Daniela Lucas da Silva · Renato Rocha Souza · Maurício Barcellos Almeida

This chapter presents an analytical study of methodologies and methods for building ontologies and controlled vocabularies, compiled from an analysis of the literature on such methodologies and of international standards for software engineering. Through theoretical and empirical research, it was possible to build a comparative overview that can support the definition of methodological patterns for building ontologies, drawing on theories from computer science and information science.


2018 · Vol 10 (2) · pp. 84-108
Author(s): Philipp Mittnik

*Full text is in German.

National Socialism in German, Austrian and English Secondary School Textbooks (1980–2017)

This article analyzes a selection of German, Austrian and English textbooks dealing with National Socialism. Adopting Waltraud Schreiber's methodology of categorial textbook analysis, it presents the surface structure and building blocks of the books as a basis for further analysis. The occurrence (or absence) of the important history-didactic principle of multiperspectivity is examined using the example of sections on "Youth in National Socialism." The study then highlights the significance of multiperspectivity for the construction of critical historical consciousness. Finally, the image of women presented in the textbooks is deconstructed, with particular emphasis on its problematic simplifications.


Author(s): Dave Vieglais · Stephen Richard · Hong Cui · Neil Davies · John Deck · ...

Material samples form an important part of the data infrastructure of many disciplines. Here, a material sample is a physical object, representative of some physical thing, on which observations can be made. Material samples may be collected for one project initially, but can also be valuable resources for later studies in other disciplines. Collecting and curating material samples can be costly, so integrating institutionally managed sample collections, along with those sitting in individual offices or labs, is necessary to facilitate large-scale evidence-based scientific research. Many have recognized these problems and are working to make data related to material samples FAIR: findable, accessible, interoperable, and reusable. The Internet of Samples (iSamples) is one of these projects. iSamples was funded by the United States National Science Foundation in 2020 with the following aims: enable previously impossible connections between diverse and disparate sample-based observations; support existing research programs and facilities that collect and manage diverse sample types; facilitate new interdisciplinary collaborations; and provide an efficient solution for FAIR samples, avoiding duplicate efforts in different domains (Davies et al. 2021).

The initial sample collections that will make up the internet of samples include those from the System for Earth Sample Registration (SESAR), Open Context, the Genomic Observatories Meta-Database (GEOME), and the Smithsonian Institution National Museum of Natural History (NMNH), representing the disciplines of geoscience, archaeology/anthropology, and biology. To achieve these aims, the proposed iSamples infrastructure (Fig. 1) has two key components: iSamples in a Box (iSB) and iSamples Central (iSC). iSC will be a permanent Internet service that preserves, indexes, and provides access to sample metadata aggregated from iSBs. It will also ensure that persistent identifiers and sample descriptions assigned and used by individual iSBs are synchronized with the records in iSC and with identifier authorities such as the International Geo Sample Number (IGSN) or Archival Resource Key (ARK). The iSBs create and maintain identifiers and metadata for their respective collections of samples. While providing access to the samples held locally, an iSB also allows iSC to harvest its metadata records.

The metadata modeling strategy adopted by the iSamples project is profile-based: core metadata fields applicable to all samples form the iSamples core schema. Each participating collection is free to include additional metadata in its records, which will also be harvested by iSC and remain discoverable through the iSC user interface or APIs (Application Programming Interfaces), just like the core. In-depth analysis of the metadata profiles used by participating collections, including Darwin Core, has produced an iSamples core schema that is currently being tested and refined through use (see the current version of the iSamples core schema). A number of properties require a controlled vocabulary.

Controlled vocabularies used by existing records are kept, while new vocabularies are being developed to support high-level grouping with consistent semantics across collection types. Examples include vocabularies for Context Category, Material Category, and Specimen Type (Table 1). These vocabularies were developed bottom-up, based on the terms used in the existing collections. For each vocabulary, a decision tree graph was created to illustrate relations among the terms, and a card sorting exercise was conducted within the project team to collect feedback; domain experts are invited to take part in this exercise. The new terms will serve as upper-level terms above the existing category terms used in the participating collections, and hence create connections among the individual collections. iSamples project members are also active in the TDWG Material Sample Task Group and the global consultation on Digital Extended Specimens. Many members of the iSamples project also lead or participate in a sister research coordination network (RCN), Sampling Nature, whose goal is to develop and refine metadata standards and controlled vocabularies for iSamples and other projects focusing on material samples. We cordially invite you to participate in the Sampling Nature RCN and help shape the future standards for material samples. Contact Sarah Ramdeen ([email protected]) to engage with the RCN.
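The profile-based strategy, a shared core plus collection-specific extensions, can be sketched in a few lines of Python. The field names below are assumptions chosen for illustration; they do not reproduce the actual iSamples core schema.

```python
# Illustrative sketch of a profile-based sample record: a core every iSB
# shares plus free-form extensions. Field names are assumptions, not the
# actual iSamples core schema.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SampleRecord:
    identifier: str         # persistent ID, e.g. an IGSN or ARK
    label: str
    material_category: str  # term from a shared controlled vocabulary
    context_category: str
    extensions: Dict[str, str] = field(default_factory=dict)  # collection-specific fields

    def core(self) -> dict:
        """The portion every iSB would expose for harvesting by iSamples Central."""
        return {
            "identifier": self.identifier,
            "label": self.label,
            "material_category": self.material_category,
            "context_category": self.context_category,
        }

record = SampleRecord(
    identifier="IGSN:EXAMPLE0001",  # hypothetical identifier
    label="Basalt core segment",
    material_category="rock",
    context_category="subsurface",
    extensions={"drill_depth_m": "412.5"},  # extra field only this collection uses
)
print(record.core())
```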


2021 · Vol 40 (3)
Author(s): Juliet Hardesty · Allison Nolan

Controlled vocabularies used in cultural heritage organizations (galleries, libraries, archives, and museums) are a helpful way to standardize terminology but can also result in misrepresentation or exclusion of systemically marginalized groups. Library of Congress Subject Headings (LCSH) is one example of a widely used yet problematic controlled vocabulary for subject headings. In some cases, systemically marginalized groups are creating controlled vocabularies that better reflect their terminology. When a widely used vocabulary like LCSH and a controlled vocabulary from a marginalized community are both available as linked data, it is possible to incorporate the terminology from the marginalized community as an overlay or replacement for outdated or absent terms from more widely used vocabularies. This paper provides a use case for examining how the Homosaurus, an LGBTQ+ linked data controlled vocabulary, can provide an augmented and updated search experience to mitigate bias within a system that only uses LCSH for subject headings.
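A minimal sketch of the overlay idea, assuming a simple lookup table: when a record's LCSH heading has a preferred community term, the community term is displayed instead. The mapping entry below is an illustrative placeholder, not an asserted Homosaurus-to-LCSH link.

```python
# Illustrative overlay of community vocabulary terms over LCSH headings;
# the mapping pair is a placeholder, not an official Homosaurus mapping.
HOMOSAURUS_OVERLAY = {
    "Sexual minorities": "LGBTQ+ people",  # hypothetical example pair
}

def display_heading(lcsh_heading: str) -> str:
    """Prefer a community vocabulary term over the raw LCSH heading when one exists."""
    return HOMOSAURUS_OVERLAY.get(lcsh_heading, lcsh_heading)

print(display_heading("Sexual minorities"))  # overlaid community term
print(display_heading("Libraries"))          # unchanged: no overlay entry
```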


2020 · Vol 18 (1) · pp. 50-56
Author(s): Rahmat Prayogi · Bambang Riadi · Rian Andri Prasetya

Mass media are part of the public sphere and cannot be seen merely as a passive instrument of hegemony. The discourse constructed by Tempo magazine journalists through Indonesiana is not completely neutral or a natural report of news about corruption and violations of the law; it has been shaped by the ideas and viewpoints of the text writers (journalists) in responding to the events constructed in their reporting. This paper aims to show how Fairclough's text analysis tools work in dissecting media texts regarded as dubious.


Author(s): Lisa M. Given · Ali Grotkowski

This paper examines the use of text analysis tools by humanities scholars. The results of approximately 20 qualitative interviews with academics at different career stages reveal both usability issues with current online tools and recommendations for future work in tool development. Implications for information system design are explored.

