text encoding initiative
Recently Published Documents


TOTAL DOCUMENTS

57
(FIVE YEARS 14)

H-INDEX

5
(FIVE YEARS 1)

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Pascal Belouin ◽  
Shih-Pei Chen ◽  
Sean Wang

Designing a protocol for the interoperability of digital textual resources (or, more simply, a “IIIF for texts”) remains a challenge, as such a protocol must cater to their vastly heterogeneous formats, structures, languages, text encodings and metadata. There have been many attempts to propose a standard for textual resource interoperability, from the ubiquitous Text Encoding Initiative (TEI) format to more recent proposals like the Distributed Text Services (DTS) protocol. In this paper, we introduce our proposal, called SHINE, which instead prioritizes making it easy for software developers to represent and exchange textual resources and their associated metadata. We do so by combining a hierarchical model of textual structure with a flexible metadata scheme in SHINE, and we continue to define and develop it based on user-centered and iterative design principles. Therefore, we argue that SHINE is a protocol for textual interoperability that successfully balances flexibility of resource representation, consistency across resource representation, and overall simplicity of implementation.
Keywords: exchange format; document modelling; metadata; digital infrastructure; interoperability


Ethnohistory ◽  
2021 ◽  
Vol 68 (4) ◽  
pp. 493-518
Author(s):  
Rafael C. Alvarado ◽  
Aldo Ismael Barriente ◽  
Allison Margaret Bigelow

Abstract The Popol Wuj is one of the most important, commonly studied, and widely circulated Indigenous literary works from colonial Mesoamerica. By some accounts, there are 1,200 editions of the work published in thirty world languages, all of which trace back to a single manuscript—itself a copy of an earlier Mayan work. To protect their work from being destroyed by colonial officials or Inquisitional authorities, the original K’iche’ authors of the Popol Wuj had to embed their ways of knowing in a language and narrative structure that could not be detected by Spanish readers. Each edition of the Popol Wuj therefore helps to uncover different elements of the cosmovisión that is embedded in the text. This article draws from recent collaborative efforts to prepare a digital critical edition of the Popol Wuj based on the editorial standards and scholarly conventions of the Text Encoding Initiative (TEI). By comparing and contrasting the advantages and drawbacks of this edition relative to printed works and digital editions, we suggest how methods from the digital humanities can shed new light on texts like the Popol Wuj.


Author(s):  
Hugh Cayless ◽  
Thibault Clérice ◽  
Jonathan Robie

Text Encoding Initiative documents are notoriously heterogeneous in structure, since the Guidelines are intended to permit the encoding of any type of text, from tax receipts written on papyrus to Shakespeare plays or novels. Citation Structures are a new feature in the TEI Guidelines that provide a way for documents to declare their own internal structure along with a way to resolve citations conforming to that structure. This feature will allow systems like the Distributed Text Services (DTS) API, which process heterogeneous TEI documents, to handle tasks like automated table-of-contents generation, the extraction of structural metadata, and the resolution of citations without prior knowledge of document structure.
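
The idea of a document declaring its own citation hierarchy can be illustrated with a minimal Python sketch. It is not the TEI mechanism itself: the `LEVELS` list is a simplified, hypothetical stand-in for a nested declaration, the sample document is namespace-free toy data, and the helper names (`split_citation`, `resolve`, `list_citations`) are assumptions made for illustration. Real TEI documents use the TEI namespace and full XPath, which the standard-library `xml.etree.ElementTree` supports only in part.

```python
import xml.etree.ElementTree as ET

# Toy, namespace-free TEI-like document (hypothetical sample data).
DOC = """
<TEI>
  <text>
    <body>
      <div n="1">
        <div n="1"><p>Book 1, chapter 1</p></div>
        <div n="2"><p>Book 1, chapter 2</p></div>
      </div>
      <div n="2">
        <div n="1"><p>Book 2, chapter 1</p></div>
      </div>
    </body>
  </text>
</TEI>
"""

# Simplified stand-in for a nested citation declaration (an assumption):
# each level gives a path relative to the previous match, the attribute
# supplying the citation component, and the delimiter that precedes it.
LEVELS = [
    {"unit": "book", "match": "./text/body/div", "use": "n", "delim": ""},
    {"unit": "chapter", "match": "./div", "use": "n", "delim": "."},
]


def split_citation(citation, levels=LEVELS):
    """Split a citation such as '1.2' into one component per declared level."""
    parts, rest = [], citation
    for i, level in enumerate(levels):
        if not rest:
            break
        if level["delim"]:
            if not rest.startswith(level["delim"]):
                raise ValueError(f"malformed citation: {citation!r}")
            rest = rest[len(level["delim"]):]
        next_delim = levels[i + 1]["delim"] if i + 1 < len(levels) else ""
        if next_delim and next_delim in rest:
            part, _, remainder = rest.partition(next_delim)
            rest = next_delim + remainder
        else:
            part, rest = rest, ""
        parts.append(part)
    if rest:
        raise ValueError(f"citation deeper than declared structure: {citation!r}")
    return parts


def resolve(root, citation, levels=LEVELS):
    """Walk the declared levels and return the element a citation points to."""
    node = root
    for level, part in zip(levels, split_citation(citation, levels)):
        matches = [el for el in node.findall(level["match"])
                   if el.get(level["use"]) == part]
        if not matches:
            raise KeyError(f"no {level['unit']} {part!r} in {citation!r}")
        node = matches[0]
    return node


def list_citations(node, levels=LEVELS, prefix=""):
    """Enumerate (unit, citation) pairs: the raw material for a table of contents."""
    if not levels:
        return []
    level, deeper = levels[0], levels[1:]
    found = []
    for el in node.findall(level["match"]):
        ref = prefix + level["delim"] + el.get(level["use"], "")
        found.append((level["unit"], ref))
        found.extend(list_citations(el, deeper, ref))
    return found


root = ET.fromstring(DOC)
print(resolve(root, "1.2").find("p").text)
print(list_citations(root))
```

Because the traversal is driven entirely by the declaration, the same two functions serve a differently structured document once its levels are swapped in, which is the property the abstract attributes to Citation Structures.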


2020 ◽  
pp. 55-74
Author(s):  
Geoffrey Williams

This text examines the nature of the digital humanities, along with the disciplines and tools they involve. Although the label “digital humanities” is recent, some of its constituents, notably corpus linguistics and the TEI (Text Encoding Initiative), have roots in the 1980s, while other tools such as CAQDAS (Computer Assisted Qualitative Data Analysis Software) are more recent. After a brief history of these three, the text examines the role of these disciplines and tools through two examples: the digitization of the plays of Louis de Boissy and of Antoine Furetière’s Dictionnaire universel in its expanded 1701 edition.


2020 ◽  
pp. 232-238
Author(s):  
Michelle Taylor ◽  
Andrew Keck

The Text Encoding Initiative (TEI), an application of XML, is a mature standard for encoding texts that was developed three decades ago and continues to be improved and expanded upon today. Learn about how TEI was centrally imagined for a project devoted to a corpus of John Wesley material. We will begin by explaining why we chose to use TEI for the project and reviewing the considerations inherent in transitioning from a longstanding print-based project to a digital project, including the challenges of converting thousands of pages of text across different file types into rudimentary TEI. Next, we will move into topics specific to TEI encoding practices, including the creation of XML tagsets designed to maximize the use value of the Wesley Works for its various audiences: scholars, librarians, and clergy. Finally, we will show the TEI in action by sharing an example of an XML file from our first round of encoding.


2020 ◽  
pp. 17-31
Author(s):  
Krzysztof Opaliński ◽  
Patrycja Potoniec

The corpus of 16th-century Polish was originally created to preserve the valuable source material of Słownik polszczyzny XVI wieku (Dictionary of the 16th-Century Polish Language, SPXVI), which comprises 272 texts transliterated according to standardised principles. The project described here consists in creating an online database of these resources and using part of it as the nucleus of a language corpus whose texts carry morphosyntactic annotation. The project adopted XML encoding in the TEI (Text Encoding Initiative) P5 format, adapted to 16th-century texts. Typographical elements, grammatical categories, and word forms were marked up in the texts. The nucleus of the corpus of 16th-century Polish comprises 135 thousand segments and will be expanded by another 100 thousand in the future to provide material for a tool that tags word forms automatically. Ultimately, integration with the Diachronic Corpus of Polish is planned. Keywords: lexicography – history of Polish – diachronic corpus of Polish


Author(s):  
Jacob Murel

Abstract Building upon Walsh’s Comic Book Markup Language (CBML) used for encoding text features of comics documents, this essay explores how CBML can be modified and expanded using additional Text Encoding Initiative (TEI) features to reflect alternative theoretical and critical approaches to comics. In doing so, this essay argues that markup languages offer not only a means for analyzing encoded documents but also a means for analyzing critical approaches to documents. Because markup language reflects the critical stance of whoever produces the encoding, any revision to the markup potentially reflects a revision to the critical theoretical framework from which the encoder operates. As such, implementation of markup language in comics studies can function not only as a metalanguage for describing comics but also as a form of meta-criticism. To this end, this essay explores methods for incorporating CBML and TEI to reflect commonly opposed approaches to analyzing comics documents.


Author(s):  
Jerry Bonnell ◽  
Mitsunori Ogihara

Abstract Morphological adornment of text in Text Encoding Initiative (TEI) XML can be useful for studies in textual analysis. MorphAdorner is a principal tool for providing such functionality in English texts. However, its practical use is limited when the input XML contains branching text, e.g. when <choice> appears, as it modifies the input document. In such cases, preprocessing is required to obtain the desired results. This article introduces a new tool, Hoshi, designed to determine how this issue can best be handled with minimal input modification and preprocessing. It also investigates whether parsing software available online can be used to supply morphological information that can be encoded in an output format like MorphAdorner’s, and whether such a tool can be developed to adorn text in other languages. Challenges include those posed by the target language, the current software available for providing morphological analysis in it, and the schema needed for encoding the results. Moreover, technical hurdles presented by segmented and branching text can complicate the alignment process, especially when the intent is to guarantee input document integrity. Our approach for handling these is presented, and the article ends by outlining future applications of Hoshi that can help to enhance TEI scholarship that prioritizes the use of morphological word metadata.
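
The branching-text problem can be made concrete with a small Python sketch. This is not Hoshi's or MorphAdorner's method, only an illustration of one way to read a single branch of each <choice> without touching the input tree. The sample paragraph, the function name `reading`, and the namespace-free tags are assumptions; in real TEI files the tags carry the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two <choice> elements, each with two alternative branches.
SAMPLE = (
    "<p>The <choice><sic>quikc</sic><corr>quick</corr></choice> fox "
    "<choice><orig>iumps</orig><reg>jumps</reg></choice> over.</p>"
)


def reading(elem, prefer):
    """Return the text of elem, following only one branch of each <choice>.

    `prefer` is a set of tag names to select inside <choice>; if none of the
    branches matches, the first branch is used. The tree is only read,
    never modified.
    """
    pieces = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            branches = list(child)
            chosen = next((b for b in branches if b.tag in prefer), None)
            if chosen is None and branches:
                chosen = branches[0]
            if chosen is not None:
                pieces.append(reading(chosen, prefer))
        else:
            pieces.append(reading(child, prefer))
        # Text following the child belongs to the parent in ElementTree.
        pieces.append(child.tail or "")
    return "".join(pieces)


root = ET.fromstring(SAMPLE)
before = ET.tostring(root)

print(reading(root, {"corr", "reg"}))  # corrected / regularized reading
print(reading(root, {"sic", "orig"}))  # reading as found in the source

# The input document is unchanged after both readings were extracted.
print(ET.tostring(root) == before)
```

A linear reading of this kind could be passed to a morphological analyzer while the original markup stays intact; aligning the analyzer's output back onto the branching document is the harder step the abstract refers to and is not attempted here.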


2019 ◽  
Author(s):  
Pascal Belouin ◽  
Sean H. Wang

Designing a protocol for the interoperability of digital textual resources (or, more simply, a “IIIF for texts”) remains a challenge, as such a protocol must cater to their vastly heterogeneous formats, structures, languages, text encodings and metadata. There have been many attempts to propose a standard for textual resource interoperability, from the ubiquitous Text Encoding Initiative (TEI) format to more recent proposals like the Distributed Text Services (DTS) protocol. In this paper, we critically survey these attempts and introduce our proposal, called SHINE, which aims to escape from TEI’s legacy and instead prioritize making it easy for software developers to represent and exchange textual resources and their associated metadata. We do so by combining a hierarchical model of textual structure with a flexible metadata scheme in SHINE, and we continue to define and develop it based on user-centered and iterative design principles. Therefore, we argue that SHINE is a protocol for textual interoperability that successfully balances flexibility of resource representation, consistency across resource representation, and overall simplicity of implementation.

