text encoding
Recently Published Documents


Total documents: 114 (five years: 27)
H-index: 8 (five years: 1)

2021, Vol 11 (1)
Author(s): Pascal Belouin, Shih-Pei Chen, Sean Wang

Designing a protocol for the interoperability of digital textual resources (or, more simply, a “IIIF for texts”) remains a challenge, as such a protocol must cater to their vastly heterogeneous formats, structures, languages, text encodings and metadata. There have been many attempts to propose a standard for textual resource interoperability, from the ubiquitous Text Encoding Initiative (TEI) format to more recent proposals like the Distributed Text Services (DTS) protocol. In this paper, we introduce our proposal, called SHINE, which instead prioritizes the ease with which software developers can represent and exchange textual resources and their associated metadata. We do so by combining a hierarchical model of textual structure with a flexible metadata scheme in SHINE, and we continue to define and develop it based on user-centered and iterative design principles. Therefore, we argue that SHINE is a protocol for textual interoperability that successfully balances flexibility of resource representation, consistency across resource representation, and overall simplicity of implementation.
Keywords: exchange format; document modeling; metadata; digital infrastructure; interoperability


2021
Author(s): Filipa da Gama Calado

Literary scholars generally agree that the aesthetic qualities of Oscar Wilde’s influential text, The Picture of Dorian Gray (1891), classify it as a modernist work. At the same time, textual scholars have long speculated over the role of aesthetics in Wilde’s revision process, in an apparent effort to reduce or obscure the homoerotic themes in the manuscript. Electronic editing standards such as the Text Encoding Initiative (TEI) enable scholars to trace in detail the development of homoerotic themes within a digital space. Using the TEI standard, my project transcribes and encodes the first chapter of this manuscript, which introduces the story’s three main characters: Basil Hallward, Lord Henry Wotton, and Dorian Gray. In analyzing Wilde’s suppression of the homoerotic elements, I draw from debates in Textual Scholarship and Queer Historiography to explore how electronic editing might restore or “rescue” queer subjects and themes. I end by proposing a method for electronic editing that marks Wilde’s alterations and deletions in TEI formal language in a way that probes the potential of TEI’s “queerability.” My method examines how TEI might work as a tool of containment that suggests elusiveness through constraint. My work here manifests the intricate handling of homoerotic elements within a distinctly queer ethos.


2021, pp. 8-10
Author(s): Michelle M. Taylor, Andrew Keck

In this session, in many ways a follow-up to last year's Atla session "Proposing a TEI-Encoding Project for the Wesley Works," we introduce participants to the principles of text encoding with XML/TEI. While last year we discussed the rationale for using TEI to create a digital version of the Bicentennial Edition of the Works of John Wesley, as well as our plans for orchestrating such a large-scale project, this year we offer introductory, hands-on training in TEI. Workshop participants begin with the basics of text encoding common to any TEI project, then move on to a description of how the Wesley Works Digital Edition, specifically, has adopted and adapted these principles to meet its goal of creating a digital version of the Bicentennial Edition.


Ethnohistory, 2021, Vol 68 (4), pp. 493-518
Author(s): Rafael C. Alvarado, Aldo Ismael Barriente, Allison Margaret Bigelow

The Popol Wuj is one of the most important, commonly studied, and widely circulated Indigenous literary works from colonial Mesoamerica. By some accounts, there are 1,200 editions of the work published in thirty world languages, all of which trace back to a single manuscript, itself a copy of an earlier Mayan work. To protect their work from being destroyed by colonial officials or Inquisitional authorities, the original K’iche’ authors of the Popol Wuj had to embed their ways of knowing in a language and narrative structure that could not be detected by Spanish readers. Each edition of the Popol Wuj therefore helps to uncover different elements of the cosmovisión that is embedded in the text. This article draws from recent collaborative efforts to prepare a digital critical edition of the Popol Wuj based on the editorial standards and scholarly conventions of the Text Encoding Initiative (TEI). By comparing and contrasting the advantages and drawbacks of this edition relative to printed works and digital editions, we suggest how methods from the digital humanities can shed new light on texts like the Popol Wuj.


2021, Vol 10 (3)
Author(s): Seth Erickson

Plain text data consists of a sequence of encoded characters or “code points” from a given standard such as the Unicode Standard. Some of the most common file formats for digital data used in eScience (CSV, XML, and JSON, for example) are built atop plain text standards. Plain text representations of digital data are often preferred because plain text formats are relatively stable, and they facilitate reuse and interoperability. Despite its ubiquity, plain text is not as plain as it may seem. The standards used in modern text encoding (principally, the Unicode Character Set and the related encoding format, UTF-8) have complex architectures compared to historical standards like ASCII. Further, while the Unicode standard has gained in prominence, text encoding problems remain common in research data curation. This primer provides conceptual foundations for modern text encoding and guidance for common curation and preservation actions related to textual data.
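The distinction between code points and their UTF-8 encoding can be illustrated with a short Python sketch. It is not taken from the primer: the sample string and the `describe` helper are assumptions chosen for illustration, and the final lines reproduce one typical curation problem (UTF-8 bytes read as Latin-1) under the assumption that no data was lost in the wrong decoding.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, bytes]]:
    """Return (code point, Unicode name, UTF-8 bytes) for each character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((code_point, name, ch.encode("utf-8")))
    return rows


# Illustrative sample: an ASCII letter, an accented letter, and the euro sign.
sample = "A\u00e9\u20ac"
for code_point, name, encoded in describe(sample):
    print(code_point, name, encoded.hex(" "), f"({len(encoded)} byte(s))")

# A common curation problem: UTF-8 bytes decoded with the wrong encoding.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
# Reversing the wrong decoding recovers the original character.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repr(mojibake), "->", repr(repaired))
```

Each character is one code point, but its UTF-8 form takes between one and four bytes, which is why byte counts and character counts diverge for non-ASCII text.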


Author(s): Hugh Cayless, Thibault Clérice, Jonathan Robie

Text Encoding Initiative documents are notoriously heterogeneous in structure, since the Guidelines are intended to permit the encoding of any type of text, from tax receipts written on papyrus to Shakespeare plays or novels. Citation Structures are a new feature in the TEI Guidelines that provide a way for documents to declare their own internal structure along with a way to resolve citations conforming to that structure. This feature will allow systems like the Distributed Text Services (DTS) API, which process heterogeneous TEI documents, to handle tasks like automated table of contents generation, the extraction of structural metadata, and the resolution of citations without prior knowledge of document structure.
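The general idea of a document declaring its own structure so that generic software can list and resolve citations can be sketched in Python. This is a simplified stand-in and does not reproduce the TEI Citation Structures syntax or the DTS API: the sample document, the `STRUCTURE` declaration, and the functions `list_citations` and `resolve` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and the "n" attribute are illustrative.
DOC = """
<text>
  <div type="book" n="1">
    <div type="chapter" n="1"><p>First chapter of book one.</p></div>
    <div type="chapter" n="2"><p>Second chapter of book one.</p></div>
  </div>
  <div type="book" n="2">
    <div type="chapter" n="1"><p>First chapter of book two.</p></div>
  </div>
</text>
"""

# Declared structure: one (unit, element tag, delimiter) entry per level.
# This is an assumed format, not the TEI syntax.
STRUCTURE = [("book", "div", ""), ("chapter", "div", ".")]


def list_citations(root, structure, prefix=""):
    """Walk the declared levels and yield (citation, unit, element) tuples."""
    if not structure:
        return
    (unit, tag, delim), rest = structure[0], structure[1:]
    for child in root.findall(tag):
        if child.get("type") != unit:
            continue
        citation = f"{prefix}{delim}{child.get('n')}"
        yield citation, unit, child
        yield from list_citations(child, rest, citation)


def resolve(root, structure, citation):
    """Return the element a citation string points to, or None."""
    for ref, _unit, element in list_citations(root, structure):
        if ref == citation:
            return element
    return None


root = ET.fromstring(DOC)
# A table of contents falls out of the declaration alone.
toc = [(ref, unit) for ref, unit, _ in list_citations(root, STRUCTURE)]
print(toc)
passage = resolve(root, STRUCTURE, "1.2")
print(passage.find("p").text)
```

Because the traversal is driven entirely by the declaration, the same two functions would work on a document with different levels once its own declaration is supplied.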


2021, Vol 5 (1)
Author(s): Gongbin Chen, Wei Xiang, Yansong Deng, ...

Information aggregation is an essential component of text encoding, but it has received less attention. The pooling-based (max or average pooling) aggregation method is a bottom-up, passive aggregation method and loses a lot of important information. Recently, attention mechanisms and dynamic routing policies have been used separately to aggregate information, but their aggregation capabilities can be further improved. In this paper, we propose a novel aggregation method combining the attention mechanism and dynamic routing, which can strengthen the ability of information aggregation and improve the quality of text encoding. A novel Leaky Natural Logarithm (LNL) squash function is then designed to alleviate the “saturation” problem of the squash function in the original dynamic routing. Layer Normalization is also added to the dynamic routing policy to speed up routing convergence. A series of experiments is conducted on five text classification benchmarks. Experimental results show that our method outperforms other aggregation methods.
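The baseline aggregators mentioned in the abstract can be contrasted in a small pure-Python sketch. It does not reproduce the paper's combined method or its LNL squash function, which the abstract does not specify. The token vectors and the `query` vector are made-up values, the attention variant shown is plain dot-product attention, and `squash` is the standard squash function from the original capsule-network dynamic routing.

```python
import math


def max_pool(vectors):
    """Element-wise maximum over token vectors."""
    return [max(column) for column in zip(*vectors)]


def mean_pool(vectors):
    """Element-wise average over token vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(scores):
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weighted sum of token vectors, weights from dot products with query."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dims = len(query)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dims)]


def squash(vector):
    """Original dynamic-routing squash: keeps direction, maps norm into [0, 1)."""
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0 for _ in vector]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vector]


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


# Made-up token vectors and query, for illustration only.
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
query = [1.0, 0.0]

print("max pooling:      ", max_pool(tokens))
print("average pooling:  ", mean_pool(tokens))
print("attention pooling:", attention_pool(tokens, query))

# Saturation: very different input norms end up with nearly equal output norms.
print(norm(squash([3.0, 4.0])), norm(squash([30.0, 40.0])))
```

Pooling ignores which tokens matter for the task, while the attention variant reweights tokens by their similarity to the query. The last line shows the saturation the abstract refers to: once the input norm is large, the squashed norm is close to 1 regardless of how large the input was.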


2021, Vol 65 (2), pp. 49-56
Author(s): Maria Smaranda Rusu

“Encoding youthful perspectives of the Anti-Communist Revolution” presents two interviews dating back to the period in Romanian history when the country was struggling with the anti-communist revolution that started in Timisoara. The interviews are described using XML. To simplify the data and make it more accessible, tags were defined in a schema and applied to the text. This method gives readers a better understanding of the text along with an overall view of the historical issue under discussion.
Keywords: XML, Text encoding, Anti-Communist Revolution, Testimonies, Oxygen XML Editor


2020, pp. 55-74
Author(s): Geoffrey Williams

This text raises the question of the nature of the digital humanities, as well as of the disciplines and tools they involve. Although the label “digital humanities” is recent, some of its constituents, notably corpus linguistics and the TEI (Text Encoding Initiative), have roots in the 1980s, whereas other tools such as CAQDAS (Computer Assisted Qualitative Data Analysis Software) are more recent. After a brief history of these three, the text examines the role of these disciplines and tools through two examples: the digitization of the plays of Louis de Boissy and of Antoine Furetière's Dictionnaire universel in its expanded 1701 edition.

