Converting raw transcripts into an annotated and turn-aligned TEI-XML corpus: the example of the Corpus of Serbian Forms of Address

Author(s):  
Dolores Lemmenmeier-Batinić

This paper describes the procedure of building a TEI-XML corpus of spoken Serbian starting from raw transcripts. The corpus consists of semi-structured interviews, which were gathered with the aim of investigating forms of address in Serbian. The interviews were thoroughly transcribed according to the GAT transcription conventions. However, the transcription was carried out without tools that would check the validity of the GAT syntax or align the transcript with the audio recordings. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the corpus into a TEI format for transcriptions of speech. Further, we enriched the corpus by tagging and lemmatising the data. Lastly, we aligned the corpus turns to the corresponding audio segments by using a forced-alignment tool. In addition to presenting the main steps involved in converting the corpus to the XML format, this paper also discusses current challenges in the processing of spoken data, and the implications of data re-use regarding transcriptions of speech. The corpus can be used for studying Serbian from the perspective of interactional linguistics, for investigating the morphosyntax, grammar, lexicon and phonetics of spoken Serbian, for studying disfluencies, and for testing models for automatic speech recognition and forced alignment. The corpus is freely available for research purposes.
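The conversion step described above, turning speaker turns into TEI utterances that point at timeline anchors, can be illustrated with a minimal sketch. It is not the author's pipeline or schema: the function name `turns_to_tei`, the input tuple layout, the anchor ids and the sample turns are all hypothetical, and the element layout (`timeline`, `when`, `u`) is a simplified reading of the TEI conventions for transcribed speech. In practice the start and end times would come from a forced-alignment tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def turns_to_tei(turns):
    """Build a minimal TEI tree from (speaker, start_sec, end_sec, text) tuples.

    Hypothetical helper for illustration; a real corpus also needs a teiHeader,
    speaker metadata, tokenisation and annotation layers.
    """
    ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
    timeline = ET.SubElement(text, f"{{{TEI_NS}}}timeline", {"unit": "s"})
    body = ET.SubElement(text, f"{{{TEI_NS}}}body")

    # Origin anchor; all other anchors are offsets relative to it.
    ET.SubElement(timeline, f"{{{TEI_NS}}}when", {f"{{{XML_NS}}}id": "T0"})

    for i, (speaker, start, end, words) in enumerate(turns, start=1):
        start_id, end_id = f"T{i}s", f"T{i}e"
        for anchor_id, seconds in ((start_id, start), (end_id, end)):
            ET.SubElement(
                timeline,
                f"{{{TEI_NS}}}when",
                {
                    f"{{{XML_NS}}}id": anchor_id,
                    "interval": str(seconds),
                    "since": "#T0",
                },
            )
        # One utterance per turn, linked to its speaker and time anchors.
        u = ET.SubElement(
            body,
            f"{{{TEI_NS}}}u",
            {"who": f"#{speaker}", "start": f"#{start_id}", "end": f"#{end_id}"},
        )
        u.text = words
    return tei


if __name__ == "__main__":
    # Invented sample turns, not taken from the corpus.
    sample = [("INT", 0.0, 1.5, "dobar dan"), ("S1", 1.5, 3.0, "dobar dan")]
    print(ET.tostring(turns_to_tei(sample), encoding="unicode"))
```

Keeping the time anchors in a separate timeline, rather than writing seconds directly on each utterance, means that re-running the alignment only changes the `when` elements and leaves the transcribed turns untouched.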

2021
pp. 167-171
Author(s):
Carol Johnson
Walcir Cardoso

This mixed-methods one-shot study uses the Technology Acceptance Model (TAM) to examine L2 writers’ perceptions of writing with Automatic Speech Recognition (ASR), based on three criteria: usefulness, ease of use, and intention to use. After receiving training on Google voice typing in Google Docs, 17 English as a Second Language (ESL) students carried out two ASR-based writing tasks over a two-hour period. After the treatment, participants filled in a TAM-informed survey and took part in semi-structured interviews to measure their perceptions based on the target criteria. Findings indicate positive perceptions of ASR as a writing tool in terms of usefulness (language learning potential) and ease of use (e.g. user-friendly voice commands). We believe that these positive perceptions might lead to an intention to continue to use ASR, suggesting that the technology has L2 pedagogical potential.


