Review of Fuster-Márquez, Miguel, Carmen Gregori-Signes and José Santaemilia Ruiz eds. 2020. Multiperspectives in Analysis and Corpus Design. Granada: Comares. ISBN: 978-84-1369-009-4

This paper reports on two research results: (1) the design of an English for Specific Purposes (ESP) corpus architecture, complete with annotations structured by regular expressions; and (2) a case study testing whether the design supports the creation of a specific vocabulary list from the compiled corpus. The first half of the study involved designing a precisely structured ESP corpus from 190 veterinary medical charts, with the data organized hierarchically. The data hierarchy in the corpus consists of document types, outline elements and inline elements, such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extracted data were used to create wordlists. The second part of the research tested the corpus model by creating a list of commonly observed lexical items in veterinary medicine. The coverage of the wordlists by the General Service List (GSL) and the Academic Word List (AWL) was tested, with the result that 66.4 percent of all lexical items appeared in the GSL and AWL, whereas 33.7 percent appeared in neither list. The corpus compilation procedures and the annotation scheme introduced in this study enable the compilation of specific corpora with explicit annotations, giving teachers access to the data required for creating ESP classroom materials.
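The coverage test described above can be illustrated with a minimal Python sketch. It is not the authors' procedure (the abstract mentions Perl scripts and does not give the exact counting rules); it assumes a simple case-insensitive match of each wordlist item against the union of the reference lists. The function name `coverage` and the toy word sets are hypothetical and are not actual GSL or AWL contents.

```python
def coverage(wordlist, reference_lists):
    """Return (covered_share, uncovered_share) for the items in wordlist.

    An item counts as covered if it appears in at least one reference list.
    Matching is case-insensitive and exact (no lemmatization), which is an
    assumption of this sketch, not a rule taken from the study.
    """
    reference = set()
    for ref in reference_lists:
        reference.update(word.lower() for word in ref)

    items = [word.lower() for word in wordlist]
    if not items:
        return 0.0, 0.0

    covered = sum(1 for word in items if word in reference)
    share = covered / len(items)
    return share, 1.0 - share


# Hypothetical toy data standing in for the real lists.
gsl_sample = {"dog", "food", "give", "water"}
awl_sample = {"analysis", "data"}
vet_wordlist = ["dog", "water", "data", "canine", "emesis"]

covered, uncovered = coverage(vet_wordlist, [gsl_sample, awl_sample])
print(f"covered: {covered:.1%}, not covered: {uncovered:.1%}")
```

With real data, the reference sets would be loaded from the published GSL and AWL files, and the wordlist would come from the corpus extraction step.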

Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig

10.1007/978-3-030-72484-9 ◽

2021 ◽

Author(s):

Dawn Knight ◽

Steve Morris ◽

Tess Fitzpatrick

Keyword(s):

Corpus Design ◽

Design And Construction

A 700M+ Arabic corpus: KACST Arabic corpus design and construction

Language Resources and Evaluation ◽

10.1007/s10579-014-9284-1 ◽

2014 ◽

Vol 49 (3) ◽

pp. 721-751 ◽

Cited By ~ 12

Author(s):

Abdulmohsen O. Al-Thubaity

Keyword(s):

Corpus Design ◽

Design And Construction

Aspects of speaking-face data corpus design methodology

10.21437/interspeech.2004-417 ◽

2004 ◽

Author(s):

J. Bruce Millar ◽

Michael Wagner ◽

Roland Goecke

Keyword(s):

Design Methodology ◽

Corpus Design

Corpus-based Studies of Lexical and Semantic Variation: The Importance of Both Corpus Size and Corpus Design

From Data to Evidence in English Language Research ◽

10.1163/9789004390652_004 ◽

2018 ◽

pp. 66-87 ◽

Cited By ~ 2

Keyword(s):

Semantic Variation ◽

Corpus Design ◽

Corpus Size

Investigating the Relation Between Voice Corpus Design and Hybrid Synthesis Under Reduction Constraint

Statistical Language and Speech Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-31372-2_14 ◽

2019 ◽

pp. 162-173

Author(s):

Meysam Shamsi ◽

Damien Lolive ◽

Nelly Barbot ◽

Jonathan Chevelu

Keyword(s):

Corpus Design ◽

Reduction Constraint

Building representative multi-genre corpora for legal and institutional translation research

Translation Spaces ◽

10.1075/ts.00014.pri ◽

2019 ◽

Vol 8 (1) ◽

pp. 93-116

Author(s):

Fernando Prieto Ramos ◽

Giorgina Cerutti ◽

Diego Guzmán

Keyword(s):

Target Population ◽

Sampling Techniques ◽

Sequential Approach ◽

Translation Research ◽

Restricted Area ◽

Institutional Settings ◽

Legal Translation ◽

Corpus Design ◽

Legal Perspective ◽

Institutional Texts

Abstract: Exploring questions of representativeness, balance and comparability is essential to tailoring corpus design and compilation to research goals, and to ensuring the validity of research results. This is especially true when the target population of texts under examination is very large and transcends a restricted area of specialization and/or covers multiple genres, as in the case of texts translated in institutional settings. This paper describes the multilayered sequential approach to corpus building applied in a comparative study on legal translation in three of these settings. The approach is based on a full mapping and categorization of institutional texts from a legal perspective; it applies an innovative combination of stratified sampling techniques integrating quantitative and qualitative criteria adapted to the research aims. The resulting corpora, categorization matrix and selection records, together with the methodological detail provided, can be useful for building other multi-genre corpora in translation studies and further afield.
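As a rough illustration of the stratified sampling mentioned in this abstract, the following Python sketch draws a proportional random sample from each stratum of a categorized text population. It covers only the plain quantitative step; the qualitative criteria and the multilayered sequence described by the authors are not modelled. The function name `stratified_sample`, the `genre` field and the toy population are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(texts, key, fraction, seed=0):
    """Draw a proportional random sample from each stratum.

    texts:    iterable of text records
    key:      function mapping a record to its stratum label
    fraction: share of each stratum to sample (0 < fraction <= 1)
    seed:     fixed seed so the selection can be reproduced and recorded
    """
    rng = random.Random(seed)

    # Group the population by stratum label.
    strata = defaultdict(list)
    for text in texts:
        strata[key(text)].append(text)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Keep at least one text per stratum so small genres are not lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 10 treaties, 6 reports, 4 rulings.
genres = ["treaty"] * 10 + ["report"] * 6 + ["ruling"] * 4
texts = [{"id": i, "genre": g} for i, g in enumerate(genres)]

sample = stratified_sample(texts, key=lambda t: t["genre"], fraction=0.5)
print(len(sample), "texts selected")
```

Fixing the seed is a design choice made here so that the selection can be documented alongside the corpus, in the spirit of the selection records the abstract mentions.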
