ESP corpus design: compilation of the Veterinary Nursing Medical Chart Corpus and the Veterinary Nursing Wordlist

Corpora ◽  
2020 ◽  
Vol 15 (2) ◽  
pp. 125-140
Author(s):  
Yukiko Ohashi ◽  
Noriaki Katagiri ◽  
Katsutoshi Oka ◽  
Michiko Hanada

This paper reports on two research results: (1) the design of an English for Specific Purposes (ESP) corpus architecture, complete with annotations structured by regular expressions; and (2) a case study testing that design by creating a specialised vocabulary list from the compiled corpus. The first half of the study involved building a precisely structured ESP corpus from 190 veterinary medical charts with a hierarchical data structure. The hierarchy consists of document types, outline elements and inline elements, such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extracted data were used to create wordlists. The second part of the research tested the corpus design by creating a list of lexical items commonly observed in veterinary medicine. The coverage of these wordlists by the General Service List (GSL) and the Academic Word List (AWL) was then measured: 66.4 percent of all lexical items appeared in the GSL or the AWL, whereas 33.7 percent appeared in neither list. The corpus compilation procedures and the annotation scheme introduced in this study enable the compilation of specialised corpora with explicit annotations, giving teachers access to the data required for creating ESP classroom materials.
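The extraction step described above is easy to picture in code. Below is a minimal Python sketch of regex-based extraction of inline elements such as species and breed; the tag names and the sample chart are invented for illustration, and the original work used Perl scripts and its own annotation scheme.

```python
import re
from collections import Counter

# Hypothetical inline annotation loosely modelled on the paper's described
# hierarchy (document types > outline elements > inline elements); the tag
# names and chart text here are invented, not the authors' actual scheme.
chart = """
<chart type="medical">
  <signalment>A <species>canine</species> patient,
  <breed>Shiba Inu</breed>, presented with vomiting.</signalment>
</chart>
"""

def extract_inline(text, element):
    """Collect the contents of one inline element across a chart."""
    pattern = re.compile(rf"<{element}>(.*?)</{element}>", re.DOTALL)
    return pattern.findall(text)

# Frequency list per category: the first step toward a category wordlist.
for element in ("species", "breed"):
    print(element, Counter(extract_inline(chart, element)))
```

Run over all 190 charts, the per-category counters would yield exactly the kind of category-specific wordlists the paper derives.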

Corpora ◽  
2017 ◽  
Vol 12 (3) ◽  
pp. 393-423
Author(s):  
Karin Puga ◽  
Sandra Götz

In this paper, we introduce the language-pedagogic potential of the Corpus of Product Information (CoPI). The corpus is XML-annotated and contains about 100,000 words of product descriptions of health products, cleaning supplies and products for beauty and personal care, divided into three textual moves: (1) overview, (2) directions and (3) warnings. First, we describe the data collection, corpus design and annotation scheme of the corpus, and then we present the findings of an analysis of CoPI's most frequent words, clusters and its type–token ratio. Finally, we show its potential for language-pedagogic purposes and suggest how the CoPI analyses can be used for paper- and computer-based DDL activities that foster corpus-based genre teaching in the advanced EFL classroom. We conclude this paper by summarising the outcomes of a first case study we conducted to test these activities with advanced learners of English.
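Two of the measures reported for CoPI, frequent words and the type–token ratio, reduce to a few lines of code. Here is a minimal Python sketch using a made-up product-description snippet rather than the actual corpus, and a deliberately naive tokenizer:

```python
import re
from collections import Counter

# Invented sample in the style of a "directions" move; not CoPI data.
text = "Apply a small amount to damp hair. Rinse well. Apply again if needed."

tokens = re.findall(r"[a-z]+", text.lower())   # naive tokenizer
freq = Counter(tokens)

ttr = len(freq) / len(tokens)                  # type-token ratio
print(f"tokens={len(tokens)} types={len(freq)} TTR={ttr:.2f}")
print(freq.most_common(3))                     # most frequent words
```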


2020 ◽  
Author(s):  
C Sutarsyah ◽  
Paul Nation ◽  
G Kennedy

This study compares the vocabulary of a single Economics text of almost 300,000 running words with the vocabulary of a corpus of similar length made up of a variety of academic texts. The general academic corpus was found to use a much larger vocabulary than the more focused Economics text. A small number of words closely related to the topic of the text occurred with very high frequency in the Economics text, while the general academic corpus contained a very large number of low-frequency words. Beyond the words in West's General Service List and the University Word List, there was little overlap between the vocabularies of the two corpora. This indicates that, as far as vocabulary is concerned, EAP courses that go beyond high-frequency academic vocabulary are of little value for learners with specific purposes. © 1994, Sage Publications. All rights reserved.
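The coverage statistic underlying this comparison is simply the proportion of running words that fall inside a given list. A minimal Python sketch, with toy word lists standing in for the real GSL (about 2,000 headwords) and University Word List:

```python
from collections import Counter

# Toy stand-ins for illustration only; not the real GSL/UWL contents.
gsl = {"the", "of", "demand", "supply", "price"}
uwl = {"analyse", "data", "theory"}

tokens = "the price of supply and demand data theory the price".split()
counts = Counter(tokens)

def coverage(word_list):
    """Share of running words (tokens) covered by the list."""
    covered = sum(n for w, n in counts.items() if w in word_list)
    return covered / sum(counts.values())

print(f"GSL coverage: {coverage(gsl):.1%}")         # 70.0%
print(f"GSL+UWL coverage: {coverage(gsl | uwl):.1%}")  # 90.0%
```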


Author(s):  
Ebtisam Saleh Aluthman

The present study is conducted within the borders of lexicographic research, where corpora have become increasingly pervasive. Its overall goal is to compile an open-source OPEC[1] Word List (OWL) available for lexicographic research and for vocabulary learning related to English for oil marketing and the oil industry. To achieve this goal, an OPEC Monthly Reports Corpus (OMRC) comprising 1,004,542 words was compiled from 40 OPEC monthly reports released between 2003 and 2015. Both range and frequency criteria were applied when compiling the OWL, which consists of 255 word types. Alongside this basic goal, the study investigates the coverage of the most widely recognised word lists, the General Service List of English Words (GSL) (West, 1953) and the Academic Word List (AWL) (Coxhead, 2000), in the OMRC corpus. The 255 word types included in the OWL overlap with neither the AWL nor the GSL. The results suggest the necessity of this discipline-specific word list for ESL students in the oil marketing industry. The availability of the OWL has significant pedagogical implications for curriculum design, learning activities and the overall process of vocabulary learning in the context of teaching English for specific purposes (ESP).
Keywords: Vocabulary Profiling, Vocabulary Learning, Word List, OPEC, ESP
[1] OPEC stands for Organisation of Petroleum Exporting Countries.
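Range (how many reports a word occurs in) and frequency (how often it occurs overall) can be applied as joint filters in a few lines. A minimal Python sketch with invented reports and thresholds; the OWL's actual cut-offs are those described in the paper:

```python
from collections import Counter

# Each report is a token list; documents and thresholds are invented
# for illustration and do not reflect the OMRC or the OWL criteria.
reports = [
    ["crude", "output", "quota", "crude"],
    ["crude", "demand", "quota"],
    ["crude", "output", "refinery"],
]

MIN_RANGE, MIN_FREQ = 2, 3

freq = Counter(w for r in reports for w in r)       # total occurrences
rng = Counter(w for r in reports for w in set(r))   # reports containing w

owl = sorted(w for w in freq if rng[w] >= MIN_RANGE and freq[w] >= MIN_FREQ)
print(owl)   # word types meeting both criteria -> ['crude']
```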


1987 ◽  
Vol 26 (02) ◽  
pp. 73-76 ◽  
Author(s):  
Kathryn Rowan ◽  
P. Byass ◽  
R. W. Snow

This paper reports on a computerised approach to the management of an epidemiological field trial which aimed to determine the effects of insecticide-impregnated bed nets on the incidence of malaria in children. The development of a data system satisfying the requirements of the project and its implementation using a database management system are discussed. The advantages of this method of management in terms of rapid processing of, and access to, data from the study are described, together with the completion rates and error rates observed in data collection.


2005 ◽  
Vol 14 (1) ◽  
pp. 71-83 ◽  
Author(s):  
Harriet B. Klein

This case study considers the phonological forms of early lexical items produced by one normally developing boy, from 19 to 22 months of age, who began to produce all monosyllabic words as bisyllabic. To link these empirical data (the apparent creation of increased complexity) with universal tendencies (motivated by the reduction of complexity), the functions of reduplication are revisited. Phonological processes (i.e., reduplication and final consonant deletion) are viewed as repairs motivated by two interacting constraints (i.e., constraints on monosyllabic words and on word-final consonants). These longitudinal case study data provide further evidence for a relationship between final consonant deletion and reduplication. A possible treatment approach for similar patterns demonstrated clinically is recommended.


Author(s):  
Nadezhda Shamova

The broad scope of application of corpus technologies indicates their importance in applied linguistics. Employing the comparative-contrastive method and the method of computer analysis, the author compares and contrasts the main corpus tools of the programs Sketch Engine, AntConc, and WordSmith Tools, focusing on texts from the specialised periodicals about cinematography ‘Total Film’ and ‘American Cinematographer’ for 2019–2020. The primary goal of this comparison is to provide recommendations for the optimal choice of tools and programs for obtaining certain types of information. The author processed a total volume of texts containing over 900,000 words, using the functions “concordance”, “wordlist”, “collocations” + “word sketch”, “N-grams”, and “keywords” in Sketch Engine and AntConc (WordSmith Tools has only “concordance”, “wordlist”, and “keywords”). Information about the specific tools available in each program is collected and presented in a specially developed table. The different software programs described in the article have functions that perform the same tasks, but there are some differences in how the data are presented. Among the programs featured in this case study, the Sketch Engine platform offers the most options for personal settings. The “concordance” function shows a word in context; “wordlist” lists the words of the corpus with a record of their frequency; the “collocations” function (or “word sketch”) recognises fixed expressions; “N-grams” finds phrases comprising a certain number of elements; and the “keywords” function allows users to identify words that are specific to a particular subject area. Information thus obtained from the corpora may be helpful in updating English LSP dictionaries and glossaries of cinematography. The theoretical significance of the present study lies in systematising the material about existing corpus tools, while its practical value lies in applying the tools of three corpus programs to the study of cinematic discourse, understood here as the language used by the community of moviegoers and filmmakers in their discussions of cinematography in the specialised periodicals ‘Total Film’ and ‘American Cinematographer’.
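The concordance and N-gram functions compared in the article are straightforward to emulate. A minimal Python sketch of both, run on a toy sentence rather than the ‘Total Film’ / ‘American Cinematographer’ corpus:

```python
def concordance(tokens, node, width=3):
    """KWIC lines: the node word with `width` tokens of context."""
    for i, w in enumerate(tokens):
        if w == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            print(f"{left:>30} | {w} | {right}")

def ngrams(tokens, n):
    """All contiguous n-token sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cinematographer framed the shot before the light faded".split()
concordance(tokens, "the")
print(ngrams(tokens, 2)[:4])
```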


2021 ◽  
Vol 7 ◽  
Author(s):  
Cody Ising ◽  
Pedro Rodriguez ◽  
Daniel Lopez ◽  
Jeffrey Santner

In combustion chemistry experiments, reaction rates are often extracted from complex experiments using detailed models. To aid in this process, experiments are designed so that measurable quantities, such as species concentrations, flame speed, and ignition delay, are sensitive to the reaction rates of interest. In this work, a systematic method for determining such sensitized experimental conditions is demonstrated. An open-source Python script was created using the Cantera module to simulate thousands of 0D and hundreds of 1D combustion chemistry experiments in parallel across a broad, user-defined range of mixture conditions. The simulation results are post-processed to normalize and compare sensitivity values among reactions and across initial conditions for time-varying and steady-state simulations, in order to determine the “most useful” experimental conditions. This software can be utilized by researchers as a fast, user-friendly screening tool to determine the thermodynamic and mixture parameters for an experimental campaign. We demonstrate the software through two case studies, comparing results of the 0D script against a shock tube experiment and results of the 1D script against a spherical flame experiment. In the shock tube case study, we present mixture conditions compared to those used in the literature to study H + O2 (+M) → HO2 (+M). In the flame case study, we present mixture conditions compared to those used in the literature to study formyl radical (HCO) decomposition and oxidation reactions. The systematically determined experimental conditions identified in the present work are similar to the conditions chosen in the literature.
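The screening idea, finding conditions where an observable responds strongly to one target reaction, can be sketched with Cantera's reaction-rate multipliers: perturb a single reaction, recompute the observable, and compare. The following minimal Python sketch computes a brute-force ignition-delay sensitivity for a 0D constant-volume reactor using the GRI-Mech 3.0 mechanism bundled with Cantera; the mixture, temperature, and target reaction index are arbitrary choices for illustration, and the authors' actual script parallelizes many such cases and uses normalized sensitivities across all reactions.

```python
import cantera as ct
import numpy as np

def ignition_delay(T, P, X, multiplier=1.0, i_reaction=None):
    """0-D constant-volume ignition delay, defined by peak dT/dt."""
    gas = ct.Solution("gri30.yaml")          # GRI-Mech 3.0, ships with Cantera
    gas.TPX = T, P, X
    if i_reaction is not None:
        gas.set_multiplier(multiplier, i_reaction)  # perturb one rate
    reactor = ct.IdealGasReactor(gas)
    net = ct.ReactorNet([reactor])
    times, temps = [], []
    while net.time < 0.05:                   # 50 ms simulation window
        net.step()
        times.append(net.time)
        temps.append(reactor.T)
    return times[int(np.argmax(np.gradient(temps, times)))]

# Arbitrary illustrative condition: stoichiometric H2/air, 1100 K, 10 atm.
T, P, X = 1100.0, 10 * ct.one_atm, "H2:2, O2:1, N2:3.76"
base = ignition_delay(T, P, X)
perturbed = ignition_delay(T, P, X, multiplier=2.0, i_reaction=0)
sens = np.log(perturbed / base) / np.log(2.0)
print(f"tau = {base:.3e} s, d ln(tau)/d ln(k) ~ {sens:.3f}")
```

A full screen would loop this over a grid of temperatures, pressures, and compositions, and rank conditions by the sensitivity of the target reaction relative to all others.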

