Doing Corpus Linguistics: Toward a Conceptual Framework for Indicator of Gender in English Language and Education

Technology plays a pivotal role in ESL teaching and the education sector. In language teaching, research on gender and language mostly favors the idea that men and women may differ in language use. This paper explores indicators of gender in the writing of male and female authors in a large fiction subset of the British National Corpus (BNC), using a corpus tool. Robin Lakoff's four key linguistic terms that mark female language serve as benchmarks for the study. Previous researchers such as Argamon, Koppel, and Shimoni claim that females use more pronouns and fewer nouns than males. The hits and frequencies of Lakoff's terms and these researchers' claims were checked against the BNC to obtain empirical findings, applying the corpus research method to this general corpus to answer the research questions. The study found a substantial difference between texts written by males and texts written by females. It also found that females use many more pronouns and males use many more nouns. Assumptions regarding Lakoff's terms were only partially substantiated, since the results vary a little for empty adjectives such as 'cute' and 'divine'. The work adds to the existing body of knowledge about gender differences in language and leaves room for researchers to work from even broader perspectives.
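The frequency comparison described in this abstract can be illustrated with a minimal sketch. It assumes part-of-speech-tagged tokens grouped by author gender; the sample data, the tag names `PRON` and `NOUN`, and the function name `per_million` are all hypothetical and are not taken from the paper or from the BNC's own tagset. Frequencies are normalized per million tokens so that subcorpora of different sizes can be compared.

```python
from collections import Counter

# Hypothetical toy data: (author_gender, [(token, pos_tag), ...]).
# Tag names are illustrative only, not the BNC tagset.
SAMPLE_TEXTS = [
    ("female", [("she", "PRON"), ("saw", "VERB"), ("him", "PRON"), ("today", "ADV")]),
    ("male", [("the", "DET"), ("engine", "NOUN"), ("powers", "VERB"),
              ("the", "DET"), ("car", "NOUN")]),
]


def per_million(texts, tags=("PRON", "NOUN")):
    """Return {gender: {tag: hits per million tokens}} for the given tags."""
    totals = Counter()   # total token count per gender
    hits = {}            # per-gender Counter of tag hits
    for gender, tokens in texts:
        totals[gender] += len(tokens)
        counts = hits.setdefault(gender, Counter())
        for _, tag in tokens:
            if tag in tags:
                counts[tag] += 1
    return {
        gender: {tag: hits[gender][tag] * 1_000_000 / totals[gender] for tag in tags}
        for gender in totals
    }


rates = per_million(SAMPLE_TEXTS)
for gender, by_tag in rates.items():
    print(gender, by_tag)
```

On a real corpus the raw difference in normalized frequency would normally be accompanied by a significance test; that step is omitted here.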

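Skinny, Slim, dan Thin: Analisis Berbasis Korpus Kata Sifat Identik dan Implikasinya pada Pengajaran Bahasa Inggris [Skinny, Slim, and Thin: A Corpus-Based Analysis of Identical Adjectives and Its Implications for English Language Teaching]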

Ranah Jurnal Kajian Bahasa ◽

10.26499/rnh.v8i1.894 ◽

2019 ◽

Vol 8 (1) ◽

pp. 19

Author(s):

Millatul Islamiyah ◽

Muchamad Sholakhuddin Al Fajri

Keyword(s):

Data Analysis ◽

Corpus Linguistics ◽

English Language ◽

Language Teaching ◽

English Language Teaching ◽

Metaphorical Meaning ◽

Analysis Instrument ◽

Idiomatic Expressions ◽

British National Corpus ◽

National Corpus

This paper is an exploratory corpus-based investigation of a group of near-synonymous adjectives: skinny, slim, and thin. It uses the British National Corpus (BNC) as data and Sketch Engine as the data analysis instrument. Using corpus linguistics techniques such as concordance and collocation analysis, it compares the usage, meaning, and patterns of the synonymous words to identify which is more appropriate in a given context. The results suggest that thin is neutral in nuance and slim tends to carry a positive connotation, while skinny is often used when speakers want to be more pejorative or deprecating. Moreover, unlike skinny, which mainly modifies animate-related nouns, slim is more heterogeneous: it can also modify inanimate-related nouns, and when it collocates with inanimate nouns it often extends its meaning metaphorically to 'small'. Thin is used in many idiomatic expressions, and when combined with common words it can also denote metaphorical meaning. These findings can be applied in English language teaching so that students can use the synonymous adjectives in an apt context and avoid undesirable implications.

Advances in Corpus Linguistics: Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23) (review)

Language ◽

10.1353/lan.2007.0029 ◽

2007 ◽

Vol 83 (1) ◽

pp. 215-215

Author(s):

Hans Lindquist

Keyword(s):

Corpus Linguistics ◽

English Language ◽

International Conference ◽

Language Research

A Corpus-based Comparative Study of Learn and Acquire

English Language Teaching ◽

10.5539/elt.v9n1p209 ◽

2015 ◽

Vol 9 (1) ◽

pp. 209

Author(s):

Bei Yang

Keyword(s):

Second Language ◽

Comparative Study ◽

Language Learners ◽

English Language ◽

Second Language Learners ◽

Linguistic Feature ◽

Natural Discourse ◽

British National Corpus ◽

National Corpus

As an important yet intricate linguistic feature of the English language, synonymy poses a great challenge for second language learners. Using the 100-million-word British National Corpus (BNC) as data and the software Sketch Engine (SkE) as an analysis tool, this article compares the usage of learn and acquire in natural discourse through analysis of concordances, collocations, word sketches, and sketch differences. The results show that different functions of SkE contribute in different ways to discriminating between learn and acquire. Pedagogical implications of introducing the results into the classroom are discussed.
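Concordance and collocation analysis of the kind named in this abstract can be sketched in a few lines of plain Python. This is not Sketch Engine's interface or its association statistics; it only counts raw co-occurrences in a fixed window and prints keyword-in-context lines. The example sentence, the window size of 2, and the function names `collocates` and `kwic` are assumptions made for illustration.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each hit for `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every token in the window except the node hit itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def kwic(tokens, node, window=2):
    """Return keyword-in-context lines, with the node word in brackets."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + window + 1])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented example sentence, not drawn from the BNC.
TOKENS = ("children learn languages quickly while firms acquire assets "
          "and learners acquire skills").split()

for word in ("learn", "acquire"):
    print(word, collocates(TOKENS, word).most_common(3))
    for line in kwic(TOKENS, word):
        print("   ", line)
```

Comparing the two collocate lists side by side is a crude stand-in for what a sketch-difference view reports; real tools rank collocates by an association measure rather than raw counts.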

Material Critique for Touchstone 3rd Edition: on corpus analysis and spoken grammar

EDULINK : EDUCATION AND LINGUISTICS KNOWLEDGE JOURNAL ◽

10.32503/edulink.v1i2.606 ◽

2019 ◽

Vol 1 (2) ◽

pp. 34

Author(s):

Entusiastik -

Keyword(s):

Natural Language ◽

English Language ◽

Spoken Language ◽

Corpus Analysis ◽

English Language Teaching ◽

Free Access ◽

Lexical Bundles ◽

Language Analysis ◽

British National Corpus ◽

National Corpus

This paper analysed the use of corpus and spoken language features in the English Language Teaching (ELT) coursebook “Touchstone”. The corpus analysis was carried out using the British National Corpus (BNC), which was chosen for its easy and free access. In the spoken language analysis, I refer to McCarthy and Carter’s (2015, p. 5) argument, which takes the grammar of conversation as ‘the benchmark for a grammar of speaking’ by considering features such as ellipsis, heads and tails, lexical bundles, and vagueness. The analysis indicated that the language used in this coursebook reflects a certain level of authentic and natural language, although areas for improvement were also found.

TRANSFORMAÇÕES LEXICO-SEMÂNTICAS CORRELATAS À INFLUÊNCIA DA INTERNET [Lexico-Semantic Transformations Correlated with the Influence of the Internet]

Trama ◽

10.48075/rt.v16i37.23604 ◽

2020 ◽

Vol 16 (37) ◽

pp. 4-17

Author(s):

Luiz Henrique Mendes BRANDÃO ◽

Jesiel Soares SILVA

Keyword(s):

New York ◽

Los Angeles ◽

Corpus Linguistics ◽

English Language ◽

American English ◽

English Word ◽

Annual Review ◽

Syntactic Variation ◽

Language Research ◽

Causative Construction

This study aimed to analyze changes in how language is used by its speakers, comparing the early 1990s, a historical moment when the internet had not yet become popular worldwide, with 2017, a period marked by broad internet access, mainly in the more developed countries. To that end, an investigation was carried out on COCA (Corpus of Contemporary American English) to verify, through the association of words with their collocates, how certain terms were used before and after the popularization of the internet. Statistical analysis of the data showed that certain terms of the language (in this case English) came to be used more frequently to express something related to technology, while their earlier senses were demoted, in this semantic transformation, to a lower or much lower frequency of use once broad internet access became a reality. This represents a lexico-semantic transformation brought about by a phenomenon of global reach that influences people's lives, re-signifying the use they make of the world and, consequently, the metalanguage they use in their exchanges with it.

Received 16-11-2019 | Accepted 12-02-2020

Sketch Engine in Building a Lexical Minimum for Children

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9206.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 5105-5108

Keyword(s):

Computational Linguistics ◽

Computer Technology ◽

Corpus Linguistics ◽

English Language ◽

Case Manager ◽

Foreign Languages ◽

School Age ◽

Language Research ◽

The Many ◽

Primary School Age

Our article deals with one aspect of computational linguistics: the construction of lexical minima from corpora, using the Sketch Engine program as an example. The advent of computational linguistics has played an important role in the process of learning foreign languages. Thanks to computer technology, learning foreign languages is greatly simplified and becomes more accessible. Among the many programs for learning foreign languages, we chose Sketch Engine, since it is a corpus manager and a tool for analyzing linguistic corpora, that is, collections of texts selected and processed according to certain rules, which are used as the basis for language research. This resource is software that combines a specialized search engine with many corpora in different languages. We describe the program through the prism of corpus linguistics and consider the functions and capabilities of Sketch Engine in drawing up a lexical minimum for primary school age in English, Russian, and German. In this paper, we conducted an experiment on drawing up a lexical minimum for schoolchildren, which consisted of selecting the 300 most used words of the English language and illustrating them with examples from the Sketch Engine corpora.

The Content Form of the Lexeme “Average”: Synchrony and Diachrony

MGIMO Review of International Relations ◽

10.24833/2071-8160-2015-3-42-250-254 ◽

2015 ◽

pp. 250-254

Author(s):

T. A. Svetonosova

Keyword(s):

Research Methods ◽

English Language ◽

Lexical Semantics ◽

English For Specific Purposes ◽

Theoretical Part ◽

Diachronic Analysis ◽

British National Corpus ◽

The Given ◽

National Corpus

Why can a word in the English language have various meanings? As a rule, such questions arise in English for General Purposes classes, and their number increases in English for Specific Purposes classes. The word average is learnt in both kinds of class, and it evidently has different meanings. It is worth noting that not all of these meanings can be found in monolingual dictionaries. Observing the usage of the word average while teaching prompted this article. The article conducts a synchronic and diachronic analysis of the content form of the lexeme average as a noun. The theoretical part covers the development of semantics, notions of lexical semantics, and the concepts and definitions of the semantic terms used in the article. Then the reasons for choosing the lexeme average are stated, the goal of the article is set, and the research methods are described. The practical part covers the synchronic and diachronic contexts in which the lexeme average operates: data from the British National Corpus, entries from monolingual dictionaries, materials from coursebooks, and entries from etymological dictionaries. All these contexts are analyzed and inferences about the content form of the lexeme average are made. The article closes by outlining possible further research on the lexeme average.

Introducing a corpus of conversational stories. Construction and annotation of the Narrative Corpus

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt-2012-0015 ◽

2012 ◽

Vol 8 (2) ◽

pp. 313-350 ◽

Cited By ~ 13

Author(s):

Christoph Rühlemann ◽

Matthew Brook O'Donnell

Keyword(s):

Corpus Linguistics ◽

Selection Criteria ◽

Sampling Methods ◽

Extraction Techniques ◽

Social Significance ◽

Conversational Narrative ◽

British National Corpus ◽

Four Levels ◽

Moral Stance ◽

National Corpus

Although widely seen as critical both in terms of its frequency and its social significance as a prime means of encoding and perpetuating moral stance and configuring self and identity, conversational narrative has received little attention in corpus linguistics. In this paper we describe the construction and annotation of a corpus that is intended to advance the linguistic theory of this fundamental mode of everyday social interaction: the Narrative Corpus (NC). The NC contains narratives extracted from the demographically-sampled subcorpus of the British National Corpus (BNC) (XML version). It includes more than 500 narratives, socially balanced in terms of participant sex, age, and social class. We describe the extraction techniques, selection criteria, and sampling methods used in constructing the NC. Further, we describe four levels of annotation implemented in the corpus: speaker (social information on speakers), text (text IDs, title, type of story, type of embedding etc.), textual components (pre-/post-narrative talk, narrative, and narrative-initial/final utterances), and utterance (participation roles, quotatives and reporting modes). A brief rationale is given for each level of annotation, and possible avenues of research facilitated by the annotation are sketched out.

Are Synonyms Always Synonymous? A Corpus-assisted Approach to Announce, Declare, and State

ASIAN TEFL Journal of Language Teaching and Applied Linguistics ◽

10.21462/asiantefl.v5i1.110 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Ike Susanti Effendi ◽

Riska Amalia ◽

Sakinah Asa Lalita

Keyword(s):

Foreign Language ◽

Corpus Linguistics ◽

Language Use ◽

Language Teaching ◽

Authentic Materials ◽

Word Combination ◽

Instrument Analysis ◽

Data Source ◽

British National Corpus ◽

National Corpus

The study of (near-)synonymous words has been an intriguing topic in recent decades. Scholars have investigated them from diverse perspectives, including but not limited to semantics, grammar, and language teaching. However, few have examined synonymous verbs. This study scrutinizes ‘announce’, ‘declare’, and ‘state’ using a descriptive qualitative approach, with the British National Corpus as data source. It also attempts to shed light on the pedagogical implications of corpus linguistics for teaching vocabulary and meaning in use. Sketch Engine is used as the analysis instrument, with collocation and concordance analysis employed to elucidate the word combinations and contexts that produce meaning. The findings demonstrate that ‘announce’, ‘declare’, and ‘state’ cannot simply be used interchangeably, since they carry (slightly) different meanings depending on the collocating word and the grammatical pattern. The study also corroborates the notion that corpus linguistics plays a significant role in foreign language teaching, since it offers authentic materials and contextual clues for language use.

Identifying and describing functional discourse units in the BNC Spoken 2014

Text & Talk - An Interdisciplinary Journal of Language Discourse Communication Studies ◽

10.1515/text-2020-0053 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Jesse Egbert ◽

Stacey Wizner ◽

Daniel Keller ◽

Douglas Biber ◽

Tony McEnery ◽

...

Keyword(s):

New Method ◽

Linguistic Variation ◽

Important Research ◽

Conversational Discourse ◽

Conversational Language ◽

Research Questions ◽

British National Corpus ◽

Discourse Units ◽

Ongoing Project ◽

National Corpus

On the surface, it appears that conversational language is produced in a stream of spoken utterances. In reality, conversation is composed of contiguous units that are characterized by coherent communicative purposes. A large number of important research questions about the nature of conversational discourse could be addressed if researchers could investigate linguistic variation across functional discourse units. To date, however, no corpus of conversational language has been annotated according to functional units, and there are no existing methods for carrying out this type of annotation. We introduce a new method for segmenting transcribed conversation files into discourse units and characterizing those units based on their communicative purposes. In this paper, the development and piloting of this method are described in detail and the final framework is presented. We conclude with a discussion of an ongoing project where we are applying this coding framework to the British National Corpus Spoken 2014.
