Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis

2018 ◽  
Vol 14 (1) ◽  
pp. 133-167 ◽  
Author(s):  
Punjaporn Pojanapunya ◽  
Richard Watson Todd

Abstract Keyword analysis is used in a range of sub-disciplines of applied linguistics from genre analyses to critically-oriented studies for different purposes ranging from producing a general characterization of a genre to identifying text-specific ideological issues. This study compares the use of log-likelihood (LL), a probability statistic, and odds ratio (OR), an effect size statistic, for keyword identification and argues that the two methods produce different keywords applicable to research focusing on different purposes. Through two case studies, keyword analyses of advance fee scams against the British National Corpus and research articles in applied linguistics against research articles from other academic disciplines, we show that both the LL and OR keywords concern the aboutness of the corpus, but differ in their specificity and pervasiveness through the corpus. LL highlights words which are relatively common in general use serving genre purposes, whereas OR highlights more specialized words serving critically-oriented purposes. Methodological and practical contributions to keyword analysis are discussed.
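The abstract does not give formulas, so the following is only a minimal sketch of how the two statistics are commonly computed from a 2x2 contingency table (word vs. all other words, target vs. reference corpus). The function names, the four-cell form of log-likelihood, and the `smoothing` parameter are assumptions made for illustration; the authors may have used a different variant.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) over the four cells of a 2x2 table (assumed variant)."""
    observed = [
        freq_target, size_target - freq_target,
        freq_ref, size_ref - freq_ref,
    ]
    total = size_target + size_ref
    word_total = freq_target + freq_ref
    other_total = total - word_total
    # Expected counts if the word were equally likely in both corpora.
    expected = [
        size_target * word_total / total,
        size_target * other_total / total,
        size_ref * word_total / total,
        size_ref * other_total / total,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def odds_ratio(freq_target, size_target, freq_ref, size_ref, smoothing=0.5):
    """Odds ratio as an effect size; `smoothing` (hypothetical) avoids division by zero."""
    a = freq_target + smoothing
    b = size_target - freq_target + smoothing
    c = freq_ref + smoothing
    d = size_ref - freq_ref + smoothing
    return (a * d) / (b * c)
```

With these definitions, a word occurring at the same relative frequency in both corpora yields an LL of zero and an OR of one. LL grows with the amount of evidence (and so with raw frequency), while OR reflects only the size of the difference, which is consistent with the contrast the abstract draws between the two keyword lists.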

2020 ◽  
Vol 83 ◽  
pp. 191-204
Author(s):  
Muhammed Parviz ◽  
Alireza Jalilifar ◽  
Alexanne Don

The present study aimed at exploring how research article writers from two academic disciplines exploit phrasal complexity features (PCFs) to verbalize the results sections of research articles with the eventual aim of assisting advanced EFL writers with their composition strategies. To this end, following a manual search, 200 empirical research articles in the fields of Applied Linguistics and Physics were comparatively examined. Due to the low rate of success of tagging programs in identifying the occurrences of PCFs, the datasets were also manually analyzed. The results revealed that the research article writers drew upon three high-frequency phrasal complexity features, namely, pre-modifying adjectives, post-modifying prepositional phrases, and nominalizations. The study also revealed that the results sections of research articles included different amounts of exceedingly complex patterns of pre-modification, a hybrid of novel appositive structures, and great reliance on hyphenated adjectives. Overall, we believe that these findings can be used to heighten the awareness of academic writers and instructors regarding the linguistic characteristics of academic writing and of the variations of how such phrasal features of compression are constructed in different academic subjects.


2020 ◽  
Author(s):  
Rajab Esfandiari ◽  
Mohammad Ahmadi

Abstract Complexity measures in academic writing have experienced a shift from clausal to phrasal indices in recent years. Drawing on a subset of Biber et al.'s (2011) hypothesized stages of writing development, we explored phrasal complexity across sections of research articles (RAs) in applied linguistics and clinical medicine. A 389,332-word corpus consisting of 80 randomly selected RAs from leading journals in applied linguistics and clinical medicine was compiled for the purposes of the present study. One-way analysis of variance (ANOVA) and independent-samples t-test, as implemented in SPSS (version 25), were employed to find differences across the RA sections and between two groups of academic writers. The findings indicated that RAs in clinical medicine relied more heavily on noun phrase modifiers in all sections than those in applied linguistics, suggesting that the distributional pattern of these linguistic expressions is discipline-independent. The implications of the findings are also discussed.


2021 ◽  
Vol 9 (2) ◽  
pp. 1-33
Author(s):  
Stefan Th. Gries

A widely-used method in corpus-linguistic approaches to discourse analysis, register/text type/genre analysis, and educational/curriculum questions is that of keywords analysis, a simple statistical method aiming to identify words that are key to, i.e. characteristic for, certain discourses, text types, or topic domains. The vast majority of keywords analyses relied on the same statistical measure that most collocation studies are using, the log-likelihood ratio, which is performed on frequencies of occurrence in two corpora under consideration. In a recent paper, Egbert and Biber (2019) advocated a different approach, one that involves computing log-likelihood ratios for word types based on the range of their distribution rather than their frequencies in the target and reference corpora under consideration. In this paper, I argue that their approach is a most welcome addition to keywords analysis but can still be profitably extended by utilizing both frequency and dispersion for keyness computations. I am presenting a new two-dimensional approach to keyness and exemplifying it on the basis of the Clinton-Trump Corpus and the British National Corpus.
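As a rough illustration of the range-based idea described above, the sketch below counts how many texts in each corpus contain a word and computes a log-likelihood ratio on those text counts instead of token frequencies. The function names and the four-cell form of the statistic are assumptions for illustration only; this is not the implementation of Egbert and Biber (2019), nor of the two-dimensional approach proposed in the paper.

```python
import math
from collections import Counter


def text_range(texts):
    """For each word type, count the texts that contain it at least once."""
    counts = Counter()
    for tokens in texts:
        counts.update(set(tokens))  # set() ignores repeats within a text
    return counts


def range_log_likelihood(word, target_texts, ref_texts):
    """Log-likelihood on text counts (range) rather than token frequencies."""
    range_target = text_range(target_texts)[word]
    range_ref = text_range(ref_texts)[word]
    n_target, n_ref = len(target_texts), len(ref_texts)
    observed = [
        range_target, n_target - range_target,
        range_ref, n_ref - range_ref,
    ]
    total = n_target + n_ref
    with_word = range_target + range_ref
    without_word = total - with_word
    # Expected text counts if the word were equally dispersed in both corpora.
    expected = [
        n_target * with_word / total,
        n_target * without_word / total,
        n_ref * with_word / total,
        n_ref * without_word / total,
    ]
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

Each text is assumed to be a list of tokens. Because a word repeated many times in a single text still counts once, a word concentrated in one document scores lower here than it would under a frequency-based computation.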


2017 ◽  
Vol 34 (4) ◽  
pp. 477-492 ◽  
Author(s):  
Ute Römer

This paper aims to connect recent corpus research on phraseology with current language testing practice. It discusses how corpora and corpus-analytic techniques can illuminate central aspects of speech and help in conceptualizing the notion of lexicogrammar in second language speaking assessment. The description of speech and some of its core features is based on the 1.8-million-word Michigan Corpus of Academic Spoken English (MICASE) and on the 10-million-word spoken component of the British National Corpus (BNC). Analyses of word frequency and keyword lists are followed by an automatic extraction of different types of phraseological items that are particularly common in speech and serve important communicative functions. These corpus explorations provide evidence for the strong interconnectedness of lexical items and grammatical structures in natural language. Based on the assumption that the existence of lexicogrammatical patterns is of relevance for constructs of speaking tests, the paper then reviews rubrics of popular high-stakes speaking tests and critically discusses how far these rubrics capture the central aspects of spoken language identified in the corpus analyses as well as the centrality of phraseology in language. It closes with recommendations for speaking assessment in the light of this characterization of real-world spoken lexicogrammar.


2020 ◽  
pp. 007542422097914
Author(s):  
Karin Aijmer

Well has a long history and is found as an intensifier already in older English. It is argued that diachronically well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time and that it spreads to new semantic types of adjectives and participles, and is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.


1998 ◽  
Vol 21 (2) ◽  
pp. 221-222
Author(s):  
Louis G. Tassinary

Chow (1996) offers a reconceptualization of statistical significance that is reasoned and comprehensive. Despite a somewhat rough presentation, his arguments are compelling and deserve to be taken seriously by the scientific community. It is argued that his characterizations of literal replication, types of research, effect size, and experimental control are in need of revision.


2014 ◽  
Vol 12 (4) ◽  
pp. 319-340
Author(s):  
Anu Koskela

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly only represented the broader readings for terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.  

