Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis

Keyword analysis is used across sub-disciplines of applied linguistics, from genre analysis to critically oriented studies, for purposes that range from producing a general characterization of a genre to identifying text-specific ideological issues. This study compares log-likelihood (LL), a probability statistic, and odds ratio (OR), an effect size statistic, as methods of keyword identification, and argues that the two produce different keywords suited to different research purposes. Two case studies are presented: a keyword analysis of advance fee scams against the British National Corpus, and one of research articles in applied linguistics against research articles from other academic disciplines. They show that both LL and OR keywords concern the aboutness of the corpus but differ in their specificity and in their pervasiveness through the corpus. LL highlights words that are relatively common in general use, which serve genre purposes, whereas OR highlights more specialized words, which serve critically oriented purposes. Methodological and practical contributions to keyword analysis are discussed.
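As a rough illustration of the contrast drawn in this abstract, the sketch below computes both statistics for a single word from its frequency in a target and a reference corpus. It is not taken from the study: the two-term log-likelihood formula is the one commonly used in keyword tools, the 0.5 zero-cell correction is an assumption, and the corpus sizes and word counts are invented.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood keyness score.

    a, b: frequency of the word in the target / reference corpus
    c, d: total number of tokens in the target / reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds of the word in the target corpus divided by its odds in the reference.

    The correction added to every cell when any cell is zero is an
    assumption made here to avoid division by zero.
    """
    cells = [a, c - a, b, d - b]
    if 0 in cells:
        cells = [x + correction for x in cells]
    word_t, rest_t, word_r, rest_r = cells
    return (word_t / rest_t) / (word_r / rest_r)


# Invented counts: a frequent word that is twice as common in the target
# corpus, and a rare word that is about a hundred times as common there.
TARGET_SIZE, REFERENCE_SIZE = 100_000, 1_000_000
common_word = (1000, 5000)
rare_word = (20, 2)

ll_common = log_likelihood(*common_word, TARGET_SIZE, REFERENCE_SIZE)
ll_rare = log_likelihood(*rare_word, TARGET_SIZE, REFERENCE_SIZE)
or_common = odds_ratio(*common_word, TARGET_SIZE, REFERENCE_SIZE)
or_rare = odds_ratio(*rare_word, TARGET_SIZE, REFERENCE_SIZE)

print(f"common word: LL={ll_common:.1f}  OR={or_common:.1f}")
print(f"rare word:   LL={ll_rare:.1f}  OR={or_rare:.1f}")
```

With these invented counts the frequent word receives the larger LL score and the rare word the larger OR, which is one way the two statistics can rank the same candidates differently.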

Phrasal Discourse Style in Cross-Disciplinary Writing: A Comparison of Phrasal Complexity Features in the Results Sections of Research Articles

Círculo de lingüística aplicada a la comunicación ◽ 10.5209/clac.70573 ◽ 2020 ◽ Vol 83 ◽ pp. 191-204

Author(s): Muhammed Parviz ◽ Alireza Jalilifar ◽ Alexanne Don

Keyword(s): Academic Writing ◽ High Frequency ◽ Applied Linguistics ◽ Academic Disciplines ◽ Research Articles ◽ Disciplinary Writing ◽ Prepositional Phrases ◽ Manual Search ◽ Research Article ◽ Rate Of Success

The present study explored how research article writers from two academic disciplines use phrasal complexity features (PCFs) in the results sections of research articles, with the eventual aim of assisting advanced EFL writers with their composition strategies. To this end, following a manual search, 200 empirical research articles in the fields of Applied Linguistics and Physics were comparatively examined. Because tagging programs had a low rate of success in identifying occurrences of PCFs, the datasets were also analyzed manually. The results revealed that the research article writers drew upon three high-frequency phrasal complexity features, namely pre-modifying adjectives, post-modifying prepositional phrases, and nominalizations. The study also revealed that the results sections of research articles included different amounts of exceedingly complex patterns of pre-modification, a hybrid of novel appositive structures, and great reliance on hyphenated adjectives. Overall, we believe that these findings can be used to heighten the awareness of academic writers and instructors regarding the linguistic characteristics of academic writing and the ways in which such phrasal features of compression vary across academic subjects.

A Corpus-based Analysis of Phrasal Complexity across Different Sections of Research Articles in two Academic Disciplines

10.21203/rs.3.rs-121169/v1 ◽ 2020

Author(s): Rajab Esfandiari ◽ Mohammad Ahmadi

Keyword(s): Noun Phrase ◽ Academic Writing ◽ Clinical Medicine ◽ Applied Linguistics ◽ Writing Development ◽ Academic Disciplines ◽ Research Articles ◽ Distributional Pattern ◽ Complexity Measures ◽ Leading Journals

Complexity measures in academic writing have shifted from clausal to phrasal indices in recent years. Drawing on a subset of Biber et al.'s (2011) hypothesized stages of writing development, we explored phrasal complexity across sections of research articles (RAs) in applied linguistics and clinical medicine. A 389,332-word corpus consisting of 80 randomly selected RAs from leading journals in applied linguistics and clinical medicine was compiled for the purposes of the present study. One-way analysis of variance (ANOVA) and an independent-samples t-test, as implemented in SPSS (version 25), were employed to find differences across the RA sections and between the two groups of academic writers. The findings indicated that RAs in clinical medicine relied more heavily on noun phrase modifiers in all sections than those in applied linguistics, suggesting that the distributional pattern of these linguistic expressions is discipline-independent. The implications of the findings are also discussed.

A new approach to (key) keywords analysis: Using frequency, and now also dispersion

Research in Corpus Linguistics ◽ 10.32714/ricl.09.02.02 ◽ 2021 ◽ Vol 9 (2) ◽ pp. 1-33

Author(s): Stefan Th. Gries

Keyword(s): Statistical Measure ◽ Dimensional Approach ◽ Likelihood Ratios ◽ New Approach ◽ Text Type ◽ Corpus Linguistic ◽ Log Likelihood ◽ British National Corpus ◽ Linguistic Approaches ◽ National Corpus

A widely used method in corpus-linguistic approaches to discourse analysis, register/text type/genre analysis, and educational/curriculum questions is keywords analysis, a simple statistical method that aims to identify words that are key to, i.e. characteristic of, certain discourses, text types, or topic domains. The vast majority of keywords analyses have relied on the same statistical measure that most collocation studies use, the log-likelihood ratio, computed on frequencies of occurrence in the two corpora under consideration. In a recent paper, Egbert and Biber (2019) advocated a different approach, one that computes log-likelihood ratios for word types based on the range of their distribution rather than on their frequencies in the target and reference corpora. In this paper, I argue that their approach is a most welcome addition to keywords analysis but can still be profitably extended by utilizing both frequency and dispersion for keyness computations. I present a new two-dimensional approach to keyness and exemplify it on the basis of the Clinton-Trump Corpus and the British National Corpus.
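The difference between frequency-based and range-based keyness can be sketched on toy data. The code below is only an illustration under stated assumptions: "range" is taken to mean the number of texts that contain the word, the same two-term log-likelihood formula is applied to both kinds of counts, and the function name `keyness_pair` and the miniature corpora are invented. It does not reproduce the dispersion measure or the two-dimensional method proposed in the paper; it merely places a frequency score and a range score side by side.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood for counts a, b out of totals c, d."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def keyness_pair(word, target_texts, reference_texts):
    """Return (frequency-based LL, range-based LL) for one word.

    Each corpus is a list of texts; each text is a list of tokens.
    """
    # Frequency view: token counts against corpus sizes in tokens.
    freq_t = sum(text.count(word) for text in target_texts)
    freq_r = sum(text.count(word) for text in reference_texts)
    size_t = sum(len(text) for text in target_texts)
    size_r = sum(len(text) for text in reference_texts)
    # Range view: number of texts containing the word against number of texts.
    range_t = sum(1 for text in target_texts if word in text)
    range_r = sum(1 for text in reference_texts if word in text)
    freq_ll = log_likelihood(freq_t, freq_r, size_t, size_r)
    range_ll = log_likelihood(range_t, range_r,
                              len(target_texts), len(reference_texts))
    return freq_ll, range_ll


# Invented toy corpora: "tax" occurs 12 times in a single target text,
# "jobs" occurs 3 times in each of the four target texts.
target = [
    ["tax"] * 12 + ["jobs"] * 3 + ["the"] * 5,
    ["jobs"] * 3 + ["the"] * 17,
    ["jobs"] * 3 + ["the"] * 17,
    ["jobs"] * 3 + ["the"] * 17,
]
reference = [
    ["tax", "jobs"] + ["the"] * 18,
    ["the"] * 20,
    ["the"] * 20,
    ["the"] * 20,
]

tax_scores = keyness_pair("tax", target, reference)
jobs_scores = keyness_pair("jobs", target, reference)
print("tax  (frequency LL, range LL):", tax_scores)
print("jobs (frequency LL, range LL):", jobs_scores)
```

Both words have identical frequencies in the two toy corpora, so their frequency-based scores coincide; only the range-based score separates the word concentrated in one text from the word spread across all of them.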

Language assessment and the inseparability of lexis and grammar: Focus on the construct of speaking

Language Testing ◽ 10.1177/0265532217711431 ◽ 2017 ◽ Vol 34 (4) ◽ pp. 477-492 ◽ Cited By ~ 1

Author(s): Ute Römer

Keyword(s): High Stakes ◽ Spoken English ◽ Communicative Functions ◽ Speaking Assessment ◽ Different Types ◽ British National Corpus ◽ Grammatical Structures ◽ Core Features ◽ National Corpus

This paper aims to connect recent corpus research on phraseology with current language testing practice. It discusses how corpora and corpus-analytic techniques can illuminate central aspects of speech and help in conceptualizing the notion of lexicogrammar in second language speaking assessment. The description of speech and some of its core features is based on the 1.8-million-word Michigan Corpus of Academic Spoken English (MICASE) and on the 10-million-word spoken component of the British National Corpus (BNC). Analyses of word frequency and keyword lists are followed by an automatic extraction of different types of phraseological items that are particularly common in speech and serve important communicative functions. These corpus explorations provide evidence for the strong interconnectedness of lexical items and grammatical structures in natural language. Based on the assumption that the existence of lexicogrammatical patterns is of relevance for constructs of speaking tests, the paper then reviews rubrics of popular high-stakes speaking tests and critically discusses how far these rubrics capture the central aspects of spoken language identified in the corpus analyses as well as the centrality of phraseology in language. It closes with recommendations for speaking assessment in the light of this characterization of real-world spoken lexicogrammar.

The Relative Incident Rate Ratio Effect Size for Count-Based Impact Evaluations: When an Odds Ratio is Not an Odds Ratio

Journal of Quantitative Criminology ◽ 10.1007/s10940-021-09494-w ◽ 2021 ◽ Cited By ~ 2

Author(s): David B. Wilson

Keyword(s): Effect Size ◽ Odds Ratio ◽ Rate Ratio ◽ Ratio Effect ◽ Incident Rate Ratio ◽ Incident Rate

Diachronic corpus analysis of stance markers in research articles: The field of applied linguistics

Cogent Arts and Humanities ◽ 10.1080/23311983.2021.1872165 ◽ 2021 ◽ Vol 8 (1) ◽ pp. 1872165

Author(s): Shirin Rezaei ◽ Davud Kuhi ◽ Mahnaz Saeidi

Keyword(s): Applied Linguistics ◽ Corpus Analysis ◽ Research Articles

Variability and functions of lexical bundles in research articles of applied linguistics and pharmaceutical sciences

Journal of English for Academic Purposes ◽ 10.1016/j.jeap.2021.100968 ◽ 2021 ◽ Vol 50 ◽ pp. 100968

Author(s): Junqiang Ren

Keyword(s): Applied Linguistics ◽ Research Articles ◽ Pharmaceutical Sciences ◽ Lexical Bundles

“That’s well good”: A Re-emergent Intensifier in Current British English

Journal of English Linguistics ◽ 10.1177/0075424220979143 ◽ 2020 ◽ pp. 007542422097914

Author(s): Karin Aijmer

Keyword(s): Social Class ◽ Fourteenth Century ◽ Social Factors ◽ British English ◽ Discourse Marker ◽ Time Gap ◽ British National Corpus ◽ Semantic Types ◽ Over Time ◽ National Corpus

Well has a long history and is found as an intensifier already in older English. It is argued that diachronically well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time and that it spreads to new semantic types of adjectives and participles, and is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.

Significance tests: Necessary but not sufficient

Behavioral and Brain Sciences ◽ 10.1017/s0140525x98521164 ◽ 1998 ◽ Vol 21 (2) ◽ pp. 221-222

Author(s): Louis G. Tassinary

Keyword(s): Effect Size ◽ Scientific Community ◽ Statistical Significance ◽ Significance Tests ◽ Experimental Control

Chow (1996) offers a reconceptualization of statistical significance that is reasoned and comprehensive. Despite a somewhat rough presentation, his arguments are compelling and deserve to be taken seriously by the scientific community. It is argued, however, that his characterizations of literal replication, types of research, effect size, and experimental control are in need of revision.

Inclusion, Contrast and Polysemy in Dictionaries: The Relationship between Theory, Language Use and Lexicographic Practice

Research in Language ◽ 10.1515/rela-2015-0001 ◽ 2014 ◽ Vol 12 (4) ◽ pp. 319-340

Author(s): Anu Koskela

Keyword(s): Language Use ◽ Lexical Item ◽ British National Corpus ◽ Lexical Items ◽ The Relationship ◽ National Corpus

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly represented only the broader readings for terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.
