A new approach to (key) keywords analysis: Using frequency, and now also dispersion

Stefan Th. Gries

doi:10.32714/ricl.09.02.02

Research in Corpus Linguistics ◽

10.32714/ricl.09.02.02 ◽

2021 ◽

Vol 9 (2) ◽

pp. 1-33

Author(s):

Stefan Th. Gries

Keyword(s):

Statistical Measure ◽

Dimensional Approach ◽

Likelihood Ratios ◽

New Approach ◽

Text Type ◽

Corpus Linguistic ◽

Log Likelihood ◽

British National Corpus ◽

Linguistic Approaches ◽

National Corpus

A widely used method in corpus-linguistic approaches to discourse analysis, register/text type/genre analysis, and educational/curriculum questions is keywords analysis, a simple statistical method that aims to identify words that are key to, i.e. characteristic of, certain discourses, text types, or topic domains. The vast majority of keywords analyses have relied on the same statistical measure that most collocation studies use, the log-likelihood ratio, which is computed from frequencies of occurrence in the two corpora under consideration. In a recent paper, Egbert and Biber (2019) advocated a different approach, one that involves computing log-likelihood ratios for word types based on the range of their distribution rather than their frequencies in the target and reference corpora. In this paper, I argue that their approach is a most welcome addition to keywords analysis but can still be profitably extended by utilizing both frequency and dispersion for keyness computations. I present a new two-dimensional approach to keyness and exemplify it on the basis of the Clinton-Trump Corpus and the British National Corpus.
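To make the contrast between frequency-based and range-based keyness concrete, the following is a minimal sketch in Python. It assumes the common 2x2 contingency-table formulation of the log-likelihood ratio (G2); the function name `log_likelihood` and all counts are hypothetical and not taken from the paper. It does not reproduce the paper's two-dimensional measure, which the abstract does not spell out.

```python
import math


def log_likelihood(a, b, n1, n2):
    """Log-likelihood ratio (G2) for a 2x2 contingency table.

    a, b   -- counts of the item in the target / reference corpus
    n1, n2 -- sizes of the target / reference corpus (same unit as a and b)
    """
    if not (0 <= a <= n1 and 0 <= b <= n2):
        raise ValueError("counts must lie between 0 and the corpus size")
    total = n1 + n2
    with_item = a + b
    without_item = total - with_item
    observed = [a, b, n1 - a, n2 - b]
    expected = [
        n1 * with_item / total,
        n2 * with_item / total,
        n1 * without_item / total,
        n2 * without_item / total,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts for one word type (illustration only).
# Frequency-based keyness: the unit is the token.
freq_keyness = log_likelihood(a=300, b=400, n1=100_000, n2=1_000_000)

# Range-based keyness: the unit is the text containing the word at least once.
range_keyness = log_likelihood(a=12, b=90, n1=50, n2=4_000)

print(f"frequency-based G2: {freq_keyness:.2f}")
print(f"range-based G2:     {range_keyness:.2f}")
```

The same function serves both variants; only the unit of counting changes. G2 is unsigned, so whether a word is over- or under-represented in the target corpus has to be read off the relative frequencies separately.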

A CONTRASTIVE ACCOUNT OF PHASE VERBS BEGIN AND START IN ENGLISH AND SERBIAN

Nasledje Kragujevac ◽

10.46793/naskg2148.203m ◽

2021 ◽

Vol 18 (48) ◽

pp. 203-2018

Author(s):

Nataša Milivojević ◽

Keyword(s):

Argument Structure ◽

American English ◽

Traditional View ◽

Parallel Corpus ◽

British National Corpus ◽

Linguistic Approaches ◽

National Corpus ◽

Additional Phase ◽

Aspectual Meaning

The paper focuses on the contrastive semantics of phase verbs, or aspectualizers, in English and Serbian, taking into account both typical and atypical phase verbs. Following Piper (Piper et al. 2005), we adopt the class of atypical aspectualizers in Serbian, which are primarily lexical verbs but yield an aspectual meaning when combined with an aspectual complement. We specifically consider the phase verbs BEGIN and START in English and their Serbian equivalents POČETI and KRENUTI. Contrary to both traditional and more contemporary linguistic approaches to phase verbs in English and Serbian, we claim that the true overall linguistic equivalent of the English phase verb START is not the Serbian phase verb POČETI, but another, atypical aspectualizer, KRENUTI. We base this claim on the equivalency of contrastive syntactic complementation of the inspected aspectualizers, as well as their argument structure, taking into account Freed’s (Freed 1979: 31) traditional view of the aspectual event, where the event is segmental, containing the onset, the nucleus, and the coda. Freed’s account is combined with the lexical-projectionist model proposed by Levin (Levin 1993) alongside the grammar of constructions (Goldberg 1995, 2006). Additionally, contrary to the generally accepted claim that all phase verbs in Serbian as a rule take imperfective verbs as their complements (Ivić 1970: 44), we claim that KRENUTI with additional, phase-related meanings frequently and productively allows for perfective complementation. The present analysis is backed up by a parallel corpus of English and Serbian sentences compiled from the British National Corpus, the Corpus of Contemporary American English, and the Corpus of Contemporary Serbian Language.

Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt-2015-0030 ◽

2018 ◽

Vol 14 (1) ◽

pp. 133-167 ◽

Cited By ~ 12

Author(s):

Punjaporn Pojanapunya ◽

Richard Watson Todd

Keyword(s):

Effect Size ◽

Odds Ratio ◽

Applied Linguistics ◽

Academic Disciplines ◽

Research Articles ◽

Keyword Analysis ◽

Log Likelihood ◽

British National Corpus ◽

National Corpus

Keyword analysis is used in a range of sub-disciplines of applied linguistics from genre analyses to critically-oriented studies for different purposes ranging from producing a general characterization of a genre to identifying text-specific ideological issues. This study compares the use of log-likelihood (LL), a probability statistic, and odds ratio (OR), an effect size statistic, for keyword identification and argues that the two methods produce different keywords applicable to research focusing on different purposes. Through two case studies, keyword analyses of advance fee scams against the British National Corpus and research articles in applied linguistics against research articles from other academic disciplines, we show that both the LL and OR keywords concern the aboutness of the corpus, but differ in their specificity and pervasiveness through the corpus. LL highlights words which are relatively common in general use serving genre purposes, whereas OR highlights more specialized words serving critically-oriented purposes. Methodological and practical contributions to keyword analysis are discussed.
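The difference between a probability statistic and an effect size statistic can be illustrated with a small Python sketch. It assumes the standard 2x2 formulations of LL and OR; the function names, the 0.5 continuity correction for zero cells (a Haldane-Anscombe-style adjustment, chosen here for illustration and not taken from the study), and all counts are hypothetical.

```python
import math


def log_likelihood(a, b, n1, n2):
    """Log-likelihood ratio (G2) from item counts a, b and corpus sizes n1, n2."""
    total = n1 + n2
    with_item, without_item = a + b, total - (a + b)
    observed = [a, b, n1 - a, n2 - b]
    expected = [
        n1 * with_item / total,
        n2 * with_item / total,
        n1 * without_item / total,
        n2 * without_item / total,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def odds_ratio(a, b, n1, n2, correction=0.5):
    """Odds of the item in the target corpus divided by its odds in the reference."""
    cells = [a, b, n1 - a, n2 - b]
    # Assumption: add a small constant to every cell only when a zero cell
    # would otherwise make the ratio undefined.
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    target_hits, ref_hits, target_rest, ref_rest = cells
    return (target_hits / target_rest) / (ref_hits / ref_rest)


# Hypothetical corpus sizes in tokens (illustration only).
N_TARGET, N_REF = 100_000, 1_000_000

# A frequent word that is twice as common (relatively) in the target corpus.
ll_common = log_likelihood(2_000, 10_000, N_TARGET, N_REF)
or_common = odds_ratio(2_000, 10_000, N_TARGET, N_REF)

# A rare, specialized word that is about twenty times as common in the target.
ll_rare = log_likelihood(20, 10, N_TARGET, N_REF)
or_rare = odds_ratio(20, 10, N_TARGET, N_REF)

print(f"common word: LL={ll_common:.1f}  OR={or_common:.2f}")
print(f"rare word:   LL={ll_rare:.1f}  OR={or_rare:.2f}")
```

With these made-up counts the frequent word receives the larger LL while the rare word receives the larger OR, because LL grows with the amount of evidence whereas OR reflects only the size of the difference.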

Improved Decoding for LDPC Coded Modulation Systems Using Averaged Log-Likelihood Ratios

JOURNAL OF ELECTRONICS INFORMATION TECHNOLOGY ◽

10.3724/sp.j.1146.2007.01699 ◽

2011 ◽

Vol 30 (8) ◽

pp. 1845-1848

Author(s):

Ping Huang ◽

Ming Jiang ◽

Chun-ming Zhao

Keyword(s):

Coded Modulation ◽

Likelihood Ratios ◽

Log Likelihood

“That’s well good”: A Re-emergent Intensifier in Current British English

Journal of English Linguistics ◽

10.1177/0075424220979143 ◽

2020 ◽

pp. 007542422097914

Author(s):

Karin Aijmer

Keyword(s):

Social Class ◽

Fourteenth Century ◽

Social Factors ◽

British English ◽

Discourse Marker ◽

Time Gap ◽

British National Corpus ◽

Semantic Types ◽

Over Time ◽

National Corpus

Well has a long history and is already found as an intensifier in older English. It is argued that, diachronically, well has developed from its etymological meaning (‘in a good way’) on a cline of adverbialization to an intensifier and to a discourse marker. Well is replaced by other intensifiers in the fourteenth century but emerges in new uses in Present-Day English. The changes in frequency and use of the new intensifier are explored on the basis of a twenty-year time gap between the old British National Corpus (1994) and the new Spoken British National Corpus (2014). The results show that well increases in frequency over time, that it spreads to new semantic types of adjectives and participles, and that it is found above all in predicative structures with a copula. The emergence of a new well and its increase in frequency are also related to social factors such as the age, gender, and social class of the speakers, and the informal character of the conversation.

New Approach to the Validity of the Alcohol Use Disorders Identification Test: Stratum-Specific Likelihood Ratios Analysis

Alcoholism Clinical and Experimental Research ◽

10.1097/01.alc.0000159189.56671.ec ◽

2005 ◽

Vol 29 (4) ◽

pp. 602-608 ◽

Cited By ~ 23

Author(s):

Chun-Hsin Chen ◽

Wei J. Chen ◽

Andrew T. A. Cheng

Keyword(s):

Alcohol Use ◽

Alcohol Use Disorders ◽

Likelihood Ratios ◽

New Approach ◽

Identification Test

Inclusion, Contrast and Polysemy in Dictionaries: The Relationship between Theory, Language Use and Lexicographic Practice

Research in Language ◽

10.1515/rela-2015-0001 ◽

2014 ◽

Vol 12 (4) ◽

pp. 319-340

Author(s):

Anu Koskela

Keyword(s):

Language Use ◽

Lexical Item ◽

British National Corpus ◽

Lexical Items ◽

The Relationship ◽

National Corpus

This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly only represented the broader readings for terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.

On quantization of log-likelihood ratios for maximum mutual information

2015 IEEE 16th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) ◽

10.1109/spawc.2015.7227051 ◽

2015 ◽

Cited By ~ 6

Author(s):

Andreas Winkelbauer ◽

Gerald Matz

Keyword(s):

Mutual Information ◽

Likelihood Ratios ◽

Log Likelihood ◽

Maximum Mutual Information

Arab women in news headlines during the Arab Spring: Image and perception in Germany

Discourse & Communication ◽

10.1177/1750481317714114 ◽

2017 ◽

Vol 11 (5) ◽

pp. 515-538 ◽

Cited By ~ 1

Author(s):

Zahra Mustafa-Awad ◽

Monika Kirner-Ludwig

Keyword(s):

Discourse Analysis ◽

News Media ◽

Arab Spring ◽

Arab Women ◽

Corpus Linguistic ◽

University Courses ◽

The Arab Spring ◽

Linguistic Approaches ◽

The Relationship ◽

News Headlines

This article reports on the first stage of a research project on German university students’ conceptualization of Arab women and the extent to which it is affected by the latter’s representation in the Western press during the Arab Spring. We combined discourse analysis and corpus-linguistic approaches to investigate the relationship between lexical items used by the students to express their attitudes toward Arab women and those featuring in news headlines about them published in British, American, and German news media. Results show that the portrayal of Arab women in Western news headlines has a clear impact on German students’ opinions of them. The findings also show that our participants tend to be aware of this effect, which could be partly due to their familiarity with discourse analysis as students of linguistics. These results have implications for incorporating media education systematically in general university courses.

A Corpora-Based Analysis of Rely on and Depend on

Journal of Critical Studies in Language and Literature ◽

10.46809/jcsll.v3i1.119 ◽

2021 ◽

Vol 3 (1) ◽

pp. 9-21

Author(s):

Namkil Kang

Keyword(s):

Comparative Analysis ◽

American English ◽

The Other ◽

Information State ◽

Other Hand ◽

British National Corpus ◽

National Corpus

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA clearly shows that the expression rely on government is the most preferred by Americans, followed by rely on people and rely on data. The COCA further indicates that the expression depend on slate is the most preferred by Americans, followed by depend on government and depend on people. The BNC shows, on the other hand, that the expression rely on others is the most preferred by the British, followed by rely on people and rely on friends. The BNC further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age and depend on food. Finally, in the COCA, the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both of them. In the BNC, on the other hand, only the nouns state, chance, government, and others are linked to both rely on and depend on, and many nouns are still not linked to both. It can thus be inferred that rely on differs slightly from depend on in its use.

The Correlations Between Combinational Arrangements and Semantic Implications of Utterly in the British National Corpus

The Journal of Humanities and Social sciences 21 ◽

10.22143/hss21.12.6.25 ◽

2021 ◽

Vol 12 (6) ◽

pp. 349-360

Author(s):

Jungyull Lee

Keyword(s):

British National Corpus ◽

National Corpus
