A corpus-driven approach to formulaic language in English

2009, Vol 14 (3), pp. 275-311
Author(s): Douglas Biber

The present study utilizes a corpus-driven approach to identify the most common multi-word patterns in conversation and academic writing, and to investigate the differing pattern types in the two registers. The paper first surveys the methodological characteristics of corpus-driven research and then contrasts the linguistic characteristics of two types of multi-word sequences: ‘multi-word lexical collocations’ (combinations of content words) versus ‘multi-word formulaic sequences’ (incorporating both function words and content words).
Building on this background, the primary focus of the paper is an empirical investigation of the ‘patterns’ represented by multi-word formulaic sequences. The analysis shows that the multi-word patterns typical of speech are fundamentally different from those typical of academic writing. Patterns in conversation tend to be fixed sequences (including both function words and content words), whereas most patterns in academic writing are formulaic frames consisting of invariable function words with an intervening variable slot that is filled by content words.
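
As a rough illustration of the contrast between fixed sequences and frames, the Python sketch below counts contiguous 4-word sequences and, separately, 4-word frames in which one internal word is replaced by a variable slot `*` (as in "the * of the"). It assumes simple whitespace tokenization and a toy input; the function names `ngrams` and `frames_with_fillers` are hypothetical, and the sketch omits the frequency and dispersion cut-offs a real corpus study would apply. It is not Biber's own procedure.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return every contiguous n-word sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frames_with_fillers(tokens, n=4):
    """Map each n-word frame with one internal slot ('*') to a Counter of its fillers.

    Only internal slots are considered, matching frames such as 'the * of the'.
    """
    table = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        for slot in range(1, n - 1):  # skip the first and last positions
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            table[frame][gram[slot]] += 1
    return table

# Toy input, invented for illustration only.
tokens = "the end of the day and the top of the hill and the end of the road".split()
fixed = Counter(ngrams(tokens, 4))
frames = frames_with_fillers(tokens)
print(fixed[("the", "end", "of", "the")])  # 2
print(frames[("the", "*", "of", "the")])   # Counter({'end': 2, 'top': 1})
```

In this sketch, a frame whose slot takes many different fillers behaves like the variable frames described for academic writing, while a sequence that recurs with the same words throughout behaves like a fixed sequence.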

Author(s): Muhammad Azeem Abbas, Shiza Hammad, Gwo-Jen Hwang, Sharifullah Khan, Syed Mushhad Mustuzhar Gilani

2020, Vol 33 (4), pp. 417-442
Author(s): Kaja Dobrovoljc

In view of the pervasiveness of formulaic language in human communication and the growing awareness of its relevance to modern lexicography, this study presents a corpus-driven identification, analysis and comparison of dictionary-relevant formulaic sequences in reference corpora of written and spoken Slovenian. The sequences were identified semi-automatically: the most frequently recurring word combinations in each corpus were ranked according to their statistical salience and then manually inspected for formulaic expressions of lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinctive characteristics of formulaic multi-word expressions, such as high frequency of use, frequent inclusion of grammatical words and common non-propositional meaning, especially in speech, where the analysis revealed numerous understudied formulaic expressions related to interaction management and mitigation. A final evaluation of the measures used in the identification process demonstrates their relative suitability for corpus-driven identification of dictionary-relevant formulaic expressions, with their precision varying according to corpus size and the length of the sequences under investigation.
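
The abstract does not name the measures used to rank combinations by statistical salience. Purely as an assumed example, the Python sketch below ranks recurring word pairs by pointwise mutual information (PMI), one common association measure. The function name `rank_bigrams_by_pmi`, the two-word window and the `min_freq` cut-off are illustrative choices, not the study's, and the manual inspection step is of course not modelled.

```python
import math
from collections import Counter

def rank_bigrams_by_pmi(tokens, min_freq=2):
    """Rank recurring two-word combinations by pointwise mutual information (PMI).

    PMI stands in here for "statistical salience"; min_freq is a hypothetical cut-off.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # corpus size in tokens, used to approximate probabilities
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        if f12 < min_freq:
            continue  # keep only recurring combinations
        # log2(observed pair frequency / frequency expected if w1 and w2 were independent)
        scores[(w1, w2)] = math.log2(f12 * n / (unigrams[w1] * unigrams[w2]))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy input, invented for illustration only.
tokens = "in fact we know in fact it is in the end".split()
print(rank_bigrams_by_pmi(tokens))
```

PMI is known to favour rare combinations, which is one reason why studies of this kind typically compare several measures and, as the abstract notes, find that their precision depends on corpus size and sequence length.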


2019, Vol 2018 (1), pp. 241
Author(s): Lewis Murray

For L2 learners, successful acquisition of formulaic sequences (FSs) is recognised as being valuable for academic writing. Studies suggest that cued output exercises requiring an evaluation effort may prove beneficial. The aim of this study was to examine the value of such exercises. Four classes in a Japanese university EAP programme were each assigned a different intervention over a 4-week period, and each intervention required a different degree of involvement with selected target FSs. Writing samples collected from participants before the intervention showed no significant difference in target FS use between the groups. Post-intervention data, based on the difference between each participant’s pre- and post-test target FS use, revealed a significant increase only in the group assigned the exercises requiring the greatest involvement, suggesting that such exercises may be important for acquisition. These findings are discussed in relation to other studies concerning cued output and evaluation effort.


2018, Vol 7 (2), pp. 355-376
Author(s): Ying Wang

Formulaic sequences (e.g. on the other hand, for example, at the same time) are pervasive in natural language use and play an important role in differentiating socially situated practices. This paper examines formulaic sequences signalling discourse organisation in academic ELF lectures from a disciplinary perspective. Most previous studies of this kind employ a frequency-based approach; however, the inherent limitations of the methodology (e.g. arbitrary operational criteria, difficulty in handling discontinuous units) mean that a great deal may have been overlooked. This may be particularly relevant to ELF communication, which involves a high degree of flexibility and adaptability. The present study aims to address this gap by taking a manual approach in the identification of formulaic sequences, continuous or discontinuous, in context. The results provide further evidence for disciplinary differences and variability in the use of formulaic language to signal discourse organisation by lecturers in academic ELF settings.


2021
Author(s): Anna Siyanova

© 2015 by De Gruyter Mouton. Many applied and corpus linguists entertain the idea that collocations and other types of formulaic language are processed as unanalysed, or holistic, units. It has indeed been demonstrated that, owing to their frequency and predictability, formulaic sequences are processed faster than matched novel phrases. This finding points to an important role for phrasal frequency in language processing and highlights the contribution of the entrenchment of particular phrasal configurations in memory. It cannot, however, be taken to suggest that formulaic sequences are necessarily processed as unanalysed, or holistic, units. The present paper reviews some recent studies and explains why a processing advantage observed for formulaic sequences over novel phrases should not be equated with holistic storage and processing. The paper is not intended as an overview of studies on the online processing of formulaic language. For a comprehensive review of the methods and findings specific to formulaic sequences and their online representation and processing, we direct interested readers to Siyanova-Chanturia (2013) and Siyanova-Chanturia and Martinez (2014).


2015, Vol 20 (4), pp. 500-525
Author(s): Sylvia Jaworska, Cedric Krummes, Astrid Ensslin

The aim of this paper is to contribute to learner corpus research into formulaic language in native and non-native German. To this end, a corpus of argumentative essays written by advanced British students of German (WHiG) was compared with a corpus of argumentative essays written by German native speakers (Falko-L1). Firstly, a corpus-driven analysis reveals a larger number of 3-grams in WHiG than in Falko-L1, which suggests that advanced British learners of German are more likely to use formulaic language in argumentative writing than their native-speaker counterparts. Secondly, classifying the formulaic sequences according to their functions shows that native speakers of German prefer discourse-structuring devices to stance expressions, whilst advanced British learners display the opposite preference. Thirdly, the results show that learners of German make greater use of macro-discourse-structuring devices and cautious language, whereas native speakers favour micro-discourse-structuring devices and tend to use more direct language.
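
A minimal sketch of the kind of 3-gram comparison described above is shown below in Python. It counts contiguous 3-word sequences per text and keeps the types that recur at least `min_freq` times. The function names, the lower-casing and whitespace tokenization, the threshold, and the toy "essays" are all assumptions for illustration; the study's actual frequency and dispersion criteria, and any normalisation for corpus size, are not reproduced here.

```python
from collections import Counter

def trigram_counts(texts):
    """Count contiguous 3-word sequences in each text (sequences never cross text boundaries)."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts

def recurrent_trigrams(texts, min_freq=2):
    """Keep 3-gram types occurring at least min_freq times (hypothetical threshold)."""
    return {gram: n for gram, n in trigram_counts(texts).items() if n >= min_freq}

# Invented toy texts, not data from WHiG or Falko-L1.
learner_essays = ["ich denke dass es gut ist", "ich denke dass wir recht haben"]
native_essays = ["es ist klar dass es gut ist", "wir sehen dass wir recht haben"]
print(len(recurrent_trigrams(learner_essays)))  # 1
print(len(recurrent_trigrams(native_essays)))   # 0
```

Comparing the number of recurrent 3-gram types across two corpora, as here, is only meaningful once the corpora are of comparable size or the counts are normalised.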


Dementia, 2011, Vol 10 (4), pp. 603-623
Author(s): Camilla Lindholm, Alison Wray

Some types of formulaic (routine and familiar) language seem to remain fairly intact in people with language and memory disturbances, which makes formulaic language a useful tool both for testing language skills and for supporting language retention and use. Proverbs can reasonably be considered a subset of formulaic language, and while the ability to understand proverbs is known to be compromised in dementia, completing them ought to be relatively easy if proverbs are stored holistically like other kinds of formulaic language. However, this study reports how three people with dementia often struggled to complete proverbs in a game used in a day-care centre to stimulate memory and language skills. By examining their responses and relating them to the causes of formulaic language patterns, the study argues that these games are not as appropriate a tool for stimulating memory and language skills as might first be thought. Although they do provide a much-needed opportunity for sustained patient-carer interaction that transcends the basic delivery of physical care, the games contravene some of the guidelines offered by Orange (2001) on how best to support people with Alzheimer’s Disease in constructive interaction.


2012, Vol 32, pp. 130-149
Author(s): Magali Paquot, Sylviane Granger

Formulaic language is at the heart of corpus linguistic research, and learner corpus research (LCR) is no exception. As multiword units of all kinds (e.g., collocations, phrasal verbs, speech formulae) are notoriously difficult for learners, and corpus linguistic techniques are an extremely powerful way of exploring them, they were an obvious area for investigation by researchers from the very early days of LCR. In the first part of this article, the focus is on the types of learner corpus data investigated and the most popular method used to analyze them. The second section describes the types of word sequences analyzed in learner corpora and the methodologies used to extract them. In the rest of the article, we summarize some of the main findings of LCR studies of the learner phrasicon, distinguishing between co-occurrence and recurrence. Particular emphasis is also placed on the relationship between learners’ use of formulaic sequences and transfer from the learner's first language. The article concludes with some proposals for future research in the field.

