Sources and steps of corpus lemmatization

Author(s):  
Laura García Fernández

This article describes the steps and results of the lemmatization of the derived anomalous verbs of Old English. The data have been retrieved from The Dictionary of Old English Web Corpus, searched through the lexical database from the Nerthus Project called Norna. The methodology comprises several steps combining automatic searches on the lemmatizer and manual revision. Part of the results, including the verbs starting with the letters A to H, are compared with the Dictionary of Old English, while the rest of the lemmas are checked with the standard Old English dictionaries (Clark-Hall, Sweet and Bosworth-Toller). The discussion leads to the conclusion that the lemmatization of the verbs of Old English, a language with a remarkable degree of spelling variation, requires considerable manual revision. However, the progressive improvement of automatic searches, based on the comparison of the initial results with the available lexicographical sources, minimizes the need for manual adjustment.

2010 ◽  
Vol 46 (2) ◽  
pp. 21-43 ◽  
Author(s):  
Elisa Torres

The Bases of Derivation of Old English Affixed Nouns: Status and Category

The aim of this journal article is to carry out a complete analysis of the category, status and patterns of the bases of derivation of Old English affixal nouns. The results of the analysis are discussed in the light of the evolution from stem-formation to word-formation. The corpus of analysis of this research is based on data retrieved from the lexical database of Old English Nerthus, which contains 30170 predicates. 16694 out of these are nouns, of which 4115 are basic and 12579 qualify as non-basic. Within non-basic nouns there are 3488 affixed nouns (351 by prefixation and 3137 by suffixation) and 9091 compound nouns. The line of argumentation is that, under certain circumstances, the existence of more than one base available for the formation of a derivative does not reinforce the explanation of invariable bases; on the contrary, it goes in the direction of variable bases produced by inflectional processes and made ready for derivation. The following conclusions are reached. In the first place, the importance of formations on stems in Old English, involving at least nouns, is underlined. Secondly, the analysis shows that the importance of stem-formation in Old English might be higher than has been acknowledged by previous studies. If Old English made extensive use of words as bases of derivation, a single base should be available; if, on the contrary, Old English is still dependent on stem-formation, more than one base is likely to be found for a single derivative. Such alternative bases of derivation reflect stem-formation that may result from inflectional means and be eventually used for derivational purposes.


Author(s):  
Margaret Laing ◽  
Roger Lass

This chapter demonstrates how the four main electronic resources created in the same tradition as A Linguistic Atlas of Late Mediæval English (LALME), i.e. LAEME, LALME itself (and its electronic version eLALME), A Linguistic Atlas of Older Scots (LAOS) and A Corpus of Narrative Etymologies from Proto-Old English to Early Middle English and accompanying Corpus of Changes (CoNE) can be used in tandem to support an investigation into the initial wh-cluster in words such as when, where, what, who, which. No fewer than 57 different spellings are found for this cluster, from the earliest attested Old English to ca 1500. The authors show how LAEME, eLALME, and LAOS provide the data that allow this spelling variation to be analysed as reflecting various scribal choices, whether determined by orthographic variation (including traditional contextual rules for the use of <v> or <u>), phonological variation, geographical variation, and/or diachronic variation. The final section showcases CoNE, and reconstructs a diachronic account on the basis of these spellings, revealing a coherent, if extremely complex, picture of lenitions, fortitions, and reversals.


2014 ◽  
Vol 67 (1) ◽  
pp. 77-94 ◽  
Author(s):  
Raquel Mateo Mendaza

The aim of this article is to measure the indexes of productivity of the prefix ful- and the suffix -ful in Old English adjective formation. This analysis is based on Baayen’s framework, which comprises different measures of productivity. The major sources of the analysis are The Dictionary of Old English Corpus and the lexical database of Old English Nerthus. This study of productivity allows for a diachronic perspective on the evolution of these affixes from the Old English period to the present. The main conclusion drawn from this analysis is that the suffix -ful is more productive than its prefixal counterpart, which implies that more productive patterns are still maintained in Present-day English in contradistinction to the less productive ones.
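
As a rough illustration of what a productivity measure computes, the following minimal sketch calculates type frequency (V), token frequency (N), the number of hapax legomena (n1) and the ratio P = n1 / N, usually cited as Baayen's productivity in the narrow sense. The abstract does not say which measures the article applies or how they are implemented, so the function name, the crude suffix test and the toy word list are all hypothetical and are not taken from the article or from real corpus counts.

```python
from collections import Counter

def productivity(tokens, has_affix):
    """Return (V, N, n1, P) for the words selected by has_affix.

    V  = type frequency (distinct words carrying the affix)
    N  = token frequency (all occurrences of those words)
    n1 = hapax legomena (affixed types that occur exactly once)
    P  = n1 / N, productivity in the narrow sense
    """
    counts = Counter(t for t in tokens if has_affix(t))
    V = len(counts)
    N = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    P = n1 / N if N else 0.0
    return V, N, n1, P

# Hypothetical toy sample, NOT real corpus data.
sample = ["synful", "synful", "wuldorful", "egesful", "synful", "god", "egesful"]

# Assumption: a plain string test stands in for proper morphological analysis.
V, N, n1, P = productivity(sample, lambda w: w.endswith("ful"))
print(V, N, n1, round(P, 3))
```

A string test such as endswith would also match words that merely end in the same letters, so real counts would have to come from morphologically analysed data such as the sources named in the abstract.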


2015 ◽  
Vol 13 ◽  
pp. 135
Author(s):  
Marta Tío Sáenz

This article compiles a list of lemmas of the second class weak verbs of Old English by using the latest version of the lexical database Nerthus, which incorporates the texts of the Dictionary of Old English Corpus. Out of all the inflectional endings, the most distinctive have been selected for lemmatization: the infinitive, the inflected infinitive, the present participle, the past participle, the second person present indicative singular, the present indicative plural, the present subjunctive singular, the first and third person of preterite indicative singular, the second person of the preterite indicative singular, the preterite indicative plural and the preterite subjunctive plural. When it is necessary to regularize, normalization is restricted to correspondences based on dialectal and diachronic variation. The analysis yields a total of 1,064 lemmas of weak verbs from the second class.
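
The ending-based selection described above can be sketched as a simple suffix lookup. This is only an illustration under stated assumptions: the ending list gives the textbook class 2 weak endings (as in lufian 'to love'), not the article's own query set, and the function name and data structure are hypothetical. Spelling variants and normalization are ignored.

```python
# Assumed textbook endings of class 2 weak verbs, keyed by the
# inflectional categories listed in the abstract (illustrative only).
CLASS2_ENDINGS = {
    "ian": "infinitive",
    "ianne": "inflected infinitive",
    "iende": "present participle",
    "od": "past participle",
    "ast": "2sg present indicative",
    "iaþ": "present indicative plural",
    "ie": "present subjunctive singular",
    "ode": "1sg/3sg preterite indicative",
    "odest": "2sg preterite indicative",
    "odon": "preterite indicative plural",
    "oden": "preterite subjunctive plural",
}

def candidate_lemma(form):
    """Return (lemma, category) for an attested form, or None.

    Longer endings are tried first so that, for example, -odest
    wins over -ode and -od. The lemma is the stem plus -ian.
    """
    for ending in sorted(CLASS2_ENDINGS, key=len, reverse=True):
        stem = form[: -len(ending)]
        if form.endswith(ending) and stem:
            return stem + "ian", CLASS2_ENDINGS[ending]
    return None

for form in ["lufodon", "lufast", "lufiaþ", "lufianne"]:
    print(form, candidate_lemma(form))
```

A lookup of this kind overgenerates, since forms from other verb classes or word classes can share the same final letters, so every candidate would still need checking against the lexicographical sources.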


2013 ◽  
Vol 48 (2-3) ◽  
pp. 27-54
Author(s):  
Roberto Torre Alonso ◽  
Darío Metola Rodríguez

This paper deals with the lexicon of Old English and, more specifically, with the existence of closing suffixes in word-formation. Closing suffixes are defined as base suffixes that prevent further suffixation by word-forming suffixes (Aronoff & Fuhrhop 2002: 455). This is tantamount to saying that this is a study in recursivity, or the formation of derivatives from derived bases, as in anti-establish-ment, which requires the attachment of the prefix anti- to the derived input establishment. The present analysis comprises all major lexical categories, that is, nouns, adjectives, verbs and adverbs, and concentrates on suffixes because they represent the newest and the most productive process in Old English word-formation (Kastovsky 1992, 2006), as well as the set of morphemes that has survived into Present-day English without undergoing radical changes. Given this aim, the data retrieved from the lexical database of Old English Nerthus (www.nerthusproject.com) comprise 6,073 affixed (prefixed and suffixed) derivatives, including 3,008 nouns, 1,961 adjectives, 974 adverbs and 130 verbs. All of them have been analysed in order to isolate recursive formations.


2021 ◽  
Author(s):  
Yosra Hamdoun Bghiyel

This article discusses the lemmatisation process of Old English adverbs inflected for the superlative from a corpus-based perspective. The study follows a semi-automatic methodology in which the inflectional forms have been automatically extracted from The York-Toronto-Helsinki Parsed Corpus of Old English Prose and The York-Toronto-Helsinki Parsed Corpus of Old English Poetry, whereas the task of assigning a lemma has been completed manually. The list of adverbial lemmas amounts to 1,755 and has been provided by the lexical database of Old English Nerthus. Additionally, the resulting lemmatised list has been checked against the lemmatised forms compiled by the Dictionary of Old English and Seelig’s (1930) work on Old English comparative and superlative adjectives and adverbs. Through this comparison, it has been possible to verify doubtful forms and incorporate new ones that are unattested by the YCOE. This pilot study has implemented for the first time a methodology for the lemmatisation of a non-verbal class and can be further applied to those categories that are still unlemmatised, namely nouns and adjectives.


Author(s):  
Merja Stenroos

This chapter uses a new resource, the Middle English Grammar Corpus (MEG-C), a corpus of 14th- and 15th-century English texts, to answer an old question: is it possible to find traces of a systematic distinction between the reflexes of Old English e/ē and eo/ēo in Middle English? An investigation into the spelling variation found in 27 lexical items that contain a vowel representing Old English eo/ēo as well as the equivalent Old Norse element jó throws up a wide range of spellings, the vast majority of which show <e>/<ee>. Spellings that might suggest a rounded pronunciation are also fairly robustly present, however, particularly <eo>, with the Southwest Midlands as its core area. The second part of the investigation retrieves all words that were spelled with the digraph <eo>. The vast majority of these turn out to be reflexes of Old English eo/ēo, and almost all of them are localized to the Southwest Midlands. They occur either as reflexes of OE y/ȳ, or in unstressed syllables, or in words where <eo> follows <w> – three groups for which a rounded pronunciation would be plausible.


2018 ◽  
Vol 136 (2) ◽  
pp. 269-276
Author(s):  
Matti Kilpiö

The main focus of this article is on a passage in Ælfric’s Catholic Homily I, 33 and its Latin source in Augustine’s Sermon 71. The correspondence between the Latin source text and Ælfric’s translation is exceptionally close, almost gloss-like. What is particularly striking is the occurrence of passives of possessive (ge)habban in the Old English, corresponding to passives of possessive habere in the source. In both Old English and Latin the expression of possession with the passives of both (ge)habban and habere is very rare. The Latin Trinitarian statement translated by Ælfric consists of three sentences which display a remarkable degree of parallelism at the level of syntax and lexis. This results in a compact statement consisting of parallel repeated elements, which not only establish differences between the three persons of the Godhead but also emphasise the essential unity underlying the Trinity. The article also briefly deals with another, syntactically more relaxed, formulation of the same Trinitarian statement occurring earlier in Augustine’s sermon and tentatively asks the question why Ælfric chose the more complex and unwieldy version with passives of habere as the base text for his translation.


Author(s):  
Raquel Mateo Mendaza

This article measures the productivity index of the Old English suffixes -cund, -ful, and -isc as well as the prefix ful- and checks the results against the diachronic evolution of the affixes. The frameworks brought to the discussion include Type frequency measurement, as well as productivity indexes proposed by Baayen (1992, 1993, 2009) and Trips (2009). The sources are both textual (The Dictionary of Old English Corpus) and lexicographical (the lexical database of Old English Nerthus). The conclusion drawn is that Baayen's (1992, 1993, 2002) index of Global Productivity provides the most consistent results with the diachronic evolution of the affixes.


Author(s):  
Raquel Mateo Mendaza

The aim of this article is to identify the Old English exponent for the semantic prime LIVE following the principles of the Natural Semantic Metalanguage theory (Wierzbicka 1996, Goddard & Wierzbicka 2002, Goddard 2011). The methodology applied in the study is based on previous research in Old English semantic primes. In these terms, a search for those Old English words conveying the meaning of the semantic prime LIVE is made. This search selects the verbs (ge)buan, drohtian, (ge)eardian, (ge)libban, and wunian as candidate words for prime exponent. Then, these verbs are analysed in terms of morphological, textual, semantic, and syntactic criteria. With this purpose, relevant information on these words has been gathered from different lexicographical and textual sources in Old English, such as the Dictionary of Old English, the Dictionary of Old English Corpus, and the lexical database of Old English Nerthus. After the analysis of these verbs, the conclusion is drawn that the Old English verb (ge)libban is selected as prime exponent, as it satisfies the requirements proposed by each criterion.

