lexical data
Recently Published Documents


Total documents: 126 (five years: 40)
H-index: 10 (five years: 2)

2021 ◽  
Vol 16 ◽  
pp. 42-48
Author(s):  
Rūta Petrauskaitė ◽  
Virginijus Dadurkevičius

This paper presents a method for updating traditional digitised dictionaries by comparing dictionary lemmas with a large corpus. The Hunspell platform is used to generate all word forms from the dictionary lemmas. The 6th edition of The Dictionary of Modern Lithuanian was chosen for comparison with lexical data from The Joint Corpus of Lithuanian. The comparison yielded two lists of non-overlapping lexis: the dictionary lemmas unused in present-day Lithuanian, and the dictionary gaps, i.e., frequently used words and word forms ignored by the dictionary. The latter list is discussed in greater detail to give lexicographers pointers for updates.
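The two-list comparison described in this abstract can be sketched as a set comparison. This is only an illustration under stated assumptions, not the authors' pipeline: the word forms are taken as already generated (the paper uses Hunspell for that step, which is not reproduced here), and the function name `compare_lexicon`, the `min_freq` cut-off, and the toy data are all invented for the example.

```python
from collections import Counter


def compare_lexicon(dictionary_forms, corpus_tokens, min_freq=2):
    """Compare dictionary word forms with corpus tokens.

    dictionary_forms: dict mapping each lemma to the set of its word forms
                      (assumed to be generated beforehand).
    corpus_tokens:    iterable of corpus tokens.
    min_freq:         hypothetical threshold for a "frequently used" word.
    Returns (unused_lemmas, gaps).
    """
    freq = Counter(token.lower() for token in corpus_tokens)
    known_forms = {
        form.lower() for forms in dictionary_forms.values() for form in forms
    }

    # Lemmas none of whose forms occur anywhere in the corpus.
    unused_lemmas = sorted(
        lemma
        for lemma, forms in dictionary_forms.items()
        if not any(form.lower() in freq for form in forms)
    )

    # Frequent corpus words that no dictionary lemma accounts for,
    # ordered by descending frequency, then alphabetically.
    gaps = sorted(
        (word for word, n in freq.items()
         if n >= min_freq and word not in known_forms),
        key=lambda word: (-freq[word], word),
    )
    return unused_lemmas, gaps


# Invented toy data, for illustration only.
toy_dictionary = {
    "namas": {"namas", "namo", "namui", "namą"},
    "arklas": {"arklas", "arklo"},
}
toy_corpus = "namas namo selfis selfis namą selfis tviteris".split()

unused, gaps = compare_lexicon(toy_dictionary, toy_corpus)
print(unused)  # lemmas with no attested form
print(gaps)    # frequent words missing from the dictionary
```

Matching on generated word forms rather than on lemmas alone is what lets an inflected corpus token count as evidence for its lemma.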


2021 ◽  
Vol 50 (2) ◽  
pp. 285-325
Author(s):  
Sifra Van Acker ◽  
Sara Pacchiarotti ◽  
Edmond De Langhe ◽  
Koen Bostoen

Lexical data has been key in attempts to reconstruct the early history of the banana (Musa sp.) in Africa. Previous language-based approaches to the introduction and dispersal of this staple crop of Asian origin have suffered from the absence of well-established genealogical classifications and inadequate historical-linguistic analysis. We therefore focus in this article on West-Coastal Bantu (WCB), one specific branch within the Bantu family whose genealogy and diachronic phonology are well established. We reconstruct three distinct banana terms to Proto-West-Coastal Bantu (PWCB), i.e. *dɪ̀‑ŋkòndò/*mà‑ŋkòndò ‘plantain’, *dɪ̀‑ŋkò/*mà‑ŋkò ‘plantain’ and *kɪ̀‑túká/*bì‑túká ‘bunch of bananas’. From this new historical-linguistic evidence we infer that AAB Plantains, one of Africa’s two major cultivar subgroups, already played a key role in the subsistence economy of the first Bantu speakers who assumedly migrated south of the rainforest around 2500 years ago. We furthermore analyze four innovations that emerged after WCB started to spread from its interior homeland in the Kasai-Kamtsha region of Congo-Kinshasa towards the Atlantic coast, i.e. dɪ̀‑kòndè ‘plantain’, kɪ̀‑tébè ‘starchy banana’, banga ‘False Horn plantain’, and dɪ̀‑tòtò ‘sweet banana’. Finally, we assess the historical implications of these lexical retentions and innovations both within and beyond WCB and sketch some perspectives for future lexicon-based banana research.


2021 ◽  
pp. 99-114
Author(s):  
Franz Manni ◽  
John Nerbonne

Gabon is an African country located very close to the homeland of the Bantu languages (Cameroon). Starting about 5,000 years ago, Bantu-speaking populations diffused into almost all of sub-Saharan Africa. Using a computational linguistic method (Levenshtein distance), we processed two independently collected lexical data sets recording the pronunciation of 88 and 158 words in more than 50 linguistic varieties spoken in Gabon and obtained a numerical classification of the major linguistic groups. We compared this classification to those available from historical linguistics (cognate sharing defined by experts) and found them to overlap, which indicates that the two methods capture the same signal of linguistic difference (and relatedness). To focus on the historical relatedness between major linguistic clusters, we controlled for the linguistic similarity related to contact, proportional to geographic vicinity, and suggest that the first Bantu-speaking groups to people Gabon were those speaking KOTA-KELE (B20) languages. The other varieties correspond to five different immigration waves (B10, B30, B40, B50-B60-B70 in Guthrie's nomenclature) that penetrated Gabon later in history. To conclude, we suggest a peopling scenario that incorporates available paleoclimatic, archaeological, and population genetic evidence.
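The Levenshtein-based comparison mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: dividing by the longer word's length and averaging over shared concepts are assumptions, the names `levenshtein` and `variety_distance` are hypothetical, and the transcriptions are invented.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def variety_distance(words_a, words_b):
    """Mean length-normalised Levenshtein distance over the concepts
    that both varieties record (normalisation is an assumption)."""
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("the two word lists share no concepts")
    total = 0.0
    for concept in shared:
        x, y = words_a[concept], words_b[concept]
        total += levenshtein(x, y) / max(len(x), len(y), 1)
    return total / len(shared)


# Invented transcriptions, for illustration only.
variety_x = {"one": "moko", "two": "bale"}
variety_y = {"one": "mosi", "two": "bale"}

d = variety_distance(variety_x, variety_y)
print(d)
```

Computing this distance for every pair of varieties gives a distance matrix, which is the usual input to clustering when a numerical classification is wanted.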


2021 ◽  
Author(s):  
Johann-Mattis List ◽  
Robert Forkel ◽  
Simon J. Greenhill ◽  
Christoph Rzymski ◽  
Johannes Englisch ◽  
...  

The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, the majority of published datasets lack standardization, which makes their comparison difficult. Here, we present the first step to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that increase the FAIRness of linguistic data. We test the Lexibank workflow on a collection of 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.


2021 ◽  
Vol 133 (3) ◽  
pp. 346-360
Author(s):  
Jonathan Thambyrajah

The Hebrew word אֶלְגָּבִישׁ has typically been understood as referring to hail. This presents a lexical problem, given that all of its apparent cognates appear to refer to rock. Based on a reanalysis of existing lexical data with the inclusion of new cognates and a new analysis of the imagery contained within Ezekiel 13 and Ezekiel 38, this study proposes that Hebrew אֶלְגָּבִישׁ, Akkadian algamešu, Ugaritic a͗lgbṯ, Egyptian i͗rḳbs, and other related words all derive from Egyptian i͗nr-km.


2021 ◽  
Vol 137 (3) ◽  
pp. 925-932
Author(s):  
Jean-Pierre Chambon ◽  
Jean Germain

Comparing the lexical data gathered by the LEI with the medieval and contemporary anthroponymic data gathered by PatRom makes it possible to measure, through a concrete example (the delexical personal names formed on continuators of regional Proto-Romance */karne-lakˈsare/ ‘Shrove Tuesday’), the modest but not negligible contributions of anthroponymy to historical lexicology.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Yizhou He

English writing is conducive to online communication and language exchange; however, current diagnosis systems for English writing struggle to find and diagnose wrong words accurately, which leads to a low diagnosis rate for wrong words. To solve this problem, this paper designs an intelligent diagnosis system for English writing based on data feature extraction and fusion. First, a B/S architecture is introduced on top of the conventional structure of intelligent diagnosis systems for English writing, which compensates for the C/S mode's proneness to diagnostic errors. Second, the features of English lexical data are extracted and fused to provide better input for the diagnostic model, which effectively addresses the problems of complex vocabulary and feature redundancy in English writing. The simulation results show that the proposed intelligent diagnosis system for English writing has higher diagnostic accuracy and faster query speed.


Corpora ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. 205-236
Author(s):  
Evandro L.T.P. Cunha ◽  
Søren Wichmann

When exploring diachronic corpora, it is often beneficial for linguists to pinpoint not only the first or the last attestation dates of certain linguistic items, but also the moments in which they become more strongly established in the corpus or, conversely, the moments in which they, despite still being part of the language, become obsolete. In this paper, we propose an algorithm to assist the identification of such periods based on the frequency of items in a corpus. Our simple and generalisable algorithm can be used for the investigation of any linguistic item in any corpus which is divided into time-frames. We also demonstrate the applicability of our method using lexical data from the Corpus of Historical American English (COHA), providing case studies on the statistics and characteristics of words that appear in or disappear from this corpus in different periods.
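The abstract does not spell out the algorithm, so the sketch below is not the authors' method. It is a hypothetical threshold rule meant only to show how "first attested" and "established" can differ for an item whose frequency is tracked per time-frame. The function names, the threshold, and the frequency values are all invented.

```python
def first_attestation(freqs):
    """Label of the earliest time-frame with a non-zero frequency,
    or None if the item never occurs."""
    for label, value in freqs:
        if value > 0:
            return label
    return None


def establishment_frame(freqs, threshold):
    """Label of the earliest time-frame from which the frequency stays
    at or above `threshold` in every later frame, or None.

    freqs: list of (frame_label, relative_frequency) pairs in
           chronological order.
    threshold: hypothetical cut-off for "established".
    """
    candidate = None
    for label, value in freqs:
        if value >= threshold:
            if candidate is None:
                candidate = label  # start of a possible stable run
        else:
            candidate = None       # run broken, start over
    return candidate


# Invented per-decade relative frequencies, for illustration only.
series = [
    ("1810s", 0.0),
    ("1820s", 2.1),
    ("1830s", 0.4),
    ("1840s", 1.5),
    ("1850s", 3.0),
]

print(first_attestation(series))         # first decade with any occurrence
print(establishment_frame(series, 1.0))  # start of the stable run
```

In this toy series the item is first attested in the 1820s, but it dips below the threshold in the 1830s, so the rule dates its establishment to the 1840s.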


2021 ◽  
Vol 12 ◽  
Author(s):  
Gerhard Jäger ◽  
Johannes Wahle

In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
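The final step, computing the long-term equilibrium of a Markov process from its transition rates, can be sketched as below. This illustrates that step alone, with invented rates, and is not the authors' statistical model. The function name `equilibrium` is hypothetical. The method shown converts the rates into a discrete-time chain and applies power iteration; it assumes every state can be reached from every other.

```python
def equilibrium(rates, tol=1e-12, max_iter=100_000):
    """Stationary distribution of a continuous-time Markov process.

    rates[i][j] is the transition rate from state i to state j (i != j);
    diagonal entries are ignored.
    """
    n = len(rates)
    exit_rates = [
        sum(rates[i][j] for j in range(n) if j != i) for i in range(n)
    ]
    max_exit = max(exit_rates)
    if max_exit == 0:
        raise ValueError("all rates are zero; no unique equilibrium")

    # Discrete-time chain P = I + step * Q with the same stationary
    # distribution; the step keeps every diagonal entry >= 0.5.
    step = 1.0 / (2.0 * max_exit)
    P = [
        [
            1.0 - exit_rates[i] * step if i == j else rates[i][j] * step
            for j in range(n)
        ]
        for i in range(n)
    ]

    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Invented rates for a two-state feature, for illustration only:
# state 0 -> 1 at rate 0.2, state 1 -> 0 at rate 0.6.
toy_rates = [
    [0.0, 0.2],
    [0.6, 0.0],
]
pi = equilibrium(toy_rates)
print(pi)  # approximately [0.75, 0.25]
```

For two states the result matches the closed form: the equilibrium share of state 0 is 0.6 / (0.2 + 0.6) = 0.75, the rate into state 0 divided by the sum of both rates.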

