Managing Historical Linguistic Data for Computational Phylogenetics and Computer-Assisted Language Comparison

<p>In order to develop effective teaching methods and computer-assisted language teaching systems for learners of English as a foreign language who need to study the basic linguistic competences for writing, pronunciation, reading, and listening, it is necessary to first investigate which vocabulary and grammar they have or have not yet learned. Identifying such vocabulary and grammar requires a learner corpus for analyzing the accuracy and fluency of learners’ linguistic competences. However, it is difficult to use previous learner corpora for this purpose because they have not compiled all the types of linguistic data that we need. Therefore, this study aimed to solve this problem by designing and developing a new learner corpus that compiles linguistic data regarding the accuracy and fluency of the four basic linguistic competences of writing, pronunciation, reading, and listening. The reliability and validity of the learner corpus were partially confirmed, and practical application of the learner corpus is reported here as case studies.</p>

Le Peuplement des Grassfields: Recherches Archeologiques dans L’ouest du Cameroun

Afrika Focus ◽

10.1163/2031356x-01401005 ◽

1998 ◽

Vol 14 (1) ◽

pp. 17-36

Author(s):

Philippe Lavachery

Keyword(s):

Historical Linguistic ◽

Point Of View ◽

Stone Age ◽

The West ◽

Hunter Gatherers ◽

Rock Shelter ◽

Bantu Languages ◽

Linguistic Data ◽

Late Stone Age ◽

Metal Age

The Settlement of the Grassfields: Archeological Research in the West of Cameroon. Until recently, the Grassfields (Western Cameroon), cradle of the Bantu languages, were unknown territory from an archaeological point of view. The excavations of the Shum Laka rock shelter offer the first chrono-cultural sequence for the area. After 20 millennia of microlithic (Late Stone Age) hunter-gatherer traditions, a new culture with macrolithic tools, pottery and arboriculture (Stone to Metal Age) slowly developed from 6000 BC onwards. Correlation with palaeo-climatic and historical linguistic data suggests that proto-Benue-Congo and, later, proto-Bantu speakers could have been involved in these industries.

Amalgamating Knowledge, Translating Empire

Mining Language ◽

10.5149/northcarolina/9781469654386.003.0009 ◽

2020 ◽

pp. 229-258

Author(s):

Allison Margaret Bigelow

Keyword(s):

Historical Linguistic ◽

Seventeenth Century ◽

Technology Transfer ◽

Indigenous Knowledge ◽

New Method ◽

Colonial Mexico ◽

Central Mexico ◽

Linguistic Data ◽

Silver Mining ◽

The Rich

This chapter introduces the final section of the book, silver, by outlining the development of silver mining and refining in colonial Mexico and Perú. It pays special attention to the sixteenth-century technology transfer of amalgamation methods from central Mexico to Alto Perú, especially the rich deposits of the Cerro Rico of Potosí. By combining historical linguistic data and case studies of the translation and mistranslation of key technical terms used in seventeenth-century Andean metallurgy, as written in colonial sources that denied the sophistication of Indigenous science and technology, this chapter proposes a new method to document Indigenous knowledge production.

A Matter of Metals: Finno-Ugric and Northern Iranian

Iran and the Caucasus ◽

10.1163/1573384x-20200205 ◽

2020 ◽

Vol 24 (2) ◽

pp. 196-215

Author(s):

Paolo Ognibene

Keyword(s):

Historical Linguistic ◽

First Century ◽

Northern Caucasus ◽

Linguistic Data ◽

The Third ◽

The Way

In the third part of his Ossetic Studies, Vsevolod Miller considered the names of the metals in both Iron and Digoron, with particular reference to those of Finno-Ugric origin, in order to determine the route followed by the Alans to reach the Northern Caucasus in the first century A.D. In this paper, Miller's theory is examined in the light of the historical linguistic data currently available.

Phylogenetic linguistic evidence and the Dene-Yeniseian homeland

Diachronica ◽

10.1075/dia.17038.yan ◽

2020 ◽

Vol 37 (3) ◽

pp. 410-446

Author(s):

Igor Yanovich

Keyword(s):

Best Practice ◽

Phylogenetic Analyses ◽

Careful Examination ◽

Current Evidence ◽

Methodological Issues ◽

Linguistic Data ◽

Folklore Studies ◽

Northeastern Siberia ◽

Linguistic Evidence ◽

Computational Phylogenetics

Sicoli & Holton (2014) (PLoS ONE 9:3, e91722) use computational phylogenetics to argue that linguistic data from the putative, but likely, Dene-Yeniseian macro-family are more compatible with a homeland in Beringia (i.e., northeastern Siberia plus northwestern Alaska) than with one in central Siberia or deeper Asia. I show that a more careful examination of the data invalidates this conclusion: in fact, linguistic data do not support Beringia as the homeland. In the course of showing this, I discuss, without requiring a deep mathematical background, a number of methodological issues concerning computational phylogenetic analyses of linguistic data and drawing inferences from them. The aim is to contribute to making computational phylogenetics less of a black box for historical linguists. I conclude with a brief overview of the current evidence bearing on the Dene-Yeniseian homeland from linguistics, archaeology, folklore studies and genetics, and suggest current best practice for linguistic phylogenetics, the use of which would have helped to avoid some of the problems in Sicoli and Holton’s Dene-Yeniseian study, and in turn the percolation of those problems into subsequent synthetic interdisciplinary research.

CLICS2: An improved database of cross-linguistic colexifications assembling lexical data with the help of cross-linguistic data formats

Linguistic Typology ◽

10.1515/lingty-2018-0010 ◽

2018 ◽

Vol 22 (2) ◽

pp. 277-306 ◽

Cited By ~ 3

Author(s):

Johann-Mattis List ◽

Simon J. Greenhill ◽

Cormac Anderson ◽

Thomas Mayer ◽

Tiago Tresoldi ◽

...

Keyword(s):

Data Aggregation ◽

Reliable Data ◽

Computer Assisted ◽

Semantic Change ◽

Current Form ◽

Linguistic Data ◽

Data Formats ◽

Semantic Associations ◽

Novel Approaches ◽

Lexical Data

The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change and patterns of conceptualization to linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, to the limits of cross-linguistic colexification studies in general. Building on recent standardization efforts reflected in the Cross-Linguistic Data Formats initiative (CLDF) and novel approaches for fast, efficient, and reliable data aggregation, we have created a new database for cross-linguistic colexifications, which not only supersedes the original CLICS database in terms of coverage but also offers a much more principled procedure for the creation, curation and aggregation of datasets. The paper presents the new database and discusses its major features.

Tracing the origins of a set of discourse particles

Journal of Historical Pragmatics ◽

10.1075/jhp.6.2.04lin ◽

2005 ◽

Vol 6 (2) ◽

pp. 211-236 ◽

Cited By ~ 13

Author(s):

Jan K. Lindström ◽

Camilla Wide

Keyword(s):

Historical Linguistic ◽

Historical Data ◽

Linguistic Data ◽

Discourse Particles ◽

Functional Differences ◽

Old Swedish ◽

Present Tense ◽

Large Corpus ◽

Historical Origins

This paper investigates the historical origins, both syntactic and functional, of a set of discourse particles commonly used in present-day spoken Swedish: hör du ‘(you) listen’, vet du ‘you know’, ser du ‘you see’, and förstår du ‘you understand’. From a synchronic perspective, the particles seem to be a morpho-syntactically unified phenomenon, and have been treated as such in earlier linguistic works. However, there is no diachronic account of these particles. This paper presents a number of hypotheses concerning the syntactic and functional sources of the discourse particles; we also evaluate the hypotheses against the background of historical linguistic data collected from Old Swedish, Middle Swedish, and Modern Swedish sources. The Modern Swedish period is covered by a large corpus of plays from the 1700s to the late 1900s. Comparisons are also made to Old and Modern Icelandic data. The historical data show that the particle hör du is of imperative, functionally directive origin, while the rest of the particles include a verb in present tense indicative, thus presumably originating from minimal clauses with a declarative or an interrogative function. Hence, historical formal and functional differences are hidden behind the apparent uniform present-day forms and functions of the discourse particles.

A Pipeline for Computational Historical Linguistics

Language Dynamics and Change ◽

10.1163/221058211x570358 ◽

2011 ◽

Vol 1 (1) ◽

pp. 89-127 ◽

Cited By ~ 22

Author(s):

Lydia Steiner ◽

Michael Cysouw ◽

Peter Stadler

Keyword(s):

Historical Linguistic ◽

South America ◽

Molecular Phylogenetics ◽

Historical Linguistics ◽

Comparative Method ◽

Current Approach ◽

The Caucasus ◽

Linguistic Data ◽

Linguistic Research ◽

Lexical Data

There are many parallels between historical linguistics and molecular phylogenetics. In this paper we describe an algorithmic pipeline that mimics, as closely as possible, the traditional workflow of language reconstruction known as the comparative method. The pipeline consists of suitably modified algorithms based on recent research in bioinformatics, which are adapted to the specifics of linguistic data. This approach can alleviate much of the laborious research needed to establish proof of historical relationships between languages. Equally important to our proposal is that each step in the workflow of the comparative method is implemented independently, so that language specialists can scrutinize intermediate results. We have used our pipeline to investigate two groups of languages, the Tsezic languages of the Caucasus and the Mataco-Guaicuruan languages of South America, based on the lexical data from the Intercontinental Dictionary Series (IDS). The results of these tests show that the current approach is a viable and useful extension to historical linguistic research.

Topological mapping for visualisation of high-dimensional historical linguistic data

10.1075/cilt.356.14moi ◽

2021 ◽

pp. 210-224

Author(s):

Hermann Moisl

Keyword(s):

Historical Linguistic ◽

High Dimensional ◽

Linguistic Data ◽

Topological Mapping

Romanization and Latinization of the Roman Empire in the light of data in the Computerized Historical Linguistic Database of Latin Inscriptions of the Imperial Age

Journal of Latin Linguistics ◽

10.1515/joll-2021-2016 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Béla Adamik

Keyword(s):

Historical Linguistic ◽

Roman Empire ◽

Comparative Analysis ◽

Complex Problem ◽

Positive Outcomes ◽

Linguistic Data ◽

Multi Level ◽

Gallia Narbonensis ◽

Level Analysis ◽

Over Time

The present study demonstrates that the process of linguistic Romanization, i.e. Latinization of the Roman Empire, can be traced through the data of the Computerized Historical Linguistic Database of Latin Inscriptions of the Imperial Age (LLDB). A multi-level analysis of linguistic and non-linguistic data in the LLDB has shown that Latinization, i.e. the spread of spoken or vulgar Latin, became more and more intensive over time in all the provinces concerned (i.e. Lusitania, Gallia Narbonensis, Venetia et Histria, Dalmatia, Moesia, Pannonia, and Britannia), although to a varying degree in each. What is more, in many aspects of the investigation, it was possible to find differences between the selected provinces of the Roman Empire corresponding mostly to the future Romance (both negative and positive) outcomes of the respective areas. All in all, the analysis of the LLDB data can contribute to solving the complex problem of Latinization, and is far better suited to this purpose than a simple comparative analysis of epigraphic corpora of the selected provinces.
