The classification of the Transeurasian languages

Author(s):  
Martine Robbeets

Even though the hypothesis of Transeurasian affiliation is gradually gaining acceptance, its supporters do not agree on the internal structure of the family. Over the last century, a range of different classifications has been proposed. While these proposals overlap remarkably, the position of the Tungusic branch in the family tree remains a recurrent issue. Here the best-supported tree for the Transeurasian family is inferred: a binary topology with a Japano-Koreanic and an Altaic branch, in which Tungusic is the first to split off from the Altaic branch. To this end, classical historical-comparative linguistics is combined with computational Bayesian phylogenetic methods. This introduces a quantitative basis for testing competing hypotheses about the internal structure of the Transeurasian family and for resolving uncertainties associated with the application of the classical historical-comparative method.

Author(s):  
Devin Moore

Coahuitlán Totonac is spoken in Veracruz, Mexico, and has been variously ascribed to two different branches of the Totonacan family tree. While recent work has begun to bring empirical evidence to bear on the internal structure of this family tree, several important areas of disagreement remain, in addition to the disputed affiliation of Coahuitlán. This article contributes to the family tree and, using shared innovations and two computational methods, demonstrates that Coahuitlán belongs to the Northern branch. The comparative method looks for sets of shared innovations as evidence of subgrouping. This article presents proposed shared innovations in phonology, morphology, and lexicon, which fall into two sets: one belonging to the Sierra and Lowland branches, and the other to the Northern branch. Coahuitlán Totonac overwhelmingly shares innovations found in Northern languages and lacks innovations found in Sierra. Two quantitative methods also show that Coahuitlán groups closely with other Northern languages.


Diachronica ◽  
2011 ◽  
Vol 28 (3) ◽  
pp. 291-323 ◽  
Author(s):  
Michael Dunn ◽  
Niclas Burenhult ◽  
Nicole Kruspe ◽  
Sylvia Tufvesson ◽  
Neele Becker

This paper analyzes newly collected lexical data from 26 languages of the Aslian subgroup of the Austroasiatic language family using computational phylogenetic methods. We show the most likely topology of the Aslian family tree, discuss rooting and external relationships to other Austroasiatic languages, and investigate differences in the rates of diversification of different branches. Evidence is given supporting the classification of Jah Hut as a fourth top-level subgroup of the family. The phylogenetic positions of known geographic and linguistic outlier languages are clarified, and the relationships of the little-studied Aslian languages of Southern Thailand to the rest of the family are explored.


Diachronica ◽  
2013 ◽  
Vol 30 (3) ◽  
pp. 323-352 ◽  
Author(s):  
Kaj Syrjänen ◽  
Terhi Honkola ◽  
Kalle Korhonen ◽  
Jyri Lehtinen ◽  
Outi Vesakoski ◽  
...  

Encouraged by ongoing discussion of the classification of the Uralic languages, we investigate the family quantitatively using Bayesian phylogenetics and basic vocabulary from seventeen languages. To estimate the heterogeneity within this family and the robustness of its subgroupings, we analyse ten divergent sets of basic vocabulary, including basic vocabulary lists from the literature, lists that exclude borrowing-susceptible meanings, lists with varying proportions of borrowing-susceptible meanings and a list combining all of the examined items. The results show that the Uralic phylogeny has a fairly robust shape from the perspective of basic vocabulary and is not dramatically altered by borrowing-susceptible meanings. The results differ to some extent from the ‘standard paradigm’ classification of these languages, notably in the lack of firm evidence for Finno-Permian.
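Bayesian phylogenetic analyses of basic vocabulary, such as the one described above, commonly take as input a presence/absence matrix in which each cognate class becomes one binary character. The following is a minimal sketch of that recoding step, assuming cognate judgments have already been made. The input format, the function binarize and the toy languages are hypothetical, not the authors' data or code, and for simplicity each language has at most one cognate class per meaning (no synonyms).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input format for this sketch:
# language -> meaning -> cognate-class label, or None if the meaning is unattested.
CognateData = Dict[str, Dict[str, Optional[str]]]


def binarize(data: CognateData) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Recode cognate classes as presence/absence characters.

    Each (meaning, cognate class) pair becomes one binary column:
    '1' if the language's form for that meaning belongs to the class,
    '0' if it has a form from a different class, '?' if the meaning is missing.
    """
    characters = sorted(
        {(meaning, cls)
         for forms in data.values()
         for meaning, cls in forms.items()
         if cls is not None}
    )
    matrix: Dict[str, str] = {}
    for lang in sorted(data):
        row = []
        for meaning, cls in characters:
            observed = data[lang].get(meaning)  # None if absent or unattested
            if observed is None:
                row.append("?")
            else:
                row.append("1" if observed == cls else "0")
        matrix[lang] = "".join(row)
    return characters, matrix


# Invented toy data, for illustration only.
toy = {
    "LangA": {"hand": "a", "water": "x"},
    "LangB": {"hand": "a", "water": "y"},
    "LangC": {"hand": "b", "water": None},
}
chars, matrix = binarize(toy)
for lang, row in matrix.items():
    print(lang, row)
# LangA 1010
# LangB 1001
# LangC 01??
```

Coding missing meanings as "?" rather than "0" keeps an unrecorded word from being read as evidence of lexical replacement.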


2020 ◽  
Vol 8 (13) ◽  
pp. 64-82
Author(s):  
Gbenga Fakuade ◽  
Lawal Tope Aminat ◽  
Adewale Rafiu ◽  
...  

This paper examined variation in the Onko dialect, using the family tree model and the associated comparative method as its theoretical tool. A wordlist of basic items and a frame technique designed for the study were used to gather data. The data were presented in tables and analyzed with descriptive statistics to determine variation at the phonological, syntactic and lexical levels. The study revealed differences between Standard Yoruba and the Onko dialect, as well as variation within Onko itself. Two main factors were found to be responsible for this variation: geography (the distribution of Onko communities) and language contact. The paper established that although Onko exhibits variation, it is not significant enough to disrupt mutual intelligibility among speakers, and all the varieties therefore remain a single dialect.


2020 ◽  
Vol 5 (1) ◽  
pp. 39-53 ◽  
Author(s):  
Alexander Savelyev ◽  
Martine Robbeets

Despite more than 200 years of research, the internal structure of the Turkic language family remains subject to debate. Classifications of Turkic so far have been based on both classical historical–comparative linguistic and distance-based quantitative approaches. Although these studies yield an internal structure of the Turkic family, they cannot tell us how statistically robust the proposed branches are, nor can they reliably infer absolute divergence dates without assuming constant rates of change. Here we use computational Bayesian phylogenetic methods to build a phylogeny of the Turkic languages, express the reliability of the proposed branches in terms of probability, and estimate the time depth of the family within credibility intervals. To this end, we collect a new dataset of 254 basic vocabulary items for thirty-two Turkic language varieties based on the recently introduced Leipzig–Jakarta list. Our application of Bayesian phylogenetic inference to lexical data of the Turkic languages is unprecedented. The resulting phylogenetic tree supports a binary structure for Turkic and replicates most of the conventional sub-branches in the Common Turkic branch. We calculate the robustness of the inferences for subgroups and individual languages whose position in the tree seems to be debatable. We infer the time depth of the Turkic family at around 2100 years before present, thus providing a reliable quantitative basis for previous estimates based on classical historical linguistics and lexicostatistics.


2019 ◽  
Vol 5 (1) ◽  
pp. 54-74
Author(s):  
Luis Miguel Rojas-Berscia ◽  
Sean Roberts

Pronouns as a diagnostic feature of language relatedness have been widely explored in historical and comparative linguistics. In this article, we focus on South American pronouns as a potential example of items with their own history that pass across the boundaries of language families, what the literature has dubbed ‘historical markers’. Historical markers are not a direct diagnostic of genealogical relatedness among languages, but they account for phenomena beyond the grasp of the historical comparative method. Relatedness between pronoun systems can thus point to where closer studies of genealogical relationships are warranted. How can computational methods help with this process? We collected pronouns for 121 South American languages, grouped them into classes and aligned the phonemes within each class (assisted by automatic methods). We then used Bayesian phylogenetic tree inference to model the birth and death of individual phonemes within cognate sets, rather than following the typical practice of modelling whole cognate sets. The reliability of the splits found in our analysis was low above the level of language family, and validation on alternative data suggested that the analysis cannot be used to infer general genealogical relatedness among languages. However, many results aligned with existing theories, and the analysis as a whole provided a useful starting point for future analyses of historical relationships between the languages of South America. We show that using automated methods with evolutionary principles can support progress in historical linguistics research.


Diachronica ◽  
2021 ◽  
Author(s):  
Sofia Oskolskaya ◽  
Ezequiel Koile ◽  
Martine Robbeets

The Tungusic language family comprises languages spoken in Siberia, the Russian Far East, Northeast China and Xinjiang. There is a general consensus that these languages are genealogically related and descend from a common ancestral language. Nevertheless, there is considerable disagreement about the internal structure of the Tungusic family and the time depth of its separation into daughter languages. Here we use computational Bayesian phylogenetic methods to generate a phylogeny of Tungusic languages and estimate the time depth of the family. Our analysis is based on the recently introduced Leipzig-Jakarta-Jena list, a dataset of 254 basic vocabulary items collected for 21 Tungusic doculects. Our results are consistent with two basic classifications previously proposed in the literature: a Manchu-Tungusic classification, in which the break-up of Jurchenic constitutes the first split in the tree, and a North-South classification, which includes a Jurchenic-Nanaic and an Orochic-Ewenic branch. In addition, we obtain a time depth for Proto-Tungusic between the 8th century BC and the 12th century AD (95% highest posterior density interval). Previous classifications of Tungusic were based on both classical historical comparative linguistic and lexicostatistic approaches, but Bayesian phylogenetic methods have not previously been applied to the Tungusic languages. In contrast to previous approaches, our Bayesian analysis quantifies the statistical robustness of the proposed branches and infers absolute divergence dates while allowing rates of change to vary across branches and cognate sets. In this way, our research provides a reliable quantitative basis for previous estimates based on classical historical linguistic and lexicostatistic approaches.
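The age of Proto-Tungusic above is reported as a 95% highest posterior density (HPD) interval. As a hedged illustration of what such an interval is, the sketch below computes the narrowest interval that contains a given share of posterior samples (for example, MCMC samples of a root age). The function hpd_interval is hypothetical, assumes a unimodal posterior, and is not the authors' code.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval containing `mass` of the posterior samples.

    Assumes a unimodal posterior; for a multimodal posterior the true HPD
    region may be a union of disjoint intervals.
    """
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of consecutive sorted samples the interval must cover
    # (the small epsilon guards against float error in mass * n).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Invented samples with one outlier, for illustration only.
samples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
print(hpd_interval(samples, 0.9))  # (0, 8)
```

Because it looks for the narrowest window rather than trimming equal shares from both ends, the HPD interval follows the bulk of a skewed posterior; here the single outlier is left out instead of stretching the interval toward it.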


Author(s):  
Constanze Weise

Many societies in pre-1800 Africa depended on orality both for communication and for record keeping. Among other ways of dealing with this issue, historians of Africa treat languages as archives and apply what is sometimes called the “words and things” approach. Every language is an archive, in the sense that its words and their meanings have histories. The presence and use of particular words in a language's vocabulary can often be traced back many centuries. They are, in effect, historical artifacts. Their presence in the language in the past, and their meanings at those earlier times, tell us about the things that people knew, made use of, and talked about in past ages, and give us complex insights into the world in which people of past societies lived and operated. But in order to reconstruct word histories, historians first need to determine the relationships and evolution of the languages that possessed those words. The techniques of comparative historical linguistics and language classification allow one to establish a linguistic stratigraphy: to establish the periods in which existing words changed meaning or different words came to be used for particular meanings, to assess what these word histories reveal about changes in a society and its culture, and to identify whether internal innovation or encounters with other societies mediated such changes. The comparative method on its own cannot establish absolute dates of language divergence. It does, however, allow scholars to reconstruct the lexicons of material culture used at each earlier period in the language family tree. These data identify the particular cultural features to look for in the archaeology of people who spoke languages of the family in earlier times, and that evidence in turn enables scholars to propose datable archaeological correlations for the nodes of the family tree.
A second approach to dating a language family tree has been a lexicostatistical technique, often called glottochronology, which estimates how long ago sister languages began to diverge from their common ancestor language on the basis of the proportion of words in the most basic parts of the vocabulary that they still share. Recent work in computational linguistic phylogenetics makes use of elements of lexicostatistics, and there have also been efforts to automate the comparative method. To compare languages historically, two important issues must first be confronted: data acquisition and data analysis. Linguistic field collection of vocabularies from native speakers and work in linguistic archives, especially with dictionaries, are the principal means of data acquisition. The approach and methods of comparative historical linguistics provide the tools for analyzing these linguistic data, both diachronically and synchronically. Nearly all African languages have been classified into four language families: Niger-Congo, Nilo-Saharan, Afroasiatic, and Khoisan. The Malagasy language of Madagascar is an exception: it was brought west across the Indian Ocean to that island from the East Indies early in the first millennium CE. Malagasy and several languages of Indo-European origin, such as Afrikaans, Krio, and Nigerian Pidgin English, are not part of this discussion.
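The glottochronological calculation described above is classically stated as the Swadesh–Lees formula t = ln(c) / (2 ln r), where c is the proportion of shared cognates in the basic list and r is an assumed retention rate per millennium; the factor 2 reflects that both daughter languages lose vocabulary independently. A minimal sketch follows, assuming cognate judgments are already made and using r = 0.86, a value often cited for the 100-item Swadesh list (an assumption here, not a recommendation). The function names and toy lists are hypothetical. Note that the formula presumes a constant retention rate, which is the assumption the Bayesian Turkic and Tungusic studies listed above relax.

```python
import math
from typing import Dict


def shared_proportion(list_a: Dict[str, str], list_b: Dict[str, str]) -> float:
    """Share of meanings (present in both lists) whose forms are cognate.

    Each list maps a meaning to a cognate-class label assigned by
    comparative work; this sketch takes those judgments as given.
    """
    common = [m for m in list_a if m in list_b]
    if not common:
        raise ValueError("the lists share no meanings")
    same = sum(1 for m in common if list_a[m] == list_b[m])
    return same / len(common)


def divergence_time(c: float, r: float = 0.86) -> float:
    """Classic glottochronological estimate t = ln(c) / (2 ln r), in millennia.

    c: proportion of shared cognates; r: assumed retention rate per millennium.
    """
    if not 0 < c <= 1:
        raise ValueError("c must be in (0, 1]")
    if not 0 < r < 1:
        raise ValueError("r must be in (0, 1)")
    return math.log(c) / (2 * math.log(r))


# Invented four-item lists, for illustration only.
a = {"hand": "a", "water": "x", "fire": "p", "stone": "s"}
b = {"hand": "a", "water": "y", "fire": "p", "stone": "s"}
c = shared_proportion(a, b)
print(c)                              # 0.75
print(round(divergence_time(c), 2))   # 0.95 (millennia)
```

Real lists are far longer than four items; with so few meanings a single cognate judgment shifts the estimate by centuries, which is one reason the method is treated with caution.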


2018 ◽  
Vol 8 (1) ◽  
pp. 1-21 ◽  
Author(s):  
Dan Dediu

One of the best-known types of non-independence between languages is caused by genealogical relationships due to descent from a common ancestor. These can be represented by (more or less resolved and controversial) language family trees. In theory, one can argue that language families should be built through the strict application of the comparative method of historical linguistics, but in practice this is not always the case, and there are several proposed classifications of languages into language families, each with its own advantages and disadvantages. A major stumbling block shared by most of them is that they are relatively difficult to use with computational methods, in particular with phylogenetics. This is due to their lack of standardization, coupled with the general unavailability of branch length information, which encapsulates the amount of evolution taking place on the family tree. In this paper I introduce a method (and its implementation in R) that converts the language classifications provided by four widely used databases (Ethnologue, WALS, AUTOTYP and Glottolog) into Newick, the de facto standard generally used in phylogenetics; aligns the four most used conventions for unique identifiers of linguistic entities (ISO 639-3, WALS, AUTOTYP and Glottocode); and adds branch length information from a variety of sources (the tree’s own topology, an externally given numeric constant, or a distance matrix). The R scripts, input data and resulting Newick trees are available under liberal open-source licenses in a GitHub repository (https://github.com/ddediu/lgfam-newick), to encourage and promote the use of phylogenetic methods to investigate linguistic diversity and its temporal dynamics.
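Newick writes a tree as nested parentheses, with an optional ":length" after each node giving the length of the branch leading to it, for example (A:1,B:1)Root; . The conversion described above is implemented in R; the Python sketch below only illustrates the general idea of serialising a nested classification into Newick with a constant branch length, one of the branch-length options listed. The Node class and to_newick function are hypothetical, and the Turkic tree is a toy, heavily simplified classification rather than a result from any paper listed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical in-memory classification node: a name plus subgroups."""
    name: str
    children: List["Node"] = field(default_factory=list)


def _label(name: str) -> str:
    # Unquoted Newick labels cannot contain spaces; replace them with
    # underscores. Other reserved characters ( ) , : ; are not handled here.
    return name.replace(" ", "_")


def to_newick(root: Node, branch_length: float = 1.0) -> str:
    """Serialise a classification tree, giving every branch the same length."""
    def subtree(node: Node) -> str:
        label = _label(node.name)
        if not node.children:
            return label
        inner = ",".join(f"{subtree(c)}:{branch_length:g}" for c in node.children)
        return f"({inner}){label}"
    # The root has no parent branch, so it gets no length.
    return subtree(root) + ";"


# Toy, simplified classification, for illustration only.
tree = Node("Turkic", [
    Node("Oghur", [Node("Chuvash")]),
    Node("Common Turkic", [Node("Turkish"), Node("Kazakh")]),
])
print(to_newick(tree))
# ((Chuvash:1)Oghur:1,(Turkish:1,Kazakh:1)Common_Turkic:1)Turkic;
```

A constant length says nothing about how much change occurred on each branch; it merely makes the topology usable by phylogenetic software that expects lengths.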

