Reflex prediction

Diachronica ◽  
2021 ◽  
Author(s):  
Timotheus A. Bodt¹ ◽  
Johann-Mattis List²

Abstract While analysing lexical data of Western Kho-Bwa languages of the Sino-Tibetan or Trans-Himalayan family with the help of a computer-assisted approach for historical language comparison, we observed gaps in the data where one or more varieties lacked forms for certain concepts. We employed a new workflow, combining manual and automated steps, to predict the most likely phonetic realisations of the missing forms in our data, by making systematic use of the information on sound correspondences in words that were potentially cognate with the missing forms. This procedure yielded a list of hypothetical reflexes of previously identified cognate sets, which we first preregistered as an experiment on the prediction of unattested word forms and then compared with actual word forms elicited during secondary fieldwork. In this study we first describe the workflow which we used to predict hypothetical reflexes and the process of elicitation of actual word forms during fieldwork. We then present the results of our reflex prediction experiment. Based on this experiment, we identify four general benefits of reflex prediction in historical language comparison. These comprise (1) an increased transparency of linguistic research, (2) an increased efficiency of field and source work, (3) an educational aspect which offers teachers and learners a wide plethora of linguistic phenomena, including the regularity of sound change, and (4) the possibility of kindling speakers’ interest in their own linguistic heritage.
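
The core idea of predicting a missing reflex from sound correspondences can be illustrated with a minimal sketch. This is not the authors' workflow: the varieties `A`, `B`, `C`, the word forms, and the helper names `learn_correspondences` and `predict_reflex` are all hypothetical, and the sketch assumes the forms are already aligned segment by segment and ignores phonetic context.

```python
from collections import Counter, defaultdict

# Hypothetical aligned cognate sets: variety -> tuple of segments.
# All varieties and forms are invented for illustration.
COGNATE_SETS = [
    {"A": ("p", "a", "t"), "B": ("b", "a", "t"), "C": ("p", "o", "t")},
    {"A": ("p", "i", "n"), "B": ("b", "i", "n"), "C": ("p", "i", "n")},
    {"A": ("t", "a", "n"), "B": ("t", "a", "n"), "C": ("t", "o", "n")},
    {"A": ("p", "a", "n"), "B": ("b", "a", "n")},  # the form for C is missing
]


def learn_correspondences(cognate_sets, source, target):
    """Count how often each source segment corresponds to each target segment."""
    counts = defaultdict(Counter)
    for cognate_set in cognate_sets:
        if source in cognate_set and target in cognate_set:
            for s, t in zip(cognate_set[source], cognate_set[target]):
                counts[s][t] += 1
    return counts


def predict_reflex(partial_set, cognate_sets, target):
    """Predict the target form position by position from the attested forms."""
    length = len(next(iter(partial_set.values())))
    votes = [Counter() for _ in range(length)]
    for source, form in partial_set.items():
        corr = learn_correspondences(cognate_sets, source, target)
        for i, segment in enumerate(form):
            # Each attested variety votes with its observed correspondences.
            votes[i].update(corr.get(segment, Counter()))
    # "?" marks positions for which no correspondence was observed.
    return "".join(v.most_common(1)[0][0] if v else "?" for v in votes)


print(predict_reflex(COGNATE_SETS[3], COGNATE_SETS, "C"))  # prints "pon"
```

In this toy data, `a` in varieties A and B always corresponds to `o` in C, so the predicted form is `pon`. A prediction of this kind is what can then be compared with the form elicited in the field.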

2019 ◽  
Vol 10 (2018/1) ◽  
Author(s):  
András Zsigmond Albeker

While the stenographic records of the Meiji era have been analyzed in the context of linguistic research into the unification of the spoken and written language (gembun icchi 言文一致), vocabulary and grammar, there is some debate as to the value of these records. This paper aims to clarify what kinds of differences occurred in the process of translating and typing the shorthand symbols into magazines and newspapers. It has become clear that the stenographed speeches published in newspapers and magazines were not faithful reproductions of the original texts. To make it easier for the reader to understand, mistakes were rectified in the transcribing process, and words and word forms were corrected by the stenographer and/or the editor. It seems that, as linguistic material, the value of a stenographic record is higher than that of a shorthand book. However, very few shorthand manuscripts have so far been confirmed and in genre they are closer to stenographed speeches. We can assume that if a shorthand manuscript such as rakugo 落語 or the Imperial Congressional Record were to be discovered, our understanding of the Meiji period Japanese language would be further enhanced.


2021 ◽  
Vol 16 ◽  
pp. 42-48
Author(s):  
Rūta Petrauskaitė ◽  
Virginijus Dadurkevičius

This paper presents a method for updating traditional digitised dictionaries by comparing the dictionary lemmas with a large corpus. The Hunspell platform is used to generate all the word forms from the dictionary lemmas. The 6th edition of The Dictionary of Modern Lithuanian was chosen for comparison with the lexical data from The Joint Corpus of Lithuanian. The comparison yielded two lists of non-overlapping lexis: the dictionary lemmas unused in present-day Lithuanian, and the dictionary gaps, i.e., frequently used words and word forms ignored by the dictionary. The latter is discussed in greater detail to give lexicographers a clue for updates.
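
The comparison described here amounts to two set differences between generated dictionary forms and corpus tokens. The sketch below is only an illustration under stated assumptions: `expand_lemma` is a hypothetical toy stand-in for a morphological generator (it does not call Hunspell), and the suffixes, the `min_freq` threshold, and the English sample words are invented.

```python
from collections import Counter


def expand_lemma(lemma, suffixes):
    """Toy stand-in for a morphological generator: the lemma plus each suffix."""
    return {lemma + suffix for suffix in suffixes}


def compare(lemmas, corpus_tokens, suffixes=("", "s"), min_freq=2):
    """Return (unused dictionary lemmas, frequent corpus words missing from the dictionary)."""
    forms_by_lemma = {lemma: expand_lemma(lemma, suffixes) for lemma in lemmas}
    dictionary_forms = set().union(*forms_by_lemma.values())
    freq = Counter(corpus_tokens)
    # A lemma is unused if none of its generated forms occurs in the corpus.
    unused = sorted(
        lemma for lemma, forms in forms_by_lemma.items()
        if not any(form in freq for form in forms)
    )
    # A gap is a sufficiently frequent corpus word that no lemma generates.
    gaps = sorted(
        word for word, count in freq.items()
        if count >= min_freq and word not in dictionary_forms
    )
    return unused, gaps


tokens = "cats blog blog cat blog selfie".split()
print(compare(["cat", "quill"], tokens))  # prints (['quill'], ['blog'])
```

The frequency threshold keeps rare or accidental tokens (here `selfie`, seen once) out of the list of gaps, so that only frequently used words are flagged.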


Author(s):  
David Fertig

Analogy is traditionally regarded as one of the three main factors responsible for language change, along with sound change and borrowing. Whereas sound change is understood to be phonetically motivated and blind to structural patterns and semantic and functional relationships, analogy is licensed precisely by those patterns and relationships. In the Neogrammarian tradition, analogical change is regarded, at least largely, as a by-product of the normal operation (acquisition, representation, and use) of the mental grammar. Historical linguists commonly use proportional equations of the form A : B = C : X to represent analogical innovations, where A, B, and C are (sets of) word forms known to the innovator, who solves for X by discerning a formal relationship between A and B and then deductively arriving at a form that is related to C in the same way that B is related to A. Along with the core type of analogical change captured by proportional equations, most historical linguists include a number of other phenomena under the analogy umbrella. Some of these, such as paradigm leveling—the reduction or elimination of stem alternations in paradigms—are arguably largely proportional, but others such as contamination and folk etymology seem to have less to do with the normal operation of the mental grammar and instead involve some kind of interference among the mental representations of phonetically or semantically similar forms. The Neogrammarian approach to analogical change has been criticized and challenged on a variety of grounds, and a number of important scholars use the term “analogy” in a rather different sense, to refer to the role that phonological and/or semantic similarity play in the influence that forms exert on each other.
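
A proportional equation A : B = C : X can be made concrete with a small sketch. It covers only one simple case, assuming that the relationship between A and B is the replacement of a word-final string; the function name `solve_proportion` is hypothetical, and real analogical innovations involve far richer formal relationships.

```python
import os


def solve_proportion(a, b, c):
    """Solve a : b = c : x for x using a suffix-replacement rule, or return None."""
    # The shared beginning of a and b is treated as the stem.
    stem = os.path.commonprefix([a, b])
    old_suffix, new_suffix = a[len(stem):], b[len(stem):]
    # The rule only applies if c ends in the material that a loses.
    if not c.endswith(old_suffix):
        return None
    return c[: len(c) - len(old_suffix)] + new_suffix


print(solve_proportion("stone", "stones", "cow"))  # prints "cows"
print(solve_proportion("drive", "drove", "dive"))  # prints "dove"
print(solve_proportion("sing", "sang", "walk"))    # prints None
```

The first call mirrors the textbook proportion stone : stones = cow : X, which yields the regularised plural cows. The last call returns `None` because no formal relationship of this kind links the forms.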


Author(s):  
Koen Bostoen ◽  
Yvonne Bastin

Lexical reconstruction has been an important enterprise in Bantu historical linguistics since the earliest days of the discipline. In this chapter a historical overview is provided of the principal scholarly contributions to that field of study. It is also explained how the Comparative Method has been and can be applied to reconstruct ancestral Bantu vocabulary via the intermediate step of phonological reconstruction and how the study of sound change needs to be completed with diachronic semantics in order to correctly reconstruct both the form and the meaning of etymons. Finally, some issues complicating this type of historical linguistic research, such as “osculance” due to prehistoric language contact, are addressed, as well as the relationship between reconstruction and classification.


2017 ◽  
Vol 45 (3) ◽  
pp. 673-702 ◽  
Author(s):  
Barbara DAVIS ◽  
Suzanne VAN DER FEEST ◽  
Hoyoung YI

Abstract This study investigates whether the earliest words children choose to say are mainly words containing sounds they can produce (cf. ‘phonological dominance’ hypotheses), or whether children choose words without regard to their phonological characteristics (cf. ‘lexical dominance’ hypotheses). Phonological properties of words in spontaneous speech from six children age 0;8 to 2;11 were analyzed by comparing sound distributions of consonant place and manner. Word-initial and word-final consonant patterns in children's Word Targets versus Actual Word Forms were analyzed as a function of vocabulary size. Word-initial results showed more overall evidence for phonological dominance. In word-final position, at lower vocabulary sizes, results showed several differences between Word Targets and Actual Word Forms, consistent with lexical dominance. These findings challenge an ‘either–or’ phonological versus lexical dominance approach, and support consideration of a multifactorial set of influences, including different phonological dimensions and word positions, on the words that young children choose to say.


2018 ◽  
Vol 22 (2) ◽  
pp. 277-306 ◽  
Author(s):  
Johann-Mattis List ◽  
Simon J. Greenhill ◽  
Cormac Anderson ◽  
Thomas Mayer ◽  
Tiago Tresoldi ◽  
...  

Abstract The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change and patterns of conceptualization to linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, to the limits of cross-linguistic colexification studies in general. Building on recent standardization efforts reflected in the Cross-Linguistic Data Formats initiative (CLDF) and novel approaches for fast, efficient, and reliable data aggregation, we have created a new database for cross-linguistic colexifications, which not only supersedes the original CLICS database in terms of coverage but also offers a much more principled procedure for the creation, curation and aggregation of datasets. The paper presents the new database and discusses its major features.
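
A colexification is a case where one word form in a language expresses two or more concepts. The sketch below shows how such patterns could be counted across languages. It is not the CLICS implementation and does not read CLDF data: the `ROWS` wordlist, the language names, and the forms are invented, and forms are compared as exact strings.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical wordlist rows: (language, concept, form). All data are invented.
ROWS = [
    ("lang1", "TREE", "ki"),
    ("lang1", "WOOD", "ki"),
    ("lang1", "FIRE", "hi"),
    ("lang2", "TREE", "bola"),
    ("lang2", "WOOD", "bola"),
    ("lang2", "FIRE", "fu"),
    ("lang3", "TREE", "du"),
    ("lang3", "WOOD", "mok"),
]


def colexifications(rows):
    """Map each concept pair to the number of languages that colexify it."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in rows:
        concepts_by_form[(language, form)].add(concept)
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Every pair of concepts sharing a form in one language is a colexification.
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)
    return {pair: len(languages) for pair, languages in languages_by_pair.items()}


print(colexifications(ROWS))  # prints {('FIRE', 'TREE'): ...} style pairs; here {('TREE', 'WOOD'): 2}
```

Counting languages rather than individual forms means that a pair attested by several synonyms in one language is still counted once for that language.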


1996 ◽  
Vol 20 (2) ◽  
pp. 381-418
Author(s):  
Jean-Paul Metzger ◽  
Seyed Mohammad Mahmoudi

RÉSUMÉ This article concerns the overall design of a morpho-syntactic parser of Persian for automatic indexing. The parser is therefore limited to finding noun phrases (NPs), regarded in an information-retrieval context as the most informative elements for analysing the content of a text. Developing such a parser first requires correct segmentation and categorisation of every lexico-syntactic form. We very briefly present a general overview of natural language processing (NLP) and some characteristics of the Persian language. We then try to offer some general solutions for constructing the rewrite rules needed for the automatic recognition of NPs in Persian. The rewrite rules thus developed are transcribed into a program in the Prolog language. SUMMARY The aim of this paper is the conception and realisation of a morpho-syntactic parser of Persian designed for applications to automatic indexing and computer-assisted instruction of the language (CAT). One of the chief extensions to this research is the automatic processing of natural language by means of artificial intelligence systems. The main interest of this contribution is to study the automatic recognition of noun phrases in Persian. In the case of automatic indexing, the recognition of the noun phrases would allow the apprehension of the content of the document. Automatic indexing, just as manual indexing, consists of selecting in every document the most informative elements which actually are descriptors or noun phrases (NP). The setting up or conception of such a parser demands, primarily, a correct segmentation and categorisation of any lexico-syntactic forms in the corpus. After having established all the transcription rules needed for the recognition of NP, we shall then transcribe every phase of the analysis by a program in Prolog language. All the lexical data necessary for the categorisation of morpho-syntactic forms are presented as clauses of Prolog in a data-base.


2011 ◽  
Vol 1 (1) ◽  
pp. 89-127 ◽  
Author(s):  
Lydia Steiner ◽  
Michael Cysouw ◽  
Peter Stadler

Abstract There are many parallels between historical linguistics and molecular phylogenetics. In this paper we describe an algorithmic pipeline that mimics, as closely as possible, the traditional workflow of language reconstruction known as the comparative method. The pipeline consists of suitably modified algorithms based on recent research in bioinformatics, which are adapted to the specifics of linguistic data. This approach can alleviate much of the laborious research needed to establish proof of historical relationships between languages. Equally important to our proposal is that each step in the workflow of the comparative method is implemented independently, so language specialists have the possibility to scrutinize intermediate results. We have used our pipeline to investigate two groups of languages, the Tsezic languages of the Caucasus and the Mataco-Guaicuruan languages of South America, based on the lexical data from the Intercontinental Dictionary Series (IDS). The results of these tests show that the current approach is a viable and useful extension to historical linguistic research.


2021 ◽  
Author(s):  
Johann-Mattis List ◽  
Robert Forkel ◽  
Simon J. Greenhill ◽  
Christoph Rzymski ◽  
Johannes Englisch ◽  
...  

Abstract The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, the majority of published datasets lack standardization which makes their comparison difficult. Here, we present the first step to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that increase the FAIRness of linguistic data. We test the Lexibank workflow on a collection of 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.

