An algorithm to identify periods of establishment and obsolescence of linguistic items in a diachronic corpus

Corpora ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. 205-236
Author(s):  
Evandro L.T.P. Cunha ◽  
Søren Wichmann

When exploring diachronic corpora, it is often beneficial for linguists to pinpoint not only the first or the last attestation dates of certain linguistic items, but also the moments in which they become more strongly established in the corpus or, conversely, the moments in which they, despite still being part of the language, become obsolete. In this paper, we propose an algorithm to assist the identification of such periods based on the frequency of items in a corpus. Our simple and generalisable algorithm can be used for the investigation of any linguistic item in any corpus which is divided into time-frames. We also demonstrate the applicability of our method using lexical data from the Corpus of Historical American English (COHA), providing case studies on the statistics and characteristics of words that appear in or disappear from this corpus in different periods.
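The abstract does not spell out the rule the algorithm applies, so the following is only a rough sketch of the general idea of locating establishment and obsolescence periods from per-time-frame frequencies. The threshold rule, the `run` parameter, the function names and the sample frequencies are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Optional, Sequence


def establishment_index(
    freqs: Sequence[float], threshold: float, run: int = 2
) -> Optional[int]:
    """Index of the first time-frame that opens `run` consecutive frames
    whose frequency is at or above `threshold` (assumed rule, not the paper's)."""
    for start in range(len(freqs) - run + 1):
        if all(f >= threshold for f in freqs[start:start + run]):
            return start
    return None


def obsolescence_index(
    freqs: Sequence[float], threshold: float, run: int = 2
) -> Optional[int]:
    """Index of the frame following the last frame at or above `threshold`,
    provided at least `run` low-frequency frames follow (assumed rule)."""
    last_strong = max(
        (i for i, f in enumerate(freqs) if f >= threshold), default=None
    )
    if last_strong is None:
        return None
    start = last_strong + 1
    return start if len(freqs) - start >= run else None


# Hypothetical frequencies per million words for one item, one value per decade.
sample = [0.0, 0.1, 0.0, 2.0, 3.5, 4.0, 1.2, 0.3, 0.2, 0.0]
established = establishment_index(sample, threshold=1.0)
obsolete = obsolescence_index(sample, threshold=1.0)
```

Under these assumptions the item counts as established from the fourth time-frame (index 3) and as obsolete from the eighth (index 7), even though it is still attested afterwards.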

This chapter presents an in-depth analysis of case studies in which each linguistic item used in the pre-test and post-tests is noted. Although the participants in the case studies did not show improved accuracy after receiving written corrective feedback (CF), the errors that had received written CF did not recur in the post-tests, and the errors that did appear in the post-tests bore no relation to the written CF. Proficiency level was not found to have an impact on the effect of written CF, but participants with a lower proficiency level required more written CF assistance.


Phonology ◽  
2021 ◽  
Vol 38 (2) ◽  
pp. 173-202
Author(s):  
Karthik Durvasula ◽  
Mohammed Qasem Ruthan ◽  
Sarah Heidenreich ◽  
Yen-Hwei Lin

Previous research has found that different syllabic (particularly simplex vs. complex onset) organisations have different temporal stability signatures in articulations – this observation is based entirely on articulatory measurements. In this article, we present the results of three production experiments which show that similar correlations between onset organisation and temporal stability metrics are observable in an analysis of acoustic measurements in American English and Jazani Arabic. The results that we present show stability across speakers and test items for both language groups, and highlight the possibility of using acoustic techniques to help to investigate the organisation of onsets in other languages.


Author(s):  
Judit Szitó

The study examined two YouTube Accent Tag videos to reveal how pairs of British and American speakers reacted to each other’s and their own varieties as they pronounced words from a list and answered several questions by offering their lexical choices. Accent Tag videos represent a novel way for lay people to be involved in science by offering their language varieties and opinions, accumulating data in unprecedented numbers in the history of dialectology and also creating a rich source for various types of linguistic inquiry. The results showed a marked difference between the ways in which standard-accented British and American speakers evaluated both their own and their interlocutor’s speech. The two RP-accented British speakers were more prone to criticise the speech of the two mainstream-accented (General American, GA) American speakers, but received no criticism from their interlocutors. Further, neither of the British speakers disparaged their own speech, while one American speaker did. The study also identified some disfavoured features of American English, mainly phonetic differences in comparison with RP, including the lower unrounded LOT-vowel [ɑ], T-flapping, and the flat BATH-vowel [æ] (Wells 1982). The findings of these case studies support the hypothesis that in the game of “American vs English”, RP-accented British English is generally rated higher than mainstream American (or GA) by both groups of speakers.


2018 ◽  
Vol 8 (1) ◽  
pp. 78-107 ◽  
Author(s):  
Igor Yanovich

Lexical datasets used for computational phylogenetic inference suffer a unique type of data error. Some words actually present in a language may be absent from the dataset through no fault of its curators: especially for lesser-studied languages, a word may be missing from all available sources such as dictionaries. It is thus important to be able to (i) check how robust one’s inferences are to dictionary omission errors, and (ii) incorporate the knowledge that such errors may be present into one’s inference. I introduce two simple techniques that work towards those goals, and study the possible effects of dictionary omission errors in two real-life case studies on the Lezgian and Uralic datasets from Kassian (2015) and Syrjänen et al. (2013), respectively. The effects of dictionary omission turn out to be moderate (Lezgian) to negligible (Uralic), and certainly far less significant than the possible effects of modeling choices, including priors, on the inferred phylogeny, as demonstrated in the Uralic case study. Assessing the possible effects of dictionary omissions is advisable, but severe problems are unlikely. Collecting significantly larger lexical datasets, in order to overcome sensitivity to priors, is likely more important than expending resources on verifying data against dictionary omissions.
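The abstract does not describe the two techniques themselves. As a loose illustration of what a robustness check against dictionary omissions could look like, the sketch below randomly removes attested entries from a binary presence/absence table and compares a simple distance before and after. The perturbation scheme, the Hamming distance as the statistic, the function names and the toy data are all assumptions made here, not the author's method or data.

```python
import random
from typing import Dict, List

Table = Dict[str, List[int]]  # language -> 1 (word attested) / 0 (not attested)


def simulate_omissions(data: Table, rate: float, seed: int = 0) -> Table:
    """Copy `data`, turning each attested entry (1) into 0 with probability
    `rate`. Entries that are already 0 are never changed."""
    rng = random.Random(seed)
    return {
        lang: [0 if v == 1 and rng.random() < rate else v for v in row]
        for lang, row in data.items()
    }


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equally long rows differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy table: three languages, six cognate classes.
toy: Table = {
    "lang_a": [1, 1, 0, 1, 0, 1],
    "lang_b": [1, 0, 0, 1, 1, 1],
    "lang_c": [0, 1, 1, 0, 1, 0],
}

baseline = hamming(toy["lang_a"], toy["lang_b"])
perturbed = simulate_omissions(toy, rate=0.1, seed=42)
shifted = hamming(perturbed["lang_a"], perturbed["lang_b"])
```

Repeating the perturbation over many seeds and comparing `shifted` with `baseline` gives a rough sense of how sensitive a distance-based summary is to omissions. A real study would rerun the full phylogenetic inference rather than a single distance.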


1997 ◽  
Vol 6 (3) ◽  
pp. 45-56 ◽  
Author(s):  
Karla K. McGregor ◽  
Danielle Williams ◽  
Sarah Hearst ◽  
Amy C. Johnson

Contrastive analysis aids the identification of true speech-language errors in cases where there is a mismatch between the linguistic communities of the clinician and the client. This tutorial illustrates the procedure via three case studies of preschoolers who speak African American English (AAE). In these case studies, there was good agreement between the results of contrastive analysis and the results of more well-established comparison metrics, suggesting that contrastive analysis can yield valid profiles that aid in distinguishing difference from disorder in children who speak a nonstandard dialect.


2003 ◽  
Vol 17 (3) ◽  
pp. 179-205 ◽  
Author(s):  
Gerard Saucier

A scientific taxonomy of human personality attributes should optimally be based on studies from multiple languages and cultures. Study 1 demonstrates convergence between seven‐factor structures found in previous studies of Filipino and Hebrew languages. Study 2 shows that this ‘Multi‐Language Seven’ (ML7) factor model overlaps partially with the Big Five model, but includes four rather than three affective–interpersonal factors, replicates in American English lexical data nearly as well as the Big Five, and has close correspondences to the structure upon which two Italian lexical studies have converged. Correlates were used to clarify interpretation of ML7 factors labelled Gregariousness, Self‐Assurance, Even Temper ‘versus Temperamentalness’, Concern for Others, Conscientiousness, Originality/Virtuosity, and Negative Valence ‘or Social Unacceptability’. These studies indicate the viability of a lexically derived ‘etic’ alternative to the Big Five. Copyright © 2003 John Wiley & Sons, Ltd.

