The acquisition of gender marking by young German-speaking children: Evidence for learning guided by phonological regularities

2007 ◽  
Vol 34 (3) ◽  
pp. 445-471 ◽  
Author(s):  
GISELA SZAGUN ◽  
BARBARA STUMPER ◽  
NINA SONDAG ◽  
MELANIE FRANIK

ABSTRACT The acquisition of noun gender on articles was studied in a sample of 21 young German-speaking children, using longitudinal spontaneous speech data. The analysis is based on 22 two-hour speech samples per child from 6 children between 1;4 and 3;8, and on 5 two-hour speech samples per child from 15 children between 1;4 and 2;10. Gender-marked articles were used from 1;5, and error frequencies dropped below 10% by 3;0. Definite and indefinite articles were used with similar frequencies, and error rates did not differ between the two paradigms. Children's errors were systematic: for monosyllabic nouns and for polysyllabic nouns ending in -el, -en, and -er, errors were more frequent for nouns that did not conform to the rule that such nouns tend to be masculine. Furthermore, children erred in the direction of the rule, overgeneralizing der. Correct gender marking was also associated with adult frequency of noun use. The present data are evidence for the early use of phonological regularities of noun structure in the acquisition of gender marking.
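The phonological regularity the abstract describes can be made concrete with a toy rule checker. This is purely illustrative and not from the study; the syllable count is a rough vowel-group heuristic, and the example nouns are our own.

```python
# Toy illustration of the regularity the children appear to exploit:
# monosyllabic nouns and polysyllabic nouns ending in -el, -en, -er
# tend to be masculine (der). Heuristic only, not the study's method.
def predicted_masculine(noun: str) -> bool:
    """True if the noun matches the 'tends to be masculine' pattern."""
    vowels = "aeiouäöüy"
    n_vowel_groups = 0
    prev_was_vowel = False
    for ch in noun.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_was_vowel:
            n_vowel_groups += 1
        prev_was_vowel = is_vowel
    monosyllabic = n_vowel_groups == 1
    return monosyllabic or noun.lower().endswith(("el", "en", "er"))

# Regular cases: der Hund (monosyllabic), der Vogel (-el), der Garten (-en)
# Exception to the rule: die Blume is feminine but matches neither pattern
```

On this account, a child who overgeneralizes der to a non-conforming noun is following the pattern rather than producing a random error, which is the direction of errors the study reports.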

2021 ◽  
Author(s):  
Thomas Soroski ◽  
Thiago da Cunha Vasco ◽  
Sally Newton-Mason ◽  
Saffrin Granby ◽  
Caitlin Lewis ◽  
...  

BACKGROUND Speech data for medical research can be collected non-invasively and in large volumes, and speech analysis has shown promise in diagnosing neurodegenerative disease. To leverage speech data effectively, transcription is important, as lexical content carries valuable information. Manual transcription, while highly accurate, limits the scalability and cost savings associated with language-based screening. OBJECTIVE To better understand the use of automatic transcription for classification of neurodegenerative disease (Alzheimer’s disease [AD], mild cognitive impairment [MCI], or subjective memory complaints [SMC] versus healthy controls), we compared automatically generated transcripts against transcripts that underwent manual correction. METHODS We recruited individuals from a memory clinic (“patients”) with a diagnosis of mild-to-moderate AD (n=44), MCI (n=20), or SMC (n=8), and healthy controls living in the community (n=77). Participants were asked to describe a standardized picture, read a paragraph, and recall a pleasant life experience. We compared transcripts generated using Google speech-to-text software against manually verified transcripts by examining transcription confidence scores, transcription error rates, and machine learning classification accuracy. For the classification tasks, logistic regression, Gaussian naive Bayes, and random forests were used. RESULTS The transcription software showed higher confidence scores (P<.001) and lower error rates (P>.05) for speech from healthy controls than for speech from patients. Classification models using human-verified transcripts significantly (P<.001) outperformed models using automatically generated transcripts for both spontaneous speech tasks; for the reading task there was no difference. Manually adding pauses to transcripts had no impact on classification performance, whereas manually correcting the transcripts of both spontaneous speech tasks led to significantly better model performance. CONCLUSIONS We found that automatically transcribed speech data could be used to distinguish patients with a diagnosis of AD, MCI, or SMC from controls. We recommend a human verification step to improve the performance of automatic transcripts, especially for spontaneous tasks. Human verification can focus on correcting errors and adding punctuation; manual addition of pauses is not needed, which simplifies the verification step and allows large volumes of speech data to be processed more efficiently.
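The comparison pipeline the abstract describes can be sketched with the three model families it names. This is a hedged sketch under our own assumptions, not the study's code: the tiny transcripts, labels, and TF-IDF featurization below are invented for illustration (the study does not state its feature representation).

```python
# Hedged sketch of the classification comparison: TF-IDF features from
# transcripts, scored with the three classifier families named in the
# abstract. All transcripts and labels below are fabricated.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

transcripts = [
    "the boy reaches for the cookie jar while the sink overflows",
    "the mother dries dishes and the water runs over the counter",
    "two children take cookies from the jar behind the mother",
    "the girl asks her brother for a cookie from the shelf",
    "the window shows the garden while the mother washes dishes",
    "the stool tips as the boy grabs the cookie jar",
    "um the the boy is um taking something I think",
    "there is a a lady and um I forget what she does",
    "the um thing is falling I do not know the word",
    "she is um doing something with the the dishes maybe",
    "I see a boy and um and a a jar up there",
    "um the water the water is um going somewhere",
]
labels = np.array([0] * 6 + [1] * 6)  # 0 = control, 1 = patient (invented)

def mean_cv_accuracy(texts, y):
    """3-fold cross-validated accuracy for each classifier family."""
    X = TfidfVectorizer().fit_transform(texts).toarray()
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gaussian_nb": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {name: cross_val_score(m, X, y, cv=3).mean() for name, m in models.items()}

scores = mean_cv_accuracy(transcripts, labels)
```

Running the same evaluation once on manually verified and once on automatically generated transcripts of the same recordings would reproduce the comparison reported in the abstract.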


2018 ◽  
Vol 39 (1) ◽  
pp. 61-79 ◽  
Author(s):  
Gisela Szagun ◽  
Satyam A. Schramm

This study examines the role of the lexicon and grammatical structure building in early grammar. Parent-report data in CDI format from a sample of 1151 German-speaking children between 1;6 and 2;6 and longitudinal spontaneous speech data from 22 children between 1;8 and 2;5 were used. Regression analysis of the parent-report data indicates that grammatical words have a stronger influence on concurrent syntactic complexity than lexical words. Time-lagged correlations using the spontaneous speech data showed that lexical words at 1;8 predict subsequent MLU at 2;1 significantly; grammatical words do not. MLU at 2;5 is significantly predicted by grammatical words and no longer by lexical words. The influence of different grammatical subcategories on subsequent MLU varies. Use of articles and the copula at 2;1 most strongly predicts MLU at 2;5. Children use both types of articles and multiple determiners before a noun to the same extent as adults. The present results are suggestive of early grammatical structure building.
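The time-lagged correlation logic above (earlier vocabulary measures predicting later MLU) can be sketched directly. All numbers below are fabricated for illustration; the study used CDI parent-report and spontaneous-speech measures.

```python
# Illustrative sketch of a time-lagged correlation: do word counts at an
# earlier age predict MLU at a later age? Data are invented.
import numpy as np
from scipy.stats import pearsonr

lexical_words_1_8 = np.array([40, 55, 25, 80, 60, 35, 70, 45, 90, 30])
grammatical_words_1_8 = np.array([5, 8, 2, 12, 9, 4, 10, 6, 15, 3])
mlu_2_1 = np.array([1.4, 1.8, 1.2, 2.3, 1.9, 1.3, 2.1, 1.6, 2.6, 1.2])

r_lex, p_lex = pearsonr(lexical_words_1_8, mlu_2_1)
r_gram, p_gram = pearsonr(grammatical_words_1_8, mlu_2_1)
```

With real longitudinal data, comparing such lagged correlations at different ages (1;8 → 2;1 versus 2;1 → 2;5) is what reveals the shift the study reports, from lexical to grammatical words as the stronger predictor.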


2008 ◽  
Vol 155 ◽  
pp. 23-52
Author(s):  
Elma Nap-Kolhoff ◽  
Peter Broeder

Abstract This study compares pronominal possessive constructions in Dutch first language (L1) acquisition, second language (L2) acquisition by young children, and untutored L2 acquisition by adults. The L2 learners all have Turkish as their L1. In longitudinal spontaneous speech data from four L1 learners, seven child L2 learners, and two adult learners, remarkable differences and similarities between the three learner groups were found. In some respects, the child L2 learners develop in a way that is similar to child L1 learners, for instance in the kind of overgeneralisations that they make. However, the child L2 learners also behave like adult L2 learners, namely in the pace of the acquisition process, the frequency and persistence of non-target constructions, and the difficulty of acquiring reduced pronouns. The similarities between the child and adult L2 learners are remarkable, because the child L2 learners were only two years old when they started learning Dutch, and L2 acquisition before the age of three is often considered to be similar to L1 acquisition. The findings might be attributable to the relatively small amount of Dutch language input the L2 children received.


2020 ◽  
Vol 8 (2) ◽  
pp. 117-141
Author(s):  
Alberto Rodríguez Márquez

The objective of this paper is to describe the prosodic features of the final intonation contour of minor intonational phrases (ip) and the tonemes of major intonational phrases (IP) in the Spanish variety of Mexico City. The speech data were taken from a spontaneous speech corpus of speakers from two social networks: neighborhood and labor. Final intonation contours of ip show a predominantly rising movement. These contours are generally produced with greater duration in the last syllable of the ip, which represents the most significant difference between the two networks in the case of oxytone endings. Tonemes, on the other hand, are predominantly falling, although the circumflex accent accounts for a substantial number of cases within the data set. Tonemes produced by the neighborhood network have greater duration than those produced by the labor network.


ReCALL ◽  
2004 ◽  
Vol 16 (1) ◽  
pp. 173-188 ◽  
Author(s):  
YASUSHI TSUBOTA ◽  
MASATAKE DANTSUJI ◽  
TATSUYA KAWAHARA

We have developed an English pronunciation learning system which estimates the intelligibility of Japanese learners' speech and ranks their errors from the viewpoint of improving their intelligibility to native speakers. Error diagnosis is particularly important in self-study since students tend to spend time on aspects of pronunciation that do not noticeably affect intelligibility. As a preliminary experiment, the speech of seven Japanese students was scored from 1 (hardly intelligible) to 5 (perfectly intelligible) by a linguistic expert. We also computed their error rates for each skill. We found that each intelligibility level is characterized by its distribution of error rates. Thus, we modeled each intelligibility level in accordance with its error rate. Error priority was calculated by comparing students' error rate distributions with that of the corresponding model for each intelligibility level. As non-native speech is acoustically broader than the speech of native speakers, we developed an acoustic model to perform automatic error detection using speech data obtained from Japanese students. As for supra-segmental error detection, we categorized errors frequently made by Japanese students and developed a separate acoustic model for that type of error detection. Pronunciation learning using this system involves two phases. In the first phase, students experience virtual conversation through video clips. They receive an error profile based on pronunciation errors detected during the conversation. Using the profile, students are able to grasp characteristic tendencies in their pronunciation errors which in effect lower their intelligibility. In the second phase, students practise correcting their individual errors using words and short phrases. They then receive information regarding the errors detected during this round of practice and instructions for correcting the errors. We have begun using this system in a CALL class at Kyoto University. 
We have evaluated system performance through the use of questionnaires and analysis of speech data logged in the server, and will present our findings in this paper.
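The error-prioritisation idea described above (comparing a student's error-rate distribution against the model for their intelligibility level) can be sketched as a simple excess-rate ranking. This is our own hypothetical reading of the mechanism, not the system's actual algorithm, and all rates below are invented.

```python
# Hypothetical sketch: rank a student's pronunciation error types by how far
# their rates exceed the typical rates for the student's estimated
# intelligibility level. Error categories and all rates are invented.
reference_level_3 = {  # mean error rate per skill at intelligibility level 3
    "r_l_confusion": 0.30,
    "vowel_length": 0.20,
    "final_consonant": 0.10,
}
student = {"r_l_confusion": 0.55, "vowel_length": 0.22, "final_consonant": 0.40}

def prioritise(student_rates, model_rates):
    """Return error types sorted by excess over the level model, worst first."""
    excess = {k: student_rates[k] - model_rates.get(k, 0.0) for k in student_rates}
    return sorted(excess, key=excess.get, reverse=True)

priority = prioritise(student, reference_level_3)
```

Here the student would be directed to work on final consonants first, since that error type departs furthest from what is typical at their level, which mirrors the system's goal of spending practice time where intelligibility gains are largest.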


Author(s):  
Kate Broome ◽  
Patricia McCabe ◽  
Kimberley Docking ◽  
Maree Doble ◽  
Bronwyn Carrigg

Purpose This study aimed to provide detailed descriptive information about the speech of a heterogeneous cohort of children with autism spectrum disorder (ASD) and to explore whether subgroups exist based on this detailed speech data. High rates of delayed and disordered speech in both low-verbal and high-functioning children with ASD have been reported. There is limited information regarding the speech abilities of young children across a range of functional levels. Method Participants were 23 children aged 2;0–6;11 (years;months) with a diagnosis of ASD. Comprehensive speech and language assessments were administered. Independent and relational speech analyses were conducted from single-word naming tasks and spontaneous speech samples. Hierarchical clustering based on language, nonverbal communication, and spontaneous speech descriptive data was completed. Results Independent and relational speech analyses are reported. These variables are used in the cluster analyses, which identified three distinct subgroups: (a) children with high language and high speech ability (n = 10), (b) children with low expressive language and low speech ability but higher receptive language and use of gestures (n = 3), and (c) children with low language and low speech development (n = 10). Conclusions This is the first study to provide detailed descriptive speech data of a heterogeneous cohort of children with ASD and use this information to statistically explore potential subgroups. Clustering suggests a small number of children present with low levels of speech and expressive language in the presence of better receptive language and gestures. This communication profile warrants further exploration. Replicating these findings with a larger cohort of children is needed. Supplemental Material https://doi.org/10.23641/asha.16906978
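The subgrouping approach can be sketched with standard hierarchical clustering. This is a minimal illustration under our own assumptions, not the study's analysis: the two-dimensional feature matrix (language and speech scores per child) and the linkage method are invented.

```python
# Sketch of hierarchical clustering on per-child summary measures.
# The feature matrix [language_score, speech_score] is fabricated.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

features = np.array([
    [90, 85],   # high language, high speech
    [88, 80],
    [30, 25],   # low language, low speech
    [28, 20],
    [35, 70],   # low expressive language, but a distinct profile
])

Z = linkage(features, method="ward")          # agglomerative, Ward's criterion
groups = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
```

Cutting the dendrogram at three clusters mirrors the three subgroups reported; with real data, inspecting cluster profiles (receptive language, gestures) is what gives the subgroups their interpretation.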


2018 ◽  
Vol 28 (8) ◽  
pp. 2418-2438
Author(s):  
Xi Shen ◽  
Chang-Xing Ma ◽  
Kam C Yuen ◽  
Guo-Liang Tian

Bilateral correlated data are often encountered in medical research, such as ophthalmologic (or otolaryngologic) studies, in which each unit contributes information from paired organs to the data analysis, and the measurements from such paired organs are generally highly correlated. Various statistical methods have been developed to account for intra-class correlation in bilateral correlated data analysis. In practice, it is very important to adjust for confounding effects in statistical inference, since ignoring either the intra-class correlation or the confounding effect may lead to biased results. In this article, we propose three approaches for testing a common risk difference for stratified bilateral correlated data under the assumption of equal correlation. Five confidence intervals for the common difference of two proportions are derived. The performance of the proposed test methods and confidence interval estimations is evaluated by Monte Carlo simulations. The simulation results show that the score test statistic outperforms the other statistics, in the sense that it has robust type I error rates with high power. The score confidence interval induced from the score test statistic performs satisfactorily in terms of coverage probabilities with reasonable interval widths. A real data set from an otolaryngologic study is used to illustrate the proposed methodologies.
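The Monte Carlo evaluation of a test's type I error rate can be sketched in a simplified setting. This is not the paper's stratified bilateral method: as an assumption for illustration, we use the classical pooled (score) z-test for equality of two independent proportions, without the intra-class correlation structure the paper handles.

```python
# Minimal Monte Carlo sketch: estimate the empirical type I error of the
# pooled (score) z-test for two independent proportions under H0: p1 = p2.
# This simplification drops the bilateral correlation structure.
import numpy as np

rng = np.random.default_rng(0)

def score_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference of two proportions."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def empirical_type1_error(p=0.3, n=100, reps=20000, alpha=0.05):
    x1 = rng.binomial(n, p, reps)   # both groups simulated under H0
    x2 = rng.binomial(n, p, reps)
    z = score_z(x1, n, x2, n)
    return float(np.mean(np.abs(z) > 1.959964))  # two-sided 5% cutoff

err = empirical_type1_error()
```

A test with robust type I error, in the paper's sense, keeps this empirical rejection rate close to the nominal alpha across sample sizes and parameter settings; repeating the simulation under alternatives gives the power comparison.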


Author(s):  
Rajiv Rao

Abstract Recent literature on Spanish intonation assumes that deaccenting occurs when a lexical item fails to cue stress via an F0 rise or some other pitch movement through its stressed syllable. Inspired by the findings and suggestions for future research by Face (2003), the present study fills research gaps by examining seven potential influences on deaccenting, working with spontaneous speech, and addressing the understudied Barcelona dialect of Spanish. The analysis of 160-170 minutes of spontaneous speech data collected at the Universitat Autònoma de Barcelona reveals that the odds of deaccenting increase in words that are high frequency in Spanish, have fewer syllables, are verbs or adverbs, are uttered multiple times within a recent timeframe, or are in initial or medial positions of the phonological phrase. Finally, high frequency verbs and adverbs, as well as adverbs, nouns, and verbs with fewer syllables, are all especially prone to deaccenting.
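Results phrased as "the odds of deaccenting increase with X" read naturally as a logistic-regression-style model. The sketch below shows how such predictors shift the probability of deaccenting; the coefficients are invented for illustration and are not estimates from the study.

```python
# Logistic-model sketch of deaccenting odds: higher word frequency raises
# the log-odds, more syllables lowers it. Coefficients are invented.
import math

def deaccent_probability(log_freq, n_syllables,
                         b0=-1.0, b_freq=0.5, b_syll=-0.4):
    """Probability of deaccenting from a toy logistic model."""
    logit = b0 + b_freq * log_freq + b_syll * n_syllables
    return 1 / (1 + math.exp(-logit))

# A frequent monosyllabic word vs. a rare four-syllable word
p_short_frequent = deaccent_probability(log_freq=3.0, n_syllables=1)
p_long_rare = deaccent_probability(log_freq=0.5, n_syllables=4)
```

Under these toy coefficients, the frequent short word is the more likely deaccenting target, matching the direction of the effects the study reports.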


2002 ◽  
Vol 28 (4) ◽  
pp. 515-530 ◽  
Author(s):  
Rachel A. Smith ◽  
Timothy R. Levine ◽  
Kenneth A. Lachlan ◽  
Thomas A. Fediuk

NeuroImage ◽  
2015 ◽  
Vol 123 ◽  
pp. 102-113 ◽  
Author(s):  
Hakmook Kang ◽  
Jeffrey Blume ◽  
Hernando Ombao ◽  
David Badre
