Describing Vocalizations in Young Children: A Big Data Approach Through Citizen Science Annotation

Author(s):  
Chiara Semenzin ◽  
Lisa Hamrick ◽  
Amanda Seidl ◽  
Bridgette L. Kelleher ◽  
Alejandrina Cristia

Purpose Recording young children's vocalizations through wearables is a promising method to assess language development. However, accurately and rapidly annotating these files remains challenging. Online crowdsourcing with the collaboration of citizen scientists could be a feasible solution. In this article, we assess the extent to which citizen scientists' annotations align with those gathered in the lab for recordings collected from young children.

Method Segments identified by Language ENvironment Analysis as produced by the key child were extracted from one daylong recording for each of 20 participants: 10 low-risk control children and 10 children diagnosed with Angelman syndrome, a neurogenetic syndrome characterized by severe language impairments. Speech samples were annotated by trained annotators in the laboratory as well as by citizen scientists on Zooniverse. All annotators assigned one of five labels to each sample: Canonical, Noncanonical, Crying, Laughing, and Junk. This allowed the derivation of two child-level vocalization metrics: the Linguistic Proportion and the Canonical Proportion.

Results At the segment level, Zooniverse classifications had moderate precision and recall. More importantly, the Linguistic Proportion and the Canonical Proportion derived from Zooniverse annotations were highly correlated with those derived from laboratory annotations.

Conclusions Annotations obtained through a citizen science platform can help us overcome challenges posed by the process of annotating daylong speech recordings. Particularly when used in composites or derived metrics, such annotations can be used to investigate early markers of language delays.
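Read literally, the two derived metrics are simple ratios over the five labels. A minimal sketch in Python, assuming the Linguistic Proportion is (Canonical + Noncanonical) over all non-Junk vocalizations and the Canonical Proportion is Canonical over (Canonical + Noncanonical); the abstract does not spell out the formulas, so these definitions are an assumption here:

```python
from collections import Counter

def vocalization_metrics(labels):
    """Derive child-level metrics from per-segment labels.

    labels: iterable of strings drawn from
    {"Canonical", "Noncanonical", "Crying", "Laughing", "Junk"}.
    Junk segments are excluded from both denominators.
    """
    counts = Counter(labels)
    vocal = (counts["Canonical"] + counts["Noncanonical"]
             + counts["Crying"] + counts["Laughing"])
    speechlike = counts["Canonical"] + counts["Noncanonical"]
    linguistic_proportion = speechlike / vocal if vocal else 0.0
    canonical_proportion = counts["Canonical"] / speechlike if speechlike else 0.0
    return linguistic_proportion, canonical_proportion

lp, cp = vocalization_metrics(
    ["Canonical", "Noncanonical", "Noncanonical", "Crying", "Junk"]
)
# lp = 3/4 = 0.75; cp = 1/3
```

Because both proportions are ratios, individual segment-level disagreements between annotators can wash out, which is consistent with the abstract's finding of high child-level correlations despite only moderate segment-level precision and recall.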



Author(s):  
Aleah S. Brock ◽  
Sandie M. Bass-Ringdahl

Purpose This research note reports preliminary data from an investigation of facilitative language techniques (FLTs) used in the natural environment by caregivers of children who are deaf or hard of hearing (DHH). The investigation seeks to establish a new method to collect and analyze data on caregiver FLT use in the home.

Method This pilot investigation included two children under the age of 36 months with moderate-to-profound sensorineural hearing loss. Both children were consistent users of hearing devices and were pursuing oral communication. Data were collected via the Language ENvironment Analysis (LENA) system in the participants' homes. Thirty-six 5-min segments containing the highest adult word count were extracted from each participant's sample. Researchers coded segments for the presence or absence of 10 FLTs within 30-s intervals.

Results The collection, coding, and analysis of caregiver FLTs using LENA was a feasible method to investigate caregiver linguistic input in the natural environment. Despite differences in age, sex, and hearing level, the distribution of caregiver FLTs was similar for both participants. Caregivers used high levels of narration, closed-ended questions, and directives throughout the day.

Conclusions Results of this investigation provide information about the types of FLTs that are used in the home by caregivers of young children who are DHH. Furthermore, results indicate the feasibility of this method to investigate in-home use of caregiver FLTs.
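The interval-coding step described above can be sketched as follows. The event representation, technique names, and timing values are hypothetical, since the abstract does not specify a data format; the sketch only illustrates marking presence/absence of each FLT within 30-s intervals of a 5-min segment:

```python
def code_intervals(events, segment_len=300, interval=30):
    """Mark presence/absence of each technique per 30-s interval.

    events: list of (time_in_seconds, technique_name) pairs within one
    5-min (300-s) segment. Returns {technique: [bool per interval]},
    i.e., 10 intervals per segment with the defaults.
    """
    n = segment_len // interval
    coded = {}
    for t, tech in events:
        idx = min(int(t // interval), n - 1)  # clamp edge cases to last bin
        coded.setdefault(tech, [False] * n)[idx] = True
    return coded

coded = code_intervals([(12, "narration"), (45, "directive"), (50, "narration")])
# "narration" is present in intervals 0 and 1; "directive" only in interval 1
```

Presence/absence coding like this deliberately discards within-interval frequency, which trades precision for faster, more reliable human coding.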


2019 ◽  
Vol 146 (4) ◽  
pp. 2956-2956 ◽  
Author(s):  
Jaie C. Woodard ◽  
Nikaela Losievski ◽  
Meisam K. Arjmandi ◽  
Matthew Lehet ◽  
Yuanyuan Wang ◽  
...  

2020 ◽  
Vol 51 (3) ◽  
pp. 706-719 ◽  
Author(s):  
Anne L. Larson ◽  
Tyson S. Barrett ◽  
Scott R. McConnell

Purpose This study was conducted in a large Midwestern metropolitan area to examine the language environments at home and in center-based childcare for young children who are living in poverty. We compared child language use and exposure in the home and childcare settings using extended observations with automated Language Environment Analysis to gain a deeper understanding of the environmental factors that may affect change in language outcomes for young children.

Method Thirty-eight children, along with parents (n = 38) and childcare providers (n = 14) across five childcare centers, participated in this study. Each child completed a standardized language assessment and two daylong recordings with Language Environment Analysis to determine the number of adult words, conversational turns, and child vocalizations that occurred in each setting. Data were analyzed at 5-min intervals across each recording.

Results Comparisons between home recordings in this sample and a comparison group showed reliably higher rates of adult words and conversational turns in the home setting. Linear mixed-effects regression models showed significant differences in the child language environments, with the home setting providing higher levels of language input and use. These effects were still meaningful after accounting for the time of day, participant demographic characteristics, and child language ability.

Conclusions Practical implications for supporting child language development across settings are discussed, and suggestions for further research are provided.

Supplemental Material https://doi.org/10.23641/asha.12042678
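Analyzing data "at 5-min intervals across each recording" implies bucketing the LENA counts into 300-s bins. A schematic sketch, assuming events arrive as (time-in-seconds, measure, value) triples — a representation invented here for illustration, not the study's actual pipeline:

```python
from collections import defaultdict

def counts_per_interval(events, interval=300):
    """Aggregate LENA-style counts into 5-min bins.

    events: list of (time_in_seconds, measure, value) triples, where
    measure is e.g. "AWC" (adult word count), "CTC" (conversational
    turn count), or "CVC" (child vocalization count).
    Returns {bin_index: {measure: summed value}}.
    """
    bins = defaultdict(lambda: defaultdict(int))
    for t, measure, value in events:
        bins[int(t // interval)][measure] += value
    return bins

bins = counts_per_interval([(10, "AWC", 52), (200, "AWC", 31), (310, "CTC", 4)])
# bin 0: AWC = 83; bin 1: CTC = 4
```

Binning at a fixed interval gives every child comparable repeated measures across the day, which is what makes the mixed-effects models described in the Results section applicable.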


2018 ◽  
Vol 27 (3) ◽  
pp. 1066-1072
Author(s):  
Shelley L. Bredin-Oja ◽  
Heather Fielding ◽  
Kandace K. Fleming ◽  
Steven F. Warren

Purpose The purpose of this study was to investigate the reliability of an automated language analysis system, the Language ENvironment Analysis (LENA), compared with a human transcriber to determine the rate of child vocalizations during recording sessions that were significantly shorter than recommended for the automated device.

Method Participants were 6 nonverbal male children between the ages of 28 and 46 months. Two children had autism diagnoses, 2 had Down syndrome, 1 had a chromosomal deletion, and 1 had developmental delay. Participants were recorded by the LENA digital language processor during 14 play-based interactions with a responsive adult. Rate of child vocalizations during each of the 84 recordings was determined by both a human transcriber and the LENA software.

Results A statistically significant difference between the 2 methods was observed for 4 of the 6 participants. Effect sizes were moderate to large. Variation in syllable structure did not explain the difference between the 2 methods. Vocalization rates from the 2 methods were highly correlated for 5 of the 6 participants.

Conclusions Estimates of vocalization rates from nonverbal children produced by the LENA system differed from human transcription during sessions that were substantially shorter than the recommended recording length. These results confirm the recommendation of the LENA Foundation to record sessions of at least 1 hr.
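The pattern reported above — mean rates that differ significantly between methods while remaining highly correlated — is easy to illustrate with a plain Pearson correlation. The rate values below are invented for illustration only:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two rate series (e.g. human vs. LENA)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

human = [4.2, 6.1, 3.0, 7.5, 5.8]  # vocalizations/min, human transcriber
lena = [3.5, 5.9, 2.8, 8.1, 5.0]   # vocalizations/min, automated estimate
r = pearson_r(human, lena)  # high r despite systematic level differences
```

Because correlation is insensitive to constant offsets and scaling, a device can rank children consistently (high r) while still over- or under-counting absolute rates, which is exactly the dissociation the study documents.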


2021 ◽  
pp. 1-15
Author(s):  
Leonardo PIOT ◽  
Naomi HAVRON ◽  
Alejandrina CRISTIA

Abstract Using a meta-analytic approach, we evaluate the association between socioeconomic status (SES) and children's experiences measured with the Language Environment Analysis (LENA) system. Our final analysis included 22 independent samples, representing data from 1583 children. A model controlling for LENA measures, age and publication type revealed an effect size of r_z = .186, indicating a small effect of SES on children's language experiences. The type of LENA metric measured emerged as a significant moderator, indicating stronger effects for adult word counts than child vocalization counts. These results provide important evidence for the strength of association between SES and children's everyday language experiences as measured with an unobtrusive recording analyzed automatically in a standardized fashion.
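Correlation effect sizes in meta-analyses like this are typically pooled on Fisher's r-to-z scale, which is presumably what r_z denotes here. A minimal sketch of the transform and its inverse:

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transform: stabilizes the variance of correlations
    so they can be averaged across samples in a meta-analysis."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Back-transform a pooled z value to the correlation scale."""
    return math.tanh(z)

# A pooled effect of r_z = .186 back-transforms to a correlation of
# roughly .18 — a small effect by conventional benchmarks, as the
# abstract notes.
r = inverse_fisher_z(0.186)
```

For small correlations the two scales nearly coincide (tanh(z) ≈ z near zero), which is why the abstract can describe r_z = .186 directly as a small effect.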


2021 ◽  
Vol 64 (3) ◽  
pp. 792-808
Author(s):  
Margarethe McDonald ◽  
Taeahn Kwon ◽  
Hyunji Kim ◽  
Youngki Lee ◽  
Eon-Suk Ko

Purpose The algorithm of the Language ENvironment Analysis (LENA) system for calculating language environment measures was trained on American English; thus, its validity with other languages cannot be assumed. This article evaluates the accuracy of the LENA system applied to Korean.

Method We sampled sixty 5-min recording clips involving 38 key children aged 7–18 months from a larger data set. We establish the identification error rate, precision, and recall of LENA classification compared to human coders. We then examine the correlation between standard LENA measures of adult word count, child vocalization count, and conversational turn count and human counts of the same measures.

Results Our identification error rate (64% or 67%), including false alarms, confusions, and misses, was similar to the rate found in Cristia, Lavechin, et al. (2020). The correlation between LENA and human counts for adult word count (r = .78 or .79) was similar to that found in the other studies, but the same measure for child vocalization count (r = .34–.47) was lower than the value in Cristia, Lavechin, et al., though it fell within ranges found in other non-European languages. The correlation between LENA and human conversational turn count was not high (r = .36–.47), similar to the findings in other studies.

Conclusions LENA technology is about as reliable for Korean language environments as for other non-English language environments. Factors affecting the accuracy of diarization include speakers' pitch, duration of utterances, age, and the presence of noise and electronic sounds.
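The identification error rate used above is a standard diarization metric that sums misses, false alarms, and speaker confusions over the reference speech duration (which is why it can approach or even exceed 100%). A schematic frame-level version — an illustration, not the scoring tool used in the study:

```python
def identification_error_rate(reference, hypothesis, silence="SIL"):
    """Frame-level identification error rate for diarization output.

    reference, hypothesis: equal-length lists of speaker labels per
    frame, with non-speech marked by `silence`. Errors are misses
    (speech labeled as silence), false alarms (silence labeled as
    speech), and confusions (speech attributed to the wrong speaker),
    divided by the number of reference speech frames.
    """
    miss = fa = conf = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref != silence:
            speech += 1
            if hyp == silence:
                miss += 1
            elif hyp != ref:
                conf += 1
        elif hyp != silence:
            fa += 1
    return (miss + fa + conf) / speech if speech else 0.0

ref = ["CHI", "CHI", "SIL", "FEM", "FEM", "SIL"]  # CHI = key child, FEM = female adult
hyp = ["CHI", "SIL", "MAL", "FEM", "CHI", "SIL"]
# 1 miss + 1 false alarm + 1 confusion over 4 speech frames = 0.75
```

Because false alarms are counted against reference speech time only, noisy recordings with much non-speech (common in daylong home audio) inflate this rate, consistent with the 64–67% figures reported here.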

