Towards Automatic Large-Scale Identification of Birds in Audio Recordings

Author(s):  
Mario Lasseck
2019 ◽  
Author(s):  
Meg Cychosz ◽  
Alejandrina Cristia ◽  
Elika Bergelson ◽  
Marisa Casillas ◽  
Gladys Baudet ◽  
...  

This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1-36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine whether child vocalizations contained canonical transitions (e.g., "ba" versus "ee"). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Further, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter-annotator reliability on the crowdsourcing platform, relative to more traditional in-lab expert annotators, means that a larger number of unique annotators and/or annotations are required and that crowdsourcing may not be a suitable method for more fine-grained annotation decisions. Audio clips used for this project are compiled into a large-scale infant vocal corpus that is available for other researchers to use in future work.
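The proportion measure described in this abstract can be illustrated with a minimal sketch. The clip labels, the child identifiers, and the function name `canonical_proportion` are hypothetical and not taken from the study; only the 0.15 threshold comes from the abstract. How the study aggregated multiple annotators per clip is not stated here, so the sketch assumes each clip already carries a single yes/no label.

```python
from collections import defaultdict

# Threshold reported in the abstract above.
CANONICAL_THRESHOLD = 0.15


def canonical_proportion(clips):
    """Return child_id -> share of that child's clips labeled canonical.

    `clips` is an iterable of (child_id, is_canonical) pairs.
    This input format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    canonical = defaultdict(int)
    for child_id, is_canonical in clips:
        totals[child_id] += 1
        if is_canonical:
            canonical[child_id] += 1
    return {child: canonical[child] / totals[child] for child in totals}


# Hypothetical labels: child_a has 1 canonical clip out of 10, child_b 2 out of 4.
clips = [("child_a", True)] + [("child_a", False)] * 9
clips += [("child_b", True), ("child_b", True), ("child_b", False), ("child_b", False)]

proportions = canonical_proportion(clips)
above_threshold = {child: p > CANONICAL_THRESHOLD for child, p in proportions.items()}
```

With these made-up labels, only `child_b` exceeds the threshold.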


2007 ◽  
Vol 3 (6) ◽  
pp. 603-606
Author(s):  
Dale Joachim ◽  
Eben Goodale

Playback is an important method of surveying animals, assessing habitats and studying animal communication. However, conventional playback methods require on-site observers and therefore become labour-intensive when covering large areas. Such limitations could be circumvented by the use of cellular telephony, a ubiquitous technology with increasing biological applications. In addressing concerns about the low audio quality of cellular telephones, this paper presents experimental data to show that owls of two species (Strix varia and Megascops asio) respond similarly to calls played through cellular telephones as to calls played through conventional playback technology. In addition, the telephone audio recordings are of sufficient quality to detect most of the two owl species' responses. These findings are a first important step towards large-scale applications where networks of cellular phones conduct real-time monitoring tasks.


2018 ◽  
Vol 47 (7) ◽  
pp. 451-464 ◽  
Author(s):  
Sean Kelly ◽  
Andrew M. Olney ◽  
Patrick Donnelly ◽  
Martin Nystrand ◽  
Sidney K. D’Mello

Analyzing the quality of classroom talk is central to educational research and improvement efforts. In particular, the presence of authentic teacher questions, where answers are not predetermined by the teacher, helps constitute and serves as a marker of productive classroom discourse. Further, authentic questions can be cultivated to improve teaching effectiveness and consequently student achievement. Unfortunately, current methods to measure question authenticity do not scale because they rely on human observations or coding of teacher discourse. To address this challenge, we set out to use automatic speech recognition, natural language processing, and machine learning to train computers to detect authentic questions in real-world classrooms automatically. Our methods were iteratively refined using classroom audio and human-coded observational data from two sources: (a) a large archival database of text transcripts of 451 observations from 112 classrooms; and (b) a newly collected sample of 132 high-quality audio recordings from 27 classrooms, obtained under technical constraints that anticipate large-scale automated data collection and analysis. Correlations between human-coded and computer-coded authenticity at the classroom level were sufficiently high (r = .602 for archival transcripts and .687 for audio recordings) to provide a valuable complement to human coding in research efforts.
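The classroom-level agreement reported in this abstract is a correlation coefficient r. As a rough illustration, the sketch below computes a Pearson correlation between two lists of per-classroom scores. The abstract does not say which correlation variant or which score definition was used, so Pearson's r, the function name `pearson_r`, and all values below are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-classroom authenticity scores (not data from the study).
human_coded = [0.10, 0.25, 0.40, 0.55, 0.30]
computer_coded = [0.15, 0.20, 0.45, 0.50, 0.35]

r = pearson_r(human_coded, computer_coded)
```

A value of `r` near 1 would indicate that classrooms ranked high by human coders are also ranked high by the automated coding.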


2017 ◽  
Author(s):  
Michael C. Frank ◽  
Christina Bergmann ◽  
Elika Bergelson ◽  
Krista Byers-Heinlein ◽  
Alejandrina Cristia ◽  
...  

The field of psychology has become increasingly concerned with issues related to methodology and replicability. Infancy researchers face specific challenges related to replicability: high-powered studies are difficult to conduct, testing conditions vary across labs, and different labs have access to different infant populations, amongst other factors. Addressing these concerns, we report on a large-scale, multi-site study aimed at 1) assessing the overall replicability of a single theoretically-important phenomenon and 2) examining methodological, situational, cultural, and developmental moderators. We focus on infants’ preference for infant-directed speech (IDS) over adult-directed speech (ADS). Stimuli of mothers speaking to their infants and to an adult were created using semi-naturalistic laboratory-based audio recordings in North American English. Infants’ relative preference for IDS and ADS was assessed across 67 laboratories in North America, Europe, Australia, and Asia using the three commonly-used infant discrimination methods (head-turn preference, central fixation, and eye tracking). The overall meta-analytic effect size (Cohen’s *d*) was 0.35 [0.29 - 0.42], which was reliably above zero but smaller than the meta-analytic mean computed from previous literature (0.67). The IDS preference was significantly stronger in older children, in those children for whom the stimuli matched their native language and dialect, and in data from labs using the head-turn preference procedure. Together these findings replicate the infant-directed speech preference but suggest that its magnitude is modulated by development, native language experience, and testing procedure.


10.2196/20545 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e20545
Author(s):  
Paul J Barr ◽  
James Ryan ◽  
Nicholas C Jacobson

COVID-19 cases are exponentially increasing worldwide; however, its clinical phenotype remains unclear. Natural language processing (NLP) and machine learning approaches may yield key methods to rapidly identify individuals at a high risk of COVID-19 and to understand key symptoms upon clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, albeit at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by the resulting gaps in data from electronic health records (EHRs). A comprehensive record of clinic visits is required; audio recordings may be the answer. A recording of clinic visits represents a more comprehensive record of patient-reported symptoms. If done at scale, a combination of data from the EHR and recordings of clinic visits can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose the generation of a pipeline extending from audio or video recordings of clinic visits to establish a model that factors in clinical symptoms and predicts COVID-19 incidence. With vast amounts of available data, we believe that a prediction model can be rapidly developed to promote the accurate screening of individuals at a high risk of COVID-19 and to identify patient characteristics that predict a greater risk of a more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not the panacea for this pandemic, they are a low-cost option with many potential benefits, which have recently begun to be explored.


2012 ◽  
Vol 22 ◽  
pp. 11-14
Author(s):  
David Burraston

Environmental sonification of rainfall events using large-scale long wire instruments is presented. After a brief historical introduction of long wires and their relationship to the Aeolian harp, some aspects of construction and recording techniques are discussed. Observations of rainfall-induced vibrations on long wire audio recordings are then presented.


2018 ◽  
Vol 4 (2) ◽  
pp. 197-230
Author(s):  
Jennifer Kuo

This study aims to (i) identify patterns of sociophonetic variation in Taiwan Mandarin, and (ii) evaluate smartphone technologies as a tool for crowdsourcing sociophonetic data. Specifically, this study examines both phonological variables found in prior literature to be highly salient (deretroflexion, labiovelar glide deletion), and variables that are less likely to index social properties (merging of final /n, ŋ/, changes to Tones 2 and 3). Unlike past studies which have primarily relied on smaller sample sizes, I utilize a smartphone application to crowdsource audio recordings across Taiwan; subsequent Rbrul analysis of 292 recordings revealed robust patterns of sociolinguistic variation. Deretroflexion correlates strongly with gender and age, while glide deletion correlates with gender. Nasal final merging and tonal change exhibit less socio-indexical variation, but provide evidence of potential change in progress. These findings suggest that smartphone-based crowdsourcing can complement traditional sociolinguistic fieldwork, and reveal new knowledge about large-scale variation.


Author(s):  
James N. Stanford

For nearly 400 years, New England has held an important place in the development of American English, and “New England accents” are very well known in popular imagination. But since the 1930s, no large-scale academic book project has focused specifically on New England English. While other research projects have studied dialect features in various regions of New England, this is the first large-scale scholarly project to focus solely on New England English since the Linguistic Atlas of New England. This book presents new research covering all six New England states, with detailed geographic, phonetic, and statistical analysis of data collected from over 1,600 New Englanders. The book covers the past, present, and future of New England dialect features, analyzing them with dialect maps and statistical modeling in terms of age, gender, social class, ethnicity, and other factors. The book reports on a recent large-scale data collection project that included 367 field interviews, 626 audio-recorded interviews, and 634 online written questionnaires. Using computational methods, the project processed over 200,000 individual vowels in audio recordings to examine changes in New England speech. The researchers also manually examined 30,000 instances of /r/ to investigate “r-dropping” in words like “park” and so on. The book also reviews other recent research in the area. Using acoustic phonetics, computational processing, detailed statistical analyses, dialect maps, and graphical illustrations, the book systematically documents all of the major traditional New England dialect features, other regional features, and their current usage across New England.


2019 ◽  
pp. 75-112
Author(s):  
James N. Stanford

This is the first of the two chapters (Chapters 4 and 5) that present the results of the online data collection project using Amazon’s Mechanical Turk system. These projects provide a broad-scale “bird’s eye” view of New England dialect features across large distances. This chapter examines the results from 626 speakers who audio-recorded themselves reading 12 sentences two times each. The recordings were analyzed acoustically and then modeled statistically and graphically. The results are presented in the form of maps and statistical analyses, with the goal of providing a large-scale geographic overview of modern-day patterns of New England dialect features.

