Birdsong Phrase Verification and Classification Using Siamese Neural Networks
Bird vocalizations have been the focus of a wide variety of interdisciplinary studies in bioacoustics and neuroethology, since they serve as models of motor control, learning, and auditory perception. Yet researchers have only begun to shed light on the structure and function of birdsong. Hypotheses abound, but there is still little agreement as to how songs should be analyzed. One of the main challenges is classifying the acoustic units (syllables) in birdsong recordings, a task that requires robust classification algorithms capable of generalizing to unseen instances and coping with data scarcity. Systematically detecting changes in syllable repertoires can help biologists understand the origin and evolution of birdsong. Learning good features to discriminate among numerous, distinct sound classes is computationally expensive; moreover, acceptable performance may be impossible to achieve when training data are scarce and classes are imbalanced. To address this issue, we pose a few-shot learning task in which an algorithm must make predictions given only a few instances of each class. We compared the performance of different Siamese Neural Networks at metric learning over the set of Cassin's Vireo syllables, and then reused the learned network features for the few-shot classification task. With this approach we overcame the limitations of data scarcity and class imbalance while achieving state-of-the-art performance.
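The two-stage approach described above can be illustrated with a minimal sketch: a Siamese network applies one shared embedding to both inputs, is trained with a pairwise (contrastive) loss so that same-class syllables embed close together, and the resulting distance is then reused to classify a query from only a few labelled support examples. All dimensions, weights, and function names below are hypothetical placeholders, not the paper's actual architecture.

```python
import math
import random

random.seed(0)

# Hypothetical sizes for illustration: 8-dim input features, 4-dim embeddings.
IN_DIM, EMB_DIM = 8, 4

# Shared weights: both branches of a Siamese network apply the SAME
# parameters to each input, so the learned distance is symmetric.
W = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(EMB_DIM)]

def embed(x):
    """Shared branch: a single linear layer with ReLU, purely illustrative."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def distance(x1, x2):
    """Euclidean distance between the two branch embeddings."""
    e1, e2 = embed(x1), embed(x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

def contrastive_loss(x1, x2, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs
    at least `margin` apart."""
    d = distance(x1, x2)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def nearest_support(query, support):
    """Few-shot reuse of the metric: label a query syllable with the label
    of its nearest support example. `support` is a list of (vector, label)."""
    return min(support, key=lambda s: distance(query, s[0]))[1]
```

In practice the embedding would be a trained deep network and the support set would hold the few labelled instances per class; the point of the sketch is that the classifier itself needs no retraining, only distances under the learned metric.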