Higher-Order Musical Temporal Structure in Bird Song

2021 ◽  
Vol 12 ◽  
Author(s):  
Hans T. Bilger ◽  
Emily Vertosick ◽  
Andrew Vickers ◽  
Konrad Kaczmarek ◽  
Richard O. Prum

Bird songs often display musical acoustic features such as tonal pitch selection, rhythmicity, and melodic contouring. We investigated higher-order musical temporal structure in bird song using an experimental method called “music scrambling” with human subjects. Recorded songs from a phylogenetically diverse group of 20 avian taxa were split into constituent elements (“notes” or “syllables”) and recombined in original and random order. Human subjects were asked to evaluate which version sounded more “musical” on a per-species basis. Species identity and stimulus treatment were concealed from subjects, and stimulus presentation order was randomized within and between taxa. Two recordings of human music were included as a control for attentiveness. Participants varied in their assessments of individual species musicality, but overall they were significantly more likely to rate bird songs with original temporal sequence as more musical than those with randomized temporal sequence. We discuss alternative hypotheses for the origins of avian musicality, including honest signaling, perceptual bias, and arbitrary aesthetic coevolution.
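The "music scrambling" treatment described above can be sketched in a few lines. This is a hypothetical illustration of the split-and-reorder idea, not the authors' code; the function name and the list-of-segments representation are assumptions:

```python
import random

def scramble_song(note_segments, seed=None):
    """Return a randomly reordered copy of a song's note segments.

    `note_segments` is a list of audio chunks (e.g. arrays of samples),
    one per note or syllable, obtained by splitting the recording at
    inter-note boundaries. The original list is left untouched.
    """
    rng = random.Random(seed)
    scrambled = list(note_segments)
    rng.shuffle(scrambled)
    return scrambled

# A trial pairs the original sequence with its scrambled counterpart;
# presentation order of the pair is randomized, and the listener picks
# whichever version sounds more "musical".
song = ["n1", "n2", "n3", "n4", "n5"]
pair = [song, scramble_song(song, seed=1)]
```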

2018 ◽  
Vol 30 (12) ◽  
pp. 3151-3167 ◽  
Author(s):  
Dmitry Krotov ◽  
John Hopfield

Deep neural networks (DNNs) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns and raise the question of how to design learning algorithms that mimic human perception more accurately than existing methods. Our article examines these questions within the framework of dense associative memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely at the decision boundary look ambiguous to human subjects and share aspects of both classes that are separated by that decision boundary. Third, adversarial images constructed by models with a small power of the interaction vertex, which are equivalent to DNNs with rectified linear units, fail to transfer to and fool the models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The results we present suggest that DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units.
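For readers unfamiliar with DAMs, here is a minimal sketch of the higher-order energy function described above, assuming the rectified-polynomial interaction F(x) = max(x, 0)^n used in this family of models; the function names and the simple flip-if-energy-drops recall rule are illustrative, not the authors' implementation:

```python
import numpy as np

def dam_energy(sigma, memories, n=3):
    """Energy of a dense associative memory with interaction power n.

    E(sigma) = -sum_mu F(xi_mu . sigma), with F(x) = max(x, 0)**n.
    Larger n raises the power of the interaction vertex, giving
    sharper basins of attraction around the stored patterns.
    """
    overlaps = memories @ sigma            # one overlap per stored pattern
    return -np.sum(np.maximum(overlaps, 0.0) ** n)

def recall_step(sigma, memories, n=3):
    """One sweep of updates: flip each spin iff it lowers the energy."""
    sigma = sigma.copy()
    for i in range(len(sigma)):
        flipped = sigma.copy()
        flipped[i] *= -1
        if dam_energy(flipped, memories, n) < dam_energy(sigma, memories, n):
            sigma = flipped
    return sigma
```

With two stored ±1 patterns and one corrupted bit, a single sweep restores the nearest memory, because flipping the bad bit raises the rectified overlap with the correct pattern more than any other flip.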


2017 ◽  
Author(s):  
Krishna C. Puvvada ◽  
Jonathan Z. Simon

Abstract

The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically-based representations in the auditory nerve, into perceptually distinct auditory-objects based representation in auditory cortex. Here, using magnetoencephalography (MEG) recordings from human subjects, both men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in auditory cortex contain dominantly spectro-temporal based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. In contrast, we also show that higher order auditory cortical areas represent the attended stream separately, and with significantly higher fidelity, than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Taken together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of human auditory cortex.

Significance Statement

Using magnetoencephalography (MEG) recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of auditory cortex.
We show that the primary-like areas in auditory cortex use a dominantly spectro-temporal based representation of the entire auditory scene, with both attended and ignored speech streams represented with almost equal fidelity. In contrast, we show that higher order auditory cortical areas represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects.
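Stimulus reconstruction of this kind is commonly implemented as a time-lagged linear backward model fit with ridge regression; the sketch below illustrates that generic approach and is not the authors' exact pipeline (the function names, the causal-lag convention, and the correlation-based fidelity measure are assumptions):

```python
import numpy as np

def fit_decoder(meg, envelope, lags=10, alpha=1.0):
    """Fit a linear backward model that reconstructs a speech envelope
    from multichannel MEG, using the current and `lags-1` past samples.

    meg: (T, C) sensor time series; envelope: (T,) stimulus feature.
    Returns (weights, design_matrix); weights has shape (lags * C,).
    """
    T, C = meg.shape
    # Lagged design matrix: row t holds meg[t], meg[t-1], ..., meg[t-lags+1].
    X = np.zeros((T, lags * C))
    for k in range(lags):
        X[k:, k * C:(k + 1) * C] = meg[:T - k]
    # Ridge solution: w = (X'X + alpha I)^{-1} X'y
    A = X.T @ X + alpha * np.eye(lags * C)
    w = np.linalg.solve(A, X.T @ envelope)
    return w, X

def reconstruction_fidelity(w, X, envelope):
    """Correlation between the reconstructed and the true envelope."""
    recon = X @ w
    return np.corrcoef(recon, envelope)[0, 1]
```

Fidelity, in this framing, is simply how well the decoded envelope correlates with the presented one; comparing decoders trained on attended versus ignored streams quantifies the attentional effect.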


1999 ◽  
Vol 17 (2) ◽  
pp. 223-239 ◽  
Author(s):  
Piet G. Vos ◽  
Paul P. Verkaart

Listeners' ability to infer the mode (major vs. minor) of a piece of Western tonal music was examined. Twenty-four subjects, divided into two groups according to their level of musical expertise, evaluated 11 musical stimuli, selected from J. S. Bach's "Well-Tempered Clavier". The stimuli included both unambiguous and ambiguous examples of the two modes, as well as one example of a modulation (from minor into major). The stimuli consisted of unaccompanied melodic openings of compositions, each containing 10 tones. Stimulus presentation and evaluation took place in nine progressively longer steps, starting with presentation of the first two tones, followed by their evaluation on a continuous scale, with 0 = "extremely minor" and 100 = "extremely major," and ending with evaluation of the complete stimulus. The results showed that mode inference followed the prescribed modes and tended to become more definite with increasing stimulus length. Experts were generally more definite in their inferences than were nonexperts. Surprisingly, the temporal structure of the stimuli also appeared to affect mode inference. The degree of definiteness of mode judgments did not systematically differ between the two modes. It was concluded that listeners are able to infer the mode of a piece of music in the absence of explicit harmonic cues. The generalizability of the results with respect to music pieces from later periods in Western music history and the impact of different musical genres on mode inference are discussed.


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Kent Livezey

Identifying species of birds by their songs is an important part of censusing, watching, and enjoying birds. However, differentiating among scores or hundreds of bird songs in an area can be difficult. Placing songs into a descriptive key can help in this endeavor by requiring the user to analyze each song and to identify similarities and differences among songs. In 2016, I published a bird song key to the Pipeline Road area in and adjacent to Soberanía National Park, Panama, which included 321 songs of 216 species. This key is, to my knowledge, the largest bird song key in the world. Since the key was published, two species—Rufous-breasted Wren (Pheugopedius rutilus) and Rufous-and-white Wren (Thryophilus rufalbus)—have moved into the area. This addendum adds three songs of Rufous-breasted Wren and three songs of Rufous-and-white Wren to the key, thereby increasing the key's song total to 327 and its species total to 218.


2018 ◽  
Author(s):  
Christopher Baldassano ◽  
Uri Hasson ◽  
Kenneth A. Norman

Abstract

Understanding movies and stories requires maintaining a high-level situation model that abstracts away from perceptual details to describe the location, characters, actions, and causal relationships of the currently unfolding event. These models are built not only from information present in the current narrative, but also from prior knowledge about schematic event scripts, which describe typical event sequences encountered throughout a lifetime. We analyzed fMRI data from 44 human subjects presented with sixteen three-minute stories, consisting of four schematic events drawn from two different scripts (eating at a restaurant or going through the airport). Aside from this shared script structure, the stories varied widely in terms of their characters and storylines, and were presented in two highly dissimilar formats (audiovisual clips or spoken narration). One group was presented with the stories in an intact temporal sequence, while a separate control group was presented with the same events in scrambled order. Regions including the posterior medial cortex, medial prefrontal cortex (mPFC), and superior frontal gyrus exhibited schematic event patterns that generalized across stories, subjects, and modalities. Patterns in mPFC were also sensitive to overall script structure, with temporally scrambled events evoking weaker schematic representations. Using a Hidden Markov Model, patterns in these regions can predict the script (restaurant vs. airport) of unlabeled data with high accuracy, and can be used to temporally align multiple stories with a shared script. These results extend work on the perception of controlled, artificial schemas in human and animal experiments to naturalistic perception of complex narrative stimuli.

Significance Statement

In almost all situations we encounter in our daily lives, we are able to draw on our schematic knowledge about what typically happens in the world to better perceive and mentally represent our ongoing experiences.
In contrast to previous studies that investigated schematic cognition using simple, artificial associations, we measured brain activity from subjects watching movies and listening to stories depicting restaurant or airport experiences. Our results reveal a network of brain regions that is sensitive to the shared temporal structure of these naturalistic situations. These regions abstract away from the particular details of each story, activating a representation of the general type of situation being perceived.
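A left-to-right Hidden Markov Model over event templates can score and classify an unlabeled pattern sequence in the spirit of the approach above. This is a minimal sketch, not the authors' implementation: the Gaussian emission model, the `stay` probability, and all names are illustrative assumptions.

```python
import numpy as np

def forward_loglik(data, templates, stay=0.9, sigma=1.0):
    """Log-likelihood of a timepoint-by-voxel data matrix under a
    left-to-right HMM whose states emit noisy copies of `templates`.

    data: (T, V); templates: (K, V), one pattern per schematic event.
    States advance monotonically (remain with probability `stay`, else
    step to the next event); emissions are isotropic Gaussians around
    each event template (log-probs computed up to a shared constant,
    which cancels when comparing scripts).
    """
    T, _ = data.shape
    K = templates.shape[0]
    log_em = -0.5 * ((data[:, None, :] - templates[None]) ** 2).sum(-1) / sigma**2
    log_alpha = np.full(K, -np.inf)
    log_alpha[0] = log_em[0, 0]          # sequences start at the first event
    for t in range(1, T):
        moved = np.full(K, -np.inf)
        moved[1:] = log_alpha[:-1] + np.log(1 - stay)
        stayed = log_alpha + np.log(stay)
        log_alpha = np.logaddexp(stayed, moved) + log_em[t]
    return np.logaddexp.reduce(log_alpha)

def classify_script(data, script_templates):
    """Pick the script whose event HMM best explains the data."""
    scores = {name: forward_loglik(data, tpl)
              for name, tpl in script_templates.items()}
    return max(scores, key=scores.get)
```

Because the constant term in the Gaussian log-density is shared across scripts, the comparison between the restaurant and airport models is unaffected by omitting it.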


2019 ◽  
Author(s):  
Jiei Kuroyanagi ◽  
Shoichiro Sato ◽  
Meng-Jou Ho ◽  
Gakuto Chiba ◽  
Joren Six ◽  
...  

The uniqueness of human music relative to speech and animal song has been extensively debated, but never directly measured. To address this, we applied an automated scale analysis algorithm to a sample of 86 recordings of human music, human speech, and bird songs from around the world. We found that human music throughout the world uniquely emphasized scales with small-integer ratios, particularly a perfect 5th (3:2 ratio), while human speech and bird song showed no clear evidence of scale-like tuning. We speculate that the uniquely human tendency toward scales with small-integer ratios may have resulted from the evolution of synchronized group performance among humans.
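As a toy illustration of the small-integer-ratio analysis, one can measure an interval in cents and find the nearest small-integer frequency ratio. This sketch is not the authors' algorithm; the function names and the `max_term` cutoff are assumptions:

```python
import math

def cents(f2, f1):
    """Interval size in cents between two frequencies (1200 per octave)."""
    return 1200 * math.log2(f2 / f1)

def nearest_small_integer_ratio(interval_cents, max_term=5):
    """Find the small-integer frequency ratio closest to an interval.

    Searches reduced ratios p/q with 1 <= q < p <= max_term and returns
    ((p, q), deviation_in_cents). A perfect 5th (3:2) is ~701.96 cents.
    """
    best = None
    for q in range(1, max_term + 1):
        for p in range(q + 1, max_term + 1):
            if math.gcd(p, q) != 1:
                continue
            dev = interval_cents - cents(p, q)
            if best is None or abs(dev) < abs(best[1]):
                best = ((p, q), dev)
    return best

# e.g. an A (440 Hz) up to an E (660 Hz) is an exact 3:2 fifth
ratio, dev = nearest_small_integer_ratio(cents(660, 440))
```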


Author(s):  
Heather Williams ◽  
Robert F. Lachlan

In studies of cumulative cultural evolution in non-human animals, the focus is most often on incremental changes that increase the efficacy of an existing form of socially learned behaviour, such as the refinement of migratory pathways. In this paper, we compare the songs of different species to describe patterns of evolution in the acoustic structure of bird songs, and explore the question of what building blocks might underlie cumulative cultural evolution of bird song using a comparative approach. We suggest that three steps occurred: first, imitation of independent sounds, or notes, via social learning; second, the formation of categories of note types; and third, assembling note types into sequences with defined structures. Simple sequences can then be repeated to form simple songs or concatenated with other sequences to form segmented songs, increasing complexity. Variant forms of both the notes and the sequencing rules may then arise due to copy errors and innovation. Some variants may become established in the population because of learning biases or selection, increasing signal efficiency, or because of cultural drift. Cumulative cultural evolution of bird songs thus arises from cognitive processes such as vocal imitation, categorization during memorization and learning biases applied to basic acoustic building blocks. This article is part of a discussion meeting issue ‘The emergence of collective knowledge and cumulative culture in animals, humans and machines’.


2021 ◽  
Vol 15 ◽  
Author(s):  
Arthur Prével ◽  
Ruth M. Krebs

In a new environment, humans and animals can detect and learn that cues predict meaningful outcomes, and use this information to adapt their responses. This process is termed Pavlovian conditioning. Pavlovian conditioning is also observed for stimuli that predict outcome-associated cues; this second type of conditioning is termed higher-order Pavlovian conditioning. In this review, we focus on higher-order conditioning studies with simultaneous and backward conditioned stimuli. We examine how the results from these experiments pose a challenge to models of Pavlovian conditioning such as Temporal Difference (TD) models, in which learning is mainly driven by reward prediction errors. Contrasting with this view, the results suggest that humans and animals can form complex representations of the (temporal) structure of the task and use this information to guide behavior, which seems consistent with model-based reinforcement learning. Future investigations involving these procedures could yield important new insights into the mechanisms that underlie Pavlovian conditioning.
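To make the contrast concrete, here is a minimal TD(0) value update, the reward-prediction-error rule that the reviewed findings challenge; the state names and constants are illustrative:

```python
def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One TD(0) update: delta = r + gamma * V(s') - V(s)."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

# Forward conditioning: CS precedes the rewarding US, so the reward
# prediction error propagates value backward in time to the CS.
V = {}
for _ in range(200):
    td_update(V, "CS", "US", r=0.0)    # CS predicts the upcoming US
    td_update(V, "US", "end", r=1.0)   # US delivers the reward

# V["CS"] converges toward gamma * V["US"]. By the same rule, a backward
# CS that only ever follows the reward receives no positive prediction
# error, which is why learning about simultaneous and backward CSs is
# hard to explain with pure prediction-error-driven TD learning.
```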


2019 ◽  
Vol 22 (03) ◽  
pp. 1950006
Author(s):  
ANDREW MELLOR

Recent advances in data collection and storage have allowed both researchers and industry alike to collect data in real time. Much of this data comes in the form of ‘events’, or timestamped interactions, such as email and social media posts, website clickstreams, or protein–protein interactions. This type of data poses new challenges for modeling, especially if we wish to preserve all temporal features and structure. We highlight several recent approaches in modeling higher-order temporal interaction and bring them together under the umbrella of event graphs. Through examples, we demonstrate how event graphs can be used to understand the higher-order topological-temporal structure of temporal networks and capture properties of the network that are unobservable in a static (or time-aggregated) model. We introduce new algorithms for temporal motif enumeration and provide a novel analysis of the communicability centrality for temporal networks. Furthermore, we show that by modeling a temporal network as an event graph our analysis extends easily to non-dyadic interactions, known as hyper-events.
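A minimal sketch of event-graph construction as described above, assuming the common convention that two events are adjacent if they share a participant and occur within a time window dt; the function and variable names are illustrative, not the paper's code:

```python
from collections import defaultdict

def build_event_graph(events, dt):
    """Build an event graph from timestamped dyadic interactions.

    events: list of (u, v, t) tuples, assumed sorted by time t.
    Nodes of the event graph are the events themselves; a directed
    edge links event i to a later event j if they share a participant
    and occur within `dt` of each other, so temporal order and
    adjacency are both preserved.
    Returns adjacency as {event_index: [later_event_indices]}.
    """
    adj = defaultdict(list)
    for i, (u1, v1, t1) in enumerate(events):
        for j in range(i + 1, len(events)):
            u2, v2, t2 = events[j]
            if t2 - t1 > dt:
                break  # events are time-sorted; later ones are too far
            if {u1, v1} & {u2, v2}:
                adj[i].append(j)
    return dict(adj)

# Three emails: A->B at t=1, B->C at t=2, D->E at t=3.
# Only the first two share a participant, so the event graph has one edge.
g = build_event_graph([("A", "B", 1), ("B", "C", 2), ("D", "E", 3)], dt=2)
```

Paths in this graph correspond to time-respecting chains of interactions, which is what makes temporal motifs enumerable directly on the event graph.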

