Pitch shift
Recently Published Documents

TOTAL DOCUMENTS: 67 (FIVE YEARS: 13)
H-INDEX: 13 (FIVE YEARS: 0)

2021 · Vol 118 (48) · pp. e2107997118
Author(s): Jackson E. Graves, Paul Egré, Daniel Pressnitzer, Vincent de Gardelle

To guide behavior, perceptual systems must operate on intrinsically ambiguous sensory input. Observers are usually able to acknowledge the uncertainty of their perception, but in some cases, they critically fail to do so. Here, we show that a physiological correlate of ambiguity can be found in pupil dilation even when the observer is not aware of such ambiguity. We used a well-known auditory ambiguous stimulus, known as the tritone paradox, which can induce the perception of an upward or downward pitch shift within the same individual. In two experiments, behavioral responses showed that listeners could not explicitly access the ambiguity in this stimulus, even though their responses varied from trial to trial. However, pupil dilation was larger for the more ambiguous cases. The ambiguity of the stimulus for each listener was indexed by the entropy of behavioral responses, and this entropy was also a significant predictor of pupil size. In particular, entropy explained additional variation in pupil size independent of the explicit judgment of confidence in the specific situation that we investigated, in which the two measures were decoupled. Our data thus suggest that stimulus ambiguity is implicitly represented in the brain even without explicit awareness of this ambiguity.
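
The ambiguity index used here is the entropy of each listener's up/down judgments for a given tritone pair. A minimal sketch of that computation, assuming binary responses coded as strings (the variable names and response coding are illustrative, not the authors' analysis code):

```python
import numpy as np

def response_entropy(responses):
    """Shannon entropy (bits) of a listener's binary up/down judgments for one
    tritone pair: 0 = fully consistent, 1 = maximally ambiguous."""
    responses = np.asarray(responses)
    p_up = np.mean(responses == "up")
    if p_up in (0.0, 1.0):          # all responses agree -> no ambiguity
        return 0.0
    return -(p_up * np.log2(p_up) + (1 - p_up) * np.log2(1 - p_up))

# Example: eight trials of the same tritone pair, judged inconsistently
print(response_entropy(["up", "down", "up", "up", "down", "up", "down", "down"]))  # 1.0
```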


2021
Author(s): Matthias K. Franken, Robert Hartsuiker, Petter Johansson, Lars Hall, Andreas Lind

Various studies have claimed that the sense of agency is based on a comparison between an internal estimate of an action’s outcome and sensory feedback. With respect to speech, this presumes that speakers have a stable pre-articulatory representation of their own speech. However, recent research suggests that the sense of agency is flexible and thus in some contexts we may feel like we produced speech that was not actually produced by us. The current study tested whether the estimated pitch of one’s articulation (termed ‘pitch awareness’) is affected by manipulated auditory feedback. In four experiments, fifty-six participants produced isolated vowels while being exposed to pitch-shifted auditory feedback. After every vocalization, participants indicated whether they thought the feedback was higher or lower than their actual production. After exposure to a block of high-pitched auditory feedback (+500 cents pitch shift), participants were more likely to label subsequent auditory feedback as “lower than my actual production”, suggesting that prolonged exposure to high-pitched auditory feedback led to a drift in participants’ pitch awareness. The opposite pattern was found after exposure to a constant -500 cents pitch shift. This suggests that pitch awareness is not solely based on a pre-articulatory representation of intended speech or on a sensory prediction, but also on sensory feedback. We propose that this drift in pitch awareness could be indicative of a sense of agency over the pitch-shifted auditory feedback in the exposure block. If so, this suggests that the sense of agency over vocal output is flexible.
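
For reference, a pitch shift given in cents corresponds to multiplying the feedback frequency by a fixed ratio; a short sketch of that conversion for the ±500 cent shifts used here (the example F0 value is illustrative):

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift given in cents
    (100 cents = 1 equal-tempered semitone)."""
    return 2.0 ** (cents / 1200.0)

f0 = 120.0                          # illustrative speaking F0 in Hz
print(f0 * cents_to_ratio(+500))    # feedback heard at ~160.2 Hz
print(f0 * cents_to_ratio(-500))    # feedback heard at ~89.9 Hz
```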


2021 · Vol 11 (1)
Author(s): Peter Washington, Qandeel Tariq, Emilie Leblanc, Brianna Chrisman, Kaitlyn Dunlap, ...

Abstract: Standard medical diagnosis of mental health conditions requires licensed experts who are increasingly outnumbered by those at risk, limiting reach. We test the hypothesis that a trustworthy crowd of non-experts can efficiently annotate behavioral features needed for accurate machine learning detection of the common childhood developmental disorder Autism Spectrum Disorder (ASD) in children under 8 years old. We implement a novel process for identifying and certifying a trustworthy distributed workforce for video feature extraction, selecting a workforce of 102 workers from a pool of 1,107. Two previously validated ASD logistic regression classifiers, evaluated against parent-reported diagnoses, were used to assess the accuracy of the trusted crowd's ratings of unstructured home videos. A representative, balanced sample of videos (N = 50) was evaluated with and without face-box and pitch-shift privacy alterations, yielding AUROC and AUPRC scores > 0.98. With both privacy-preserving modifications, sensitivity is preserved (96.0%) while specificity (80.0%) and accuracy (88.0%) are maintained at levels comparable to prior classification methods without alterations. We find that machine learning classification from features extracted by a certified non-expert crowd achieves high performance for ASD detection from natural home videos of the child at risk and maintains high sensitivity when privacy-preserving mechanisms are applied. These results suggest that privacy-safeguarded crowdsourced analysis of short home videos can help enable rapid and mobile machine-learning detection of developmental delays in children.
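
The reported sensitivity, specificity, accuracy, AUROC, and AUPRC can all be derived from the classifiers' scores against the parent-reported labels; a minimal scikit-learn sketch (the toy labels and scores below are placeholders, not the study's data):

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             confusion_matrix)

# Placeholder labels (1 = ASD, 0 = neurotypical) and classifier probabilities
y_true  = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.92, 0.81, 0.77, 0.30, 0.12, 0.68, 0.45, 0.09, 0.88, 0.51])
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))          # recall on ASD-positive videos
print("specificity:", tn / (tn + fp))
print("accuracy:   ", (tp + tn) / len(y_true))
print("AUROC:      ", roc_auc_score(y_true, y_score))
print("AUPRC:      ", average_precision_score(y_true, y_score))
```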


2021 · Vol 17 (3) · pp. e1008787
Author(s): Alejandro Tabas, Katharina von Kriegstein

Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM-sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli; and a sweep representation, consisting of neurons that respond selectively to FM direction. To date, computational models have used feedforward mechanisms to explain FM encoding. However, from neuroanatomy we know that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representations. The model was designed to encode sweeps with the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants, as well as for previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency-encoding and direction-encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.
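
To make the stimulus class concrete, a formant-transition-like FM sweep is a short tone whose instantaneous frequency rises or falls over a few tens of milliseconds; a minimal synthesis sketch (duration, frequency span, and sample rate are illustrative, not the paper's exact parameters):

```python
import numpy as np

def fm_sweep(f_start, f_end, duration=0.05, sr=44100):
    """Linear FM sweep: the phase is the running integral of the
    instantaneous frequency."""
    t = np.arange(int(duration * sr)) / sr
    f_inst = f_start + (f_end - f_start) * t / duration   # Hz, linear in time
    phase = 2 * np.pi * np.cumsum(f_inst) / sr
    return np.sin(phase)

upward   = fm_sweep(1000, 2000)   # rising 50-ms sweep
downward = fm_sweep(2000, 1000)   # falling sweep with the same span
```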


2021 · Vol 38 (2) · pp. 024301
Author(s): Zhang-Cai Long, Yan-Ping Zhang, Lin Luo

2020
Author(s): Peter Washington, Qandeel Tariq, Emilie Leblanc, Brianna Chrisman, Kaitlyn Dunlap, ...

Abstract: Standard medical diagnosis of mental health conditions often requires licensed experts who are increasingly outnumbered by those at risk, limiting reach. We test the hypothesis that a trustworthy crowd of non-experts can efficiently label features needed for accurate machine learning detection of the common childhood developmental disorder autism. We implement a novel process for creating a trustworthy distributed workforce for video feature extraction, selecting a workforce of 102 workers from a pool of 1,107. Two previously validated binary autism logistic regression classifiers were used to evaluate the quality of the curated crowd's ratings of unstructured home videos. A clinically representative, balanced sample of videos (N = 50) was evaluated with and without face-box and pitch-shift privacy alterations, yielding AUROC and AUPRC scores > 0.98. With both privacy-preserving modifications, sensitivity is preserved (96.0%) while specificity (80.0%) and accuracy (88.0%) are maintained at levels that exceed those of classification methods without alterations. We find that machine learning classification from features extracted by a curated non-expert crowd achieves clinical performance for pediatric autism videos and maintains acceptable performance when privacy-preserving mechanisms are applied. These results suggest that privacy-based crowdsourcing of short videos can be leveraged for rapid and mobile assessment of behavioral health.
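
The pitch-shift privacy alteration amounts to resynthesising a video's audio track at a different pitch before raters hear it. A minimal sketch using librosa, assuming the audio has already been extracted to a WAV file; the file names, shift size, and choice of library are assumptions, since the paper does not state its tooling:

```python
import librosa
import soundfile as sf

# Load the extracted audio track of a home video at its native sample rate
y, sr = librosa.load("home_video_audio.wav", sr=None)

# Shift the voice down by 4 semitones to help mask speaker identity
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)

sf.write("home_video_audio_shifted.wav", y_shifted, sr)
```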


2020 · Vol 10 (1)
Author(s): Razieh Alemi, Alexandre Lehmann, Mickael L. D. Deroche

Abstract: Monitoring voice pitch is a fine-tuned process in daily conversations, as accurately conveying the linguistic and affective cues in a given utterance depends on the precise control of phonation and intonation. This monitoring is thought to depend on whether the error is treated as self-generated or externally generated, resulting in either a correction or an inflation of errors. The present study reports on two separate paradigms of adaptation to altered feedback, used to explore whether participants would behave in a more cohesive manner once the error is of perceptually comparable size. The vocal behavior of normal-hearing, fluent speakers was recorded in response to a personalized size of pitch shift versus a non-specific size of one semitone. The personalized shift size was determined from the just-noticeable difference in fundamental frequency (F0) of each participant's voice. Here we show that both tasks successfully demonstrated opposing responses to a constant and predictable F0 perturbation (present from production onset), but these effects barely carried over once the feedback was back to normal, depicting a pattern that bears some resemblance to compensatory responses. Experiencing an F0 shift that is perceived as self-generated (because it was precisely just-noticeable) is not enough to force speakers to behave more consistently and more homogeneously in an opposing manner. On the contrary, our results suggest that neither the type nor the magnitude of the response depends in any trivial way on the sensitivity of participants to their own voice pitch. Based on this finding, we speculate that error correction could possibly occur even with a bionic ear, typically even when F0 cues are too subtle for cochlear implant users to detect accurately.
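
The contrast between the two paradigms is the size of the applied shift: one semitone (100 cents) for everyone versus a shift equal to each participant's F0 just-noticeable difference. A small sketch of how the two feedback conditions relate to a participant's baseline F0 (the baseline and JND values are illustrative):

```python
def shifted_f0(baseline_f0_hz: float, shift_cents: float) -> float:
    """F0 of the altered feedback after a pitch shift expressed in cents."""
    return baseline_f0_hz * 2.0 ** (shift_cents / 1200.0)

baseline_f0 = 200.0      # illustrative speaking F0 in Hz
jnd_cents   = 25.0       # illustrative per-participant just-noticeable difference

print(shifted_f0(baseline_f0, jnd_cents))   # personalized shift: ~202.9 Hz
print(shifted_f0(baseline_f0, 100.0))       # fixed one-semitone shift: ~211.9 Hz
```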


2020 · Vol 63 (7) · pp. 2185-2201
Author(s): Allison Hilger, Jennifer Cole, Jason H. Kim, Rosemary A. Lester-Smith, Charles Larson

Purpose: In this study, we investigated how the direction and timing of a perturbation in voice pitch auditory feedback during phrasal production modulated the magnitude and latency of the pitch-shift reflex as well as the scaling of acoustic production of anticipatory intonation targets for phrasal prominence and boundary.
Method: Brief pitch auditory feedback perturbations (±200 cents for 200-ms duration) were applied during the production of a target phrase on the first or the second word of the phrase. To replicate previous work, we first measured the magnitude and latency of the pitch-shift reflex as a function of the direction and timing of the perturbation within the phrase. As a novel approach, we also measured the adjustment in the production of the phrase-final prominent word as a function of perturbation direction and timing by extracting the acoustic correlates of pitch, loudness, and duration.
Results: The pitch-shift reflex was greater in magnitude after perturbations on the first word of the phrase, replicating the results from Mandarin speakers in an American English–speaking population. Additionally, the production of the phrase-final prominent word was acoustically enhanced (lengthened vowel duration and increased intensity and fundamental frequency) after perturbations earlier in the phrase, but more so after perturbations on the first word in the phrase.
Conclusion: The timing of the pitch perturbation within the phrase modulated both the magnitude of the pitch-shift reflex and the production of the prominent word, supporting our hypothesis that speakers use auditory feedback to correct for immediate production errors and to scale anticipatory intonation targets during phrasal production.
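
The two reflex measures reported here, magnitude and latency, are read off the produced F0 contour relative to perturbation onset. A minimal analysis sketch under assumed conventions (peak deviation for magnitude, first threshold crossing for latency); the threshold, window, and contour values are illustrative, not the study's exact procedure:

```python
import numpy as np

def reflex_magnitude_latency(f0_cents, t, onset_s, threshold_cents=10.0):
    """Peak compensatory F0 deviation (cents re: pre-perturbation baseline)
    and the time at which it first exceeds a threshold after onset."""
    baseline = np.mean(f0_cents[t < onset_s])
    deviation = f0_cents - baseline
    post = t >= onset_s
    peak_idx = np.argmax(np.abs(deviation[post]))
    magnitude = deviation[post][peak_idx]
    crossed = post & (np.abs(deviation) > threshold_cents)
    latency = t[crossed][0] - onset_s if crossed.any() else np.nan
    return magnitude, latency

# Illustrative contour: 10-ms frames, a small opposing response after onset at 0.5 s
t = np.arange(0, 1.0, 0.01)
f0 = np.where(t < 0.55, 0.0, -30.0 * np.exp(-(t - 0.55) / 0.2))
print(reflex_magnitude_latency(f0, t, onset_s=0.5))   # (-30.0 cents, 0.05 s)
```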


Acta Acustica · 2020 · Vol 4 (6) · pp. 24
Author(s): Kurt Heutschi, Beat Ott, Thomas Nussbaumer, Peter Wellig

There is great interest in the generation of plausible drone signals in various applications, e.g. for auralization purposes or the compilation of training data for detection algorithms. Here, a methodology is presented which synthesises realistic immission signals based on laboratory recordings and subsequent signal processing. The transformation of a lab drone signal into a virtual field microphone signal has to consider a constant pitch shift to adjust for the manoeuvre-specific rotational speed and the corresponding frequency-dependent emission strength correction, a random pitch-shift variation to account for turbulence-induced rotational speed variations in the field, the Doppler frequency shift, and time- and frequency-dependent amplitude adjustments according to the different propagation effects. By evaluating lab and field measurements, the relevant synthesizer parameters were determined. It was found that, for the investigated set of drone types, the vertical radiation characteristics can be successfully described by a generic frequency-dependent directivity pattern. The proposed method is applied to different drone models with a total weight between 800 g and 3.4 kg and is discussed with respect to its abilities and limitations, comparing recordings taken in the lab and in the field.
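
Two of the listed transformation steps, the Doppler frequency shift and the propagation-dependent level adjustment, follow directly from the source-receiver geometry. A minimal sketch of both (the speed of sound, tone frequency, and flight geometry are illustrative, and only spherical spreading is modelled, not air absorption):

```python
import numpy as np

C = 343.0   # speed of sound in air, m/s (at roughly 20 °C)

def doppler_factor(radial_speed_mps: float) -> float:
    """Frequency multiplication factor for a source approaching (+) or
    receding (-) from the microphone at the given radial speed."""
    return C / (C - radial_speed_mps)

def spreading_loss_db(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Level reduction from spherical spreading relative to the lab
    reference distance (6 dB per doubling of distance)."""
    return 20.0 * np.log10(distance_m / ref_distance_m)

# Drone tonal component at 150 Hz, flying towards the microphone at 10 m/s,
# heard 80 m away instead of at a 1-m lab reference distance
print(150.0 * doppler_factor(+10.0))   # ~154.5 Hz at the receiver
print(spreading_loss_db(80.0))         # ~38 dB quieter than the lab recording
```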


2020 · pp. 1-1
Author(s): Renjie Chu, Baoning Niu, Shanshan Yao, Jianquan Liu
