Acoustic Cues to Beat Induction: A Machine Learning Perspective

2006 ◽  
Vol 24 (2) ◽  
pp. 177-188 ◽  
Author(s):  
Fabien Gouyon ◽  
Gerhard Widmer ◽  
Xavier Serra ◽  
Arthur Flexer

This article addresses the question of which acoustic features are most adequate for identifying beats computationally in acoustic music pieces. We consider many different features computed on consecutive short portions of the acoustic signal, including those currently promoted in the literature on beat induction from acoustic signals as well as several original features not previously mentioned in this literature. Feature sets are evaluated on their ability to provide reliable cues to the localization of beats, using a machine learning methodology and a large corpus of beat-annotated music pieces, in audio format, covering distinct music categories. Confirming common knowledge, energy is shown to be a very relevant cue to beat induction, especially the temporal variation of energy in various frequency bands, with frequency bands below 500 Hz and above 5 kHz being particularly relevant. Some of the new features proposed in this paper are shown to outperform features currently promoted in the literature on beat induction from acoustic signals. We finally hypothesize that modeling beat induction may involve many different, complementary acoustic features and that the process of selecting relevant features should partly depend on the acoustic properties of the very signal under consideration.
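
As a concrete illustration of the energy cue the article highlights, the sketch below computes a band-wise energy flux, i.e., the temporal variation of energy in the sub-500 Hz and above-5 kHz bands, from a short-time spectrogram. This is a minimal sketch under assumed frame parameters, not the authors' exact feature set.

```python
# Minimal sketch (not the paper's feature set): band-wise energy flux
# as a beat cue, emphasising the <500 Hz and >5 kHz bands the article
# singles out. Assumes a mono signal `y` sampled at rate `sr`.
import numpy as np
from scipy.signal import stft

def band_energy_flux(y, sr, bands=((0, 500), (5000, None)), n_fft=2048, hop=512):
    f, _, Z = stft(y, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    power = np.abs(Z) ** 2
    flux = []
    for lo, hi in bands:
        mask = (f >= lo) & (f <= (hi if hi is not None else f[-1]))
        band = power[mask].sum(axis=0)        # energy per frame in the band
        d = np.diff(band, prepend=band[:1])   # temporal variation of energy
        flux.append(np.maximum(d, 0.0))       # keep rises, where beats tend to sit
    return np.stack(flux)                     # shape: (n_bands, n_frames)
```

Peaks in such flux curves would then serve as candidate beat cues for any downstream beat-tracking stage.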


2017 ◽  
Author(s):  
Zeshan Peng

With the advancement of machine learning methods, audio sentiment analysis has become an active research area in recent years. For example, business organizations are interested in persuasion tactics revealed by vocal cues and acoustic measures in speech. A typical approach is to find a set of acoustic features from audio data that can indicate or predict a customer's attitude, opinion, or emotional state. Acoustic features of audio signals have been widely used in many machine learning applications, such as music classification, language recognition, and emotion recognition. For emotion recognition, previous work shows that pitch and speech-rate features are important. This thesis focuses on determining sentiment from call-center audio records, each containing a conversation between a sales representative and a customer. The sentiment of an audio record is considered positive if the conversation ended with an appointment being made, and negative otherwise. In this project, a data processing and machine learning pipeline for this problem has been developed. It consists of three major steps: 1) an audio record is split into segments by speaker turns; 2) acoustic features are extracted from each segment; and 3) classification models are trained on the acoustic features to predict sentiment. Different sets of features have been used, and different machine learning methods, including classical machine learning algorithms and deep neural networks, have been implemented in the pipeline. In the deep neural network method, the feature vectors of the audio segments are stacked in temporal order into a feature matrix, which is fed into a deep convolutional neural network as input. Experimental results based on real data show that acoustic features such as Mel-frequency cepstral coefficients, timbre, and chroma features are good indicators of sentiment, and that the temporal information in an audio record can be captured by deep convolutional neural networks for improved prediction accuracy.
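
The three-step pipeline described above can be sketched as follows. Speaker-turn boundaries are assumed given, the feature choices follow the abstract (MFCC and chroma), and the toy CNN is an illustrative stand-in, not the thesis's exact architecture.

```python
# Hedged sketch of the pipeline: per-segment features stacked in
# temporal order into a matrix, classified by a small CNN.
import numpy as np
import librosa
import torch
import torch.nn as nn

def segment_features(y, sr, turns):
    """turns: list of (start_s, end_s) speaker-turn boundaries (assumed given)."""
    rows = []
    for t0, t1 in turns:
        seg = y[int(t0 * sr):int(t1 * sr)]
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13).mean(axis=1)
        chroma = librosa.feature.chroma_stft(y=seg, sr=sr).mean(axis=1)
        rows.append(np.concatenate([mfcc, chroma]))
    return np.stack(rows)                  # (n_segments, n_features)

class SentimentCNN(nn.Module):
    """Illustrative classifier over the (segments x features) matrix."""
    def __init__(self, n_features=25):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(16, 2)         # positive vs. negative call

    def forward(self, x):                  # x: (batch, 1, n_segments, n_features)
        return self.fc(self.conv(x).flatten(1))
```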





Author(s):  
Eric D. Young ◽  
Donata Oertel

Neuronal circuits in the brainstem convert the output of the ear, which carries the acoustic properties of ongoing sound, to a representation of the acoustic environment that can be used by the thalamocortical system. Most important, brainstem circuits reflect the way the brain uses acoustic cues to determine where sounds arise and what they mean. The circuits merge the separate representations of sound in the two ears and stabilize them in the face of disturbances such as loudness fluctuation or background noise. Embedded in these systems are some specialized analyses that are driven by the need to resolve tiny differences in the time and intensity of sounds at the two ears and to resolve rapid temporal fluctuations in sounds like the sequence of notes in music or the sequence of syllables in speech.
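
The passage describes neural circuits rather than algorithms, but the binaural time cue it mentions is easy to make concrete. Below is a hedged sketch of estimating the interaural time difference (ITD) by cross-correlating the two ear signals, purely as an illustration of the cue these circuits resolve, not as a model of the brainstem computation.

```python
# Illustration only: the ITD cue, estimated by cross-correlation of
# left- and right-ear signals of equal length.
import numpy as np

def estimate_itd(left, right, sr):
    n = len(left)
    xc = np.correlate(left, right, mode="full")   # lags -(n-1)..(n-1)
    lag = np.argmax(xc) - (n - 1)                 # best-aligning lag in samples
    return lag / sr                               # seconds; sign indicates leading ear
```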



2007 ◽  
Vol 97 (2) ◽  
pp. 1470-1484 ◽  
Author(s):  
Yale E. Cohen ◽  
Frédéric Theunissen ◽  
Brian E. Russ ◽  
Patrick Gill

Communication is one of the fundamental components of both human and nonhuman animal behavior. Auditory communication signals (i.e., vocalizations) are especially important in the socioecology of several species of nonhuman primates, such as rhesus monkeys. In rhesus monkeys, the ventrolateral prefrontal cortex (vPFC) is thought to be part of a circuit involved in representing vocalizations and other auditory objects. To further our understanding of the role of the vPFC in processing vocalizations, we characterized the spectrotemporal features of rhesus vocalizations, compared these features with those of other classes of natural stimuli, and then related the rhesus-vocalization acoustic features to neural activity. We found that the range of these spectrotemporal features was similar to that found in other ensembles of natural stimuli, including human speech, and identified the subspace of these features that would be particularly informative for discriminating between different vocalizations. In a first neural study, however, we found that the tuning properties of vPFC neurons did not emphasize these particularly informative spectrotemporal features. In a second neural study, we found that a first-order linear model (the spectrotemporal receptive field) is not a good predictor of vPFC activity. The results of these two neural studies are consistent with the hypothesis that the vPFC is not involved in coding the first-order acoustic properties of a stimulus but is involved in processing the higher-order information needed to form representations of auditory objects.
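
The "first-order linear model" tested in the second neural study is the spectrotemporal receptive field (STRF). A minimal sketch of estimating one, by regressing a neuron's firing rate on a time-lagged spectrogram, follows; the ridge-regression choice and variable names are assumptions, not the authors' exact method.

```python
# Hedged sketch: STRF estimation as regularized linear regression from
# a lagged spectrogram to firing rate.
import numpy as np
from sklearn.linear_model import Ridge

def fit_strf(spec, rate, n_lags=20, alpha=1.0):
    """spec: (n_freq, n_frames) spectrogram; rate: (n_frames,) firing rate."""
    n_freq, n_frames = spec.shape
    X = np.zeros((n_frames - n_lags, n_freq * n_lags))
    for t in range(n_lags, n_frames):
        X[t - n_lags] = spec[:, t - n_lags:t].ravel()   # recent stimulus history
    model = Ridge(alpha=alpha).fit(X, rate[n_lags:])
    return model.coef_.reshape(n_freq, n_lags)          # the STRF estimate
```

A poor fit of such a model, as reported above, is what motivates the higher-order coding hypothesis.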



2014 ◽  
Vol 10 (1) ◽  
pp. 20130926 ◽  
Author(s):  
Tamás Faragó ◽  
Attila Andics ◽  
Viktor Devecseri ◽  
Anna Kis ◽  
Márta Gácsi ◽  
...  

Humans excel at assessing conspecific emotional valence and intensity, based solely on non-verbal vocal bursts that are also common in other mammals. It is not known, however, whether human listeners rely on similar acoustic cues to assess emotional content in conspecific and heterospecific vocalizations, and which acoustical parameters affect their performance. Here, for the first time, we directly compared the emotional valence and intensity perception of dog and human non-verbal vocalizations. We revealed similar relationships between acoustic features and emotional valence and intensity ratings of human and dog vocalizations: those with shorter call lengths were rated as more positive, whereas those with a higher pitch were rated as more intense. Our findings demonstrate that humans rate conspecific emotional vocalizations along basic acoustic rules, and that they apply similar rules when processing dog vocal expressions. This suggests that humans may utilize similar mental mechanisms for recognizing human and heterospecific vocal emotions.
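
The reported cue-rating relationships can be made concrete with a small sketch: correlating per-call duration and mean pitch with listeners' valence and intensity ratings. Variable names are assumed, and the study's actual statistical analysis may differ.

```python
# Illustrative check of the two reported relationships, assuming
# per-call arrays of acoustic measures and mean listener ratings.
from scipy.stats import pearsonr

def cue_rating_correlations(durations, pitches, valence, intensity):
    return {
        "duration_vs_valence": pearsonr(durations, valence),   # shorter calls -> more positive
        "pitch_vs_intensity": pearsonr(pitches, intensity),    # higher pitch -> more intense
    }
```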



Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4540
Author(s):  
Kieran Rendall ◽  
Antonia Nisioti ◽  
Alexios Mylonas

Phishing is one of the most common threats that users face while browsing the web. In the current threat landscape, a targeted phishing attack (i.e., spear phishing) often constitutes the first action of a threat actor during an intrusion campaign. To tackle this threat, many data-driven approaches have been proposed, which mostly rely on the use of supervised machine learning under a single-layer approach. However, such approaches are resource-demanding and, thus, their deployment in production environments is infeasible. Moreover, most previous works utilise a feature set that can be easily tampered with by adversaries. In this paper, we investigate the use of a multi-layered detection framework in which a potential phishing domain is classified multiple times by models using different feature sets. In our work, an additional classification takes place only when the initial one scores below a predefined confidence level, which is set by the system owner. We demonstrate our approach by implementing a two-layered detection system, which uses supervised machine learning to identify phishing attacks. We evaluate our system with a dataset consisting of active phishing attacks and find that its performance is comparable to the state of the art.
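
A minimal sketch of the layered decision logic described above: the second layer, trained on a different feature set, is consulted only when the first layer's confidence falls below the owner-defined threshold. The models and feature extractors are placeholders (any scikit-learn-style classifier exposing predict_proba would fit), not the paper's implementation.

```python
# Hedged sketch of two-layer classification with a confidence gate.
import numpy as np

def classify_domain(domain, layer1, layer2, feats1, feats2, threshold=0.9):
    p1 = layer1.predict_proba([feats1(domain)])[0]
    if p1.max() >= threshold:              # confident enough: stop at layer 1
        return int(np.argmax(p1)), "layer1"
    p2 = layer2.predict_proba([feats2(domain)])[0]   # different feature set
    return int(np.argmax(p2)), "layer2"
```

The threshold trades resource use against accuracy: a higher value sends more domains to the costlier second layer.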



2004 ◽  
Vol 82 (5) ◽  
pp. 769-779 ◽  
Author(s):  
Isabelle Charrier ◽  
Laurie L Bloomfield ◽  
Christopher B Sturdy

The chick-a-dee call of the black-capped chickadee, Poecile atricapillus (L., 1766), consists of four note types and is used in a wide variety of contexts, including mild alarm, contact between mates, and mobilizing members of winter flocks. Because note-type composition varies with context and because birds need to identify flock mates and individuals by their calls, it is important that birds be able to discriminate between note types and between individual birds. Previous experiments have shown that black-capped chickadees are able to discriminate their four note types, but the acoustical basis of this process is still unknown. Here, we present the results of a bioacoustic analysis suggesting which acoustic features may control the birds' perception of note types and of individual identity. Several acoustic features show high note-type and individual specificity, but frequency and frequency-modulation cues (in particular, those of the initial part of the note) appear the most likely to be used in these processes. However, only future experiments testing birds' perceptual abilities will determine which acoustic cues are actually used in the discrimination of note types and in individual recognition.
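
To make the suggested cues concrete, the sketch below measures the dominant frequency and its modulation over the initial part of a note, features of the kind the analysis points to, and indicates how a linear discriminant could separate note types. This is an illustration under assumed parameters, not the authors' analysis.

```python
# Illustrative sketch: frequency and frequency-modulation cues from the
# initial part of a note, fed to a linear discriminant over note types.
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def initial_freq_features(note, sr, head_s=0.05, n_fft=512):
    head = note[:int(head_s * sr)]                 # initial part of the note
    S = np.abs(librosa.stft(head, n_fft=n_fft, hop_length=n_fft // 4))
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    peak = freqs[S.argmax(axis=0)]                 # dominant frequency per frame
    slope = (peak[-1] - peak[0]) / head_s          # frequency modulation of the onset
    return [peak.mean(), slope]

# Given per-note features X and note-type labels y:
#   LinearDiscriminantAnalysis().fit(X, y)
# would test how well these cues alone separate the four note types.
```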



Author(s):  
J. Jyothi ◽  
K. Manjusha ◽  
M. Anand Kumar ◽  
K. P. Soman


2013 ◽  
Vol 846-847 ◽  
pp. 1672-1675 ◽  
Author(s):  
Yuan Ning Liu ◽  
Ye Han ◽  
Xiao Dong Zhu ◽  
Fei He ◽  
Li Yan Wei

A common spam-filtering approach extracts attributes from the e-mail header and uses machine learning methods to classify the sample sets. As time goes on, however, spammers change the ways they send spam, which results in great changes to spam headers, so attribute sets defined in the past cannot cope with these changes sufficiently. This paper extracts attributes from all header fields that could possibly be forged in order to expand the feature set, and then uses rough set theory to classify the sample sets. Experiments validate that including more attributes in the feature set leads to better performance than other algorithms, in terms of higher recall and precision and a lower false-recognition rate.
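
A hedged sketch of the attribute-extraction step: pulling simple attributes from header fields that spammers commonly forge. The field list and checks are illustrative assumptions; in the paper, such attributes would then feed a rough-set classifier rather than any model shown here.

```python
# Illustrative extraction of header attributes from forgeable fields.
from email import message_from_string

FORGEABLE = ["From", "Reply-To", "Message-ID", "Received", "Date", "X-Mailer"]

def header_attributes(raw_email):
    msg = message_from_string(raw_email)
    attrs = {}
    for field in FORGEABLE:
        value = msg.get(field, "")
        attrs[f"{field}_present"] = int(bool(value))   # field exists at all
        attrs[f"{field}_len"] = len(value)             # crude anomaly signal
    # Example forgery cue: From and Reply-To domains disagree.
    dom = lambda v: v.rsplit("@", 1)[-1].strip("> ") if "@" in v else ""
    attrs["reply_domain_mismatch"] = int(
        dom(msg.get("From", "")) != dom(msg.get("Reply-To", msg.get("From", ""))))
    return attrs
```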


