Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks

Author(s):  
Heysem Kaya ◽  
Alexey A. Karpov
Author(s):  
Madhu R. Kamble ◽  
Hardik B. Sailor ◽  
Hemant A. Patil ◽  
Haizhou Li

Abstract In recent years, automatic speaker verification (ASV) has been used extensively for voice biometrics, which has increased interest in securing these voice biometric systems for real-world applications. ASV systems are vulnerable to various kinds of spoofing attacks, namely, synthetic speech (SS), voice conversion (VC), replay, twins, and impersonation. This paper reviews the literature on ASV spoof detection, novel acoustic feature representations, deep learning, end-to-end systems, etc. Furthermore, it summarizes previous studies of spoofing attacks, with emphasis on SS, VC, and replay, along with recent efforts to develop countermeasures for the spoof speech detection (SSD) task. The limitations and challenges of the SSD task are also presented. While several countermeasures have been reported in the literature, they are mostly validated on a particular database, and their performance is far from perfect; the security of voice biometric systems against spoofing attacks remains a challenging topic. This paper is based on a tutorial presented at the APSIPA Annual Summit and Conference 2017 and serves as a quick start for those interested in the topic.


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2730 ◽  
Author(s):  
Wei Jiang ◽  
Zheng Wang ◽  
Jesse S. Jin ◽  
Xianfeng Han ◽  
Chunguang Li

Automatic speech emotion recognition is a challenging task because of the gap between acoustic features and human emotions; performance relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that degrades emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% compared with existing state-of-the-art approaches.
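The pipeline described above can be sketched in a minimal form: heterogeneous acoustic feature groups are normalized and fused into one representation, which an SVM then classifies. This is a hedged illustration only — the group names, dimensionalities, and simple standardize-and-concatenate fusion are stand-ins for the paper's learned fusion network, and random data replaces real IEMOCAP features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical heterogeneous acoustic feature groups (e.g. prosodic,
# spectral, voice-quality descriptors), each with its own dimensionality.
n_samples = 200
groups = {
    "prosodic": rng.normal(size=(n_samples, 16)),
    "spectral": rng.normal(size=(n_samples, 40)),
    "voice_quality": rng.normal(size=(n_samples, 8)),
}
labels = rng.integers(0, 4, size=n_samples)  # 4 emotion classes

# Standardize each group separately, then concatenate into one fused
# representation (a simple stand-in for the learned fusion network).
fused = np.hstack([StandardScaler().fit_transform(g) for g in groups.values()])

# SVM as the final classifier, as in the described architecture.
clf = SVC(kernel="rbf").fit(fused, labels)
print(fused.shape)  # (200, 64)
```

In practice the fusion step would be a trained neural network whose intermediate representation feeds the SVM; the concatenation here only shows where each component sits in the pipeline.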


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words lie in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can then be encoded as vectors by summing the vectors of their individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which applies the same Word2vec concept to protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also be easily used for proteins with low sequence similarities.
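The compound-encoding step described above — summing the vectors of a compound's individual substructures — can be sketched as follows. This is an assumption-laden illustration: the substructure identifiers and random embeddings are placeholders for vectors a Word2vec model would learn from a corpus of Morgan-substructure "sentences", and the unseen-substructure handling is one plausible choice, not necessarily the paper's.

```python
import numpy as np

# Hypothetical substructure embeddings (in Mol2vec these come from a
# Word2vec model trained on a large corpus of compounds).
rng = np.random.default_rng(42)
dim = 8
substructure_vectors = {s: rng.normal(size=dim) for s in ["s1", "s2", "s3", "s4"]}

def encode_compound(substructures, vectors, dim):
    """Encode a compound as the sum of its substructure vectors;
    substructures unseen during training are skipped (treated as zero)."""
    vec = np.zeros(dim)
    for s in substructures:
        if s in vectors:
            vec += vectors[s]
    return vec

# A compound is a multiset of substructures; repeats contribute repeatedly.
compound = ["s1", "s3", "s3", "unknown"]
embedding = encode_compound(compound, substructure_vectors, dim)
print(embedding.shape)  # (8,)
```

The resulting dense vector can be fed directly into any supervised learner, which is the workflow the abstract describes for property and bioactivity prediction.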


Author(s):  
Anshul Thakur ◽  
Vinayak Abrol ◽  
Pulkit Sharma ◽  
Padmanabhan Rajan

AI ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 195-208
Author(s):  
Gabriel Dahia ◽  
Maurício Pamplona Segundo

We propose a method that can perform one-class classification given only a small number of examples from the target class and none from the others. We formulate the learning of meaningful features for one-class classification as a meta-learning problem in which the meta-training stage repeatedly simulates one-class classification, using the classification loss of the chosen algorithm to learn a feature representation. To learn these representations, we require only multiclass data from similar tasks. We show how the Support Vector Data Description method can be used with our method, and also propose a simpler variant based on Prototypical Networks that obtains comparable performance, indicating that learning feature representations directly from data may be more important than which one-class algorithm we choose. We validate our approach by adapting few-shot classification datasets to the few-shot one-class classification scenario, obtaining results similar to the state of the art in traditional one-class classification and improving upon those of one-class classification baselines employed in the few-shot setting.
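The simpler Prototypical-Networks-style variant mentioned above can be sketched as a scoring rule: summarize the few target-class examples by their mean embedding (the prototype) and score queries by distance to it. This is a hedged sketch, not the paper's implementation — in the actual method the embeddings come from a meta-learned feature extractor, whereas here random vectors stand in for them.

```python
import numpy as np

def one_class_prototype_scores(support, queries):
    """Prototypical-network-style one-class scoring: the target class is
    summarized by the mean (prototype) of a few support embeddings, and
    queries are scored by negative squared Euclidean distance to it
    (higher score = more likely in-class)."""
    prototype = support.mean(axis=0)
    return -np.sum((queries - prototype) ** 2, axis=1)

rng = np.random.default_rng(1)
# Few-shot support set from the target class: 5 examples, 16-dim embeddings.
support = rng.normal(loc=0.0, size=(5, 16))
in_class = rng.normal(loc=0.0, size=(10, 16))    # queries near the class
out_class = rng.normal(loc=5.0, size=(10, 16))   # queries far from the class

scores_in = one_class_prototype_scores(support, in_class)
scores_out = one_class_prototype_scores(support, out_class)
print(scores_in.mean() > scores_out.mean())  # in-class queries score higher
```

Thresholding these scores yields the one-class decision; the meta-training stage described in the abstract is what makes the embedding space one in which this simple distance rule works well.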


2021 ◽  
pp. 147248
Author(s):  
Niels T. Haumann ◽  
Massimo Lumaca ◽  
Marina Kliuchko ◽  
Jose L. Santacruz ◽  
Peter Vuust ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jermyn Z. See ◽  
Natsumi Y. Homma ◽  
Craig A. Atencio ◽  
Vikaas S. Sohal ◽  
Christoph E. Schreiner

Abstract Neuronal activity in auditory cortex is often highly synchronous between neighboring neurons. Such coordinated activity is thought to be crucial for information processing. We determined the functional properties of coordinated neuronal ensembles (cNEs) within primary auditory cortical (AI) columns relative to the contributing neurons. Nearly half of AI cNEs showed robust spectro-temporal receptive fields, whereas the remaining cNEs showed little or no acoustic feature selectivity. cNEs can therefore either capture specific, time-locked information about spectro-temporal stimulus features or reflect stimulus-unspecific, less time-specific processing aspects. By contrast, we show that individual neurons can represent both of these aspects through membership in multiple cNEs with either high or absent feature selectivity. These associations produce functionally heterogeneous spikes identifiable by their instantaneous association with different cNEs. This demonstrates that single-neuron spike trains can sequentially convey multiple aspects that contribute to cortical processing, including stimulus-specific and unspecific information.


Author(s):  
Jianchen Wang ◽  
Liming Yuan ◽  
Haixia Xu ◽  
Gengsheng Xie ◽  
Xianbin Wen
