On audio recognition performance via robust hashing

Author(s):  
C. Bellettini ◽  
G. Mazzini


1991 ◽
Vol 34 (2) ◽  
pp. 415-426 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne ◽  
Heather A. Cote

This investigation examined the degree to which modification of the consonant-vowel (C-V) intensity ratio affected consonant recognition under conditions in which listeners were forced to rely more heavily on waveform envelope cues than on spectral cues. The stimuli were 22 vowel-consonant-vowel utterances, which had been mixed at six different signal-to-noise ratios with white noise that had been modulated by the speech waveform envelope. The resulting waveforms preserved the gross speech envelope shape, but spectral cues were limited by the white-noise masking. In a second stimulus set, the consonant portion of each utterance was amplified by 10 dB. Sixteen subjects with normal hearing listened to the unmodified stimuli, and 16 listened to the amplified-consonant stimuli. Recognition performance was reduced in the amplified-consonant condition for some consonants, presumably because waveform envelope cues had been distorted. However, for other consonants, especially the voiced stops, consonant amplification improved recognition. Patterns of errors were altered for several consonant groups, including some that showed only small changes in recognition scores. The results indicate that when spectral cues are compromised, nonlinear amplification can alter waveform envelope cues for consonant recognition.
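The consonant-amplification manipulation described above (a fixed 10 dB gain applied to the consonant portion of a vowel-consonant-vowel waveform) can be sketched as follows; the segment boundaries and sample values are purely illustrative, not taken from the study's stimuli:

```python
import numpy as np

def amplify_segment(waveform, start, end, gain_db=10.0):
    """Apply a fixed gain (in dB) to one segment of a waveform,
    leaving the rest of the signal untouched."""
    out = waveform.astype(float).copy()
    # +10 dB corresponds to a ~3.16x amplitude scale (10 ** (10/20))
    out[start:end] *= 10 ** (gain_db / 20.0)
    return out

# Hypothetical VCV utterance: samples 100-199 stand in for the consonant.
vcv = np.ones(300)
modified = amplify_segment(vcv, 100, 200, gain_db=10.0)
```

Only the consonant segment is scaled, which is what distorts the relative waveform envelope shape across the utterance.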


2018 ◽  
Vol 1 (2) ◽  
pp. 34-44
Author(s):  
Faris E Mohammed ◽  
Dr. Eman M ALdaidamony ◽  
Prof. A. M Raid

Individual identification is a significant process that underlies many day-to-day activities. Identification is needed in workplaces, private zones, banks …etc. Individuals possess many characteristics that can be used for recognition, such as finger veins, irises, faces …etc. Finger vein and iris key-points are considered among the most promising biometric authentication modalities for their security and convenience. SIFT is a new and promising technique for pattern recognition. However, many related techniques suffer from shortcomings such as feature loss, difficult feature key-point extraction, and the introduction of noise points. In this manuscript, a new technique, SIFT-based iris and SIFT-based finger vein identification with normalization and enhancement, is proposed to achieve better performance. Compared with other SIFT-based iris or SIFT-based finger vein recognition algorithms, the suggested technique overcomes the difficulty of excessive key-point extraction and excludes noise points without feature loss. Experimental results demonstrate that the normalization and enhancement steps are critical for SIFT-based iris and finger vein recognition, and that the proposed technique achieves satisfactory recognition performance. Keywords: SIFT, Iris Recognition, Finger Vein Identification, Biometric Systems. © 2018 JASET, International Scholars and Researchers Association
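A standard way to exclude ambiguous or noisy SIFT key-points, as the abstract alludes to, is Lowe's nearest-neighbour ratio test on the descriptors. The sketch below is a generic illustration of that test with toy descriptor arrays, not the paper's specific pipeline:

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Match descriptors from image A to image B, keeping only matches
    whose nearest neighbour is clearly closer than the second-nearest
    (Lowe's ratio test), which discards ambiguous/noisy key-points."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy 2-D "descriptors" (real SIFT descriptors are 128-D).
desc_a = np.array([[0.0, 0.0], [5.0, 5.0]])
desc_b = np.array([[0.1, 0.0], [10.0, 10.0], [5.0, 5.05]])
matches = ratio_test_match(desc_a, desc_b)
```

Key-points whose best match is not decisively better than the runner-up are dropped, which is one common way to avoid noise points without discarding genuine features.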


2019 ◽  
Author(s):  
Alex Bertrams ◽  
Katja Schlegel

People high in autistic-like traits have been found to have difficulties recognizing emotions from nonverbal expressions. However, findings on the autism-emotion recognition relationship are inconsistent. In the present study, we investigated whether speeded reasoning ability (reasoning performance under time pressure) moderates the inverse relationship between autistic-like traits and emotion recognition performance. We expected the negative correlation between autistic-like traits and emotion recognition to be weaker when speeded reasoning ability was high. MTurkers (N = 217) completed the ten-item version of the Autism Spectrum Quotient (AQ-10), two emotion recognition tests using videos with sound (Geneva Emotion Recognition Test, GERT-S) and pictures (Reading the Mind in the Eyes Test, RMET), and Baddeley's Grammatical Reasoning test to measure speeded reasoning. As expected, the higher the speeded reasoning ability, the weaker the relationship between higher autistic-like traits and lower emotion recognition performance. These results suggest that a high ability to make quick mental inferences may (partly) compensate for the difficulties with intuitive emotion recognition related to autistic-like traits.
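Moderation of this kind is conventionally tested as an ordinary-least-squares regression with an interaction term. The sketch below simulates data with the pattern the study reports (a negative AQ effect that weakens as reasoning increases); the variable names and coefficients are illustrative assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 217
aq = rng.normal(size=n)          # autistic-like traits (standardized)
reasoning = rng.normal(size=n)   # speeded reasoning ability (standardized)
# Simulated outcome: AQ hurts emotion recognition less when reasoning is high.
emo = (-0.4 * aq + 0.2 * reasoning + 0.3 * aq * reasoning
       + rng.normal(scale=0.5, size=n))

# OLS with intercept, main effects, and the AQ x reasoning interaction.
X = np.column_stack([np.ones(n), aq, reasoning, aq * reasoning])
beta, *_ = np.linalg.lstsq(X, emo, rcond=None)
# A positive interaction coefficient (beta[3]) indicates the negative
# AQ effect on emotion recognition weakens as reasoning increases.
```

With a positive interaction coefficient, the simple slope of AQ at high reasoning is less negative than at low reasoning, which is the moderation pattern the abstract describes.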


Author(s):  
Khamis A. Al-Karawi

Background & Objective: Speaker Recognition (SR) techniques have matured considerably over the past few decades. Existing methods typically use robust features extracted from clean speech signals, and in idealized conditions they can therefore achieve very high recognition accuracy. For critical applications, such as security and forensics, robustness and reliability of the system are crucial. Methods: Background noise and reverberation, as often occur in many real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust feature extraction technique for speech processing is proposed, capable of operating in both clean and noisy conditions. Mel Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the most common features used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, while GFCCs are obtained from a bank of Gammatone filters, originally proposed to model human cochlear filtering. This paper investigates the performance of GFCC and the conventional MFCC features in clean and noisy conditions. The effects of the Signal-to-Noise Ratio (SNR) and language mismatch on system performance are also taken into account. Conclusion: Experimental results show significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise and various SNRs was quantified via simulation. Results of the study are also presented and discussed.
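The MFCC computation the abstract summarizes (log energies in mel-spaced bands followed by a discrete cosine transform) can be sketched in a simplified form; the filter counts and FFT size below are common illustrative choices, not values from this paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_power_spectrum(power, sr, n_filters=26, n_ceps=13):
    """Toy MFCC: triangular mel filterbank -> log energies -> DCT-II."""
    n_bins = len(power)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_bins - 1) * mel_to_hz(mel_points) / (sr / 2.0)).astype(int)
    energies = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, hi):
            # triangular weight rising to 1 at `mid`, falling to 0 at `hi`
            w = (k - lo) / max(mid - lo, 1) if k < mid else (hi - k) / max(hi - mid, 1)
            energies[i] += w * power[k]
        energies[i] = np.log(energies[i] + 1e-10)
    # DCT-II over the log filterbank energies yields cepstral coefficients.
    n = np.arange(n_filters)
    return np.array([np.sum(energies * np.cos(np.pi * q * (2 * n + 1) / (2 * n_filters)))
                     for q in range(n_ceps)])

ceps = mfcc_from_power_spectrum(np.ones(257), sr=16000)
```

A GFCC front end differs mainly in replacing the triangular mel filterbank with a Gammatone filterbank before the cepstral transform.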


2013 ◽  
Vol 37 (3) ◽  
pp. 611-620
Author(s):  
Ing-Jr Ding ◽  
Chih-Ta Yen

The Eigen-FLS approach, which uses an eigenspace-based scheme for fast fuzzy logic system (FLS) establishment, has been applied successfully to speech pattern recognition. However, speech pattern recognition by Eigen-FLS still suffers unsatisfactory recognition performance when the data collected for the eigenvalue calculations of the FLS eigenspace are scarce. To tackle this issue, this paper proposes two improved Eigen-FLS methods, incremental MLED Eigen-FLS and EigenMLLR-like Eigen-FLS, both of which use a linear interpolation scheme to properly adjust the target speaker's Eigen-FLS model derived from an FLS eigenspace. The developed incremental MLED Eigen-FLS and EigenMLLR-like Eigen-FLS are superior to the conventional Eigen-FLS, especially when data from the target speaker are insufficient.
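The linear interpolation idea mentioned above can be illustrated generically: blend a data-derived eigenspace model with a prior (speaker-independent) model, with a weight that grows as more target-speaker data arrive. The weighting rule and the constant `tau` below are my assumptions for illustration, not the paper's formulation:

```python
import numpy as np

def interpolate_model(eigen_model, prior_model, n_samples, tau=50.0):
    """Blend a data-derived eigenspace model with a prior model.
    With scarce adaptation data the prior dominates; with abundant
    data the eigenspace estimate dominates. `tau` is a hypothetical
    smoothing constant, not a value from the paper."""
    w = n_samples / (n_samples + tau)
    return w * np.asarray(eigen_model, dtype=float) + (1.0 - w) * np.asarray(prior_model, dtype=float)

blended = interpolate_model([1.0, 1.0], [0.0, 0.0], n_samples=50)
```

This kind of shrinkage toward a prior is a standard remedy when the target speaker's data are too scarce to estimate the model reliably on their own.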


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 52
Author(s):  
Tianyi Zhang ◽  
Abdallah El Ali ◽  
Chen Wang ◽  
Alan Hanjalic ◽  
Pablo Cesar

Recognizing user emotions while they watch short-form videos anytime and anywhere is essential for facilitating video content customization and personalization. However, most works either classify a single emotion per video stimulus, or are restricted to static, desktop environments. To address this, we propose a correlation-based emotion recognition algorithm (CorrNet) to recognize the valence and arousal (V-A) of each instance (fine-grained segment of signals) using only wearable, physiological signals (e.g., electrodermal activity, heart rate). CorrNet takes advantage of features both inside each instance (intra-modality features) and between different instances for the same video stimulus (correlation-based features). We first test our approach on an indoor-desktop affect dataset (CASE), and thereafter on an outdoor-mobile affect dataset (MERCA) which we collected using a smart wristband and a wearable eye tracker. Results show that for subject-independent binary classification (high-low), CorrNet yields promising recognition accuracies: 76.37% and 74.03% for V-A on CASE, and 70.29% and 68.15% for V-A on MERCA. Our findings show that: (1) instance segment lengths between 1–4 s result in the highest recognition accuracies; (2) accuracies between laboratory-grade and wearable sensors are comparable, even at low sampling rates (≤64 Hz); and (3) large amounts of neutral V-A labels, an artifact of continuous affect annotation, result in varied recognition performance.
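The instance-level feature idea can be illustrated with a minimal sketch: split a physiological signal into fixed-length instances, then compute per-instance (intra) statistics and correlation-based features relating each instance to the other instances of the same stimulus. This is a simplified reading of the CorrNet idea, not the published architecture:

```python
import numpy as np

def instance_features(signal, sr, seg_len_s=2.0):
    """Split a physiological signal into fixed-length instances and compute
    (a) intra-instance features (mean, std) and (b) a correlation-based
    feature: each instance's Pearson correlation with the mean instance
    of the same stimulus."""
    seg = int(sr * seg_len_s)
    n = len(signal) // seg
    instances = np.reshape(signal[:n * seg], (n, seg))
    intra = np.column_stack([instances.mean(axis=1), instances.std(axis=1)])
    ref = instances.mean(axis=0)  # stimulus-level reference instance
    corr = np.array([np.corrcoef(inst, ref)[0, 1] for inst in instances])
    return intra, corr

# Hypothetical 10 s signal sampled at 100 Hz -> five 2 s instances.
sig = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))
intra, corr = instance_features(sig, sr=100)
```

Per-instance labels of this kind are what allow fine-grained V-A recognition rather than one emotion per video.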


Author(s):  
Wei Jia ◽  
Wei Xia ◽  
Yang Zhao ◽  
Hai Min ◽  
Yan-Xiang Chen

Palmprint recognition and palm vein recognition are two emerging biometric technologies. In the past two decades, many traditional methods have been proposed for palmprint recognition and palm vein recognition and have achieved impressive results. In recent years, in the field of artificial intelligence, deep learning has gradually become the mainstream recognition technology because of its excellent recognition performance. Some researchers have tried to use convolutional neural networks (CNNs) for palmprint recognition and palm vein recognition. However, the architectures of these CNNs have mostly been designed manually by human experts, which is a time-consuming and error-prone process. To overcome the shortcomings of manually designed CNNs, neural architecture search (NAS) has become an important research direction in deep learning. NAS addresses the problem of tuning a deep learning model's architecture automatically, combining optimization and machine learning, and represents an important future direction for deep learning. However, up to now, NAS has not been well studied for palmprint recognition or palm vein recognition. In this paper, to investigate NAS-based 2D and 3D palmprint recognition and palm vein recognition in depth, we conduct a performance evaluation of twenty representative NAS methods on five 2D palmprint databases, two palm vein databases, and one 3D palmprint database. Experimental results show that some NAS methods achieve promising recognition results. Remarkably, among the evaluated NAS methods, ProxylessNAS achieves the best recognition performance.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1685
Author(s):  
Sakorn Mekruksavanich ◽  
Anuchit Jitpattanakul

Sensor-based human activity recognition (S-HAR) has become an important and high-impact research topic within human-centered computing. In the last decade, successful applications of S-HAR have been presented through fruitful academic research and industrial applications, including healthcare monitoring, smart home control, and daily sport tracking. However, the growing requirements of many current applications for recognizing complex human activities (CHA), as compared with simple human activities (SHA), have begun to attract the attention of the HAR research field. S-HAR work has shown that deep learning (DL), a type of machine learning based on complicated artificial neural networks, achieves a significant degree of recognition efficiency. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are two types of DL methods that have been successfully applied to the S-HAR challenge in recent years. In this paper, we focus on four RNN-based DL models (LSTMs, BiLSTMs, GRUs, and BiGRUs) for complex activity recognition tasks. The efficiency of four hybrid DL models that combine convolutional layers with these RNN-based models is also studied. Experimental studies on the UTwente dataset demonstrate that the suggested hybrid RNN-based models achieve a high level of recognition performance across a variety of performance indicators, including accuracy, F1-score, and confusion matrix. The experimental results show that the hybrid DL model called CNN-BiGRU outperformed the other DL models, with a high accuracy of 98.89% when using only complex activity data. Moreover, the CNN-BiGRU model also achieved the highest recognition performance in the other scenarios (99.44% using only simple activity data and 98.78% with a combination of simple and complex activities).
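The performance indicators the abstract lists (accuracy, F1-score, confusion matrix) are related in a standard way: both accuracy and per-class F1 can be derived from the confusion matrix. A minimal sketch, with a toy two-class matrix rather than the paper's results:

```python
import numpy as np

def metrics_from_confusion(cm):
    """Overall accuracy and per-class F1 from a confusion matrix
    (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return accuracy, f1

cm = [[50, 2], [3, 45]]  # toy 2-class confusion matrix
acc, f1 = metrics_from_confusion(cm)
```

Reporting F1 alongside accuracy matters here because activity datasets are often class-imbalanced, and accuracy alone can mask poor recognition of rare activities.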

