Combination of SVM and Score Normalization for Person Identification Based on Audio-Visual Feature Fusion

2011 ◽  
Vol 186 ◽  
pp. 236-240
Author(s):  
Jie Cao ◽  
Di Wu ◽  
Zong Li Liu ◽  
Peng Pan

Aimed at the problem of low accuracy rates for face recognition and speaker recognition in noisy environments, a multi-biometric model fusing face features and speech features is presented, combining score normalization and SVM theory on the basis of feature-level fusion research. Face features and speech features are first extracted by a pulse-coupled neural network and VQ-SVM, respectively. The distance between the test subject and the template subject is then calculated from the fused feature obtained at the feature level. To reduce computational cost and improve recognition performance, the matching distance is normalized and finally classified by an SVM. Experiments on the ORL database show that even as the signal-to-noise ratio declines, the recognition rate of the fused system remains clearly higher than that of either single-modality system, achieving the goal of identity recognition in noisy environments.
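A minimal sketch of the normalize-then-classify stage described above, assuming synthetic matching distances in place of the paper's pulse-coupled-neural-network and VQ-SVM outputs (all names and values here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical matching distances, one row per trial:
# [face_distance, speech_distance]. Genuine pairs tend to be closer.
rng = np.random.default_rng(0)
genuine = rng.normal(loc=[0.3, 0.4], scale=0.1, size=(100, 2))
impostor = rng.normal(loc=[0.7, 0.8], scale=0.1, size=(100, 2))
X = np.vstack([genuine, impostor])
y = np.hstack([np.ones(100), np.zeros(100)])  # 1 = same person

# Min-max normalization brings both modalities' distances onto a
# comparable [0, 1] scale before the final decision.
X_norm = MinMaxScaler().fit_transform(X)

# The SVM makes the final accept/reject decision on normalized distances.
clf = SVC(kernel="rbf").fit(X_norm, y)
print("training accuracy:", clf.score(X_norm, y))
```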

2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Ujwalla Gawande ◽  
Mukesh Zaveri ◽  
Avichal Kapur

Recent times have witnessed many advances in the fields of biometric and multimodal biometric systems, typically in the areas of security, privacy, and forensics. Even the best unimodal biometric systems often cannot achieve a high recognition rate. Multimodal biometric systems overcome various limitations of unimodal systems, such as nonuniversality, while offering lower false acceptance and higher genuine acceptance rates. More reliable recognition performance is achievable because multiple pieces of evidence of the same identity are available. The work presented in this paper focuses on a multimodal biometric system using fingerprint and iris. Distinct texture features of the iris and fingerprint are extracted using a Haar-wavelet-based technique. A novel feature-level fusion algorithm is developed to combine these unimodal features using the Mahalanobis distance technique. A support-vector-machine-based learning algorithm is used to train the system on the extracted features. The performance of the proposed algorithms is validated and compared with other algorithms using the CASIA iris database and a real fingerprint database. The simulation results show that our algorithm achieves a higher recognition rate and a much lower false rejection rate than existing approaches.
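A rough sketch of the wavelet-plus-Mahalanobis idea, assuming PyWavelets for the Haar decomposition and an identity matrix as a stand-in for an inverse covariance estimated from training data:

```python
import numpy as np
import pywt

def haar_features(img, level=2):
    # Multi-level Haar decomposition; keep the approximation
    # coefficients as a compact texture descriptor.
    coeffs = pywt.wavedec2(img, "haar", level=level)
    return coeffs[0].ravel()

# Hypothetical pre-segmented grayscale iris and fingerprint patches.
iris = np.random.rand(64, 64)
fingerprint = np.random.rand(64, 64)

# Feature-level fusion by concatenating the two unimodal vectors.
fused = np.concatenate([haar_features(iris), haar_features(fingerprint)])

def mahalanobis(x, template, cov_inv):
    # Distance between a fused probe vector and an enrolled template.
    d = x - template
    return float(np.sqrt(d @ cov_inv @ d))

cov_inv = np.eye(fused.size)  # placeholder inverse covariance
print(mahalanobis(fused, 0.95 * fused, cov_inv))
```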


2021 ◽  
Vol 5 (4) ◽  
pp. 229-250
Author(s):  
Chetana Kamlaskar ◽ 
Aditya Abhyankar
Reliable and accurate multimodal-biometric-based person verification demands an effective discriminant feature representation and fusion of the relevant information extracted across multiple biometric modalities. In this paper, we propose feature-level fusion that adopts canonical correlation analysis (CCA) to fuse iris and fingerprint feature sets of the same person. The uniqueness of this approach is that it extracts maximally correlated features from the feature sets of both modalities as effective discriminant information within the feature sets. CCA is therefore suitable for analyzing the underlying relationship between two feature spaces and generates more powerful feature vectors by removing redundant information. We demonstrate that efficient multimodal recognition can be achieved with a significant reduction in feature dimensions, low computational complexity, and a recognition time of less than one second by exploiting CCA-based joint feature fusion and optimization. To evaluate the performance of the proposed system, the left and right irises and the thumb fingerprints of both hands from the SDUMLA-HMT multimodal dataset are considered in this experiment. We show that our proposed approach significantly outperforms unimodal recognition in terms of equal error rate (EER). We also demonstrate that CCA-based feature fusion outperforms match-score-level fusion. Further, an exploration of the correlation between right-iris and left-fingerprint images (EER of 0.1050%) and between left-iris and right-fingerprint images (EER of 1.4286%) is presented to consider the effect of feature dominance and laterality of the selected modalities for a robust multimodal biometric system.
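The CCA fusion step maps directly onto scikit-learn's implementation; a minimal sketch with random matrices standing in for the iris and fingerprint feature sets:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Random matrices stand in for the iris and fingerprint feature sets of
# the same N subjects (rows are paired samples).
N = 200
X_iris = np.random.rand(N, 120)
X_fing = np.random.rand(N, 80)

# CCA learns paired projections that maximize correlation between the two
# views; n_components sets the fused dimensionality (a large reduction).
cca = CCA(n_components=20)
U, V = cca.fit_transform(X_iris, X_fing)

# Two common fusion rules over the canonical variates:
fused_serial = np.hstack([U, V])  # concatenation (serial) fusion
fused_parallel = U + V            # summation (parallel) fusion
print(fused_serial.shape, fused_parallel.shape)
```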


The performance of the Mel scale and the Bark scale is evaluated for a text-independent speaker identification system. Both scales are designed according to the human auditory system. A filter bank structure defined on the Mel and Bark scales is used in speech and speaker recognition systems to extract speaker-specific speech features. The Mel scale follows the human ear's interpretation of pitch, while the Bark scale is based on critical-band selectivity, at which loudness becomes significantly different. It is found that Bark-scale centre frequencies are more effective than Mel-scale centre frequencies for Indian-dialect speaker databases. The recognition rate achieved using the Bark-scale filter bank is 96% for the AISSMSIOIT database and 95% for the Marathi database.
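The two perceptual scales are simple closed-form warpings of frequency; a small sketch comparing their centre-frequency mappings, using the common O'Shaughnessy Mel formula and the Zwicker-Terhardt Bark approximation:

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy's Mel formula: pitch-based warping of frequency.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    # Zwicker & Terhardt's Bark approximation: critical-band warping.
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

# Filter-bank centre frequencies placed uniformly on each scale diverge
# above roughly 1 kHz, which is where the two banks weight speaker
# information differently.
freqs = np.array([100.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0])
print("Mel :", np.round(hz_to_mel(freqs), 1))
print("Bark:", np.round(hz_to_bark(freqs), 2))
```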


2020 ◽  
Author(s):  
Chaofeng Lan ◽ 
Yuanyuan Zhang ◽ 
Hongyun Zhao

Abstract: This paper draws on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers of the RNN, changing the input-layer activation function from the traditional sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to enhance the effective utilization of the data, an improved Denoising Recurrent Neural Network (DRNN) reduction model with high computation speed and good convergence is constructed to address the problem of low speaker recognition rates in noisy environments. Using this model, random semantic speech signals from the speech library, sampled at 16 kHz with a duration of 5 seconds, are studied. The signal-to-noise ratios in the experiments are set to -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel-Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional and improved models on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate, with the gain most pronounced at low signal-to-noise ratios. At a signal-to-noise ratio of 0 dB, the speaker recognition rate increases by 40%, reaching 85%, compared with the traditional speech model. As the signal-to-noise ratio increases, the recognition rate rises gradually; at 15 dB, the speaker recognition rate reaches 93%.
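A minimal sketch of the described architecture, assuming a Keras stack in which a leaky-ReLU layer replaces the sigmoid input layer and stacked recurrent layers map noisy MFCC/GFCC frames to denoised ones (layer sizes are illustrative, not the paper's):

```python
import numpy as np
import tensorflow as tf

n_frames, n_coeff = 100, 13  # illustrative sequence length and MFCC order

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_frames, n_coeff)),
    # Leaky ReLU on the input-side layer instead of the traditional sigmoid.
    tf.keras.layers.Dense(64, activation=tf.nn.leaky_relu),
    # Additional hidden recurrent layers, per the abstract.
    tf.keras.layers.SimpleRNN(64, return_sequences=True),
    tf.keras.layers.SimpleRNN(64, return_sequences=True),
    tf.keras.layers.Dense(n_coeff),  # denoised feature frames out
])
model.compile(optimizer="adam", loss="mse")

# Zero-padding the first and last frame groups keeps edge frames usable.
noisy = np.random.randn(8, n_frames, n_coeff).astype("float32")
padded = np.pad(noisy, ((0, 0), (4, 4), (0, 0)))  # illustrative padding
print(model(noisy).shape, padded.shape)
```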


Electronics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 20
Author(s):  
Linhui Sun ◽  
Yunyi Bu ◽  
Bo Zou ◽  
Sheng Fu ◽  
Pingan Li

Extracting a speaker’s personalized feature parameters is vital for speaker recognition, and no single kind of feature can fully reflect the speaker’s personality information. In order to represent the speaker’s identity more comprehensively and improve the speaker recognition rate, we propose a speaker recognition method based on the fusion of deep and shallow recombination Gaussian supervectors. In this method, deep bottleneck features are first extracted by a Deep Neural Network (DNN) and used as input to a Gaussian Mixture Model (GMM) to obtain the deep Gaussian supervector. In parallel, the Mel-Frequency Cepstral Coefficients (MFCC) are input to the GMM directly to extract the traditional Gaussian supervector. Finally, the two categories of features are combined by horizontal dimension augmentation. In addition, to prevent the recognition rate from falling sharply as the number of speakers to be recognized increases, we introduce an optimization algorithm to find the optimal weight before the feature fusion. The experimental results indicate that the speaker recognition rate based on the directly fused feature reaches 98.75%, which is 5% and 0.62% higher than the traditional feature and the deep bottleneck feature, respectively. When the number of speakers increases, the fusion feature based on optimized weight coefficients improves the recognition rate by 0.81%. This validates that our proposed fusion method effectively exploits the complementarity of the different types of features and improves the speaker recognition rate.
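A compact sketch of the supervector idea, using scikit-learn's GaussianMixture and treating the stacked component means as the supervector (a simplification of the usual UBM-plus-MAP-adaptation pipeline):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def supervector(frames, n_components=8):
    # Fit a small GMM and stack its component means into one long vector;
    # a simplification of the usual UBM + MAP-adaptation pipeline.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0).fit(frames)
    return gmm.means_.ravel()

rng = np.random.default_rng(1)
frames_mfcc = rng.normal(size=(2000, 13))  # hypothetical MFCC frames
frames_bn = rng.normal(size=(2000, 40))    # hypothetical DNN bottleneck frames

sv_shallow = supervector(frames_mfcc)  # traditional Gaussian supervector
sv_deep = supervector(frames_bn)       # deep Gaussian supervector

# "Horizontal dimension augmentation" = concatenation; w stands in for
# the optimized weight the abstract introduces for larger speaker sets.
w = 0.5
fused = np.concatenate([w * sv_shallow, (1.0 - w) * sv_deep])
print(fused.shape)
```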


Author(s):  
Mrs. G. Ananthi ◽  
Dr. J. Raja Sekar ◽  
D. Apsara ◽  
A. K. Gajalakshmi ◽  
S. Tapthi

Palm print identification has been used in various applications over several years, and various methods have been proposed for providing biometric security through palm print authentication. One such method is feature-level fusion, which uses multiple feature extractions and gives higher accuracy, but it requires designing a new matcher and acquiring many training samples, and it cannot adapt to scenarios like multimodal biometrics, regional fusion, contactless acquisition, and complete direction representation. This problem is overcome by the score-level fusion method. In this article, we propose a salient and discriminative descriptor learning method (SDDLM) combined with the gray-level co-occurrence matrix (GLCM). The score values of SDDLM and GLCM are integrated using score-level fusion to provide an enhanced score. Experiments were conducted on the IITD palmprint V1 database. The combination of the SDDLM and GLCM methods is useful in achieving higher performance: it provides a good recognition rate and reduces the computation burden.
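The GLCM half of the matcher and the weighted score-sum are straightforward to sketch with scikit-image; the SDDLM score below is a hypothetical placeholder, since the learned descriptor itself is not reproduced here:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# GLCM texture statistics from a palm-print patch.
rng = np.random.default_rng(2)
patch = rng.integers(0, 256, (64, 64), dtype=np.uint8)
glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
glcm_feat = np.hstack([graycoprops(glcm, p).ravel()
                       for p in ("contrast", "homogeneity", "energy")])

# Score-level fusion: each matcher emits a similarity score; scores are
# min-max normalized and combined with a weighted sum.
def minmax(s, lo, hi):
    return (s - lo) / (hi - lo)

score_glcm = minmax(0.62, 0.0, 1.0)   # hypothetical GLCM matcher score
score_sddlm = minmax(0.71, 0.0, 1.0)  # hypothetical SDDLM matcher score
fused_score = 0.5 * score_glcm + 0.5 * score_sddlm
print(glcm_feat.shape, fused_score)
```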


2021 ◽  
Vol 57 (2) ◽  
pp. 313-321
Author(s):  
S Shuma ◽ 
T. Christy Bobby ◽ 
S. Malathi ◽ 
...

Emotion recognition is important in human communication and in achieving complete interaction between humans and machines. In medical applications, emotion recognition is used to assist children with Autism Spectrum Disorder (ASD) in improving their socio-emotional communication, to help doctors diagnose diseases such as depression and dementia, and to help the caretakers of older patients monitor their well-being. This paper discusses the application of feature-level fusion of speech and facial expressions for emotions such as neutral, happy, sad, angry, surprise, fearful, and disgust, and explores how best to build deep learning networks to classify the emotions independently and jointly from these two modalities. A VGG model is utilized to extract features from facial images, and spectral features are extracted from speech signals. Further, a feature-level fusion technique is adopted to fuse the features extracted from the two modalities, and Principal Component Analysis (PCA) is implemented to choose the significant features. The proposed method achieved a maximum score of 90% on the training set and 82% on the validation set. The recognition rate for multimodal data improved greatly compared to the unimodal systems; the multimodal system gave an improvement of 9% over the performance of the speech-based system. The results thus show that the proposed Multimodal Emotion Recognition (MER) system outperforms the unimodal emotion recognition systems.
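The fuse-then-reduce step can be sketched in a few lines, with random matrices standing in for the VGG face features and the spectral speech features:

```python
import numpy as np
from sklearn.decomposition import PCA

n_clips = 50
face_feat = np.random.rand(n_clips, 512)    # stand-in for VGG face features
speech_feat = np.random.rand(n_clips, 128)  # stand-in for spectral features

# Feature-level fusion: concatenate the two modality vectors per clip.
fused = np.hstack([face_feat, speech_feat])

# PCA keeps the significant components before the emotion classifier.
pca = PCA(n_components=32)
fused_reduced = pca.fit_transform(fused)
print(fused_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```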


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Rabab A. Rasool

The design of a robust human identification system is in high demand in most modern applications such as internet banking and security, where the multifeature biometric system, also called a feature-fusion biometric system, is one of the common solutions for increasing system reliability and improving recognition accuracy. This paper implements a comprehensive comparison between two fusion methods, feature-level fusion and score-level fusion, to determine which method most improves overall system performance. The comparison takes into consideration the image quality of the six combination datasets as well as the type of feature extraction method applied. Four feature extraction methods, local binary pattern (LBP), gray-level co-occurrence matrix (GLCM), principal component analysis (PCA), and Fourier descriptors (FDs), are applied separately to generate the face-iris machine vector dataset. The experimental results highlight that recognition accuracy improves significantly when a texture descriptor method such as LBP, or a statistical method such as PCA, is used with score-level rather than feature-level fusion, for all combination datasets. The maximum recognition accuracy, 97.53%, is obtained with LBP and score-level fusion, where the Euclidean distance (ED) is used to measure the maximum accuracy rate at the minimum equal error rate (EER) value.
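A small sketch of the best-reported configuration's ingredients, assuming scikit-image's LBP and a simple sum-of-Euclidean-distances score fusion (the images here are synthetic placeholders):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(img, P=8, R=1.0):
    # Uniform LBP codes summarized as a normalized histogram.
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

rng = np.random.default_rng(3)
probe_face = rng.integers(0, 256, (64, 64), dtype=np.uint8)
probe_iris = rng.integers(0, 256, (64, 64), dtype=np.uint8)
tmpl_face = rng.integers(0, 256, (64, 64), dtype=np.uint8)
tmpl_iris = rng.integers(0, 256, (64, 64), dtype=np.uint8)

# Score-level fusion: one Euclidean distance per modality, then a sum rule.
fused_distance = (np.linalg.norm(lbp_hist(probe_face) - lbp_hist(tmpl_face))
                  + np.linalg.norm(lbp_hist(probe_iris) - lbp_hist(tmpl_iris)))
print(fused_distance)
```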


Author(s):  
Shaokang Zhang ◽  
Chao Wang ◽  
Qindong Sun

As one of the main signal sources for underwater acoustic target recognition, the target noise signal makes it difficult to characterize the target clearly compared with multi-sensor detection technology, which may lead to a lower recognition rate and a higher false alarm rate and seriously restricts the function of underwater acoustic detection systems. To solve this problem, a multi-layer LSTM feature extraction model for underwater acoustic target noise is established using the Long Short-Term Memory (LSTM) network. Information features such as the time-domain envelope of the target noise, the DEMON line spectrum, and the Mel-frequency cepstral coefficients are extracted, and a subset of multi-class features is constructed. On this basis, a feature-level fusion recognition and classification model based on the multi-class feature subset and a decision-level fusion recognition and classification model based on D-S evidence theory are established, and both models are tested using the sample database. The classification results of multi-class feature fusion and single-class feature recognition are compared, and the models are tested and verified using the relevant data from a harbor-basin verification experiment. The results show that the proposed intelligent recognition and classification method for underwater target noise based on multi-class feature fusion is more robust, and its recognition rate and false alarm rate for underwater targets are better than those of the single-category feature discrimination method.
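The decision-level half of the scheme rests on Dempster's combination rule; a minimal sketch for two classifiers assigning belief over the same singleton target classes (compound hypotheses omitted for brevity, and the belief values are hypothetical):

```python
import numpy as np

def dempster_combine(m1, m2):
    # Dempster's rule for two basic probability assignments over the
    # same singleton classes: keep agreeing mass, discard conflict,
    # and renormalize.
    joint = np.outer(m1, m2)
    agreement = np.trace(joint)  # mass where both assign the same class
    if agreement <= 0.0:
        raise ValueError("total conflict; the rule is undefined")
    return np.diag(joint) / agreement  # equivalent to dividing by 1 - conflict

# Hypothetical per-class beliefs from two feature subsets (e.g., the
# envelope/DEMON model vs. the MFCC model) over three target classes.
m_a = np.array([0.6, 0.3, 0.1])
m_b = np.array([0.5, 0.4, 0.1])
print(dempster_combine(m_a, m_b))  # fused decision-level beliefs
```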


Computers ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 21
Author(s):  
Mehwish Leghari ◽  
Shahzad Memon ◽  
Lachhman Das Dhomeja ◽  
Akhtar Hussain Jalbani ◽  
Asghar Ali Chandio

The extensive research in the field of multimodal biometrics by the research community and the advent of modern technology have compelled the use of multimodal biometrics in real-life applications. Biometric systems based on a single modality have many constraints, such as noise, lower universality, intra-class variations, and spoof attacks. Multimodal biometric systems, on the other hand, are gaining greater attention because of their high accuracy, increased reliability, and enhanced security. This paper proposes and develops a Convolutional Neural Network (CNN) based model for the feature-level fusion of fingerprint and online signature. Two feature-level fusion schemes are implemented: the first, named early fusion, combines the features of fingerprints and online signatures before the fully connected layers, while the second, named late fusion, combines the features after the fully connected layers. To train and test the proposed model, a new multimodal dataset consisting of 1400 fingerprint samples and 1400 online signature samples from 280 subjects was collected. To train the proposed model more effectively, the training data were further enlarged using augmentation techniques. The experimental results show an accuracy of 99.10% with the early feature fusion scheme and 98.35% with the late feature fusion scheme.
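The difference between the two fusion points is easy to see in a functional-API sketch (layer sizes and input shapes are illustrative assumptions; only the 280-subject output matches the paper's dataset):

```python
import tensorflow as tf

def branch(name):
    # One small convolutional branch per modality (sizes are illustrative).
    inp = tf.keras.layers.Input(shape=(64, 64, 1), name=name)
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inp)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    return inp, x

fp_in, fp_feat = branch("fingerprint")
sg_in, sg_feat = branch("signature")

# Early fusion: concatenate convolutional features BEFORE the dense layers.
early = tf.keras.layers.Concatenate()([fp_feat, sg_feat])
early = tf.keras.layers.Dense(128, activation="relu")(early)
out_early = tf.keras.layers.Dense(280, activation="softmax")(early)
model_early = tf.keras.Model([fp_in, sg_in], out_early)

# Late fusion: give each branch its own dense layer, THEN concatenate.
fp_fc = tf.keras.layers.Dense(128, activation="relu")(fp_feat)
sg_fc = tf.keras.layers.Dense(128, activation="relu")(sg_feat)
late = tf.keras.layers.Concatenate()([fp_fc, sg_fc])
out_late = tf.keras.layers.Dense(280, activation="softmax")(late)
model_late = tf.keras.Model([fp_in, sg_in], out_late)

model_early.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model_late.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model_early.count_params(), model_late.count_params())
```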

