Training time and memory reduction algorithms for Speaker Recognition

Author(s):  
P. Rama Koteswara Rao ◽  
D. Vijaya Kumar ◽  
Y. Srinivasa Rao
2013 ◽  
pp. 997-1017
Author(s):  
Sue Inn Ch'ng ◽  
Kah Phooi Seng ◽  
Li-Minn Ang ◽  
Fong Tien Ong

Biometrics is a promising and viable alternative to passwords for enhancing information security systems. However, several issues regarding large-scale deployment of biometrics in real-world situations still need to be resolved before biometrics can be incorporated into such systems. One of these issues is the long training time incurred when enrolling a large number of people into the system. Hence, in this chapter, the authors present a training architecture for an audio-visual system for large-scale people recognition over internet protocol. In the proposed architecture, a selection criteria divider unit decomposes the large population into smaller groups, each of which is subsequently trained. As the input dimensions of each group are reduced compared to the original data size, the proposed structure greatly reduces the overall training time required. To combine the scores from all groups, a two-level fusion based on the weighted sum rule and the max rule is also proposed in this chapter. The implementation results of the proposed system show a great reduction in training time compared to a similar system trained by conventional means, without any compromise on the performance of the system. In addition to the proposal of a scalable training architecture for large-scale people recognition based on audio-visual data, a literature review of available audio-visual speaker recognition systems and large-scale population training architectures is also presented in this chapter.
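The two-level fusion mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say at which level each rule is applied, so the ordering here is an assumption: a weighted sum combines the audio and visual scores of each enrolled person within a group, and a max rule then selects the best fused score across all groups. The function names, the data layout, and the weight `w_audio` are hypothetical and are not taken from the chapter.

```python
from typing import Dict, Tuple

# Hypothetical layout: {group_id: {person_id: (audio_score, visual_score)}}
GroupScores = Dict[str, Dict[str, Tuple[float, float]]]


def weighted_sum(audio: float, visual: float, w_audio: float = 0.5) -> float:
    """Level 1 (assumed): weighted sum rule over the two modalities."""
    return w_audio * audio + (1.0 - w_audio) * visual


def fuse_groups(group_scores: GroupScores, w_audio: float = 0.5) -> Tuple[str, float]:
    """Level 2 (assumed): max rule over the fused scores of all groups."""
    best_id, best_score = "", float("-inf")
    for group in group_scores.values():
        for person, (audio, visual) in group.items():
            score = weighted_sum(audio, visual, w_audio)
            if score > best_score:
                best_id, best_score = person, score
    return best_id, best_score


# Illustrative scores only; these are not results from the chapter.
example_scores: GroupScores = {
    "group_1": {"alice": (0.9, 0.5), "bob": (0.2, 0.4)},
    "group_2": {"carol": (0.6, 0.95)},
}
```

Calling `fuse_groups(example_scores)` returns the identity with the highest fused score across both groups. Because each group is scored independently, the per-group models can be trained on smaller inputs, which is the property the abstract credits for the reduced training time.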


2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks are considered: statistical modelling and computational intelligence modelling. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with the Maximum Likelihood (ML) estimation method. As a competitive tool to statistical forecasting models, we use the popular classic neural network (NN) of perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are implemented on a large data set. A comparative analysis of the selected learning methods is performed and evaluated. From the experiments performed, we find that the optimal population size is likely to be 20, which gives the lowest training time among all NNs trained by the evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.


2020 ◽  
pp. 1-12
Author(s):  
Changxin Sun ◽  
Di Ma

In research on intelligent sports vision systems, the stability and accuracy of target recognition, the reasonableness and effectiveness of task assignment, and the quality of path planning are the key factors for the vision system to perform tasks successfully. Aiming at the problem of target recognition errors caused by uneven brightness and mutations in sports competition, a dynamic template mechanism is proposed. In the target recognition algorithm, the correlation degree of data feature changes is fully considered, and a time control factor is introduced when using SVM for classification. At the same time, this study uses an unsupervised clustering method to design a classification strategy that achieves rapid target discrimination when the environmental brightness changes, which improves the accuracy of recognition. In addition, the Adaboost algorithm is selected as the machine learning method, and the algorithm is optimized in terms of fast feature selection and double threshold decision, which effectively improves the training time of the classifier. Finally, for complex human poses and partially occluded human targets, this paper proposes to express the entire human body through multiple parts. The experimental results show that this method can be used to detect sports players with multiple poses and partial occlusions in complex backgrounds, and that it provides an effective technical means for detecting sports competition action characteristics in complex backgrounds.


2020 ◽  
Vol 64 (4) ◽  
pp. 40404-1-40404-16
Author(s):  
I.-J. Ding ◽  
C.-M. Ruan

With rapid developments in techniques related to the internet of things, smart service applications such as voice-command-based speech recognition and smart care applications such as context-aware-based emotion recognition will gain much attention and potentially be a requirement in smart home or office environments. In such intelligence applications, identity recognition of the specific member in indoor spaces will be a crucial issue. In this study, a combined audio-visual identity recognition approach was developed. In this approach, visual information obtained from face detection was incorporated into acoustic Gaussian likelihood calculations for constructing speaker classification trees to significantly enhance the Gaussian mixture model (GMM)-based speaker recognition method. This study considered the privacy of the monitored person and reduced the degree of surveillance. Moreover, the popular Kinect sensor device containing a microphone array was adopted to obtain acoustic voice data from the person. The proposed audio-visual identity recognition approach deploys only two cameras in a specific indoor space for conveniently performing face detection and quickly determining the total number of people in the specific space. Such information pertaining to the number of people in the indoor space obtained using face detection was utilized to effectively regulate the accurate GMM speaker classification tree design. Two face-detection-regulated speaker classification tree schemes are presented for the GMM speaker recognition method in this study: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). The proposed GMM-BT and GMM-NBT methods achieve excellent identity recognition rates of 84.28% and 83%, respectively; both values are higher than the rate of the conventional GMM approach (80.5%). Moreover, as the extremely complex calculations of face recognition in general audio-visual speaker recognition tasks are not required, the proposed approach is rapid and efficient with only a slight increment of 0.051 s in the average recognition time.
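For orientation, the conventional GMM baseline that this abstract compares against can be sketched as a maximum-likelihood decision over per-speaker models. This is a minimal sketch of generic GMM scoring with diagonal covariances, not the GMM-BT or GMM-NBT tree schemes of the paper, whose construction is not described in the abstract. The function names, the model layout, and the toy parameters are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model layout: a GMM is a list of (weight, mean, variance)
# components with diagonal covariance.
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames: List[Sequence[float]], gmm: List[Component]) -> float:
    """Sum of per-frame log-likelihoods, using log-sum-exp over components."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def identify(frames: List[Sequence[float]], speaker_gmms: Dict[str, List[Component]]) -> str:
    """Return the speaker whose GMM gives the highest log-likelihood."""
    return max(speaker_gmms, key=lambda s: gmm_log_likelihood(frames, speaker_gmms[s]))


# Toy models and frames for illustration only; not data from the paper.
toy_gmms: Dict[str, List[Component]] = {
    "speaker_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
toy_frames = [[5.1, 4.9], [5.8, 6.2]]
```

Here `identify(toy_frames, toy_gmms)` picks the model that best explains the acoustic frames. In the approach described above, the number of people found by face detection regulates the classification tree design; the sketch leaves that step out.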

