Speaker Recognition With Normal and Telephonic Assamese Speech Using I-Vector and Learning-Based Classifier

2020 ◽  
pp. 805-829
Author(s):  
Mridusmita Sharma ◽  
Rituraj Kaushik ◽  
Kandarpa Kumar Sarma

Speaker recognition is the task of identifying a person from the unique identifying features or behavioural characteristics contained in his/her speech; it deals with establishing the identity of the speaker. It is a biometric modality that uses speaker features influenced both by the individual's behaviour and by the characteristics of the vocal cords. The problem becomes more complex when regional languages are considered. Here, the authors report the design of a speaker recognition system, using normal and telephonic Assamese speech as a case study. The authors use i-vectors as features to generate an optimal feature set and a feed-forward neural network for recognition, which gives a fairly high recognition rate.
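The abstract does not give the i-vector extraction details. As a hedged illustration, the sketch below computes the standard posterior-mean i-vector, w = (I + TᵀΣ⁻¹NT)⁻¹ TᵀΣ⁻¹F̃, from one utterance's Baum-Welch statistics. It assumes a diagonal-covariance universal background model (UBM) and an already trained total-variability matrix `T`; the function and argument names are hypothetical, and UBM training and the EM estimation of `T` are not shown.

```python
import numpy as np

def extract_ivector(T, Sigma, N, F_centered):
    """Posterior mean of the i-vector w for one utterance (hypothetical helper).

    T          : (C*D, R) total-variability matrix (assumed already trained)
    Sigma      : (C*D,)   diagonal of the UBM covariance supervector
    N          : (C,)     zeroth-order Baum-Welch statistics, one per UBM component
    F_centered : (C*D,)   first-order statistics centred on the UBM means
    """
    CD, R = T.shape
    D = CD // len(N)
    # Each component's occupation count applies to all D feature dimensions.
    N_expanded = np.repeat(N, D)
    # Sigma^{-1} T, using the diagonal covariance as a per-row scaling.
    T_scaled = T / Sigma[:, None]
    # Posterior precision: I + T^T Sigma^{-1} N T (N and Sigma are diagonal, so they commute).
    precision = np.eye(R) + T.T @ (T_scaled * N_expanded[:, None])
    # Linear term: T^T Sigma^{-1} F
    b = T_scaled.T @ F_centered
    # Solve instead of inverting explicitly, for numerical stability.
    return np.linalg.solve(precision, b)
```

In a pipeline like the one described, one such vector per utterance (often length-normalised first) would then serve as the input to the feed-forward network classifier.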


2016 ◽  
Vol 79 (1) ◽  
Author(s):  
Suhail Khokhar ◽  
A. A. Mohd Zin ◽  
M. A. Bhayo ◽  
A. S. Mokhtar

The systematic, automated monitoring of power quality (PQ) disturbances is important for preventing detrimental effects on the power system, and developing new methods for the automatic recognition of single and hybrid PQ disturbances is currently a major concern. This paper presents a combined wavelet transform and support vector machine (WT-SVM) approach for the automatic classification of single and hybrid PQ disturbances. The approach is applied to synthetic models of various single and hybrid PQ signals. Suitable features of the PQ waveforms are first extracted using the discrete wavelet transform, and the SVM then classifies the type of PQ disturbance based on these features. The classification performance of the proposed algorithm is also compared with wavelet-based radial basis function, probabilistic, and feed-forward neural networks. The experimental results show that the proposed WT-SVM classification system achieves a higher recognition rate than the other classifiers.
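As a hedged sketch of the feature-extraction step, the code below applies a multi-level Haar discrete wavelet transform and uses the energy of each detail band, plus the final approximation band, as a feature vector. The Haar wavelet, the number of levels, the energy statistic, and the function names are assumptions for illustration; the abstract does not say which mother wavelet or which statistics were used.

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT (input length must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (smooth) band
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (transient) band
    return approx, detail

def dwt_energy_features(signal, levels=4):
    """Energy of the detail coefficients at each level, then of the final approximation."""
    feats = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:                       # pad odd lengths by repeating the last sample
            approx = np.append(approx, approx[-1])
        approx, detail = haar_dwt_step(approx)
        feats.append(np.sum(detail ** 2))
    feats.append(np.sum(approx ** 2))
    return np.array(feats)
```

Because the Haar transform here is orthonormal, the features of a power-of-two-length signal sum to the signal's total energy, so a disturbance shows up as energy shifting between bands. Feature vectors like these, computed for labelled synthetic disturbance waveforms, could then be used to train an SVM such as scikit-learn's `SVC`.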


2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

This paper draws on the training method of the Recurrent Neural Network (RNN) to address the low speaker recognition rate in noisy environments. By increasing the number of hidden layers of the RNN, changing the activation function of the input layer from the traditional Sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to make more effective use of the data, an improved Denoise Recurrent Neural Network (DRNN) model with high computation speed and good convergence is constructed. Using this model, random-semantic speech signals from the speech library, sampled at 16 kHz and 5 seconds long, are studied. The experimental signal-to-noise ratios (SNRs) are −10 dB, −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional and improved models on the recognition rate is analyzed. The results show that the improved model effectively removes noise from the feature parameters and improves the recognition rate, and the improvement is more pronounced at low SNR. At an SNR of 0 dB, the speaker recognition rate increases by 40%, an 85% improvement over the traditional speech model. As the SNR increases, the recognition rate rises gradually; at 15 dB it reaches 93%.
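The abstract lists the SNR conditions but not how the noisy signals were produced. A common way, sketched here as an assumption, is to scale a noise recording so that 10·log10(P_clean / P_noise) equals the target SNR and then add it to the clean speech. The function name is hypothetical, and the noise type used by the authors is not stated.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so that the resulting SNR equals snr_db (in dB)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)             # average power of the speech
    p_noise = np.mean(noise ** 2)             # average power of the raw noise
    # Choose the gain so that p_clean / (scale**2 * p_noise) = 10**(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

For a 5-second signal at 16 kHz, `clean` and `noise` would each hold 80,000 samples, and the same noise recording can be reused at every SNR in the list above.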


2020 ◽  
Vol 9 (1) ◽  
pp. 1022-1027

Driving a vehicle has become a tedious job nowadays due to heavy traffic, so focus on driving is of the utmost importance. This creates scope for automation in automobiles that minimizes human intervention in controlling dashboard functions such as the headlamps, indicators, power windows, and wiper system. This paper makes a small effort toward distraction-free driving with a voice-controlled dashboard: the proposed system works on speech commands from the user (driver or passenger). Because speech recognition acts as the Human Machine Interface (HMI), the system uses both speaker recognition and speech recognition, to recognize the command and to verify that it comes from an authenticated user (driver or passenger). In the feature extraction stage, the system extracts speech features such as Mel Frequency Cepstral Coefficients (MFCC), Power Spectral Density (PSD), pitch, and the spectrogram. For feature matching, it uses the Vector Quantization Linde-Buzo-Gray (VQLBG) algorithm, which computes the Euclidean distance between the test features and the codebook features. Based on the recognized speech command, the controller (Raspberry Pi 3B) activates the device driver for the motor or solenoid valve, depending on the function. The system is mainly intended for low-noise environments, as most speech recognition systems degrade when noise is introduced; room acoustics also matter a great deal, since the recognition rate varies with them. Over several testing and simulation trials, the system achieved a speech recognition rate of 76.13%. This system encourages automation of the vehicle dashboard, making driving distraction-free.
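The VQLBG matching step can be sketched as follows: each enrolled user gets a codebook trained with Linde-Buzo-Gray splitting (split every codeword by ±ε, then refine with nearest-codeword and centroid iterations), and a test utterance is assigned to the codebook with the lowest average Euclidean distortion. Per-frame feature vectors (for example MFCCs) are assumed to be already extracted; the codebook size, ε, the stopping tolerance, and the function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def lbg_codebook(features, size, eps=0.01, tol=1e-6, max_iter=100):
    """Train a VQ codebook with Linde-Buzo-Gray splitting.

    features : (n_frames, n_dims) array, e.g. one MFCC vector per frame
    size     : target number of codewords (assumed to be a power of two)
    """
    X = np.asarray(features, dtype=float)
    codebook = X.mean(axis=0, keepdims=True)      # start from the global centroid
    while codebook.shape[0] < size:
        # Split every codeword y into y*(1+eps) and y*(1-eps).
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        for _ in range(max_iter):
            # Assign each vector to its nearest codeword (Euclidean distance).
            d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            distortion = d[np.arange(len(X)), nearest].mean()
            # Move each codeword to the centroid of its cell; empty cells stay put.
            for k in range(codebook.shape[0]):
                members = X[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

def vq_distortion(features, codebook):
    """Average Euclidean distance from each test vector to its nearest codeword."""
    X = np.asarray(features, dtype=float)
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Return the enrolled label whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda label: vq_distortion(features, codebooks[label]))
```

To reject unauthenticated speakers as the described system does, a threshold on the best distortion (not shown, and not specified in the abstract) would likely be needed in addition to picking the closest codebook.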


2011 ◽  
Vol 368-373 ◽  
pp. 1583-1587
Author(s):  
Jun Ying Chen ◽  
Jing Chen ◽  
Zeng Xi Feng

In this paper, a new shape classification method based on different feature sets and multiple classifiers is proposed. The feature sets are derived from the shapes using two extraction methods: Fourier descriptors and Zernike moments. The multiple classifiers comprise a normal-density-based linear classifier, a k-nearest neighbor classifier, a feed-forward neural network, and a radial basis function neural network classifier. Each classifier is trained on the two feature sets separately, yielding two classification results. The final classification result is the combined response of the individual classifiers under six different classifier combination rules, and the results were compared with those of multiple classifiers based on the same feature set and with those of the individual classifiers. In this study we examined different classification tasks on the Kimia dataset. For these tasks, the best combination strategy was the product rule, giving an average recognition rate of 95.83%.
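The abstract does not list the six combination rules. A common set of fixed rules (product, sum, max, min, median, and majority vote, the rules studied by Kittler et al.) is sketched below as an assumption; the function name and the array layout are hypothetical. Each classifier is assumed to output class posterior estimates.

```python
import numpy as np

def combine_posteriors(posteriors, rule="product"):
    """Fuse per-classifier class posteriors; return the winning class index per sample.

    posteriors : array of shape (n_classifiers, n_samples, n_classes)
    """
    P = np.asarray(posteriors, dtype=float)
    if rule == "product":
        # Summing logs is equivalent to multiplying and avoids underflow;
        # clipping stops a single zero posterior from vetoing a class outright.
        scores = np.log(np.clip(P, 1e-12, None)).sum(axis=0)
    elif rule == "sum":
        scores = P.sum(axis=0)
    elif rule == "max":
        scores = P.max(axis=0)
    elif rule == "min":
        scores = P.min(axis=0)
    elif rule == "median":
        scores = np.median(P, axis=0)
    elif rule == "vote":
        votes = P.argmax(axis=2)                      # (n_classifiers, n_samples)
        n_classes = P.shape[2]
        scores = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.argmax(axis=1)
```

The product rule rewards classes that every classifier finds plausible, so one confident dissenting classifier can overturn a weak majority, which is exactly where it differs from majority voting.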


2013 ◽  
Vol 765-767 ◽  
pp. 2805-2808
Author(s):  
Guo Wen Wang ◽  
Shi Xin Luo ◽  
Li He ◽  
Gang Yin

Because the BP neural network converges slowly and is prone to falling into local minima, chaos theory is introduced into particle swarm optimization (PSO). The resulting chaos particle swarm optimization algorithm, which improves the ability of PSO to escape local extreme points, is applied to training the BP network, increasing its calculation accuracy and convergence speed. When this method is used to train the BP network for speaker recognition, both the recognition rate and the training speed are greatly improved, giving better results for speaker recognition based on the BP neural network.
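The abstract does not specify how chaos is injected into PSO. One common variant, sketched below as an assumption, uses the logistic map z ← 4z(1 − z) both to initialise the swarm and to run a shrinking chaotic local search around the global best, which helps the swarm escape local extrema. To train a BP network this way, each particle's position would be the flattened weight vector and `objective` the network's training error; that wiring, all parameter values, and the function names are hypothetical.

```python
import numpy as np

def logistic_map(z, steps=1):
    """Iterate the fully chaotic logistic map z <- 4 z (1 - z) on values in (0, 1)."""
    for _ in range(steps):
        z = 4.0 * z * (1.0 - z)
    return z

def chaos_pso(objective, dim, bounds, n_particles=20, n_iter=200,
              w=0.7, c1=1.5, c2=1.5, n_chaos=10, seed=0):
    """Minimise `objective` with PSO plus a chaotic local search around the global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Chaotic initialisation: logistic-map values in (0, 1) mapped onto [lo, hi].
    z = logistic_map(rng.uniform(0.01, 0.99, size=(n_particles, dim)), steps=50)
    pos = lo + (hi - lo) * z
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for it in range(n_iter):
        # Standard inertia-weight PSO update.
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]

        # Chaotic local search: probe points along a logistic-map sequence in a
        # neighbourhood of gbest whose radius shrinks as the iterations progress.
        radius = (hi - lo) * 0.1 * (1 - it / n_iter)
        zc = rng.uniform(0.01, 0.99, size=dim)
        for _ in range(n_chaos):
            zc = logistic_map(zc)
            cand = np.clip(gbest + radius * (2 * zc - 1), lo, hi)
            val = objective(cand)
            if val < gbest_val:
                gbest, gbest_val = cand, val
    return gbest, gbest_val
```

For example, `chaos_pso(lambda x: float(np.sum(x**2)), dim=5, bounds=(-5.0, 5.0))` minimises a 5-dimensional sphere function; the chaotic search is what distinguishes this from plain PSO.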


2012 ◽  
Vol 201-202 ◽  
pp. 329-332
Author(s):  
Yue Fen Chen ◽  
Jun Huan Lin ◽  
Guo Ping Li

An effective online handwritten numeral recognition system is designed with a MATLAB GUI. The coordinates of each handwritten numeral are recorded, and from them the variations in stroke direction and the 2-dimensional distance between the starting and ending points are obtained as features. These features are encoded into a 42-bit binary sequence, which is input to a Hopfield neural network whose associative memory implements the learning and recognition of the handwritten numerals. Test results show that the designed system has a high recognition rate and fast recognition speed.
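The associative-memory step can be sketched as a classic discrete Hopfield network: bipolar (±1) patterns are stored with the Hebbian outer-product rule, and a corrupted probe is cleaned up by asynchronous sign updates. This is a generic sketch; how the authors map strokes to the 42-bit code, how many patterns they store, and their update schedule are not given in the abstract, and the helper names are hypothetical.

```python
import numpy as np

def to_bipolar(bits):
    """Map a 0/1 bit sequence (e.g. a 42-bit feature code) to -1/+1."""
    return 2 * np.asarray(bits) - 1

def train_hopfield(patterns):
    """Hebbian (outer-product) weights for bipolar patterns of equal length."""
    P = np.asarray(patterns, dtype=float)
    n = P.shape[1]
    W = P.T @ P / n
    np.fill_diagonal(W, 0.0)          # no self-connections
    return W

def recall(W, probe, max_sweeps=20, seed=0):
    """Asynchronous recall: update neurons one at a time until the state stops changing."""
    rng = np.random.default_rng(seed)
    s = np.array(probe, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            new = 1.0 if W[i] @ s >= 0 else -1.0
            if new != s[i]:
                s[i], changed = new, True
        if not changed:               # reached a stable state (an energy minimum)
            break
    return s
```

A numeral could then be recognised by comparing the recalled state with each stored prototype. Note that a plain Hebbian Hopfield network reliably stores only about 0.14n random patterns, roughly 5 or 6 for n = 42.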

