Research on Speech Recognition Method in Multi-Layer Perceptual Network Environment

Author(s):  
Kai Zhao ◽  
Dan Wang

Aiming at the problem of low recognition rates in existing speech recognition methods, a speech recognition method for a multi-layer perceptual network environment is proposed. In this environment, the speech signal is first processed by a filter using the filter's transfer function. The signal is then windowed and framed, and the silent segments are removed. At the same time, the average energy and zero-crossing rate of each frame are computed to extract the signal's features. By analyzing the principles of speech signal recognition, the recognition process is designed and speech recognition in the multi-layer perceptual network environment is realized. The experimental results show that the proposed method achieves good speech recognition performance.
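As an illustration of the framing, windowing, energy, and zero-crossing-rate steps described above, here is a minimal NumPy sketch; the 16 kHz frame length, hop size, and energy threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a mono signal into overlapping Hamming-windowed frames
    (25 ms / 10 ms at 16 kHz); assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)

def average_energy(frames):
    """Average energy per frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of sample-to-sample sign changes per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def drop_silence(frames, energy_thresh=1e-4):
    """Discard frames whose energy falls below a tunable threshold."""
    return frames[average_energy(frames) > energy_thresh]
```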

2021 ◽  
Vol 39 (1B) ◽  
pp. 1-10
Author(s):  
Iman H. Hadi ◽  
Alia K. Abdul-Hassan

Speaker recognition depends on specific predefined steps, the most important being feature extraction and feature matching. In addition, the category of voice features used has an impact on the recognition process. The proposed speaker recognition system uses biometric (voice) attributes to recognize the identity of the speaker. Long-term features were used, namely maximum frequency, pitch, and zero-crossing rate (ZCR). In the feature-matching step, the fuzzy inner product between feature vectors was used to compute the match between a claimed speaker's voice utterance and the test utterances. The experiments were implemented on the ELSDSR data set and showed a recognition accuracy of 100% for text-dependent speaker recognition.
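The abstract does not spell out its fuzzy inner product, so the sketch below uses the common max-min composition as a stand-in; the feature scaling and acceptance threshold are assumptions for illustration.

```python
import numpy as np

def fuzzy_inner_product(a, b):
    """Max-min composition, one common fuzzy inner product:
    <a, b> = max_i min(a_i, b_i). Feature vectors (max frequency,
    pitch, ZCR) should first be scaled to [0, 1] so their entries
    behave as membership values."""
    return float(np.max(np.minimum(np.asarray(a, float), np.asarray(b, float))))

def accept_claim(claimed_vec, test_vecs, threshold=0.8):
    """Accept the claimed identity if any test utterance matches
    above a tunable threshold."""
    scores = [fuzzy_inner_product(claimed_vec, v) for v in test_vecs]
    return max(scores) >= threshold, scores
```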


Author(s):  
Poonam Bansal ◽  
Amita Dev ◽  
Shail Jain

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation, and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features for clean speech, while their recognition performance is better than that of MFCC features for noisy speech.
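A minimal sketch of the described pipeline for one frame, assuming a precomputed Mel filter-bank matrix (e.g. from librosa.filters.mel); the number of discarded lags, FFT size, and cepstral order are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def higher_order_cepstra(frame, mel_fb, drop=10, n_fft=512, n_ceps=13):
    """Discard the first `drop` autocorrelation lags (most corrupted by
    additive noise), window the remaining higher-order lags, use their
    magnitude spectrum as a power-spectrum estimate, then apply Mel
    filtering, log, and DCT to obtain cepstral coefficients."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    r_high = r[drop:] * np.hamming(len(r) - drop)
    spec = np.abs(np.fft.rfft(r_high, n_fft))          # spectral estimate
    mel_energies = mel_fb @ spec                       # mel_fb: (n_mels, n_fft//2 + 1)
    return dct(np.log(mel_energies + 1e-10), norm="ortho")[:n_ceps]
```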


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2056
Author(s):  
Junjie Wu ◽  
Jianfeng Xu ◽  
Deyu Lin ◽  
Min Tu

Micro-expression recognition within the broader field of facial expression analysis remains understudied, as current research mainly focuses on feature extraction and classification. Based on optical flow and decision-theoretic thinking, we propose a novel micro-expression recognition method that can filter out low-quality micro-expression video clips. Governed by preset thresholds, we develop two optical-flow filtering mechanisms: one based on two-branch decisions (OFF2BD) and the other based on three-way decisions (OFF3WD). OFF2BD uses classical binary logic to classify images, dividing them into a positive or negative domain for further filtering. Unlike OFF2BD, OFF3WD adds a boundary domain that defers judgment of an image's motion quality. In this way, video clips with a low degree of morphological change can be eliminated, directly improving the quality of micro-expression features and the recognition rate. Experimentally, we verify recognition accuracies of 61.57% and 65.41% on the CASME II and SMIC datasets, respectively. Comparative analysis shows that the scheme can effectively improve recognition performance.
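A minimal sketch of the three-way decision idea applied to a clip's mean optical-flow magnitude; the thresholds alpha and beta stand in for the paper's preset values with illustrative numbers. OFF2BD corresponds to the degenerate case beta == alpha, where the boundary (defer) region disappears.

```python
import numpy as np

def three_way_filter(flow_magnitudes, alpha=0.6, beta=0.3):
    """Three-way decision on a clip's mean optical-flow magnitude
    (alpha > beta): accept into the positive region, reject into the
    negative region, or defer to the boundary region for a later
    judgment of the clip's motion quality."""
    score = float(np.mean(flow_magnitudes))
    if score >= alpha:
        return "accept"   # positive region: enough morphological change
    if score < beta:
        return "reject"   # negative region: low-quality clip, filter out
    return "defer"        # boundary region: judge again later
```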


2013 ◽  
Vol 717 ◽  
pp. 475-480
Author(s):  
Yang Jie

Language mixing in multi-language speech recognition is one of the topical issues of concern. After analyzing the recognition problem, a method is proposed that distinguishes languages by re-classifying multi-language recognition results according to confidence, based on Bayesian decision rules with minimum error rate and minimum risk. It not only avoids the cumbersome language identification of traditional methods but also achieves the goal of reducing the mixed-recognition error rate. Experiments on mixed Chinese-English recognition show that the method can distinguish the different languages and improve the speech recognition rate, demonstrating its practicality.
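A minimal sketch of a minimum-risk decision between two language hypotheses given the recognizer's confidences; the loss values and hypothesis names are illustrative assumptions, not the paper's rule.

```python
def minimum_risk_language(conf_zh, conf_en, loss_wrong=1.0, loss_reject=0.3):
    """Pick the action with the least expected risk. Choosing a language
    risks `loss_wrong` weighted by the posterior of the other language;
    rejecting (re-classifying the segment) costs a flat `loss_reject`.
    With loss_reject >= loss_wrong, rejection is never cheapest and the
    rule reduces to minimum error rate (pick the higher confidence)."""
    risks = {
        "zh": loss_wrong * conf_en,   # risk of wrongly calling it Chinese
        "en": loss_wrong * conf_zh,   # risk of wrongly calling it English
        "reject": loss_reject,        # send a low-confidence segment back
    }
    return min(risks, key=risks.get)
```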


2019 ◽  
Vol 2 (2) ◽  
pp. 1-8
Author(s):  
Nassren A. Alwahed ◽  
Talib M. Jawad

Abstract Most speaker recognition systems work on speech features that are primarily low level, relying heavily on the speaker's physical characteristics and, to a lesser extent, on acquired speaking habits. This paper presents a system for recognizing and identifying Arabic speakers. It comprises two phases (a training phase and a testing phase), each of which uses audio features (mean, standard deviation, zero crossing, amplitude). After the features are extracted, the recognition step uses J48, KNN, and LVQ classifiers: the k-nearest neighbor (KNN) classifier is applied to measure the similarity between training and test data, and an LVQ neural network is used for speech recognition and Arabic language identification. The corpus consists of ten sentences containing words related in particular to kidnappings and kidnappers, pronounced by 10 people (five men and five women of different ages), each speaking all ten sentences, for a total of 100 samples recorded as WAV audio. The results for sentences pronounced by women were higher than for the same sentences pronounced by men. The classifiers achieved recognition rates of 85%, 93%, and 96.4%, respectively.
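A minimal sketch of the four global features and the KNN matching step, using scikit-learn; the value of k is an illustrative choice, and the J48 and LVQ classifiers are left out for brevity.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def utterance_features(x):
    """Four global features per utterance, mirroring the paper's list:
    mean, standard deviation, zero-crossing count, peak amplitude."""
    zero_crossings = np.sum(np.abs(np.diff(np.sign(x))) > 0)
    return np.array([np.mean(x), np.std(x), zero_crossings, np.max(np.abs(x))])

def train_knn(train_wavs, speaker_labels, k=3):
    """train_wavs: list of 1-D sample arrays; returns a fitted classifier."""
    X = np.stack([utterance_features(w) for w in train_wavs])
    return KNeighborsClassifier(n_neighbors=k).fit(X, speaker_labels)
```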


Author(s):  
Vanajakshi Puttaswamy Gowda ◽  
Mathivanan Murugavelu ◽  
Senthil Kumaran Thangamuthu

Continuous speech segmentation and recognition play an important role in natural language processing. Context-based continuous Kannada speech segmentation depends on the context, grammar, and semantic rules present in the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is of great interest to researchers. The proposed method is divided into two parts. The first part segments the continuous, context-based Kannada speech signal by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window. The segmented outputs are fully meaningful for different scenarios, with low segmentation error. The second part performs speech recognition by extracting a small number of Mel-frequency cepstral coefficients and using vector quantization with a small number of codebooks; recognition is based entirely on a threshold value. Setting this threshold is a challenging task, but a simple method is used to achieve a good recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods for continuous, context-based Kannada speech with different male and female accents, while using minimal feature dimensions for the training data.
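A minimal sketch of the segmentation half, computing per-frame short-term energy and spectral centroid and thresholding both; the threshold values and the simple AND combination are illustrative assumptions, and frames are assumed to come from a framing routine like the one sketched earlier.

```python
import numpy as np

def energy_and_centroid(frames, sr=16000):
    """Per-frame short-term energy and spectral centroid (Hz)."""
    energy = np.mean(frames ** 2, axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (spec @ freqs) / (np.sum(spec, axis=1) + 1e-10)
    return energy, centroid

def speech_mask(frames, e_thresh=1e-4, c_thresh=500.0):
    """Mark frames as speech when both measures exceed their
    (tunable) thresholds; contiguous True runs form the segments."""
    energy, centroid = energy_and_centroid(frames)
    return (energy > e_thresh) & (centroid > c_thresh)
```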


Stuttering is an involuntary disturbance in the fluent flow of speech, characterized by disfluencies such as stop gaps and sound or syllable repetitions or prolongations. Stop gaps make up a high proportion of stuttering events. This work presents the automatic removal of stop gaps using a combination of spectral parameters: spectral energy, centroid, entropy, and zero-crossing rate. A threshold-based method for detecting and removing stop gaps is discussed in this paper.
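A minimal sketch of threshold-based stop-gap removal using two of the four listed parameters (energy and spectral entropy); the threshold values and the way the parameters are combined are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def spectral_entropy(frame, n_fft=512):
    """Shannon entropy of the normalised magnitude spectrum; near-silent
    stop gaps tend to show low energy and a flat (high-entropy) spectrum."""
    p = np.abs(np.fft.rfft(frame, n_fft))
    p = p / (np.sum(p) + 1e-12)
    return float(-np.sum(p * np.log2(p + 1e-12)))

def is_stop_gap(frame, e_thresh=1e-4, h_thresh=7.0):
    """Flag a frame as a stop gap when its energy is below and its
    entropy is above their (tunable) thresholds."""
    return np.mean(frame ** 2) < e_thresh and spectral_entropy(frame) > h_thresh

def remove_stop_gaps(frames):
    """Keep only the non-stop-gap frames."""
    return np.stack([f for f in frames if not is_stop_gap(f)])
```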


2019 ◽  
Vol 29 (1) ◽  
pp. 1261-1274 ◽  
Author(s):  
Vishal Passricha ◽  
Rajesh Kumar Aggarwal

Abstract Deep neural networks (DNNs) have been playing a significant role in acoustic modeling. Convolutional neural networks (CNNs) are an advanced variant of DNNs that achieve a 4-12% relative gain in word error rate (WER) over DNNs. The spectral variations and local correlations present in the speech signal make CNNs well suited to speech recognition. Recently, it has been demonstrated that bidirectional long short-term memory (BLSTM) networks produce higher recognition rates in acoustic modeling because they can reinforce higher-level representations of acoustic data. Both the spatial and temporal properties of the speech signal are essential for a high recognition rate, which motivates combining the two network types. In this paper, a hybrid CNN-BLSTM architecture is proposed to exploit these properties appropriately and to improve the continuous speech recognition task. Further, we explore methods such as weight sharing, the appropriate number of hidden units, and the ideal pooling strategy for the CNN to achieve a high recognition rate; the focus is also on how many BLSTM layers are effective. This paper also attempts to overcome another shortcoming of CNNs, namely that speaker-adapted features cannot be modeled in them directly. Finally, various non-linearities, with and without dropout, are analyzed for speech tasks. Experiments indicate that the proposed hybrid architecture with speaker-adapted features and maxout non-linearity with dropout shows 5.8% and 10% relative decreases in WER over the CNN and DNN systems, respectively.
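A minimal PyTorch sketch of one possible CNN-BLSTM layout, with convolution and frequency-only pooling feeding stacked BLSTM layers; the layer sizes, ReLU (standing in for the paper's maxout), and dropout placement are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CNNBLSTM(nn.Module):
    """CNN front end over (time, mel) features, frequency-only max
    pooling, three BLSTM layers, and a per-frame classifier."""
    def __init__(self, n_mels=40, n_classes=3000, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),                       # the paper favours maxout + dropout
            nn.MaxPool2d((1, 2)),            # pool over frequency only
            nn.Dropout(0.2),
        )
        self.blstm = nn.LSTM(32 * (n_mels // 2), hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                    # x: (batch, time, n_mels)
        x = self.conv(x.unsqueeze(1))        # -> (batch, 32, time, n_mels // 2)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.blstm(x)                 # temporal modeling
        return self.out(x)                   # per-frame class scores
```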

