Research on Speech Recognition Method in Multi-Layer Perceptual Network Environment

Author(s):  
Kai Zhao ◽  
Dan Wang

Aiming at the problem of low recognition rates in existing speech recognition methods, a speech recognition method for a multi-layer perceptual network environment is proposed. In this environment, the speech signal is first processed by a filter using the filter's transfer function. The signal is then windowed and framed, and the silent segments are removed. At the same time, the average energy and zero-crossing rate of each frame are computed to extract the signal's features. By analyzing the principles of speech signal recognition, the recognition process is designed and speech recognition in the multi-layer perceptual network environment is realized. The experimental results show that the proposed method achieves good speech recognition performance.
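As an illustration of the framing, windowing, energy, and zero-crossing-rate steps described above, here is a minimal NumPy sketch; the 16 kHz frame length, hop size, and energy threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a mono signal into overlapping Hamming-windowed frames
    (25 ms / 10 ms at 16 kHz); assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)

def average_energy(frames):
    """Average energy per frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of sample-to-sample sign changes per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def drop_silence(frames, energy_thresh=1e-4):
    """Discard frames whose energy falls below a tunable threshold."""
    return frames[average_energy(frames) > energy_thresh]
```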

2021 ◽  
Vol 39 (1B) ◽  
pp. 1-10
Author(s):  
Iman H. Hadi ◽  
Alia K. Abdul-Hassan

Speaker recognition depends on specific predefined steps, the most important being feature extraction and feature matching. In addition, the category of voice features used has an impact on the recognition process. The proposed speaker recognition system uses biometric (voice) attributes to recognize the identity of the speaker. Long-term features were used, namely maximum frequency, pitch, and zero-crossing rate (ZCR). In the feature-matching step, the fuzzy inner product between feature vectors was used to compute the match between a claimed speaker's voice utterance and the test utterances. The experiments were implemented on the ELSDSR data set and showed a recognition accuracy of 100% for text-dependent speaker recognition.
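The abstract does not spell out its fuzzy inner product, so the sketch below uses the common max-min composition as a stand-in; the feature scaling and acceptance threshold are assumptions for illustration.

```python
import numpy as np

def fuzzy_inner_product(a, b):
    """Max-min composition, one common fuzzy inner product:
    <a, b> = max_i min(a_i, b_i). Feature vectors (max frequency,
    pitch, ZCR) should first be scaled to [0, 1] so their entries
    behave as membership values."""
    return float(np.max(np.minimum(np.asarray(a, float), np.asarray(b, float))))

def accept_claim(claimed_vec, test_vecs, threshold=0.8):
    """Accept the claimed identity if any test utterance matches
    above a tunable threshold."""
    scores = [fuzzy_inner_product(claimed_vec, v) for v in test_vecs]
    return max(scores) >= threshold, scores
```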


Author(s):  
Poonam Bansal ◽  
Amita Dev ◽  
Shail Jain

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation, and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features for clean speech, while their recognition performance is better than that of MFCC features for noisy speech.
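A minimal sketch of the described pipeline for one frame, assuming a precomputed Mel filter-bank matrix (e.g. from librosa.filters.mel); the number of discarded lags, FFT size, and cepstral order are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def higher_order_cepstra(frame, mel_fb, drop=10, n_fft=512, n_ceps=13):
    """Discard the first `drop` autocorrelation lags (most corrupted by
    additive noise), window the remaining higher-order lags, use their
    magnitude spectrum as a power-spectrum estimate, then apply Mel
    filtering, log, and DCT to obtain cepstral coefficients."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    r_high = r[drop:] * np.hamming(len(r) - drop)
    spec = np.abs(np.fft.rfft(r_high, n_fft))          # spectral estimate
    mel_energies = mel_fb @ spec                       # mel_fb: (n_mels, n_fft//2 + 1)
    return dct(np.log(mel_energies + 1e-10), norm="ortho")[:n_ceps]
```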


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2056
Author(s):  
Junjie Wu ◽  
Jianfeng Xu ◽  
Deyu Lin ◽  
Min Tu

Micro-expression recognition within the broader field of facial expression analysis remains understudied, as current research mainly focuses on feature extraction and classification. Based on optical flow and decision-theoretic thinking, we propose a novel micro-expression recognition method that can filter out low-quality micro-expression video clips. Governed by preset thresholds, we develop two optical-flow filtering mechanisms: one based on two-branch decisions (OFF2BD) and the other based on three-way decisions (OFF3WD). OFF2BD uses classical binary logic to classify images, dividing them into a positive or negative domain for further filtering. Unlike OFF2BD, OFF3WD adds a boundary domain that defers judgment of an image's motion quality. In this way, video clips with a low degree of morphological change can be eliminated, directly improving the quality of micro-expression features and the recognition rate. Experimentally, we verify recognition accuracies of 61.57% and 65.41% on the CASME II and SMIC datasets, respectively. Comparative analysis shows that the scheme can effectively improve recognition performance.
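A minimal sketch of the three-way decision idea applied to a clip's mean optical-flow magnitude; the thresholds alpha and beta stand in for the paper's preset values with illustrative numbers. OFF2BD corresponds to the degenerate case beta == alpha, where the boundary (defer) region disappears.

```python
import numpy as np

def three_way_filter(flow_magnitudes, alpha=0.6, beta=0.3):
    """Three-way decision on a clip's mean optical-flow magnitude
    (alpha > beta): accept into the positive region, reject into the
    negative region, or defer to the boundary region for a later
    judgment of the clip's motion quality."""
    score = float(np.mean(flow_magnitudes))
    if score >= alpha:
        return "accept"   # positive region: enough morphological change
    if score < beta:
        return "reject"   # negative region: low-quality clip, filter out
    return "defer"        # boundary region: judge again later
```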


2013 ◽  
Vol 717 ◽  
pp. 475-480
Author(s):  
Yang Jie

Language mixing in multi-language speech recognition is one of the topical issues of concern. After analyzing the recognition problem, a method is proposed that distinguishes languages by re-classifying multi-language recognition results according to confidence, based on Bayesian decision rules with minimum error rate and minimum risk. It not only avoids the cumbersome language identification of traditional methods but also achieves the goal of reducing the mixed-recognition error rate. Experiments on mixed Chinese-English recognition show that the method can distinguish the different languages and improve the speech recognition rate, demonstrating its practicality.
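A minimal sketch of a minimum-risk decision between two language hypotheses given the recognizer's confidences; the loss values and hypothesis names are illustrative assumptions, not the paper's rule.

```python
def minimum_risk_language(conf_zh, conf_en, loss_wrong=1.0, loss_reject=0.3):
    """Pick the action with the least expected risk. Choosing a language
    risks `loss_wrong` weighted by the posterior of the other language;
    rejecting (re-classifying the segment) costs a flat `loss_reject`.
    With loss_reject >= loss_wrong, rejection is never cheapest and the
    rule reduces to minimum error rate (pick the higher confidence)."""
    risks = {
        "zh": loss_wrong * conf_en,   # risk of wrongly calling it Chinese
        "en": loss_wrong * conf_zh,   # risk of wrongly calling it English
        "reject": loss_reject,        # send a low-confidence segment back
    }
    return min(risks, key=risks.get)
```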


2019 ◽  
Vol 2 (2) ◽  
pp. 1-8
Author(s):  
Nassren A. Alwahed ◽  
Talib M. Jawad

Abstract Most speaker recognition systems work on speech features that are primarily low level, relying heavily on the speaker's physical characteristics and, to a lesser extent, on acquired speaking habits. This paper presents a system for recognizing and identifying Arabic speakers. It comprises two phases (a training phase and a testing phase), each of which uses audio features (mean, standard deviation, zero crossing, amplitude). After the features are extracted, the recognition step uses J48, KNN, and LVQ classifiers: the k-nearest neighbor (KNN) classifier is applied to measure the similarity between training and test data, and an LVQ neural network is used for speech recognition and Arabic language identification. The corpus consists of ten sentences containing words related in particular to kidnappings and kidnappers, pronounced by 10 people (five men and five women of different ages), each speaking all ten sentences, for a total of 100 samples recorded as WAV audio. The results for sentences pronounced by women were higher than for the same sentences pronounced by men. The classifiers achieved recognition rates of 85%, 93%, and 96.4%, respectively.
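A minimal sketch of the four global features and the KNN matching step, using scikit-learn; the value of k is an illustrative choice, and the J48 and LVQ classifiers are left out for brevity.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def utterance_features(x):
    """Four global features per utterance, mirroring the paper's list:
    mean, standard deviation, zero-crossing count, peak amplitude."""
    zero_crossings = np.sum(np.abs(np.diff(np.sign(x))) > 0)
    return np.array([np.mean(x), np.std(x), zero_crossings, np.max(np.abs(x))])

def train_knn(train_wavs, speaker_labels, k=3):
    """train_wavs: list of 1-D sample arrays; returns a fitted classifier."""
    X = np.stack([utterance_features(w) for w in train_wavs])
    return KNeighborsClassifier(n_neighbors=k).fit(X, speaker_labels)
```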


Author(s):  
Vanajakshi Puttaswamy Gowda ◽  
Mathivanan Murugavelu ◽  
Senthil Kumaran Thangamuthu

Continuous speech segmentation and recognition play an important role in natural language processing. Context-based continuous Kannada speech segmentation depends on the context, grammar, and semantic rules present in the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is of great interest to researchers. The proposed method is divided into two parts. The first part segments the continuous, context-based Kannada speech signal by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window. The segmented outputs are fully meaningful for different scenarios, with low segmentation error. The second part performs speech recognition by extracting a small number of Mel-frequency cepstral coefficients and using vector quantization with a small number of codebooks; recognition is based entirely on a threshold value. Setting this threshold is a challenging task, but a simple method is used to achieve a good recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods for continuous, context-based Kannada speech with different male and female accents, while using minimal feature dimensions for the training data.
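A minimal sketch of the segmentation half, computing per-frame short-term energy and spectral centroid and thresholding both; the threshold values and the simple AND combination are illustrative assumptions, and frames are assumed to come from a framing routine like the one sketched earlier.

```python
import numpy as np

def energy_and_centroid(frames, sr=16000):
    """Per-frame short-term energy and spectral centroid (Hz)."""
    energy = np.mean(frames ** 2, axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (spec @ freqs) / (np.sum(spec, axis=1) + 1e-10)
    return energy, centroid

def speech_mask(frames, e_thresh=1e-4, c_thresh=500.0):
    """Mark frames as speech when both measures exceed their
    (tunable) thresholds; contiguous True runs form the segments."""
    energy, centroid = energy_and_centroid(frames)
    return (energy > e_thresh) & (centroid > c_thresh)
```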


Stuttering is an involuntary disturbance in the fluent flow of speech, characterized by disfluencies such as stop gaps and sound or syllable repetitions or prolongations. Stop gaps make up a high proportion of stuttering events. This work presents the automatic removal of stop gaps using a combination of spectral parameters: spectral energy, centroid, entropy, and zero-crossing rate. A threshold-based method for detecting and removing stop gaps is discussed in this paper.
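A minimal sketch of threshold-based stop-gap removal using two of the four listed parameters (energy and spectral entropy); the threshold values and the way the parameters are combined are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def spectral_entropy(frame, n_fft=512):
    """Shannon entropy of the normalised magnitude spectrum; near-silent
    stop gaps tend to show low energy and a flat (high-entropy) spectrum."""
    p = np.abs(np.fft.rfft(frame, n_fft))
    p = p / (np.sum(p) + 1e-12)
    return float(-np.sum(p * np.log2(p + 1e-12)))

def is_stop_gap(frame, e_thresh=1e-4, h_thresh=7.0):
    """Flag a frame as a stop gap when its energy is below and its
    entropy is above their (tunable) thresholds."""
    return np.mean(frame ** 2) < e_thresh and spectral_entropy(frame) > h_thresh

def remove_stop_gaps(frames):
    """Keep only the non-stop-gap frames."""
    return np.stack([f for f in frames if not is_stop_gap(f)])
```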


2019 ◽  
Vol 29 (1) ◽  
pp. 1261-1274 ◽  
Author(s):  
Vishal Passricha ◽  
Rajesh Kumar Aggarwal

Abstract Deep neural networks (DNNs) have been playing a significant role in acoustic modeling. Convolutional neural networks (CNNs) are an advanced variant of DNNs that achieve a 4-12% relative gain in word error rate (WER) over DNNs. The spectral variations and local correlations present in the speech signal make CNNs well suited to speech recognition. Recently, it has been demonstrated that bidirectional long short-term memory (BLSTM) networks produce higher recognition rates in acoustic modeling because they can reinforce higher-level representations of acoustic data. Both the spatial and temporal properties of the speech signal are essential for a high recognition rate, which motivates combining the two network types. In this paper, a hybrid CNN-BLSTM architecture is proposed to exploit these properties appropriately and to improve the continuous speech recognition task. Further, we explore methods such as weight sharing, the appropriate number of hidden units, and the ideal pooling strategy for the CNN to achieve a high recognition rate; the focus is also on how many BLSTM layers are effective. This paper also attempts to overcome another shortcoming of CNNs, namely that speaker-adapted features cannot be modeled in them directly. Finally, various non-linearities, with and without dropout, are analyzed for speech tasks. Experiments indicate that the proposed hybrid architecture with speaker-adapted features and maxout non-linearity with dropout shows 5.8% and 10% relative decreases in WER over the CNN and DNN systems, respectively.
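A minimal PyTorch sketch of one possible CNN-BLSTM layout, with convolution and frequency-only pooling feeding stacked BLSTM layers; the layer sizes, ReLU (standing in for the paper's maxout), and dropout placement are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CNNBLSTM(nn.Module):
    """CNN front end over (time, mel) features, frequency-only max
    pooling, three BLSTM layers, and a per-frame classifier."""
    def __init__(self, n_mels=40, n_classes=3000, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),                       # the paper favours maxout + dropout
            nn.MaxPool2d((1, 2)),            # pool over frequency only
            nn.Dropout(0.2),
        )
        self.blstm = nn.LSTM(32 * (n_mels // 2), hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                    # x: (batch, time, n_mels)
        x = self.conv(x.unsqueeze(1))        # -> (batch, 32, time, n_mels // 2)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.blstm(x)                 # temporal modeling
        return self.out(x)                   # per-frame class scores
```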

