speech feature
Recently Published Documents



Author(s):  
Pei Jiang ◽  
Dongchen Wang

Traditional methods for synchronous recognition of background speech on e-commerce platforms are vulnerable to sudden noise, which degrades recognition. To address this, this paper proposes a synchronous background speech recognition method based on the hidden Markov model (HMM). Speech features are collected following the principles of speech recognition, and the HMM takes the high-fidelity filtered speech as input and recognizes it, which keeps the signal processing results effective. The platform's background speech is de-noised, the speech signal is cached and stored for recognition, audio is buffered using vector graphs, and the related speech recognition sequence is ported over the Ethernet interface. Together these steps achieve background speech synchronization and improve recognition accuracy. Experimental results show that the proposed HMM-based method outperforms traditional methods.
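
The abstract does not give the model topology, the features, or the decoding procedure. As a minimal sketch of how an HMM can score a sequence of quantized speech features, the forward algorithm below computes the likelihood of an observation sequence under a discrete HMM. The two-state model and all probabilities are invented for illustration and are not taken from the paper.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    obs     : non-empty list of observation symbol indices
    start_p : start_p[i]    = P(first state is i)
    trans_p : trans_p[i][j] = P(next state is j | current state is i)
    emit_p  : emit_p[i][k]  = P(symbol k | state i)
    """
    n_states = len(start_p)
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    # Recursion: propagate through the transitions, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n_states)) * emit_p[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)


# Hypothetical two-state model with two quantized feature symbols (0 and 1).
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

if __name__ == "__main__":
    print(forward_likelihood([0, 0, 1], START, TRANS, EMIT))
```

In a classic HMM recognizer, one such model is trained per word or phone, and the model with the highest likelihood for the observed sequence is chosen. Whether the paper follows this scheme is an assumption.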


2022 ◽  
pp. 828-847
Author(s):  
Gaurav Aggarwal ◽  
Latika Singh

Classification of intellectually disabled children through manual assessment of speech at an early age is inconsistent, subjective, time-consuming, and prone to error. This study attempts to classify children with intellectual disabilities using two speech feature extraction techniques: Linear Predictive Coding (LPC) based cepstral parameters and Mel-frequency cepstral coefficients (MFCC). Four classification models are employed: k-nearest neighbour (k-NN), support vector machine (SVM), linear discriminant analysis (LDA), and radial basis function neural network (RBFNN). Forty-eight speech samples from each group are analyzed, taken from subjects of similar age and socio-economic background. The effect on accuracy of frame length combined with the number of filterbanks (for MFCC) and of frame length combined with the prediction order (for LPC) is also examined. The experimental outcomes show that the proposed technique can help speech pathologists estimate intellectual disability at early ages.
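
To make the classification step concrete, here is a minimal sketch of the simplest of the four classifiers, k-NN, applied to feature vectors. The two-dimensional vectors and the labels `group_a` and `group_b` are made-up stand-ins for per-sample MFCC or LPC summaries. The study's real features, the value of k, and the distance metric are not given in the abstract, so Euclidean distance and k = 3 are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    # Count the labels of the k closest vectors and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D feature vectors (illustration only, not data from the study).
TRAIN_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
TRAIN_Y = ["group_a", "group_a", "group_a", "group_b", "group_b", "group_b"]

if __name__ == "__main__":
    print(knn_predict(TRAIN_X, TRAIN_Y, (0.15, 0.15)))
    print(knn_predict(TRAIN_X, TRAIN_Y, (0.95, 0.85)))
```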


2021 ◽  
Author(s):  
Yuji Miao ◽  
Yanan Huang ◽  
Zhenjing Da

To improve English speech recognition, this paper adapts a digital algorithm to the practical needs of English speech feature recognition. It applies a fuzzy recognition algorithm to analyze English speech features, examines the shortcomings of traditional algorithms, proposes a fuzzy digitized English speech recognition algorithm, and builds an English speech feature recognition model on that basis. In addition, it performs time-frequency analysis on chaotic signals and speech signals, removes noise from English speech features to improve their recognition, and builds an English speech feature recognition system based on digital means. Finally, grouped experiments are conducted using students' English pronunciation as input, and the results are tallied to test the performance of the system. The results show that the proposed method is effective to some degree.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ping Li ◽  
Hua Zhang ◽  
Sang-Bing Tsai

Automatic scoring systems are now applied to oral English tests at all levels and have greatly improved the efficiency of test administration. Traditional speech signal processing methods focus only on extracting scoring features, which cannot ensure the accuracy of the scoring algorithm. To improve the reliability of automatic scoring, this paper uses a spoken speech feature extraction method, based on the principle of sequence matching, to extract features of spoken English test pronunciation. It also establishes a dynamically optimized spoken English pronunciation signal model, based on sequence matching, that maintains good dynamic selection and clustering ability under strong interference. Comprehensive experiments show that the system's automatic scoring results are much higher than those of the traditional method. The approach greatly improves the recognition of oral pronunciation, resolves the discrepancy between automatic and manual scoring, and supports replacing, or partially replacing, manual marking with computer automatic scoring.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Lina Sun ◽  
Mingzhi Li

With the support of big data and information technology, sectors such as sports, health, and the medical industry can integrate and readjust their existing resources, which improves operating efficiency and taps the industry's large potential. With advances in big data analysis, voice features, and the Internet of Things (IoT), personalized health management is becoming the development trend and breakthrough point of the sports and health industry. In this paper, we use the Mel-frequency cepstrum coefficient (MFCC) as the speech feature processing method. When the linear frequency axis of the Fourier spectrum is mapped to the Mel frequency, the calculation accuracy decreases as frequency increases, and the low-frequency signal is retained, which improves noise robustness. Building on further study of voice feature processing and the IoT model for big-data sports and health management, a vector addition regression was developed to compare the two real scoring features of the processing results, paving the way for further analysis and result evaluation. Experimental verification shows that the proposed method learns speech features better. In addition, with noise reduction introduced, speech recognition on big data in sports health management is more robust and overall system performance improves.
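
The loss of resolution at high frequencies follows directly from the shape of the Mel scale. The sketch below uses the widely used conversion mel = 2595 · log10(1 + f / 700). The abstract does not say which Mel variant or filterbank size the authors used, so the formula variant, the 0 to 8000 Hz range, and the 10 filters are assumptions chosen for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, n_filters):
    """Edges in Hz of n_filters bands spaced evenly on the Mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_filters + 2)]


if __name__ == "__main__":
    edges = mel_band_edges(0.0, 8000.0, 10)
    widths = [upper - lower for lower, upper in zip(edges, edges[1:])]
    # Equal steps in mels become wider and wider steps in Hz.
    print([round(w, 1) for w in widths])
```

Because equal steps on the Mel scale correspond to increasingly wide bands in Hz, low frequencies are sampled finely and high frequencies coarsely.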


2021 ◽  
Vol 11 (14) ◽  
pp. 6393
Author(s):  
Ascensión Gallardo-Antolín ◽  
Juan M. Montero

The automatic detection of deceptive behaviors has recently attracted the attention of the research community due to the variety of areas where it can play a crucial role, such as security or criminology. This work is focused on the development of an automatic deception detection system based on gaze and speech features. The first contribution of our research on this topic is the use of attention Long Short-Term Memory (LSTM) networks for single-modal systems with frame-level features as input. In the second contribution, we propose a multimodal system that combines the gaze and speech modalities into the LSTM architecture using two different combination strategies: Late Fusion and Attention-Pooling Fusion. The proposed models are evaluated over the Bag-of-Lies dataset, a multimodal database recorded in real conditions. On the one hand, results show that attentional LSTM networks are able to adequately model the gaze and speech feature sequences, outperforming a reference Support Vector Machine (SVM)-based system with compact features. On the other hand, both combination strategies produce better results than the single-modal systems and the multimodal reference system, suggesting that gaze and speech modalities carry complementary information for the task of deception detection that can be effectively exploited by using LSTMs.
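
The two ideas named in the abstract, attention pooling over frame-level features and late fusion of modality outputs, can be sketched in a few lines. This is a simplified illustration only. The paper applies attention inside LSTM networks, whereas here the pooling uses a fixed dot-product query and the fusion is a weighted average of per-modality probabilities. The function names, the query vector, and the fusion weight are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse frame-level vectors into one vector using dot-product attention.

    frames : list of equal-length feature vectors, one per frame
    query  : vector of the same length (learned in a real model, fixed here)
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames, one output value per feature dimension.
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def late_fusion(p_gaze, p_speech, speech_weight=0.5):
    """Weighted average of the two modalities' probabilities for one class."""
    return (1.0 - speech_weight) * p_gaze + speech_weight * p_speech


if __name__ == "__main__":
    print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
    print(late_fusion(0.2, 0.8))
```

With an all-zero query the attention weights are uniform, so the pooling reduces to a plain average over frames. A trained query instead shifts weight toward the frames most relevant to the task.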


Author(s):  
Nazirul Mubin Bin Mohd Noor ◽  
Nuramira Binti Anuar ◽  
Ahmad Muhyiddin B Yusof ◽  
Puteri Rohani Megat Abdul Rahim ◽  
Daljeet Singh Sedhu A/L Janah Singh

Voice Onset Time (VOT) is found in most spoken languages. It is a speech feature that signals differences in voicing and meaning. In particular, VOT duration is directly determined by place of articulation: labial VOT values are shorter than velar and alveolar values and, sometimes, alveolar values are shorter than velar values. The present study examined the VOT values of English speakers in Malaysia, specifically Indian-Muslim English speakers in the northwest region of the country. Analysis of voiced and voiceless plosives using PRAAT software revealed significant differences in VOT values between the bilabial plosives /p/ and /b/ and between the alveolar plosives /t/ and /d/ in the Indian-Muslim English-speaking community. However, there was no significant difference in VOT values between the voiceless and voiced velar plosives /k/ and /g/, indicating the influence of the speakers' mother tongue on their English. Regarding the prominence of aspiration, the results show that Indian-Muslim English speakers in Malaysia have high VOT values for the voiceless alveolar plosive /t/ (M = 0.0705, SD = 0.0509) and the voiced alveolar plosive /d/ (M = 0.015, SD = 0.00). The findings highlight differences in VOT values for bilabial and alveolar plosives between English speakers from the Indian-Muslim community and those from the Malay community.
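
VOT is the interval between the release of a plosive and the onset of voicing. As a rough sketch of how means and standard deviations like those reported above could be computed from annotated time stamps, consider the following. The time stamps are hypothetical, the unit is assumed to be seconds, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def voice_onset_time(burst_release_s, voicing_onset_s):
    """VOT in seconds: voicing onset minus burst release.

    A negative value means voicing starts before the release (prevoicing).
    """
    return voicing_onset_s - burst_release_s


def summarize(vot_values):
    """Return (mean, sample standard deviation) for at least two VOT values."""
    return mean(vot_values), stdev(vot_values)


# Hypothetical (burst release, voicing onset) pairs in seconds, not study data.
TOKENS = [(0.100, 0.165), (0.250, 0.330), (0.410, 0.470)]

if __name__ == "__main__":
    vots = [voice_onset_time(burst, voicing) for burst, voicing in TOKENS]
    print(summarize(vots))
```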

