speech feature Latest Research Papers

Background Speech Synchronous Recognition Method of E-commerce Platform Based on Hidden Markov Model

To improve synchronous recognition of background speech on e-commerce platforms, and to address the poor recognition that results from traditional methods being vulnerable to sudden noise, this paper proposes a background speech synchronous recognition method based on a hidden Markov model (HMM). Speech features are collected in line with the principles of speech recognition. The hidden Markov model is used to input and recognize the high-fidelity filtered speech, which ensures the effectiveness of the signal processing results. The platform's background voice is de-noised, the speech signal is cached and stored for recognition, the audio is buffered using a vector graph, and the related speech recognition sequence is ported through the Ethernet interface. Together these steps achieve background speech synchronization and recognition and improve recognition accuracy. The experimental results show that the proposed HMM-based method performs better than the traditional methods.
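
The abstract does not describe how the model scores an utterance, so the following is only a minimal sketch of how a discrete hidden Markov model can be used for recognition: the forward algorithm computes the likelihood of an observation sequence under each candidate model, and the most likely model wins. The two models, their probabilities, and the two-symbol codebook are hypothetical values chosen for illustration, not taken from the paper.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start_p)
    # alpha[i]: probability of the observations seen so far, ending in state i.
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n_states)) * emit_p[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Pick the label whose model assigns the observations the highest likelihood."""
    return max(models, key=lambda label: forward_likelihood(obs, *models[label]))


# Hypothetical models: (start probabilities, transition matrix, emission matrix)
# over a two-symbol codebook (0 = quiet frame, 1 = loud frame).
MODELS = {
    "speech": ([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [[0.9, 0.1], [0.1, 0.9]]),
    "noise": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 0, 1, 1, 1], MODELS))
```

In practice the observations would be quantized acoustic feature vectors rather than hand-written symbols, and long sequences would need log-probabilities or scaling to avoid numerical underflow.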

Comparisons of Speech Parameterisation Techniques for Classification of Intellectual Disability Using Machine Learning

10.4018/978-1-6684-3542-7.ch046 ◽ 2022 ◽ pp. 828-847

Author(s): Gaurav Aggarwal ◽ Latika Singh

Keyword(s): Intellectual Disability ◽ Predictive Coding ◽ Support Vector ◽ Subjective Time ◽ Linear Predictive Coding ◽ Mel Frequency Cepstral Coefficients ◽ Linear Discriminant ◽ Frame Length ◽ Speech Feature

Manual assessment of speech to classify intellectually disabled children at an early age is inconsistent, subjective, time-consuming and prone to error. This study attempts to classify children with intellectual disabilities using two speech feature extraction techniques: Linear Predictive Coding (LPC) based cepstral parameters and Mel-frequency cepstral coefficients (MFCC). Four classification models are employed: k-nearest neighbour (k-NN), support vector machine (SVM), linear discriminant analysis (LDA) and radial basis function neural network (RBFNN). Forty-eight speech samples from each group are analysed, taken from subjects of similar age and socio-economic background. The effect of frame length combined with the number of filterbanks in the MFCC, and of frame length combined with the order in the LPC, is also examined to improve accuracy. The experimental outcomes show that the proposed technique can help speech pathologists estimate intellectual disability at an early age.
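
To make the two tuning parameters mentioned for LPC concrete (frame length and order), here is a minimal sketch that splits a signal into frames and estimates LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion. The function names and the test signal are illustrative assumptions; the study's own implementation, windowing, and the conversion from LPC to cepstral parameters are not described in the abstract and are not reproduced here.

```python
def split_frames(signal, frame_len, hop):
    """Cut a signal into (possibly overlapping) frames of frame_len samples."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a = [1, a1, ..., a_order] defines the prediction
    x[n] ~ -(a1*x[n-1] + ... + a_order*x[n-order]) and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:  # silent frame: nothing to predict
        return [1.0] + [0.0] * order, 0.0
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Decaying exponential x[n] = 0.5**n, which satisfies x[n] = 0.5 * x[n-1].
    signal = [0.5 ** n for n in range(50)]
    coeffs, residual = lpc_coefficients(signal, order=2)
    print(coeffs, residual)
```

Varying `frame_len` and `order` in such a sketch is the kind of parameter sweep the abstract refers to; longer frames give smoother autocorrelation estimates at the cost of time resolution.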

Research on Speech Feature Extraction and Synthesis Algorithm Based on EEMD

10.1109/ecice52819.2021.9645625 ◽ 2021

Author(s): Xuesong Wang ◽ Shigang Wang ◽ Yifeng Guo

Keyword(s): Feature Extraction ◽ Synthesis Algorithm ◽ Speech Feature ◽ Speech Feature Extraction

English Speech Feature Recognition Based On Digital Means

10.21203/rs.3.rs-941510/v1 ◽ 2021

Author(s): Yuji Miao ◽ Yanan Huang ◽ Zhenjing Da

Keyword(s): Speech Recognition ◽ Feature Recognition ◽ Recognition System ◽ Recognition Algorithm ◽ Time Frequency ◽ Chaotic Signals ◽ Speech Feature ◽ Speech Features ◽ Fuzzy Recognition ◽ Digital Algorithm

To improve English speech recognition by digital means, this paper refines the digital algorithm to meet the practical needs of English speech feature recognition. It applies a fuzzy recognition algorithm to analyze English speech features, examines the shortcomings of traditional algorithms, proposes a fuzzy digitized English speech recognition algorithm, and builds an English speech feature recognition model on this basis. In addition, it conducts time-frequency analysis of chaotic signals and speech signals, eliminates noise in English speech features to improve their recognition, and builds an English speech feature recognition system based on digital means. Finally, it conducts grouping experiments using students' English pronunciation as input and tallies the results to test the system's performance. The results show that the proposed method has a certain effect.

Design of Automatic Scoring System for Oral English Test Based on Sequence Matching and Big Data Analysis

Discrete Dynamics in Nature and Society ◽ 10.1155/2021/3018285 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-10

Author(s): Ping Li ◽ Hua Zhang ◽ Sang-Bing Tsai

Keyword(s): Scoring System ◽ Speech Signal Processing ◽ Sequence Matching ◽ Dynamic Selection ◽ Feature Extraction Method ◽ Spoken English ◽ Recognition Ability ◽ Speech Feature ◽ The Difference ◽ Automatic Scoring

The application of automatic scoring systems to oral English tests of all kinds and at all levels has greatly improved the efficiency of test administration. The traditional speech signal processing method focuses only on extracting scoring features and cannot ensure the accuracy of the scoring algorithm. To improve the reliability of automatic scoring, this paper applies the principle of sequence matching: it uses a spoken speech feature extraction method to extract features from spoken English test pronunciation and establishes a dynamically optimized spoken English pronunciation signal model based on sequence matching, which maintains good dynamic selection and clustering ability in a strong interference environment. According to the comprehensive experiment, the system's automatic scoring results are much higher than those of the traditional method. This greatly improves the recognition of oral pronunciation, resolves the difference between the system's automatic scoring and manual scoring, and promotes the replacement, or partial replacement, of manual marking by computer automatic scoring.
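
The abstract does not say which sequence matching algorithm is used. As an illustration only, the sketch below uses dynamic time warping (DTW), a common way to compare two feature sequences spoken at different speeds; it is a stand-in technique, not the paper's model, and the one-dimensional sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] against b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: skip in a, skip in b, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]       # hypothetical reference contour
    slower = [1.0, 2.0, 2.0, 3.0]     # same contour, one frame held longer
    print(dtw_distance(reference, slower))
```

A scoring system built on such a distance would still need a mapping from distance to score and real multi-dimensional features per frame, both of which are outside what the abstract specifies.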

Involution Based Speech Autoencoder: Investigating the Advanced Vision Operator Performance on Speech Feature Extraction

10.1109/gcce53005.2021.9621826 ◽ 2021

Author(s): Tianle Zhong ◽ Israel Mendoza Velazquez ◽ Yoichi Haneda

Keyword(s): Feature Extraction ◽ Operator Performance ◽ Speech Feature ◽ Speech Feature Extraction

Speech feature profiles in Swedish 5-year-olds with speech sound disorder related to suspected childhood apraxia of speech or cleft palate

International Journal of Speech-Language Pathology ◽ 10.1080/17549507.2021.1968951 ◽ 2021 ◽ pp. 1-12

Author(s): Ann Malmenholt ◽ Anita McAllister ◽ Anette Lohmander ◽ Per Östberg

Keyword(s): Cleft Palate ◽ Speech Sound ◽ Apraxia Of Speech ◽ Speech Sound Disorder ◽ Childhood Apraxia Of Speech ◽ Speech Feature

Sports and Health Management Using Big Data Based on Voice Feature Processing and Internet of Things

Scientific Programming ◽ 10.1155/2021/3271863 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-10

Author(s): Lina Sun ◽ Mingzhi Li

Keyword(s): Big Data ◽ Internet Of Things ◽ Health Management ◽ Low Frequency ◽ Development Trend ◽ Feature Processing ◽ Speech Feature ◽ Personalized Health ◽ The Development Trend ◽ The Voice

With the support of big data and information technology, sectors such as sports, health, and the medical industry can integrate and readjust their existing resources, which improves operating efficiency and taps the industry's huge potential. With advances in big data analysis, voice features, and the Internet of Things (IoT), personalized health management is becoming the development trend and breakthrough of the sports and health industry. In this paper, we use the Mel-frequency cepstrum coefficient as the speech feature processing method. When the linear frequency is transformed to the Mel frequency by Fourier transform, the calculation accuracy decreases as the frequency increases, and the low-frequency signal is retained to improve the anti-noise ability. Through further study of voice feature processing and the IoT model of big-data sports and health management, a vector addition regression was developed to compare the two real scoring features of the processing results, paving the way for further analysis and result evaluation. Experimental verification shows that the proposed method learns the speech features better. At the same time, with the introduction of noise reduction, the big data of speech recognition in sports health management is more robust and the overall system performance improves.
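
The loss of resolution at high frequencies mentioned above can be seen directly from the Mel scale. The sketch below assumes the widely used conversion mel = 2595 · log10(1 + f / 700); the abstract does not state which formula the authors used, and the band count and frequency range are hypothetical. Bands that are equally wide in Mel become progressively wider in Hz, so low frequencies are resolved more finely than high ones.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to Mel (common 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    """Edges (in Hz) of n_bands filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    edges = mel_band_edges(10, 0.0, 8000.0)  # hypothetical 10-band filterbank
    widths = [upper - lower for lower, upper in zip(edges, edges[1:])]
    for lower, width in zip(edges, widths):
        print(f"band starting at {lower:8.1f} Hz spans {width:7.1f} Hz")
```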

Detecting Deception from Gaze and Speech Using a Multimodal Attention LSTM-Based Framework

Applied Sciences ◽ 10.3390/app11146393 ◽ 2021 ◽ Vol 11 (14) ◽ pp. 6393

Author(s): Ascensión Gallardo-Antolín ◽ Juan M. Montero

Keyword(s): Short Term Memory ◽ Deception Detection ◽ Detection System ◽ Support Vector ◽ The Gaze ◽ Combination Strategies ◽ Speech Feature ◽ Speech Features ◽ The One ◽ Modal Systems

The automatic detection of deceptive behaviors has recently attracted the attention of the research community due to the variety of areas where it can play a crucial role, such as security or criminology. This work is focused on the development of an automatic deception detection system based on gaze and speech features. The first contribution of our research on this topic is the use of attention Long Short-Term Memory (LSTM) networks for single-modal systems with frame-level features as input. In the second contribution, we propose a multimodal system that combines the gaze and speech modalities into the LSTM architecture using two different combination strategies: Late Fusion and Attention-Pooling Fusion. The proposed models are evaluated over the Bag-of-Lies dataset, a multimodal database recorded in real conditions. On the one hand, results show that attentional LSTM networks are able to adequately model the gaze and speech feature sequences, outperforming a reference Support Vector Machine (SVM)-based system with compact features. On the other hand, both combination strategies produce better results than the single-modal systems and the multimodal reference system, suggesting that gaze and speech modalities carry complementary information for the task of deception detection that can be effectively exploited by using LSTMs.
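
The two ideas named in the abstract, attention over frame-level sequences and late fusion of modalities, can be sketched without any neural network library. The code below is a simplified illustration under stated assumptions: it pools plain frame vectors with dot-product attention against a fixed query, whereas the paper applies attention to LSTM outputs with learned parameters, and the fusion weight and scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse a sequence of frame vectors into one vector.

    Each frame is weighted by softmax(frame . query), so frames that align
    with the query contribute more to the pooled representation.
    """
    scores = [sum(x * q for x, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def late_fusion(p_gaze, p_speech, w_gaze=0.5):
    """Combine per-modality deception probabilities by weighted averaging."""
    return w_gaze * p_gaze + (1.0 - w_gaze) * p_speech


if __name__ == "__main__":
    speech_frames = [[1.0, 0.0], [3.0, 4.0]]  # hypothetical frame-level features
    print(attention_pool(speech_frames, query=[0.0, 0.0]))  # uniform weights
    print(late_fusion(p_gaze=0.8, p_speech=0.4))
```

Late fusion keeps the single-modality systems independent and only merges their outputs, while pooling-based fusion merges the modalities earlier, inside the model; the abstract reports that both strategies outperform the single-modal systems.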

Voice Onset Time (VOT) Consonants Realization of Indian-Muslim English Speakers in Malaysia

International Journal of Modern Languages And Applied Linguistics ◽ 10.24191/ijmal.v5i2.13094 ◽ 2021 ◽ Vol 5 (2) ◽ pp. 57

Author(s): Nazirul Mubin Bin Mohd Noor ◽ Nuramira Binti Anuar ◽ Ahmad Muhyiddin B Yusof ◽ Puteri Rohani Megat Abdul Rahim ◽ Daljeet Singh Sedhu A/L Janah Singh

Keyword(s): English Language ◽ Language Use ◽ Voice Onset Time ◽ Mother Tongue ◽ Onset Time ◽ English Speakers ◽ Place Of Articulation ◽ Significant Difference ◽ Speech Feature ◽ Northwest Region

Voice Onset Time (VOT) is commonly found in most spoken languages. It is a speech feature that indicates differences in voicing and meaning. In particular, VOT duration is directly determined by place of articulation, with labial VOT values being shorter than velar and alveolar and, sometimes, alveolar being shorter than velar. In the present study, the researchers examined the VOT values of English speakers in Malaysia, particularly Indian-Muslim English speakers in the northwest region of Malaysia. An analysis using PRAAT software to examine differences in the VOT values of voiced and voiceless plosives revealed significant differences in the VOT values of the bilabial plosives /p/ and /b/ and of the alveolar plosives /t/ and /d/ in the Indian-Muslim English-speaking community. However, there is no significant difference in the VOT values of the voiceless and voiced velar plosives /k/ and /g/, indicating the influence of the speakers' mother tongue on their English language use. Regarding the prominence of aspiration, the results show that the Indian-Muslim English speakers in Malaysia have high VOT values for the voiceless alveolar plosive /t/ (M = 0.0705, SD = 0.0509) and the voiced alveolar plosive /d/ (M = 0.015, SD = 0.00). The findings highlight differences in VOT values for bilabial and alveolar plosives between English speakers of the Indian-Muslim community and those of the Malay community.

speech feature
Recently Published Documents

Background Speech Synchronous Recognition Method of E-commerce Platform Based on Hidden Markov Model

Comparisons of Speech Parameterisation Techniques for Classification of Intellectual Disability Using Machine Learning

Research on Speech Feature Extraction and Synthesis Algorithm Based on EEMD

English Speech Feature Recognition Based On Digital Means

Design of Automatic Scoring System for Oral English Test Based on Sequence Matching and Big Data Analysis

Involution Based Speech Autoencoder: Investigating the Advanced Vision Operator Performance on Speech Feature Extraction

Speech feature profiles in Swedish 5-year-olds with speech sound disorder related to suspected childhood apraxia of speech or cleft palate

Sports and Health Management Using Big Data Based on Voice Feature Processing and Internet of Things

Detecting Deception from Gaze and Speech Using a Multimodal Attention LSTM-Based Framework

Voice Onset Time (VOT) Consonants Realization of Indian-Muslim English Speakers in Malaysia
