speech recognition Latest Research Papers

Improving Deep Learning based Automatic Speech Recognition for Gujarati

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3483446 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-18

Author(s):

Deepang Raval ◽

Vyom Pathak ◽

Muktan Patel ◽

Brijesh Bhatt

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Short Term Memory ◽

Language Model ◽

Recognition System ◽

Processing Technique ◽

Speech Corpus ◽

Novel Approach ◽

Asr System

We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bi-directional Long Short Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system given the limited size of the dataset, we present a prefix decoding technique based on a combined language model (a word-level language model and a character-level language model) and a post-processing technique based on Bidirectional Encoder Representations from Transformers. To gain key insights from our Automatic Speech Recognition (ASR) system, we used the inferences from the system and proposed different analysis methods. These insights help us understand and improve the ASR system and provide intuition into the language used for the ASR system. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.87% decrease in Word Error Rate (WER) with respect to base-model WER.
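The abstract above reports its result as a change in Word Error Rate. As a minimal sketch of how WER is commonly computed (word-level edit distance divided by the number of reference words), and not the authors' evaluation code, the following may help; the function names and example sentences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over six words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.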

Dereverberation of autoregressive envelopes for far-field speech recognition

Computer Speech & Language ◽

10.1016/j.csl.2021.101277 ◽

2022 ◽

Vol 72 ◽

pp. 101277

Author(s):

Anurenjan Purushothaman ◽

Anirudh Sreeram ◽

Rohit Kumar ◽

Sriram Ganapathy

Keyword(s):

Speech Recognition ◽

Far Field

A Computational Look at Oral History Archives

Journal on Computing and Cultural Heritage ◽

10.1145/3477605 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-16

Author(s):

Francisca Pessanha ◽

Almila Akdag Salah

Keyword(s):

Signal Processing ◽

Natural Language Processing ◽

Speech Recognition ◽

Natural Language ◽

Oral History ◽

Language Processing ◽

Automatic Speech Recognition ◽

Visual Cues ◽

Social Signal Processing ◽

Social Signal

Computational technologies have revolutionized the archival sciences field, prompting new approaches to process the extensive data in these collections. Automatic speech recognition and natural language processing create unique possibilities for analysis of oral history (OH) interviews, where otherwise the transcription and analysis of the full recording would be too time-consuming. However, many oral historians note the loss of aural information when converting the speech into text, pointing out the relevance of subjective cues for a full understanding of the interviewee narrative. In this article, we explore various computational technologies for social signal processing and their potential application space in OH archives, as well as neighboring domains where qualitative study is a frequently used method. We also highlight the latest developments in key technologies for multimedia archiving practices such as natural language processing and automatic speech recognition. We discuss the analysis of both visual cues (body language and facial expressions) and non-visual cues (paralinguistics, breathing, and heart rate), stating the specific challenges introduced by the characteristics of OH collections. We argue that applying social signal processing to OH archives will have a wider influence than solely OH practices, bringing benefits for various fields from humanities to computer sciences, as well as to archival sciences. Looking at human emotions and somatic reactions on extensive interview collections would give scholars from multiple fields the opportunity to focus on feelings, mood, culture, and subjective experiences expressed in these interviews on a larger scale.

Contribution of frequency compressed temporal fine structure cues to the speech recognition in noise: An implication in cochlear implant signal processing

Applied Acoustics ◽

10.1016/j.apacoust.2021.108616 ◽

2022 ◽

Vol 189 ◽

pp. 108616

Author(s):

Venkateswarlu Poluboina ◽

Aparna Pulikala ◽

Arivudai Nambi Pitchai Muthu

Keyword(s):

Signal Processing ◽

Fine Structure ◽

Speech Recognition ◽

Cochlear Implant ◽

Temporal Fine Structure ◽

Speech Recognition In Noise

Supplemental Material for Song Properties and Familiarity Affect Speech Recognition in Musical Noise

Psychomusicology Music Mind and Brain ◽

10.1037/pmu0000284.supp ◽

2022 ◽

Keyword(s):

Speech Recognition

Noise-robust speech recognition in mobile network based on convolution neural networks

International Journal of Speech Technology ◽

10.1007/s10772-021-09950-9 ◽

2022 ◽

Author(s):

Lallouani Bouchakour ◽

Mohamed Debyeche

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Mobile Network ◽

Robust Speech Recognition ◽

Convolution Neural Networks ◽

Noise Robust Speech Recognition ◽

Noise Robust

Optical laser microphone for human-robot interaction: speech recognition in extremely noisy service environments

Advanced Robotics ◽

10.1080/01691864.2021.2023629 ◽

2022 ◽

pp. 1-14

Author(s):

Takahiro Fukumori ◽

Chengkai Cai ◽

Yutao Zhang ◽

Lotfi El Hafi ◽

Yoshinobu Hagiwara ◽

...

Keyword(s):

Speech Recognition ◽

Human Robot Interaction ◽

Robot Interaction ◽

Service Environments ◽

Optical Laser

Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00233-4 ◽

2022 ◽

Vol 2022 (1) ◽

Author(s):

Siqing Qin ◽

Longbiao Wang ◽

Sheng Li ◽

Jianwu Dang ◽

Lixin Pan

Keyword(s):

Speech Recognition ◽

Transfer Learning ◽

Recognition Performance ◽

Low Resource ◽

Learning Framework ◽

Positive Effects ◽

End To End ◽

Historical Heritage ◽

First Time ◽

Asr System

Conventional automatic speech recognition (ASR) and emerging end-to-end (E2E) speech recognition have achieved promising results after being provided with sufficient resources. However, for low-resource languages, ASR remains challenging. The Lhasa dialect is the most widespread Tibetan dialect and has a wealth of speakers and transcriptions. Hence, it is meaningful to apply the ASR technique to the Lhasa dialect for historical heritage protection and cultural exchange. Previous work on Tibetan speech recognition focused on selecting phone-level acoustic modeling units and incorporating tonal information but underestimated the influence of limited data. The purpose of this paper is to improve the speech recognition performance of the low-resource Lhasa dialect by adopting multilingual speech recognition technology on the E2E structure based on the transfer learning framework. Using transfer learning, we first establish a monolingual E2E ASR system for the Lhasa dialect with different source languages to initialize the ASR model to compare the positive effects of source languages on the Tibetan ASR model. We further propose a multilingual E2E ASR system by utilizing initialization strategies with different source languages and multilevel units, which is proposed for the first time. Our experiments show that the performance of the proposed method-based ASR system exceeds that of the E2E baseline ASR system. Our proposed method effectively models the low-resource Lhasa dialect and achieves a relative 14.2% performance improvement in character error rate (CER) compared to DNN-HMM systems. Moreover, from the best monolingual E2E model to the best multilingual E2E model of the Lhasa dialect, the system’s performance increased by 8.4% in CER.
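The abstract above states its gains as relative CER improvements. A relative reduction is the drop in error rate divided by the baseline error rate, which is not the same as an absolute difference in percentage points. The sketch below illustrates the arithmetic only; the function names and the rates used are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_rate, new_rate):
    """Fraction of the baseline error rate that was removed."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


def absolute_reduction(baseline_rate, new_rate):
    """Plain difference between the two error rates."""
    return baseline_rate - new_rate


if __name__ == "__main__":
    # Hypothetical CER values chosen only to show the difference.
    baseline_cer, new_cer = 0.40, 0.34
    rel = relative_reduction(baseline_cer, new_cer)
    abs_pts = absolute_reduction(baseline_cer, new_cer) * 100
    print(f"relative reduction: {rel:.1%}")
    print(f"absolute reduction: {abs_pts:.1f} percentage points")
```

With these made-up numbers the same change reads as a larger figure in relative terms than in absolute terms, which is why it matters which one a paper reports.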

Background Speech Synchronous Recognition Method of E-commerce Platform Based on Hidden Markov Model

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2022.16.42 ◽

2022 ◽

Vol 16 ◽

pp. 344-351

Author(s):

Pei Jiang ◽

Dongchen Wang

Keyword(s):

Speech Recognition ◽

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Language Recognition ◽

Recognition Method ◽

Recognition Sequence ◽

Traditional Methods ◽

Speech Feature ◽

And Storage

To improve background speech synchronous recognition on e-commerce platforms and to address the vulnerability of traditional methods to sudden noise, which results in poor recognition, this paper proposes a background speech synchronous recognition method based on a hidden Markov model. Speech features are collected in line with the principles of speech recognition. The hidden Markov model is used to input and recognize the high-fidelity speech filter to ensure the effectiveness of the signal processing results. The background voice of the e-commerce platform is de-noised, the language signal is cached and stored for recognition, audio is buffered using a vector graph, and the related speech recognition sequence is transplanted through the Ethernet interface; this achieves background speech synchronization, which in turn enables language recognition and improves recognition accuracy. Finally, the experimental results show that the background speech synchronous recognition method based on the hidden Markov model is better than the traditional methods.

Massive Speech Recognition Resource Scheduling System based on Grid Computing

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2022.16.22 ◽

2022 ◽

Vol 16 ◽

pp. 181-190

Author(s):

Shanshan Yang ◽

Jinjin Chao

Keyword(s):

Speech Recognition ◽

Grid Computing ◽

Large Scale ◽

Resource Scheduling ◽

Information Resources ◽

Experimental Results ◽

Scheduling System ◽

Exchange Of Information ◽

Information Scheduling ◽

Speed And Accuracy

The sheer volume of large-scale speech recognition resources makes it difficult to ensure scheduling speed and accuracy. To improve large-scale speech recognition resource scheduling, this paper designs a large-scale speech recognition resource scheduling system based on grid computing. In the hardware part, a microprocessor, Ethernet control chip, controller, and acquisition card are designed. The software part mainly carries out the retrieval and exchange of information resources, so as to realize information scheduling for large-scale speech recognition resources of the same type. The experimental results show that the information scheduling time of the designed system is short, up to 2.4 min, and the scheduling accuracy is high, up to 90%, which may help improve the speed and accuracy of information scheduling.
