An iterative mask estimation approach to deep learning based multi-channel speech recognition

The purpose of this research is to design a home automation system that can be controlled using voice commands. The work proceeded by reviewing related research, consulting with competent parties, designing and testing the system, and analyzing the test results. The voice recognition system was designed using a deep learning convolutional neural network (DL-CNN), which was then trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to it. The system achieved a 100% success rate in a room with a background noise intensity of 24 dB (silent), 67.67% with a background noise intensity of 42 dB, and only 51.67% with a background noise intensity of 52 dB (noisy). The success rate is therefore strongly influenced by the intensity of background noise in the room, and the system is best suited to rooms with low-intensity background noise.

Mel-Spectrographic Mask Estimation for Missing Data Speech Recognition using Short-Time-Fourier-Transform Ratio Estimators

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 ◽ 10.1109/icassp.2007.366935 ◽ 2007 ◽ Cited By ~ 2

Author(s): Marco Kuhne ◽ Roberto Togneri ◽ Sven Nordholm

Keyword(s): Fourier Transform ◽ Speech Recognition ◽ Missing Data ◽ Short Time Fourier Transform ◽ Ratio Estimators ◽ Mask Estimation ◽ Short Time

Assistive Handwriting Haptic Mechanism Using Deep Learning Speech Recognition

New Trends in Mechanism and Machine Science - Mechanisms and Machine Science ◽ 10.1007/978-3-030-55061-5_9 ◽ 2020 ◽ pp. 67-77

Author(s): Erdi Sayar

Keyword(s): Deep Learning ◽ Speech Recognition

Emotion Speech Recognition Through Deep Learning

New Trends in Computational Vision and Bio-inspired Computing ◽ 10.1007/978-3-030-41862-5_140 ◽ 2020 ◽ pp. 1363-1369

Author(s): Mohammad Mohsin ◽ D. Hemavathi

Keyword(s): Deep Learning ◽ Speech Recognition

Towards Slovak-English-Mandarin Speech Recognition Using Deep Learning

2018 International Symposium ELMAR ◽ 10.23919/elmar.2018.8534661 ◽ 2018

Author(s): Matus Pleva ◽ Yuan-Fu Liao ◽ Wuhua Hsu ◽ Daniel Hladek ◽ Jan Stas ◽ ...

Keyword(s): Deep Learning ◽ Speech Recognition ◽ Mandarin Speech Recognition

Evaluation of Automatic Speech Recognition Systems

10.5753/sbbd.2021.17889 ◽ 2021

Author(s): Matheus Xavier Sampaio ◽ Regis Pires Magalhães ◽ Ticiana Linhares Coelho da Silva ◽ Lívia Almada Cruz ◽ Davi Romero de Vasconcelos ◽ ...

Keyword(s): Deep Learning ◽ Speech Recognition ◽ Automatic Speech Recognition ◽ Smart Homes ◽ The Other ◽ Learning Models ◽ Recognition Systems ◽ Microsoft Azure

Automatic Speech Recognition (ASR) is an essential task for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results show that the evaluated solutions differ only slightly, although Microsoft Azure Speech outperformed the other analyzed APIs.
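
The abstract does not state which metric was used to compare the services. As an illustration only, the sketch below computes word error rate (WER), a metric commonly used for this kind of comparison, from a reference transcript and a recognized transcript. The function name `word_error_rate`, the service labels, and the sample transcripts are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, for illustration only (not data from the paper).
reference = "turn on the living room lights"
hypotheses = {
    "service_a": "turn on the living room lights",
    "service_b": "turn on the living room light",
    "service_c": "turn the living room lights",
}
for name, text in hypotheses.items():
    print(f"{name}: WER = {word_error_rate(reference, text):.3f}")
```

Lower values are better. Because insertions count as errors, WER can exceed 1.0 when the recognized transcript is much longer than the reference.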

Selection of acoustic modeling unit for Tibetan speech recognition based on deep learning

MATEC Web of Conferences ◽ 10.1051/matecconf/202133606014 ◽ 2021 ◽ Vol 336 ◽ pp. 06014

Author(s): Baojia Gong ◽ Rangzhuoma Cai ◽ Zhijie Cai ◽ Yuntao Ding ◽ Maozhaxi Peng

Keyword(s): Deep Learning ◽ Speech Recognition ◽ Experimental Verification ◽ Character Segmentation ◽ Acoustic Modeling ◽ Overall Performance ◽ Primary Problem ◽ Selection Of

The selection of the modeling unit is the primary problem of acoustic modeling in speech recognition, because different acoustic modeling units directly affect overall recognition performance. By studying and analyzing the deficiencies of the existing acoustic modeling units in Tibetan speech recognition, this paper designs a Tibetan character segmentation and labeling model and algorithm flow to address the problem of selecting the acoustic modeling unit. In experimental verification, the model and algorithm performed well, with the accuracy of Tibetan character segmentation and labeling reaching 99.98%.
