An iterative mask estimation approach to deep learning based multi-channel speech recognition

2019 ◽  
Vol 106 ◽  
pp. 31-43 ◽  
Author(s):  
Yan-Hui Tu ◽  
Jun Du ◽  
Lei Sun ◽  
Feng Ma ◽  
Hai-Kun Wang ◽  
...  
Author(s):  
Lery Sakti Ramba

The purpose of this research is to design a home automation system that can be controlled using voice commands. The research was conducted by studying related work, consulting competent parties, designing and testing the system, and analyzing the test results. The voice recognition system was designed using Deep Learning Convolutional Neural Networks (DL-CNN), and the resulting CNN model was trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to it. The system has a 100% success rate in room conditions with a background noise intensity of 24 dB (silent), 67.67% with a background noise intensity of 42 dB, and only 51.67% with a background noise intensity of 52 dB (noisy). The success rate is therefore strongly influenced by the intensity of background noise in the room, and the system is best suited to rooms with low-intensity background noise.


Author(s):  
Matus Pleva ◽  
Yuan-Fu Liao ◽  
Wuhua Hsu ◽  
Daniel Hladek ◽  
Jan Stas ◽  
...  

2021 ◽  
Author(s):  
Matheus Xavier Sampaio ◽  
Regis Pires Magalhães ◽  
Ticiana Linhares Coelho da Silva ◽  
Lívia Almada Cruz ◽  
Davi Romero de Vasconcelos ◽  
...  

Automatic Speech Recognition (ASR) is an essential task for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results show that the evaluated solutions differ only slightly, although Microsoft Azure Speech outperformed the other APIs analyzed.
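
The abstract above does not name the metric used to compare the services. ASR systems are commonly scored with word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch under that assumption; the function name `word_error_rate` and the example transcripts are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER of `hypothesis` against `reference`.

    WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four.
print(word_error_rate("turn on the light", "turn off the light"))
```

In a comparison like the one described, each service's transcript would be scored against the same reference text, and a lower WER would indicate a better transcription. Note that WER can exceed 1.0 when the output contains many inserted words.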


2021 ◽  
Vol 336 ◽  
pp. 06014
Author(s):  
Baojia Gong ◽  
Rangzhuoma Cai ◽  
Zhijie Cai ◽  
Yuntao Ding ◽  
Maozhaxi Peng

The selection of the modeling unit is the primary problem of acoustic modeling in speech recognition, and the choice of acoustic modeling unit directly affects overall recognition performance. To address the problem of selecting the acoustic modeling unit in Tibetan speech recognition, this paper studies and analyzes the deficiencies of the existing acoustic modeling units and designs a Tibetan character segmentation and labeling model and algorithm flow. In experimental verification, the model and algorithm performed well, with the accuracy of Tibetan character segmentation and labeling reaching 99.98%.


2018 ◽  
Vol 9 (5) ◽  
pp. 1-28 ◽  
Author(s):  
Zixing Zhang ◽  
Jürgen Geiger ◽  
Jouni Pohjalainen ◽  
Amr El-Desoky Mousa ◽  
Wenyu Jin ◽  
...  
