speech recognition system
Recently Published Documents


TOTAL DOCUMENTS

1313
(FIVE YEARS 227)

H-INDEX

27
(FIVE YEARS 5)

2022 ◽  Vol 14 (2) ◽  pp. 614
Author(s):  
Taniya Hasija ◽  
Virender Kadyan ◽  
Kalpna Guleria ◽  
Abdullah Alharbi ◽  
Hashem Alyami ◽  
...  

Speech recognition has been an active field of research in recent decades because it facilitates better human–computer interaction. However, automatic speech recognition (ASR) systems for many native languages remain underdeveloped. Punjabi ASR is still in its infancy: most research has targeted adult speech, and little work has addressed Punjabi children’s speech. This research aimed to build a prosodic feature-based ASR system for children’s speech using discriminative modeling techniques. The Punjabi children’s speech corpus poses runtime challenges such as acoustic variation across speakers’ ages. To overcome these issues, out-of-domain data augmentation was implemented using a Tacotron-based text-to-speech synthesizer. Prosodic features were extracted from the Punjabi children’s speech corpus, and selected prosodic features were coupled with Mel Frequency Cepstral Coefficient (MFCC) features before being fed to the ASR framework. The modeling process investigated several discriminative approaches: Maximum Mutual Information (MMI), Boosted Maximum Mutual Information (bMMI), and feature-based Maximum Mutual Information (fMMI). Out-of-domain data augmentation was then performed to enlarge the corpus, prosodic features were also extracted from the extended corpus, and experiments were conducted on both individual and integrated prosodic-based acoustic features. The fMMI technique exhibited a 20% to 25% relative improvement in word error rate over the MMI and bMMI techniques. This was further improved, using the augmented dataset and hybrid front-end features (MFCC + POV + F0 + voice quality), by a relative 13% over the earlier baseline system.
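The abstract's headline numbers are relative word error rate (WER) improvements. As a minimal, self-contained sketch (the corpus, the MMI/bMMI/fMMI training, and the prosodic front end are not reproduced here), WER is the word-level Levenshtein distance normalized by reference length, and a relative improvement is the fractional drop between two WER scores:

```python
def wer(ref, hyp):
    """Word error rate: edit distance over word tokens / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(r)

def relative_improvement(wer_base, wer_new):
    """Fractional WER reduction of a new system over a baseline."""
    return (wer_base - wer_new) / wer_base
```

For example, a baseline at 20% WER and a new system at 15% WER is a 25% relative improvement, which is how "20% to 25% relative improvement" comparisons like the one above are computed.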


Author(s):  
Hosung Park ◽  
Changmin Kim ◽  
Hyunsoo Son ◽  
Soonshin Seo ◽  
Ji-Hwan Kim

In this study, an automatic end-to-end speech recognition system based on a hybrid CTC-attention network is proposed for the Korean language. Deep neural network/hidden Markov model (DNN/HMM)-based speech recognition systems have driven dramatic improvements in this area; however, it is difficult for non-experts to develop speech recognition for new applications. End-to-end approaches simplify the speech recognition system into a single-network architecture, allowing systems to be developed without expert knowledge. In this paper, we propose a hybrid CTC-attention network as an end-to-end speech recognition model for Korean. This model effectively utilizes a CTC objective function during attention model training, which improves both speech recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end speech recognition is inefficient because the language has 11,172 possible characters, a large number relative to other languages (for example, English has 26 characters and Japanese has 50). To address this problem, we utilize 49 Korean graphemes as output labels. Experimental results show a 10.02% character error rate (CER) when 740 hours of Korean training data are used.
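The 11,172 figure follows from Hangul's compositional Unicode layout: every precomposed syllable in the U+AC00 block combines one of 19 initial consonants, 21 vowels, and 28 finals (including "none"), giving 19 × 21 × 28 = 11,172 syllables but only a few dozen underlying graphemes. A sketch of that decomposition, which is the idea behind grapheme output labels (the paper's actual 49-label inventory may differ in detail):

```python
# Compatibility-jamo tables for the three positions of a Hangul syllable.
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                 # 19 initials
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")            # 21 vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

def to_graphemes(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 19 * 21 * 28:
            out.append(CHO[code // (21 * 28)])
            out.append(JUNG[(code // 28) % 21])
            if code % 28:                      # final consonant is optional
                out.append(JONG[code % 28])
        else:
            out.append(ch)                     # pass non-syllables through
    return out
```

For instance, the single character 한 decomposes into the three graphemes ㅎ, ㅏ, ㄴ, so a grapheme-based model needs a label set orders of magnitude smaller than a syllable-based one.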


2022 ◽  pp. 61-77
Author(s):  
Jie Lien ◽  
Md Abdullah Al Momin ◽  
Xu Yuan

Voice assistant systems (e.g., Siri, Alexa) have attracted wide research attention. However, such systems can receive voice input from malicious sources, and recent work has demonstrated that voice authentication systems are vulnerable to several types of attack. These attacks fall into two main categories: spoofing attacks and hidden voice commands. This chapter explores how such attacks are launched and defended against. Spoofing attacks come in four main forms: replay attacks, impersonation attacks, speech synthesis attacks, and voice conversion attacks. Although such attacks can effectively fool the speech recognition system, they are easily identified by humans; hence, hidden voice commands have attracted considerable research interest in recent years.


Author(s):  
Khalid Satori ◽  
Ouissam Zealouk ◽  
Hassan Satori ◽  
Mohamed Hamidi ◽  
Naouar Laaidi

Author(s):  
Erbaz Khan ◽  
Sahar Rauf ◽  
Farah Adeeba ◽  
Sarmad Hussain

Author(s):  
Ziad A. Alqadi ◽  
Sayel Shareef Rimawi

Extracting features from the speech file is one of the most important stages of building a system that identifies a person by voice. The choice of feature extraction method therefore matters, because it has downstream positive or negative effects on the speech recognition system. In this paper we analyze some of the most popular methods of speech signal feature extraction: LPC, K-means clustering, WPT decomposition, and MLBP. These methods are implemented and tested on various speech files, with the amplitude and sampling frequency varied to observe the effects of these changes on the extracted features. Based on the results of the analysis, recommendations are given.
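Of the methods listed, LPC has the most compact closed-form core: fit an all-pole model to a frame by running the Levinson-Durbin recursion on the frame's autocorrelation. A minimal pure-Python illustration of that core (frame length, model order, and any windowing or pre-emphasis are placeholders, not the paper's settings):

```python
def autocorr(x, p):
    """Autocorrelation of frame x for lags 0..p."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(p + 1)]

def lpc(r, p):
    """Levinson-Durbin recursion: solve for LPC coefficients a[0..p]
    (with a[0] = 1) from autocorrelation r[0..p].
    Returns (coefficients, residual prediction-error energy)."""
    a = [1.0] + [0.0] * p
    e = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient from the current prediction residual
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e
        nxt = a[:]
        for j in range(1, i):
            nxt[j] = a[j] + k * a[i - j]
        nxt[i] = k
        a = nxt
        e *= 1.0 - k * k
    return a, e
```

As a sanity check, for a decaying-exponential frame x[n] = 0.9**n, an order-1 fit recovers a[1] ≈ -0.9, i.e. the one-step predictor x̂[n] = 0.9·x[n-1].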

