Evaluating Open-source Toolkits for Automatic Speech Recognition of South African Languages

Author(s):  
Ashentha Naidoo ◽  
Mohohlo Tsoeu
2021 ◽  
pp. 101262
Author(s):  
Astik Biswas ◽  
Emre Yılmaz ◽  
Ewald van der Westhuizen ◽  
Febe de Wet ◽  
Thomas Niesler

2016 ◽  
Vol 81 ◽  
pp. 136-143 ◽  
Author(s):  
Elodie Gauthier ◽  
Laurent Besacier ◽  
Sylvie Voisin

Over the years, considerable effort has gone into improving the accuracy of automatic speech recognition (ASR) and speaker recognition (SRE), and many different technologies have been developed. Given the close relationship between these two tasks, researchers have proposed various ways of transferring techniques developed for one task to the other. In this paper, an open-source experimental framework is proposed for speech and speaker recognition. A unified model, Nexus-DNN, is then developed and trained jointly for speech and speaker recognition. Experimental results show that the combined model can effectively perform both ASR and SRE tasks.


2011 ◽  
Vol 45 (3) ◽  
pp. 289-309 ◽  
Author(s):  
Jaco Badenhorst ◽  
Charl van Heerden ◽  
Marelie Davel ◽  
Etienne Barnard

2021 ◽  
Author(s):  
Lotte Weerts ◽  
Claudia Clopath ◽  
Dan F. M. Goodman

Automatic speech recognition (ASR) software has been suggested as a candidate model of the human auditory system thanks to dramatic improvements in performance in recent years. To test this hypothesis, we compared several state-of-the-art ASR systems to results from humans on a barrage of standard psychoacoustic experiments. While some systems showed qualitative agreement with humans in some tests, in others all tested systems diverged markedly from humans. In particular, none of the models used spectral invariance, temporal fine structure or speech periodicity in a similar way to humans. We conclude that none of the tested ASR systems are yet ready to act as a strong proxy for human speech recognition. However, we note that the more recent systems with better performance also tend to better match human results, suggesting that continued cross-fertilisation of ideas between human and automatic speech recognition may be fruitful. Our software is released as an open-source toolbox to allow researchers to assess future ASR systems or add additional psychoacoustic measures.


2017 ◽  
Vol 26 (2) ◽  
pp. 24-37
Author(s):  
Eric Mabaso

This article highlights the problem that the print mode, which the indigenous South African languages (IndiSAL) have largely adopted to preserve the folktale, is inadequate. It identifies shortfalls in support of the contention that not enough is being done to preserve the art of folktale narration and suggests a way out of the cul-de-sac. Most works on IndiSAL folktales focus on the value of preserving the art itself rather than the mode of preservation. The research follows a performance-centred approach as advocated by, inter alia, Marivate (1991), Bill (1996), Dorji (2010) and Backe (2014). Compared to countries such as Nigeria and Malawi, IndiSAL are lagging behind in digitization for the preservation of folktales. The article is an empirical study based on the author’s experiences and observation of folktale narration and the analysis of the transcribed form. The article critically reviews the various preservation modes and highlights their pros and cons.

