Speech Recognition Using Energy, MFCCs and Rho Parameters to Classify Syllables in the Spanish Language

Author(s):  
Sergio Suárez Guerra ◽  
José Luis Oropeza Rodríguez ◽  
Edgardo Manuel Felipe Riveron ◽  
Jesús Figueroa Nazuno

2021 ◽  
Vol 11 (19) ◽  
pp. 8872
Author(s):  
Iván G. Torre ◽  
Mónica Romero ◽  
Aitor Álvarez

Automatic speech recognition for patients with aphasia is a challenging task for which studies have been published in only a few languages. Understandably, the systems reported in the literature within this field show significantly lower performance than those focused on transcribing non-pathological clean speech. This is mainly due to the difficulty of recognizing less intelligible speech, as well as to the scarcity of annotated aphasic data. This work is mainly focused on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to deal with these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language, for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described, using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data for training recognition models is a challenging reality.
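One common semi-supervised route when transcribed data is scarce is pseudo-labelling: a seed model transcribes the unlabelled recordings and only the hypotheses it is confident about are added to the training set. The sketch below illustrates that idea with a wav2vec 2.0-style model from Hugging Face Transformers; the checkpoint name, confidence measure, and threshold are illustrative assumptions, not the configuration used in the paper.

```python
# Illustrative pseudo-labelling sketch, not the paper's exact pipeline.
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

CHECKPOINT = "some-org/wav2vec2-spanish-seed"  # hypothetical seed checkpoint

processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT).eval()

def pseudo_label(wav_path: str, threshold: float = 0.85):
    """Transcribe one recording; keep it for retraining only if the model is confident."""
    waveform, sample_rate = torchaudio.load(wav_path)
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).mean(dim=0)
    inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Mean per-frame maximum probability as a crude confidence score.
    confidence = logits.softmax(dim=-1).max(dim=-1).values.mean().item()
    hypothesis = processor.batch_decode(logits.argmax(dim=-1))[0]
    return (hypothesis, confidence) if confidence >= threshold else (None, confidence)

# Recordings whose hypothesis survives the threshold would then be added, with that
# hypothesis as their transcript, to the (initially very small) supervised training set.
```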



Author(s):  
Robinson Jiménez-Moreno ◽  
Javier Orlando Pinzón-Arenas ◽  
César Giovany Pachón-Suescún

This article presents work oriented to assistive robotics, in which a scenario is established for a robot to reach a tool in the hand of a user after the user has verbally requested it by name. For this, three convolutional neural networks are trained: one for recognition of a group of tools (scalpel, screwdriver, and scissors), which obtained an accuracy of 98% identifying the tools established for the application; one for speech recognition, trained with the names of the tools in the Spanish language, which reached a validation accuracy of 97.5% in recognizing the words; and another for recognition of the user's hand, considering the classification of two gestures, open and closed hand, where an accuracy of 96.25% was achieved. With those networks, real-time tests were performed, and each tool was delivered with 100% accuracy, i.e., the robot correctly identified what the user requested, recognized each tool, and delivered the one needed when the user opened their hand, taking an average of 45 seconds to execute the application.
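As a rough illustration of the kind of network involved, the sketch below defines a small Keras CNN for the two-class open/closed hand-gesture task. The input resolution, layer widths, and training settings are assumptions for illustration, since the article does not spell out its exact architectures here.

```python
# Illustrative architecture only; the article does not specify these hyperparameters.
from tensorflow.keras import layers, models

def build_gesture_cnn(input_shape=(128, 128, 3), num_classes=2):
    """Small CNN for classifying hand images as open or closed."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # open vs. closed hand
    ])

model = build_gesture_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_split=0.2, epochs=20)
```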



Ingeniería ◽  
2017 ◽  
Vol 22 (3) ◽  
pp. 362 ◽  
Author(s):  
Juan David Celis Nuñez ◽  
Rodrigo Andres Llanos Castro ◽  
Byron Medina Delgado ◽  
Sergio Basilio Sepúlveda Mora ◽  
Sergio Alexander Castro Casadiego

Context: Automatic speech recognition requires the development of language and acoustic models for different existing dialects. The purpose of this research is the training of an acoustic model, a statistical language model, and a grammar language model for the Spanish language, specifically for the dialect of the city of San Jose de Cucuta, Colombia, that can be used in a command-and-control system. Existing models for the Spanish language have problems in the recognition of the fundamental frequency and the spectral content, the accent, pronunciation, tone, or simply the language model for Cucuta's dialect. Method: In this project, we used a Raspberry Pi B+ embedded system with the Raspbian operating system (a Linux distribution) and two open-source tools, namely the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University; both tools are based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1,913 audio recordings with the voices of people from San Jose de Cucuta and the Norte de Santander department. These recordings were used for training and testing the automatic speech recognition system. Results: We obtained a language model that consists of two files: a statistical language model (.lm) and a JSGF grammar model (.jsgf). Regarding the acoustic component, two models were trained, one of them an improved version that reached a 100% accuracy rate in the training results and an 83% accuracy rate in the audio tests for command recognition. Finally, we elaborated a manual for the creation of acoustic and language models with the CMU Sphinx software. Conclusions: The number of participants in the training process of the language and acoustic models has a significant influence on the quality of the voice processing of the recognizer. Using a large dictionary for the training process and a short dictionary with the command words for the implementation is important to obtain a better response from the automatic speech recognition system. Considering the accuracy rate above 80% in the voice recognition tests, the proposed models are suitable for applications oriented to the assistance of people with visual or motor impairments.
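For readers who want to try models of this kind, the sketch below shows how a trained CMU Sphinx acoustic model, pronunciation dictionary, and JSGF grammar could be loaded for command decoding with the classic pocketsphinx Python bindings. The file names and example command words are illustrative assumptions, and the exact binding API varies between pocketsphinx releases.

```python
# Illustrative decoding setup; file names and command words are assumptions.
from pocketsphinx import Decoder

# A grammar file such as comandos.jsgf would constrain decoding to the command words, e.g.:
#   #JSGF V1.0;
#   grammar comandos;
#   public <comando> = adelante | atras | izquierda | derecha | detener;
config = Decoder.default_config()
config.set_string("-hmm", "modelo_acustico_cucuta")  # trained acoustic model directory
config.set_string("-dict", "comandos.dic")           # short pronunciation dictionary of command words
config.set_string("-jsgf", "comandos.jsgf")          # grammar model; use "-lm" with the .lm file instead
decoder = Decoder(config)

with open("comando_prueba.raw", "rb") as audio:      # 16 kHz, 16-bit mono PCM recording
    decoder.start_utt()
    decoder.process_raw(audio.read(), False, True)
    decoder.end_utt()

hypothesis = decoder.hyp()
print(hypothesis.hypstr if hypothesis else "no recognition result")
```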



Author(s):  
Sergio Suárez Guerra ◽  
José Luis Oropeza Rodríguez ◽  
Edgardo M. Felipe Riveron ◽  
Jesús Figueroa Nazuno


2019 ◽  
Vol 62 (6) ◽  
pp. 2009-2017
Author(s):  
Yuxia Wang ◽  
Zhaoyu Lu ◽  
Xiaohu Yang ◽  
Chang Liu


2008 ◽  
Vol 18 (1) ◽  
pp. 19-24
Author(s):  
Erin C. Schafer

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. The purpose of this article is to examine current empirical research in the area of FM systems and cochlear implants. Discussion topics include selecting the optimal type of FM receiver, benefits of binaural FM-system input, importance of direct-audio input (DAI) receiver-gain settings, and effects of speech-processor programming on speech recognition. FM systems significantly improve the signal-to-noise ratio at the child's ear through the use of three types of FM receivers: mounted speakers, desktop speakers, or DAI receivers. This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.





1998 ◽  
Vol 41 (2) ◽  
pp. 285-299 ◽  
Author(s):  
Mark C. Flynn ◽  
Richard C. Dowell ◽  
Graeme M. Clark


2010 ◽  
Vol 26 (3) ◽  
pp. 194-202 ◽  
Author(s):  
Daniel A. Newman ◽  
Christine A. Limbers ◽  
James W. Varni

The measurement of health-related quality of life (HRQOL) in children has witnessed significant international growth over the past decade in an effort to improve pediatric health and well-being, and to determine the value of health-care services. In order to compare international HRQOL research findings across language groups, it is important to demonstrate factorial invariance, i.e., that the items have an equivalent meaning across the language groups studied. This study examined the factorial invariance of child self-reported HRQOL across English- and Spanish-language groups in a Hispanic population of 2,899 children ages 8–18 utilizing the 23-item PedsQL™ 4.0 Generic Core Scales. Multigroup confirmatory factor analysis (CFA) was performed specifying a five-factor model across language groups. The findings support an equivalent 5-factor structure across English- and Spanish-language groups. Based on these data, it can be concluded that children across the two languages studied interpreted the instrument in a similar manner. The multigroup CFA statistical methods utilized in the present study have important implications for cross-cultural assessment research in children in which different language groups are compared.
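As a rough illustration of the analysis idea, the sketch below fits the same five-factor CFA separately in the English- and Spanish-language groups and compares the loading patterns. This is only a simplified stand-in for the formal multigroup CFA with cross-group constraints used in the study; the factor names, item names, and file name are placeholders.

```python
# Simplified per-group CFA sketch; placeholders throughout, not the study's exact analysis.
import pandas as pd
from factor_analyzer import ConfirmatoryFactorAnalyzer, ModelSpecificationParser

# Placeholder five-factor specification: the real model maps the 23 PedsQL items
# onto the instrument's scales, which is not reproduced here.
model_dict = {f"F{k}": [f"item_{k}_{i}" for i in range(1, 6)] for k in range(1, 6)}
item_columns = [col for cols in model_dict.values() for col in cols]

def fit_group_cfa(df: pd.DataFrame):
    """Fit the five-factor CFA on one language group and return its loading matrix."""
    items = df[item_columns]
    spec = ModelSpecificationParser.parse_model_specification_from_dict(items, model_dict)
    cfa = ConfirmatoryFactorAnalyzer(spec, disp=False)
    cfa.fit(items.values)
    return cfa.loadings_

# data = pd.read_csv("pedsql_items.csv")                        # hypothetical item-level file
# loadings_en = fit_group_cfa(data[data["language"] == "en"])
# loadings_es = fit_group_cfa(data[data["language"] == "es"])
# Similar loading patterns across groups are consistent with (configural) invariance;
# formal invariance testing additionally constrains parameters to equality across groups.
```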


