scholarly journals Evaluation of Automatic Speech Recognition Systems

2021 ◽  
Author(s):  
Matheus Xavier Sampaio ◽  
Regis Pires Magalhães ◽  
Ticiana Linhares Coelho da Silva ◽  
Lívia Almada Cruz ◽  
Davi Romero de Vasconcelos ◽  
...  

Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Due to the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work aims to evaluate the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results demonstrate that the evaluated solutions slightly differ. However, Microsoft Azure Speech outperformed the other analyzed APIs.

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 634
Author(s):  
Alakbar Valizada ◽  
Natavan Akhundova ◽  
Samir Rustamov

In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the fact that dialogue speech in call centers has specific context and noisy, emotional environments, available speech recognition systems show poor performance. Therefore, in order to accurately recognize dialogue speeches, the main modules of speech recognition systems—language models and acoustic training methodologies—as well as symmetric data labeling approaches have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction are compared with common labeling methods resulting in outperforming the other methods with a notable percentage. Based on the results of the experiments, we determined that DNN/HMM for an acoustic model, trigram with Kneser–Ney discounting for a language model and using spelling correction before training data for a labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly connected to specific language features. Hence, it is anticipated that suggested approaches can be applied to the other languages of the same group.


Author(s):  
S. Arokiaraj ◽  
Dr. N. Viswanathan

With the advent of Internet of things(IoT),HA (HA) recognition has contributed the more application in health care in terms of diagnosis and Clinical process. These devices must be aware of human movements to provide better aid in the clinical applications as well as user’s daily activity.Also , In addition to machine and deep learning algorithms, HA recognition systems has significantly improved in terms of high accurate recognition. However, the most of the existing models designed needs improvisation in terms of accuracy and computational overhead. In this research paper, we proposed a BAT optimized Long Short term Memory (BAT-LSTM) for an effective recognition of human activities using real time IoT systems. The data are collected by implanting the Internet of things) devices invasively. Then, proposed BAT-LSTM is deployed to extract the temporal features which are then used for classification to HA. Nearly 10,0000 dataset were collected and used for evaluating the proposed model. For the validation of proposed framework, accuracy, precision, recall, specificity and F1-score parameters are chosen and comparison is done with the other state-of-art deep learning models. The finding shows the proposed model outperforms the other learning models and finds its suitability for the HA recognition.


2021 ◽  
Vol 11 (19) ◽  
pp. 8872
Author(s):  
Iván G. Torre ◽  
Mónica Romero ◽  
Aitor Álvarez

Automatic speech recognition in patients with aphasia is a challenging task for which studies have been published in a few languages. Reasonably, the systems reported in the literature within this field show significantly lower performance than those focused on transcribing non-pathological clean speech. It is mainly due to the difficulty of recognizing a more unintelligible voice, as well as due to the scarcity of annotated aphasic data. This work is mainly focused on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to deal with these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described by using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The interesting results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data to train recognition models is a challenging reality.


2021 ◽  
Author(s):  
Jaydip Sen ◽  
Sidra Mehtab ◽  
Gourab Nath

Prediction of future movement of stock prices has been a subject matter of many research work. On one hand, we have proponents of the Efficient Market Hypothesis who claim that stock prices cannot be predicted, on the other hand, there are propositions illustrating that, if appropriately modeled, stock prices can be predicted with a high level of accuracy. There is also a gamut of literature on technical analysis of stock prices where the objective is to identify patterns in stock price movements and profit from it. In this work, we propose a hybrid approach for stock price prediction using five deep learning-based regression models. We select the NIFTY 50 index values of the National Stock Exchange (NSE) of India, over a period of December 29, 2014 to July 31, 2020. Based on the NIFTY data during December 29, 2014 to December 28, 2018, we build two regression models using <i>convolutional neural networks</i> (CNNs), and three regression models using <i>long-and-short-term memory</i> (LSTM) networks for predicting the <i>open</i> values of the NIFTY 50 index records for the period December 31, 2018 to July 31, 2020. We adopted a multi-step prediction technique with <i>walk-forward validation</i>. The parameters of the five deep learning models are optimized using the grid-search technique so that the validation losses of the models stabilize with an increasing number of epochs in the model training, and the training and validation accuracies converge. Extensive results are presented on various metrics for all the proposed regression models. The results indicate that while both CNN and LSTM-based regression models are very accurate in forecasting the NIFTY 50 <i>open</i> values, the CNN model that previous one week’s data as the input is the fastest in its execution. On the other hand, the encoder-decoder convolutional LSTM model uses the previous two weeks’ data as the input is found to be the most accurate in its forecasting results.


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Filip Ferdinand ◽  
...  

Abstract This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods advances in data science are investigated. The data science advances are investigated in three individual classes of deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing, E-commerce, corporate banking, and cryptocurrency. Prisma method, a systematic literature review methodology is used to ensure the quality of the survey. The findings revealed that the trends are on advancement of hybrid models as more than 51% of the reviewed articles applied hybrid model. On the other hand, it is found that based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. While it is expected the trends go toward the advancements of deep learning models.


Author(s):  
Daniel Bolanos

This chapter provides practitioners in the field with a set of guidelines to help them through the process of elaborating an adequate automated testing framework to competently test automatic speech recognition systems. Through this chapter the testing process of such a system is analyzed from different angles, and different methods and techniques are proposed that are well suited for this task.


Sign in / Sign up

Export Citation Format

Share Document