Evaluation of Automatic Speech Recognition Systems

Mapping Intimacies ◽

10.5753/sbbd.2021.17889 ◽

2021 ◽

Author(s):

Matheus Xavier Sampaio ◽

Regis Pires Magalhães ◽

Ticiana Linhares Coelho da Silva ◽

Lívia Almada Cruz ◽

Davi Romero de Vasconcelos ◽

...

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Smart Homes ◽

The Other ◽

Learning Models ◽

Recognition Systems ◽

Microsoft Azure

Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Due to the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work aims to evaluate the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results demonstrate that the evaluated solutions slightly differ. However, Microsoft Azure Speech outperformed the other analyzed APIs.

Download Full-text

A Semantic-Aware Strategy for Automatic Speech Recognition Incorporating Deep Learning Models

Advances in Intelligent Systems and Computing - Intelligent System Design ◽

10.1007/978-981-15-5400-1_25 ◽

2020 ◽

pp. 247-254

Author(s):

A. Santhanavijayan ◽

D. Naresh Kumar ◽

Gerard Deepak

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Learning Models

Download Full-text

Development of Speech Recognition Systems in Emergency Call Centers

Symmetry ◽

10.3390/sym13040634 ◽

2021 ◽

Vol 13 (4) ◽

pp. 634

Author(s):

Alakbar Valizada ◽

Natavan Akhundova ◽

Samir Rustamov

Keyword(s):

Speech Recognition ◽

Markov Model ◽

Hidden Markov ◽

Call Centers ◽

The Other ◽

Language Models ◽

Emergency Call ◽

Acoustic Model ◽

Different Types ◽

Recognition Systems

In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the fact that dialogue speech in call centers has specific context and noisy, emotional environments, available speech recognition systems show poor performance. Therefore, in order to accurately recognize dialogue speeches, the main modules of speech recognition systems—language models and acoustic training methodologies—as well as symmetric data labeling approaches have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction are compared with common labeling methods resulting in outperforming the other methods with a notable percentage. Based on the results of the experiments, we determined that DNN/HMM for an acoustic model, trigram with Kneser–Ney discounting for a language model and using spelling correction before training data for a labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly connected to specific language features. Hence, it is anticipated that suggested approaches can be applied to the other languages of the same group.

Download Full-text

Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems

10.21437/interspeech.2013-672 ◽

2013 ◽

Author(s):

Kartik Audhkhasi ◽

Andreas M. Zavou ◽

Panayiotis G. Georgiou ◽

Shrikanth Narayanan

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition Systems

Download Full-text

A Hybrid Optimized LSTM Models for Human Activity Recognition with IOT Devices

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-2326 ◽

2021 ◽

pp. 182-189

Author(s):

S. Arokiaraj ◽

Dr. N. Viswanathan

Keyword(s):

Deep Learning ◽

Internet Of Things ◽

Short Term Memory ◽

The Other ◽

Learning Models ◽

Computational Overhead ◽

Temporal Features ◽

Human Movements ◽

Proposed Model ◽

Iot Devices

With the advent of Internet of things(IoT),HA (HA) recognition has contributed the more application in health care in terms of diagnosis and Clinical process. These devices must be aware of human movements to provide better aid in the clinical applications as well as user’s daily activity.Also , In addition to machine and deep learning algorithms, HA recognition systems has significantly improved in terms of high accurate recognition. However, the most of the existing models designed needs improvisation in terms of accuracy and computational overhead. In this research paper, we proposed a BAT optimized Long Short term Memory (BAT-LSTM) for an effective recognition of human activities using real time IoT systems. The data are collected by implanting the Internet of things) devices invasively. Then, proposed BAT-LSTM is deployed to extract the temporal features which are then used for classification to HA. Nearly 10,0000 dataset were collected and used for evaluating the proposed model. For the validation of proposed framework, accuracy, precision, recall, specificity and F1-score parameters are chosen and comparison is done with the other state-of-art deep learning models. The finding shows the proposed model outperforms the other learning models and finds its suitability for the HA recognition.

Download Full-text

Improving Aphasic Speech Recognition by Using Novel Semi-Supervised Learning Methods on AphasiaBank for English and Spanish

Applied Sciences ◽

10.3390/app11198872 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8872

Author(s):

Iván G. Torre ◽

Mónica Romero ◽

Aitor Álvarez

Keyword(s):

Speech Recognition ◽

Supervised Learning ◽

Automatic Speech Recognition ◽

English Language ◽

Spanish Language ◽

Learning Methods ◽

Text Data ◽

Lower Performance ◽

Recognition Systems ◽

Fine Tune

Automatic speech recognition in patients with aphasia is a challenging task for which studies have been published in a few languages. Reasonably, the systems reported in the literature within this field show significantly lower performance than those focused on transcribing non-pathological clean speech. It is mainly due to the difficulty of recognizing a more unintelligible voice, as well as due to the scarcity of annotated aphasic data. This work is mainly focused on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to deal with these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described by using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The interesting results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data to train recognition models is a challenging reality.

Download Full-text

Stock Price Prediction Using Deep Learning Models

10.36227/techrxiv.16640197 ◽

2021 ◽

Author(s):

Jaydip Sen ◽

Sidra Mehtab ◽

Gourab Nath

Keyword(s):

Deep Learning ◽

Stock Prices ◽

Stock Price ◽

Regression Models ◽

Research Work ◽

The Other ◽

Learning Models ◽

Stock Price Prediction ◽

Price Prediction ◽

Other Hand

Prediction of future movement of stock prices has been a subject matter of many research work. On one hand, we have proponents of the Efficient Market Hypothesis who claim that stock prices cannot be predicted, on the other hand, there are propositions illustrating that, if appropriately modeled, stock prices can be predicted with a high level of accuracy. There is also a gamut of literature on technical analysis of stock prices where the objective is to identify patterns in stock price movements and profit from it. In this work, we propose a hybrid approach for stock price prediction using five deep learning-based regression models. We select the NIFTY 50 index values of the National Stock Exchange (NSE) of India, over a period of December 29, 2014 to July 31, 2020. Based on the NIFTY data during December 29, 2014 to December 28, 2018, we build two regression models using convolutional neural networks (CNNs), and three regression models using long-and-short-term memory (LSTM) networks for predicting the open values of the NIFTY 50 index records for the period December 31, 2018 to July 31, 2020. We adopted a multi-step prediction technique with walk-forward validation. The parameters of the five deep learning models are optimized using the grid-search technique so that the validation losses of the models stabilize with an increasing number of epochs in the model training, and the training and validation accuracies converge. Extensive results are presented on various metrics for all the proposed regression models. The results indicate that while both CNN and LSTM-based regression models are very accurate in forecasting the NIFTY 50 open values, the CNN model that previous one week’s data as the input is the fastest in its execution. On the other hand, the encoder-decoder convolutional LSTM model uses the previous two weeks’ data as the input is found to be the most accurate in its forecasting results.

Download Full-text

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

ETRI Journal ◽

10.4218/etrij.13.0112.0074 ◽

2013 ◽

Vol 35 (1) ◽

pp. 100-108 ◽

Cited By ~ 13

Author(s):

Yasser Shekofteh ◽

Farshad Almasganj

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Phase Space ◽

Automatic Speech Recognition ◽

Reconstructed Phase Space ◽

Recognition Systems

Download Full-text

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.21203/rs.3.rs-91905/v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

Abstract This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods advances in data science are investigated. The data science advances are investigated in three individual classes of deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing, E-commerce, corporate banking, and cryptocurrency. Prisma method, a systematic literature review methodology is used to ensure the quality of the survey. The findings revealed that the trends are on advancement of hybrid models as more than 51% of the reviewed articles applied hybrid model. On the other hand, it is found that based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. While it is expected the trends go toward the advancements of deep learning models.

Download Full-text

On the Application of Automated Software Testing Techniques to the Development and Maintenance of Speech Recognition Systems

Advanced Automated Software Testing ◽

10.4018/978-1-4666-0089-8.ch002 ◽

2012 ◽

pp. 30-48

Author(s):

Daniel Bolanos

Keyword(s):

Speech Recognition ◽

Software Testing ◽

Automatic Speech Recognition ◽

Automated Testing ◽

Automated Software Testing ◽

Testing Framework ◽

Methods And Techniques ◽

Testing Techniques ◽

Recognition Systems ◽

Automated Software

This chapter provides practitioners in the field with a set of guidelines to help them through the process of elaborating an adequate automated testing framework to competently test automatic speech recognition systems. Through this chapter the testing process of such a system is analyzed from different angles, and different methods and techniques are proposed that are well suited for this task.

Download Full-text