Speech Emotion Recognition Using Deep Learning LSTM for Tamil Language

2021 ◽  
Vol 29 (3) ◽  
Author(s):  
Bennilo Fernandes ◽  
Kasiprasad Mannepalli

Deep neural networks (DNNs) are more than neural networks with several hidden layers: they give better results with classification algorithms in automated speech recognition tasks. Traditional feedforward neural networks capture only spatial correlation and do not handle the temporal structure of speech signals well, so recurrent neural networks (RNNs) were adopted. Long short-term memory (LSTM) networks are a special case of RNNs for speech processing that capture long-term dependencies. Accordingly, deep hierarchical LSTM and BiLSTM models are designed with dropout layers to reduce gradient problems and long-term learning error in emotional speech analysis. Four combinations of deep hierarchical learning architectures are designed with dropout layers to improve the networks: Deep Hierarchical LSTM and LSTM (DHLL), Deep Hierarchical LSTM and BiLSTM (DHLB), Deep Hierarchical BiLSTM and LSTM (DHBL), and Deep Hierarchical dual BiLSTM (DHBB). The performance of all four models is compared in this paper, and good classification efficiency is attained with a minimal Tamil-language dataset. The experimental results show that DHLB reaches the best precision, about 84%, in emotion recognition on the Tamil database, while DHBL gives 83% efficiency. The other designs perform nearly as well: DHLL and DHBB show 81% efficiency on the smaller dataset with minimal execution and training time.
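The hierarchical stacks above can be sketched in a few lines. The following NumPy forward pass mimics a DHBL-style arrangement (a BiLSTM block feeding an LSTM block through a dropout layer). The hidden size, the 13-coefficient MFCC input, the five emotion classes, and the untrained random weights are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sig(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_layer(x, Wx, Wh, b):
    """Run one LSTM layer over a sequence x of shape (T, d_in)."""
    T, H = x.shape[0], Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.empty((T, H))
    for t in range(T):
        z = x[t] @ Wx + h @ Wh + b          # gates stacked as [i, f, g, o]
        i, f, g, o = np.split(z, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        out[t] = h
    return out

def init(d_in, H):
    """Random (untrained) LSTM weights: input, recurrent, bias."""
    return (rng.normal(0, 0.1, (d_in, 4 * H)),
            rng.normal(0, 0.1, (H, 4 * H)),
            np.zeros(4 * H))

def dhbl_forward(x, H=8, n_emotions=5, p_drop=0.2, train=False):
    """Hierarchical BiLSTM -> dropout -> LSTM -> softmax over emotions."""
    fwd = lstm_layer(x, *init(x.shape[1], H))
    bwd = lstm_layer(x[::-1], *init(x.shape[1], H))[::-1]
    h = np.concatenate([fwd, bwd], axis=1)   # BiLSTM output, shape (T, 2H)
    if train:                                # inverted dropout between blocks
        h = h * (rng.random(h.shape) > p_drop) / (1.0 - p_drop)
    h = lstm_layer(h, *init(2 * H, H))
    logits = h[-1] @ rng.normal(0, 0.1, (H, n_emotions))
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = dhbl_forward(rng.normal(size=(40, 13)))  # 40 frames of 13 MFCC features
```

Swapping the two blocks, or doubling either one, gives the DHLB, DHLL, and DHBB variants the paper compares.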

2017 ◽  
Vol 10 (1) ◽  
pp. 01-10
Author(s):  
Kostantin Nikolic

This paper presents the application of stochastic search algorithms to training artificial neural networks. The methodology was created primarily for training complex recurrent neural networks, which is known to be harder than training feedforward networks. Signal propagation from input to output is realized by simulating the recurrent network, and the training process is achieved through a stochastic search in parameter space. The performance of this type of algorithm is superior to most training algorithms based on the gradient concept. The efficiency of these algorithms is demonstrated by training networks built from units characterized by long-term and short-term memory. The presented methodology is effective and relatively simple.
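A minimal sketch of the idea: simulate a recurrent unit forward, then search the parameter space by random perturbation, accepting only improvements, with no gradient ever computed. The one-unit Elman network, the toy sequence-mean task, and the acceptance schedule are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_loss(params, seqs, targets):
    """Mean squared error of a one-unit Elman recurrent network."""
    w_in, w_rec, w_out, b = params
    err = 0.0
    for seq, y in zip(seqs, targets):
        h = 0.0
        for u in seq:                 # forward simulation only; no gradients
            h = np.tanh(w_in * u + w_rec * h + b)
        err += (w_out * h - y) ** 2
    return err / len(seqs)

# Toy task: predict the mean of each input sequence.
seqs = rng.normal(size=(20, 5))
targets = seqs.mean(axis=1)

# Stochastic search: random perturbations in parameter space, keep improvements.
params = rng.normal(size=4)
initial = best = rnn_loss(params, seqs, targets)
step = 0.5
for _ in range(300):
    cand = params + rng.normal(scale=step, size=4)
    loss = rnn_loss(cand, seqs, targets)
    if loss < best:                   # greedy accept, no gradient information
        params, best = cand, loss
    else:
        step *= 0.995                 # slowly shrink the search radius
final = best
```

Because only forward simulation is required, the same loop trains any recurrent topology, which is the property the paper exploits.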


Biomimetics ◽  
2019 ◽  
Vol 5 (1) ◽  
pp. 1 ◽  
Author(s):  
Michelle Gutiérrez-Muñoz ◽  
Astryd González-Salazar ◽  
Marvin Coto-Jiménez

Speech signals are degraded in real-life environments, as a product of background noise or other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that makes adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have presented surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we evaluate hybrid neural network models that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation was based on quality measurements of the signal's spectrum, the training time of the networks, and statistical validation of the results. In total, 120 artificial neural networks of eight different types were trained and compared. The results support the claim that hybrid networks are an important option for speech signal enhancement: the reduction in training time is on the order of 30%, in processes that can normally take several days or weeks depending on the amount of data, with gains in efficiency but no significant drop in quality.
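A rough way to see why replacing LSTM layers with perceptron layers shortens training is to count trainable parameters. The sketch below compares a three-layer pure-LSTM stack against a hybrid stack; the layer sizes are illustrative assumptions, and parameter count is only a proxy for the training-time savings the paper actually measures.

```python
# Each LSTM layer carries four gate matrices (input and recurrent) plus biases,
# while a time-distributed perceptron (dense) layer carries a single matrix.
def lstm_params(d_in, units):
    return 4 * units * (d_in + units + 1)

def dense_params(d_in, units):
    return units * (d_in + 1)

# Illustrative sizes only (not the paper's exact configuration):
d_in, H = 129, 256   # e.g. spectral bins in, hidden units per layer
pure   = lstm_params(d_in, H) + 2 * lstm_params(H, H)    # LSTM-LSTM-LSTM
hybrid = lstm_params(d_in, H) + 2 * dense_params(H, H)   # LSTM-dense-dense
ratio  = hybrid / pure                                   # well below 1
```

The hybrid stack also drops the per-timestep recurrence in two of its layers, which parallelizes better than the sequential LSTM updates.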


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2761
Author(s):  
Vaios Ampelakiotis ◽  
Isidoros Perikos ◽  
Ioannis Hatzilygeroudis ◽  
George Tsihrintzis

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize first-order logic handwritten formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after three training algorithms (backpropagation, resilient propagation, and stochastic gradient descent) had been tested, we created and trained an NN with the stochastic gradient descent algorithm optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 pixels each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows for creating new, user-oriented training sets and parameter settings, and thus new NN models.
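The Adam update rule named above can be stated compactly. The following sketch applies it to a toy quadratic rather than to the handwriting data, so the learning rate and objective are illustrative assumptions only.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates plus bias correction."""
    m = b1 * m + (1 - b1) * g          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (RMS) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize the toy objective ||w - 1||^2 to show the rule converging.
w = np.zeros(4)
m = np.zeros(4)
v = np.zeros(4)
for t in range(1, 3001):
    g = 2.0 * (w - 1.0)
    w, m, v = adam_step(w, g, m, v, t, lr=0.05)
dist = float(np.abs(w - 1.0).max())
```

In the paper's setting the gradient g would come from backpropagation over a minibatch of 28 × 28 image samples instead of the closed-form expression above.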



2020 ◽  
Vol 44 (3) ◽  
pp. 326-332
Author(s):  
Audreaiona Waters ◽  
Liye Zou ◽  
Myungjin Jung ◽  
Qian Yu ◽  
Jingyuan Lin ◽  
...  

Objective: Sustained attention is critical for various activities of daily living, including engaging in health-enhancing behaviors and inhibition of health compromising behaviors. Sustained attention activates neural networks involved in episodic memory function, a critical cognition for healthy living. Acute exercise has been shown to activate these same neural networks. Thus, it is plausible that engaging in a sustained attention task and engaging in a bout of acute exercise may have an additive effect in enhancing memory function, which was the purpose of this experiment. Methods: 23 young adults (Mage = 20.7 years) completed 2 visits, with each visit occurring approximately 24 hours apart, in a counterbalanced order, including: (1) acute exercise with sustained attention, and (2) sustained attention only. Memory was assessed using a word-list paradigm and included a short- and long-term memory assessment. Sustained attention was induced via a sustained attention to response task (SART). Acute exercise involved a 15-minute bout of moderate-intensity exercise. Results: Short-term memory performance was significantly greater than long-term memory, Mdiff = 1.86, p < .001, and short-term memory for Exercise with Sustained Attention was significantly greater than short-term memory for Sustained Attention Only, Mdiff = 1.50, p = .01. Conclusion: Engaging in an acute bout of exercise before a sustained attention task additively influenced short-term memory function.


2020 ◽  
Vol 34 (04) ◽  
pp. 4115-4122
Author(s):  
Kyle Helfrich ◽  
Qiang Ye

Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
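The dissipation property described above, where eigenvalues inside the unit disc make old inputs fade, can be illustrated directly. ENRNN learns such a recurrent matrix by gradient descent; the sketch below instead applies a one-shot rescaling to a random matrix, which shows the effect but is not the paper's training method.

```python
import numpy as np

rng = np.random.default_rng(2)

def eigenvalue_normalize(W, rho=0.9):
    """Scale W so its spectral radius is at most rho < 1."""
    radius = np.abs(np.linalg.eigvals(W)).max()
    return W * (rho / radius) if radius > rho else W

W = eigenvalue_normalize(rng.normal(size=(16, 16)))
h = rng.normal(size=16)               # recurrent state holding "old" input
norms = []
for _ in range(50):
    h = W @ h                         # free evolution: past input dissipates
    norms.append(float(np.linalg.norm(h)))
```

With an orthogonal/unitary recurrent matrix the norm sequence would stay constant instead, which is exactly the capacity issue the paper targets.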


2020 ◽  
Vol 12 (8) ◽  
pp. 3177 ◽  
Author(s):  
Dimitrios Kontogiannis ◽  
Dimitrios Bargiotas ◽  
Aspassia Daskalopulu

Power forecasting is an integral part of the Demand Response design philosophy for power systems, enabling utility companies to understand the electricity consumption patterns of their customers and adjust price signals accordingly, in order to handle load demand more effectively. Since there is increasing interest in real-time automation and in more flexible Demand Response programs that monitor changes in residential load profiles and reflect them in energy pricing schemes, high-granularity time series forecasting is at the forefront of energy and artificial intelligence research, aimed at developing machine learning models that produce accurate time series predictions. In this study we compared the baseline performance and structure of different types of neural networks on residential energy data by formulating a suitable supervised learning problem based on real-world data. After training and testing long short-term memory (LSTM) network variants, a convolutional neural network (CNN), and a multi-layer perceptron (MLP), we observed that the latter performed better on the given problem, yielding the lowest mean absolute error and achieving the fastest training time.
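Formulating the forecasting task as supervised learning, as described above, amounts to sliding a window over the load series. The synthetic sine series and the 24-step lag below are illustrative assumptions, not the study's data.

```python
import numpy as np

def make_supervised(series, lags):
    """Turn a series into (X, y) pairs: `lags` past readings -> next reading."""
    X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
    y = np.array(series[lags:])
    return X, y

load = np.sin(np.linspace(0.0, 12.0, 200))  # stand-in for a residential load profile
X, y = make_supervised(load, lags=24)       # same X feeds an MLP, LSTM, or CNN
```

Because all three model families consume the same (X, y) pairs, the comparison isolates architecture rather than problem formulation.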


2004 ◽  
Vol 4 (1) ◽  
pp. 143-146 ◽  
Author(s):  
D. J. Lary ◽  
M. D. Müller ◽  
H. Y. Mussa

Abstract. Neural networks are ideally suited to describe the spatial and temporal dependence of tracer-tracer correlations. The neural network performs well even in regions where the correlations are less compact and a family of correlation curves would normally be required. For example, the CH4-N2O correlation can be well described using a neural network trained with the latitude, pressure, time of year, and CH4 volume mixing ratio (v.m.r.). In this study a neural network using Quickprop learning and one hidden layer with eight nodes was able to reproduce the CH4-N2O correlation with a correlation coefficient between simulated and training values of 0.9995. Such an accurate representation of tracer-tracer correlations allows more use to be made of long-term datasets to constrain chemical models, such as the dataset from the Halogen Occultation Experiment (HALOE), which has continuously observed CH4 (but not N2O) from 1991 to the present. The neural network Fortran code used is available for download.
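The Quickprop learning rule mentioned above fits a parabola through the last two gradient evaluations and jumps to its minimum. The sketch below shows the core secant update on a one-parameter quadratic; production Quickprop adds safeguards such as a maximum growth factor, which are omitted here, and the paper's actual network (one hidden layer, eight nodes, Fortran) is not reproduced.

```python
def quickprop_minimize(w, grad, lr=0.1, steps=10):
    """Core Quickprop update on a single scalar parameter."""
    g_prev = grad(w)
    dw = -lr * g_prev               # bootstrap step: plain gradient descent
    w += dw
    for _ in range(steps):
        g = grad(w)
        if g_prev == g:             # flat secant: nothing left to do
            break
        dw = g / (g_prev - g) * dw  # secant (Quickprop) step
        w += dw
        g_prev = g
    return w

# On f(w) = (w - 3)^2 the gradient is linear, so one secant step is exact.
w_star = quickprop_minimize(0.0, lambda w: 2.0 * (w - 3.0))
```

On non-quadratic losses the parabola is only a local model, but the same update typically converges much faster than plain backpropagation, which motivated its use here.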


2021 ◽  
Vol 247 ◽  
pp. 06029
Author(s):  
E. Szames ◽  
K. Ammar ◽  
D. Tomatis ◽  
J.M. Martinez

This work deals with the modeling of homogenized few-group cross sections by artificial neural networks (ANNs). A comprehensive sensitivity study on data normalization, network architectures, and training hyper-parameters, specifically for deep and shallow feedforward ANNs, is presented. The optimal models in terms of reduction in library size and training time are compared to multi-linear interpolation on a Cartesian grid. The use case is provided by the OECD-NEA Burn-up Credit Criticality Benchmark [1]. The PyTorch [2] machine learning framework is used.
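The multi-linear interpolation baseline on a Cartesian grid works as below in two dimensions. The toy table over two normalized state parameters is an illustrative assumption; real few-group cross-section libraries span more dimensions, which is what drives their size.

```python
import numpy as np

def bilinear(xg, yg, F, x, y):
    """Multi-linear (here bilinear) interpolation of F[i, j] on a Cartesian grid."""
    i = int(np.clip(np.searchsorted(xg, x) - 1, 0, len(xg) - 2))
    j = int(np.clip(np.searchsorted(yg, y) - 1, 0, len(yg) - 2))
    tx = (x - xg[i]) / (xg[i + 1] - xg[i])
    ty = (y - yg[j]) / (yg[j + 1] - yg[j])
    return ((1 - tx) * (1 - ty) * F[i, j] + tx * (1 - ty) * F[i + 1, j]
            + (1 - tx) * ty * F[i, j + 1] + tx * ty * F[i + 1, j + 1])

# Toy cross-section table over, say, burnup (x) and temperature (y), normalized.
xg = np.linspace(0.0, 1.0, 5)
yg = np.linspace(0.0, 1.0, 5)
F = xg[:, None] + 2.0 * yg[None, :]   # a plane, on which bilinear is exact
val = bilinear(xg, yg, F, 0.33, 0.71)
```

An ANN replaces the stored grid F with a fixed set of weights, which is the library-size reduction the study quantifies.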

