EMOTIONS RECOGNITION IN HUMAN SPEECH USING DEEP NEURAL NETWORKS

Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and dynamically developing areas of modern speech technologies, and the recognition of emotions in speech (RER) is the most demanded part of them. In this paper, we propose a computer model of emotion recognition based on an ensemble of bidirectional recurrent neural network with LSTM memory cell and deep convolutional neural network ResNet18. In this paper, computer studies of the RAVDESS database containing emotional speech of a person are carried out. RAVDESS-a data set containing 7356 files. Entries contain the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, existing audio recordings must be pre-processed in such a way as to extract the main characteristic features of certain emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, as well as the characteristics of the frequency spectrum of audio recordings. In this paper, computer studies of various models of neural networks for emotion recognition are carried out on the example of the data described above. In addition, machine learning algorithms were used for comparative analysis. Thus, the following models were trained during the experiments: logistic regression (LR), classifier based on the support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting over trees – XGBoost, convolutional neural network CNN, recurrent neural network RNN (ResNet18), as well as an ensemble of convolutional and recurrent networks Stacked CNN-RNN. The results show that neural networks showed much higher accuracy in recognizing and classifying emotions than the machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed higher accuracy.

2022 ◽  
pp. 1559-1575
Author(s):  
Mário Pereira Véstias

Machine learning is the study of algorithms and models for computing systems to do tasks based on pattern identification and inference. When it is difficult or infeasible to develop an algorithm to do a particular task, machine learning algorithms can provide an output based on previous training data. A well-known machine learning model is deep learning. The most recent deep learning models are based on artificial neural networks (ANN). There exist several types of artificial neural networks including the feedforward neural network, the Kohonen self-organizing neural network, the recurrent neural network, the convolutional neural network, the modular neural network, among others. This article focuses on convolutional neural networks with a description of the model, the training and inference processes and its applicability. It will also give an overview of the most used CNN models and what to expect from the next generation of CNN models.


Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, but the current state-of-the-art solutions use deep neural networks (DNNs). As such, it seems likely that standard machine learning algorithms, such as these, will provide an effective approach. We describe an alternative approach, involving the use of probabilities to construct a weighted lexicon of sentiment terms, then modifying the lexicon and calculating optimal thresholds for each class. We show that this approach outperforms the use of DNNs and other standard algorithms. We believe that DNNs are not a universal panacea and that paying attention to the nature of the data that you are trying to learn from can be more important than trying out ever more powerful general purpose machine learning algorithms.


Author(s):  
Denis Sato ◽  
Adroaldo José Zanella ◽  
Ernane Xavier Costa

Vehicle-animal collisions represent a serious problem in roadway infrastructure. To avoid these roadway collisions, different mitigation systems have been applied in various regions of the world. In this article, a system for detecting animals on highways is presented using computer vision and machine learning algorithms. The models were trained to classify two groups of animals: capybaras and donkeys. Two variants of the convolutional neural network called Yolo (You only look once) were used, Yolov4 and Yolov4-tiny (a lighter version of the network). The training was carried out using pre-trained models. Detection tests were performed on 147 images. The accuracy results obtained were 84.87% and 79.87% for Yolov4 and Yolov4-tiny, respectively. The proposed system has the potential to improve road safety by reducing or preventing accidents with animals.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1576 ◽  
Author(s):  
Li Zhu ◽  
Lianghao Huang ◽  
Linyu Fan ◽  
Jinsong Huang ◽  
Faming Huang ◽  
...  

Landslide susceptibility prediction (LSP) modeling is an important and challenging problem. Landslide features are generally uncorrelated or nonlinearly correlated, resulting in limited LSP performance when leveraging conventional machine learning models. In this study, a deep-learning-based model using the long short-term memory (LSTM) recurrent neural network and conditional random field (CRF) in cascade-parallel form was proposed for making LSPs based on remote sensing (RS) images and a geographic information system (GIS). The RS images are the main data sources of landslide-related environmental factors, and a GIS is used to analyze, store, and display spatial big data. The cascade-parallel LSTM-CRF consists of frequency ratio values of environmental factors in the input layers, cascade-parallel LSTM for feature extraction in the hidden layers, and cascade-parallel full connection for classification and CRF for landslide/non-landslide state modeling in the output layers. The cascade-parallel form of LSTM can extract features from different layers and merge them into concrete features. The CRF is used to calculate the energy relationship between two grid points, and the extracted features are further smoothed and optimized. As a case study, the cascade-parallel LSTM-CRF was applied to Shicheng County of Jiangxi Province in China. A total of 2709 landslide grid cells were recorded and 2709 non-landslide grid cells were randomly selected from the study area. The results show that, compared with existing main traditional machine learning algorithms, such as multilayer perception, logistic regression, and decision tree, the proposed cascade-parallel LSTM-CRF had a higher landslide prediction rate (positive predictive rate: 72.44%, negative predictive rate: 80%, total predictive rate: 75.67%). In conclusion, the proposed cascade-parallel LSTM-CRF is a novel data-driven deep learning model that overcomes the limitations of traditional machine learning algorithms and achieves promising results for making LSPs.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Nighat Bibi ◽  
Misba Sikandar ◽  
Ikram Ud Din ◽  
Ahmad Almogren ◽  
Sikandar Ali

For the last few years, computer-aided diagnosis (CAD) has been increasing rapidly. Numerous machine learning algorithms have been developed to identify different diseases, e.g., leukemia. Leukemia is a white blood cells- (WBC-) related illness affecting the bone marrow and/or blood. A quick, safe, and accurate early-stage diagnosis of leukemia plays a key role in curing and saving patients’ lives. Based on developments, leukemia consists of two primary forms, i.e., acute and chronic leukemia. Each form can be subcategorized as myeloid and lymphoid. There are, therefore, four leukemia subtypes. Various approaches have been developed to identify leukemia with respect to its subtypes. However, in terms of effectiveness, learning process, and performance, these methods require improvements. This study provides an Internet of Medical Things- (IoMT-) based framework to enhance and provide a quick and safe identification of leukemia. In the proposed IoMT system, with the help of cloud computing, clinical gadgets are linked to network resources. The system allows real-time coordination for testing, diagnosis, and treatment of leukemia among patients and healthcare professionals, which may save both time and efforts of patients and clinicians. Moreover, the presented framework is also helpful for resolving the problems of patients with critical condition in pandemics such as COVID-19. The methods used for the identification of leukemia subtypes in the suggested framework are Dense Convolutional Neural Network (DenseNet-121) and Residual Convolutional Neural Network (ResNet-34). Two publicly available datasets for leukemia, i.e., ALL-IDB and ASH image bank, are used in this study. The results demonstrated that the suggested models supersede the other well-known machine learning algorithms used for healthy-versus-leukemia-subtypes identification.


IoT ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 222-235
Author(s):  
Guillaume Coiffier ◽  
Ghouthi Boukli Hacene ◽  
Vincent Gripon

Deep Neural Networks are state-of-the-art in a large number of challenges in machine learning. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lay in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network at its maximum, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsamplings and shortcut ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameters budget, exceeding 91% accuracy on CIFAR-10 with less than 40 k parameters in total, 74.3% on CIFAR-100 with less than 600 k parameters, and 67.1% On ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.


Sign in / Sign up

Export Citation Format

Share Document