Intelligent module for recognizing emotions by voice

2021 ◽  
pp. 46-52
Author(s):  
Oleg Ilarionov ◽  
Anton Astakhov ◽  
Anna Krasovska ◽  
Iryna Domanetska

Speech is the main way people communicate, and from speech people receive not only semantic but also emotional information. Recognition of emotions by voice is relevant to areas such as psychological care, security systems development, lie detection, customer relationship analysis, and video game development. Because the recognition of emotions by a person is subjective, and therefore inexact and time-consuming, there is a need for software that can solve this problem. The article considers the state of the problem of recognizing human emotions by voice. Recent publications are analyzed with respect to the approaches they use, namely emotion models, datasets, feature extraction methods, and classifiers. It is determined that existing developments have an average accuracy of about 0.75. The general structure of a system for recognizing human emotions by voice is analyzed, and the corresponding intelligent module is designed and developed. The Unified Modeling Language (UML) is used to create a component diagram and a class diagram. The RAVDESS and TESS datasets were selected to diversify the training sample. A discrete model of emotions (joy, sadness, anger, disgust, fear, surprise, calm, neutral), the MFCC (Mel-Frequency Cepstral Coefficients) method for feature extraction, and a convolutional neural network for classification were used. The neural network was developed using the TensorFlow and Keras machine learning libraries. The spectrogram and graphs of the audio signal, as well as graphs of accuracy and recognition error, are constructed. As a result of the software implementation of the intelligent module for recognizing emotions by voice, the validation accuracy was increased to 0.8.
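The abstract names the pipeline (MFCC features fed to a Keras CNN over eight discrete emotions) but includes no code. The following is a minimal sketch of that pipeline, assuming librosa for feature extraction; the 40-coefficient setting, layer sizes, and time-averaging of MFCCs are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: MFCC feature extraction -> Keras 1D CNN over 8 emotion classes.
# All hyperparameters here are illustrative assumptions.
import numpy as np
import librosa
from tensorflow import keras

EMOTIONS = ["neutral", "calm", "joy", "sadness",
            "anger", "fear", "disgust", "surprise"]  # discrete emotion model

def extract_mfcc(path, n_mfcc=40):
    """Load an audio file and return a fixed-size MFCC feature vector."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time -> shape (n_mfcc,)

def build_model(n_mfcc=40, n_classes=len(EMOTIONS)):
    """1D convolutional classifier over the MFCC vector."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_mfcc, 1)),
        keras.layers.Conv1D(64, 5, activation="relu"),
        keras.layers.MaxPooling1D(2),
        keras.layers.Conv1D(128, 5, activation="relu"),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: X = np.stack([extract_mfcc(p) for p in paths])[..., None]
```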

Author(s):  
Azadeh Bashiri ◽  
Roghaye Hosseinkhani

Crying, as the only way babies communicate with their surrounding environment, can occur for many reasons, such as disease, suffocation, hunger, feeling cold or hot, and pain. The analysis and detection of its source are therefore very important for parents and health care providers, so the present study was designed to test the performance of neural networks in identifying the source of babies' crying. The study combines a genetic algorithm and an artificial neural network with LPC (Linear Predictive Coding) and MFCC (Mel-Frequency Cepstral Coefficients) features to classify babies' crying. The results indicate the superiority of the proposed method over previous methods: it achieved the highest accuracy in classifying newborn crying among the studies reviewed. Developing such audio-signal classification methods is promising, and they can be effectively applied in different areas, such as the analysis of babies' crying.
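The abstract combines LPC and MFCC features as input to the GA-tuned network but does not show the feature step. Below is a minimal sketch of that combined extraction, assuming librosa; the coefficient counts and sample rate are illustrative assumptions.

```python
# Sketch: LPC and MFCC coefficients concatenated into one feature vector
# for an ANN classifier. Coefficient counts are assumptions for illustration.
import numpy as np
import librosa

def cry_features(path, n_mfcc=13, lpc_order=12):
    """Return a combined LPC + MFCC feature vector for one cry recording."""
    y, sr = librosa.load(path, sr=16000)
    lpc = librosa.lpc(y, order=lpc_order)                    # (lpc_order + 1,)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    return np.concatenate([lpc, mfcc])
```

In the paper's scheme the genetic algorithm then optimizes the neural network that consumes these vectors; that optimization step is not reproduced here.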


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping, where it can assist in the control of beehives in remote locations. Bee swarm activity can be classified from audio signals using such approaches. A deep neural network (DNN) IoT-based acoustic swarm classification approach is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project, and Mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, while MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed DNN IoT-based bee activity acoustic classification showed improved results compared to a previous hidden Markov model system.
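The WAV-vs-MP3 comparison amounts to measuring how lossy compression distorts the MFCC features the DNN consumes. A minimal sketch of that measurement follows, assuming librosa can decode both formats through its audio backend; the RMS distortion metric and sample rate are illustrative assumptions, not the paper's evaluation protocol.

```python
# Sketch: extract MFCCs from a lossless and a lossy decode of the same
# recording and measure the feature distortion that degrades DNN accuracy.
import numpy as np
import librosa

def mfcc_from(path, n_mfcc=13, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def format_distortion(wav_path, mp3_path):
    """RMS difference between MFCCs of the lossless and lossy versions."""
    a, b = mfcc_from(wav_path), mfcc_from(mp3_path)
    n = min(a.shape[1], b.shape[1])  # decoders may pad/trim differently
    return float(np.sqrt(np.mean((a[:, :n] - b[:, :n]) ** 2)))
```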


2021 ◽  
Vol 13 (4) ◽  
pp. 628
Author(s):  
Liang Ye ◽  
Tong Liu ◽  
Tian Han ◽  
Hany Ferdinando ◽  
Tapio Seppänen ◽  
...  

Campus violence is a common social phenomenon all over the world and the most harmful type of school bullying event. As artificial intelligence and remote sensing techniques develop, several methods become possible for detecting campus violence, e.g., movement sensor-based and video sequence-based methods using sensors and surveillance cameras. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimensional feature vectors are extracted from every 16 frames of video. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, and an average recognition accuracy of 92.00% is achieved. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features, and three speech emotion databases are involved. The C3D neural network is used for classification, and the average recognition accuracies are 88.33%, 95.00%, and 91.67%, respectively. To solve the problem of evidence conflict, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with existing D–S theory, the improved algorithm increases the recognition accuracy by 10.79%, ultimately reaching 97.00%.
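The abstract does not detail the improved algorithm, so the sketch below shows the classical Dempster rule of combination that it builds on: two evidence sources (here, hypothetically, the image and acoustic classifiers) are fused by multiplying masses of intersecting hypotheses and renormalizing away the conflict mass K. The example masses are invented for illustration.

```python
# Classical Dempster rule of combination over mass functions keyed by
# frozenset hypotheses. The authors' improvement is not reproduced here.
from itertools import product

def dempster_combine(m1, m2):
    """Fuse two mass functions; raises on total conflict (K = 1)."""
    combined, conflict = {}, 0.0
    for (h1, p1), (h2, p2) in product(m1.items(), m2.items()):
        inter = h1 & h2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + p1 * p2
        else:
            conflict += p1 * p2          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {h: p / (1.0 - conflict) for h, p in combined.items()}

# Hypothetical fusion of video and audio evidence for {violent, normal}:
video = {frozenset({"violent"}): 0.7, frozenset({"normal"}): 0.3}
audio = {frozenset({"violent"}): 0.6, frozenset({"normal"}): 0.4}
print(dempster_combine(video, audio))  # violent mass exceeds either source
```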


2011 ◽  
Vol 467-469 ◽  
pp. 1505-1510
Author(s):  
Dan Liu ◽  
Ni Hong Wang ◽  
Gui Ying Li

This paper proposes a new method that uses a neural network to construct the solution of the Hamilton–Jacobi (HJ) inequality and optimizes the neural network weights using a genetic algorithm. This method makes the Lyapunov function satisfy the HJ inequality, avoids solving the HJ partial differential inequality directly, and thus overcomes the difficulty of its analytical treatment. Besides this, a design method for a nonlinear state-feedback L2-gain disturbance rejection controller based on the HJ inequality is proposed, and the general structure of an L2-gain disturbance rejection controller in neural network form is introduced. Simulation demonstrates that the controller design is feasible and that the closed-loop system ensures a finite L2 gain from the disturbance to the output.
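For context, the abstract does not spell out the inequality being solved; the standard L2-gain Hamilton–Jacobi inequality for an input-affine system (in the van der Schaft form) is stated below. The paper's exact formulation may differ.

```latex
% For the affine system \dot{x} = f(x) + g(x)w, z = h(x), the L2 gain from
% w to z is at most \gamma if a smooth V(x) \ge 0 with V(0) = 0 satisfies:
\[
  \frac{\partial V}{\partial x} f(x)
  + \frac{1}{2\gamma^{2}}
    \frac{\partial V}{\partial x} g(x) g(x)^{\top}
    \left(\frac{\partial V}{\partial x}\right)^{\top}
  + \frac{1}{2}\, h(x)^{\top} h(x) \le 0 .
\]
```

The method described above parameterizes such a V as a neural network and uses a genetic algorithm to search the weights so that the inequality holds, rather than solving the partial differential inequality analytically.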


2017 ◽  
Vol 120 ◽  
pp. 417-421
Author(s):  
Fatih Veysel Nurçin ◽  
Elbrus Imanov ◽  
Ali Işın ◽  
Dilber Uzun Ozsahin

Hypertension ◽  
2017 ◽  
Vol 70 (suppl_1) ◽  
Author(s):  
Francesco Lamonaca ◽  
Vitaliano Spagnuolo ◽  
Serena De Prisco ◽  
Domenico L Carnì ◽  
Domenico Grimaldi

The analysis of the PPG signal in the time domain for the evaluation of blood pressure (BP) is proposed. Features extracted from the PPG signal are used to train an Artificial Neural Network (ANN) to determine the function that fits the target systolic and diastolic BP. The PPG signals and BP data used in the analysis are provided by the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC II) database. A pre-analysis of the signal to remove inconsistent data is also proposed. A set of 1750 valid pulses is considered: 80% of the input samples are used for training the network, 10% for validation, and 10% for the final test. The results show that the error for both the systolic and diastolic BP evaluation is within ±3 mmHg. Tab. 1 shows the results for 20 randomly selected PPG pulses, comparing the systolic and diastolic blood pressure furnished by MIMIC II with the values evaluated by the trained ANN. Moreover, suitable hardware to validate the ANN against the sphygmomanometer is designed and realized. This hardware allows clinicians to collect data according to the requirements of the validation procedure. With the sphygmomanometer, the systolic and diastolic values refer to two different PPG pulses. As a consequence, a new hardware interface is proposed that allows the synchronized acquisition and storage of the PPG signal and the clinician's voice. For the validation, the clinician: (i) evaluates the BP on both arms and verifies that no significant differences occur; (ii) places the PPG sensor on a finger of one arm; (iii) starts recording both the PPG signal and the audio signal; (iv) evaluates the BP on the other arm with the sphygmomanometer and speaks the systolic and diastolic values when detected. Through a suitable post-processing algorithm, the systolic and diastolic values are associated with the corresponding PPG pulses. Following this procedure, the dataset to further validate the ANN according to the standard is obtained. Once validated, the ANN will be implemented on a smartphone, providing a pocket measurement system for blood pressure, oximetry, and heart rate.
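The 80/10/10 split and two-output regression described above can be sketched as follows; the abstract does not specify the network architecture, so the layer sizes, loss choice, and use of Keras are assumptions for illustration.

```python
# Sketch: ANN regressor mapping PPG time-domain features to
# [systolic, diastolic] BP, with the abstract's 80/10/10 data split.
import numpy as np
from tensorflow import keras

def split_80_10_10(X, y, seed=0):
    """Shuffle and split samples into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_val = int(0.8 * len(X)), int(0.1 * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

def build_bp_ann(n_features):
    """Regressor with two outputs: systolic and diastolic BP (mmHg)."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(2),                 # [systolic, diastolic]
    ])
    # MAE in mmHg relates directly to the reported ±3 mmHg error band.
    model.compile(optimizer="adam", loss="mae")
    return model
```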

