Bimodal Emotion Recognition Model for Minnan Songs

Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 145 ◽  
Author(s):  
Zhenglong Xiang ◽  
Xialei Dong ◽  
Yuanxiang Li ◽  
Fei Yu ◽  
Xing Xu ◽  
...  

Most existing research studies the emotion of Minnan songs from the perspectives of music analysis theory and music appreciation, and does not explore automatic emotion recognition. In this paper, we propose a model consisting of four main modules that classifies the emotion of Minnan songs using bimodal data: song lyrics and audio. In the proposed model, an attention-based Long Short-Term Memory (LSTM) neural network extracts lyrical features, and a Convolutional Neural Network (CNN) extracts audio features from the spectrum. The two kinds of extracted features are then combined by multimodal compact bilinear pooling, and the fused features are input to the classifying module to determine the song emotion. We designed three experiment groups to investigate the classification performance of combinations of the four main modules, to compare the proposed model with current approaches, and to examine the influence of a few key parameters on emotion recognition performance. The results show that the proposed model outperforms all other experimental groups. With an appropriate combination of parameters, the accuracy, precision, and recall of the proposed model exceed 0.80.
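The fusion step can be illustrated with a minimal sketch of compact bilinear pooling as it is commonly formulated: each feature vector is projected with a count sketch, and the two sketches are combined by circular convolution. The abstract does not give the paper's dimensions or implementation, so the output size `d`, the random hashes, and the feature values below are assumptions for illustration only.

```python
import random

def random_hash(n, d, rng):
    """Draw a bucket index and a random sign for each of n input dimensions."""
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return h, s

def count_sketch(x, h, s, d):
    """Project x into d dimensions: out[h[i]] += s[i] * x[i]."""
    out = [0.0] * d
    for i, value in enumerate(x):
        out[h[i]] += s[i] * value
    return out

def circular_convolution(a, b):
    """Circular convolution of two equal-length vectors.

    Written as a direct O(d^2) loop for clarity; an FFT is the usual speed-up.
    """
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out

# Made-up feature vectors standing in for the lyric and audio branches.
lyric_features = [0.2, -0.5, 1.0, 0.3]
audio_features = [0.7, 0.1, -0.4]

d = 16                      # hypothetical size of the fused vector
rng = random.Random(0)      # fixed seed so the hashes are reproducible
hx, sx = random_hash(len(lyric_features), d, rng)
hy, sy = random_hash(len(audio_features), d, rng)

fused = circular_convolution(
    count_sketch(lyric_features, hx, sx, d),
    count_sketch(audio_features, hy, sy, d),
)
print(fused)
```

The result equals a count sketch of the full outer product of the two vectors, which is why this construction can stand in for bilinear pooling at a much smaller output size.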

2021 ◽  
pp. 1-10
Author(s):  
Hye-Jeong Song ◽  
Tak-Sung Heo ◽  
Jong-Dae Kim ◽  
Chan-Young Park ◽  
Yu-Seop Kim

Sentence similarity evaluation is a significant task in natural language processing, used in machine translation, classification, and information extraction. Given two sentences, a model should judge accurately whether their meanings are equivalent even if their words and contexts differ. To this end, existing studies have measured sentence similarity by analyzing words, morphemes, and letters. This study uses Sent2Vec, a sentence embedding, together with morpheme word embeddings. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and to a bidirectional long short-term memory (Bi-LSTM) network. Self-attention is applied to the features transformed by the Bi-LSTM. The outputs of the 1D-CNN and of self-attention are then reduced through global max pooling and global average pooling, respectively. The resulting vectors are concatenated with the Sent2Vec vector into a single vector, which is input to a softmax layer to determine the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
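The pooling and concatenation steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature values, the vector sizes, and the dense weights in `toy_weights` are made up, and the real model's layers are not specified in the abstract.

```python
import math

def global_max_pool(sequence):
    """Take the maximum over time steps for each feature channel."""
    return [max(channel) for channel in zip(*sequence)]

def global_avg_pool(sequence):
    """Take the mean over time steps for each feature channel."""
    return [sum(channel) / len(channel) for channel in zip(*sequence)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical branch outputs: rows are time steps, columns are channels.
cnn_features = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]
attention_features = [[1.0, 0.0], [0.0, 2.0]]
sent2vec_vector = [0.5, -0.5]

# Max-pool the CNN branch, average-pool the attention branch, then concatenate.
combined = (
    global_max_pool(cnn_features)
    + global_avg_pool(attention_features)
    + sent2vec_vector
)

# Made-up dense weights mapping the combined vector to two classes.
toy_weights = [[0.2] * len(combined), [-0.2] * len(combined)]
logits = [sum(w * c for w, c in zip(row, combined)) for row in toy_weights]
probabilities = softmax(logits)
print(combined, probabilities)
```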


2021 ◽  
pp. 1-17
Author(s):  
Enda Du ◽  
Yuetian Liu ◽  
Ziyan Cheng ◽  
Liang Xue ◽  
Jing Ma ◽  
...  

Accurate production forecasting is an essential task that accompanies the entire process of reservoir development. Limited by their prediction principles and processes, traditional approaches struggle to make rapid predictions. With the development of artificial intelligence, data-driven models provide an alternative approach to production forecasting. To fully account for the impact of interwell interference on production, this paper proposes a deep learning-based hybrid model (GCN-LSTM), in which a graph convolutional network (GCN) captures complicated spatial patterns between wells and a long short-term memory (LSTM) neural network extracts intricate temporal correlations from historical production data. To implement the proposed model more efficiently, two data preprocessing procedures are performed: outliers in the data set are removed using a box plot visualization, and measurement noise is reduced by a wavelet transform. The robustness and applicability of the proposed model are evaluated in two scenarios of different data types with the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). The results show that the proposed model can effectively capture spatial and temporal correlations to make a rapid and accurate oil production forecast.
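A box plot flags points that fall beyond its whiskers, so the outlier-removal step can be approximated programmatically. The sketch below assumes the common convention of whiskers at 1.5 times the interquartile range; the abstract does not state the rule the authors applied, and the production values are invented.

```python
import statistics

def remove_boxplot_outliers(values, whisker=1.5):
    """Keep only values inside the box plot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]

# Invented daily production rates with one obvious spike.
daily_rates = [100, 102, 98, 101, 99, 103, 97, 500]
cleaned = remove_boxplot_outliers(daily_rates)
print(cleaned)
```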


Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 376 ◽  
Author(s):  
Md. Shahinur Alam ◽  
Ki-Chul Kwon ◽  
Md. Ashraful Alam ◽  
Mohammed Y. Abbass ◽  
Shariar Md Imtiaz ◽  
...  

A trajectory-based writing system refers to writing a linguistic character or word in free space by moving a finger, marker, or handheld device. It is widely applicable where traditional pen-up and pen-down writing systems are troublesome. Due to the simple writing style, it has a great advantage over gesture-based systems. However, it is a challenging task because of non-uniform characters and different writing styles. In this research, we developed an air-writing recognition system using three-dimensional (3D) trajectories collected by a depth camera that tracks the fingertip. For better feature selection, nearest neighbor and root point translation were used to normalize the trajectory. We employed a long short-term memory (LSTM) network and a convolutional neural network (CNN) as recognizers. The model was tested and verified on a self-collected dataset. To evaluate the robustness of our model, we also employed the 6D motion gesture (6DMG) alphanumeric character dataset and achieved 99.32% accuracy, which is the highest to date. Hence, it verifies that the proposed model is invariant for digits and characters. Moreover, we publish a dataset containing 21,000 digits, which addresses the lack of datasets in current research.
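One plausible reading of root point translation is that every trajectory is shifted so that its first (root) point sits at the origin, which removes the dependence on where in space the writing started. The sketch below follows that reading; it is an assumption rather than the authors' stated procedure, and the nearest neighbor step is not shown.

```python
def root_point_translate(trajectory):
    """Shift a 3D trajectory so that its first point becomes the origin."""
    x0, y0, z0 = trajectory[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in trajectory]

# Invented fingertip positions (x, y, z) from a depth camera.
raw_trajectory = [(2, 3, 10), (4, 3, 11), (5, 6, 9)]
normalized = root_point_translate(raw_trajectory)
print(normalized)
```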


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 124928-124938 ◽  
Author(s):  
Simin Wang ◽  
Junhuai Li ◽  
Ting Cao ◽  
Huaijun Wang ◽  
Pengjia Tu ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Linqin Cai ◽  
Yaxin Hu ◽  
Jiangong Dong ◽  
Sitong Zhou

With the rapid development of social media, single-modal emotion recognition struggles to satisfy the demands of current emotion recognition systems. To optimize the performance of such systems, this paper proposes a multimodal emotion recognition model from speech and text. Considering the complementarity between the modalities, a CNN (convolutional neural network) and an LSTM (long short-term memory) network were combined in the form of binary channels to learn acoustic emotion features; meanwhile, a Bi-LSTM (bidirectional long short-term memory) network was used to capture the textual features. Furthermore, we applied a deep neural network to learn and classify the fusion features. The final emotional state was determined by the output of both speech and text emotion analysis. Multimodal fusion experiments were carried out on the IEMOCAP database to validate the proposed model. Compared with the single-modal models, the overall recognition accuracy increased by 6.70% for text and by 13.85% for speech emotion recognition. Experimental results show that the recognition accuracy of our multimodal model is higher than that of the single-modal models and that it outperforms other published multimodal models on the test datasets.


Batteries ◽  
2021 ◽  
Vol 7 (4) ◽  
pp. 66
Author(s):  
Tadele Mamo ◽  
Fu-Kwun Wang

Monitoring cycle life can provide a prediction of the remaining battery life. To improve the prediction accuracy of lithium-ion battery capacity degradation, we propose a hybrid long short-term memory recurrent neural network model with an attention mechanism. The hyper-parameters of the proposed model are also optimized by a differential evolution algorithm. Using public battery datasets, the proposed model is compared to some published models, and it gives better prediction performance in terms of mean absolute percentage error and root mean square error. In addition, the proposed model can achieve higher prediction accuracy of battery end of life.
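Differential evolution searches a bounded parameter space by mutating and recombining a population of candidate vectors and keeping a trial only when it is no worse than the candidate it replaces. The sketch below uses the common DE/rand/1/bin variant on a made-up objective; the variant, the settings, the two hyper-parameters, and `toy_validation_error` are all assumptions, since the abstract does not describe the authors' configuration.

```python
import random

def differential_evolution(cost, bounds, pop_size=12, generations=40,
                           f=0.5, cr=0.9, seed=0):
    """Minimize cost over box bounds with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    costs = [cost(ind) for ind in population]
    history = [min(costs)]  # best cost after each generation
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct candidates other than the current one.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one mutated dimension
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == forced or rng.random() < cr:
                    value = population[a][j] + f * (population[b][j] - population[c][j])
                else:
                    value = population[i][j]
                trial.append(min(max(value, lo), hi))  # clip to the bounds
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection
                population[i], costs[i] = trial, trial_cost
        history.append(min(costs))
    best = min(range(pop_size), key=costs.__getitem__)
    return population[best], costs[best], history

def toy_validation_error(params):
    """Stand-in for training a model and returning its validation error."""
    learning_rate_exponent, hidden_units = params
    return (learning_rate_exponent + 3.0) ** 2 + ((hidden_units - 64.0) / 32.0) ** 2

search_bounds = [(-5.0, -1.0), (16.0, 128.0)]
best_params, best_cost, history = differential_evolution(
    toy_validation_error, search_bounds)
print(best_params, best_cost)
```

In practice the cost function would train the network with the candidate hyper-parameters and return a validation metric, which makes each evaluation expensive and keeps population sizes small.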


IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 49325-49338 ◽  
Author(s):  
Bahareh Nakisa ◽  
Mohammad Naim Rastgoo ◽  
Andry Rakotonirainy ◽  
Frederic Maire ◽  
Vinod Chandran

Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 713 ◽  
Author(s):  
Yeonguk Yu ◽  
Yoon-Joong Kim

We propose a speech-emotion recognition (SER) model with an “attention-Long Short-Term Memory (LSTM)-attention” component to combine IS09, a commonly used feature set for SER, and the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism of the model focuses on emotion-related elements of the IS09 and mel spectrogram features and on the emotion-related durations along the time axis of the features. Thus, the model extracts emotion information from a given speech signal. The proposed model for the baseline study achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, the proposed model of the main study and the modified models could not achieve a WA of more than 68% on the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model’s performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. The experimental results of the model on the more reliable dataset confirmed a WA of 73%.
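Attention over time can be illustrated with a generic dot-product attention pooling step: each frame gets a score, the scores are normalized with a softmax, and the frames are averaged with those weights. This is a minimal sketch of the general idea, not the paper's attention-LSTM-attention component; the `query` vector stands in for learned parameters and the frame values are invented.

```python
import math

def attention_pool(frames, query):
    """Return softmax attention weights over frames and the weighted context vector."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    context = [sum(w * frame[k] for w, frame in zip(weights, frames))
               for k in range(dim)]
    return weights, context

# Invented per-frame features (rows are time steps).
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 2.0]  # hypothetical learned query vector
weights, context = attention_pool(frame_features, query)
print(weights, context)
```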


2021 ◽  
Vol 12 ◽  
Author(s):  
Hua Zhang ◽  
Ruoyun Gou ◽  
Jili Shang ◽  
Fangyao Shen ◽  
Yifan Wu ◽  
...  

Speech emotion recognition (SER) is a challenging task because of the affective variances between different speakers. The performance of SER relies heavily on the features extracted from speech signals, and establishing an effective feature extraction and classification model remains challenging. In this paper, we propose a new method for SER based on a Deep Convolution Neural Network (DCNN) and a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model (DCNN-BLSTMwA). We first preprocess the speech samples by data enhancement and dataset balancing. Second, we extract three channels of log Mel-spectrograms (static, delta, and delta-delta) as DCNN input. Then the DCNN model pre-trained on the ImageNet dataset is applied to generate segment-level features. We stack these features of a sentence into utterance-level features. Next, we adopt BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases obtain unweighted average recall (UAR) of 87.86% and 68.50%, respectively, which are better than most popular SER methods and demonstrate the effectiveness of our proposed method.
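The delta and delta-delta channels are time derivatives of the static log Mel-spectrogram. A minimal sketch for a single mel band follows, assuming the widely used regression formula with a window of two frames and edge padding; the abstract does not say which formula or window the authors used, and the band values are invented.

```python
def delta(values, n=2):
    """Regression-based delta of a 1D sequence, padding the edges with the end values."""
    padded = [values[0]] * n + list(values) + [values[-1]] * n
    denominator = 2 * sum(k * k for k in range(1, n + 1))
    return [
        sum(k * (padded[t + n + k] - padded[t + n - k]) for k in range(1, n + 1))
        / denominator
        for t in range(len(values))
    ]

# Invented log-mel energies of one band over six frames.
static = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
first_delta = delta(static)
second_delta = delta(first_delta)

# The three channels for this band: static, delta, and delta-delta.
three_channels = [static, first_delta, second_delta]
print(first_delta)
```

For a real spectrogram the same function would be applied to every mel band, giving three arrays of equal shape that can be stacked like the color channels of an image.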


Energies ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 3493 ◽  
Author(s):  
Chujie Tian ◽  
Jian Ma ◽  
Chunhong Zhang ◽  
Panpan Zhan

Accurate electrical load forecasting is of great significance in helping power companies schedule better and manage more efficiently. Because high levels of uncertainty exist in load time series, making an accurate short-term load forecast (STLF) is a challenging task. In recent years, deep learning approaches have provided better performance in predicting electrical load in real-world cases. The convolutional neural network (CNN) can extract the local trend and capture the same pattern, and the long short-term memory (LSTM) network learns the relationship across time steps. In this paper, a new deep neural network framework that integrates the hidden features of the CNN model and the LSTM model is proposed to improve forecasting accuracy. The proposed model was tested in a real-world case, and detailed experiments were conducted to validate its practicality and stability. The forecasting performance of the proposed model was compared with that of the LSTM model and the CNN model. The Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) were used as the evaluation indexes. The experimental results demonstrate that the proposed model achieves better and more stable performance in STLF.
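The three evaluation indexes named above have standard definitions, shown here as a short sketch. The load values are invented, and the MAPE function assumes that no true value is zero.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; assumes no zero in y_true."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented actual and forecast loads.
actual_load = [100.0, 200.0, 300.0]
forecast_load = [110.0, 190.0, 330.0]

mae_value = mae(actual_load, forecast_load)
rmse_value = rmse(actual_load, forecast_load)
mape_value = mape(actual_load, forecast_load)
print(mae_value, rmse_value, mape_value)
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the load level, which is why the three are often reported together.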

