A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

Applied Sciences ◽

10.3390/app11219897 ◽

2021 ◽

Vol 11 (21) ◽

pp. 9897

Author(s):

Huiyun Zhang ◽

Heming Huang ◽

Henry Han

Keyword(s):

Feature Extraction ◽

Natural Language Processing ◽

Emotion Recognition ◽

Language Processing ◽

Speech Emotion Recognition ◽

Dense Layer ◽

Acoustic Model ◽

Spatiotemporal Information ◽

Proposed Model ◽

The Right

Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict requirements on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. It can exploit the spatiotemporal information more effectively, and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model achieves consistently better performance.
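The abstract reports unweighted average recall (UAR), which is the mean of the per-class recalls, so every emotion class counts equally no matter how many utterances it has. A minimal sketch of that metric follows; the function name and the toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy labels, made up for illustration only (not data from the paper).
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data the two numbers differ: here the "angry" recall is 3/4 and the "sad" recall is 1/2, so UAR is their plain average, while accuracy is dominated by the larger class.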

A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

10.20944/preprints202108.0433.v1 ◽

2021 ◽

Author(s):

Huiyun Zhang ◽

Heming Huang ◽

Henry Han

Keyword(s):

Feature Extraction ◽

Natural Language Processing ◽

Emotion Recognition ◽

Language Processing ◽

Speech Emotion Recognition ◽

Dense Layer ◽

Acoustic Model ◽

Spatiotemporal Information ◽

Proposed Model ◽

The Right

Speech emotion recognition remains a demanding task in natural language processing. It places strict requirements on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. It can exploit the spatiotemporal information more effectively, and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model achieves consistently better performance.

A Hybrid Technique using CNN+LSTM for Speech Emotion Recognition

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e1027.069520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 1126-1130

Keyword(s):

Feature Extraction ◽

Human Computer Interaction ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Hybrid Technique ◽

Proposed Model ◽

High Level ◽

High Level Feature ◽

Convolutional LSTM

Automatic speech emotion recognition is essential for effective human-computer interaction. This paper is motivated by the use of spectrograms as inputs to a hybrid deep convolutional LSTM for speech emotion recognition. In this study, we trained our proposed model using four convolutional layers for high-level feature extraction from input spectrograms, an LSTM layer for accumulating long-term dependencies, and finally two dense layers. Experimental results on the SAVEE database show promising performance: the proposed model obtained an accuracy of 94.26%.

A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic (Preprint)

10.2196/preprints.25320 ◽

2020 ◽

Cited By ~ 1

Author(s):

Rohan Pandey ◽

Vaibhav Gautam ◽

Ridam Pal ◽

Harsh Bandhey ◽

Lovedeep Singh Dhingra ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

User Feedback ◽

WHO Guidelines ◽

The Times ◽

The Right ◽

Local Languages

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, and effective, and that continuously learn new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP-based AI engine that could continuously incorporate user feedback to improve the relevance of information, (ii) bite-sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards increased health awareness in the community. RESULTS A total of 5026 people downloaded the app during the study window, of whom 1545 were active users. Our study shows that 3.4 times more females than males engaged with the app in Hindi, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of the integrated AI chatbot “Satya” increased, thus demonstrating the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audio and conversational AI is an effective approach to mitigating health misinformation. CLINICALTRIAL Not Applicable

A Hindi Image Caption Generation Framework Using Deep Learning

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3432246 ◽

2021 ◽

Vol 20 (2) ◽

pp. 1-19

Author(s):

Santosh Kumar Mishra ◽

Rijul Dhir ◽

Sriparna Saha ◽

Pushpak Bhattacharyya

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Language ◽

Image Captioning ◽

Textual Description ◽

Proposed Model ◽

Hindi Language ◽

The Given

Image captioning is the process of generating a textual description of an image that describes its salient parts. It is an important problem, as it involves both computer vision and natural language processing: computer vision is used for understanding images, and natural language processing is used for language modeling. Much work has been done on image captioning for the English language. In this article, we have developed a model for image captioning in the Hindi language. Hindi is the official language of India, and it is the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Different types of attention-based architectures are then developed for image captioning in the Hindi language. These attention mechanisms are new for Hindi, as they have not previously been applied to it. The results of the proposed model are compared with several baselines in terms of BLEU scores, and the results show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of our proposed approach. Availability of resources: the code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/~ai-nlp-ml/resources.html .

A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

Mathematical Problems in Engineering ◽

10.1155/2014/749604 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 21

Author(s):

Chenchen Huang ◽

Wei Gong ◽

Wenlong Fu ◽

Dongyu Feng

Keyword(s):

Feature Extraction ◽

Emotion Recognition ◽

Recognition Rate ◽

Original Method ◽

Speech Emotion Recognition ◽

High Dimensional ◽

SVM Classifier ◽

Multiple Classifier System ◽

Classifier System ◽

Multiple Classifier

Feature extraction is a very important part of speech emotion recognition. To address the feature extraction problem, this paper proposes a new method that uses deep belief networks (DBNs) within a deep neural network (DNN) to extract emotional features from the speech signal automatically. A five-layer DBN is trained to extract speech emotion features, and multiple consecutive frames are combined to form a high-dimensional feature. The features learned by the DBN are used as the input of a nonlinear SVM classifier, which finally yields a multiple-classifier system for speech emotion recognition. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.
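The step of combining multiple consecutive frames into one high-dimensional feature can be illustrated with a simple context-window stacking routine. This is only a sketch under assumptions: the abstract does not give the window size or the edge handling, so the `left` and `right` parameters and the choice to pad by repeating the first and last frame are hypothetical, as is the function name.

```python
def stack_context(frames, left=2, right=2):
    """Concatenate each frame with its neighbours into one longer vector.

    Assumption (not from the paper): frames beyond the sequence edges are
    replaced by the first or last frame, so every output has the same length.
    """
    if not frames:
        return []
    n = len(frames)
    stacked = []
    for i in range(n):
        window = []
        for j in range(i - left, i + right + 1):
            k = min(max(j, 0), n - 1)  # clamp the index to the valid range
            window.extend(frames[k])
        stacked.append(window)
    return stacked


# Three toy 2-dimensional frames, made up for illustration.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
stacked = stack_context(frames, left=1, right=1)
for vector in stacked:
    print(vector)
```

With one frame of context on each side, every 2-dimensional frame becomes a 6-dimensional vector; wider windows grow the dimensionality in the same way.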

Natural Language Processing by Enhanced Honey Encryption Technique

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l1048.10812s19 ◽

2019 ◽

Vol 8 (12S) ◽

pp. 159-163

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Cyber Attacks ◽

Binary Form ◽

Brute Force ◽

Natural Languages ◽

Cipher Text ◽

The Right ◽

Binary Strings

Traditional encryption systems and techniques have always been vulnerable to brute-force cyber-attacks. This is due to the byte-level encoding of characters (UTF-8 or ASCII). An opponent who intercepts a cipher text and attempts to decrypt it by brute force with a wrong pass key can therefore recognise failed decryptions, because they yield a mixture of symbols that is not uniformly distributed and carries no meaning. The honey encoding technique has been suggested to curb this classical weakness by producing cipher texts that, when decrypted with a false key, yield plausible and evenly distributed but untrue plaintexts. This technique is only suitable for passkeys and PINs. Adapting it to encode natural language texts, such as electronic mail and human-written records, remains an open problem. Existing schemes that extend the encryption to natural language messages expose fragments of the plaintext embedded in the coded data, and are thus more prone to cipher text attacks. In this paper, an amended honey encoding system is proposed to support natural language message encryption. The main aim was to create a framework that encrypts a message fully in binary form. As a result, most binary strings yield semantically plausible texts that trick an opponent who tries to decrypt the cipher text with a wrong key. The security of the suggested system is assessed.

EMOSIS Sentiment Analysis on Tweets with Emotion and Intensity Level Recognition Considering Ending Punctuation Marks

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4518.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 10289-10293

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Language Processing ◽

Significant Role ◽

Language Model ◽

Intensity Level ◽

Processing Stage ◽

Overall Performance

Sentiment Analysis is a tool used for determining the Polarity or Emotion of a Sentence. It is a field of Natural Language Processing which focuses on the study of opinions. In this study, the researchers addressed one key challenge in Sentiment Analysis, which is to consider the Ending Punctuation Marks present in a sentence. Ending punctuation marks play a significant role in Emotion Recognition and Intensity Level Recognition. The research made use of tweets expressing opinions about Philippine President Rodrigo Duterte. These downloaded tweets served as the inputs. They were first subjected to a pre-processing stage to prepare the sentences for processing. A Language Model was created to serve as the classifier for determining the scores of the tweets. The scores give the polarity of the sentence. Accuracy is very important in sentiment analysis. To increase the chance of correctly identifying the polarity of the tweets, the input underwent Intensity Level Recognition, which determines the intensifiers and negations within the sentences. The system was evaluated with an overall performance of 80.27%.
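The abstract does not describe how ending punctuation marks are turned into features, so the following is only a hypothetical sketch of one way to capture them: it extracts the run of `.`, `!`, and `?` at the end of a sentence and counts each mark. The function names, the feature names, and the example sentences are all assumptions made for illustration, not the authors' method.

```python
import re

# Matches a run of sentence-ending marks at the very end of the text,
# ignoring trailing whitespace.
_ENDING_MARKS = re.compile(r"([.!?]+)\s*$")


def ending_punctuation(sentence):
    """Return the trailing run of '.', '!' and '?' characters, or ''."""
    match = _ENDING_MARKS.search(sentence)
    return match.group(1) if match else ""


def punctuation_features(sentence):
    """Count each kind of ending mark (hypothetical feature set)."""
    marks = ending_punctuation(sentence)
    return {
        "exclamations": marks.count("!"),
        "questions": marks.count("?"),
        "periods": marks.count("."),
        "length": len(marks),
    }


for text in ["Great job!!!", "Really?!", "Fine.", "no ending mark"]:
    print(repr(text), punctuation_features(text))
```

A downstream classifier could treat a longer run of exclamation marks as a sign of higher intensity, but how such features are weighted is left open here.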

Feature extraction algorithms to improve the speech emotion recognition rate

International Journal of Speech Technology ◽

10.1007/s10772-020-09672-4 ◽

2020 ◽

Vol 23 (1) ◽

pp. 45-55 ◽

Cited By ~ 7

Author(s):

Anusha Koduru ◽

Hima Bindu Valiveti ◽

Anil Kumar Budati

Keyword(s):

Feature Extraction ◽

Emotion Recognition ◽

Recognition Rate ◽

Speech Emotion Recognition

Deep Convolutional Neural Networks for Feature Extraction in Speech Emotion Recognition

Human-Computer Interaction. Recognition and Interaction Technologies - Lecture Notes in Computer Science ◽

10.1007/978-3-030-22643-5_9 ◽

2019 ◽

pp. 117-132 ◽

Cited By ~ 1

Author(s):

Panikos Heracleous ◽

Yasser Mohammad ◽

Akio Yoneyama

Keyword(s):

Neural Networks ◽

Feature Extraction ◽

Emotion Recognition ◽

Convolutional Neural Networks ◽

Speech Emotion Recognition ◽

Deep Convolutional Neural Networks

Deep Persian sentiment analysis: Cross-lingual training for low-resource languages

Journal of Information Science ◽

10.1177/0165551520962781 ◽

2020 ◽

pp. 016555152096278

Author(s):

Rouzbeh Ghasemi ◽

Seyed Arad Ashrafi Asli ◽

Saeedeh Momtazi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Training Data ◽

Target Language ◽

Low Resource ◽

Proposed Model ◽

Significant Difference ◽

Cross Lingual

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages and that of tools for resource-rich languages. To address this problem in the sentiment analysis task in the Persian language, we propose a cross-lingual deep learning framework to benefit from available English training data. We deployed cross-lingual embedding to model sentiment analysis as a transfer learning task that transfers a model from a rich-resource language to low-resource ones. Our model is flexible enough to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on the English Amazon dataset and the Persian Digikala dataset, using two different embedding models and four different classification networks, show the superiority of the proposed model compared with state-of-the-art monolingual techniques. Based on our experiments, the performance of Persian sentiment analysis improves by 22% with static embedding and 9% with dynamic embedding. Our proposed model is general and language-independent; that is, it can be used for any low-resource language once a cross-lingual embedding is available for the source–target language pair. Moreover, by benefitting from word-aligned cross-lingual embedding, the only data required for a reliable cross-lingual embedding is a bilingual dictionary, which is available between almost all languages and English as a potential source language.