Attention-Based Multimodal Neural Network for Automatic Evaluation of Press Conferences

Author(s):  
Shengzhou Yi ◽  
Koshiro Mochitomi ◽  
Isao Suzuki ◽  
Xueting Wang ◽  
Toshihiko Yamasaki

In this study, a multimodal neural network is proposed to automatically predict the evaluations given by a professional consultant team for press conferences, using text and audio data. Seven publicly available press conference videos were collected, and all the Q&A pairs between speakers and journalists were annotated by the consultant team. The proposed multimodal neural network consists of a language model, an audio model, and a feature fusion network. The word representation is composed of a token embedding obtained with ELMo and a type embedding. The language model is an LSTM with an attention layer. The audio model is based on a six-layer CNN that extracts segmental features and an attention network that measures the importance of each segment. Two approaches to feature fusion are proposed: a shared attention network and the product of text features and audio features. The former can explain the relative importance of speech content and speaking style. The latter achieved the best performance, with an average accuracy of 60.1% across all evaluation criteria.
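As a rough illustration of two ideas in this abstract, the sketch below weights segment vectors by attention scores and fuses text and audio features by multiplying them. The abstract does not give the scoring function or the exact form of the product, so the dot-product scoring, the element-wise product, and the names `attention_pool` and `product_fusion` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segments, query):
    """Pool segment vectors with weights softmax(dot(segment, query)).

    Assumption: dot-product scoring against a single query vector.
    Returns the pooled vector and the per-segment weights.
    """
    scores = [sum(x * q for x, q in zip(seg, query)) for seg in segments]
    weights = softmax(scores)
    dim = len(segments[0])
    pooled = [sum(w * seg[i] for w, seg in zip(weights, segments))
              for i in range(dim)]
    return pooled, weights


def product_fusion(text_feat, audio_feat):
    """Fuse two equal-length vectors by element-wise product (assumed form)."""
    if len(text_feat) != len(audio_feat):
        raise ValueError("feature vectors must have the same length")
    return [t * a for t, a in zip(text_feat, audio_feat)]
```

The attention weights sum to one, so they can be read as the relative importance of each segment, which is the kind of interpretability the abstract attributes to its attention layers.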

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zhencong Li ◽  
Qin Yao ◽  
Wanzhi Ma

This paper first introduces the basic knowledge of music, proposes a detailed design for a music retrieval system based on that knowledge, and analyzes the feature extraction and matching algorithms using the features of music. Feature extraction from audio data is the main research focus of this paper. The main melody features, MFCC features, GFCC features, and rhythm features are extracted from audio data, and a feature fusion algorithm is proposed that fuses the GFCC features and rhythm features into new features after principal component analysis (PCA) dimensionality reduction. The fusion is motivated by the property that PCA dimensionality reduction can effectively reduce noise and improve retrieval efficiency. Matching retrieval of audio features is an important task in music retrieval; the DTW algorithm is chosen as the main algorithm for retrieving music, and classification retrieval of music is achieved with the K-nearest neighbor algorithm. After being studied and improved, these algorithms are integrated into the system to perform audio preprocessing, feature extraction, feature postprocessing, and matching retrieval. The system uses 100 different kinds of music in MP3 format as the music library, randomly selects 4 pieces each time, and is tested under different system parameters, recording durations, and environmental noise levels. Through this research, the efficiency of music retrieval is improved and theoretical support is provided for the design of integrated music retrieval software systems.
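To make the DTW matching step concrete, here is a minimal sketch of textbook dynamic time warping on one-dimensional feature sequences, plus a nearest-match lookup over a small library. The abstract does not describe the paper's improved variant, so this is the classic algorithm only; the absolute-difference cost and the names `dtw_distance` and `retrieve` are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def retrieve(query, library):
    """Return the key of the library sequence closest to the query under DTW."""
    return min(library, key=lambda name: dtw_distance(query, library[name]))
```

Because DTW allows one sequence to stretch against the other, a recording played slightly slower than the library version can still align at zero cost, which is why it suits matching hummed or recorded queries of varying tempo.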


Author(s):  
Jiaming Chen ◽  
Weibo Yi ◽  
Dan Wang ◽  
Jinlian Du ◽  
Lihua Fu ◽  
...  

Abstract Objective. The motor imagery-based brain-computer interface (MI-BCI) is one of the most important BCI paradigms and can identify the target limb of subjects from the features of MI-based electroencephalography (EEG) signals. Deep learning methods, especially lightweight neural networks, provide an efficient technique for MI decoding, but the performance of lightweight neural networks is still limited and needs further improvement. This paper aimed to design a novel lightweight neural network to improve the performance of multi-class MI decoding. Approach. A hybrid filter bank structure that can extract information in both the time and frequency domains was proposed and combined with a novel channel attention method, Channel Group Attention (CGA), to build a lightweight neural network, the Filter Bank Channel Group Attention Network (FB-CGANet). Alongside FB-CGANet, the Band Exchange data augmentation method was proposed to generate training data for networks with a filter bank structure. Main results. The proposed method achieved a higher 4-class average accuracy (79.4%) than the compared methods on the BCI Competition IV IIa dataset in the experiment on the unseen evaluation data. It also obtained a higher average accuracy (93.5%) than the compared methods in the cross-validation experiment. Significance. This work indicates the effectiveness of channel attention and the filter bank structure in lightweight neural networks and provides a novel option for multi-class motor imagery classification.


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6758
Author(s):  
Xiujuan Wang ◽  
Yi Sui ◽  
Kangfeng Zheng ◽  
Yutong Shi ◽  
Siwei Cao

Owing to the openness and accessibility of user data, personality recognition is widely used in personalized recommendation, intelligent medicine, natural language processing, and so on. Existing approaches usually adopt a single deep learning mechanism to extract personality information from user data, which leads to some semantic loss. In addition, researchers encode scattered user posts in a sequential or hierarchical manner, ignoring the connections between posts and the unequal value of different posts for classification tasks. We propose a hierarchical hybrid model based on a self-attention mechanism, namely HMAttn-ECBiL, to fully extract deep semantic information horizontally and vertically. Multiple modules composed of a convolutional neural network and bi-directional long short-term memory encode different types of personality representations in a hierarchical and partitioned manner, attending to the contribution of different words in posts and of different posts to personality information, and capturing the dependencies between scattered posts. Moreover, the addition of a word embedding module effectively compensates for the original semantics filtered out by the deep neural network. We verified the hybrid model on the MyPersonality dataset. The experimental results showed that the classification performance of the hybrid model exceeds that of the alternative model architectures and baseline models, with an average accuracy of 72.01%.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Mohammed Aliy Mohammed ◽  
Fetulhak Abdurahman ◽  
Yodit Abebe Ayalew

Abstract Background Automating cytology-based cervical cancer screening could alleviate the shortage of skilled pathologists in developing countries. Until now, computer vision experts have attempted numerous semi-automated and fully automated approaches to address this need. More recently, leveraging the high accuracy and reproducibility of deep neural networks has become common among computer vision experts. The purpose of this study is therefore to classify single-cell Pap smear (cytology) images using pre-trained deep convolutional neural network (DCNN) image classifiers. We fine-tuned the top ten pre-trained DCNN image classifiers and evaluated them using five-class single-cell Pap smear images from the SIPaKMeD dataset. The pre-trained DCNN image classifiers were selected from Keras Applications based on their top-1 accuracy. Results Our experimental results demonstrated that, among the selected top ten pre-trained DCNN image classifiers, DenseNet169 performed best, with an average accuracy, precision, recall, and F1-score of 0.990, 0.974, 0.974, and 0.974, respectively. Moreover, it surpassed the benchmark accuracy reported by the creators of the dataset by 3.70%. Conclusions Even though DenseNet169 is small compared to the other pre-trained DCNN image classifiers tested, it is not suitable for mobile or edge devices. Further experimentation with mobile or small-size DCNN image classifiers is required to extend the applicability of the models to real-world demands. In addition, since all experiments used the SIPaKMeD dataset, additional experiments on new datasets will be needed to improve the generalizability of the models.
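The four reported metrics can be computed from true and predicted labels as sketched below. The abstract says the figures are averages but does not say how they were averaged, so the macro averaging over classes used here is an assumption, and `macro_metrics` is a hypothetical helper name.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Assumption: per-class scores are averaged with equal weight per class.
    A class with no predictions (or no true samples) scores 0 for that metric.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

With macro averaging, accuracy and recall can differ when classes are imbalanced, which is one reason a paper may report all four numbers rather than accuracy alone.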


Machines ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 66
Author(s):  
Tianci Chen ◽  
Rihong Zhang ◽  
Lixue Zhu ◽  
Shiang Zhang ◽  
Xiaomin Li

In an orchard environment with a complex background and changing light conditions, the banana stalk, fruit, branches, and leaves are very similar in color. Fast and accurate detection and segmentation of the banana stalk are crucial for automatic picking by a banana picking robot. In this paper, a banana stalk segmentation method based on a lightweight multi-feature fusion deep neural network (MFN) is proposed. The proposed network is mainly composed of encoding and decoding networks, in which the sandglass bottleneck design is adopted to alleviate information loss in high dimensions. In the decoding network, dilated convolution kernels of different sizes are used in the convolution operations to make the extracted banana stalk features denser. The proposed network is verified by experiments, in which detection precision, segmentation accuracy, number of parameters, operation efficiency, and average execution time are used as evaluation metrics, and the proposed network is compared with Resnet_Segnet, Mobilenet_Segnet, and a few other networks. The experimental results show that, compared to the other networks, the proposed network has significantly fewer parameters, a higher running frame rate, and a shorter average execution time.
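For readers unfamiliar with dilated convolution, the following one-dimensional sketch shows how a dilation rate spreads kernel taps apart to widen the receptive field without adding weights. The network in the abstract operates on images and its kernel sizes and dilation rates are not given, so this 1-D version and the name `dilated_conv1d` are illustrative assumptions only.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated cross-correlation (the deep-learning 'convolution').

    Kernel taps are spaced `dilation` samples apart, so a kernel of length k
    covers (k - 1) * dilation + 1 input samples per output value.
    """
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    span = (len(kernel) - 1) * dilation + 1
    outputs = []
    for start in range(len(signal) - span + 1):
        outputs.append(sum(w * signal[start + i * dilation]
                           for i, w in enumerate(kernel)))
    return outputs
```

With a three-tap kernel, dilation 1 sees three consecutive samples while dilation 2 sees five, using the same three weights. Combining several dilation rates lets a decoder gather context at multiple scales at little parameter cost.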


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2117
Author(s):  
Hui Han ◽  
Zhiyuan Ren ◽  
Lin Li ◽  
Zhigang Zhu

Automatic modulation classification (AMC) is playing an increasingly important role in spectrum monitoring and cognitive radio. As communication and electronic technologies develop, the electromagnetic environment becomes increasingly complex. High background noise levels and large dynamic input ranges have become the key problems for AMC. This paper proposes a feature fusion scheme based on deep learning, which attempts to fuse features from different domains of the input signal to obtain a more stable and efficient representation of the signal modulation types. We consider the complementarity among features, which can be used to suppress the influence of background noise interference and the large dynamic range of the received (intercepted) signals. Specifically, the time-series signals are transformed into the frequency domain by the fast Fourier transform (FFT) and Welch power spectrum analysis, followed by a convolutional neural network (CNN) and a stacked auto-encoder (SAE), respectively, for detailed and stable frequency-domain feature representations. Considering the complementary information in the time domain, the instantaneous amplitude (phase) statistics and higher-order cumulants (HOC) are extracted as the statistical features for fusion. Based on the fused features, a probabilistic neural network (PNN) is designed for automatic modulation classification. The simulation results demonstrate the superior performance of the proposed method. It is worth noting that the classification accuracy can reach 99.8% when the signal-to-noise ratio (SNR) is 0 dB.
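As an example of the higher-order cumulant features mentioned here, the sketch below computes two commonly used fourth-order cumulants from complex baseband samples. The abstract does not say which cumulants the paper uses, so the choice of C40 and C42, the zero-mean assumption, and the function names are illustrative assumptions.

```python
def moment(samples, p, q):
    """Mixed moment M_pq = mean(x**(p - q) * conj(x)**q) of complex samples."""
    values = [complex(v) for v in samples]
    return sum(v ** (p - q) * v.conjugate() ** q for v in values) / len(values)


def fourth_order_cumulants(samples):
    """Return C40 and C42 for a zero-mean complex signal.

    Standard definitions for zero-mean signals:
      C40 = M40 - 3 * M20**2
      C42 = M42 - |M20|**2 - 2 * M21**2
    """
    m20 = moment(samples, 2, 0)
    m21 = moment(samples, 2, 1)
    m40 = moment(samples, 4, 0)
    m42 = moment(samples, 4, 2)
    return {
        "C40": m40 - 3 * m20 ** 2,
        "C42": m42 - abs(m20) ** 2 - 2 * m21 ** 2,
    }
```

For noise-free, unit-power symbols, BPSK and QPSK give different cumulant values, which is what makes these statistics useful as modulation features. Additive Gaussian noise has zero fourth-order cumulants in theory, but it raises the measured signal power, so the moments are usually normalized by power in practice.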

