A Deep Multimodal Model for Predicting Affective Responses Evoked by Movies Based on Shot Segmentation

2021, Vol 2021, pp. 1-12
Author(s): Chunxiao Wang, Jingjing Zhang, Wei Jiang, Shuang Wang

Predicting the emotions evoked in viewers watching movies is an important problem in affective video content analysis, with a wide range of applications. In general, the audience's emotion is evoked by the combined effect of a movie's audio-visual messages. Existing research has mainly used coarse middle- and high-level audio and visual features to predict experienced emotions; how to incorporate semantic information to refine these features and improve prediction remains understudied. Therefore, taking the temporal structure and semantic units of a movie into account, this paper proposes a shot-based audio-visual feature representation method and a long short-term memory (LSTM) model with a temporal attention mechanism for experienced emotion prediction. First, the shot-based audio-visual feature representation defines how audio and visual features are extracted from and combined within each shot clip, using advanced pretrained models from related audio-visual tasks to extract features at different semantic levels. The prediction model then comprises four components: a nonlinear multimodal feature fusion layer, a temporal feature capture layer, a temporal attention layer, and an emotion prediction layer. The proposed method is evaluated on the extended COGNIMUSE dataset, where it performs significantly better than the state of the art while substantially reducing computation, raising the Pearson correlation coefficient (PCC) from 0.46 to 0.62 for arousal and from 0.18 to 0.34 for valence in experienced emotion.
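As a rough illustration of the four-component prediction model described in this abstract, here is a minimal PyTorch sketch, assuming per-shot audio and visual feature sequences as input; the class name, layer sizes, and feature dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ShotEmotionLSTM(nn.Module):
    """Sketch: fuse per-shot audio/visual features, model them with an LSTM,
    pool with temporal attention, and regress arousal/valence."""
    def __init__(self, audio_dim=128, visual_dim=512, hidden=256):
        super().__init__()
        # 1) nonlinear multimodal feature fusion layer
        self.fusion = nn.Sequential(nn.Linear(audio_dim + visual_dim, hidden), nn.Tanh())
        # 2) temporal feature capture layer
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # 3) temporal attention layer: one scalar score per shot
        self.attn = nn.Linear(hidden, 1)
        # 4) emotion prediction layer: outputs (arousal, valence)
        self.head = nn.Linear(hidden, 2)

    def forward(self, audio, visual):
        # audio: (B, T, audio_dim); visual: (B, T, visual_dim); T = number of shots
        h, _ = self.lstm(self.fusion(torch.cat([audio, visual], dim=-1)))
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over shots
        return self.head((w * h).sum(dim=1))     # attention-weighted pooling -> (B, 2)

out = ShotEmotionLSTM()(torch.randn(4, 20, 128), torch.randn(4, 20, 512))  # 4 movies, 20 shots each
```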

2014, Vol 2014, pp. 1-7
Author(s): Md. Rabiul Islam, Md. Abdus Sobhan

This paper proposes a feature-fusion-based Audio-Visual Speaker Identification (AVSI) system for varied illumination conditions. Among the different fusion strategies, feature-level fusion is used for the proposed AVSI system, with a Hidden Markov Model (HMM) for learning and classification. Since the feature set contains richer information about the raw biometric data than the other levels, integration at the feature level is expected to provide better authentication results. Both Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM)-based appearance and shape facial features are concatenated to form the visual feature vectors. The combined audio and visual features are then used for feature fusion. Principal Component Analysis (PCA) is used to reduce the dimensionality of the audio and visual feature vectors. The VALID audio-visual database, which covers four different illumination levels, is used to measure the performance of the proposed system. Experimental results demonstrate the significance of the proposed audio-visual speaker identification system under various combinations of audio and visual features.
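A compact sketch of this pipeline, assuming librosa, scikit-learn, and hmmlearn, might look as follows; frame-level LPC coefficients stand in for LPCCs (which are derived from LPCs by a further recursion), and the ASM-based visual features are assumed to be precomputed and frame-aligned.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA
from hmmlearn import hmm

def audio_features(y, sr, order=12):
    """Frame-level MFCCs concatenated with frame-level LPC coefficients
    (a simple stand-in for LPCCs)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T                # (frames, 13)
    frames = librosa.util.frame(y, frame_length=2048, hop_length=512).T
    lpc = np.array([librosa.lpc(f, order=order)[1:] for f in frames])   # (frames, order)
    n = min(len(mfcc), len(lpc))
    return np.hstack([mfcc[:n], lpc[:n]])

def train_speaker_model(y, sr, visual_feats, n_states=5, n_pc=20):
    """Feature-level fusion of audio and (precomputed, frame-aligned) visual
    features, PCA reduction, then one HMM per enrolled speaker."""
    audio = audio_features(y, sr)
    n = min(len(audio), len(visual_feats))
    fused = np.hstack([audio[:n], visual_feats[:n]])   # feature-level fusion
    reduced = PCA(n_components=n_pc).fit_transform(fused)
    model = hmm.GaussianHMM(n_components=n_states)
    model.fit(reduced)
    return model  # identify a test clip by the speaker HMM with the highest model.score()
```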


2021, Vol 2021, pp. 1-9
Author(s): Yezhen Liu, Xilong Yu, Yanhua Wu, Shuhong Song

Forecasting stock price trends accurately is a huge challenge because stock markets are extremely stochastic and complicated. This challenge persistently motivates the search for reliable pathways to guide stock trading. Since the Long Short-Term Memory (LSTM) network has a dedicated gate structure well suited to prediction from contextual features, we propose a novel LSTM-based model. We also devise a multiscale convolutional feature fusion mechanism that lets the model extensively exploit the contextual relationships hidden in consecutive time steps. The significance of the designed scheme is twofold. (1) Benefiting from the gate structure designed for both long- and short-term memory, the model can use the given stock history data more adaptively than traditional models, which underpins its prediction performance in financial time series (FTS) scenarios and thus benefits the prediction of stock trends. (2) The multiscale convolutional feature fusion mechanism diversifies the feature representation and captures the essence of FTS features more extensively than traditional models, which improves generalizability. Empirical studies on three classic stock history datasets, i.e., S&P 500, DJIA, and VIX, demonstrate the effectiveness and stability of the proposed method against several state-of-the-art models on multiple validity indices. For example, our method achieved the highest average directional accuracy (around 0.71) on the three stock datasets.
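A minimal PyTorch sketch of an LSTM with a multiscale convolutional feature fusion front end, in the spirit of this abstract, could look like the following; the kernel sizes, channel counts, and OHLCV-style input are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MultiScaleConvLSTM(nn.Module):
    """Sketch: parallel 1-D convolutions with different kernel sizes extract
    multiscale context from a window of price features; their outputs are
    fused and fed to an LSTM that predicts the next-step trend."""
    def __init__(self, in_feats=5, channels=16, hidden=64, scales=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_feats, channels, k, padding=k // 2) for k in scales])
        self.lstm = nn.LSTM(channels * len(scales), hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # e.g. next-day direction logit

    def forward(self, x):                  # x: (B, T, in_feats), e.g. OHLCV windows
        z = x.transpose(1, 2)              # Conv1d expects (B, C, T)
        multi = torch.cat([b(z) for b in self.branches], dim=1)  # multiscale fusion
        h, _ = self.lstm(multi.transpose(1, 2))
        return self.head(h[:, -1])         # prediction from the last time step

pred = MultiScaleConvLSTM()(torch.randn(8, 30, 5))  # 8 windows of 30 trading days
```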


Author(s): Boyan Zhang, Yong Zhong, Zhendong Li

Deep visual feature-based methods have demonstrated impressive performance in visual tracking, owing to their powerful capability for visual feature representation. However, in complex environments involving dramatic appearance change, illumination variation, or rotation, the extracted deep visual feature is insufficient for accurately characterizing the target. To solve this problem, we present an integrated tracking framework that combines a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). First, the LSTM extracts the target's dynamics over the time sequence, yielding the target's state at the current time step. From that state, a refined preliminary bounding box is obtained. Then, a deep convolutional feature of the target is extracted by a CNN from the refined bounding box. Finally, the position of the target is determined from the feature's score. During the tracking stage, to improve the network's adaptability, its parameters are updated using target samples captured during successful tracking. Experiments show that the proposed method achieves outstanding tracking performance and robustness in cases of partial occlusion, out-of-view targets, motion blur, and fast motion.
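The two stages described here could be sketched as follows, assuming PyTorch; both module definitions are illustrative stand-ins, not the authors' networks.

```python
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    """Sketch: predict the target's next state (cx, cy, w, h) from the
    sequence of past states, to pre-position the bounding box."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(4, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 4)

    def forward(self, states):             # states: (B, T, 4) past boxes
        h, _ = self.lstm(states)
        return self.out(h[:, -1])          # predicted box at the current step

class CropScorer(nn.Module):
    """Sketch: a small CNN scores image crops sampled around the predicted
    box; the highest-scoring crop gives the new target position. In the
    described scheme, this scorer is updated online with samples from
    frames that were tracked successfully."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.score = nn.Linear(64, 1)

    def forward(self, crops):              # crops: (N, 3, H, W)
        return self.score(self.features(crops)).squeeze(-1)
```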


Author(s): Lijun Cai, Li Wang, Xiangzheng Fu, Chenxing Xia, Xiangxiang Zeng, ...

The peptide therapeutics market is providing new opportunities for the biotechnology and pharmaceutical industries, so identifying therapeutic peptides and exploring their properties are important. Although several studies have proposed machine learning methods to predict whether peptides are therapeutic, most do not explain the model's decision factors in detail. In this work, an Interpretable Therapeutic Peptide Prediction (ITP-Pred) model based on efficient feature fusion was developed. First, we propose three kinds of feature descriptors based on sequence and physicochemical-property encodings, namely amino acid composition (AAC), grouped AAC, and autocorrelation coding, and concatenate them to obtain the feature representation of a therapeutic peptide. We then feed this representation into a CNN-Bidirectional Long Short-Term Memory (BiLSTM) model that automatically learns to recognize therapeutic peptides. Cross-validation and independent verification experiments indicate that ITP-Pred achieves higher prediction performance on the benchmark dataset than the comparison methods. Finally, we analyze the model's output from two aspects, sequence order and physicochemical properties, mining important features as guidance for the design of better models that complement existing methods.
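The first two sequence descriptors are straightforward to compute; below is a small NumPy sketch, with a rough physicochemical grouping chosen purely for illustration (the paper's exact groups and autocorrelation coding may differ).

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues."""
    seq = seq.upper()
    return np.array([seq.count(a) / len(seq) for a in AMINO_ACIDS])

# Grouped AAC over coarse physicochemical classes (illustrative grouping only).
GROUPS = {"hydrophobic": "AVLIMFWY", "polar": "STNQCG",
          "charged+": "KRH", "charged-": "DE", "special": "P"}

def gaac(seq):
    """Grouped amino acid composition: frequency of each residue class."""
    seq = seq.upper()
    return np.array([sum(seq.count(a) for a in group) / len(seq)
                     for group in GROUPS.values()])

# Concatenated descriptor for one peptide, ready for a downstream classifier.
x = np.hstack([aac("GLSDGEWQQVLNVWGK"), gaac("GLSDGEWQQVLNVWGK")])
```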


Energies, 2019, Vol 12 (17), pp. 3308
Author(s): Ruijin Zhu, Weilin Guo, Xuejiao Gong

A combined cooling, heating, and power (CCHP) system is a distributed energy system that uses a power station or heat engine to generate electricity and useful heat simultaneously. Owing to its wide range of efficiency, ecological, and financial advantages, CCHP is likely to be a main direction for integrated energy systems. Accurate prediction of heating, gas, and electrical loads plays an essential role in energy management in CCHP systems. This paper combines a long short-term memory (LSTM) network and a convolutional neural network (CNN) into a novel hybrid neural network for short-term load forecasting that considers the correlation among the loads. The Pearson correlation coefficient is used to measure the temporal correlation between the current load and historical loads, and to analyze the coupling among heating, gas, and electrical loads. The dropout technique is applied to address the over-fitting of the network caused by limited data diversity and network parameter redundancy. The case study shows that considering the coupling among heating, gas, and electrical loads effectively improves forecasting accuracy, and that the proposed approach outperforms traditional methods.
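The lag-selection step based on the Pearson correlation coefficient can be sketched as follows; the threshold and maximum lag are illustrative assumptions, not values from the paper.

```python
import numpy as np

def select_lags(load, max_lag=168, threshold=0.5):
    """Sketch: keep the lags whose Pearson correlation with the current
    load exceeds a threshold; those lagged values become model inputs."""
    keep = []
    for k in range(1, max_lag + 1):
        r = np.corrcoef(load[k:], load[:-k])[0, 1]  # corr(y_t, y_{t-k})
        if abs(r) >= threshold:
            keep.append(k)
    return keep

# A daily-periodic hourly series keeps lags near multiples of 24, as expected.
hourly_load = np.sin(np.arange(24 * 60) * 2 * np.pi / 24) + 0.1 * np.random.randn(24 * 60)
print(select_lags(hourly_load))
```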


2019
Author(s): Sushrut Thorat

A mediolateral gradation in neural responses to images spanning animals to artificial objects is observed in the ventral temporal cortex (VTC). Which information streams drive this organisation is an ongoing debate. Recently, in Proklova et al. (2016), the visual shape and category ("animacy") dimensions in a set of stimuli were dissociated using a behavioural measure of visual feature information. fMRI responses revealed a neural cluster (extra-visual animacy cluster, xVAC) that encoded category information unexplained by visual feature information, suggesting extra-visual contributions to the organisation in the ventral visual stream. We reassess these findings using Convolutional Neural Networks (CNNs) as models of the ventral visual stream. The visual features developed in the CNN layers can categorise the shape-matched stimuli from Proklova et al. (2016), unlike the behavioural measures used in that study. The category organisations in xVAC and VTC are explained to a large degree by the CNN visual feature differences, casting doubt on the suggestion that visual feature differences cannot account for the animacy organisation. To inform the debate further, we designed a set of animal-image stimuli to dissociate the animacy organisation driven by the CNN visual features from the degree of familiarity and agency (thoughtfulness and feelings). Preliminary results from a new fMRI experiment designed to understand the contribution of these non-visual features are presented.
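The CNN-based analysis amounts to probing whether features from a pretrained network's layers linearly separate the stimulus categories. A minimal sketch, assuming torchvision and scikit-learn (the study's actual CNN and readout may differ):

```python
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

def probe_layer(images, labels):
    """Sketch: can features from a pretrained CNN's convolutional layers
    separate animate from inanimate stimuli?
    images: (N, 3, 224, 224) tensor; labels: (N,) array of 0/1."""
    backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    backbone.eval()
    with torch.no_grad():
        feats = backbone.features(images).flatten(1).numpy()
    clf = LogisticRegression(max_iter=1000)
    # In practice, report accuracy on held-out stimuli, not the training fit.
    return clf.fit(feats, labels).score(feats, labels)
```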


2019, Vol 14 (5), pp. 406-421
Author(s): Ting-He Zhang, Shao-Wu Zhang

Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive, so it is highly desirable to develop computational methods for identifying protein subcellular locations efficiently and effectively. In particular, the rapidly increasing number of protein sequences entering genome databases calls for automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) construction of protein subcellular location benchmark datasets, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, and v) web servers. Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. One is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from protein sequences and structures. Another is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.


2021, Vol 8 (1)
Author(s): Sungmin O., Rene Orth

While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially continuous soil moisture data are only available from satellite observations or model simulations. Here we present SoMo.ml, a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross-validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. Given its distinct derivation, SoMo.ml complements the existing suite of modelled and satellite-based datasets in support of large-scale hydrological, meteorological, and ecological analyses.
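A minimal sketch of the kind of LSTM regression described here, assuming PyTorch and daily meteorological forcing as input (the actual predictors and architecture behind SoMo.ml may differ):

```python
import torch
import torch.nn as nn

class SoilMoistureLSTM(nn.Module):
    """Sketch: map a window of daily meteorological forcing (e.g. precipitation,
    temperature, radiation) to soil moisture in three depth layers."""
    def __init__(self, n_forcing=6, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_forcing, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)   # 0-10, 10-30, 30-50 cm layers

    def forward(self, forcing):            # forcing: (B, T, n_forcing)
        h, _ = self.lstm(forcing)
        return self.head(h[:, -1])         # soil moisture at the window's last day

sm = SoilMoistureLSTM()(torch.randn(32, 365, 6))  # one year of daily forcing per sample
```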


Information, 2020, Vol 12 (1), pp. 3
Author(s): Shuang Chen, Zengcai Wang, Wenxin Chen

The effective detection of driver drowsiness is an important measure for preventing traffic accidents. Most existing drowsiness detection methods use a single facial feature to identify fatigue status, ignoring both the complex correlations between fatigue features and their temporal information, which reduces recognition accuracy. To solve these problems, we propose a driver sleepiness estimation model based on factorized bilinear feature fusion and a long short-term recurrent convolutional network, to detect driver drowsiness efficiently and accurately. The proposed framework includes three modules: fatigue feature extraction, fatigue feature fusion, and driver drowsiness detection. First, a convolutional neural network (CNN) extracts deep representations of eye- and mouth-related fatigue features from the face region detected in each video frame. Then, a factorized bilinear feature fusion model performs a nonlinear fusion of the deep feature representations of the eyes and mouth. Finally, the sequence of fused frame-level features is fed into a long short-term memory (LSTM) unit to capture temporal information, and a softmax classifier detects sleepiness. The framework was evaluated on the National Tsing Hua University drowsy driver detection (NTHU-DDD) video dataset. Experimental results show that the method has better stability and robustness than competing methods.
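The factorized bilinear fusion step can be sketched as follows in PyTorch; the dimensions and normalization choice are illustrative assumptions in the style of multimodal factorized bilinear pooling, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedBilinearFusion(nn.Module):
    """Sketch of factorized bilinear pooling: project eye and mouth features
    into a shared (out * factor)-dim space, multiply elementwise, then
    sum-pool over each group of `factor` dims to approximate a bilinear
    interaction at low rank."""
    def __init__(self, dim_eye=256, dim_mouth=256, out=128, factor=4):
        super().__init__()
        self.factor = factor
        self.proj_eye = nn.Linear(dim_eye, out * factor)
        self.proj_mouth = nn.Linear(dim_mouth, out * factor)

    def forward(self, eye, mouth):                  # per-frame vectors: (B, dim_*)
        joint = self.proj_eye(eye) * self.proj_mouth(mouth)
        joint = joint.view(-1, joint.size(1) // self.factor, self.factor).sum(-1)
        return F.normalize(joint, dim=-1)           # L2 norm, as is common after pooling

# The fused per-frame vectors are stacked over time and fed to an LSTM,
# with a softmax classifier on the final state for the drowsiness decision.
fused = FactorizedBilinearFusion()(torch.randn(16, 256), torch.randn(16, 256))
```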


2004, Vol 05 (03), pp. 313-327
Author(s): Akihiro Miyakawa, Kaoru Sugita, Tomoyuki Ishida, Yoshitaka Shibata

In this paper, we propose a Kansei retrieval method based on the design patterns of traditional Japanese crafting objects, to provide users with a desired presentation space in a digital traditional Japanese crafting system. Quantitative visual feature values are extracted using Visual Pattern Image Coding (VPIC); they include the total number, frequency, dispersion rate, and deviation rate of the different edge types. The quantitative feature values of traditional Japanese crafting objects are registered in a multimedia database, and the relation between Kansei words and the visual features of the objects is analyzed using a questionnaire. The visual features are then compared with the quantitative feature values. Through this process, we can find the relation between design pattern components and VPIC edge types, and with this relation the Kansei retrieval method can be realized.
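VPIC itself is not widely available, but the retrieval idea, per-object edge-type statistics compared by similarity, can be illustrated with a generic gradient-based stand-in; everything below is a hypothetical approximation, not VPIC.

```python
import numpy as np

def edge_type_features(img):
    """Stand-in for VPIC edge statistics: a histogram over four quantized
    gradient orientations (as rough 'edge types') plus the total edge count."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi              # orientation in [0, pi)
    edges = mag > mag.mean() + mag.std()          # crude edge mask
    bins = np.digitize(ang[edges], np.linspace(0, np.pi, 5)[1:-1])
    hist = np.bincount(bins, minlength=4) / max(edges.sum(), 1)
    return np.append(hist, edges.sum())

def retrieve(query_vec, database):
    """Rank stored crafting objects by cosine similarity of feature vectors."""
    sims = [np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v))
            for v in database]
    return np.argsort(sims)[::-1]                 # indices, most similar first
```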

