Predicting Implicit User Preferences with Multimodal Feature Fusion for Similar User Recommendation in Social Media

2021 ◽  
Vol 11 (3) ◽  
pp. 1064
Author(s):  
Jenq-Haur Wang ◽  
Yen-Tsang Wu ◽  
Long Wang

In social networks, users can easily share information and express their opinions. Given the huge amount of data posted by many users, it is difficult to search for relevant information. In addition to individual posts, it would be useful if we could recommend groups of people with similar interests. Past studies on user preference learning have focused on single-modal features such as review contents or demographic information of users. However, such information is usually not easy to obtain in most social media without explicit user feedback. In this paper, we propose a multimodal feature fusion approach to implicit user preference prediction which combines text and image features from user posts for recommending similar users in social media. First, we use the convolutional neural network (CNN) and TextCNN models to extract image and text features, respectively. Then, these features are combined using early and late fusion methods as a representation of user preferences. Lastly, a list of users with the most similar preferences is recommended. The experimental results on real-world Instagram data show that the best performance is achieved when we apply late fusion of the individual classification results for images and texts, with a best average top-k accuracy of 0.491. This validates the effectiveness of fusing multimodal features with deep learning methods to represent social media user preferences. Further investigation is needed to verify the performance on other types of social media.
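As a rough illustration of the two fusion strategies described in this abstract, the sketch below contrasts early fusion (feature concatenation before classification) with late fusion (averaging per-modality classifier outputs). The dimensions, the logistic-regression classifier, and the random arrays standing in for CNN and TextCNN features are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d_img, d_txt = 200, 128, 64            # hypothetical sizes
X_img = rng.normal(size=(n, d_img))       # stand-in for CNN image features
X_txt = rng.normal(size=(n, d_txt))       # stand-in for TextCNN text features
y = rng.integers(0, 2, size=n)            # binary preference label

# Early fusion: concatenate modality features into one vector, then classify.
early = LogisticRegression(max_iter=1000).fit(np.hstack([X_img, X_txt]), y)

# Late fusion: train one classifier per modality, then average their
# predicted class probabilities.
clf_img = LogisticRegression(max_iter=1000).fit(X_img, y)
clf_txt = LogisticRegression(max_iter=1000).fit(X_txt, y)
late_proba = (clf_img.predict_proba(X_img) + clf_txt.predict_proba(X_txt)) / 2
late_pred = late_proba.argmax(axis=1)
```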

2020 ◽  
Vol 2020 (8) ◽  
pp. 270-1-270-6
Author(s):  
Naina Said ◽  
Aysha Nayab ◽  
Kashif Ahmad ◽  
Muhib Ullah ◽  
Touqir Gohar ◽  
...  

In recent years, social media outlets have been widely exploited for disaster analysis and for retrieving relevant information. Social media information can help in several ways, such as identifying the most affected areas and providing information on casualties and the scope of the damage. In this paper, we tackle a specific facet of social media in natural disasters, namely the identification of passable routes in a flooded region. In detail, we propose several solutions for two relevant tasks: (i) identification of flooded and non-flooded images in a collection of images retrieved from social media, and (ii) identification of passable roads in a flooded region. To this aim, we mainly rely on existing deep models pre-trained on the ImageNet and Places datasets, where the models pre-trained on ImageNet extract object-specific features and the ones pre-trained on Places extract scene-level features. In order to properly utilize the object- and scene-level features, we rely on different fusion methods, including Particle Swarm Optimization (PSO) and genetic modeling of the deep features in a late fusion manner. The evaluation of the proposed methods is carried out on the large-scale datasets provided for the MediaEval 2018 benchmarking competition on Multimedia and Satellites. The results demonstrate significant improvement in performance over the baselines.
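A minimal sketch of PSO-driven late fusion, assuming the simplest case of a single fusion weight between the "flooded" probabilities of an object-level model and a scene-level model; the synthetic data, swarm hyperparameters, and scalar search space are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 2, size=n)
# Stand-ins for the two models' "flooded" probabilities (ImageNet-based
# object features vs. Places-based scene features).
p_obj = np.clip(y + rng.normal(0, 0.6, n), 0, 1)
p_scene = np.clip(y + rng.normal(0, 0.5, n), 0, 1)

def accuracy(w):
    fused = w * p_obj + (1 - w) * p_scene   # weighted late fusion
    return ((fused > 0.5).astype(int) == y).mean()

# Minimal particle swarm over the single fusion weight w in [0, 1].
pos = rng.uniform(0, 1, 20)                 # particle positions
vel = np.zeros(20)
best_pos = pos.copy()                       # per-particle best positions
best_fit = np.array([accuracy(w) for w in pos])
g = best_pos[best_fit.argmax()]             # global best
for _ in range(50):
    r1, r2 = rng.uniform(size=20), rng.uniform(size=20)
    vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g - pos)
    pos = np.clip(pos + vel, 0, 1)
    fit = np.array([accuracy(w) for w in pos])
    improved = fit > best_fit
    best_pos[improved], best_fit[improved] = pos[improved], fit[improved]
    g = best_pos[best_fit.argmax()]
print(f"best fusion weight {g:.2f}, accuracy {accuracy(g):.3f}")
```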


2021 ◽  
Vol 8 (7) ◽  
pp. 97-105
Author(s):  
Ali Ahmed ◽  
Sara Mohamed

Content-Based Image Retrieval (CBIR) systems retrieve images from an image repository or database that are visually similar to a query image. CBIR plays an important role in various fields such as medical diagnosis, crime prevention, web-based searching, and architecture. CBIR consists mainly of two stages: the first is the extraction of features and the second is the matching of similarities. There are several ways to improve the efficiency and performance of CBIR, such as segmentation, relevance feedback, query expansion, and fusion-based methods. The literature has suggested several methods for combining and fusing various image descriptors. In general, fusion strategies are divided into two groups, namely early and late fusion strategies. Early fusion is the combination of image features from more than one descriptor into a single vector before the similarity computation, while late fusion refers either to the combination of outputs produced by various retrieval systems or to the combination of different similarity rankings. In this study, a group of color and texture features is proposed for use in both fusion strategies. First, eighteen color features and twelve texture features are combined into a single vector representation for early fusion; second, three of the most common distance measures are combined in the late fusion stage. Our experimental results on two common image datasets show that our proposed method achieves good retrieval performance compared to the traditional use of a single feature descriptor, and acceptable retrieval performance compared to some state-of-the-art methods. The overall accuracy of our proposed method is 60.6% and 39.07% for the Corel-1K and GHIM-10K datasets, respectively.
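The following sketch illustrates both stages under stated assumptions: random vectors stand in for the eighteen color and twelve texture features, the three distance measures (Euclidean, Manhattan, cosine) are plausible choices rather than necessarily the paper's, and rank-sum aggregation stands in for its exact late-fusion rule.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
n_db = 500
color = rng.random((n_db, 18))            # 18 color features per image
texture = rng.random((n_db, 12))          # 12 texture features per image
db = np.hstack([color, texture])          # early fusion: one 30-d vector
q = rng.random((1, 30))                   # query image descriptor

# Late fusion: combine the rankings produced by three distance measures.
ranks = []
for metric in ("euclidean", "cityblock", "cosine"):
    d = cdist(q, db, metric=metric).ravel()
    ranks.append(d.argsort().argsort())   # rank of each database image
fused_rank = np.sum(ranks, axis=0)        # rank-sum (Borda-style) fusion
top10 = fused_rank.argsort()[:10]         # final retrieved images
```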


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Chao Tang ◽  
Huosheng Hu ◽  
Wenjian Wang ◽  
Wei Li ◽  
Hua Peng ◽  
...  

The representation and selection of action features directly affect the recognition performance of human action recognition methods. A single feature is often affected by human appearance, the environment, camera settings, and other factors. Aiming at the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features with different modal information are proposed: the RGB-HOG feature based on RGB image information, which has good geometric scale invariance; the D-STIP feature based on depth images, which maintains the dynamic characteristics of human motion and has local invariance; and the S-JRPF feature based on skeleton information, which describes the spatial structure of motion well. At the same time, multiple K-nearest-neighbor classifiers with good generalization ability are combined for decision-level classification. The experimental results show that the algorithm achieves ideal recognition results on the public G3D and CAD60 datasets.
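A minimal sketch of the decision-level fusion step, assuming one K-nearest-neighbor classifier per modality combined by majority vote; the feature dimensions, synthetic data, and K value are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
n, n_cls = 300, 5
y = rng.integers(0, n_cls, size=n)
# Stand-ins for the three modality features (dimensions are assumptions).
X_hog = rng.normal(size=(n, 64)) + y[:, None] * 0.3    # RGB-HOG
X_stip = rng.normal(size=(n, 32)) + y[:, None] * 0.3   # D-STIP
X_jrpf = rng.normal(size=(n, 16)) + y[:, None] * 0.3   # S-JRPF

# One KNN classifier per modality; fuse their decisions by majority vote.
preds = []
for X in (X_hog, X_stip, X_jrpf):
    knn = KNeighborsClassifier(n_neighbors=5).fit(X[:200], y[:200])
    preds.append(knn.predict(X[200:]))
votes = np.vstack(preds)                  # (3 classifiers, 100 test samples)
fused = np.array([np.bincount(votes[:, i]).argmax()
                  for i in range(votes.shape[1])])
print("ensemble accuracy:", (fused == y[200:]).mean())
```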


Author(s):  
C Santha Kumar ◽  
V Mallesi

In recent years, photo-based platforms have become one of the most common forms of social media. Given the large number of images uploaded daily, understanding user preferences over user-generated images and making recommendations has become a major necessity. Several hybrid models have been proposed to improve recommendation performance by combining different types of side information (e.g., image representations, interactions) with users' item histories. Previous research, however, has failed to incorporate the complex factors that affect user preferences into a unified framework, owing to the varied image content users create on social media. In addition, many of these hybrid models use pre-defined weights to combine the different types of data, resulting in less favorable performance. To this end, we present a unified model for social image recommendation in this paper. We identify three key elements (i.e., upload history, social exposure, and proprietary information) that affect each user's preferences, where each element summarizes a content aspect from the complex interactions between users and images, in addition to a basic matrix-factorization interest model. We then construct a sequential attention network that reflects the relationship between hidden user interests and the identified key elements (at both the element level and the feature level). The sequential attention network learns to attend more or less to each content signal using embeddings from deep learning models designed for each type of data. Finally, extensive tests on real-world data indicate that our proposed model is superior.
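As a rough sketch of the element-level attention idea, the snippet below weights hypothetical embeddings of the three key elements by their affinity to a latent user interest vector; the dimensions, the dot-product scoring rule, and the additive combination are all assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(4)
d = 32
user = rng.normal(size=d)                        # latent user interest
elements = {                                     # one embedding per element
    "upload_history": rng.normal(size=d),
    "social_exposure": rng.normal(size=d),
    "proprietary_information": rng.normal(size=d),
}
E = np.vstack(list(elements.values()))

# Element-level attention: score each element against the user's interest,
# then aggregate the elements into a single context vector.
scores = E @ user / np.sqrt(d)
weights = softmax(scores)
context = weights @ E                            # attended representation
final = user + context                           # combined with base interest
```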


Author(s):  
Hamidreza Tahmasbi ◽  
Mehrdad Jalali ◽  
Hassan Shakeri

An essential problem in real-world recommender systems is that user preferences are not static: users are likely to change their preferences over time. Recent studies have shown that modelling and capturing the dynamics of user preferences leads to significant improvements in recommendation accuracy and, consequently, user satisfaction. In this paper, we develop a framework to capture user preference dynamics in a personalized manner, based on the fact that changes in user preferences can vary individually. We also adopt the plausible assumption that older user activities should have less influence on a user's current preferences. We introduce an individual time decay factor for each user, set according to the rate of their preference dynamics, to weigh past user preferences and decrease their importance gradually. We exploit users' demographics as well as similarities extracted among users over time, aiming to enhance the prior knowledge about user preference dynamics, in addition to the past weighted user preferences, in a coupled tensor factorization technique that provides top-K recommendations. The experimental results on two real social media datasets—Last.fm and Movielens—indicate that our proposed model is more accurate and more robust than other competitive methods and copes better with problems such as cold start and data sparsity.
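A minimal sketch of the individual time decay idea, assuming a simple exponential form; the decay rates and timestamps are illustrative, and how the per-user rate is estimated is left out.

```python
import numpy as np

def decayed_weights(event_times, now, lam):
    """Exponential time decay: recent activities weigh more. `lam` is an
    individual decay rate reflecting how fast that user's taste drifts."""
    return np.exp(-lam * (now - np.asarray(event_times, dtype=float)))

# Two users with the same history but different individual decay rates:
times = [1, 5, 9, 10]                            # interaction timestamps
print(decayed_weights(times, now=10, lam=0.1))   # slowly drifting user
print(decayed_weights(times, now=10, lam=1.0))   # rapidly drifting user
```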


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaodong Liu ◽  
Songyang Li ◽  
Miao Wang

Context, such as scenes and objects, plays an important role in video emotion recognition, and recognition accuracy can be further improved when context information is incorporated. Although previous research has considered context information, the emotional clues contained in different images may differ, which is often ignored. To address the problem of emotion differences between modalities and between images, this paper proposes a hierarchical attention-based multimodal fusion network for video emotion recognition, which consists of a multimodal feature extraction module and a multimodal feature fusion module. The multimodal feature extraction module has three subnetworks used to extract features of facial, scene, and global images. Each subnetwork consists of two branches, where the first branch extracts the features of a modality and the other branch generates an emotion score for each image. The features and emotion scores of all images in a modality are aggregated to generate the emotion feature of that modality. The fusion module takes the multimodal features as input and generates an emotion score for each modality. Finally, the features and emotion scores of the modalities are aggregated to produce the final emotion representation of the video. Experimental results show that our proposed method is effective on the emotion recognition dataset.
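A minimal sketch of the image-level aggregation step, assuming the per-image emotion scores act as softmax attention weights over the per-image features; the shapes and random data are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(5)
n_frames, d = 16, 64
# Per-image features and emotion scores from one modality's two branches
# (facial, scene, or global); both are random stand-ins here.
feats = rng.normal(size=(n_frames, d))
scores = rng.normal(size=n_frames)

# Image-level attention: emotion scores weight the per-image features,
# producing a single emotion feature for the modality.
w = softmax(scores)
modality_feat = w @ feats          # (d,) aggregated modality representation
```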


2021 ◽  
Vol 11 (22) ◽  
pp. 10567
Author(s):  
Reishi Amitani ◽  
Kazuyuki Matsumoto ◽  
Minoru Yoshida ◽  
Kenji Kita

This study investigates social media trends and proposes a buzz tweet classification method to explore the factors causing the buzz phenomenon on Twitter. It is difficult to identify the causes of the buzz phenomenon based solely on the texts posted on Twitter. By limiting the analysis to tweets with attached images and using the characteristics of the images and the relationships between text and images, a more detailed analysis than with text-only tweets can be conducted. Therefore, an analysis method was devised based on a multi-task neural network that takes the features extracted from both the image and the text as input and outputs the buzz class (buzz/non-buzz) and the numbers of “likes (favorites)” and “retweets (RTs)”. Predictions made using a single feature (text or image) were compared with predictions using a combination of multiple features. The differences between buzz and non-buzz features were analyzed based on the cosine similarity between the text and the image. The buzz class was correctly identified at a rate of approximately 80% for all combinations of image and text features, with the combination of BERT and VGG16 providing the highest accuracy.
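A rough sketch of the multi-task setup in PyTorch, assuming a shared trunk over concatenated text and image features with one classification head (buzz/non-buzz) and one regression head (likes, RTs); the feature sizes (768 for BERT, 4096 for a VGG16 fc layer) and hidden width are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class BuzzNet(nn.Module):
    """Multi-task sketch: fused text+image features in, buzz class and
    like/RT counts out."""
    def __init__(self, d_text=768, d_image=4096, d_hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(d_text + d_image, d_hidden), nn.ReLU())
        self.cls_head = nn.Linear(d_hidden, 2)    # buzz / non-buzz logits
        self.reg_head = nn.Linear(d_hidden, 2)    # predicted likes, retweets

    def forward(self, text_feat, image_feat):
        h = self.trunk(torch.cat([text_feat, image_feat], dim=-1))
        return self.cls_head(h), self.reg_head(h)

model = BuzzNet()
logits, counts = model(torch.randn(8, 768), torch.randn(8, 4096))
```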


Author(s):  
Pengwei Hu ◽  
Chenhao Lin ◽  
Hui Su ◽  
Shaochun Li ◽  
Xue Han ◽  
...  

The use of social media runs through our lives, and users' emotions are affected by it. Previous studies have reported social organizations and psychologists using social media to find depressed patients. However, due to the variety of content published by users, it is not trivial for a system to consider the text, the image, and even the hidden information behind the image. To address this problem, we propose a new system for social media screening of depressed patients named BlueMemo. We collect real-time posts from Twitter. From these posts, learned text features, image features, and visual attributes are extracted as three modalities and fed into a multi-modal fusion and classification model to implement our system. The proposed BlueMemo can help physicians and clinicians quickly and accurately identify users at potential risk of depression.
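A minimal sketch of the fusion-and-classification step, assuming simple concatenation of the three modality vectors followed by a small classifier; the dimensions, random data, and model choice are illustrative, not the BlueMemo architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
n = 400
# Stand-ins for the three modalities extracted from each post.
text_f = rng.normal(size=(n, 96))      # learned text features
image_f = rng.normal(size=(n, 128))    # image features
attr_f = rng.normal(size=(n, 20))      # visual attributes
y = rng.integers(0, 2, size=n)         # at-risk / not-at-risk label

# Fusion and classification: concatenate all three modalities and
# train one classifier on the joint vector.
X = np.hstack([text_f, image_f, attr_f])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, y)
```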


2017 ◽  
Vol 2017 ◽  
pp. 1-11
Author(s):  
Ibrahim Delibalta ◽  
Lemi Baruh ◽  
Suleyman Serdar Kozat

We provide a causal inference framework to model the effects of machine learning algorithms on user preferences. We then use this mathematical model to prove that the overall system can be tuned to alter those preferences in a desired manner. A user can be an online shopper or a social media user, exposed to digital interventions produced by machine learning algorithms. A user preference can be anything from an inclination towards a product to a political party affiliation. Our framework uses a state-space model to represent user preferences as latent system parameters which can only be observed indirectly via online user actions such as purchase activity or social media status updates, shares, blogs, or tweets. Based on these observations, machine learning algorithms produce digital interventions such as targeted advertisements or tweets. We model the effects of these interventions through a causal feedback loop, which alters the corresponding preferences of the user. We then introduce algorithms to estimate and subsequently tune the user preferences toward a particular desired form. We demonstrate the effectiveness of our algorithms through experiments in different scenarios.
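A toy simulation of such a feedback loop under stated assumptions: a scalar latent preference with linear dynamics, an intervention computed from a recursive estimate of that preference, and a noisy indirect observation. The gains and noise levels are illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(7)
a, b = 0.95, 0.10                  # preference dynamics and intervention gain
x, x_hat, target = 0.0, 0.0, 1.0   # latent preference, estimate, desired value
for t in range(200):
    u = 5.0 * (target - x_hat)                 # intervention (e.g., targeted ad)
    x = a * x + b * u + rng.normal(0, 0.02)    # true latent preference update
    y = x + rng.normal(0, 0.1)                 # indirect observation (user action)
    x_hat = 0.9 * x_hat + 0.1 * y              # simple recursive estimate
print(f"latent preference {x:.2f}, estimate {x_hat:.2f}, target {target}")
```

Run over enough steps, the loop steers the latent preference toward the target even though the preference itself is never observed directly.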


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 2010
Author(s):  
Kang Zhang ◽  
Yushui Geng ◽  
Jing Zhao ◽  
Jianxin Liu ◽  
Wenxiao Li

In recent years, with the popularity of social media, users are increasingly keen to express their feelings and opinions in the form of pictures and text, making multimodal data combining text and pictures the fastest-growing content type. Most of the information posted by users on social media carries clear sentiment, and multimodal sentiment analysis has become an important research field. Previous studies on multimodal sentiment analysis have primarily focused on extracting text and image features separately and then combining them for sentiment classification, often ignoring the interaction between text and images. Therefore, this paper proposes a new multimodal sentiment analysis model. The model first eliminates noise interference in the textual data and extracts the more important image features. Then, in an attention-based feature-fusion stage, the text and image modalities learn each other's internal features through a symmetric structure. The fused features are then applied to the sentiment classification task. Experimental results on two common multimodal sentiment datasets demonstrate the effectiveness of the proposed model.
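A minimal sketch of symmetric cross-modal attention, assuming scaled dot-product attention in each direction between hypothetical text-token and image-region features; all shapes are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def cross_attention(Q, K, V):
    """Scaled dot-product attention: queries from one modality attend
    to keys/values from the other modality."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ V

rng = np.random.default_rng(8)
d = 64
text = rng.normal(size=(12, d))    # 12 text token features
image = rng.normal(size=(49, d))   # 49 image region features (7x7 grid)

# Symmetric interaction: text attends to image regions and vice versa,
# so each modality learns from the other's internal features.
text_ctx = cross_attention(text, image, image)
image_ctx = cross_attention(image, text, text)
fused = np.concatenate([text_ctx.mean(0), image_ctx.mean(0)])  # fusion vector
```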

