Combined centrality measures for an improved characterization of influence spread in social networks

2020 ◽  
Vol 8 (1) ◽  
Author(s):  
Mehmet Şimşek ◽  
Henning Meyerhenke

Influence Maximization (IM) aims at finding the most influential users in a social network, that is, users who maximize the spread of an opinion within a certain propagation model. Previous work investigated the correlation between influence spread and nodal centrality measures to bypass more expensive IM simulations. The results were promising but incomplete, since these studies investigated the performance (i.e., the ability to identify influential users) of centrality measures only in restricted settings, for example, in undirected/unweighted networks and/or within a propagation model less common for IM. In this article, we first show that good results within the Susceptible-Infected-Removed propagation model for unweighted and undirected networks do not necessarily transfer to directed or weighted networks under the popular Independent Cascade (IC) propagation model. Then, we identify a set of centrality measures with good performance for weighted and directed networks within the IC model. Our main contribution is a new way to combine the centrality measures in a closed formula to yield even better results. We also extend gravitational centrality (GC) with the proposed combined centrality measures. Our experiments on 50 real-world data sets show that our proposed centrality measures significantly outperform well-known centrality measures and the state-of-the-art GC measure.
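To make the cost trade-off concrete, here is a minimal sketch of a Monte Carlo estimate of influence spread under the Independent Cascade model, next to a cheap centrality proxy. It is not the authors' combined measure, whose closed formula is not given in the abstract. The graph layout, the function names, and the choice of weighted out-degree as the proxy are assumptions made for illustration.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation and return the activated nodes.

    graph maps node -> list of (neighbor, activation_probability) pairs,
    i.e. directed, weighted edges (an assumed representation).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated node gets exactly one chance per out-edge.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average number of activated nodes over repeated IC simulations."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

def weighted_out_degree(graph):
    """Cheap centrality proxy: sum of outgoing activation probabilities."""
    return {u: sum(p for _, p in edges) for u, edges in graph.items()}

# Hypothetical toy network used only to exercise the functions.
toy_graph = {
    "a": [("b", 1.0), ("c", 1.0)],
    "b": [("d", 0.5)],
    "c": [],
    "d": [],
}

if __name__ == "__main__":
    print(estimate_spread(toy_graph, ["a"]))
    print(weighted_out_degree(toy_graph))
```

The simulation needs many runs per candidate seed set, whereas the centrality score is computed once per node, which is why centrality measures are attractive as a substitute.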

Author(s):  
K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of a state-of-the-art recommendation algorithm, SVD++, which inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that TrustSVD achieves better accuracy than ten other counterparts and better handles the data sparsity and cold-start issues.
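The idea of adding trusted users to the SVD++ prediction can be sketched as follows. This is a minimal illustration, assuming the usual SVD++ form (global mean, biases, and a user representation enriched by the normalized sum of implicit item factors) with one extra normalized sum over the factors of trusted users. All names are hypothetical, training is omitted, and the exact formulation should be checked against the paper.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def scaled_sum(vectors, dim):
    """Sum of the vectors scaled by |set|^(-1/2); zero vector if the set is empty."""
    if not vectors:
        return [0.0] * dim
    scale = 1.0 / math.sqrt(len(vectors))
    return [scale * sum(v[k] for v in vectors) for k in range(dim)]

def predict(mu, b_u, b_i, p_u, q_i, rated_item_factors, trusted_user_factors):
    """Predicted rating of user u for item i.

    rated_item_factors: implicit factors y_j of the items u has rated.
    trusted_user_factors: implicit factors w_v of the users u trusts
    (the trust term is the assumed extension over plain SVD++).
    """
    dim = len(p_u)
    implicit_items = scaled_sum(rated_item_factors, dim)
    implicit_trust = scaled_sum(trusted_user_factors, dim)
    user_repr = [p_u[k] + implicit_items[k] + implicit_trust[k] for k in range(dim)]
    return mu + b_u + b_i + dot(q_i, user_repr)
```

With both sets empty the rule reduces to a biased matrix factorization prediction, which is what a cold-start user with no ratings and no trust links would receive.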


2021 ◽  
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier, which is based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance, by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, in the case where the ratio of sample size to dimensionality is small, it was observed that the proposed method can vastly outperform the others.

Author summary: Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.


2021 ◽  
Author(s):  
Alberto Vera ◽  
Siddhartha Banerjee ◽  
Samitha Samaranayake

Motivated by the needs of modern transportation service platforms, we study the problem of computing constrained shortest paths (CSP) at scale via preprocessing techniques. Our work makes two contributions in this regard: 1) We propose a scalable algorithm for CSP queries and show how its performance can be parametrized in terms of a new network primitive, the constrained highway dimension. This development extends recent work that established the highway dimension as the appropriate primitive for characterizing the performance of unconstrained shortest-path (SP) algorithms. Our main theoretical contribution is deriving conditions relating the two notions, thereby providing a characterization of networks where CSP and SP queries are of comparable hardness. 2) We develop practical algorithms for scalable CSP computation, augmenting our theory with additional network clustering heuristics. We evaluate these algorithms on real-world data sets to validate our theoretical findings. Our techniques are orders of magnitude faster than existing approaches while requiring only limited additional storage and preprocessing.
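For readers unfamiliar with the CSP query itself, the following is a minimal sketch of a plain label-setting baseline: find the cheapest path whose total resource use stays within a budget. It is not the preprocessing-based algorithm from the paper and uses no highway-dimension machinery. The graph layout and function name are assumptions, and edge costs and weights are assumed non-negative.

```python
import heapq

def constrained_shortest_path(graph, source, target, budget):
    """Return (cost, weight, path) of the cheapest path with weight <= budget.

    graph maps node -> list of (neighbor, edge_cost, edge_weight).
    Returns None if no feasible path exists.
    """
    # Smallest resource use among labels already settled at each node.
    best_weight = {}
    heap = [(0, 0, source, [source])]
    while heap:
        cost, weight, node, path = heapq.heappop(heap)
        if node == target:
            return cost, weight, path
        # Labels pop in non-decreasing cost, so a label that does not use
        # less resource than an earlier one at this node is dominated.
        if node in best_weight and best_weight[node] <= weight:
            continue
        best_weight[node] = weight
        for nxt, edge_cost, edge_weight in graph.get(node, []):
            new_weight = weight + edge_weight
            if new_weight <= budget:
                heapq.heappush(
                    heap, (cost + edge_cost, new_weight, nxt, path + [nxt])
                )
    return None

# Hypothetical road network: a cheap but resource-heavy route via "a",
# a middle route via "b", and an expensive direct edge.
toy_graph = {
    "s": [("a", 1, 5), ("t", 10, 1), ("b", 2, 2)],
    "a": [("t", 1, 5)],
    "b": [("t", 3, 2)],
}

if __name__ == "__main__":
    for budget in (10, 5, 3, 0):
        print(budget, constrained_shortest_path(toy_graph, "s", "t", budget))
```

Unlike unconstrained Dijkstra, a node may be settled several times with different cost/weight trade-offs, which is one source of the extra hardness that preprocessing aims to reduce.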


Author(s):  
Xuan Wu ◽  
Qing-Guo Chen ◽  
Yao Hu ◽  
Dengbao Wang ◽  
Xiaodong Chang ◽  
...  

Multi-view multi-label learning serves as an important framework to learn from objects with diverse representations and rich semantics. Existing multi-view multi-label learning techniques focus on exploiting shared subspace for fusing multi-view representations, where helpful view-specific information for discriminative modeling is usually ignored. In this paper, a novel multi-view multi-label learning approach named SIMM is proposed which leverages shared subspace exploitation and view-specific information extraction. For shared subspace exploitation, SIMM jointly minimizes confusion adversarial loss and multi-label loss to utilize shared information from all views. For view-specific information extraction, SIMM enforces an orthogonal constraint w.r.t. the shared subspace to utilize view-specific discriminative information. Extensive experiments on real-world data sets clearly show the favorable performance of SIMM against other state-of-the-art multi-view multi-label learning approaches.


2018 ◽  
Vol 30 (6) ◽  
pp. 1647-1672 ◽  
Author(s):  
Bei Wu ◽  
Bifan Wei ◽  
Jun Liu ◽  
Zhaotong Guo ◽  
Yuanhao Zheng ◽  
...  

Most community question answering (CQA) websites manage plenty of question-answer pairs (QAPs) through topic-based organizations, which may not satisfy users' fine-grained search demands. Facets of topics serve as a powerful tool to navigate, refine, and group the QAPs. In this work, we propose FACM, a model to annotate QAPs with facets by extending convolutional neural networks (CNNs) with a matching strategy. First, phrase information is incorporated into text representation by CNNs with different kernel sizes. Then, through a matching strategy among QAPs and facet label texts (FaLTs) acquired from Wikipedia, we generate similarity matrices to deal with the facet heterogeneity. Finally, a three-channel CNN is trained for facet label assignment of QAPs. Experiments on three real-world data sets show that FACM outperforms the state-of-the-art methods.


Author(s):  
Zhi Lu ◽  
Yang Hu ◽  
Bing Zeng

Factorization models have been extensively used for recovering the missing entries of a matrix or tensor. However, directly computing all of the entries using the learned factorization models is prohibitive when the size of the matrix/tensor is large. On the other hand, in many applications, such as collaborative filtering, we are only interested in a few entries that are the largest among them. In this work, we propose a sampling-based approach for finding the top entries of a tensor which is decomposed by the CANDECOMP/PARAFAC model. We develop an algorithm to sample the entries with probabilities proportional to their values. We further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top ones. We provide theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computing. When applied to the special case of searching in a matrix, it also requires fewer samples than the other state-of-the-art method.
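Sampling entries with probability proportional to their values can be sketched as below for the simplest case. This is a minimal illustration, assuming non-negative factor matrices: first pick a rank-one component with probability proportional to its total mass, then pick each index from the corresponding normalized factor column. The function name and data layout are hypothetical, and the $k$-th power extension and any handling of signed factors in the paper are not covered.

```python
import random

def sample_entry(factors, rng):
    """Draw one index tuple with probability proportional to the entry value.

    factors is a list of factor matrices (one per tensor mode), each a list
    of rows with `rank` non-negative columns, so that the tensor entry is
    X[i, j, ...] = sum_r A[i][r] * B[j][r] * ...
    """
    rank = len(factors[0][0])
    # Total mass of component r is the product of its column sums.
    component_mass = []
    for r in range(rank):
        mass = 1.0
        for f in factors:
            mass *= sum(row[r] for row in f)
        component_mass.append(mass)
    r = rng.choices(range(rank), weights=component_mass)[0]
    # Within component r, the modes are independent.
    return tuple(
        rng.choices(range(len(f)), weights=[row[r] for row in f])[0]
        for f in factors
    )

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2x2x2 tensor whose only non-zero entries are
    # X[0,0,0] and X[1,1,1].
    A = [[1.0, 0.0], [0.0, 1.0]]
    print([sample_entry([A, A, A], rng) for _ in range(5)])
```

No tensor entry is ever materialized, which is the point: frequently sampled index tuples become candidates for the top entries, and only those need to be evaluated exactly.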


2021 ◽  
Vol 2021 (12) ◽  
pp. 124006
Author(s):  
Zhenyu Liao ◽  
Romain Couillet ◽  
Michael W Mahoney

This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples n, their dimension p, and the dimension of feature space N are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel matrix (as it does when N → ∞ alone), but it still has a tractable behavior that is captured by our analysis. This analysis also provides accurate estimates of training and test regression errors for large n, p, N. Based on these estimates, a precise characterization of two qualitatively different phases of learning, including the phase transition between them, is provided; and the corresponding double descent test error curve is derived from this phase transition behavior. These results do not depend on strong assumptions on the data distribution, and they perfectly match empirical results on real-world data sets.
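The classical limit mentioned here (N → ∞ with the data fixed) can be checked numerically with a minimal sketch of the standard random Fourier feature map. The sketch assumes standard normal frequencies, a cos/sin feature map scaled by 1/sqrt(N), and a unit-bandwidth Gaussian kernel. The function names are hypothetical and the normalization may differ from the one used in the article.

```python
import math
import random

def make_frequencies(dim, num_features, rng):
    """Draw an N x p frequency matrix with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]

def rff_map(W, x):
    """Map x to the 2N-dimensional feature vector [cos(Wx); sin(Wx)] / sqrt(N)."""
    proj = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in W]
    scale = 1.0 / math.sqrt(len(W))
    return [scale * math.cos(t) for t in proj] + [scale * math.sin(t) for t in proj]

def gaussian_kernel(x, y):
    """Limiting kernel exp(-||x - y||^2 / 2) for standard normal frequencies."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))

if __name__ == "__main__":
    rng = random.Random(0)
    x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
    for n_feat in (10, 100, 2000):
        W = make_frequencies(len(x), n_feat, rng)
        print(n_feat, dot(rff_map(W, x), rff_map(W, y)), gaussian_kernel(x, y))
```

The inner product of two feature vectors equals the average of cos(w·(x − y)) over the sampled frequencies, so for a fixed pair of points it approaches the Gaussian kernel as N grows. The article studies what replaces this limit when n and p grow along with N.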


Entropy ◽  
2019 ◽  
Vol 21 (3) ◽  
pp. 254 ◽  
Author(s):  
Shaokai Wang ◽  
Xutao Li ◽  
Yunming Ye ◽  
Shanshan Feng ◽  
Raymond Lau ◽  
...  

Presently, many users are involved in multiple social networks. Identifying the same user in different networks, also known as anchor link prediction, becomes an important problem, which can serve numerous applications, e.g., cross-network recommendation, user profiling, etc. Previous studies mainly use hand-crafted structure features, which, if not carefully designed, may fail to reflect the intrinsic structure regularities. Moreover, most of the methods neglect the attribute information of social networks. In this paper, we propose a novel semi-supervised network-embedding model to address the problem. In the model, each node of the multiple networks is represented by a vector for anchor link prediction, which is learnt with awareness of observed anchor links as semi-supervised information, and topology structure and attributes as input. Experimental results on the real-world data sets demonstrate the superiority of the proposed model compared to state-of-the-art techniques.


2020 ◽  
Vol 34 (04) ◽  
pp. 4691-4698
Author(s):  
Shu Li ◽  
Wen-Tao Li ◽  
Wei Wang

In many real-world applications, the data have several disjoint sets of features, and each set is called a view. Researchers have developed many multi-view learning methods in the past decade. In this paper, we bring the Graph Convolutional Network (GCN) into multi-view learning and propose a novel multi-view semi-supervised learning method, Co-GCN, which adaptively exploits the graph information from the multiple views with combined Laplacians. Experimental results on real-world data sets verify that Co-GCN can achieve better performance compared with state-of-the-art multi-view semi-supervised methods.

