Selecting a Cost-Effective Seed for Maximizing the Social Influence Under Real-life Constraints

10.36227/techrxiv.14489733.v1 ◽

2021 ◽

Author(s):

Tarun Kumer Biswas

Keyword(s):

Real World ◽

Seed Set ◽

Real Life ◽

Optimal Solution ◽

Cost Effective ◽

Influence Maximization ◽

Cardinality Constraints ◽

DE Algorithm ◽

Real World Datasets ◽

Simple Additive Weighting

The Influence Maximization (IM) problem aims at maximizing the diffusion of information or adoption of products among users in a social network by identifying and activating a set of initial users. In real-life applications, it is not unrealistic to have a higher activation cost for a user with higher influence. However, existing work on IM looks for the most influential users as the seed set while ignoring either the activation costs of individual nodes and the total budget, or the size of the seed set; the result may not always be optimal from the financial and managerial perspectives, respectively. To address these issues, we propose a more realistic and generalized formulation, termed multi-constraint influence maximization (MCIM), that aims for a cost-effective solution under both budgetary and cardinality constraints. Unlike existing IM formulations, the proposed MCIM objective is no longer monotone, although it is still submodular. As the problem is also proved to be NP-hard, we propose a simple additive weighting (SAW) assisted differential evolution (DE) algorithm for solving large real-world instances. Experimental results on four real-world datasets show that the proposed formulation and algorithm are effective in finding a cost-effective seed set.
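As a rough illustration of how simple additive weighting can fold several criteria into one fitness value, the sketch below scores candidate seed sets by a weighted sum of normalized spread and normalized cost saving, and rejects sets that break the budget or cardinality limit. It is not the paper's algorithm: the function name `saw_score`, the weights, the zero score for infeasible sets, and all numbers are assumptions made for this example, and the expected spread is supplied as a given value rather than simulated.

```python
def saw_score(spread, cost, size, budget, max_size,
              max_spread, w_spread=0.7, w_cost=0.3):
    """Simple additive weighting: weighted sum of normalized criteria.

    Infeasible seed sets (over budget or too many seeds) score 0.0, a
    hypothetical constraint-handling choice made for this sketch.
    """
    if cost > budget or size > max_size:
        return 0.0
    benefit = spread / max_spread   # higher is better, in [0, 1]
    saving = 1.0 - cost / budget    # cheaper is better, in [0, 1]
    return w_spread * benefit + w_cost * saving


# Hypothetical candidates: (name, expected spread, total cost, seed count)
candidates = [
    ("A", 900.0, 95.0, 5),
    ("B", 800.0, 50.0, 4),
    ("C", 950.0, 120.0, 5),  # exceeds the budget below
]
budget, max_size = 100.0, 5
max_spread = max(c[1] for c in candidates)

scores = {name: saw_score(s, c, k, budget, max_size, max_spread)
          for name, s, c, k in candidates}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

With these made-up numbers the cheaper set B outranks the higher-spread set A, which is the kind of trade-off a cost-aware objective is meant to expose. In an evolutionary search, a score of this kind could serve as the fitness of each candidate solution.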

Heterogeneous Influence Maximization Through Community Detection in Social Networks

International Journal of Ambient Computing and Intelligence ◽

10.4018/ijaci.2021100107 ◽

2021 ◽

Vol 12 (4) ◽

pp. 118-131

Author(s):

Jaya Krishna Raguru ◽

Devi Prasad Sharma

Keyword(s):

Community Detection ◽

Greedy Algorithms ◽

Computational Cost ◽

Optimal Solution ◽

Influence Maximization ◽

Centrality Measures ◽

Influence Spread ◽

Real World Datasets ◽

Initial Seed ◽

High Computational Cost

The problem of identifying a seed set of K nodes that maximizes influence spread over a social network is known as influence maximization (IM). Past work showed this problem to be NP-hard, and greedy algorithms are guaranteed to achieve only about 63% of the optimal spread. Moreover, the greedy approach is expensive and suffers from high computational cost. Furthermore, in a network with communities, IM spread is not always certain. In this paper, the heterogeneous influence maximization through community detection (HIMCD) algorithm is proposed. This approach selects initial seed nodes within communities using various centrality measures, and these seed nodes act as sources of influence spread. Influence maximization is then run in parallel using the seed node set of each community: the graph is partitioned and IM computations are done in a distributed manner. Extensive experiments with two real-world datasets reveal that HIMCD achieves substantial performance improvement over state-of-the-art techniques.
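For context, the greedy baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumed conditions, not the HIMCD algorithm: it uses the independent cascade model with a single activation probability `p`, estimates spread by Monte Carlo simulation, and runs on a made-up toy graph. The function names are hypothetical.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                # Each newly active node gets one chance per out-neighbor.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best marginal gain."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    chosen = []
    for _ in range(k):
        base = estimate_spread(graph, chosen, p, runs, rng) if chosen else 0.0
        best_node, best_gain = None, float("-inf")
        for node in nodes:
            if node in chosen:
                continue
            gain = estimate_spread(graph, chosen + [node], p, runs, rng) - base
            if gain > best_gain:
                best_node, best_gain = node, gain
        chosen.append(best_node)
    return chosen


# Hypothetical toy graph: a star around node 0 and a short chain from node 5.
toy_graph = {0: [1, 2, 3, 4], 5: [6], 6: [7]}
seeds = greedy_seeds(toy_graph, k=2, p=1.0, runs=1)
print(seeds)
```

Every greedy step re-estimates the spread of each remaining node, so the cost grows with the number of nodes, the seed budget, and the number of simulation runs. That is the computational burden that partitioning the graph into communities is intended to reduce.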

Multiple Noisy Label Distribution Propagation for Crowdsourcing

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/204 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hao Zhang ◽

Liangxiao Jiang ◽

Wenqiang Xu

Keyword(s):

Supervised Learning ◽

Real World ◽

Effective Means ◽

Ground Truth ◽

Cost Effective ◽

Nearest Neighbors ◽

True Label ◽

Real World Datasets ◽

The Individual ◽

Label Distribution

Crowdsourcing services provide a fast, efficient, and cost-effective means of obtaining large amounts of labeled data for supervised learning. Ground truth inference, also called label integration, designs proper aggregation strategies to infer the unknown true label of each instance from the multiple noisy label set provided by ordinary crowd workers. However, to the best of our knowledge, nearly all existing label integration methods focus solely on the multiple noisy label set of each individual instance while ignoring the intercorrelation among the multiple noisy label sets of different instances. To solve this problem, a multiple noisy label distribution propagation (MNLDP) method is proposed in this study. MNLDP first transforms the multiple noisy label set of each instance into its multiple noisy label distribution and then propagates this distribution to the instance's nearest neighbors. Consequently, each instance absorbs a fraction of the multiple noisy label distributions of its nearest neighbors while maintaining a fraction of its own original distribution. Promising experimental results on simulated and real-world datasets validate the effectiveness of our proposed method.
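The two steps described in this abstract, turning each noisy label set into a distribution and then mixing it with the distributions of nearby instances, can be sketched roughly as below. This is only an illustration, not the MNLDP method itself: the Euclidean nearest-neighbor search, the single propagation round, the mixing weight `alpha`, the argmax decision, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def label_distribution(noisy_labels, classes):
    """Turn one instance's noisy label set into a relative-frequency vector."""
    counts = Counter(noisy_labels)
    total = len(noisy_labels)
    return [counts[c] / total for c in classes]


def nearest_neighbors(features, index, k):
    """Indices of the k instances closest to `index` (squared Euclidean)."""
    def dist(j):
        return sum((a - b) ** 2 for a, b in zip(features[index], features[j]))
    others = [j for j in range(len(features)) if j != index]
    return sorted(others, key=dist)[:k]


def propagate(features, distributions, k=2, alpha=0.5):
    """One propagation round: keep a fraction alpha of the instance's own
    distribution and absorb the rest from the mean of its k neighbors."""
    updated = []
    for i, own in enumerate(distributions):
        nbrs = nearest_neighbors(features, i, k)
        mean = [sum(distributions[j][c] for j in nbrs) / len(nbrs)
                for c in range(len(own))]
        updated.append([alpha * o + (1 - alpha) * m for o, m in zip(own, mean)])
    return updated


# Hypothetical toy data: one feature per instance, three crowd labels each.
classes = [0, 1]
features = [[0.0], [0.1], [0.2], [5.0]]
noisy = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]

dists = [label_distribution(labels, classes) for labels in noisy]
smoothed = propagate(features, dists, k=2, alpha=0.5)
labels = [classes[row.index(max(row))] for row in smoothed]
print(labels)
```

In this toy data the third instance has a noisy majority for class 1, but its two closest neighbors lean strongly toward class 0, so the mixed distribution tips to class 0. Majority voting on that instance alone would not make this correction.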

Quadruply Stochastic Gradient Method for Large Scale Nonlinear Semi-Supervised Ordinal Regression AUC Optimization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6029 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5734-5741

Author(s):

Wanli Shi ◽

Bin Gu ◽

Xiang Li ◽

Heng Huang

Keyword(s):

Real World ◽

Large Scale ◽

Optimal Solution ◽

Ordinal Regression ◽

Data Sampling ◽

Decomposition Approach ◽

Scalable Algorithm ◽

AUC Optimization ◽

Stochastic Data ◽

Real World Datasets

Semi-supervised ordinal regression (S2OR) problems are ubiquitous in real-world applications, where only a few ordered instances are labeled and massive numbers of instances remain unlabeled. Recent research has shown that directly optimizing the concordance index or AUC can impose a better ranking on the data than optimizing the traditional error rate in ordinal regression (OR) problems. In this paper, we propose an unbiased objective function for S2OR AUC optimization based on the ordinal binary decomposition approach. In addition, to handle large-scale kernelized learning problems, we propose a scalable algorithm called QS3ORAO using the doubly stochastic gradients (DSG) framework for functional optimization. Theoretically, we prove that our method converges to the optimal solution at a rate of O(1/t), where t is the number of iterations for stochastic data sampling. Extensive experimental results on various benchmark and real-world datasets also demonstrate that our method is efficient and effective while retaining similar generalization performance.

NeuRec: On Nonlinear Transformation for Personalized Ranking

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/510 ◽

2018 ◽

Cited By ~ 13

Author(s):

Shuai Zhang ◽

Lina Yao ◽

Aixin Sun ◽

Sen Wang ◽

Guodong Long ◽

...

Keyword(s):

Neural Network ◽

Real World ◽

Real Life ◽

Interaction Matrix ◽

Interaction Patterns ◽

Latent Factors ◽

Integrated Network ◽

Ranking Task ◽

Real World Datasets ◽

Personalized Ranking

Modeling user-item interaction patterns is an important task for personalized recommendations. Many recommender systems are based on the assumption that there exists a linear relationship between users and items, neglecting the intricacy and non-linearity of real-life historical interactions. In this paper, we propose a neural network based recommendation model (NeuRec) that untangles the complexity of user-item interactions and establishes an integrated network to combine non-linear transformation with latent factors. We further design two variants of NeuRec, user-based NeuRec and item-based NeuRec, by focusing on different aspects of the interaction matrix. Extensive experiments on four real-world datasets demonstrate their superior performance on the personalized ranking task.

Distributed Latent Dirichlet Allocation on Streams

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451528 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-20

Author(s):

Yunyan Guo ◽

Jianzhong Li

Keyword(s):

Real Time ◽

Language Processing ◽

Real World ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Real Life ◽

Streaming Data ◽

Real World Datasets ◽

Dirichlet Allocation ◽

Online Inference

Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning various areas such as natural language processing and information retrieval. While LDA on small and static datasets has been extensively studied, several real-world challenges arise in practical scenarios where datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications since it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with these challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. Data turbulence, in turn, is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, providing real-time inference results is both difficult and crucial. To increase throughput and reduce latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better online inference performance than the baselines. At the same time, StreamFed-LDA reduces latency by orders of magnitude on real-world datasets.

AOI-shapes: An Efficient Footprint Algorithm to Support Visualization of User-defined Urban Areas of Interest

ACM Transactions on Interactive Intelligent Systems ◽

10.1145/3431817 ◽

2021 ◽

Vol 11 (3-4) ◽

pp. 1-32

Author(s):

Mingzhao Li ◽

Zhifeng Bao ◽

Farhana Choudhury ◽

Hanan Samet ◽

Matt Duckham ◽

...

Keyword(s):

Real Estate ◽

Real World ◽

Urban Areas ◽

Real Life ◽

Scalable Algorithms ◽

Interactive Query ◽

Boundary Information ◽

Real World Datasets ◽

Effective Visualization ◽

Areas Of Interest

Understanding urban areas of interest (AOIs) is essential in many real-life scenarios, and such AOIs can be computed based on the geographic points that satisfy user queries. In this article, we study the problem of efficient and effective visualization of user-defined urban AOIs in an interactive manner. In particular, we first define the problem of user-defined AOI visualization based on a real estate data visualization scenario, and we illustrate why a novel footprint method is needed to support the visualization. After extensively reviewing existing “footprint” methods, we propose a parameter-free footprint method, named AOI-shapes, to capture the boundary information of a user-defined urban AOI. Next, to allow interactive query refinements by the user, we propose two efficient and scalable algorithms to incrementally generate urban AOIs by reusing existing visualization results. Finally, we conduct extensive experiments with both synthetic and real-world datasets to demonstrate the quality and efficiency of the proposed methods.

Neuropsychology in the Real World

Zeitschrift für Neuropsychologie ◽

10.1024/1016-264x/a000139 ◽

2014 ◽

Vol 25 (4) ◽

pp. 233-238 ◽

Cited By ~ 2

Author(s):

Martin Peper ◽

Simone N. Loeffler

Keyword(s):

Neuropsychological Assessment ◽

Real World ◽

Ecological Validity ◽

Real Life ◽

Emotional States ◽

Context Sensitive ◽

Traditional Assessment ◽

Life Data ◽

Real Life Data ◽

Assessment And Treatment

Current ambulatory technologies are highly relevant for neuropsychological assessment and treatment as they provide a gateway to real life data. Ambulatory assessment of cognitive complaints, skills and emotional states in natural contexts provides information that has a greater ecological validity than traditional assessment approaches. This issue presents an overview of current technological and methodological innovations, opportunities, problems and limitations of these methods designed for the context-sensitive measurement of cognitive, emotional and behavioral function. The usefulness of selected ambulatory approaches is demonstrated and their relevance for an ecologically valid neuropsychology is highlighted.

Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology ◽

10.1145/3409265 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1-17

Author(s):

Wu Chen ◽

Yong Yu ◽

Keke Gai ◽

Jiamou Liu ◽

Kim-Kwang Raymond Choo

Keyword(s):

Ensemble Learning ◽

Real World ◽

Interaction Mechanism ◽

Training Model ◽

Edge Computing ◽

Learning Techniques ◽

Multi Agent ◽

Real World Datasets ◽

Entire Dataset ◽

Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first on-the-fly clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively and in a timely manner on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.
