iBGP: A Bipartite Graph Propagation Approach for Mobile Advertising Fraud Detection

2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Jinlong Hu ◽  
Junjie Liang ◽  
Shoubin Dong

Online mobile advertising plays a vital financial role in supporting free mobile apps, but detecting malicious app publishers who generate fraudulent actions on the advertisements hosted on their apps is difficult, since fraudulent traffic often mimics the behavior of legitimate users and evolves rapidly. In this paper, we propose a novel bipartite graph-based propagation approach, iBGP, for mobile app advertising fraud detection in large advertising systems. We exploit the characteristics of mobile advertising users' behavior and identify two persistent patterns: power-law distribution and pertinence. We propose an automatic initial score learning algorithm that formulates both concepts to learn the initial scores of non-seed nodes, and a weighted graph propagation algorithm that propagates the scores of all nodes in the user-app bipartite graph until convergence. To extend our approach to large-scale settings, we decompose the objective function of the initial score learning model into separate one-dimensional problems and parallelize the whole approach on an Apache Spark cluster. iBGP was applied to a large synthetic dataset and a large real-world mobile advertising dataset; the experimental results demonstrate that iBGP significantly outperforms other popular graph-based propagation methods.
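As an illustration of the propagation step described above, the following is a minimal sketch of weighted score propagation on a user-app bipartite graph. The damping factor, normalization scheme, and convergence threshold are assumptions made for illustration, not the exact iBGP formulation.

```python
# Sketch: weighted score propagation on a user-app bipartite graph.
# alpha, tol, and the row/column normalization are illustrative assumptions.
import numpy as np

def propagate_scores(W, user_init, app_init, alpha=0.85, tol=1e-6, max_iter=100):
    """W: (n_users x n_apps) edge-weight matrix (e.g., click counts).
    user_init, app_init: initial fraud scores (seed scores plus learned non-seed scores)."""
    # Normalize edge weights so each propagation step is a weighted average.
    Wu = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)      # users <- apps
    Wa = W.T / np.maximum(W.sum(axis=0, keepdims=True).T, 1e-12)  # apps <- users
    u, a = user_init.copy(), app_init.copy()
    for _ in range(max_iter):
        u_new = (1 - alpha) * user_init + alpha * (Wu @ a)
        a_new = (1 - alpha) * app_init + alpha * (Wa @ u)
        if max(np.abs(u_new - u).max(), np.abs(a_new - a).max()) < tol:
            return u_new, a_new
        u, a = u_new, a_new
    return u, a  # higher score = more likely fraudulent
```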

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jinlong Hu ◽  
Tenghui Li ◽  
Yi Zhuang ◽  
Song Huang ◽  
Shoubin Dong

Online mobile advertising plays a vital role in the mobile app ecosystem. Mobile advertising fraud, caused by fraudulent clicks or other actions on advertisements, is considered one of the most critical issues in mobile advertising systems. To combat evolving mobile advertising fraud, machine learning methods have been successfully applied to identify advertising fraud in tabular data, distinguishing suspicious fraudulent operations from normal ones. However, such approaches may suffer from labor-intensive feature engineering and limited robustness of the detection algorithms, since online advertising big data and the complex fraudulent actions generated by malicious code, botnets, and click farms are constantly changing. In this paper, we propose a novel weighted heterogeneous graph embedding and deep learning-based fraud detection approach, namely GFD, to identify fraudulent apps in mobile advertising. In the proposed GFD approach, (i) we construct a weighted heterogeneous graph to represent behavior patterns among users, mobile apps, and mobile ads and design a weighted metapath-to-vector algorithm to learn node representations (graph-based features) from the graph; (ii) we use a time-window-based statistical analysis method to extract intrinsic features (attribute-based features) from the tabular sample data; (iii) we propose a hybrid neural network to fuse graph-based and attribute-based features for classifying fraudulent apps from normal apps. The GFD approach was applied to a large real-world mobile advertising dataset, and the experimental results demonstrate that it significantly outperforms well-known learning methods.
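To make the fusion idea in step (iii) concrete, here is a hedged PyTorch sketch of a two-branch network that combines graph-based node embeddings with attribute-based tabular features before classification. The layer sizes, activation choices, and class names are assumptions, not the authors' exact architecture.

```python
# Sketch: fuse graph-based embeddings with attribute (tabular) features.
# graph_dim, attr_dim, and hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, graph_dim=128, attr_dim=64, hidden=64):
        super().__init__()
        self.graph_branch = nn.Sequential(nn.Linear(graph_dim, hidden), nn.ReLU())
        self.attr_branch = nn.Sequential(nn.Linear(attr_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))  # fraudulent vs. normal app

    def forward(self, graph_feat, attr_feat):
        # Encode each feature view separately, then concatenate and classify.
        fused = torch.cat([self.graph_branch(graph_feat),
                           self.attr_branch(attr_feat)], dim=-1)
        return self.head(fused)
```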


Author(s):  
Luis A Leiva ◽  
Asutosh Hota ◽  
Antti Oulasvirta

Abstract. Designers are increasingly using online resources for inspiration. How can design exploration best be supported without compromising creativity? We introduce and study Design Maps, a class of point-cloud visualizations that makes large user interface datasets explorable. Design Maps are computed using dimensionality reduction and clustering techniques, which we analyze thoroughly in this paper. We present concepts for integrating Design Maps into design tools, including interactive visualization, local neighborhood exploration, and functionality for integrating existing solutions into the design at hand. These concepts were implemented in a wireframing tool for mobile apps, which was evaluated with actual designers performing realistic tasks. Overall, designers found that Design Maps support their creativity (average CSI score of 74/100) and indicated that the maps producing consistent whitespacing within the point clouds are the most informative.
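As a rough sketch of how a Design Map-style layout could be computed from UI descriptors, the snippet below combines dimensionality reduction with clustering using scikit-learn. The feature representation, t-SNE parameters, and cluster count are assumptions; the paper analyzes several technique combinations rather than prescribing this one.

```python
# Sketch: project UI feature vectors to 2-D and cluster the resulting points.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

def design_map(ui_features: np.ndarray, n_clusters: int = 10):
    """ui_features: (n_designs x d) matrix of UI descriptors (assumed precomputed)."""
    coords = TSNE(n_components=2, random_state=0).fit_transform(ui_features)
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(coords)
    return coords, labels  # 2-D point cloud plus cluster assignments for rendering
```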


2021 ◽  
Vol 15 (3) ◽  
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, which has become prevalent and important in the field of network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This severely limits the application of SBMs to large-scale networks because of the computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with a Poisson distribution and a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that our proposed method significantly outperforms the state-of-the-art methods in terms of a reasonable trade-off between accuracy and scalability.
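For readers unfamiliar with the model class, here is a toy sketch of the Poisson-SBM idea: edges between node pairs are modeled as Poisson draws whose rate depends only on the blocks of the two endpoints. This illustrates the likelihood being optimized, not the authors' block-wise learning algorithm.

```python
# Sketch: log-likelihood of an undirected network under a Poisson SBM
# for a given block assignment z and block-pair rate matrix `rates`.
import numpy as np
from scipy.stats import poisson

def poisson_sbm_loglik(A, z, rates):
    """A: (n x n) adjacency / edge-count matrix; z: (n,) block index per node;
    rates: (K x K) Poisson rate for each block pair."""
    lam = rates[np.ix_(z, z)]          # expected edge count for every node pair
    iu = np.triu_indices_from(A, k=1)  # undirected: count each pair once
    return poisson.logpmf(A[iu], lam[iu]).sum()
```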


2018 ◽  
Vol 2018 ◽  
pp. 1-22 ◽  
Author(s):  
Shuang Zhao ◽  
Xiapu Luo ◽  
Xiaobo Ma ◽  
Bo Bai ◽  
Yankang Zhao ◽  
...  

Proximity-based apps have been changing the way people interact with each other in the physical world. To help people extend their social networks, proximity-based nearby-stranger (NS) apps that encourage people to make friends with nearby strangers have recently gained popularity. As another typical type of proximity-based app, ridesharing (RS) apps that allow drivers to search for nearby passengers and receive their ridesharing requests have also become popular due to their contribution to the economy and to emission reduction. In this paper, we concentrate on the location privacy of proximity-based mobile apps. By analyzing the communication mechanism, we find that many apps of this type are vulnerable to large-scale location spoofing attacks (LLSA). We accordingly propose three approaches to performing an LLSA. To evaluate the threat that LLSA poses to proximity-based mobile apps, we perform real-world case studies against an NS app named Weibo and an RS app called Didi. The results show that our approaches can effectively and automatically collect a huge volume of users' locations or travel records, demonstrating the severity of LLSA. We then apply the LLSA approaches against nine popular proximity-based apps with millions of installations to evaluate their defense strength. We finally suggest possible countermeasures against the proposed attacks.


2021 ◽  
Vol 294 ◽  
pp. 01002
Author(s):  
Xiaoyan Xiang ◽  
Yao Sun ◽  
Xiaofei Deng

Solar energy in nature is irregular, so photovoltaic (PV) power output is intermittent and highly dependent on solar radiation, temperature, and other meteorological parameters. Accurately predicting solar power to ensure the economic operation of micro-grids (MG) and smart grids is an important challenge for increasing the large-scale integration of PV into traditional power systems. In this paper, a hybrid machine learning algorithm is proposed to predict solar power accurately, and the Persistence Extreme Learning Machine (P-ELM) algorithm is used to train the system. The input parameters are the temperature, sunshine, and solar power output at time i, and the output parameters are the temperature, sunshine, and solar power output at time i+1. The proposed method can predict solar power output 20 minutes in advance. The mean absolute error (MAE) and root-mean-square error (RMSE) are used to characterize the performance of the P-ELM algorithm and to compare it with the standard ELM algorithm. The results show that the P-ELM algorithm achieves better accuracy in short-term prediction and is well suited to accurate and reliable real-time solar power forecasting.
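For context, the following is a hedged sketch of a basic extreme learning machine (ELM) regressor for one-step-ahead forecasting: random hidden-layer weights are fixed, and only the output weights are solved by least squares. The hidden size, activation, and feature layout are assumptions, and the persistence variant (P-ELM) described in the paper is not reproduced here.

```python
# Sketch: basic ELM regressor; inputs could be [temperature, sunshine, power] at
# time i, target the power at time i+1 (one 20-minute step ahead).
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)  # fixed random feature map

    def fit(self, X, y):
        # Random input weights and biases; only the output weights are learned.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # least-squares solve
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```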


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bangtong Huang ◽  
Hongquan Zhang ◽  
Zihong Chen ◽  
Lingling Li ◽  
Lihua Shi

Deep learning algorithms face limitations in virtual reality applications due to the cost of memory, computation, and the need for real-time computation. Models with strong performance may have an enormous number of parameters and a large-scale structure, making them hard to deploy on embedded devices. In this paper, inspired by GhostNet, we propose an efficient structure, ShuffleGhost, that exploits the redundancy in feature maps to reduce the cost of computation while addressing some drawbacks of GhostNet. GhostNet suffers from the high computational cost of convolution in the Ghost module and the shortcut, and its restriction on downsampling makes it difficult to apply the Ghost module and Ghost bottleneck to other backbones; this paper therefore proposes three new kinds of ShuffleGhost structure to tackle these drawbacks. The ShuffleGhost module and ShuffleGhost bottlenecks use the shuffle layer and group convolution from ShuffleNet, and they are designed to redistribute the feature maps concatenated from the Ghost feature maps and the primary feature maps, eliminating the gap between them and extracting the features. An SENet layer is then adopted to reduce the computational cost of the group convolution, as well as to evaluate the importance of the concatenated feature maps and assign proper weights to them. We conduct experiments showing that ShuffleGhostV3 has fewer trainable parameters and FLOPs while maintaining accuracy, and that, with proper design, it can be more efficient on both the GPU and the CPU side.
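To make the building blocks concrete, here is a rough PyTorch sketch that combines the Ghost-module idea (cheap depthwise "ghost" features concatenated with primary features) with ShuffleNet-style channel shuffle and group convolution. The channel ratio, group count, and the assumption of even channel sizes are illustrative; this is not the paper's exact ShuffleGhost design, which also adds an SE layer and bottleneck variants.

```python
# Sketch: Ghost-style module with group convolution and channel shuffle.
# Assumes in_ch and out_ch are even so channels split evenly across groups.
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).contiguous().view(n, c, h, w))

class GhostShuffleModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, groups=2):
        super().__init__()
        primary_ch = out_ch // ratio
        ghost_ch = out_ch - primary_ch
        # Primary features from a cheap grouped 1x1 convolution.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, 1, groups=groups, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))
        # Ghost features from an even cheaper depthwise 3x3 on the primary maps.
        self.ghost = nn.Sequential(
            nn.Conv2d(primary_ch, ghost_ch, 3, padding=1, groups=primary_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))
        self.groups = groups

    def forward(self, x):
        primary = self.primary(x)
        out = torch.cat([primary, self.ghost(primary)], dim=1)
        # Shuffle channels so primary and ghost features mix across groups.
        return channel_shuffle(out, self.groups)
```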


2019 ◽  
Vol 11 (4) ◽  
pp. 1655-1674 ◽  
Author(s):  
Gionata Ghiggi ◽  
Vincent Humphrey ◽  
Sonia I. Seneviratne ◽  
Lukas Gudmundsson

Abstract. Freshwater resources are of high societal relevance, and understanding their past variability is vital to water management in the context of ongoing climate change. This study introduces a global gridded monthly reconstruction of runoff covering the period from 1902 to 2014. In situ streamflow observations are used to train a machine learning algorithm that predicts monthly runoff rates based on antecedent precipitation and temperature from an atmospheric reanalysis. The accuracy of this reconstruction is assessed with cross-validation and compared with an independent set of discharge observations for large river basins. The presented dataset agrees on average better with the streamflow observations than an ensemble of 13 state-of-the-art global hydrological model runoff simulations. We estimate a global long-term mean runoff of 38 452 km³ yr⁻¹, in agreement with previous assessments. The temporal coverage of the reconstruction offers an unprecedented view of large-scale features of runoff variability in regions with limited data coverage, making it an ideal candidate for large-scale hydro-climatic process studies, water resource assessments, and evaluating and refining existing hydrological models. The paper closes with example applications fostering the understanding of global freshwater dynamics, interannual variability, drought propagation, and the response of runoff to atmospheric teleconnections. The GRUN dataset is available at https://doi.org/10.6084/m9.figshare.9228176 (Ghiggi et al., 2019).
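As a simplified illustration of the reconstruction idea, the snippet below builds lagged precipitation and temperature features for a grid cell and fits a regressor to observed runoff. The lag length, feature construction, and choice of random forest are assumptions for illustration; GRUN's actual machine learning setup is described in the paper.

```python
# Sketch: map antecedent precipitation and temperature to monthly runoff.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def build_features(precip, temp, n_lags=12):
    """precip, temp: (n_months,) series for one grid cell; returns lagged features
    for months n_lags..end, one row per target month."""
    rows = [np.r_[precip[t - n_lags:t], temp[t - n_lags:t]]
            for t in range(n_lags, len(precip))]
    return np.array(rows)

# Hypothetical usage: stack features/targets from gauged cells, then predict
# runoff everywhere from the reanalysis forcing.
# X, y = stacked_features_from_gauged_cells(), stacked_observed_runoff()
# model = RandomForestRegressor(n_estimators=200).fit(X, y)
# runoff_reconstruction = model.predict(X_ungauged)
```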


Author(s):  
Tong Wei ◽  
Yu-Feng Li

Large-scale multi-label learning annotates relevant labels for unseen data from a huge number of candidate labels. It is well known that in large-scale multi-label learning, labels exhibit a long-tail distribution in which a significant fraction of labels are tail labels. Nonetheless, how tail labels affect performance metrics in large-scale multi-label learning has not been explicitly quantified. In this paper, we show that whether labels are randomly missing or misclassified, tail labels have much less impact than common labels in terms of commonly used performance metrics (Top-$k$ precision and nDCG@$k$). Based on this observation, we develop a low-complexity large-scale multi-label learning algorithm that facilitates fast prediction and compact models by trimming tail labels adaptively. Experiments clearly verify that both the prediction time and the model size are significantly reduced without sacrificing much predictive performance relative to state-of-the-art approaches.
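For reference, the two metrics mentioned above can be computed per instance as in the following sketch, using the standard multi-label conventions (the variable names are assumptions): `scores` holds predicted scores over all candidate labels and `relevant` is the set of true label indices.

```python
# Sketch: Top-k precision and nDCG@k for a single instance.
import numpy as np

def topk_precision(scores, relevant, k=5):
    topk = np.argsort(scores)[::-1][:k]          # k highest-scoring labels
    return sum(l in relevant for l in topk) / k

def ndcg_at_k(scores, relevant, k=5):
    topk = np.argsort(scores)[::-1][:k]
    dcg = sum(1.0 / np.log2(i + 2) for i, l in enumerate(topk) if l in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```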

