iBGP: A Bipartite Graph Propagation Approach for Mobile Advertising Fraud Detection

2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Jinlong Hu ◽  
Junjie Liang ◽  
Shoubin Dong

Online mobile advertising plays a vital financial role in supporting free mobile apps, but detecting malicious app publishers who generate fraudulent actions on the advertisements hosted on their apps is difficult, since fraudulent traffic often mimics the behavior of legitimate users and evolves rapidly. In this paper, we propose a novel bipartite graph-based propagation approach, iBGP, for mobile app advertising fraud detection in large advertising systems. We exploit the characteristics of mobile advertising users' behavior and identify two persistent patterns: power-law distribution and pertinence. We propose an automatic initial score learning algorithm that formulates both concepts to learn the initial scores of non-seed nodes, and a weighted graph propagation algorithm that propagates the scores of all nodes in the user-app bipartite graph until convergence. To extend our approach to large-scale settings, we decompose the objective function of the initial score learning model into separate one-dimensional problems and parallelize the whole approach on an Apache Spark cluster. iBGP was applied to a large synthetic dataset and a large real-world mobile advertising dataset; the experimental results demonstrate that iBGP significantly outperforms other popular graph-based propagation methods.
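As an illustration of the propagation step described above, the following is a minimal sketch of weighted score propagation on a user-app bipartite graph. The damping factor, normalization scheme, and convergence threshold are assumptions made for illustration, not the exact iBGP formulation.

```python
# Sketch: weighted score propagation on a user-app bipartite graph.
# alpha, tol, and the row/column normalization are illustrative assumptions.
import numpy as np

def propagate_scores(W, user_init, app_init, alpha=0.85, tol=1e-6, max_iter=100):
    """W: (n_users x n_apps) edge-weight matrix (e.g., click counts).
    user_init, app_init: initial fraud scores (seed scores plus learned non-seed scores)."""
    # Normalize edge weights so each propagation step is a weighted average.
    Wu = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)      # users <- apps
    Wa = W.T / np.maximum(W.sum(axis=0, keepdims=True).T, 1e-12)  # apps <- users
    u, a = user_init.copy(), app_init.copy()
    for _ in range(max_iter):
        u_new = (1 - alpha) * user_init + alpha * (Wu @ a)
        a_new = (1 - alpha) * app_init + alpha * (Wa @ u)
        if max(np.abs(u_new - u).max(), np.abs(a_new - a).max()) < tol:
            return u_new, a_new
        u, a = u_new, a_new
    return u, a  # higher score = more likely fraudulent
```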

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jinlong Hu ◽  
Tenghui Li ◽  
Yi Zhuang ◽  
Song Huang ◽  
Shoubin Dong

Online mobile advertising plays a vital role in the mobile app ecosystem. Mobile advertising fraud, caused by fraudulent clicks or other actions on advertisements, is considered one of the most critical issues in mobile advertising systems. To combat evolving mobile advertising fraud, machine learning methods have been successfully applied to identify advertising fraud in tabular data, distinguishing suspicious fraudulent operations from normal ones. However, such approaches may suffer from labor-intensive feature engineering and limited robustness of the detection algorithms, since online advertising big data and the complex fraudulent actions generated by malicious code, botnets, and click farms are constantly changing. In this paper, we propose a novel weighted heterogeneous graph embedding and deep learning-based fraud detection approach, namely GFD, to identify fraudulent apps in mobile advertising. In the proposed GFD approach, (i) we construct a weighted heterogeneous graph to represent behavior patterns among users, mobile apps, and mobile ads and design a weighted metapath-to-vector algorithm to learn node representations (graph-based features) from the graph; (ii) we use a time-window-based statistical analysis method to extract intrinsic features (attribute-based features) from the tabular sample data; (iii) we propose a hybrid neural network to fuse graph-based and attribute-based features for classifying fraudulent apps from normal apps. The GFD approach was applied to a large real-world mobile advertising dataset, and the experimental results demonstrate that it significantly outperforms well-known learning methods.
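To make the fusion idea in step (iii) concrete, here is a hedged PyTorch sketch of a two-branch network that combines graph-based node embeddings with attribute-based tabular features before classification. The layer sizes, activation choices, and class names are assumptions, not the authors' exact architecture.

```python
# Sketch: fuse graph-based embeddings with attribute (tabular) features.
# graph_dim, attr_dim, and hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, graph_dim=128, attr_dim=64, hidden=64):
        super().__init__()
        self.graph_branch = nn.Sequential(nn.Linear(graph_dim, hidden), nn.ReLU())
        self.attr_branch = nn.Sequential(nn.Linear(attr_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))  # fraudulent vs. normal app

    def forward(self, graph_feat, attr_feat):
        # Encode each feature view separately, then concatenate and classify.
        fused = torch.cat([self.graph_branch(graph_feat),
                           self.attr_branch(attr_feat)], dim=-1)
        return self.head(fused)
```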


Author(s):  
Luis A Leiva ◽  
Asutosh Hota ◽  
Antti Oulasvirta

Abstract. Designers are increasingly using online resources for inspiration. How can design exploration best be supported without compromising creativity? We introduce and study Design Maps, a class of point-cloud visualizations that makes large user interface datasets explorable. Design Maps are computed using dimensionality reduction and clustering techniques, which we analyze thoroughly in this paper. We present concepts for integrating Design Maps into design tools, including interactive visualization, local neighborhood exploration, and functionality for integrating existing solutions into the design at hand. These concepts were implemented in a wireframing tool for mobile apps, which was evaluated with actual designers performing realistic tasks. Overall, designers found that Design Maps support their creativity (average CSI score of 74/100) and indicated that the maps producing consistent whitespacing within the point clouds are the most informative.
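As a rough sketch of how a Design Map-style layout could be computed from UI descriptors, the snippet below combines dimensionality reduction with clustering using scikit-learn. The feature representation, t-SNE parameters, and cluster count are assumptions; the paper analyzes several technique combinations rather than prescribing this one.

```python
# Sketch: project UI feature vectors to 2-D and cluster the resulting points.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

def design_map(ui_features: np.ndarray, n_clusters: int = 10):
    """ui_features: (n_designs x d) matrix of UI descriptors (assumed precomputed)."""
    coords = TSNE(n_components=2, random_state=0).fit_transform(ui_features)
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(coords)
    return coords, labels  # 2-D point cloud plus cluster assignments for rendering
```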


2021 ◽  
Vol 15 (3) ◽  
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, which has become prevalent and important in the field of network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This severely limits the application of SBMs to large-scale networks because of the computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with a Poisson distribution and a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that our proposed method significantly outperforms the state-of-the-art methods in terms of a reasonable trade-off between accuracy and scalability.
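For readers unfamiliar with the model class, here is a toy sketch of the Poisson-SBM idea: edges between node pairs are modeled as Poisson draws whose rate depends only on the blocks of the two endpoints. This illustrates the likelihood being optimized, not the authors' block-wise learning algorithm.

```python
# Sketch: log-likelihood of an undirected network under a Poisson SBM
# for a given block assignment z and block-pair rate matrix `rates`.
import numpy as np
from scipy.stats import poisson

def poisson_sbm_loglik(A, z, rates):
    """A: (n x n) adjacency / edge-count matrix; z: (n,) block index per node;
    rates: (K x K) Poisson rate for each block pair."""
    lam = rates[np.ix_(z, z)]          # expected edge count for every node pair
    iu = np.triu_indices_from(A, k=1)  # undirected: count each pair once
    return poisson.logpmf(A[iu], lam[iu]).sum()
```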


2018 ◽  
Vol 2018 ◽  
pp. 1-22 ◽  
Author(s):  
Shuang Zhao ◽  
Xiapu Luo ◽  
Xiaobo Ma ◽  
Bo Bai ◽  
Yankang Zhao ◽  
...  

Proximity-based apps have been changing the way people interact with each other in the physical world. To help people extend their social networks, proximity-based nearby-stranger (NS) apps that encourage people to make friends with nearby strangers have recently gained popularity. As another typical type of proximity-based app, ridesharing (RS) apps that allow drivers to search for nearby passengers and receive their ridesharing requests have also become popular due to their contribution to the economy and to emission reduction. In this paper, we concentrate on the location privacy of proximity-based mobile apps. By analyzing the communication mechanism, we find that many apps of this type are vulnerable to large-scale location spoofing attacks (LLSA). We accordingly propose three approaches to performing an LLSA. To evaluate the threat that LLSA poses to proximity-based mobile apps, we perform real-world case studies against an NS app named Weibo and an RS app called Didi. The results show that our approaches can effectively and automatically collect a huge volume of users' locations or travel records, demonstrating the severity of LLSA. We then apply the LLSA approaches against nine popular proximity-based apps with millions of installations to evaluate their defense strength. We finally suggest possible countermeasures against the proposed attacks.


2021 ◽  
Vol 294 ◽  
pp. 01002
Author(s):  
Xiaoyan Xiang ◽  
Yao Sun ◽  
Xiaofei Deng

Solar energy in nature is irregular, so photovoltaic (PV) power output is intermittent and highly dependent on solar radiation, temperature, and other meteorological parameters. Accurately predicting solar power to ensure the economic operation of micro-grids (MG) and smart grids is an important challenge for increasing the large-scale integration of PV into traditional power systems. In this paper, a hybrid machine learning algorithm is proposed to predict solar power accurately, and the Persistence Extreme Learning Machine (P-ELM) algorithm is used to train the system. The input parameters are the temperature, sunshine, and solar power output at time i, and the output parameters are the temperature, sunshine, and solar power output at time i+1. The proposed method can predict solar power output 20 minutes in advance. The mean absolute error (MAE) and root-mean-square error (RMSE) are used to characterize the performance of the P-ELM algorithm and to compare it with the standard ELM algorithm. The results show that the P-ELM algorithm achieves better accuracy in short-term prediction and is well suited to accurate and reliable real-time solar power forecasting.
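For context, the following is a hedged sketch of a basic extreme learning machine (ELM) regressor for one-step-ahead forecasting: random hidden-layer weights are fixed, and only the output weights are solved by least squares. The hidden size, activation, and feature layout are assumptions, and the persistence variant (P-ELM) described in the paper is not reproduced here.

```python
# Sketch: basic ELM regressor; inputs could be [temperature, sunshine, power] at
# time i, target the power at time i+1 (one 20-minute step ahead).
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)  # fixed random feature map

    def fit(self, X, y):
        # Random input weights and biases; only the output weights are learned.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # least-squares solve
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```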


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bangtong Huang ◽  
Hongquan Zhang ◽  
Zihong Chen ◽  
Lingling Li ◽  
Lihua Shi

Deep learning algorithms face limitations in virtual reality applications due to the cost of memory, computation, and the need for real-time computation. Models with strong performance may have an enormous number of parameters and a large-scale structure, making them hard to deploy on embedded devices. In this paper, inspired by GhostNet, we propose an efficient structure, ShuffleGhost, that exploits the redundancy in feature maps to reduce the cost of computation while addressing some drawbacks of GhostNet. GhostNet suffers from the high computational cost of convolution in the Ghost module and the shortcut, and its restriction on downsampling makes it difficult to apply the Ghost module and Ghost bottleneck to other backbones; this paper therefore proposes three new kinds of ShuffleGhost structure to tackle these drawbacks. The ShuffleGhost module and ShuffleGhost bottlenecks use the shuffle layer and group convolution from ShuffleNet, and they are designed to redistribute the feature maps concatenated from the Ghost feature maps and the primary feature maps, eliminating the gap between them and extracting the features. An SENet layer is then adopted to reduce the computational cost of the group convolution, as well as to evaluate the importance of the concatenated feature maps and assign proper weights to them. We conduct experiments showing that ShuffleGhostV3 has fewer trainable parameters and FLOPs while maintaining accuracy, and that, with proper design, it can be more efficient on both the GPU and the CPU side.
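To make the building blocks concrete, here is a rough PyTorch sketch that combines the Ghost-module idea (cheap depthwise "ghost" features concatenated with primary features) with ShuffleNet-style channel shuffle and group convolution. The channel ratio, group count, and the assumption of even channel sizes are illustrative; this is not the paper's exact ShuffleGhost design, which also adds an SE layer and bottleneck variants.

```python
# Sketch: Ghost-style module with group convolution and channel shuffle.
# Assumes in_ch and out_ch are even so channels split evenly across groups.
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).contiguous().view(n, c, h, w))

class GhostShuffleModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, groups=2):
        super().__init__()
        primary_ch = out_ch // ratio
        ghost_ch = out_ch - primary_ch
        # Primary features from a cheap grouped 1x1 convolution.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, 1, groups=groups, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))
        # Ghost features from an even cheaper depthwise 3x3 on the primary maps.
        self.ghost = nn.Sequential(
            nn.Conv2d(primary_ch, ghost_ch, 3, padding=1, groups=primary_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))
        self.groups = groups

    def forward(self, x):
        primary = self.primary(x)
        out = torch.cat([primary, self.ghost(primary)], dim=1)
        # Shuffle channels so primary and ghost features mix across groups.
        return channel_shuffle(out, self.groups)
```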


2019 ◽  
Vol 11 (4) ◽  
pp. 1655-1674 ◽  
Author(s):  
Gionata Ghiggi ◽  
Vincent Humphrey ◽  
Sonia I. Seneviratne ◽  
Lukas Gudmundsson

Abstract. Freshwater resources are of high societal relevance, and understanding their past variability is vital to water management in the context of ongoing climate change. This study introduces a global gridded monthly reconstruction of runoff covering the period from 1902 to 2014. In situ streamflow observations are used to train a machine learning algorithm that predicts monthly runoff rates based on antecedent precipitation and temperature from an atmospheric reanalysis. The accuracy of this reconstruction is assessed with cross-validation and compared with an independent set of discharge observations for large river basins. The presented dataset agrees on average better with the streamflow observations than an ensemble of 13 state-of-the-art global hydrological model runoff simulations. We estimate a global long-term mean runoff of 38 452 km³ yr⁻¹, in agreement with previous assessments. The temporal coverage of the reconstruction offers an unprecedented view of large-scale features of runoff variability in regions with limited data coverage, making it an ideal candidate for large-scale hydro-climatic process studies, water resource assessments, and evaluating and refining existing hydrological models. The paper closes with example applications fostering the understanding of global freshwater dynamics, interannual variability, drought propagation, and the response of runoff to atmospheric teleconnections. The GRUN dataset is available at https://doi.org/10.6084/m9.figshare.9228176 (Ghiggi et al., 2019).
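As a simplified illustration of the reconstruction idea, the snippet below builds lagged precipitation and temperature features for a grid cell and fits a regressor to observed runoff. The lag length, feature construction, and choice of random forest are assumptions for illustration; GRUN's actual machine learning setup is described in the paper.

```python
# Sketch: map antecedent precipitation and temperature to monthly runoff.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def build_features(precip, temp, n_lags=12):
    """precip, temp: (n_months,) series for one grid cell; returns lagged features
    for months n_lags..end, one row per target month."""
    rows = [np.r_[precip[t - n_lags:t], temp[t - n_lags:t]]
            for t in range(n_lags, len(precip))]
    return np.array(rows)

# Hypothetical usage: stack features/targets from gauged cells, then predict
# runoff everywhere from the reanalysis forcing.
# X, y = stacked_features_from_gauged_cells(), stacked_observed_runoff()
# model = RandomForestRegressor(n_estimators=200).fit(X, y)
# runoff_reconstruction = model.predict(X_ungauged)
```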


Author(s):  
Tong Wei ◽  
Yu-Feng Li

Large-scale multi-label learning annotates relevant labels for unseen data from a huge number of candidate labels. It is well known that in large-scale multi-label learning, labels exhibit a long-tail distribution in which a significant fraction of labels are tail labels. Nonetheless, how tail labels affect performance metrics in large-scale multi-label learning has not been explicitly quantified. In this paper, we show that whether labels are randomly missing or misclassified, tail labels have much less impact than common labels in terms of commonly used performance metrics (Top-$k$ precision and nDCG@$k$). Based on this observation, we develop a low-complexity large-scale multi-label learning algorithm that facilitates fast prediction and compact models by trimming tail labels adaptively. Experiments clearly verify that both the prediction time and the model size are significantly reduced without sacrificing much predictive performance relative to state-of-the-art approaches.
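For reference, the two metrics mentioned above can be computed per instance as in the following sketch, using the standard multi-label conventions (the variable names are assumptions): `scores` holds predicted scores over all candidate labels and `relevant` is the set of true label indices.

```python
# Sketch: Top-k precision and nDCG@k for a single instance.
import numpy as np

def topk_precision(scores, relevant, k=5):
    topk = np.argsort(scores)[::-1][:k]          # k highest-scoring labels
    return sum(l in relevant for l in topk) / k

def ndcg_at_k(scores, relevant, k=5):
    topk = np.argsort(scores)[::-1][:k]
    dcg = sum(1.0 / np.log2(i + 2) for i, l in enumerate(topk) if l in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```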

