Unifying Node Labels, Features, and Distances for Deep Network Completion

Collected network data are often incomplete, with both missing nodes and missing edges. Thus, network completion that infers the unobserved part of the network is essential for downstream tasks. Despite the emerging literature related to network recovery, the potential information has not been effectively exploited. In this paper, we propose a novel unified deep graph convolutional network that infers missing edges by leveraging node labels, features, and distances. Specifically, we first construct an estimated network topology for the unobserved part using node labels, then jointly refine the network topology and learn the edge likelihood with node labels, node features and distances. Extensive experiments using several real-world datasets show the superiority of our method compared with the state-of-the-art approaches.


A Novel Tagging Augmented LDA Model for Clustering

International Journal of Web Services Research ◽

10.4018/ijwsr.2019070104 ◽

2019 ◽

Vol 16 (3) ◽

pp. 59-77

Author(s):

Yi Zhao ◽

Yu Qiao ◽

Keqing He

Keyword(s):

Transfer Learning ◽

Real World ◽

Latent Dirichlet Allocation ◽

State Of The Art ◽

Positive Influence ◽

The State ◽

Clustering Methods ◽

Automatic Clustering ◽

Real World Datasets ◽

High Representation

Clustering has become an increasingly important task in the analysis of large documents. Clustering aims to organize these documents, and facilitate better search and knowledge extraction. Most existing clustering methods that use user-generated tags only consider their positive influence for improving automatic clustering performance. The authors argue that not all user-generated tags can provide useful information for clustering. In this article, the authors propose a new solution for clustering, named HRT-LDA (High Representation Tags Latent Dirichlet Allocation), which considers the effects of different tags on clustering performance. For this, the authors perform a tag filtering strategy and a tag appending strategy based on transfer learning, Word2vec, TF-IDF and semantic computing. Extensive experiments on real-world datasets demonstrate that HRT-LDA outperforms the state-of-the-art tagging augmented LDA methods for clustering.


Simandro-plus: On computing similarity of android applications

Computer Science and Information Systems ◽

10.2298/csis210208036h ◽

2021 ◽

pp. 36-36

Author(s):

Masoud Hamedani ◽

Sang-Wook Kim

Keyword(s):

Real World ◽

State Of The Art ◽

Similarity Score ◽

The State ◽

The Other ◽

Android Applications ◽

Similarity Computation ◽

Real World Datasets

In this paper, we propose SimAndro-Plus as an improved variant of the state-of-the-art method, SimAndro, to compute the similarity of Android applications (apps) regarding their functionalities. SimAndro-Plus differs from SimAndro in two major ways: 1) it exploits two features beneficial to similarity computation, which are totally disregarded by SimAndro; 2) to compute the similarity score of an app-pair based on strings and package name features, SimAndro-Plus considers not only those terms co-appearing in both apps but also those terms appearing in one app while missing in the other one. The results of our extensive experiments with three real-world datasets and a dataset constructed by human experts demonstrate that 1) each of the two aforementioned differences is effective in achieving better accuracy and 2) SimAndro-Plus outperforms SimAndro in similarity computation by 14% on average.
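The second difference can be illustrated with a small sketch. The abstract does not give the scoring formula, so the code below uses plain Jaccard similarity as a stand-in (an assumption, not SimAndro-Plus's actual score) to show how terms present in only one app can lower a score that a shared-terms-only count would ignore. The function names and the sample term lists are hypothetical.

```python
def shared_term_count(terms_a, terms_b):
    """Score based only on terms that co-appear in both apps."""
    return len(set(terms_a) & set(terms_b))


def jaccard_similarity(terms_a, terms_b):
    """Jaccard similarity: shared terms divided by all distinct terms.

    Terms that appear in one app but are missing in the other enlarge
    the union and therefore reduce the score. This is an illustrative
    stand-in, not the formula used by SimAndro-Plus.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0  # two empty term sets: define similarity as 0
    return len(a & b) / len(union)


# Hypothetical term sets extracted from two apps.
app_x = ["map", "route", "gps"]
app_y = ["map", "gps", "chat", "call"]

print(shared_term_count(app_x, app_y))   # counts only co-appearing terms
print(jaccard_similarity(app_x, app_y))  # also penalized by unmatched terms
```

Here the two apps share two terms, but three further terms appear in only one of them, which the Jaccard score accounts for and the shared-term count does not.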


GraphER: Token-Centric Entity Resolution with Graph Convolutional Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6330 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8172-8179

Author(s):

Bing Li ◽

Wei Wang ◽

Yifang Sun ◽

Linhan Zhang ◽

Muhammad Asif Ali ◽

...

Keyword(s):

Real World ◽

State Of The Art ◽

Structural Information ◽

Entity Resolution ◽

Coarse Grained ◽

Critical Problem ◽

Convolutional Network ◽

Single Attribute ◽

Er Model ◽

Real World Datasets

Entity resolution (ER) aims to identify entity records that refer to the same real-world entity, which is a critical problem in data cleaning and integration. Most of the existing models are attribute-centric, that is, they match entity pairs by comparing similarities of pre-aligned attributes, which requires the schemas of records to be identical and is too coarse-grained to capture subtle key information within a single attribute. In this paper, we propose a novel graph-based ER model, GraphER. Our model is token-centric: the final matching results are generated by directly aggregating token-level comparison features, in which both the semantic and structural information have been softly embedded into token embeddings by training an Entity Record Graph Convolutional Network (ER-GCN). To the best of our knowledge, our work is the first effort to perform token-centric entity resolution with the help of a GCN. Extensive experiments on two real-world datasets demonstrate that our model stably outperforms state-of-the-art models.


Intention2Basket: A Neural Intention-driven Approach for Dynamic Next-basket Planning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/323 ◽

2020 ◽

Author(s):

Shoujin Wang ◽

Liang Hu ◽

Yan Wang ◽

Quan Z. Sheng ◽

Mehmet Orgun ◽

...

Keyword(s):

Real World ◽

State Of The Art ◽

Sequence Data ◽

Multiple Choice ◽

Prediction Performance ◽

The State ◽

Psychological Theories ◽

User Intentions ◽

Real World Datasets ◽

User Actions

User purchase behaviours are complex and dynamic, which are usually observed as multiple choice actions across a sequence of shopping baskets. Most of the existing next-basket prediction approaches model user actions as homogeneous sequence data without considering complex and heterogeneous user intentions, impeding deep understanding of user behaviours from the perspective of human inside drivers and thus reducing the prediction performance. Psychological theories have indicated that user actions are essentially driven by certain underlying intentions (e.g., diet and entertainment). Moreover, different intentions may influence each other while different choices usually have different utilities to accomplish an intention. Inspired by such psychological insights, we formalize the next-basket prediction as an Intention Recognition, Modelling and Accomplishing problem and further design the Intention2Basket (Int2Ba in short) model. In Int2Ba, an Intention Recognizer, a Coupled Intention Chain Net, and a Dynamic Basket Planner are specifically designed to respectively recognize, model and accomplish the heterogeneous intentions behind a sequence of baskets to better plan the next-basket. Extensive experiments on real-world datasets show the superiority of Int2Ba over the state-of-the-art approaches.


MASTER: across Multiple social networks, integrate Attribute and STructure Embedding for Reconciliation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/537 ◽

2018 ◽

Cited By ~ 3

Author(s):

Sen Su ◽

Li Sun ◽

Zhongbao Zhang ◽

Gen Li ◽

Jielun Qu

Keyword(s):

Social Networks ◽

Real World ◽

State Of The Art ◽

The State ◽

Effective Algorithm ◽

Kkt Points ◽

Real World Datasets ◽

Significant Attention

Recently, reconciling social networks has received significant attention. Most of the existing studies have limitations in the following three aspects: multiplicity, comprehensiveness and robustness. To address these three limitations, we rethink this problem and propose the MASTER framework, i.e., across Multiple social networks, integrate Attribute and STructure Embedding for Reconciliation. In this framework, we first design a novel Constrained Dual Embedding model that simultaneously embeds and reconciles multiple social networks, formulating our problem as a unified optimization. To solve this optimization, we then design an effective algorithm called NS-Alternating. We also prove that this algorithm converges to KKT points. Through extensive experiments on real-world datasets, we demonstrate that MASTER outperforms the state-of-the-art approaches.


Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446668 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-32

Author(s):

Quang-huy Duong ◽

Heri Ramampiaro ◽

Kjetil Nørvåg ◽

Thu-lan Dam

Keyword(s):

Lower Bound ◽

State Of The Art ◽

The State ◽

The Other ◽

Exact Methods ◽

Practical Solution ◽

Novel Approach ◽

Wide Range ◽

Real World Datasets ◽

Tensor Data

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can, on the other hand, estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound of the density. In particular, we guarantee and prove a higher bound of the lower-bound density of the estimated subgraph and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
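For context, the greedy 2-approximation mentioned above is commonly realized for plain graphs as a "peeling" procedure: repeatedly delete a minimum-degree vertex and remember the densest intermediate subgraph, where density is the number of edges divided by the number of vertices. The sketch below is a minimal version of that classic single-subgraph baseline, not the multi-subtensor method proposed in the paper; the function name and the sample graph are illustrative assumptions.

```python
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling for the densest subgraph (density = |E| / |V|).

    Repeatedly removes a minimum-degree vertex and returns the densest
    vertex set seen along the way. The returned density is at least half
    of the optimum, which is the 2-approximation guarantee.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes, best_density = set(), 0.0

    while remaining:
        density = num_edges / len(remaining)
        if density > best_density:
            best_density = density
            best_nodes = set(remaining)

        # Linear scan for a minimum-degree vertex; a bucket queue or heap
        # would be faster on large graphs.
        u = min(remaining, key=lambda x: len(adj[x]))
        for w in adj[u]:
            adj[w].discard(u)
        num_edges -= len(adj[u])
        del adj[u]
        remaining.remove(u)

    return best_nodes, best_density


# Illustrative input: a 4-clique on {0, 1, 2, 3} plus a pendant vertex 4.
sample_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
nodes, density = greedy_densest_subgraph(sample_edges)
print(nodes, density)
```

Note that this procedure reports a single subgraph, which is exactly the limitation the abstract points out for methods built on the greedy baseline.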


Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5329 ◽

2020 ◽

Vol 34 (01) ◽

pp. 19-26 ◽

Cited By ~ 5

Author(s):

Chong Chen ◽

Min Zhang ◽

Yongfeng Zhang ◽

Weizhi Ma ◽

Yiqun Liu ◽

...

Keyword(s):

Collaborative Filtering ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Heterogeneous Data ◽

Model Parameters ◽

Online Systems ◽

Practical Applications ◽

Real World Datasets ◽

Primary Type

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite their effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback, such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.


Particle Swarm Contour Search Algorithm

Entropy ◽

10.3390/e22040407 ◽

2020 ◽

Vol 22 (4) ◽

pp. 407 ◽

Cited By ~ 1

Author(s):

Dominik Weikert ◽

Sebastian Mai ◽

Sanaz Mostaghim

Keyword(s):

Image Processing ◽

Real World ◽

State Of The Art ◽

Search Algorithm ◽

Particle Swarm ◽

Search Space ◽

Local Information ◽

The State ◽

Complete Knowledge ◽

Real World Applications

In this article, we present a new algorithm called Particle Swarm Contour Search (PSCS), a Particle Swarm Optimisation inspired algorithm to find object contours in 2D environments. Currently, most contour-finding algorithms are based on image processing and require a complete overview of the search space in which the contour is to be found. However, for real-world applications this would require complete knowledge of the search space, which may not always be feasible or possible. The proposed algorithm removes this requirement and relies only on the local information of the particles to accurately identify a contour. Particles search for the contour of an object and then traverse alongside it using their known information about positions inside and outside of the object. Our experiments show that the proposed PSCS algorithm can deliver results comparable to the state of the art.


Scene text removal via cascaded text stroke detection and erasing

Computational Visual Media ◽

10.1007/s41095-021-0242-8 ◽

2021 ◽

Vol 8 (2) ◽

pp. 273-287

Author(s):

Xuewei Bian ◽

Chaoqun Wang ◽

Weize Quan ◽

Juntao Ye ◽

Xiaopeng Zhang ◽

...

Keyword(s):

Performance Improvement ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Processing Unit ◽

Final Model ◽

Scene Text ◽

End To End

Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.


Waste generation prediction under uncertainty in smart cities through deep neuroevolution

Revista Facultad de Ingeniería Universidad de Antioquia ◽

10.17533/udea.redin.20190736 ◽

2019 ◽

pp. 128-138 ◽

Cited By ~ 1

Author(s):

Andrés Camero ◽

Jamal Toutouh ◽

Javier Ferrer ◽

Enrique Alba

Keyword(s):

Real World ◽

State Of The Art ◽

Smart Cities ◽

Uncertain Data ◽

Recurrent Network ◽

The State ◽

Waste Generation ◽

Waste Collection ◽

The Way

The unsustainable development of countries has created a problem due to the unstoppable waste generation. Moreover, waste collection is carried out following a pre-defined route that does not take into account the actual level of the containers collected. Therefore, optimizing the way the waste is collected presents an interesting opportunity. In this study, we tackle the problem of predicting the waste generation ratio in real-world conditions, i.e., under uncertainty. In particular, we use a deep neuroevolutionary technique to automatically design a recurrent network that captures the filling level of all waste containers in a city at once, and we study the suitability of our proposal when faced with noisy and faulty data. We validate our proposal using a real-world case study, consisting of more than two hundred waste containers located in a city in Spain, and we compare our results to the state-of-the-art. The results show that our approach exceeds all its competitors and that its accuracy in a real-world scenario, i.e., under uncertain data, is good enough for optimizing the waste collection planning.
