CoSam: An Efficient Collaborative Adaptive Sampler for Recommendation

Jiawei Chen; Chengquan Jiang; Can Wang; Sheng Zhou; Yan Feng; Chun Chen; Martin Ester; Xiangnan He

doi:10.1145/3450289

CoSam: An Efficient Collaborative Adaptive Sampler for Recommendation

ACM Transactions on Information Systems ◽

10.1145/3450289 ◽

2021 ◽

Vol 39 (3) ◽

pp. 1-24

Author(s):

Jiawei Chen ◽

Chengquan Jiang ◽

Can Wang ◽

Sheng Zhou ◽

Yan Feng ◽

...

Keyword(s):

Domain Knowledge ◽

Model Learning ◽

Interaction Information ◽

Information Awareness ◽

Promising Solution ◽

Recommendation Accuracy ◽

Low Efficiency ◽

Real World Datasets ◽

Feedback Data ◽

Uneven Sampling

Sampling strategies have been widely applied in many recommendation systems to accelerate model learning from implicit feedback data. A typical strategy is to draw negative instances from a uniform distribution, which, however, can severely affect a model’s convergence, stability, and even recommendation accuracy. A promising solution to this problem is to over-sample the “difficult” (a.k.a. informative) instances that contribute more to training. But this increases the risk of biasing the model and leading to non-optimal results. Moreover, existing samplers are either heuristic, which requires domain knowledge and often fails to capture truly “difficult” instances, or rely on a sampler model that suffers from low efficiency. To deal with these problems, we propose CoSam, an efficient and effective collaborative sampling method that consists of (1) a collaborative sampler model that explicitly leverages user-item interaction information in the sampling probability and exhibits good properties of normalization, adaptation, interaction information awareness, and sampling efficiency, and (2) an integrated sampler-recommender framework that leverages the sampler model in prediction to offset the bias caused by uneven sampling. Correspondingly, we derive a fast reinforced training algorithm for our framework to boost the sampler performance and sampler-recommender collaboration. Extensive experiments on four real-world datasets demonstrate the superiority of the proposed collaborative sampler model and integrated sampler-recommender framework.
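The trade-off described in this abstract (uniform sampling versus over-sampling "difficult" negatives, and the bias the latter introduces) can be illustrated with a small sketch. This is not the CoSam sampler: it swaps in a plain softmax over hypothetical model scores as the adaptive sampler and uses standard importance weighting to offset the sampling bias. The names `softmax`, `sample_negative`, `scores`, and `observed` are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a normalized probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_negative(scores, observed, rng, adaptive=True):
    """Draw one unobserved item for a user; return (item, importance_weight).

    scores   : hypothetical current model scores, one per item
    observed : set of item indices the user has already interacted with
    """
    candidates = [i for i in range(len(scores)) if i not in observed]
    uniform_p = 1.0 / len(candidates)
    if not adaptive:
        # Uniform sampling: every unobserved item is equally likely.
        return rng.choice(candidates), 1.0
    # Adaptive sampling: higher-scored ("harder") negatives are drawn more often.
    probs = softmax([scores[i] for i in candidates])
    pos = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    # Weight by uniform_p / q(item) so that weighted averages match
    # what uniform sampling would give in expectation.
    return candidates[pos], uniform_p / probs[pos]
```

With the importance weight applied to each sampled loss term, the expected training signal matches the uniform case while hard negatives are still visited more often.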

Fast Adaptively Weighted Matrix Factorization for Recommendation with Implicit Feedback

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5751 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3470-3477

Author(s):

Jiawei Chen ◽

Can Wang ◽

Sheng Zhou ◽

Qihao Shi ◽

Jingbang Chen ◽

...

Keyword(s):

Matrix Factorization ◽

Learning Algorithm ◽

Implicit Feedback ◽

Model Learning ◽

Adaptive Weights ◽

Network Function ◽

Fast Learning ◽

Weighted Matrix ◽

Real World Datasets ◽

Feedback Data

Recommendation from implicit feedback is a highly challenging task due to the lack of reliable observed negative data. A popular and effective approach for implicit recommendation is to treat unobserved data as negative but downweight their confidence. Naturally, how to assign confidence weights and how to handle the large number of unobserved data are two key problems for implicit recommendation models. However, existing methods either pursue fast learning by manually assigning simple confidence weights, which lacks flexibility and may create empirical bias in evaluating users' preferences, or adaptively infer personalized confidence weights but suffer from low efficiency. To achieve both adaptive weight assignment and efficient model learning, we propose a fast adaptively weighted matrix factorization (FAWMF) based on a variational auto-encoder. The personalized data confidence weights are adaptively assigned by a parameterized neural network (function), and the network can be inferred from the data. Further, to support fast and stable learning of FAWMF, a new batch-based learning algorithm, fBGD, has been developed, which trains on all feedback data while its complexity is linear in the number of observed data. Extensive experiments on real-world datasets demonstrate the superiority of the proposed FAWMF and its learning algorithm fBGD.
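To make concrete how a loss over all feedback data can be evaluated at a cost linear in the number of observed entries, here is a minimal sketch of a well-known trick for the simpler case of a single uniform confidence weight on unobserved entries. It is not the fBGD algorithm and does not model FAWMF's adaptive weights. The names `naive_loss`, `fast_loss`, `P`, `Q`, and `w0` are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_loss(P, Q, observed, w0):
    """Weighted squared loss over every (user, item) pair: O(users * items)."""
    total = 0.0
    for u, p in enumerate(P):
        for i, q in enumerate(Q):
            pred = dot(p, q)
            if (u, i) in observed:
                total += (1.0 - pred) ** 2   # observed: target 1, weight 1
            else:
                total += w0 * pred ** 2      # unobserved: target 0, weight w0
    return total


def gram(M):
    """k x k Gram matrix M^T M of a list of k-dimensional rows."""
    k = len(M[0])
    return [[sum(r[a] * r[b] for r in M) for b in range(k)] for a in range(k)]


def fast_loss(P, Q, observed, w0):
    """Same value as naive_loss, but loops only over observed pairs.

    Uses sum_{u,i} (p_u . q_i)^2 = sum_{a,b} (P^T P)[a][b] * (Q^T Q)[a][b].
    """
    GP, GQ = gram(P), gram(Q)
    k = len(GP)
    total = w0 * sum(GP[a][b] * GQ[a][b] for a in range(k) for b in range(k))
    for u, i in observed:
        pred = dot(P[u], Q[i])
        # Swap the "unobserved" term already counted for the "observed" one.
        total += (1.0 - pred) ** 2 - w0 * pred ** 2
    return total
```

The Gram-matrix identity only holds when unobserved weights are uniform (or otherwise factorizable), which is why personalized weights make efficient learning harder.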

NeuSE: A Neural Snapshot Ensemble Method for Collaborative Filtering

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3450526 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1-20

Author(s):

Dongsheng Li ◽

Haodong Liu ◽

Chao Chen ◽

Yingying Zhao ◽

Stephen M. Chu ◽

...

Keyword(s):

Collaborative Filtering ◽

Optimization Problems ◽

Empirical Studies ◽

Large Datasets ◽

Model Learning ◽

Global Models ◽

Convex Optimization Problems ◽

Memory Network ◽

Real World Datasets ◽

Performance Tradeoff

In collaborative filtering (CF) algorithms, the optimal models are usually learned by globally minimizing the empirical risks averaged over all the observed data. However, the global models are often obtained via a performance tradeoff among users/items, i.e., not all users/items are perfectly fitted by the global models due to the hard non-convex optimization problems in CF algorithms. Ensemble learning can address this issue by learning multiple diverse models but usually suffers from efficiency issues on large datasets or with complex algorithms. In this article, we keep the intermediate models obtained during global model learning as the snapshot models, and then adaptively combine the snapshot models for individual user-item pairs using a memory network-based method. Empirical studies on three real-world datasets show that the proposed method can extensively and significantly improve the accuracy (up to 15.9% relatively) when applied to a variety of existing collaborative filtering methods.
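The idea of combining snapshot models per user-item pair can be sketched as follows. NeuSE uses a memory network for the combination; this sketch swaps in a much simpler rule, softmax weights derived from each snapshot's hypothetical held-out error for the user in question. The function name `combine_snapshots` and its parameters are assumptions for illustration.

```python
import math


def combine_snapshots(preds, errors, temperature=1.0):
    """Blend snapshot predictions for one user-item pair.

    preds       : prediction of each snapshot model for this pair
    errors      : hypothetical held-out error of each snapshot for this user
    temperature : larger values flatten the weights toward a plain average
    Returns (combined_prediction, weights).
    """
    logits = [-e / temperature for e in errors]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    combined = sum(w * p for w, p in zip(weights, preds))
    return combined, weights
```

Snapshots with lower error for a given user receive larger weights, so different users can lean on different points along the training trajectory.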

Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks for Fake News Detection

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3451215 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-23

Author(s):

Shengsheng Qian ◽

Jun Hu ◽

Quan Fang ◽

Changsheng Xu

Keyword(s):

Social Media ◽

Visual Information ◽

Representation Learning ◽

Fake News ◽

Unified Framework ◽

Model Learning ◽

Convolutional Network ◽

Textual Information ◽

Convolutional Networks ◽

Real World Datasets

In this article, we focus on the fake news detection task and aim to automatically identify fake news from the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to design a data-driven method that increases the flexibility of the model to deal with samples in different contexts and news backgrounds; and (iii) how to fully utilize the auxiliary information (the background knowledge and multi-modal information) of posts for better representation learning. To tackle the above challenges, we propose novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks (KMAGCN) to capture semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and the superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.

Metal Oxide Nanoparticles Against Bacterial Biofilms: Perspectives and Limitations

Microorganisms ◽

10.3390/microorganisms8101545 ◽

2020 ◽

Vol 8 (10) ◽

pp. 1545 ◽

Cited By ~ 1

Author(s):

Liubov Shkodenko ◽

Ilia Kassirov ◽

Elena Koshel

Keyword(s):

Antibiotic Resistance ◽

Comparative Analysis ◽

Metal Oxide ◽

Metal Oxide Nanoparticles ◽

Antimicrobial Properties ◽

Bacterial Biofilms ◽

Oxide Nanoparticles ◽

Promising Solution ◽

Low Efficiency ◽

Al2O3 NPs

At present, there is an urgent need in medicine and industry to develop new approaches to eliminate bacterial biofilms. Considering the low efficiency of classical approaches to biofilm eradication and the growing problem of antibiotic resistance, the introduction of nanomaterials may be a promising solution. Outstanding antimicrobial properties have been demonstrated by nanoparticles (NPs) of metal oxides and their nanocomposites. The review presents a comparative analysis of antibiofilm properties of various metal oxide NPs (primarily, CuO, Fe3O4, TiO2, ZnO, MgO, and Al2O3 NPs) and nanocomposites, as well as mechanisms of their effect on plankton bacteria cells and biofilms. The potential mutagenicity of metal oxide NPs and safety problems of their wide application are also discussed.

SCFont: Structure-Guided Chinese Font Generation via Deep Stacked Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014015 ◽

2019 ◽

Vol 33 ◽

pp. 4015-4022 ◽

Cited By ~ 4

Author(s):

Yue Jiang ◽

Zhouhui Lian ◽

Yingmin Tang ◽

Jianguo Xiao

Keyword(s):

Deep Learning ◽

Domain Knowledge ◽

State Of The Art ◽

Automatic Generation ◽

Chinese Characters ◽

Generation System ◽

Model Learning ◽

High Quality ◽

Large Numbers ◽

Quantitative Assessments

Automatic generation of Chinese fonts that consist of large numbers of glyphs with complicated structures is still a challenging and ongoing problem in the areas of AI and Computer Graphics (CG). Traditional CG-based methods typically rely heavily on manual interventions, while recently popularized deep learning-based end-to-end approaches often obtain synthesis results with incorrect structures and/or serious artifacts. To address those problems, this paper proposes a structure-guided Chinese font generation system, SCFont, using deep stacked networks. The key idea is to integrate the domain knowledge of Chinese characters with deep generative networks to ensure that high-quality glyphs with correct structures can be synthesized. More specifically, we first apply a CNN model to learn how to transfer the writing trajectories with separated strokes in the reference font style into those in the target style. Then, we train another CNN model to learn how to recover shape details on the contour for synthesized writing trajectories. Experimental results validate the superiority of the proposed SCFont compared to the state of the art in both visual and quantitative assessments.

Structural, Vibrational, and Electronic Properties of Trigonal Cu2SrSnS4 Photovoltaic Absorber from First-Principles Calculations

Material Science Research India ◽

10.13005/msri.17.special-issue1.03 ◽

2020 ◽

Vol 17 (SpecialIssue1) ◽

pp. 07-12

Author(s):

Sriram Poyyapakkam Ramkumar

Keyword(s):

Electronic Properties ◽

First Principles ◽

Density Functional ◽

Primary Concern ◽

Functional Theory ◽

The Family ◽

Site Disorder ◽

Promising Solution ◽

Low Efficiency ◽

Interesting Alternative

In the search for sustainable alternative absorber materials for photovoltaic applications, the family of chalcogenides provides a promising solution. While the most commonly studied Cu2ZnSnS4-based kesterite solar cells seem to have intrinsic drawbacks such as low efficiency arising from defects and anti-site disorder in the Cu-Zn sites, substituting other elements in the Cu/Zn sites has been considered. In this direction, Cu2(Ba,Sr)SnS4 provides an interesting alternative, as it possibly helps limit the intrinsic anti-site disorder in the system, which is of primary concern with regard to efficiency losses. In this study, we report the structural, vibrational, and electronic properties of the trigonal-structured Cu2SrSnS4 quaternary system computed from first-principles density functional theory, paving the way for further characterization and analysis within this class of materials.

Discovering Subsequence Patterns for Next POI Recommendation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/445 ◽

2020 ◽

Cited By ~ 3

Author(s):

Kangzhi Zhao ◽

Yong Zhang ◽

Hongzhi Yin ◽

Jin Wang ◽

Kai Zheng ◽

...

Keyword(s):

Power Law ◽

Domain Knowledge ◽

State Of The Art ◽

Location Based Services ◽

Sequential Patterns ◽

Economic Activities ◽

Point Of Interest ◽

Poi Recommendation ◽

Latent Structures ◽

Real World Datasets

Next Point-of-Interest (POI) recommendation plays an important role in location-based services. State-of-the-art methods learn the POI-level sequential patterns in the user's check-in sequence but ignore the subsequence patterns that often represent the socio-economic activities or coherence of preference of the users. However, it is challenging to integrate the semantic subsequences due to the difficulty to predefine the granularity of the complex but meaningful subsequences. In this paper, we propose Adaptive Sequence Partitioner with Power-law Attention (ASPPA) to automatically identify each semantic subsequence of POIs and discover their sequential patterns. Our model adopts a state-based stacked recurrent neural network to hierarchically learn the latent structures of the user's check-in sequence. We also design a power-law attention mechanism to integrate the domain knowledge in spatial and temporal contexts. Extensive experiments on two real-world datasets demonstrate the effectiveness of our model.

SAND

Proceedings of the VLDB Endowment ◽

10.14778/3467861.3467863 ◽

2021 ◽

Vol 14 (10) ◽

pp. 1717-1729

Author(s):

Paul Boniol ◽

John Paparrizos ◽

Themis Palpanas ◽

Michael J. Franklin

Keyword(s):

Anomaly Detection ◽

Domain Knowledge ◽

State Of The Art ◽

Data Distribution ◽

Detection Methods ◽

Current State ◽

Normal Behavior ◽

Real World Datasets ◽

Increasing Demand ◽

Entire Dataset

With the increasing demand for real-time analytics and decision making, anomaly detection methods need to operate over streams of values and handle drifts in data distribution. Unfortunately, existing approaches have severe limitations: they either require prior domain knowledge or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In addition, subsequence anomaly detection methods usually require access to the entire dataset and are not able to learn and detect anomalies in streaming settings. To address these problems, we propose SAND, a novel online method suitable for domain-agnostic anomaly detection. SAND aims to detect anomalies based on their distance to a model that represents normal behavior. SAND relies on a novel streaming methodology to incrementally update this model, which adapts to distribution drifts and omits obsolete data. The experimental results on several real-world datasets demonstrate that SAND correctly identifies single and recurrent anomalies without prior knowledge of the characteristics of these anomalies. SAND outperforms the current state-of-the-art algorithms by a large margin in terms of accuracy while achieving orders-of-magnitude speedups.
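The two mechanisms named in this abstract, scoring a subsequence by its distance to a model of normal behavior and updating that model incrementally so that old data fades, can be sketched in a deliberately simplified form. This is not the SAND algorithm: it assumes a single centroid as the normal model, Euclidean distance, and an exponential decay update, all of which are stand-ins chosen for brevity. The class `StreamingNormalModel` and its parameters are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamingNormalModel:
    """A single-centroid model of 'normal' subsequences, updated per batch."""

    def __init__(self, length, decay=0.9):
        self.length = length      # subsequence length (assumed known)
        self.decay = decay        # weight kept by the old centroid per update
        self.centroid = None

    def update(self, batch):
        """Fold a new batch of values into the normal model."""
        # Non-overlapping windows of the configured length.
        windows = [batch[i:i + self.length]
                   for i in range(0, len(batch) - self.length + 1, self.length)]
        if not windows:
            return
        mean = [sum(w[j] for w in windows) / len(windows)
                for j in range(self.length)]
        if self.centroid is None:
            self.centroid = mean
        else:
            # Older batches lose influence geometrically, so the model can drift.
            self.centroid = [self.decay * c + (1.0 - self.decay) * m
                             for c, m in zip(self.centroid, mean)]

    def score(self, subseq):
        """Anomaly score: distance from the subsequence to the normal model."""
        if self.centroid is None:
            return 0.0
        return euclidean(subseq, self.centroid)
```

A subsequence that matches the learned pattern scores near zero, while one that deviates scores in proportion to how far it departs from the centroid.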

Medical Document Clustering Using Ontology-Based Term Similarity Measures

Strategic Advancements in Utilizing Data Mining and Warehousing Technologies ◽

10.4018/978-1-60566-717-1.ch007 ◽

2011 ◽

pp. 121-132

Author(s):

Zhang Xiaodan ◽

Jing Liping ◽

Hu Xiaohua ◽

Ng Michael ◽

Xia Jiali ◽

...

Keyword(s):

Semantic Similarity ◽

Domain Knowledge ◽

Document Clustering ◽

Similarity Measures ◽

Concept Hierarchy ◽

Term Similarity ◽

Feature Based ◽

Document Vector ◽

Real World Datasets ◽

Medical Document

Recent research shows that ontology as background knowledge can improve document clustering quality with its concept hierarchy knowledge. Previous studies take term semantic similarity as an important measure to incorporate domain knowledge into the clustering process, for example in clustering initialization and term re-weighting. However, few studies have focused on how different types of term similarity measures affect clustering performance for a given domain. In this article, we conduct a comparative study on how different term semantic similarity measures, including path-based, information-content-based, and feature-based similarity measures, affect document clustering. Term re-weighting of the document vector is an important method for integrating domain ontology into the clustering process. In detail, the weight of a term is augmented by the weights of its co-occurring concepts. Spherical k-means is used to evaluate document vector re-weighting on two real-world datasets: Disease10 and OHSUMED23. Experimental results on nine different semantic measures show that: (1) no single type of similarity measure significantly outperforms the others; (2) several similarity measures have more stable performance than the others; and (3) term re-weighting has positive effects on medical document clustering, but the effect might not be significant when documents contain few terms.
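The re-weighting step ("the weight of a term is augmented by the weights of its co-occurring concepts") can be illustrated with a short sketch. The exact formula is not given in the abstract, so this assumes one plausible reading: each term's weight grows by the similarity-scaled weights of the other terms in the same document. Vectors are then L2-normalized, since spherical k-means compares documents by cosine similarity. The function names and the toy `sim` table are assumptions for illustration.

```python
import math


def reweight(doc, sim):
    """Augment each term weight with similarity-scaled weights of co-occurring terms.

    doc : dict mapping term -> weight (e.g. tf-idf) for one document
    sim : dict mapping frozenset({term_a, term_b}) -> similarity in [0, 1]
          (hypothetical; would come from an ontology-based measure)
    """
    new_doc = {}
    for term, weight in doc.items():
        boost = sum(sim.get(frozenset((term, other)), 0.0) * other_weight
                    for other, other_weight in doc.items() if other != term)
        new_doc[term] = weight + boost
    return new_doc


def l2_normalize(vec):
    """Scale a sparse vector to unit length, as spherical k-means expects."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())
```

Terms with semantically related neighbors in the same document are boosted, while isolated terms keep their original weight and therefore shrink in relative importance after normalization.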

Medical Document Clustering Using Ontology-Based Term Similarity Measures

Medical Informatics ◽

10.4018/978-1-60566-050-9.ch169 ◽

2011 ◽

pp. 2232-2243

Author(s):

Xiaodan Zhang ◽

Liping Jing ◽

Xiaohua Hu ◽

Michael Ng ◽

Jiali Xia ◽

...

Keyword(s):

Semantic Similarity ◽

Domain Knowledge ◽

Document Clustering ◽

Similarity Measures ◽

Concept Hierarchy ◽

Term Similarity ◽

Feature Based ◽

Document Vector ◽

Real World Datasets ◽

Medical Document

Recent research shows that ontology as background knowledge can improve document clustering quality with its concept hierarchy knowledge. Previous studies take term semantic similarity as an important measure to incorporate domain knowledge into the clustering process, for example in clustering initialization and term re-weighting. However, few studies have focused on how different types of term similarity measures affect clustering performance for a given domain. In this article, we conduct a comparative study on how different term semantic similarity measures, including path-based, information-content-based, and feature-based similarity measures, affect document clustering. Term re-weighting of the document vector is an important method for integrating domain ontology into the clustering process. In detail, the weight of a term is augmented by the weights of its co-occurring concepts. Spherical k-means is used to evaluate document vector re-weighting on two real-world datasets: Disease10 and OHSUMED23. Experimental results on nine different semantic measures show that: (1) no single type of similarity measure significantly outperforms the others; (2) several similarity measures have more stable performance than the others; and (3) term re-weighting has positive effects on medical document clustering, but the effect might not be significant when documents contain few terms.