Efficient Heuristic Hypothesis Ranking

This paper considers the problem of learning the ranking of a set of stochastic alternatives based upon incomplete information (i.e., a limited number of samples). We describe a system that, at each decision cycle, either outputs a complete ordering on the hypotheses or decides to gather additional information (i.e., observations) at some cost. The ranking problem is a generalization of the previously studied hypothesis selection problem: in selection, an algorithm must select the single best hypothesis, while in ranking, an algorithm must order all the hypotheses. The central problem we address is achieving the desired ranking quality while minimizing the cost of acquiring additional samples. We describe two algorithms for hypothesis ranking and their application to the probably approximately correct (PAC) and expected loss (EL) learning criteria. Empirical results are provided to demonstrate the effectiveness of these ranking procedures on both synthetic and real-world datasets.
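The abstract does not spell out the two algorithms, so the following is only a minimal sketch of the general idea of trading ranking quality against sampling cost. It assumes hypotheses are zero-argument samplers returning values in [0, 1], uses a Hoeffding interval as the stopping test, and introduces the hypothetical names `rank_hypotheses` and `bernoulli`. Because the interval is rechecked after every batch without correcting for repeated looks, this is a heuristic stopping rule, not a rigorous PAC guarantee and not the paper's method.

```python
import math
import random


def bernoulli(p, rng):
    """Hypothetical sampler: returns 1.0 with probability p, else 0.0."""
    return lambda: 1.0 if rng.random() < p else 0.0


def rank_hypotheses(samplers, epsilon=0.1, delta=0.05, batch=10, max_samples=5000):
    """Sample all hypotheses in batches until the ranking looks settled.

    samplers: list of zero-argument callables returning values in [0, 1].
    Returns (ranking, samples_used); ranking lists indices best-first.
    """
    k = len(samplers)
    sums = [0.0] * k
    n = 0  # samples drawn per hypothesis so far
    while True:
        # Decision cycle, step 1: pay for one more batch of observations.
        for _ in range(batch):
            for i, draw in enumerate(samplers):
                sums[i] += draw()
        n += batch
        means = [s / n for s in sums]
        ranking = sorted(range(k), key=lambda i: means[i], reverse=True)
        # Hoeffding half-width for [0, 1] values, union bound over k hypotheses.
        half_width = math.sqrt(math.log(2 * k / delta) / (2 * n))
        # Step 2: an adjacent pair is settled if its gap exceeds the combined
        # interval width, or if the intervals are tight enough that swapping
        # the pair would cost at most epsilon.
        settled = all(
            means[a] - means[b] > 2 * half_width or 2 * half_width <= epsilon
            for a, b in zip(ranking, ranking[1:])
        )
        if settled or n * k >= max_samples:
            return ranking, n * k
```

A call such as `rank_hypotheses([bernoulli(p, random.Random(0)) for p in (0.9, 0.5, 0.1)])` returns the ordering together with the number of samples it paid for, which is the quantity the abstract aims to minimize.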

SPMC: Socially-Aware Personalized Markov Chains for Sparse Sequential Recommendation

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/204 ◽

2017 ◽

Cited By ~ 13

Author(s):

Chenwei Cai ◽

Ruining He ◽

Julian McAuley

Keyword(s):

Markov Chains ◽

Social Relationships ◽

Real World ◽

Matrix Factorization ◽

State Of The Art ◽

Cold Start ◽

Additional Information ◽

New Methods ◽

Real World Datasets ◽

Sequential Information

Dealing with sparse, long-tailed datasets, and cold-start problems is always a challenge for recommender systems. These issues can partly be dealt with by making predictions not in isolation, but by leveraging information from related events; such information could include signals from social relationships or from the sequence of recent activities. Both types of additional information can be used to improve the performance of state-of-the-art matrix factorization-based techniques. In this paper, we propose new methods to combine both social and sequential information simultaneously, in order to further improve recommendation performance. We show these techniques to be particularly effective when dealing with sparsity and cold-start issues in several large, real-world datasets.

EXTENSIONS TO THE K-AMH ALGORITHM FOR NUMERICAL CLUSTERING

Journal of Information and Communication Technology ◽

10.32890/jict2018.17.4.8272 ◽

2018 ◽

Author(s):

Ali Seman ◽

Azizian Mohd Sapawi

Keyword(s):

Real World ◽

Confidence Level ◽

Categorical Data ◽

Euclidean Distance ◽

P Value ◽

Accuracy Score ◽

Original Algorithm ◽

Real World Datasets ◽

The Cost ◽

Numerical Clustering

The k-AMH algorithm has been proven efficient in clustering categorical datasets. It can also be used to cluster numerical values with minimal modification to the original algorithm. In this paper, we present two algorithms that extend the k-AMH algorithm to the clustering of numerical values. The original k-AMH algorithm for categorical values uses a simple matching dissimilarity measure, but for numerical values it uses Euclidean distance. The first extension to the k-AMH algorithm, denoted k-AMH Numeric I, enables it to cluster numerical values in a fashion similar to k-AMH for categorical data. The second extension, k-AMH Numeric II, adopts the cost function of the fuzzy k-Means algorithm together with Euclidean distance, and has demonstrated performance similar to that of k-AMH Numeric I. The clustering performance of the two algorithms was evaluated on six real-world datasets against a benchmark algorithm, the fuzzy k-Means algorithm. The results obtained indicate that the two algorithms are as efficient as the fuzzy k-Means algorithm when clustering numerical values. Further, on an ANOVA test, k-AMH Numeric I obtained the highest accuracy score of 0.69 for the six datasets combined, with a p-value less than 0.01, indicating a 95% confidence level. The experimental results prove that the k-AMH Numeric I and k-AMH Numeric II algorithms can be effectively used for numerical clustering. The significance of this study lies in that the k-AMH numeric algorithms have been demonstrated as potential solutions for clustering numerical objects.
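To make the "cost function of the fuzzy k-Means algorithm together with Euclidean distance" concrete, here is a small sketch of the textbook fuzzy k-Means membership rule and cost. It is the standard formulation, not the k-AMH Numeric II algorithm itself, whose details the abstract does not give; the function names and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import math


def memberships(points, centers, m=2.0):
    """Textbook fuzzy k-Means membership of each point in each cluster."""
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]  # Euclidean distance
        if 0.0 in dists:
            # Point coincides with a center: full membership there
            # (split evenly if several centers coincide with it).
            zeros = dists.count(0.0)
            result.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        result.append(
            [1.0 / sum((dj / dl) ** power for dl in dists) for dj in dists]
        )
    return result


def cost(points, centers, u, m=2.0):
    """Fuzzy k-Means cost: sum over i, j of u[i][j]^m * ||x_i - c_j||^2."""
    return sum(
        u[i][j] ** m * math.dist(p, c) ** 2
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
```

For example, with points `[(0, 0), (1, 0)]` and centers `[(0, 0), (2, 0)]`, the second point is equidistant from both centers and so receives membership 0.5 in each.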

Toward Fair Recommendation in Two-sided Platforms

ACM Transactions on the Web ◽

10.1145/3503624 ◽

2022 ◽

Vol 16 (2) ◽

pp. 1-34

Author(s):

Arpita Biswas ◽

Gourab K. Patro ◽

Niloy Ganguly ◽

Krishna P. Gummadi ◽

Abhijnan Chakraborty

Keyword(s):

Customer Satisfaction ◽

Real World ◽

Computation Time ◽

Well Being ◽

Personalized Recommendation ◽

Indivisible Goods ◽

Goods And Services ◽

Online Platforms ◽

Real World Datasets ◽

The Cost

Many online platforms today (such as Amazon, Netflix, Spotify, LinkedIn, and AirBnB) can be thought of as two-sided markets with producers and customers of goods and services. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reinforces the fact that such customer-centric design of these services may lead to unfair distribution of exposure to the producers, which may adversely impact their well-being. Conversely, a purely producer-centric design might become unfair to the customers. As more and more people are depending on such platforms to earn a living, it is important to ensure fairness to both producers and customers. In this work, by mapping a fair personalized recommendation problem to a constrained version of the problem of fairly allocating indivisible goods, we propose to provide fairness guarantees for both sides. Formally, our proposed FairRec algorithm guarantees Maxi-Min Share of exposure for the producers, and Envy-Free up to One Item fairness for the customers. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in overall recommendation quality. Finally, we present a modification of FairRec (named FairRecPlus) that, at the cost of additional computation time, improves the recommendation performance for the customers while maintaining the same fairness guarantees.
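As an illustration of the Envy-Free up to One Item (EF1) notion mentioned above, the sketch below allocates indivisible items by the well-known round-robin procedure, which yields EF1 allocations when valuations are additive, and then checks the property. It is a generic fair-division example under that additivity assumption, not the FairRec algorithm; `round_robin`, `is_ef1`, and the `values[a][g]` layout are hypothetical names chosen here.

```python
def round_robin(values):
    """Give each item to exactly one agent, agents picking in fixed turns.

    values[a][g] is agent a's (additive) value for item g.
    """
    n_agents, n_items = len(values), len(values[0])
    remaining = set(range(n_items))
    bundles = [[] for _ in range(n_agents)]
    turn = 0
    while remaining:
        agent = turn % n_agents
        # Pick the agent's favourite remaining item (lowest index on ties).
        best = max(sorted(remaining), key=lambda g: values[agent][g])
        bundles[agent].append(best)
        remaining.remove(best)
        turn += 1
    return bundles


def is_ef1(values, bundles):
    """True if no agent envies another once one item is dropped from the other's bundle."""
    for a, own in enumerate(bundles):
        own_value = sum(values[a][g] for g in own)
        for b, other in enumerate(bundles):
            if a == b or not other:
                continue
            other_value = sum(values[a][g] for g in other)
            # Remove the item in b's bundle that a values most.
            if own_value < other_value - max(values[a][g] for g in other):
                return False
    return True
```

With `values = [[5, 4, 3, 2], [5, 1, 4, 2]]`, the first agent takes items 0 and 1 and the second takes items 2 and 3; the second agent values both bundles equally, so the allocation passes the EF1 check.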

A Bandit Approach to Maximum Inner Product Search

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014376 ◽

2019 ◽

Vol 33 ◽

pp. 4376-4383

Author(s):

Rui Liu ◽

Tianyi Wu ◽

Barzan Mozafari

Keyword(s):

Real World ◽

State Of The Art ◽

Linear Time ◽

Identification Problem ◽

Approximate Algorithm ◽

Inner Product ◽

Approximate Algorithms ◽

Product Search ◽

Real World Datasets ◽

The Cost

There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of subsequent queries is not sufficiently large to amortize the cost. Furthermore, existing methods do not have the ability to directly control the suboptimality of their approximate results with theoretical guarantees. In this paper, we propose the first approximate algorithm for MIPS that does not require any preprocessing, and allows users to control and bound the suboptimality of the results. We cast MIPS as a Best Arm Identification problem, and introduce a new bandit setting that can fully exploit the special structure of MIPS. Our approach outperforms state-of-the-art methods on both synthetic and real-world datasets.
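One way to picture MIPS as a Best Arm Identification problem is to treat each candidate vector as an arm and each sampled coordinate product as a noisy pull. The sketch below uses plain successive elimination with a Hoeffding-style confidence radius; it is not the authors' algorithm or their new bandit setting. The function name `bandit_mips`, the shared coordinate order, and the assumption that every product `query[j] * v[j]` lies in `[-bound, bound]` are all illustrative choices.

```python
import math
import random


def bandit_mips(query, vectors, delta=0.05, bound=1.0, seed=0):
    """Return the index of the vector with (probably) the largest inner product.

    Assumes each coordinate product query[j] * v[j] lies in [-bound, bound].
    No preprocessing of `vectors` is performed.
    """
    d = len(query)
    order = list(range(d))
    random.Random(seed).shuffle(order)  # coordinates sampled without replacement
    alive = list(range(len(vectors)))
    sums = {i: 0.0 for i in alive}
    for t, j in enumerate(order, start=1):
        for i in alive:
            sums[i] += query[j] * vectors[i][j]  # one "pull" per surviving arm
        if t == d or len(alive) == 1:
            break
        # Hoeffding radius on the mean coordinate product, with a union bound
        # over all arms and all rounds.
        radius = bound * math.sqrt(
            2.0 * math.log(2.0 * len(vectors) * d / delta) / t
        )
        best_mean = max(sums[i] / t for i in alive)
        # Drop arms whose upper bound falls below the leader's lower bound.
        alive = [i for i in alive if sums[i] / t + 2.0 * radius >= best_mean]
    return max(alive, key=lambda i: sums[i])
```

The `delta` parameter plays the role of the user-controlled error bound: a smaller value widens the radius, so arms are eliminated later and more coordinates are read. If no arm is eliminated early, all coordinates are consumed and the answer is exact.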

The Cost-Effectiveness of Pre-school Peanut Oral Immunotherapy in the Real World Setting

The Journal of Allergy and Clinical Immunology In Practice ◽

10.1016/j.jaip.2021.02.058 ◽

2021 ◽

Author(s):

Marcus Shaker ◽

Edmond S. Chan ◽

Jennifer LP. Protudjer ◽

Lianne Soller ◽

Elissa M. Abrams ◽

...

Keyword(s):

Cost Effectiveness ◽

Real World ◽

Oral Immunotherapy ◽

The Real ◽

Real World Setting ◽

The Cost

Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology ◽

10.1145/3409265 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1-17

Author(s):

Wu Chen ◽

Yong Yu ◽

Keke Gai ◽

Jiamou Liu ◽

Kim-Kwang Raymond Choo

Keyword(s):

Ensemble Learning ◽

Real World ◽

Interaction Mechanism ◽

Training Model ◽

Edge Computing ◽

Learning Techniques ◽

Multi Agent ◽

Real World Datasets ◽

Entire Dataset ◽

Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).

ALPINE: Active Link Prediction Using Network Embedding

Applied Sciences ◽

10.3390/app11115043 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5043

Author(s):

Xi Chen ◽

Bo Kang ◽

Jefrey Lijffijt ◽

Tijl De Bie

Keyword(s):

Active Learning ◽

Protein Interactions ◽

Link Prediction ◽

Prediction Accuracy ◽

Real Data ◽

Network Embedding ◽

Protein Protein Interactions ◽

Additional Information ◽

The Cost ◽

Active Link

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, in time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.

Scalable Privacy-preserving Geo-distance Evaluation for Precision Agriculture IoT Systems

ACM Transactions on Sensor Networks ◽

10.1145/3463575 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-30

Author(s):

Qiben Yan ◽

Jianzhi Lou ◽

Mehmet C. Vuran ◽

Suat Irmak

Keyword(s):

Precision Agriculture ◽

Evaluation System ◽

Privacy Preserving ◽

Large Network ◽

Additional Information ◽

Modern Agriculture ◽

Iot Applications ◽

Privacy Leakage ◽

Practical Performance ◽

Real World Datasets

Precision agriculture has become a promising paradigm to transform modern agriculture. The recent revolution in big data and Internet-of-Things (IoT) provides unprecedented benefits including optimizing yield, minimizing environmental impact, and reducing cost. However, the mass collection of farm data in IoT applications raises serious concerns about potential privacy leakage that may harm the farmers’ welfare. In this work, we propose a novel scalable and private geo-distance evaluation system, called SPRIDE, to allow application servers to provide geographic-based services by computing the distances among sensors and farms privately. The servers determine the distances without learning any additional information about their locations. The key idea of SPRIDE is to perform efficient distance measurement and distance comparison on encrypted locations over a sphere by leveraging a homomorphic cryptosystem. To serve a large user base, we further propose SPRIDE+ with novel and practical performance enhancements based on pre-computation of cryptographic elements. Through extensive experiments using real-world datasets, we show SPRIDE+ achieves private distance evaluation on a large network of farms, attaining 3+ times runtime performance improvement over existing techniques. We further show SPRIDE+ can run on resource-constrained mobile devices, which offers a practical solution for privacy-preserving precision agriculture IoT applications.
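For readers unfamiliar with distance measurement "over a sphere", the sketch below computes the great-circle distance between two plaintext coordinates with the haversine formula. The abstract does not say which spherical formula SPRIDE evaluates, and no encryption or homomorphic operation is shown here; the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions for illustration only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))
```

A privacy-preserving system would have to evaluate an expression of this kind on encrypted inputs, which is where the cost of the cryptographic operations described in the abstract arises.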

Overlapping Community Detection Based on Attribute Augmented Graph

Entropy ◽

10.3390/e23060680 ◽

2021 ◽

Vol 23 (6) ◽

pp. 680

Author(s):

Hanyang Lin ◽

Yongzhao Zhan ◽

Zizheng Zhao ◽

Yuzhong Chen ◽

Chen Dong

Keyword(s):

Community Detection ◽

Real World ◽

Detection Algorithm ◽

Overlapping Community Detection ◽

Overlapping Communities ◽

Adjustment Strategy ◽

Topology Information ◽

Overlapping Community ◽

Real World Datasets ◽

Community Detection Algorithm

There is a wealth of information in real-world social networks. In addition to the topology information, the vertices or edges of a social network often have attributes, with many of the overlapping vertices belonging to several communities simultaneously. It is challenging to fully utilize the additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to automatically determine the number of communities by a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters compared to the baseline methods.
