Efficient Heuristic Hypothesis Ranking

1999 ◽  
Vol 10 ◽  
pp. 375-397 ◽  
Author(s):  
S. Chien ◽  
A. Stechert ◽  
D. Mutz

This paper considers the problem of learning the ranking of a set of stochastic alternatives based upon incomplete information (i.e., a limited number of samples). We describe a system that, at each decision cycle, either outputs a complete ordering on the hypotheses or decides to gather additional information (i.e., observations) at some cost. The ranking problem is a generalization of the previously studied hypothesis selection problem: in selection, an algorithm must select the single best hypothesis, while in ranking, an algorithm must order all the hypotheses. The central problem we address is achieving the desired ranking quality while minimizing the cost of acquiring additional samples. We describe two algorithms for hypothesis ranking and their application to the probably approximately correct (PAC) and expected loss (EL) learning criteria. Empirical results are provided to demonstrate the effectiveness of these ranking procedures on both synthetic and real-world datasets.
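The decision cycle described above (either output an ordering or pay for more samples) can be illustrated with a minimal sketch. This is not one of the paper's two algorithms: it is a simplified PAC-style stopping rule that assumes bounded observations and uses a Hoeffding interval with a union bound over hypotheses, while ignoring the correction needed for checking the rule repeatedly. The names `rank_with_sampling` and `bernoulli`, and all parameter defaults, are hypothetical.

```python
import math
import random


def bernoulli(p, rng):
    """Hypothetical helper: a sampler that returns 1.0 with probability p."""
    return lambda: 1.0 if rng.random() < p else 0.0


def rank_with_sampling(samplers, delta=0.05, epsilon=0.1, batch=20,
                       max_rounds=200, value_range=1.0):
    """Rank stochastic alternatives by sample mean.

    Each cycle draws `batch` more observations per hypothesis, then stops
    if every adjacent pair in the empirical ranking is either confidently
    separated or close enough that a swap would cost at most `epsilon`.
    Returns (ranking as a list of indices, samples drawn per hypothesis).
    """
    k = len(samplers)
    sums = [0.0] * k
    n = 0
    order = list(range(k))
    for _ in range(max_rounds):
        for i, draw in enumerate(samplers):
            sums[i] += sum(draw() for _ in range(batch))
        n += batch
        means = [s / n for s in sums]
        order = sorted(range(k), key=lambda i: means[i], reverse=True)
        # Hoeffding half-width, with failure probability delta split over k.
        half_width = value_range * math.sqrt(math.log(2 * k / delta) / (2 * n))
        settled = all(
            means[a] - means[b] >= 2 * half_width or 2 * half_width <= epsilon
            for a, b in zip(order, order[1:])
        )
        if settled:
            break  # output the ordering instead of buying more samples
    return order, n
```

A caller would pass one sampler per hypothesis, for example `rank_with_sampling([bernoulli(p, random.Random(0)) for p in (0.9, 0.5, 0.1)])`. The returned sample count gives a rough sense of how the cost grows as the hypotheses become harder to separate.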

Author(s):  
Chenwei Cai ◽  
Ruining He ◽  
Julian McAuley

Dealing with sparse, long-tailed datasets, and cold-start problems is always a challenge for recommender systems. These issues can partly be dealt with by making predictions not in isolation, but by leveraging information from related events; such information could include signals from social relationships or from the sequence of recent activities. Both types of additional information can be used to improve the performance of state-of-the-art matrix factorization-based techniques. In this paper, we propose new methods to combine both social and sequential information simultaneously, in order to further improve recommendation performance. We show these techniques to be particularly effective when dealing with sparsity and cold-start issues in several large, real-world datasets.


Author(s):  
Ali Seman ◽  
Azizian Mohd Sapawi

The k-AMH algorithm has been proven efficient in clustering categorical datasets. It can also be used to cluster numerical values with minimum modification to the original algorithm. In this paper, we present two algorithms that extend the k-AMH algorithm to the clustering of numerical values. The original k-AMH algorithm for categorical values uses a simple matching dissimilarity measure, but for numerical values it uses Euclidean distance. The first extension to the k-AMH algorithm, denoted k-AMH Numeric I, enables it to cluster numerical values in a fashion similar to k-AMH for categorical data. The second extension, k-AMH Numeric II, adopts the cost function of the fuzzy k-Means algorithm together with Euclidean distance, and has demonstrated performance similar to that of k-AMH Numeric I. The clustering performance of the two algorithms was evaluated on six real-world datasets against a benchmark algorithm, the fuzzy k-Means algorithm. The results obtained indicate that the two algorithms are as efficient as the fuzzy k-Means algorithm when clustering numerical values. Further, in an ANOVA test, k-AMH Numeric I obtained the highest accuracy score of 0.69 for the six datasets combined, with a p-value less than 0.01, indicating a 95% confidence level. The experimental results prove that the k-AMH Numeric I and k-AMH Numeric II algorithms can be effectively used for numerical clustering. The significance of this study lies in that the k-AMH numeric algorithms have been demonstrated as potential solutions for clustering numerical objects.
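The two dissimilarity measures named in this abstract can be written out as a short sketch. Only the standard textbook definitions are shown; how k-AMH itself combines them with its cost function is not described here and is not reproduced. The function names are hypothetical.

```python
import math


def simple_matching(a, b):
    """Simple matching dissimilarity: the number of attributes that differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


def euclidean(a, b):
    """Euclidean distance between two numerical objects."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For instance, `simple_matching(("red", "small"), ("red", "large"))` counts one mismatching attribute, while `euclidean((0, 0), (3, 4))` measures geometric distance between numerical points.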


2022 ◽  
Vol 16 (2) ◽  
pp. 1-34
Author(s):  
Arpita Biswas ◽  
Gourab K. Patro ◽  
Niloy Ganguly ◽  
Krishna P. Gummadi ◽  
Abhijnan Chakraborty

Many online platforms today (such as Amazon, Netflix, Spotify, LinkedIn, and AirBnB) can be thought of as two-sided markets with producers and customers of goods and services. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reinforces the fact that such customer-centric design of these services may lead to unfair distribution of exposure to the producers, which may adversely impact their well-being. Conversely, a pure producer-centric design might become unfair to the customers. As more and more people are depending on such platforms to earn a living, it is important to ensure fairness to both producers and customers. In this work, by mapping a fair personalized recommendation problem to a constrained version of the problem of fairly allocating indivisible goods, we propose to provide fairness guarantees for both sides. Formally, our proposed FairRec algorithm guarantees Maxi-Min Share of exposure for the producers, and Envy-Free up to One Item fairness for the customers. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in overall recommendation quality. Finally, we present a modification of FairRec (named FairRecPlus) that, at the cost of additional computation time, improves the recommendation performance for the customers while maintaining the same fairness guarantees.
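The Envy-Free up to One Item (EF1) notion mentioned above can be made concrete with a small sketch. This is not FairRec: it uses plain round-robin picking, a standard procedure known to yield EF1 allocations when valuations are additive, and it ignores the producer-side exposure guarantee entirely. The function names and the `values[agent][item]` layout are assumptions made for illustration.

```python
def round_robin(values, picks_per_agent):
    """Allocate indivisible items by letting agents take turns picking
    their most valued remaining item. values[agent][item] is additive."""
    n_agents = len(values)
    remaining = set(range(len(values[0])))
    bundles = [[] for _ in range(n_agents)]
    for _ in range(picks_per_agent):
        for agent in range(n_agents):
            if not remaining:
                return bundles
            # Break ties toward the lowest item index for determinism.
            best = max(remaining, key=lambda item: (values[agent][item], -item))
            bundles[agent].append(best)
            remaining.remove(best)
    return bundles


def is_ef1(values, bundles):
    """Check EF1: no agent envies another once the single item it values
    most is removed from the other agent's bundle."""
    for i, own in enumerate(bundles):
        own_value = sum(values[i][g] for g in own)
        for j, other in enumerate(bundles):
            if i == j or not other:
                continue
            other_value = sum(values[i][g] for g in other)
            best_removed = max(values[i][g] for g in other)
            if own_value < other_value - best_removed:
                return False
    return True
```

Here an "agent" plays the role of a customer and an "item" the role of a recommended product; the checker can be applied to any allocation, not only one produced by `round_robin`.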


Author(s):  
Rui Liu ◽  
Tianyi Wu ◽  
Barzan Mozafari

There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of subsequent queries is not sufficiently large to amortize the cost. Furthermore, existing methods do not have the ability to directly control the suboptimality of their approximate results with theoretical guarantees. In this paper, we propose the first approximate algorithm for MIPS that does not require any preprocessing, and allows users to control and bound the suboptimality of the results. We cast MIPS as a Best Arm Identification problem, and introduce a new bandit setting that can fully exploit the special structure of MIPS. Our approach outperforms state-of-the-art methods on both synthetic and real-world datasets.
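The idea of casting MIPS as Best Arm Identification can be sketched as follows: each candidate vector is an arm, and "pulling" arms means evaluating a few randomly chosen coordinates of the inner product rather than all of them. The sketch below substitutes plain successive halving for the paper's bandit setting, which is not specified in this abstract, and it carries no suboptimality guarantee. The name `mips_successive_halving` and its parameters are hypothetical.

```python
import random


def mips_successive_halving(query, candidates, coords_per_round=4, seed=0):
    """Return the index of the candidate with the largest estimated inner
    product with `query`, without any preprocessing of `candidates`.

    Coordinates are sampled without replacement in a shared random order,
    so partial sums of surviving candidates are directly comparable.
    """
    rng = random.Random(seed)
    dim = len(query)
    coords = list(range(dim))
    rng.shuffle(coords)
    alive = list(range(len(candidates)))
    partial = {i: 0.0 for i in alive}
    used = 0
    while len(alive) > 1 and used < dim:
        batch = coords[used:used + coords_per_round]
        used += len(batch)
        for i in alive:
            partial[i] += sum(candidates[i][j] * query[j] for j in batch)
        # Keep the better half of the arms according to current estimates.
        alive.sort(key=lambda i: partial[i], reverse=True)
        alive = alive[:max(1, len(alive) // 2)]
    # If all coordinates were used, the partial sums are exact.
    return max(alive, key=lambda i: partial[i])
```

The cost saving comes from discarding weak candidates before all of their coordinates have been read; the price is that an unlucky early sample can eliminate the true maximizer, which is the kind of error a principled bandit analysis would bound.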


Author(s):  
Marcus Shaker ◽  
Edmond S. Chan ◽  
Jennifer LP. Protudjer ◽  
Lianne Soller ◽  
Elissa M. Abrams ◽  
...  

2021 ◽  
Vol 21 (3) ◽  
pp. 1-17
Author(s):  
Wu Chen ◽  
Yong Yu ◽  
Keke Gai ◽  
Jiamou Liu ◽  
Kim-Kwang Raymond Choo

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).


2021 ◽  
Vol 11 (11) ◽  
pp. 5043
Author(s):  
Xi Chen ◽  
Bo Kang ◽  
Jefrey Lijffijt ◽  
Tijl De Bie

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.


Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to effectively find outliers in a timely manner on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records, and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.


2021 ◽  
Vol 17 (4) ◽  
pp. 1-30
Author(s):  
Qiben Yan ◽  
Jianzhi Lou ◽  
Mehmet C. Vuran ◽  
Suat Irmak

Precision agriculture has become a promising paradigm to transform modern agriculture. The recent revolution in big data and Internet-of-Things (IoT) provides unprecedented benefits including optimizing yield, minimizing environmental impact, and reducing cost. However, the mass collection of farm data in IoT applications raises serious concerns about potential privacy leakage that may harm the farmers’ welfare. In this work, we propose a novel scalable and private geo-distance evaluation system, called SPRIDE, to allow application servers to provide geographic-based services by computing the distances among sensors and farms privately. The servers determine the distances without learning any additional information about their locations. The key idea of SPRIDE is to perform efficient distance measurement and distance comparison on encrypted locations over a sphere by leveraging a homomorphic cryptosystem. To serve a large user base, we further propose SPRIDE+ with novel and practical performance enhancements based on pre-computation of cryptographic elements. Through extensive experiments using real-world datasets, we show SPRIDE+ achieves private distance evaluation on a large network of farms, attaining 3+ times runtime performance improvement over existing techniques. We further show SPRIDE+ can run on resource-constrained mobile devices, which offers a practical solution for privacy-preserving precision agriculture IoT applications.
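For orientation, the quantity being evaluated privately is a distance between two locations on a sphere. The sketch below computes it in plaintext with the standard haversine formula; it involves no encryption and is not SPRIDE's protocol, whose homomorphic computation is not detailed in this abstract. The function name and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to [0, 1] to guard against floating-point drift.
    a = min(1.0, max(0.0, a))
    return 2 * radius * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

A privacy-preserving system would need to obtain this value, or a comparison between two such values, without the server seeing the coordinates in the clear.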


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 680
Author(s):  
Hanyang Lin ◽  
Yongzhao Zhan ◽  
Zizheng Zhao ◽  
Yuzhong Chen ◽  
Chen Dong

There is a wealth of information in real-world social networks. In addition to the topology information, the vertices or edges of a social network often have attributes, with many of the overlapping vertices belonging to several communities simultaneously. It is challenging to fully utilize the additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to automatically determine the number of communities by a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters compared to the baseline methods.

