Review Summary Generation in Online Systems: Frameworks for Supervised and Unsupervised Scenarios

In online systems, including e-commerce platforms, many users resort to the reviews or comments generated by previous consumers for decision making, while their time is limited to deal with many reviews. Therefore, a review summary, which contains all important features in user-generated reviews, is expected. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which mainly has two types of extractive and abstractive approaches. Both of these approaches can deal with both supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually can only deal with short sequences. Moreover, both approaches may neglect the sentiment information. To address the above issues, we propose comprehensive Review Summary Generation frameworks to deal with the supervised and unsupervised scenarios. We design two different preprocess models of re-ranking and selecting to identify the important sentences while keeping users’ sentiment in the original reviews. These sentences can be further used to generate review summaries with text summarization methods. Experimental results in seven real-world datasets (Idebate, Rotten Tomatoes Amazon, Yelp, and three unlabelled product review datasets in Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.

Download Full-text

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.

Download Full-text

Adaptive Double-Exploration Tradeoff for Outlier Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6164 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6837-6844

Author(s):

Xiaojin Zhang ◽

Honglei Zhuang ◽

Shengyu Zhang ◽

Yuan Zhou

Keyword(s):

Confidence Interval ◽

Outlier Detection ◽

Real World ◽

Efficient Algorithm ◽

Experimental Results ◽

Sample Complexity ◽

Bandit Problem ◽

Real World Datasets ◽

Synthetic Datasets ◽

The Individual

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.

Download Full-text

Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5329 ◽

2020 ◽

Vol 34 (01) ◽

pp. 19-26 ◽

Cited By ~ 5

Author(s):

Chong Chen ◽

Min Zhang ◽

Yongfeng Zhang ◽

Weizhi Ma ◽

Yiqun Liu ◽

...

Keyword(s):

Collaborative Filtering ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Heterogeneous Data ◽

Model Parameters ◽

Online Systems ◽

Practical Applications ◽

Real World Datasets ◽

Primary Type

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically apply the Negative Sampling (NS) strategy for efficient learning. Despite effectiveness, two important issues have not been well-considered in existing methods: 1) NS suffers from dramatic fluctuation, making sampling-based methods difficult to achieve the optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF) for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released 1 to facilitate further developments on efficient whole-data based neural methods.

Download Full-text

PROVIDING TIMELY UPDATED SEQUENTIAL PATTERNS IN DECISION MAKING

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622010004147 ◽

2010 ◽

Vol 09 (06) ◽

pp. 873-888 ◽

Cited By ~ 4

Author(s):

TZUNG-PEI HONG ◽

CHING-YAO WANG ◽

CHUN-WEI LIN

Keyword(s):

Decision Making ◽

Real Time ◽

Real World ◽

Experimental Results ◽

Sequential Patterns ◽

The Past ◽

Large Databases ◽

Real World Applications ◽

Critical Task ◽

Maintenance Process

Mining knowledge from large databases has become a critical task for organizations. Managers commonly use the obtained sequential patterns to make decisions. In the past, databases were usually assumed to be static. In real-world applications, however, transactions may be updated. In this paper, a maintenance algorithm for rapidly updating sequential patterns for real-time decision making is proposed. The proposed algorithm utilizes previously discovered large sequences in the maintenance process, thus greatly reducing the number of database rescans and improving performance. Experimental results verify the performance of the proposed approach. The proposed algorithm provides real-time knowledge that can be used for decision making.

Download Full-text

SOCIAL INTEREST FOR USER SELECTING ITEMS IN RECOMMENDER SYSTEMS

International Journal of Modern Physics C ◽

10.1142/s0129183113500228 ◽

2013 ◽

Vol 24 (04) ◽

pp. 1350022 ◽

Cited By ~ 7

Author(s):

DA-CHENG NIE ◽

MING-JING DING ◽

YAN FU ◽

JUN-LIN ZHOU ◽

ZI-KE ZHANG

Keyword(s):

Recommender Systems ◽

Real World ◽

Social Interest ◽

Experimental Results ◽

Simple Method ◽

The Social ◽

Social Interests ◽

Similarity Computation ◽

Real World Datasets

Recommender systems have developed rapidly and successfully. The system aims to help users find relevant items from a potentially overwhelming set of choices. However, most of the existing recommender algorithms focused on the traditional user-item similarity computation, other than incorporating the social interests into the recommender systems. As we know, each user has their own preference field, they may influence their friends' preference in their expert field when considering the social interest on their friends' item collecting. In order to model this social interest, in this paper, we proposed a simple method to compute users' social interest on the specific items in the recommender systems, and then integrate this social interest with similarity preference. The experimental results on two real-world datasets Epinions and Friendfeed show that this method can significantly improve not only the algorithmic precision-accuracy but also the diversity-accuracy.

Download Full-text

The Finding and Dynamic Detection of Opinion Leaders in Social Network

Mathematical Problems in Engineering ◽

10.1155/2014/328407 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Baocheng Huang ◽

Guang Yu ◽

Hamid Reza Karimi

Keyword(s):

Social Network ◽

Real World ◽

Structural Characteristics ◽

Detection Algorithm ◽

Experimental Results ◽

Opinion Leaders ◽

Data Sources ◽

The Real ◽

Data Source ◽

Different Characteristics

It is valuable for the real world to find the opinion leaders. Because different data sources usually have different characteristics, there does not exist a standard algorithm to find and detect the opinion leaders in different data sources. Every data source has its own structural characteristics, and also has its own detection algorithm to find the opinion leaders. Experimental results show the opinion leaders and theirs characteristics can be found among the comments from the Weibo social network of China, which is like Facebook or Twitter in USA.

Download Full-text

Boosting medical diagnostics by pooling independent judgments

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1601827113 ◽

2016 ◽

Vol 113 (31) ◽

pp. 8777-8782 ◽

Cited By ~ 50

Author(s):

Ralf H. J. M. Kurvers ◽

Stefan M. Herzog ◽

Ralph Hertwig ◽

Jens Krause ◽

Patricia A. Carney ◽

...

Keyword(s):

Decision Making ◽

Diagnostic Accuracy ◽

Real World ◽

Collective Intelligence ◽

Medical Diagnostics ◽

Wide Range ◽

Key Factor ◽

Systematic Effects ◽

Real World Datasets ◽

Skin Cancer Detection

Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches.

Download Full-text

Self-weighted Multiview Clustering with Multiple Graphs

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/357 ◽

2017 ◽

Cited By ~ 44

Author(s):

Feiping Nie ◽

Jing Li ◽

Xuelong Li

Keyword(s):

Real World ◽

Spectral Clustering ◽

Experimental Results ◽

Clustering Method ◽

Elegant Method ◽

Multiview Learning ◽

Cluster Label ◽

Real World Datasets ◽

Synthetic Datasets ◽

Multiview Clustering

In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for multiview clustering task, a wise and elegant method should achieve clustering multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximately as the centroid of the built graph for each view with different confidences. We start our work with a natural thought that the weights can be learned by introducing a hyperparameter. By analyzing the weakness of it, we further propose a new multiview clustering method which is totally self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets prove the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve the better performances and our new clustering method is more practical to use.

Download Full-text

EVALUASI KINERJA ALGORITMA ASSOCIATION RULE

BAREKENG JURNAL ILMU MATEMATIKA DAN TERAPAN ◽

10.30598/barekengvol1iss1pp38-45 ◽

2007 ◽

Vol 1 (1) ◽

pp. 38-45

Author(s):

Gysber J. Tamaela

Keyword(s):

Real World ◽

Association Rule ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Counting Strategy ◽

Reliable Performance ◽

Real World Datasets ◽

Different Characteristics ◽

The Relationship ◽

New Algorithms

Association is a technique in data mining used to identify the relationship between itemsets in a database (association rule). Some researches in association rule since the invention of AIS algorithm in 1993 have yielded several new algorithms. Some of those used artificial datasets (IBM) and claimed by the authors to have a reliable performance in finding maximal frequent itemset. But these datasets have a different characteristics from real world dataset. The goal of this research is to compare the performance of Apriori and Cut Both Ways (CBW) algorithms using 3 real world datasets. We used small and large values of minimum support thresholds as atreatment for each algorithm and datasets. As a result we find that the characteristics of datasets have a signifcant effect on the performance of Apriori and CBW. Support counting strategy, horizontal counting, showed a better performance compared to vertical intersection although candidate frequent itemsets counted was fewer.

Download Full-text

Adaptively Multi-Objective Adversarial Training for Dialogue Generation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/397 ◽

2020 ◽

Author(s):

Xuemiao Zhang ◽

Zhouxing Tan ◽

Xiaoning Zhang ◽

Yang Cao ◽

Rui Yan

Keyword(s):

Real World ◽

Optimization Problem ◽

Experimental Results ◽

Sampling Distribution ◽

Multi Objective Optimization ◽

Generation Task ◽

Multi Objective ◽

Adversarial Models ◽

Adversarial Training ◽

Real World Datasets

Naive neural dialogue generation models tend to produce repetitive and dull utterances. The promising adversarial models train the generator against a well-designed discriminator to push it to improve towards the expected direction. However, assessing dialogues requires consideration of many aspects of linguistics, which are difficult to be fully covered by a single discriminator. To address it, we reframe the dialogue generation task as a multi-objective optimization problem and propose a novel adversarial dialogue generation framework with multiple discriminators that excel in different objectives for multiple linguistic aspects, called AMPGAN, whose feasibility is proved by theoretical derivations. Moreover, we design an adaptively adjusted sampling distribution to balance the discriminators and promote the overall improvement of the generator by continuing to focus on these objectives that the generator is not performing well relatively. Experimental results on two real-world datasets show a significant improvement over the baselines.

Download Full-text