Enhancing Recommender Diversity Using Gaussian Cloud Transformation

The recommender systems community is paying great attention to diversity as a key quality beyond accuracy in real recommendation scenarios. Many diversity-increasing approaches have been developed in the literature to enhance recommendation diversity while still making personalized recommendations to users. In this work, we present the Gaussian Cloud Recommendation Algorithm (GCRA), a novel method designed to balance accuracy and diversity in personalized top-N recommendation lists in order to capture the user's complete spectrum of tastes. Our proposed algorithm does not require semantic information. We also propose a unified framework that extends traditional CF algorithms with GCRA to improve recommender system performance. Our work builds upon prior research on recommender systems. Although it is detrimental to average accuracy, we show that our method can capture the user's complete spectrum of interests. Systematic experiments on three real-world data sets demonstrate the effectiveness of our proposed approach in achieving both accuracy and diversity.

AANMF: Attribute-Aware Attentional Neural Matrix Factorization

Information Technology And Control ◽

10.5755/j01.itc.48.4.23149 ◽

2019 ◽

Vol 48 (4) ◽

pp. 682-693

Author(s):

Bo Zheng ◽

Jinsong Hu

Keyword(s):

Matrix Factorization ◽

Recommendation System ◽

Auxiliary Information ◽

Inner Product ◽

Data Sets ◽

It Projects ◽

Real World Data ◽

Latent Space ◽

Almost All ◽

Novel Model

Matrix Factorization (MF) is one of the most intuitive and effective methods in the Recommendation System domain. It projects sparse (user, item) interactions into dense feature products, which endows the MF model with strong generality. To leverage this interaction, recent works use auxiliary information of users and items. Despite their effectiveness, these methods remain poorly justified, since almost all of them simply add the auxiliary-information feature in dense latent space to the feature of the user or item. In this work, we propose a novel model named AANMF, short for Attribute-aware Attentional Neural Matrix Factorization. AANMF combines two main parts: a neural-network-based factorization architecture for modeling the inner product, and an attention-mechanism-based attribute processing cell for attribute handling. Extensive experiments on two real-world data sets demonstrate the robust and superior performance of our model. Notably, we show that our model can deal with the attributes of a user or item more reasonably. Our implementation of AANMF is publicly available at https://github.com/Holy-Shine/AANMF.
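
As a minimal sketch of the plain MF scoring that this abstract starts from (not the AANMF architecture itself), the predicted interaction is the inner product of a user's and an item's latent vectors. The factor matrices, their sizes, and the function name `predict` are illustrative assumptions and are not taken from the AANMF repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 3  # hypothetical sizes, chosen for illustration

P = rng.normal(size=(n_users, k))  # one dense latent vector per user
Q = rng.normal(size=(n_items, k))  # one dense latent vector per item


def predict(user: int, item: int) -> float:
    """Score one (user, item) pair as the inner product of their latent vectors."""
    return float(P[user] @ Q[item])


# Scores for every (user, item) pair at once: a dense n_users x n_items matrix.
scores = P @ Q.T
```

In a trained model `P` and `Q` would be learned from the observed interactions; here they are random placeholders so that the scoring step can be read in isolation.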

Parametric Rough Sets with Application to Granular Association Rule Mining

Mathematical Problems in Engineering ◽

10.1155/2013/461363 ◽

2013 ◽

Vol 2013 ◽

pp. 1-13 ◽

Cited By ~ 5

Author(s):

Xu He ◽

Fan Min ◽

William Zhu

Keyword(s):

Recommender Systems ◽

Association Rule ◽

Rough Sets ◽

Relational Databases ◽

Data Sets ◽

Rule Mining ◽

Real World Data ◽

New Model ◽

New Type ◽

Two Universes

Granular association rules reveal patterns hidden in many-to-many relationships which are common in relational databases. In recommender systems, these rules are appropriate for cold-start recommendation, where a customer or a product has just entered the system. An example of such rules might be “40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol.” Mining such rules is a challenging problem due to pattern explosion. In this paper, we build a new type of parametric rough sets on two universes and propose an efficient rule mining algorithm based on the new model. Specifically, the model is deliberately defined such that the parameter corresponds to one threshold of rules. The algorithm benefits from the lower approximation operator in the new model. Experiments on two real-world data sets show that the new algorithm is significantly faster than an existing algorithm, and the performance of recommender systems is stable.

Something’s Missing? A Procedure for Extending Item Content Data Sets in the Context of Recommender Systems

Information Systems Frontiers ◽

10.1007/s10796-020-10071-y ◽

2020 ◽

Author(s):

Bernd Heinrich ◽

Marcus Hopf ◽

Daniel Lohninger ◽

Alexander Schiller ◽

Michael Szubartowicz

Keyword(s):

Recommender Systems ◽

Additional Data ◽

Rapid Development ◽

Data Sets ◽

Web Portals ◽

Real World Data ◽

Data Set ◽

Item Content ◽

Choice Literature

The rapid development of e-commerce has led to a swiftly increasing number of competing providers in electronic markets, each maintaining its own data describing the offered items. Recommender systems are popular and powerful tools that rely on this data to guide users to their individually best item choice. The literature suggests that the data quality of item content data has a substantial influence on recommendation quality, with the completeness dimension expected to be particularly important. This offers a considerable opportunity to improve recommendation quality by increasing completeness, namely by extending an item content data set with an additional data set from the same domain. This paper therefore proposes a procedure for such a systematic data extension and analyzes the effects of the procedure regarding items, content and users based on real-world data sets from four leading web portals. The evaluation results suggest that the proposed procedure is indeed effective in enabling improved recommendation quality.

Partial Multi-Label Learning with Noisy Label Identification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6117 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6454-6461 ◽

Cited By ~ 1

Author(s):

Ming-Kun Xie ◽

Sheng-Jun Huang

Keyword(s):

Ground Truth ◽

Noise Model ◽

Learning Approach ◽

Data Sets ◽

Trace Norm ◽

Unified Framework ◽

Real World Data ◽

Ground Truth Information ◽

Label Correlation ◽

Noisy Labels

Partial multi-label learning (PML) deals with problems where each instance is assigned a candidate label set, which contains multiple relevant labels and some noisy labels. Recent studies usually solve PML problems with the disambiguation strategy, which recovers ground-truth labels from the candidate label set by simply assuming that the noisy labels are generated randomly. In real applications, however, noisy labels are usually caused by ambiguous content in the example. Based on this observation, we propose a partial multi-label learning approach to simultaneously recover the ground-truth information and identify the noisy labels. The two objectives are formalized in a unified framework with trace norm and ℓ1 norm regularizers. Under the supervision of the observed noise-corrupted label matrix, the multi-label classifier and noisy label identifier are jointly optimized by incorporating label correlation exploitation and a feature-induced noise model. Extensive experiments on synthetic as well as real-world data sets validate the effectiveness of the proposed approach.
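
The two regularizers named in this abstract can be computed as in the sketch below. It only illustrates the norms themselves: the trace (nuclear) norm is the sum of singular values and is commonly used to favor low-rank structure, while the ℓ1 norm is the sum of absolute entries and is commonly used to favor sparsity. The function names and the toy matrices are assumptions, and the paper's actual objective and optimizer are not reproduced here.

```python
import numpy as np


def trace_norm(M: np.ndarray) -> float:
    """Trace (nuclear) norm: the sum of the singular values of M."""
    return float(np.linalg.svd(M, compute_uv=False).sum())


def l1_norm(M: np.ndarray) -> float:
    """Entrywise l1 norm: the sum of the absolute values of all entries of M."""
    return float(np.abs(M).sum())


# Toy matrices (hypothetical): a rank-1 matrix and a sparse one.
low_rank = np.outer([1.0, 2.0], [3.0, 4.0])   # rank 1, so a single nonzero singular value
sparse = np.array([[0.0, -2.0], [0.0, 1.0]])  # only two nonzero entries
```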

A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data

BMC Bioinformatics ◽

10.1186/s12859-021-04303-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Audrey Hulot ◽

Denis Laloë ◽

Florence Jaffrézic

Keyword(s):

Breast Cancer ◽

Simple Procedure ◽

Data Sets ◽

Integration Step ◽

Visual Interpretation ◽

Data Types ◽

Unified Framework ◽

Real World Data ◽

Multiple Factor Analysis ◽

Data Set

Background: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is that of the integration of such representations. Results: To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation form and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis to the multiple tables of the new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question. Conclusion: Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a transcriptomics single-cell data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
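
The two steps described in this abstract can be sketched roughly as follows, assuming each tree or network has already been turned into a pairwise distance matrix. The sketch uses textbook classical multidimensional scaling and a simplified multiple factor analysis for quantitative tables (each centered table is scaled by its first singular value, then a global PCA is run). The function names, the toy data and these simplifications are assumptions, not the authors' implementation.

```python
import numpy as np


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Projection step: coordinates in k dimensions from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]        # indices of the k largest eigenvalues
    vals = np.clip(vals[order], 0.0, None)    # guard against tiny negative values
    return vecs[:, order] * np.sqrt(vals)


def mfa_scores(tables: list, k: int = 2) -> np.ndarray:
    """Integration step: simplified MFA giving global coordinates of the rows."""
    blocks = []
    for X in tables:
        Xc = X - X.mean(axis=0)                           # center each table
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]       # first singular value
        blocks.append(Xc / s1)                            # balance the tables
    Z = np.hstack(blocks)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)       # global PCA
    return U[:, :k] * S[:k]


# Toy example (hypothetical): two noisy views of the same six entities.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
D1 = pairwise_distances(points)
D2 = pairwise_distances(points + 0.1 * rng.normal(size=(6, 2)))

coords1, coords2 = classical_mds(D1), classical_mds(D2)
compromise = mfa_scores([coords1, coords2])
```

Because the first view is exactly two-dimensional and Euclidean, `coords1` reproduces the distances in `D1` up to rotation; for distances derived from real trees or networks the embedding is only an approximation.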

Penalty-Enhanced Utility-Based Multi-Criteria Recommendations

Information ◽

10.3390/info11120551 ◽

2020 ◽

Vol 11 (12) ◽

pp. 551

Author(s):

Yong Zheng

Keyword(s):

Decision Making ◽

Recommender Systems ◽

Real World ◽

User Preferences ◽

Experimental Results ◽

Data Sets ◽

Real World Data ◽

Promising Solution ◽

Multiple Domains

Recommender systems have been successfully applied to assist decision making in multiple domains and applications. Multi-criteria recommender systems take user preferences on multiple criteria into consideration in order to further improve the quality of the recommendations. Most recently, the utility-based multi-criteria recommendation approach has been proposed as an effective and promising solution. However, the issue of over-/under-expectations was ignored in that approach, which may bring risks to the recommendation model. In this paper, we propose a penalty-enhanced model to alleviate this issue. Our experimental results on multiple real-world data sets demonstrate the effectiveness of the proposed solutions. In addition, the outcomes of the proposed solution can help explain the characteristics of the applications by observing how the issue of over-/under-expectations is treated.

TrustSVD: A Novel Trust-Based Matrix Factorization Model with User Trust and Item Ratings

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i11.422 ◽

2017 ◽

Vol 7 (11) ◽

pp. 7 ◽

Cited By ~ 1

Author(s):

K Sobha Rani

Keyword(s):

Matrix Factorization ◽

Social Trust ◽

State Of The Art ◽

Data Sets ◽

Real World Data ◽

Recommendation Algorithm ◽

Active User ◽

Factorization Model ◽

The Social ◽

Matrix Factorization Technique

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of a state-of-the-art recommendation algorithm, SVD++, which inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach TrustSVD achieves better accuracy than ten other counterparts, and can better handle the concerned issues.
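
A rough sketch of the kind of prediction rule this abstract describes: the SVD++ score (global mean, user and item biases, and an item factor multiplied by the user factor plus a normalized sum of implicit factors of rated items) with one extra normalized sum over the factors of trusted users. This is our reading of the abstract, written under that assumption; the function name, argument names and toy numbers are hypothetical, and training is not shown.

```python
import numpy as np


def predict(mu, b_u, b_i, q_i, p_u, Y_rated, W_trusted):
    """Predicted rating of an item for the active user.

    Y_rated   : implicit factors of the items the user has rated, shape (n_rated, k)
    W_trusted : implicit factors of the users the user trusts, shape (n_trusted, k)
    An empty W_trusted reduces the rule to plain SVD++.
    """
    implicit = Y_rated.sum(axis=0) / np.sqrt(len(Y_rated)) if len(Y_rated) else 0.0
    trust = W_trusted.sum(axis=0) / np.sqrt(len(W_trusted)) if len(W_trusted) else 0.0
    return float(mu + b_u + b_i + q_i @ (p_u + implicit + trust))


# Toy values (hypothetical), with k = 2 latent dimensions.
q_i = np.array([1.0, 0.0])
p_u = np.array([0.5, 0.0])
Y_rated = np.tile([1.0, 0.0], (4, 1))   # four rated items
W_trusted = np.array([[1.0, 0.0]])      # one trusted user

with_trust = predict(3.0, 0.1, -0.1, q_i, p_u, Y_rated, W_trusted)
without_trust = predict(3.0, 0.1, -0.1, q_i, p_u, Y_rated, np.empty((0, 2)))
```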

Hfinger: Malware HTTP Request Fingerprinting

Entropy ◽

10.3390/e23050507 ◽

2021 ◽

Vol 23 (5) ◽

pp. 507

Author(s):

Piotr Białczak ◽

Wojciech Mazurczyk

Keyword(s):

Real World ◽

Network Traffic ◽

Experimental Evaluation ◽

Data Sets ◽

Real World Data ◽

Malicious Software ◽

Default Mode ◽

World Data ◽

Effectiveness Analysis ◽

Http Protocol

Malicious software utilizes the HTTP protocol for communication purposes, creating network traffic that is hard to identify as it blends into the traffic generated by benign applications. To this end, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, currently existing tools do not analyze all information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from parts of the request such as the URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. For the developed solution, we have performed an extensive experimental evaluation using real-world data sets, and we also compared Hfinger with the most related and popular existing tools such as FATT, Mercury, and p0f. The conducted effectiveness analysis reveals that on average only 1.85% of requests fingerprinted by Hfinger collide between malware families, which is 8–34 times lower than for existing tools. Moreover, unlike these tools, in default mode Hfinger does not introduce collisions between malware and benign applications, and it achieves this by increasing the number of fingerprints by at most a factor of 3. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.

Learning emotional word embeddings for sentiment analysis

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201993 ◽

2021 ◽

pp. 1-13

Author(s):

Qingtian Zeng ◽

Xishi Zhao ◽

Xiaohui Hu ◽

Hua Duan ◽

Zhongying Zhao ◽

...

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

State Of The Art ◽

Research Problem ◽

Emotional Word ◽

Classification Model ◽

Data Sets ◽

Word Embeddings ◽

Real World Data ◽

Text Documents

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis in this paper. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier, which is based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.

Entropy Based Features Distribution for Anti-DDoS Model in SDN

Sustainability ◽

10.3390/su13031522 ◽

2021 ◽

Vol 13 (3) ◽

pp. 1522

Author(s):

Raja Majid Ali Ujjan ◽

Zeeshan Pervez ◽

Keshav Dahal ◽

Wajahat Ali Khan ◽

Asad Masood Khattak ◽

...

Keyword(s):

Network Security ◽

False Positive ◽

Denial Of Service ◽

Network Services ◽

Detection Accuracy ◽

Data Sets ◽

Traffic Patterns ◽

Average Accuracy ◽

Ddos Detection ◽

Quantitative Results

In modern network infrastructure, Distributed Denial of Service (DDoS) attacks are considered severe network security threats. For conventional network security tools it is extremely difficult to distinguish between the higher traffic volume of a DDoS attack and a large number of legitimate users accessing a targeted network service or resource. Although these attacks have been widely studied, few works collect and analyse truly representative characteristics of DDoS traffic. Current research mostly focuses on DDoS detection and mitigation with predefined DDoS data sets, which are often hard to generalise to various network services and legitimate users’ traffic patterns. In order to deal with considerably large DDoS traffic flows in a Software Defined Networking (SDN) environment, in this work we propose a fast and effective entropy-based DDoS detection method. We deployed a generalised entropy calculation combining Shannon and Renyi entropy to identify distributed features of DDoS traffic; this also helped the SDN controller to deal effectively with heavy malicious traffic. To lower the network traffic overhead, we collected data-plane traffic with signature-based Snort detection. We then analysed the collected traffic for entropy-based features to improve the detection accuracy of deep learning models: Stacked Auto Encoder (SAE) and Convolutional Neural Network (CNN). This work also investigated the trade-off between SAE and CNN classifiers using accuracy and false-positive results. Quantitative results demonstrated that SAE achieved a relatively higher detection accuracy of 94% with only 6% false-positive alerts, whereas the CNN classifier achieved an average accuracy of 93%.
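
The two entropy measures mentioned in this abstract can be sketched as below, applied to the empirical distribution of one traffic feature in a time window. How the paper combines them into its generalised entropy is not stated here, so the sketch computes each measure separately. The function names, the choice of destination IP as the feature and the toy windows are assumptions.

```python
import math
from collections import Counter


def probabilities(values):
    """Empirical probability of each distinct value in a window of observations."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_entropy(p):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0) in bits; alpha = 1 gives Shannon."""
    if alpha == 1:
        return shannon_entropy(p)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


# Toy windows (hypothetical): destination IPs seen in normal and attack traffic.
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"] * 2
attack_window = ["10.0.0.5"] * 7 + ["10.0.0.9"]

h_normal = shannon_entropy(probabilities(normal_window))
h_attack = shannon_entropy(probabilities(attack_window))
r_normal = renyi_entropy(probabilities(normal_window), alpha=2)
```

In this toy case the traffic concentrated on one destination has a much lower entropy than the evenly spread traffic, which is the kind of shift an entropy-based detector looks for.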
