Detecting Anomalous Ratings in Collaborative Filtering Recommender Systems

Online rating data is ubiquitous on popular e-commerce websites such as Amazon and Yelp, and it deeply influences subsequent customers' choices of the products offered by e-businesses. Collaborative filtering recommender systems (CFRSs) play a crucial role in rating systems. Because CFRSs are highly vulnerable to “shilling” attacks, attackers commonly contaminate rating systems with malicious ratings to achieve their goals. Although methods for detecting such attacks have received much attention, the problem of detection accuracy remains largely unsolved, and few methods can scale up to handle large networks. This paper proposes a fast and effective two-stage method for finding abnormal users. First, a graph mining method automatically spots suspicious nodes in a constructed graph with millions of nodes. Then, abnormal users are determined by exploiting suspected target items identified from the results of the first stage. Experiments evaluate the effectiveness of the method.
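The abstract does not say how the second stage picks suspected target items, so the sketch below is only an illustration under a loudly stated assumption: a "push" attack, in which target items are the ones that first-stage suspicious users most often give the maximum rating. The functions `suspected_target_items` and `flag_abnormal_users`, the parameters `top_k` and `min_hits`, and the toy ratings are all hypothetical.

```python
from collections import Counter

# HYPOTHETICAL second stage. Assumption: target items are the items that
# first-stage suspicious users most often rate with max_rating ("push" attack).

def suspected_target_items(ratings, suspicious_users, max_rating=5, top_k=2):
    """ratings: dict mapping user -> {item: rating}."""
    counts = Counter()
    for user in suspicious_users:
        for item, rating in ratings.get(user, {}).items():
            if rating == max_rating:
                counts[item] += 1
    return {item for item, _ in counts.most_common(top_k)}


def flag_abnormal_users(ratings, targets, max_rating=5, min_hits=2):
    """Flag every user who gives max_rating to at least min_hits target items."""
    flagged = set()
    for user, user_ratings in ratings.items():
        hits = sum(1 for item in targets if user_ratings.get(item) == max_rating)
        if hits >= min_hits:
            flagged.add(user)
    return flagged


# Toy data invented for illustration: u1 and u2 are the stage-1 output.
ratings = {
    "u1": {"i1": 5, "i9": 5, "i2": 3},
    "u2": {"i1": 5, "i9": 5},
    "u3": {"i1": 2, "i2": 4},
    "u4": {"i1": 5, "i9": 5, "i3": 1},
}
targets = suspected_target_items(ratings, {"u1", "u2"})
flagged = flag_abnormal_users(ratings, targets)
```

Here `u4` was not among the stage-1 suspects but shares their rating pattern on the suspected targets, so the second pass catches it; that recall gain is the motivation for a second stage.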

Detecting Shilling Attacks with Automatic Features from Multiple Views

Security and Communication Networks ◽

10.1155/2019/6523183 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Yaojun Hao ◽

Fuzhi Zhang ◽

Jian Wang ◽

Qingshan Zhao ◽

Jianfang Cao

Keyword(s):

Principal Component Analysis ◽

Recommender Systems ◽

Detection Method ◽

Principal Component ◽

Component Analysis ◽

Experimental Results ◽

Detection Methods ◽

Detection Accuracy ◽

Multiple Views ◽

Fine Grained

Due to the openness of the recommender systems, the attackers are likely to inject a large number of fake profiles to bias the prediction of such systems. The traditional detection methods mainly rely on the artificial features, which are often extracted from one kind of user-generated information. In these methods, fine-grained interactions between users and items cannot be captured comprehensively, leading to the degradation of detection accuracy under various types of attacks. In this paper, we propose an ensemble detection method based on the automatic features extracted from multiple views. Firstly, to collaboratively discover the shilling profiles, the users’ behaviors are analyzed from multiple views including ratings, item popularity, and user-user graph. Secondly, based on the data preprocessed from multiple views, the stacked denoising autoencoders are used to automatically extract user features with different corruption rates. Moreover, the features extracted from multiple views are effectively combined based on principal component analysis. Finally, according to the features extracted with different corruption rates, the weak classifiers are generated and then integrated to detect attacks. The experimental results on the MovieLens, Netflix, and Amazon datasets indicate that the proposed method can effectively detect various attacks.
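Denoising autoencoders are trained on deliberately corrupted inputs, and the abstract uses several corruption rates to obtain different feature sets and hence different weak classifiers. The sketch below shows only the corruption step and one possible integration rule; the masking-noise corruption type, the rates 0.1/0.3/0.5, the toy profile, and majority voting as the integration method are assumptions, not details confirmed by the paper.

```python
import random
from collections import Counter


def mask_corrupt(vector, rate, rng):
    """Masking noise: zero out round(rate * len(vector)) randomly chosen entries."""
    n_corrupt = round(rate * len(vector))
    corrupted = list(vector)
    for idx in rng.sample(range(len(vector)), n_corrupt):
        corrupted[idx] = 0.0
    return corrupted


def majority_vote(labels):
    """Assumed integration rule: simple majority over weak-classifier labels
    (1 = attack profile, 0 = genuine profile)."""
    return Counter(labels).most_common(1)[0][0]


# Made-up normalized user profile with no zeros, so zero count == corrupted count.
profile = [0.8, 1.0, 0.6, 0.4, 0.2, 1.0, 0.8, 0.6, 0.4, 1.0]
rng = random.Random(42)
corruption_rates = [0.1, 0.3, 0.5]  # hypothetical rates, one weak classifier each
corrupted_views = {rate: mask_corrupt(profile, rate, rng) for rate in corruption_rates}
```

Using an odd number of weak classifiers avoids ties in the vote.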

UARR: A Novel Similarity Measure for Collaborative Filtering Recommendation

Cybernetics and Information Technologies ◽

10.2478/cait-2013-0043 ◽

2013 ◽

Vol 13 (Special-Issue) ◽

pp. 122-130

Author(s):

Yue Huang ◽

Xuedong Gao ◽

Shujuan Gu

Keyword(s):

Collaborative Filtering ◽

Recommender Systems ◽

Similarity Measure ◽

Accurate Prediction ◽

Similarity Measurement ◽

Data Set ◽

Rating Data ◽

User Similarity ◽

Level Difference

User similarity measurement plays a key role in collaborative filtering recommendation, which is the most widely applied technique in recommender systems. Traditional user-based collaborative filtering methods focus on the absolute rating difference on commonly rated items while neglecting the relative difference in rating levels for the same items. To overcome this drawback, we propose a novel user similarity measure that takes into account the rating-level gap that users can accept. Collaborative filtering recommendation based on the User Acceptable Rating Radius (UARR), evaluated on a real movie rating data set (the MovieLens data set), generates more accurate predictions than traditional similarity methods.
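The abstract does not give the UARR formula, so it is not reproduced here. For orientation, the sketch below shows a common traditional baseline that such measures are compared against: Pearson correlation computed over co-rated items. This variant centers each user on their mean over the co-rated items only; other variants use each user's mean over all of their ratings. The toy ratings are invented.

```python
from math import sqrt


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((ratings_v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2}   # same ordering, narrower rating range
w = {"a": 1, "b": 3, "c": 5}   # reversed preferences
```

Users `u` and `v` correlate perfectly even though their absolute rating levels differ, which illustrates why a similarity measure may want to model rating-level gaps explicitly.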

An Efficient MapReduce-Based Parallel Processing Framework for User-Based Collaborative Filtering

Symmetry ◽

10.3390/sym11060748 ◽

2019 ◽

Vol 11 (6) ◽

pp. 748 ◽

Cited By ~ 1

Author(s):

Hanjo Jeong ◽

Kyung Jin CHA

Keyword(s):

Parallel Processing ◽

Collaborative Filtering ◽

Recommender Systems ◽

Rating Data ◽

Data Framework ◽

Entire Data ◽

Active User ◽

Full Scan ◽

Processing Framework ◽

Research Studies

User-based collaborative filtering is one of the most widely used methods in recommender systems. However, it is time-consuming because finding the neighbors of each active user, that is, users with similar rating patterns, requires a full scan of the entire data set. The algorithms are also computationally complex. Furthermore, the amount of rating data in recommender systems grows rapidly as the numbers of users, items, and rating activities increase. Thus, recommender systems need a big data framework with parallel processing, such as Hadoop. There are already many studies on MapReduce-based parallel processing for collaborative filtering. However, most of them have not considered the sequential-access restriction on executing MapReduce jobs or the need to minimize full scans of the entire data on the Hadoop Distributed File System (HDFS), which accesses data on disk sequentially. In this paper, we introduce an efficient MapReduce-based parallel processing framework for user-based collaborative filtering that requires only a one-time parallelized full scan while adhering to the sequential access patterns on Hadoop data nodes. The proposed framework contains a novel MapReduce framework, including a partial computation framework that calculates the predictions and finds the recommended items for an active user with such a one-time parallelized scan. Finally, we use the MovieLens dataset to show the validity of the proposed method, mainly in terms of the efficiency of the parallelized method.
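The core idea of a single sequential scan with partial computation can be simulated in plain Python. The sketch below is not the paper's framework: the mapper reads each `(user, item, rating)` record once and emits partial sums for items the active user rated, and the reducer combines them into cosine similarities. It assumes the active user's ratings are already loaded before the scan, and uses cosine similarity over co-rated items as an illustrative choice. The function names and toy records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def map_phase(records, active_user, active_ratings):
    """Mapper: one sequential pass over (user, item, rating) records.

    Emits per-user partial sums for items the active user has rated.
    """
    for user, item, rating in records:
        if user != active_user and item in active_ratings:
            a = active_ratings[item]
            yield user, (rating * a, rating * rating, a * a)


def reduce_phase(mapped):
    """Reducer: sum the partial sums per user, then compute cosine similarity."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for user, (dot, norm_u, norm_a) in mapped:
        acc = sums[user]
        acc[0] += dot
        acc[1] += norm_u
        acc[2] += norm_a
    return {
        user: dot / sqrt(norm_u * norm_a)
        for user, (dot, norm_u, norm_a) in sums.items()
        if norm_u > 0 and norm_a > 0
    }


records = [  # toy data, read in one pass
    ("alice", "m1", 5), ("alice", "m2", 3),
    ("bob", "m1", 5), ("bob", "m2", 3),
    ("carol", "m1", 1), ("carol", "m2", 5),
    ("dave", "m3", 2),
]
active_ratings = {"m1": 5, "m2": 3}  # assumed preloaded profile of the active user
similarities = reduce_phase(map_phase(records, "alice", active_ratings))
```

Because each partial sum depends on only one record, mappers on different data nodes can read their blocks sequentially without revisiting the data.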

COLLABORATIVE FILTERING FOR MULTI-CLASS DATA USING BAYESIAN NETWORKS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213008003789 ◽

2008 ◽

Vol 17 (01) ◽

pp. 71-85 ◽

Cited By ~ 16

Author(s):

XIAOYUAN SU ◽

TAGHI M. KHOSHGOFTAAR

Keyword(s):

Logistic Regression ◽

Bayesian Networks ◽

Collaborative Filtering ◽

Recommender Systems ◽

Real World ◽

Incomplete Data ◽

Pearson Correlation ◽

Bayesian Classifiers ◽

Rating Data ◽

Better Than

As one of the most successful recommender-system techniques, collaborative filtering (CF) algorithms must deal with high sparsity and demanding scalability requirements, among other challenges. Bayesian networks (BNs), one of the most frequently used classifiers, can be used for CF tasks. Previous work on applying BNs to CF tasks mainly focused on binary-class data and used simple or basic Bayesian classifiers [1, 2]. In this work, we apply advanced BN models to CF tasks instead of simple ones, and work on real-world multi-class CF data instead of synthetic binary-class data. Empirical results show that, owing to its ability to deal with incomplete data, the extended logistic regression on tree-augmented naïve Bayes (TAN-ELR) [3] CF model consistently performs better than the traditional Pearson correlation-based CF algorithm for rating data that have few items or high missing rates. In addition, the ELR-optimized BN CF models remain robust in their ability to make predictions, while the robustness of the Pearson correlation-based CF algorithm degrades as the data become sparser.
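The "traditional Pearson correlation-based CF algorithm" usually predicts a rating as the active user's mean plus a similarity-weighted average of the neighbors' deviations from their own means (Resnick-style prediction). The sketch below assumes that standard form; the paper's exact baseline configuration (neighborhood size, mean definition) is not stated in the abstract, and the neighbor values are invented.

```python
def predict_rating(active_mean, neighbors):
    """Resnick-style prediction.

    neighbors: list of (similarity, neighbor_rating_on_item, neighbor_mean).
    Falls back to the active user's mean when no usable neighbor exists.
    """
    num = sum(w * (r - m) for w, r, m in neighbors)
    den = sum(abs(w) for w, _, _ in neighbors)
    return active_mean if den == 0 else active_mean + num / den


# Two toy neighbors: one likes the item more than usual, one less.
prediction = predict_rating(3.5, [(0.8, 5, 4.0), (0.4, 2, 3.0)])
```

When ratings are very sparse, few neighbors have rated the target item and the similarities themselves rest on few co-rated items, which is one way to see why this baseline degrades as sparsity grows.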

OutlierNets: Highly Compact Deep Autoencoder Network Architectures for On-Device Acoustic Anomaly Detection

Sensors ◽

10.3390/s21144805 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4805

Author(s):

Saad Abbasi ◽

Mahmoud Famouri ◽

Mohammad Javad Shafiee ◽

Alexander Wong

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Detection Methods ◽

Detection Accuracy ◽

Network Architectures ◽

Design Exploration ◽

Convolutional Autoencoder ◽

Acoustic Anomaly ◽

Human Operators ◽

Computational Resources

Human operators often diagnose industrial machinery via anomalous sounds. Given recent advances in machine learning, automated acoustic anomaly detection can enable reliable maintenance of machinery. However, deep learning-driven anomaly detection methods often require extensive computational resources, which prohibits their deployment in factories. Here we explore a machine-driven design exploration strategy to create OutlierNets, a family of highly compact deep convolutional autoencoder network architectures featuring as few as 686 parameters, model sizes as small as 2.7 KB, and as low as 2.8 million FLOPs, with a detection accuracy matching or exceeding published architectures with as many as 4 million parameters. The architectures are deployed on an Intel Core i5 as well as an ARM Cortex-A72 to assess performance on hardware that is likely to be used in industry. Experimental results on model latency show that the OutlierNet architectures can achieve up to 30x lower latency than published networks.
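Autoencoder-based anomaly detection typically scores an input by how poorly the model reconstructs it, since a model trained only on normal sounds reconstructs abnormal ones badly. The sketch below shows that scoring step only, with the autoencoder replaced by given reconstructions. The mean-squared-error score and the threshold rule (mean plus `k` standard deviations of errors on normal data) are common choices assumed here, not OutlierNets' documented procedure, and the numbers are toy values.

```python
from statistics import mean, pstdev


def reconstruction_error(x, x_hat):
    """Mean squared error between an input frame and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: flag errors above mean + k * std of errors on normal data."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def is_anomalous(x, x_hat, threshold):
    return reconstruction_error(x, x_hat) > threshold


# Toy reconstruction errors measured on normal machine sounds.
threshold = fit_threshold([0.010, 0.020, 0.015, 0.012, 0.018])
```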

Edge Computing for Data Anomaly Detection of Multi-Sensors in Underground Mining

Electronics ◽

10.3390/electronics10030302 ◽

2021 ◽

Vol 10 (3) ◽

pp. 302

Author(s):

Chunde Liu ◽

Xianli Su ◽

Chuanwen Li

Keyword(s):

Energy Consumption ◽

Anomaly Detection ◽

Underground Mining ◽

Heterogeneous Data ◽

Edge Computing ◽

Sensor Nodes ◽

Detection Methods ◽

Detection Accuracy ◽

Clustering Methods ◽

Safety Warning

There is growing interest in safety warning for underground mining because of the serious threats faced by those who work underground. Sensor data acquisition based on the Internet of Things (IoT) is currently the main method, but anomaly detection and analysis of multi-sensor data is challenging: first, the data collected by the different sensors in an underground mine are heterogeneous; second, anomaly detection for safety warning must run in real time. Many anomaly detection methods exist, such as the traditional clustering methods K-means and C-means, and Artificial Intelligence (AI) is widely used in data analysis and prediction. However, K-means and C-means cannot directly process heterogeneous data, and AI algorithms require equipment with high computing and storage capabilities. IoT equipment in underground mines cannot perform complex calculations because of energy-consumption limits, so many existing methods cannot be used directly for IoT applications in underground mining. In this paper, a multi-sensor data anomaly detection method based on edge computing is proposed. First, an edge computing model is designed; according to the computing capabilities of different types of devices, anomaly detection tasks are migrated to different edge devices, which solves the problem of insufficient computing capability on individual devices. Second, according to the requirements of different anomaly detection tasks, edge anomaly detection algorithms are designed separately for sensor nodes and sink nodes. Finally, an experimental platform is built for comparative performance analysis, and the experimental results show that the proposed algorithm performs better in anomaly detection accuracy, delay, and energy consumption.
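To make "lightweight detection on a sensor node" concrete, the sketch below uses a swapped-in technique, not the paper's algorithm: a sliding-window z-score test, which needs only a small buffer and a mean and standard deviation per reading and therefore suits low-power devices. The class name, window size, threshold `z`, and readings are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingZScoreDetector:
    """Flags a reading more than z standard deviations from the window mean."""

    def __init__(self, window=20, z=3.0):
        self.buffer = deque(maxlen=window)
        self.z = z

    def update(self, value):
        anomalous = False
        if len(self.buffer) >= 2:
            mu = mean(self.buffer)
            sigma = pstdev(self.buffer)
            anomalous = sigma > 0 and abs(value - mu) > self.z * sigma
        if not anomalous:
            # Keep anomalies out of the baseline so one spike does not mask the next.
            self.buffer.append(value)
        return anomalous


detector = SlidingZScoreDetector(window=10, z=3.0)
readings = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.51, 1.50]  # toy sensor readings
flags = [detector.update(r) for r in readings]
```

A per-sensor check like this handles each data stream on its own, so it sidesteps the heterogeneity problem at the sensor level; combining heterogeneous streams would still fall to a more capable node.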

High-Speed Lightweight Ship Detection Algorithm Based on YOLO-V4 for Three-Channels RGB SAR Image

Remote Sensing ◽

10.3390/rs13101909 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1909

Author(s):

Jiahuan Jiang ◽

Xiongjun Fu ◽

Rui Qin ◽

Xiaoyan Wang ◽

Zhifeng Ma

Keyword(s):

Deep Learning ◽

Gpu Computing ◽

Hot Spot ◽

Detection Algorithm ◽

Detection Methods ◽

Detection Accuracy ◽

Processing Unit ◽

Sar Image ◽

Marine Monitoring ◽

Ship Detection

Synthetic Aperture Radar (SAR) has become one of the important technical means of marine monitoring in remote sensing because it works all day and in all weather. Ship monitoring in national territorial waters supports national maritime law enforcement, maritime traffic control, and national maritime security, so ship detection has long been a research hot spot. As the field has moved from traditional detection methods to deep learning-based methods, most research has relied on ever-increasing Graphics Processing Unit (GPU) computing power to propose more complex and computationally intensive strategies, while methods transplanted from optical-image detection have ignored the low signal-to-noise ratio, low resolution, single channel, and other characteristics that result from the SAR imaging principle. By pursuing detection accuracy while ignoring detection speed and the ultimate application of the algorithm, almost all algorithms rely on powerful clustered desktop GPUs, which cannot be deployed at the front line of marine monitoring to cope with changing conditions. To address these issues, this paper proposes a multi-channel fusion SAR image processing method that makes full use of the image information and the network's ability to extract features; the modeling architecture and training are based on the latest You Only Look Once version 4 (YOLO-V4) deep learning framework. The YOLO-V4-light network was tailored for real-time implementation, significantly reducing the model size, detection time, number of computational parameters, and memory consumption, and the network was refined for three-channel images to compensate for the accuracy lost through light-weighting.
The test experiments were completed entirely on a portable computer and achieved an Average Precision (AP) of 90.37% on the SAR Ship Detection Dataset (SSDD), simplifying the model while maintaining a lead over most existing methods. The YOLO-V4-light ship detection algorithm proposed in this paper has great practical value for maritime safety monitoring and emergency rescue.
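Average Precision for detectors such as YOLO is computed by first matching predicted boxes to ground-truth boxes by Intersection over Union (IoU); a prediction usually counts as a true positive when its IoU with an unmatched ground-truth ship exceeds a threshold (0.5 is common, but the abstract does not state the threshold behind the 90.37% figure). The sketch below shows only the IoU step, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1 unit of overlap out of 7 units of union
```

Small ships in SAR images occupy few pixels, so a localization error of a pixel or two changes IoU much more than it would for a large object, which is one reason small targets are hard to score well.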

Multiview deep learning based on tensor decomposition and its application in fault detection of overhead contact systems

The Visual Computer ◽

10.1007/s00371-021-02080-y ◽

2021 ◽

Author(s):

Xuewu Zhang ◽

Yansheng Gong ◽

Chen Qiao ◽

Wenfeng Jing

Keyword(s):

High Speed ◽

Tensor Decomposition ◽

Detection Methods ◽

Detection Accuracy ◽

Feature Maps ◽

Training Time ◽

Detection Model ◽

Railway Line ◽

Result Show ◽

Deep Layers

This article focuses on the most common types of malfunctions in the overhead contact systems of high-speed railways, namely unstressed droppers, foreign-body invasions, and pole number-plate malfunctions, and establishes a deep-network detection model for them. By fusing the feature maps of the shallow and deep layers of the pretrained network, global and local features of the malfunction area are combined to enhance the network's ability to identify small objects. Further, to share the fully connected layers of the pretrained network and reduce model complexity, Tucker tensor decomposition is used to extract features from the fused feature map, which greatly reduces training time. Detection experiments on images collected on the Lanxin railway line show that the proposed multiview Faster R-CNN based on tensor decomposition has a lower miss probability and higher detection accuracy for the three types of faults. Compared with the object-detection methods YOLOv3, SSD, and the original Faster R-CNN, the average miss probability of the improved Faster R-CNN model in this paper decreases by 37.83%, 51.27%, and 43.79%, respectively, and the average detection accuracy increases by 3.6%, 9.75%, and 5.9%, respectively.
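For reference, the standard Tucker decomposition of a three-way tensor, such as a fused feature map treated as height × width × channels, approximates it by a small core tensor multiplied along each mode by a factor matrix. The ranks $R_1, R_2, R_3$ used in the paper are not given in the abstract; the parameter count below only shows why small ranks shrink the representation.

```latex
% Tucker decomposition of a 3-way tensor X (e.g. H x W x C feature map)
\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},
\qquad
\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3},\;
\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3},\;
U^{(n)} \in \mathbb{R}^{I_n \times R_n}

% Element-wise form
x_{ijk} \approx \sum_{p=1}^{R_1} \sum_{q=1}^{R_2} \sum_{r=1}^{R_3}
  g_{pqr}\, u^{(1)}_{ip}\, u^{(2)}_{jq}\, u^{(3)}_{kr}

% Storage: I_1 I_2 I_3 entries versus
R_1 R_2 R_3 + I_1 R_1 + I_2 R_2 + I_3 R_3
```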

Improved SSD-assisted algorithm for surface defect detection of electromagnetic luminescence

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x21995388 ◽

2021 ◽

pp. 1748006X2199538

Author(s):

Zhenying Xu ◽

Ziqian Wu ◽

Wei Fan

Keyword(s):

Defect Detection ◽

Feature Fusion ◽

Recognition Rate ◽

Detection Methods ◽

Small Scale ◽

Detection Accuracy ◽

Single Shot ◽

Surface Defect Detection ◽

Feature Pyramid ◽

Small Feature

Defect detection of electromagnetic luminescence (EL) cells is the core step in the production and preparation of solar cell modules, ensuring the conversion efficiency and long service life of the batteries. However, because it lacks the feature extraction capability needed for small defects, the traditional single shot multibox detector (SSD) algorithm does not perform well in high-accuracy EL defect detection. Consequently, an improved SSD algorithm with modified feature fusion, within a deep learning framework, is proposed to improve the recognition rate of multi-class EL defects. An EL dataset containing images of four different types of defects is established using rotation, denoising, and binarization. Using the idea of feature pyramid networks, the proposed algorithm greatly improves the detection accuracy for small-scale defects. An experimental study on EL defect detection shows the effectiveness of the proposed algorithm. Moreover, a comparison study shows that the proposed method outperforms other traditional detection methods, such as SIFT, Faster R-CNN, and YOLOv3, in detecting EL defects.
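In a feature pyramid network, a coarse, semantically strong map is upsampled and added to a finer lateral map, so small-scale features gain context from deeper layers. The sketch below shows that top-down merge on single-channel 2-D maps using nearest-neighbor 2x upsampling. The 1x1 lateral convolution and the smoothing convolution of a real FPN are omitted, and how the paper modifies SSD's feature fusion is not specified in the abstract. The maps are toy values.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                   # repeat each row vertically
    return out


def top_down_merge(coarse, lateral):
    """FPN-style merge: upsample the coarse map and add the lateral map."""
    up = upsample2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("upsampled coarse map and lateral map must have equal shapes")
    return [[u + l for u, l in zip(up_row, lat_row)] for up_row, lat_row in zip(up, lateral)]


coarse = [[1, 2]]                         # 1 x 2 deep-layer map
lateral = [[0, 0, 0, 0], [1, 1, 1, 1]]    # 2 x 4 shallow-layer map
merged = top_down_merge(coarse, lateral)
```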

An Optimized Stacking Ensemble Model for Phishing Websites Detection

Electronics ◽

10.3390/electronics10111285 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1285

Author(s):

Mohammed Al-Sarem ◽

Faisal Saeed ◽

Zeyad Ghaleb Al-Mekhlafi ◽

Badiea Abdulkarem Mohammed ◽

Tawfik Al-Hadhrami ◽

...

Keyword(s):

Machine Learning ◽

Random Forests ◽

Ensemble Method ◽

Detection Methods ◽

Detection Accuracy ◽

Ensemble Model ◽

Security Attacks ◽

Data Set ◽

Machine Learning Methods ◽

Ensemble Machine Learning

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the three best models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1), the Phishing Dataset for Machine Learning from Mendeley (Dataset 2), and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.
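The ranking-and-stacking step can be sketched without any ML library: rank the tuned models by a validation score, keep the top three, and column-stack their predictions so that each sample becomes one row of meta-features for the meta-learner. The scores and predictions below are made-up placeholders, not results from the paper. The GA tuning and the meta-learner itself are omitted, and in practice the base predictions should be out-of-fold to avoid leakage.

```python
def select_top_k(val_scores, k=3):
    """Return the names of the k models with the highest validation scores."""
    ranked = sorted(val_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def meta_features(base_predictions, selected):
    """Column-stack the selected models' predictions: one row per sample."""
    columns = [base_predictions[name] for name in selected]
    return [list(row) for row in zip(*columns)]


# Placeholder validation scores (NOT from the paper).
scores = {"RandomForest": 0.95, "AdaBoost": 0.91, "XGBoost": 0.96,
          "Bagging": 0.93, "GradientBoost": 0.94, "LightGBM": 0.92}
selected = select_top_k(scores)

# Placeholder phishing predictions (1 = phishing) for three samples.
base_predictions = {"XGBoost": [1, 0, 1], "RandomForest": [1, 0, 0],
                    "GradientBoost": [1, 1, 1]}
meta_X = meta_features(base_predictions, selected)
```

The meta-learner then learns how much to trust each base model from rows such as `[1, 0, 1]`, rather than weighting them equally as plain voting would.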
