Improved Fake Reviews Detection Model Based on Vertical Ensemble Tri-Training and Active Learning

2021 ◽  
Vol 12 (3) ◽  
pp. 1-19
Author(s):  
Chunyong Yin ◽  
Haoqi Cuan ◽  
Yuhang Zhu ◽  
Zhichao Yin

People’s increasingly frequent online activity has generated a large number of reviews, and fake reviews can mislead users and harm their interests. In addition, labeling reviews at large scale is not feasible because of the high cost of manual annotation. Therefore, to improve detection performance by utilizing unlabeled reviews, this article proposes a fake reviews detection model based on vertical ensemble tri-training and active learning (VETT-AL). The model combines review text features with user behavior features for feature extraction. In the VETT-AL algorithm, the iterative process is divided into two parts: vertical integration within a group and horizontal integration among the groups. The intra-group integration combines the three original classifiers with their models from previous iterations. The inter-group integration adopts entropy-based active learning to select the data with the highest confidence and label it; as a result, the second-generation classifiers are trained by the traditional process to improve labeling accuracy. Experimental results show that the proposed model achieves good classification performance.
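The entropy-based selection step can be sketched as follows. This is a minimal illustration, not the paper's implementation; it assumes the criterion is the Shannon entropy of each unlabeled sample's predicted class distribution, with low entropy taken as high confidence as described above:

```python
import numpy as np

def select_by_entropy(probs, k):
    """Rank unlabeled samples by the Shannon entropy of their predicted
    class distributions; return indices of the k most confident
    (lowest-entropy) samples."""
    probs = np.asarray(probs, dtype=float)
    # Clip avoids log(0) for classes with zero predicted probability.
    ent = -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=1)
    return np.argsort(ent)[:k]

# Toy predicted distributions for three unlabeled reviews:
probs = [[0.90, 0.05, 0.05],   # peaked: most confident
         [0.34, 0.33, 0.33],   # near-uniform: least confident
         [0.60, 0.30, 0.10]]
chosen = select_by_entropy(probs, 1)   # picks index 0
```

Flipping the sort order (`np.argsort(ent)[::-1]`) instead selects the most uncertain samples, the more common active-learning query strategy.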


Author(s):  
G. Matasci ◽  
J. Plante ◽  
K. Kasa ◽  
P. Mousavi ◽  
A. Stewart ◽  
...  

Abstract. We present a deep learning-based vessel detection and (re-)identification approach for spaceborne optical images. We introduce these two components as part of a maritime surveillance from space pipeline and present experimental results on challenging real-world maritime datasets derived from WorldView imagery. First, we developed a vessel detection model based on RetinaNet, achieving an F1-score of 0.795 on a challenging multi-scale dataset. We then collected a large-scale dataset for vessel identification by applying the detection model to 200+ optical images, detecting the vessels therein and assigning them an identity via an Automatic Identification System association framework. A vessel re-identification model based on Twin neural networks was then trained on this dataset, which features 2500+ unique vessels with multiple repeated occurrences across different acquisitions. The model naturally establishes similarities between vessel images: given an input image of a vessel the user is interested in, it returns a relevant ranking of candidate vessels from a database, with top-1 and top-10 accuracies of 38.7% and 76.5%, respectively. This study demonstrates the potential offered by the latest advances in deep learning and computer vision when applied to optical remote sensing imagery in a maritime context, opening new opportunities for automated vessel monitoring and tracking capabilities from space.
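The ranking step of such a re-identification model can be sketched as below. This is a hypothetical minimal example: it assumes the Twin network has already mapped each vessel image to an embedding vector (the abstract does not specify the embedding or the similarity metric; cosine similarity is a common choice):

```python
import numpy as np

def rank_candidates(query_emb, gallery_embs):
    """Rank gallery vessels by cosine similarity to the query embedding,
    best match first."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(sims)[::-1]

# Toy 2-D embeddings: gallery vessel 2 points almost the same way as the query.
query = np.array([1.0, 0.0])
gallery = np.array([[0.00, 1.00],
                    [0.70, 0.70],
                    [0.99, 0.10]])
order = rank_candidates(query, gallery)   # index 2 ranks first
```

Top-k accuracy is then simply the fraction of queries whose true identity appears among the first k indices of `order`.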



2019 ◽  
Vol 13 (9) ◽  
pp. 1401-1409 ◽  
Author(s):  
Xu Li ◽  
Lin Hong ◽  
Jian-chun Wang ◽  
Xiang Liu


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Fuguang Bao ◽  
Yongqiang Wu ◽  
Zhaogang Li ◽  
Yongzhao Li ◽  
Lili Liu ◽  
...  

Anomaly detection over high-dimensional, unbalanced data is common, and effective anomaly detection is essential for early warning of problems or disasters and for maintaining system reliability. Detecting anomalies is a significant research issue in sensor data analysis, and it is essentially an unbalanced-sequence binary classification problem: the data is large in scale, computationally expensive to process, unbalanced in its class distribution, and sequentially dependent. This paper combines long short-term memory networks (LSTMs) with historical sequence data, integrates the synthetic minority oversampling technique (SMOTE) with K-nearest neighbors (kNN), and designs an anomaly detection model, kNN-SMOTE-LSTM, suited to these unbalanced data characteristics. Through its kNN discriminant classifier, the model continuously filters the synthetic samples generated by SMOTE so that only reliable ones are kept, avoiding the blindness and limitations of the SMOTE algorithm in generating new samples and improving model performance. The experiments demonstrate that the kNN-SMOTE-LSTM model significantly improves performance on unbalanced-sequence binary classification.
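The SMOTE-plus-kNN-filtering idea can be sketched as follows. This is a minimal illustration of the sampling step only, not the paper's full kNN-SMOTE-LSTM network (the LSTM is omitted entirely), and the acceptance rule, a majority of minority-class labels among the k nearest real neighbors, is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_labels(x, X, y, k=3):
    """Labels of the k nearest training points to x (Euclidean distance)."""
    d = np.linalg.norm(X - x, axis=1)
    return y[np.argsort(d)[:k]]

def smote_knn_filter(X, y, n_new=10, k=3):
    """Generate SMOTE-style minority samples, keeping only those whose
    k nearest real neighbors are mostly minority (label 1)."""
    minority = X[y == 1]
    kept = []
    while len(kept) < n_new:
        a, b = minority[rng.choice(len(minority), 2, replace=False)]
        synth = a + rng.random() * (b - a)        # interpolate between two minority points
        if (knn_labels(synth, X, y, k) == 1).mean() > 0.5:
            kept.append(synth)                    # safe: lies inside the minority region
    return np.array(kept)

# Toy unbalanced set: majority cluster near (0, 0), minority cluster near (5, 5).
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 0.5, (6, 2))])
y = np.array([0] * 40 + [1] * 6)
new = smote_knn_filter(X, y, n_new=5)
```

The kNN check is what distinguishes this from plain SMOTE: candidates that fall between clusters, where their nearest neighbors are majority-class, are discarded rather than added blindly.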



2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Yong Fang ◽  
Mingyu Xie ◽  
Cheng Huang

Application security is essential in today’s period of rapid development. A backdoor is a means by which attackers invade a system to achieve illegal ends and damage users’ rights, and it poses a serious threat to network security; it is therefore urgent to take adequate measures to defend against such attacks. Previous research has focused mainly on PHP webshells, with less attention to Python backdoor files, and language differences make those methods not directly applicable. This paper proposes a Python backdoor detection model named PBDT based on combined features. The model summarizes the functional modules and functions commonly found in backdoor files and counts their calls in the text to form sample features. Moreover, we consider the text’s statistical characteristics, including the information entropy and the longest string, to identify obfuscated Python code. In addition, the opcode sequence, represented as a TF-IDF vector and classified with FastText, is used to capture code characteristics and eliminate the influence of interference items. Finally, we introduce the Random Forest algorithm to build a classifier. On samples covering most types of backdoors, some of them obfuscated, the model achieves an accuracy of 97.70% and a TNR as high as 98.66%, showing good classification performance in Python backdoor detection.
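The hand-crafted part of such a feature set can be sketched as below. The suspicious-call list and the exact features are illustrative assumptions, not the paper's actual feature definitions; the sketch only mirrors the idea of combining call counts with statistical traits (entropy, longest string) that flag obfuscation:

```python
import math
import re
from collections import Counter

# Hypothetical watchlist of calls common in Python backdoors.
SUSPICIOUS = ("eval", "exec", "base64.b64decode", "os.popen", "subprocess")

def extract_features(source: str) -> dict:
    """Combine suspicious-call counts with statistical traits of the text."""
    calls = sum(source.count(name) for name in SUSPICIOUS)
    counts = Counter(source)
    n = len(source)
    # Character-level Shannon entropy: obfuscated/encoded code scores high.
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    # Longest base64-ish token: encoded payloads produce long runs.
    longest = max((len(t) for t in re.findall(r"[A-Za-z0-9+/=]+", source)),
                  default=0)
    return {"suspicious_calls": calls,
            "entropy": round(entropy, 2),
            "longest_token": longest}

benign = "def add(a, b):\n    return a + b\n"
backdoor = "import base64\nexec(base64.b64decode('cHJpbnQoMSk='))\n"
f_b = extract_features(backdoor)   # high call count and long encoded token
```

These per-sample dictionaries would then be vectorized, concatenated with the opcode-sequence features, and fed to the Random Forest classifier.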



2014 ◽  
Vol 19 (5) ◽  
pp. 685-695 ◽  
Author(s):  
Kevin Smith ◽  
Peter Horvath

High-content screening is a powerful method to discover new drugs and carry out basic biological research. Increasingly, high-content screens have come to rely on supervised machine learning (SML) to perform automatic phenotypic classification as an essential step of the analysis. However, this comes at a cost, namely, the labeled examples required to train the predictive model. Classification performance increases with the number of labeled examples, and because labeling examples demands time from an expert, the training process represents a significant time investment. Active learning strategies attempt to overcome this bottleneck by presenting the most relevant examples to the annotator, thereby achieving high accuracy while minimizing the cost of obtaining labeled data. In this article, we investigate the impact of active learning on single-cell–based phenotype recognition, using data from three large-scale RNA interference high-content screens representing diverse phenotypic profiling problems. We consider several combinations of active learning strategies and popular SML methods. Our results show that active learning significantly reduces the time cost and can be used to reveal the same phenotypic targets identified using SML. We also identify combinations of active learning strategies and SML methods which perform better than others on the phenotypic profiling problems we studied.



Author(s):  
Yousra Hamrouni ◽  
Éric Paillassa ◽  
Véronique Chéret ◽  
Claude Monteil ◽  
David Sheeren

Reliable estimates of poplar plantation area are not available at the French national scale due to the unsuitability and low update rate of existing forest databases for this short-rotation species. While supervised classification methods have been shown to be highly accurate in mapping forest cover from remotely sensed images, their performance depends to a great extent on the labelled samples used to build the models. In addition to their high acquisition cost, such samples are often scarce and not fully representative of the variability in class distributions. Consequently, when classification models are applied to large areas with high intra-class variance, they generally yield poor accuracies. In this paper, we propose the use of active learning (AL) to efficiently adapt a classifier trained on a source image to spatially distinct target images with minimal labelling effort and without sacrificing classification performance. The adaptation consists of actively adding, to the initial local model, new relevant training samples from other areas, in a cascade that iteratively improves the generalisation capabilities of the classifier, leading to a global model tailored to different areas. This active selection relies on uncertainty sampling to focus directly on the most informative pixels, those for which the algorithm is least certain of the class labels. Experiments conducted on Sentinel-2 time series showed that when the same number of training samples was used, active learning outperformed passive learning (random sampling) by up to 5% in overall accuracy and up to 12% in class F-score. In addition, and depending on the class considered, random sampling required up to 50% more samples to achieve the same performance as an active learning-based model. Moreover, the results demonstrate the suitability of the derived global model to accurately map poplar plantations among other tree species, with overall accuracy values up to 14% higher than those obtained with local models. The proposed approach paves the way for national-scale mapping in an operational context.
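The uncertainty-sampling query can be sketched as below. This minimal example uses the least-confidence variant (lowest top-class probability), one common form of the uncertainty sampling the authors describe; the abstract does not state which variant they use:

```python
import numpy as np

def least_confidence_query(probs, budget):
    """Select the `budget` pixels whose highest predicted class
    probability is lowest, i.e. where the classifier is least certain."""
    conf = np.max(np.asarray(probs, dtype=float), axis=1)
    return np.argsort(conf)[:budget]

# Toy per-pixel class probabilities from the current classifier:
probs = np.array([[0.98, 0.01, 0.01],   # confident
                  [0.40, 0.35, 0.25],   # uncertain: best query candidate
                  [0.55, 0.30, 0.15]])
picked = least_confidence_query(probs, 1)   # selects pixel 1
```

In the cascade described above, the selected pixels from each new target image would be labelled and appended to the training set before retraining.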



2020 ◽  
Vol 34 (04) ◽  
pp. 6583-6590
Author(s):  
Yi-Fan Yan ◽  
Sheng-Jun Huang ◽  
Shaoyi Chen ◽  
Meng Liao ◽  
Jin Xu

Labeling a text document is usually time-consuming because it requires the annotator to read the whole document and check its relevance against each possible class label. It thus becomes rather expensive to train an effective model for text classification when a large dataset of long documents is involved. In this paper, we propose an active learning approach for text classification with lower annotation cost. Instead of scanning all the examples in the unlabeled data pool to select the best one for query, the proposed method automatically generates the most informative examples based on the classification model, and thus can be applied to tasks with large-scale or even infinite unlabeled data. Furthermore, we propose to approximate the generated example with a few summary words by sparse reconstruction, which allows the annotators to easily assign the class label by reading a few words rather than the long document. Experiments on different datasets demonstrate that the proposed approach can effectively improve classification performance while significantly reducing the annotation cost.
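The summary-word step can be caricatured as below. This is a deliberately simplified stand-in for the paper's sparse-reconstruction formulation: with a one-word-per-basis dictionary, the sparse approximation of a generated example reduces to keeping its few highest-weight terms. The vocabulary and vector are invented for illustration:

```python
import numpy as np

# Hypothetical vocabulary over which generated examples are represented.
VOCAB = ["market", "stock", "goal", "match", "league", "shares", "trade", "coach"]

def summary_words(doc_vec, n_words=3):
    """Approximate a generated example by its n highest-weight terms,
    the words an annotator would read instead of a full document."""
    idx = np.argsort(np.abs(np.asarray(doc_vec, dtype=float)))[::-1][:n_words]
    return [VOCAB[i] for i in idx]

# Generated 'sports' example: most mass on the goal/match/league dimensions.
vec = [0.05, 0.02, 0.80, 0.61, 0.55, 0.01, 0.03, 0.30]
words = summary_words(vec)
```

An annotator shown only `words` can plausibly assign the label "sports" without reading any document, which is the cost saving the abstract claims.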


