Camouflage is NOT easy: Uncovering adversarial fraudsters in large online app review platform

Given users and the products they review, can we recognize fake reviews using only the text, or determine whether a reviewer is a fraudster? Automatically detecting fake reviews and reviewers is an urgent problem, and much prior work attempts to discover linguistic, behavioral, and graph patterns. In reality, however, there are new kinds of fraudsters who change their behavior to camouflage themselves as genuine reviewers and evade detection systems. As fraudsters become distributed, dynamic, and adversarial, anti-spam tasks face a new challenge. In this paper, we tackle the challenge of adversarial fraudsters on an online app review platform and propose a system called DDF (Detect, Defense, and Forecast) to uncover camouflaged accounts. First, we select a small, high-precision set of seeds based on text and behavior features. Second, we build a graph-based detection model that uses a Graph Convolutional Network (GCN) to uncover hidden (distant) users who are structurally similar to the seeds. Third, we evaluate DDF on a real-world data set from the Tencent APP Store and analyze the potential fraudsters it detects; precision can reach 0.95 or higher. Finally, we validate the efficiency and scalability of DDF and show that it transfers well to other anti-spam tasks.
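To make the graph step more concrete, here is a minimal sketch of a single GCN propagation layer in its commonly used form, ReLU(D^-1/2 (A + I) D^-1/2 H W). This is not the DDF implementation: the toy graph, the feature values, and the fixed weight matrix are all assumptions made for illustration, and the abstract does not say which GCN variant the authors use.

```python
import math

def gcn_layer(adj, feats, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj     -- symmetric 0/1 adjacency matrix (hypothetical toy graph)
    feats   -- node feature matrix H, one row per node
    weights -- weight matrix W (learned in practice; fixed here for illustration)
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Symmetric normalisation by node degree.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features: norm @ feats.
    f = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ weights).
    out_dim = len(weights[0])
    return [[max(0.0, sum(agg[i][c] * weights[c][o] for c in range(f)))
             for o in range(out_dim)]
            for i in range(n)]

# Toy graph: node 0 is a labelled seed, node 1 is its neighbour, node 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
feats = [[1.0], [0.0], [0.0]]   # only the seed carries a signal
weights = [[1.0]]               # identity weight, chosen for readability
hidden = gcn_layer(adj, feats, weights)
```

In this toy case the seed's signal reaches its neighbour (node 1) but not the isolated node 2. Stacking further layers would let the signal reach more distant nodes, which is the intuition behind finding users who are structurally close to the seeds.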

Self-Improving Generative Artificial Neural Network for Pseudorehearsal Incremental Class Learning

Algorithms ◽

10.3390/a12100206 ◽

2019 ◽

Vol 12 (10) ◽

pp. 206 ◽

Cited By ~ 1

Author(s):

Diego Mellado ◽

Carolina Saavedra ◽

Steren Chabert ◽

Romina Torres ◽

Rodrigo Salas

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Novelty Detection ◽

Activation Function ◽

Learning Sequence ◽

Data Set ◽

Detection Model ◽

Neural Network System ◽

Small Set ◽

Artificial Neural

Deep learning models are part of the family of artificial neural networks and, as such, they suffer catastrophic interference when learning sequentially. In addition, most of these models have a rigid architecture which prevents the incremental learning of new classes. To overcome these drawbacks, we propose the Self-Improving Generative Artificial Neural Network (SIGANN), an end-to-end deep neural network system which can ease the catastrophic forgetting problem when learning new classes. In this method, we introduce a novel detection model that automatically detects samples of new classes, and an adversarial autoencoder is used to produce samples of previous classes. This system consists of three main modules: a classifier module implemented using a Deep Convolutional Neural Network, a generator module based on an adversarial autoencoder, and a novelty-detection module implemented using an OpenMax activation function. Using the EMNIST data set, the model was trained incrementally, starting with a small set of classes. The results of the simulation show that SIGANN can retain previous knowledge while incorporating gradual forgetfulness of each learning sequence at a rate of about 7% per training step. Moreover, SIGANN can detect new classes that are hidden in the data with a median accuracy of 43% and, therefore, proceed with incremental class learning.

Self-Improving Generative Artificial Neural Network for Pseudo-Rehearsal Incremental Class Learning

10.20944/preprints201907.0121.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Diego Mellado ◽

Carolina Saavedra ◽

Steren Chabert ◽

Romina Torres ◽

Rodrigo Salas

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Novelty Detection ◽

Activation Function ◽

Learning Sequence ◽

Data Set ◽

Detection Model ◽

Neural Network System ◽

Small Set ◽

Artificial Neural

Deep learning models are part of the family of artificial neural networks and, as such, they suffer from catastrophic interference when they learn sequentially. In addition, most of these models have a rigid architecture which prevents the incremental learning of new classes. To overcome these drawbacks, in this article we propose the Self-Improving Generative Artificial Neural Network (SIGANN), an end-to-end deep neural network system which is able to ease the catastrophic forgetting problem when learning new classes. In this method, we introduce a novelty detection model to automatically detect samples of new classes; moreover, an adversarial autoencoder is used to produce samples of previous classes. This system consists of three main modules: a classifier module implemented using a Deep Convolutional Neural Network, a generator module based on an adversarial autoencoder, and a novelty detection module implemented using an OpenMax activation function. Using the EMNIST data set, the model was trained incrementally, starting with a small set of classes. The results of the simulation show that SIGANN is able to retain previous knowledge with a gradual forgetfulness for each learning sequence. Moreover, SIGANN can detect new classes that are hidden in the data and, therefore, proceed with incremental class learning.

Detection of Drive-by Download Attacks Using Machine Learning Approach

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2017100102 ◽

2017 ◽

Vol 11 (4) ◽

pp. 16-28 ◽

Cited By ~ 8

Author(s):

Monther Aldwairi ◽

Musaab Hasan ◽

Zayed Balbahaith

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Detection Accuracy ◽

Financial Loss ◽

Data Set ◽

Detection Model ◽

Detection Systems ◽

Novel Approach ◽

Positive Rate ◽

Using Data

Drive-by download refers to attacks that automatically download malware to a user's computer without their knowledge or consent. This type of attack is accomplished by exploiting vulnerabilities in web browsers and plugins. The damage may include data leakage leading to financial loss. Traditional antivirus and intrusion detection systems are not efficient against such attacks. Researchers have proposed many detection approaches, mostly passive blacklisting; a few have proposed dynamic classification techniques, which suffer from clear shortcomings. In this paper, we propose a novel approach to detect web pages infected with drive-by downloads based on features extracted from their source code. We test 23 different machine learning classifiers using a data set of 5435 webpages and, based on detection accuracy, select the top five to build our detection model. The approach is expected to serve as a base for implementing and developing anti-drive-by-download programs. We develop a graphical user interface program that allows the end user to examine a URL before visiting the website. The Bagged Trees classifier exhibited the highest accuracy, 90.1%, with a true positive rate of 96.24% and a false positive rate of 26.07%.
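For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, true positive rate, and false positive rate follow from confusion-matrix counts. The counts are hypothetical and chosen only to keep the arithmetic easy to check; they are not taken from the paper, and `rates` is an illustrative helper name.

```python
def rates(tp, fn, fp, tn):
    """Return (accuracy, true positive rate, false positive rate)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    tpr = tp / (tp + fn)   # share of infected pages that are flagged
    fpr = fp / (fp + tn)   # share of benign pages that are wrongly flagged
    return accuracy, tpr, fpr

# Hypothetical counts, used only to illustrate the arithmetic.
acc, tpr, fpr = rates(tp=90, fn=10, fp=20, tn=80)
```

The example shows that a classifier can have a high true positive rate and still flag a sizeable share of benign pages, since the two rates are computed over different subsets of the data.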

SACN: A Novel Rotating Face Detector Based on Architecture Search

Electronics ◽

10.3390/electronics10050558 ◽

2021 ◽

Vol 10 (5) ◽

pp. 558

Author(s):

Anping Song ◽

Xiaokang Xu ◽

Xinyi Zhai

Keyword(s):

Face Detection ◽

Human Face ◽

Angle Error ◽

Rotation Invariant ◽

Convolutional Network ◽

Data Set ◽

Practical Applications ◽

Model Size ◽

Average Angle ◽

Face Detector

Rotation-Invariant Face Detection (RIPD) has been widely used in practical applications; however, the problem of adjusting the rotation-in-plane (RIP) angle of the human face remains. Recently, several methods based on neural networks have been proposed to solve the RIP angle problem. However, these methods have various limitations in detection speed, model size, and detection accuracy. To solve these problems, we propose a new network, called the Searching Architecture Calibration Network (SACN), which utilizes architecture search, a fully convolutional network (FCN), and bounding box center cluster (CC). SACN was tested on the challenging Multi-Oriented Face Detection Data Set and Benchmark (MOFDDB) and achieved higher detection accuracy at almost the same speed as existing detectors. Moreover, the average angle error is reduced from the current 12.6° to 10.5°.
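The abstract does not define how the average angle error is computed. As a sketch under the assumption that it is the mean absolute difference between predicted and true in-plane angles, with wrap-around at 360°, it could be calculated as follows. The function names and sample angles are hypothetical.

```python
def angle_error(pred_deg, true_deg):
    """Smallest absolute difference between two in-plane angles, in degrees."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_angle_error(preds, truths):
    """Average of the per-face angle errors."""
    errors = [angle_error(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)

# Hypothetical predicted and ground-truth angles for three faces.
mae = mean_angle_error([350.0, 10.0, 90.0], [10.0, 0.0, 100.0])
```

The wrap-around matters: a prediction of 350° against a true angle of 10° is off by 20°, not 340°.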

Auto-sharing parameters for transfer learning based on multi-objective optimization

Integrated Computer-Aided Engineering ◽

10.3233/ica-210655 ◽

2021 ◽

pp. 1-13

Author(s):

Hailin Liu ◽

Fangqing Gu ◽

Zixian Lin

Keyword(s):

Transfer Learning ◽

Optimization Problem ◽

Data Sets ◽

Multi Objective Optimization ◽

Particle Swarm Optimizer ◽

Real World Data ◽

Data Set ◽

Target Task ◽

Main Research ◽

Multi Objective

Transfer learning methods exploit similarities between different data sets to improve the performance of the target task by transferring knowledge from source tasks to the target task. “What to transfer” is a main research issue in transfer learning. Existing transfer learning methods generally need to acquire the shared parameters by integrating human knowledge. However, in many real applications, it is not known beforehand which parameters can be shared. A transfer learning model is essentially a special multi-objective optimization problem. Consequently, this paper proposes a novel auto-sharing parameter technique for transfer learning based on multi-objective optimization and solves the optimization problem by using a multi-swarm particle swarm optimizer. Each task objective is simultaneously optimized by a sub-swarm. The current best particle from the sub-swarm of the target task is used to guide the search of particles of the source tasks and vice versa. The target task and source task are jointly solved by sharing the information of the best particle, which works as an inductive bias. Experiments are carried out to evaluate the proposed algorithm on several synthetic data sets and two real-world data sets (a school data set and a landmine data set); the results show that the proposed algorithm is effective.

Empirical evaluation of feature subset selection based on a real-world data set

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2004.03.005 ◽

2004 ◽

Vol 17 (3) ◽

pp. 285-288 ◽

Cited By ~ 5

Author(s):

Petra Perner ◽

Chid Apte

Keyword(s):

Real World ◽

Empirical Evaluation ◽

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Real World Data ◽

Data Set ◽

World Data

Effects of the domestic thyroid stimulating hormone receptor (TSHR) variant on the hypothalamic-pituitary-thyroid axis and behavior in chicken

Genetics ◽

10.1093/genetics/iyaa050 ◽

2021 ◽

Vol 217 (1) ◽

Author(s):

Amir Fallahshahroudi ◽

Martin Johnsson ◽

Enrico Sorato ◽

S J Kumari A Ubhayasekera ◽

Jonas Bergquist ◽

...

Keyword(s):

Gene Expression ◽

Hormone Receptor ◽

Thyroid Stimulating Hormone ◽

Phenotypic Traits ◽

Wild Type ◽

Data Set ◽

Red Junglefowl ◽

Pituitary Thyroid Axis ◽

And Behavior ◽

Stimulating Hormone

Domestic chickens are less fearful, have a faster sexual development, grow bigger, and lay more eggs than their primary ancestor, the red junglefowl. Several candidate genetic variants selected during domestication have been identified, but only a few studies have directly linked them with distinct phenotypic traits. Notably, a variant of the thyroid stimulating hormone receptor (TSHR) gene has been under strong positive selection over the past millennium, but its function and mechanisms of action are still largely unresolved. We therefore assessed the abundance of the domestic TSHR variant and possible genomic selection signatures in an extensive data set comprising multiple commercial and village chicken populations as well as wild-living extant members of the genus Gallus. Furthermore, by means of extensive backcrossing we introgressed the wild-type TSHR variant from red junglefowl into domestic White Leghorn chickens and investigated gene expression, hormone levels, cold adaptation, and behavior in chickens possessing either the wild-type or domestic TSHR variant. While the domestic TSHR was the most common variant in all studied domestic populations and in one of two red junglefowl populations, it was not detected in the other Gallus species. Functionally, the individuals with the domestic TSHR variant had a lower expression of the TSHR in the hypothalamus and marginally higher expression in the thyroid gland than wild-type TSHR individuals. Expression of TSHB and DIO2, two regulators of sexual maturity and reproduction in birds, was higher in the pituitary gland of the domestic-variant chickens. Furthermore, the domestic variant was associated with higher activity in the open field test. Our findings confirm that the spread of the domestic TSHR variant is limited to domesticated chickens and, to a lesser extent, their wild counterpart, the red junglefowl. Furthermore, we showed that effects of genetic variability in TSHR mirror key differences in gene expression and behavior previously described between the red junglefowl and domestic chicken.

Time-ResNeXt for epilepsy recognition based on EEG signals in wireless networks

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-020-01810-5 ◽

2020 ◽

Vol 2020 (1) ◽

Cited By ~ 1

Author(s):

Shaoqiang Wang ◽

Shudong Wang ◽

Song Zhang ◽

Yifan Wang

Keyword(s):

Deep Learning ◽

Network Structure ◽

Signal Recognition ◽

Eeg Signals ◽

Real World Data ◽

Data Set ◽

Practical Applications ◽

Epilepsy Diagnosis ◽

Single Data ◽

Electroencephalogram Eeg

This work aims to automatically detect dynamic EEG signals in order to reduce the time cost of epilepsy diagnosis. In the recognition of electroencephalogram (EEG) signals of epilepsy, traditional machine learning and statistical methods require manual feature labeling and engineering in order to show excellent results on a single data set. Moreover, manually selected features may carry a bias and cannot guarantee validity and extensibility on real-world data. In practical applications, deep learning methods can release people from feature engineering to a certain extent. As long as the focus is on expanding data quality and quantity, the algorithm model can learn automatically and improve. In addition, deep learning methods can extract many features that are difficult for humans to perceive, thereby making the algorithm more robust. Based on the design idea of the ResNeXt deep neural network, this paper designs a Time-ResNeXt network structure suitable for time-series EEG epilepsy detection to identify EEG signals. The accuracy of Time-ResNeXt in the detection of EEG epilepsy can reach 91.50%. The Time-ResNeXt network structure produces advanced performance on the benchmark dataset (Berne-Barcelona dataset) and has great potential for improving clinical practice.

Construction of a Genetic Linkage Map in Tetraploid Species Using Molecular Markers

Genetics ◽

10.1093/genetics/157.3.1369 ◽

2001 ◽

Vol 157 (3) ◽

pp. 1369-1385 ◽

Cited By ~ 2

Author(s):

Z W Luo ◽

C A Hackett ◽

J E Bradshaw ◽

J W McNicol ◽

D Milbourne

Keyword(s):

Molecular Markers ◽

Linkage Map ◽

Recombination Frequency ◽

Simulated Data ◽

Likelihood Estimation ◽

Lod Score ◽

Data Set ◽

Independent Segregation ◽

The Em Algorithm ◽

Small Set

This article presents a methodology for the construction of a linkage map in an autotetraploid species, using either codominant or dominant molecular markers scored on two parents and their full-sib progeny. The steps of the analysis are as follows: identification of parental genotypes from the parental and offspring phenotypes; testing for independent segregation of markers; partition of markers into linkage groups using cluster analysis; maximum-likelihood estimation of the phase, recombination frequency, and LOD score for all pairs of markers in the same linkage group using the EM algorithm; ordering the markers and estimating distances between them; and reconstructing their linkage phases. The information from different marker configurations about the recombination frequency is examined and found to vary considerably, depending on the number of different alleles, the number of alleles shared by the parents, and the phase of the markers. The methods are applied to a simulated data set and to a small set of SSR and AFLP markers scored in a full-sib population of tetraploid potato.
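The tetraploid likelihoods used in the article are considerably more involved and require the EM algorithm. As a deliberately simplified illustration of what a LOD score measures, the sketch below covers only the textbook diploid, phase-known case, where recombinants can be counted directly. The counts and the function name are hypothetical, and this is not the authors' method.

```python
import math

def lod_score(recombinants, total, r):
    """LOD = log10( L(r) / L(0.5) ) for a binomial count of recombinants.

    Simplified diploid, phase-known case (an assumption of this sketch);
    r must lie strictly between 0 and 1.
    """
    non_recombinants = total - recombinants
    log_l_r = (recombinants * math.log10(r)
               + non_recombinants * math.log10(1.0 - r))
    log_l_null = total * math.log10(0.5)   # r = 0.5 means independent segregation
    return log_l_r - log_l_null

# Hypothetical counts: 10 recombinants among 100 offspring.
r_hat = 10 / 100                 # maximum-likelihood estimate of r in this simple case
lod = lod_score(10, 100, r_hat)
```

With these counts the LOD score is roughly 16, meaning the data are about 10^16 times more likely under linkage at the estimated recombination frequency than under independent segregation.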

Rapid video assessment for monitoring testing facility fraud

International Journal of Quality & Reliability Management ◽

10.1108/ijqrm-01-2017-0022 ◽

2018 ◽

Vol 35 (8) ◽

pp. 1508-1518

Author(s):

Rosembergue Pereira Souza ◽

Luiz Fernando Rust da Costa Carmo ◽

Luci Pirmez

Keyword(s):

Processing Time ◽

Threshold Value ◽

Real World Data ◽

Data Set ◽

Content Type ◽

Video Assessment ◽

Technical Requirements ◽

Testing Facility ◽

Rapid Processing ◽

Temporal Differencing

Purpose: The purpose of this paper is to present a procedure for finding unusual patterns in accredited tests using a rapid processing method for analyzing video records. The procedure uses the temporal differencing technique for object tracking and considers only frames not identified as statistically redundant.

Design/methodology/approach: An accreditation organization is responsible for accrediting facilities to undertake testing and calibration activities. Periodically, such organizations evaluate accredited testing facilities. These evaluations could use video records and photographs of the tests performed by the facility to judge their conformity to technical requirements. To validate the proposed procedure, a real-world data set with video records from accredited testing facilities in the field of vehicle safety in Brazil was used. The processing time of the proposed procedure was compared with the time needed to process the video records in a traditional fashion.

Findings: With an appropriate threshold value, the proposed procedure could successfully identify video records of fraudulent services. Processing time was shorter than when a traditional method was employed.

Originality/value: Manually evaluating video records is time-consuming and tedious. This paper proposes a procedure to rapidly find unusual patterns in videos of accredited tests with a minimum of manual effort.
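The abstract does not specify how frames are judged statistically redundant. The sketch below illustrates the general idea of temporal differencing with a simple stand-in rule: a frame is kept only if enough pixels differ from the last kept frame. The two thresholds, the comparison against the last kept frame, and the function names are assumptions for illustration, not the authors' procedure.

```python
def changed_fraction(prev_frame, frame, pixel_threshold):
    """Fraction of pixels whose grey level changed by more than pixel_threshold."""
    total = 0
    changed = 0
    for prev_row, row in zip(prev_frame, frame):
        for a, b in zip(prev_row, row):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total

def select_frames(frames, pixel_threshold=10, motion_threshold=0.2):
    """Return indices of frames that differ enough from the last kept frame."""
    kept = [0]  # always keep the first frame as the reference
    for i in range(1, len(frames)):
        reference = frames[kept[-1]]
        if changed_fraction(reference, frames[i], pixel_threshold) > motion_threshold:
            kept.append(i)
    return kept

# Three tiny 2x2 greyscale "frames"; the second repeats the first.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[255, 255], [0, 0]],
]
kept = select_frames(frames)
```

Here the repeated second frame is skipped and only the first and third frames are retained, which is how discarding redundant frames can cut processing time.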
