RDFuzz: Accelerating Directed Fuzzing with Intertwined Schedule and Optimized Mutation

2020 · Vol 2020 · pp. 1-12
Author(s): Jiaxi Ye, Ruilin Li, Bin Zhang

Directed fuzzing is a practical technique that concentrates its testing energy on reaching target code areas while spending little on unrelated components. It is a promising way to make better use of available resources, especially when testing large-scale programs. However, by observing the state-of-the-art directed fuzzing engine (AFLGo), we argue that there are two universal limitations: the balance problem between exploration and exploitation, and the blindness of mutation toward the target code areas. In this paper, we present a new prototype, RDFuzz, to address these two limitations. In RDFuzz, we first introduce a frequency-guided strategy for exploration and improve its accuracy by adopting branch-level instead of path-level frequency. Then, we introduce an input-distance-based evaluation strategy for the exploitation stage and present an optimized mutation that distinguishes and protects distance-sensitive input content. Moreover, an intertwined testing schedule performs exploration and exploitation in turn. We test RDFuzz on 7 benchmarks, and the experimental results demonstrate that RDFuzz is skilled at driving the program toward the target code areas and is not easily stuck by the exploration-exploitation balance problem.
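To make the intertwined schedule concrete, here is a minimal Python sketch that alternates an exploration turn (favoring seeds covering rarely hit branches, scored at branch level) with an exploitation turn (favoring seeds whose executions land closest to the target). The seed fields and scoring functions are hypothetical stand-ins for illustration, not RDFuzz's actual implementation.

```python
def branch_rarity(seed, freq_map):
    # Exploration score: seeds covering rarely hit branches score higher
    # (branch-level frequency, per the paper's refinement over path level).
    return sum(1.0 / (1 + freq_map.get(b, 0)) for b in seed["branches"])

def intertwined_schedule(queue, freq_map, rounds=4):
    # Alternate exploration and exploitation turns over the seed queue.
    for r in range(rounds):
        if r % 2 == 0:
            # Exploration turn: prefer seeds touching rarely hit branches.
            queue.sort(key=lambda s: -branch_rarity(s, freq_map))
        else:
            # Exploitation turn: prefer seeds closest to the target code area.
            queue.sort(key=lambda s: s["distance"])
        yield r, queue[0]["name"]  # the seed that would be fuzzed this turn

seeds = [{"name": "a", "branches": [1, 2], "distance": 7.0},
         {"name": "b", "branches": [3], "distance": 2.5}]
for turn, name in intertwined_schedule(seeds, {1: 10, 2: 3, 3: 0}):
    print(turn, name)
```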

2021 · Vol 8 (2) · pp. 273-287
Author(s): Xuewei Bian, Chaoqun Wang, Weize Quan, Juntao Ye, Xiaopeng Zhang, ...

Recent learning-based approaches show promising performance improvements on the scene text removal task, but they usually leave behind remnants of text and produce visually unpleasant results. In this work, we propose a novel end-to-end framework based on accurate text stroke detection. Specifically, we decouple the text removal problem into text stroke detection and stroke removal, and design separate networks for these two subproblems, the latter being a generative network. The two networks are combined into a processing unit, which is cascaded to obtain our final text removal model. Experimental results demonstrate that the proposed method substantially outperforms the state of the art at locating and erasing scene text. We have also constructed a new large-scale real-world dataset of 12,120 images and are making it available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.
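A minimal PyTorch sketch of the detect-then-remove idea: one processing unit couples a stroke-detection network (producing a per-pixel mask) with a generative removal network conditioned on that mask, and the full model chains such units. The tiny convolutional stacks here are placeholders, not the paper's architectures.

```python
import torch
import torch.nn as nn

class StrokeDetector(nn.Module):
    # Stand-in for the stroke-detection subnetwork: image -> stroke mask.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, img):
        return self.net(img)

class StrokeRemover(nn.Module):
    # Stand-in for the generative removal subnetwork: (image, mask) -> image.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))
    def forward(self, img, mask):
        return self.net(torch.cat([img, mask], dim=1))

class CascadedUnit(nn.Module):
    # One detect-then-remove processing unit; the final model cascades several.
    def __init__(self):
        super().__init__()
        self.detect, self.remove = StrokeDetector(), StrokeRemover()
    def forward(self, img):
        return self.remove(img, self.detect(img))

model = nn.Sequential(CascadedUnit(), CascadedUnit())  # two cascaded units
print(model(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```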


Author(s): Feng Zheng, Xin Miao, Heng Huang

Identifying vehicles across cameras in traffic surveillance is fundamentally important for public safety. However, despite some preliminary work, rapid vehicle search in large-scale datasets has not been investigated, and modelling a view-invariant similarity between vehicle images from different views remains highly challenging. To address these problems, we propose a Ranked Semantic Sampling (RSS) guided binary embedding method for fast cross-view vehicle Re-IDentification (Re-ID). The search can then be conducted by efficiently computing similarities in the projected space. Unlike previous methods that use random sampling, we design tree-structured attributes to guide mini-batch sampling, where ranked pairs of hard samples in the mini-batch improve the convergence of optimization. By minimizing a novel ranked semantic distance loss defined according to this structure, the learned Hamming distance is view-invariant, which enables cross-view Re-ID. Experimental results demonstrate that RSS outperforms state-of-the-art approaches and that the embedding learned on one dataset can be transferred to the vehicle Re-ID task on another.
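Because comparisons happen in Hamming space, retrieval over the learned binary codes reduces to XOR-and-popcount. A small NumPy sketch of that search, with randomly generated stand-in codes rather than learned embeddings:

```python
import numpy as np

def hamming_search(query_code, gallery_codes, k=5):
    # Rank gallery binary codes by Hamming distance to the query:
    # XOR the packed uint8 codes, then count differing bits.
    dists = np.unpackbits(query_code ^ gallery_codes, axis=1).sum(axis=1)
    order = np.argsort(dists)
    return order[:k], dists[order[:k]]

rng = np.random.default_rng(0)
gallery = rng.integers(0, 256, size=(1000, 16), dtype=np.uint8)  # 128-bit codes
query = gallery[42] ^ np.array([1] + [0] * 15, dtype=np.uint8)   # near-duplicate
idx, d = hamming_search(query, gallery)
print(idx, d)  # index 42 should rank first, at distance 1
```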


2021 · Vol 8 (1)
Author(s): Mehdi Srifi, Ahmed Oussous, Ayoub Ait Lahcen, Salma Mouline

Various recommender systems (RSs) have been developed over recent years, and many of them concentrate on English content; thus, the majority of RSs in the literature have been compared on English content. Research investigating RSs on content in other languages, such as Arabic, is minimal, and the field of Arabic RSs remains neglected. This study aims to fill that gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation derives well-argued conclusions about the usage of modern RSs in the Arabic context. The experimental results show that these systems achieve high performance when applied to Arabic content.
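Arabic text preprocessing of the kind the paper mentions typically includes diacritic stripping, tatweel (elongation) removal, and letter normalization. The sketch below shows such standard steps as an illustration; it is not the paper's exact pipeline.

```python
import re

DIACRITICS = re.compile(r'[\u064B-\u0652]')  # fathatan .. sukun
TATWEEL = re.compile(r'\u0640')              # kashida/elongation character

def normalize_arabic(text):
    # Common normalization steps applied before feeding Arabic text to an RS.
    text = DIACRITICS.sub('', text)      # strip short-vowel marks
    text = TATWEEL.sub('', text)         # strip elongation
    text = re.sub('[إأآ]', 'ا', text)    # unify alef variants
    text = re.sub('ى', 'ي', text)        # alef maqsura -> ya
    text = re.sub('ة', 'ه', text)        # ta marbuta -> ha
    return text

print(normalize_arabic('الْكِتَـــاب'))  # -> 'الكتاب'
```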


Author(s): Siva Reddy, Mirella Lapata, Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.
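As a toy illustration of grounding guided by denotations, the sketch below grounds an ungrounded graph edge to the Freebase-style relation whose denotation against a tiny knowledge base is non-empty. The predicate, candidate relations, and KB triples are all invented for the example; the real system operates over full CCG-derived graphs and Freebase.

```python
# One edge from a CCG-derived semantic graph (toy example).
ungrounded = [("Obama", "be.president.of", "x")]
# Candidate Freebase-style groundings for the natural-language predicate.
candidates = {"be.president.of": ["people.birthplace", "government.president_of"]}
# Tiny stand-in knowledge base of grounded triples.
kb = {("Obama", "government.president_of", "USA")}

def denotation(subject, relation):
    # Weak supervision: the set of answers the grounded edge retrieves.
    return {o for (s, r, o) in kb if s == subject and r == relation}

for subject, predicate, _ in ungrounded:
    for rel in candidates[predicate]:
        if denotation(subject, rel):  # keep groundings with non-empty denotation
            print(predicate, "->", rel)
```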


2021 · Vol 11 (23) · pp. 11344
Author(s): Wei Ke, Ka-Hou Chan

Paragraph-based datasets are hard to analyze with a simple RNN, because long sequences suffer from long-term dependency problems. In this work, we propose a Multilayer Content-Adaptive Recurrent Unit (CARU) network for paragraph information extraction. In addition, we present a CNN-based model as an extractor to explore and capture useful features in the hidden state, which represents the content of the entire paragraph. In particular, we introduce Chebyshev pooling at the end of the CNN-based extractor in place of maximum pooling. This projects the features into a probability distribution and thus provides an interpretable evaluation for the final analysis. Experimental results demonstrate the superiority of the proposed approach compared with state-of-the-art models.
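The abstract does not define Chebyshev pooling, so the sketch below shows one plausible reading based on Chebyshev's inequality, P(|X − μ| ≥ kσ) ≤ 1/k²: each channel's peak activation is converted into a bounded probability-like score instead of being returned raw, as max pooling would. Treat this as an assumption for illustration, not the paper's formula.

```python
import torch

def chebyshev_pool(features, eps=1e-6):
    # features: (batch, channels, time); pool over the time axis.
    mu = features.mean(dim=-1)
    sigma = features.std(dim=-1) + eps
    k = (features.max(dim=-1).values - mu) / sigma  # peak extremity in std units
    return 1.0 / (1.0 + k ** 2)  # smoothed Chebyshev-style bound, in (0, 1]

x = torch.randn(2, 8, 32)
print(chebyshev_pool(x).shape)  # torch.Size([2, 8])
```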


Author(s): Jun Zhou, Longfei Li, Ziqi Liu, Chaochao Chen

Recently, the Factorization Machine (FM) has become increasingly popular in recommendation systems due to its effectiveness in finding informative interactions between features. Usually, the weights for the interactions are learned as a low-rank weight matrix, formulated as an inner product of two low-rank matrices; this low-rank structure helps improve the generalization ability of the Factorization Machine. However, choosing the rank properly usually requires running the algorithm multiple times with different ranks, which is clearly inefficient on large-scale datasets. To alleviate this issue, we propose an Adaptive Boosting framework for Factorization Machines (AdaFM), which can adaptively search for the proper rank on different datasets without re-training. Instead of using a fixed rank, the proposed algorithm gradually increases the rank according to performance until performance stops improving. Extensive experiments on multiple large-scale datasets validate the proposed method, and the results demonstrate that it is more effective than state-of-the-art Factorization Machines.
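The rank-growth control flow can be sketched as follows. Here fm_fit and fm_score are hypothetical stand-ins (trivial stubs so the loop actually runs), not AdaFM's boosting updates; the point is the grow-until-performance-saturates loop.

```python
def fm_fit(train, rank, warm_start=None):
    # Stand-in trainer: a real FM would learn factor matrices of width `rank`
    # (warm_start would reuse previously learned factors when growing the rank).
    return {"rank": rank}

def fm_score(model, valid):
    # Stand-in validation metric that saturates once rank 4 is reached.
    return min(model["rank"], 4) / 4.0

def adafm_train(train, valid, max_rank=64, tol=1e-4):
    # Grow the FM rank until validation performance stops improving.
    rank, best, model = 1, float("-inf"), None
    while rank <= max_rank:
        model = fm_fit(train, rank, warm_start=model)
        score = fm_score(model, valid)
        if score <= best + tol:  # no meaningful gain: stop growing
            break
        best, rank = score, rank + 1
    return model

print(adafm_train(train=None, valid=None)["rank"])  # stops shortly after 4
```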


2023 · Vol 55 (1) · pp. 1-39
Author(s): Thanh Tuan Nguyen, Thanh Phuong Nguyen

Representing dynamic textures (DTs) plays an important role in many real-world applications in the computer vision community. Due to the turbulent and non-directional motions of DTs, along with the negative impacts of factors such as environmental changes, noise, and illumination, efficiently analyzing DTs raises considerable challenges for state-of-the-art approaches. Over the past 20 years, many techniques have been introduced to handle these well-known issues and enhance performance. Those methods have made valuable contributions, but the problems remain incompletely solved, particularly for recognizing DTs on large-scale datasets. In this article, we present a comprehensive taxonomy of DT representation that gives a thorough overview of existing methods along with overall evaluations of their performance. We arrange the methods into six canonical categories, and for each we briefly present its principal methodological stream and related variants. We then investigate and thoroughly discuss the effectiveness of the state-of-the-art methods with respect to quantitative and qualitative evaluations of DT classification on benchmark datasets. Finally, we point out several potential applications and remaining challenges for future work. Compared with the two existing shallow DT surveys (the first, from 2005, is out of date, while the newer one, published in 2016, is an inadequate overview), we believe our comprehensive taxonomy not only gives target readers a better view of DT representation but also stimulates future research activity.


Author(s): Chenggang Yan, Tong Teng, Yutao Liu, Yongbing Zhang, Haoqian Wang, ...

The difficulty of no-reference image quality assessment (NR IQA) often lies in the lack of knowledge about the distortion in the image, which makes quality assessment blind and thus inefficient. To tackle this issue, in this article we propose a novel scheme for precise NR IQA consisting of two successive steps: distortion identification and targeted quality evaluation. In the first step, we employ the well-known Inception-ResNet-v2 neural network to train a classifier that assigns the possible distortion in the image to one of the four most common distortion types: Gaussian white noise (WN), Gaussian blur (GB), JPEG compression (JPEG), and JPEG2000 compression (JP2K). The deep neural network is trained on the large-scale Waterloo Exploration database, which ensures robust, high-performance distortion classification. In the second step, once the distortion type is determined, we apply a distortion-specific approach to quantify the distortion level, which estimates the image quality more precisely. Extensive experiments on the LIVE, TID2013, CSIQ, and Waterloo Exploration databases demonstrate that (1) our distortion classification accuracy is higher than that of state-of-the-art distortion classification methods, and (2) the proposed NR IQA method outperforms state-of-the-art NR IQA methods in quantifying image quality.
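The two-step structure amounts to a classify-then-dispatch pipeline: identify the distortion type, then route the image to a distortion-specific estimator. The NumPy sketch below uses crude proxy metrics and a stand-in classifier purely to show the control flow; it is not the paper's trained networks or quality models.

```python
import numpy as np

def laplacian_variance(img):
    # Blur proxy: variance of a discrete Laplacian (higher = sharper image).
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return lap.var()

def assess(img, classify):
    # Step 1: identify the distortion type; step 2: run a targeted estimator.
    estimators = {
        "WN":   lambda im: -im.std(),  # noise proxy: more noise, lower score
        "GB":   laplacian_variance,    # blur proxy
        "JPEG": laplacian_variance,    # placeholder proxies for brevity
        "JP2K": laplacian_variance,
    }
    kind = classify(img)
    return kind, estimators[kind](img)

rng = np.random.default_rng(1)
img = rng.random((64, 64))
print(assess(img, classify=lambda im: "GB"))  # stand-in classifier
```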


Author(s): Chao Li, Cheng Deng, Lei Wang, De Xie, Xianglong Liu

In recent years, hashing has attracted growing attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, continuously compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn a powerful common representation and an inner-cycle network is used to generate reliable hash codes. Specifically, UCH seamlessly couples these two networks with a generative adversarial mechanism, so that representation and hash codes can be optimized simultaneously. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms state-of-the-art unsupervised cross-modal hashing methods.
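The end product of any such method is a set of binary codes compared across modalities in Hamming space. A minimal PyTorch sketch of that final step, with random tensors standing in for the coupled networks' learned representations:

```python
import torch

def to_hash_codes(representation):
    # Binarize a learned common representation into {-1, +1} hash bits;
    # training-time models typically relax sign() (e.g., with tanh) to keep
    # gradients flowing, then binarize at retrieval time as done here.
    return (representation >= 0).to(torch.int8) * 2 - 1

def hamming(a, b):
    return (a != b).sum(dim=-1)

img_repr = torch.randn(4, 32)                   # stand-in network outputs
txt_repr = img_repr + 0.1 * torch.randn(4, 32)  # a correlated other modality
print(hamming(to_hash_codes(img_repr), to_hash_codes(txt_repr)))
```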


Sensors · 2020 · Vol 20 (12) · pp. 3603
Author(s): Dasol Jeong, Hasil Park, Joongchol Shin, Donggoo Kang, Joonki Paik

Person re-identification (Re-ID) faces problems that make learning difficult, such as misalignment and occlusion. To solve these problems, it is important to focus on features that are robust to intra-class variation. Existing attention-based Re-ID methods focus only on common features and disregard distinctive ones. In this paper, we present a novel attentive learning-based Siamese network for person Re-ID. Unlike existing methods, we design an attention module and an attention loss that use the properties of the Siamese network to concentrate attention on both common and distinctive features. The attention module consists of channel attention, which selects important channels, and encoder-decoder attention, which observes the whole body shape. We modify the triplet loss into an attention loss, called the uniformity loss, which generates a unique attention map focusing on both common and discriminative features. Extensive experiments show that the proposed network compares favorably to state-of-the-art methods on three large-scale benchmarks: the Market-1501, CUHK03, and DukeMTMC-ReID datasets.
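The abstract does not give the uniformity loss in closed form, so the PyTorch sketch below shows one plausible reading of a triplet loss modified by attention: features are weighted by an attention map before embedding, and the usual margin is enforced. The shapes and the shared attention map are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def attention_triplet_loss(anchor, pos, neg, attn, margin=0.3):
    # Apply the attention map before embedding, then enforce the usual
    # triplet margin between anchor-positive and anchor-negative distances.
    def embed(x):
        return F.normalize((x * attn).flatten(1), dim=1)
    a, p, n = embed(anchor), embed(pos), embed(neg)
    d_ap = (a - p).pow(2).sum(1)
    d_an = (a - n).pow(2).sum(1)
    return F.relu(d_ap - d_an + margin).mean()

feats = lambda: torch.randn(8, 16, 4, 4)  # (batch, channels, H, W) features
attn = torch.rand(8, 1, 4, 4)             # stand-in attention map
print(attention_triplet_loss(feats(), feats(), feats(), attn))
```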

