Feature Space Targeted Attacks by Statistic Alignment

By adding human-imperceptible perturbations to images, DNNs can be easily fooled. As one of the mainstream methods, feature space targeted attacks perturb images by modulating their intermediate feature maps, for the discrepancy between the intermediate source and target features is minimized. However, the current choice of pixel-wise Euclidean Distance to measure the discrepancy is questionable because it unreasonably imposes a spatial-consistency constraint on the source and target features. Intuitively, an image can be categorized as "cat'' no matter the cat is on the left or right of the image. To address this issue, we propose to measure this discrepancy using statistic alignment. Specifically, we design two novel approaches called Pair-wise Alignment Attack and Global-wise Alignment Attack, which attempt to measure similarities between feature maps by high-order statistics with translation invariance. Furthermore, we systematically analyze the layer-wise transferability with varied difficulties to obtain highly reliable attacks. Extensive experiments verify the effectiveness of our proposed method, and it outperforms the state-of-the-art algorithms by a large margin. Our code is publicly available at https://github.com/yaya-cheng/PAA-GAA.

Download Full-text

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6952 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12613-12620 ◽

Cited By ~ 3

Author(s):

Jihan Yang ◽

Ruijia Xu ◽

Ruiyu Li ◽

Xiaojuan Qi ◽

Xiaoyong Shen ◽

...

Keyword(s):

Marginal Distribution ◽

Domain Adaptation ◽

State Of The Art ◽

Feature Space ◽

Semantic Segmentation ◽

The State ◽

Object Size ◽

Feature Maps ◽

Feature Representations ◽

Unsupervised Domain Adaptation

We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to match the marginal distribution of feature representations across two domains globally. However, this strategy fails in adapting the representations of the tail classes or small objects for semantic segmentation since the alignment objective is dominated by head categories or large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defensing against pointwise feature space adversarial perturbations. Specifically, we firstly perturb the intermediate feature maps with several attack objectives (i.e., discriminator and classifier) on each individual position for both domains, and then the classifier is trained to be invariant to the perturbations. By perturbing each position individually, our model treats each location evenly regardless of the category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other with attack on the domain discriminator. Our approach achieves the state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 → Cityscapes and SYNTHIA → Cityscapes.

Download Full-text

Ensemble-Based Out-of-Distribution Detection

Electronics ◽

10.3390/electronics10050567 ◽

2021 ◽

Vol 10 (5) ◽

pp. 567

Author(s):

Donghun Yang ◽

Kien Mai Mai Ngoc ◽

Iksoo Shin ◽

Kyong-Ha Lee ◽

Myunggwon Hwang

Keyword(s):

Detection Method ◽

State Of The Art ◽

Metric Learning ◽

Feature Space ◽

Confidence Score ◽

Distance Metric Learning ◽

Current State ◽

Overall Performance ◽

Deep Learning Model

To design an efficient deep learning model that can be used in the real-world, it is important to detect out-of-distribution (OOD) data well. Various studies have been conducted to solve the OOD problem. The current state-of-the-art approach uses a confidence score based on the Mahalanobis distance in a feature space. Although it outperformed the previous approaches, the results were sensitive to the quality of the trained model and the dataset complexity. Herein, we propose a novel OOD detection method that can train more efficient feature space for OOD detection. The proposed method uses an ensemble of the features trained using the softmax-based classifier and the network based on distance metric learning (DML). Through the complementary interaction of these two networks, the trained feature space has a more clumped distribution and can fit well on the Gaussian distribution by class. Therefore, OOD data can be efficiently detected by setting a threshold in the trained feature space. To evaluate the proposed method, we applied our method to various combinations of image datasets. The results show that the overall performance of the proposed approach is superior to those of other methods, including the state-of-the-art approach, on any combination of datasets.

Download Full-text

Hybrid Graph Neural Networks for Crowd Counting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6839 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11693-11700 ◽

Cited By ~ 2

Author(s):

Ao Luo ◽

Fan Yang ◽

Xin Li ◽

Dong Nie ◽

Zhicheng Jiao ◽

...

Keyword(s):

Network Architecture ◽

Message Passing ◽

Large Scale ◽

State Of The Art ◽

Density Variation ◽

Feature Maps ◽

Crowd Counting ◽

Multi Scale ◽

Crowd Density ◽

Graph Neural Networks

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn) which targets to relieve the problem by interweaving the multi-scale features for crowd density as well as its auxiliary task (localization) together and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutual beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs significantly well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.

Download Full-text

Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3460235 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-22

Author(s):

Jianhai Zhang ◽

Zhiyong Feng ◽

Yong Su ◽

Meng Xing

Keyword(s):

Action Recognition ◽

Temporal Order ◽

State Of The Art ◽

Feature Representation ◽

Independent Action ◽

Sufficient Statistics ◽

Informative Prior ◽

Global Action ◽

High Order Statistics ◽

Order Structures

For the merits of high-order statistics and Riemannian geometry, covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by an empirical statistics over all of its pose samples. Two major problems of covariance include the following: (1) it is prone to be singular so that actions fail to be represented properly, and (2) it is short of global action/pose-aware information so that expressive and discriminative power is limited. In this article, we propose a novel Bayesian covariance representation by a prior regularization method to solve the preceding problems. Specifically, covariance is viewed as a parametric maximum likelihood estimate of Gaussian distribution over local poses from an independent action. Then, a Global Informative Prior (GIP) is generated over global poses with sufficient statistics to regularize covariance. In this way, (1) singularity is greatly relieved due to sufficient statistics, (2) global pose information of GIP makes Bayesian covariance theoretically equivalent to a saliency weighting covariance over global action poses so that discriminative characteristics of actions can be represented more clearly. Experimental results show that our Bayesian covariance with GIP efficiently improves the performance of action recognition. In some databases, it outperforms the state-of-the-art variant methods that are based on kernels, temporal-order structures, and saliency weighting attentions, among others.

Download Full-text

PedNet: A Spatio-Temporal Deep Convolutional Neural Network for Pedestrian Segmentation

Journal of Imaging ◽

10.3390/jimaging4090107 ◽

2018 ◽

Vol 4 (9) ◽

pp. 107 ◽

Cited By ~ 5

Author(s):

Mohib Ullah ◽

Ahmed Mohammed ◽

Faouzi Alaya Cheikh

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Performance Metrics ◽

State Of The Art ◽

Temporal Information ◽

Feature Maps ◽

Current Frame ◽

Low Level ◽

Art Methods ◽

Spatio Temporal

Articulation modeling, feature extraction, and classification are the important components of pedestrian segmentation. Usually, these components are modeled independently from each other and then combined in a sequential way. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we proposed a spatio-temporal convolutional neural network named PedNet which exploits temporal information for spatial segmentation. The backbone of the PedNet consists of an encoder–decoder network for downsampling and upsampling the feature maps, respectively. The input to the network is a set of three frames and the output is a binary mask of the segmented regions in the middle frame. Irrespective of classical deep models where the convolution layers are followed by a fully connected layer for classification, PedNet is a Fully Convolutional Network (FCN). It is trained end-to-end and the segmentation is achieved without the need of any pre- or post-processing. The main characteristic of PedNet is its unique design where it performs segmentation on a frame-by-frame basis but it uses the temporal information from the previous and the future frame for segmenting the pedestrian in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we used long-skip connections from the encoder to decoder network and concatenate the output of low-level layers with the higher level layers. This approach helps to get segmentation map with sharp boundaries. To show the potential benefits of temporal information, we also visualized different layers of the network. The visualization showed that the network learned different information from the consecutive frames and then combined the information optimally to segment the middle frame. We evaluated our approach on eight challenging datasets where humans are involved in different activities with severe articulation (football, road crossing, surveillance). The most common CamVid dataset which is used for calculating the performance of the segmentation algorithm is evaluated against seven state-of-the-art methods. The performance is shown on precision/recall, F 1 , F 2 , and mIoU. The qualitative and quantitative results show that PedNet achieves promising results against state-of-the-art methods with substantial improvement in terms of all the performance metrics.

Download Full-text

Multipath Lightweight Deep Network Using Randomly Selected Dilated Convolution

Sensors ◽

10.3390/s21237862 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7862

Author(s):

Sangun Park ◽

Dong Eui Chang

Keyword(s):

State Of The Art ◽

Robot Vision ◽

Research Field ◽

Machine Learning Algorithms ◽

Classification Error ◽

Feature Maps ◽

Deep Network ◽

Dilated Convolution ◽

Input Feature ◽

Multipath Networks

Robot vision is an essential research field that enables machines to perform various tasks by classifying/detecting/segmenting objects as humans do. The classification accuracy of machine learning algorithms already exceeds that of a well-trained human, and the results are rather saturated. Hence, in recent years, many studies have been conducted in the direction of reducing the weight of the model and applying it to mobile devices. For this purpose, we propose a multipath lightweight deep network using randomly selected dilated convolutions. The proposed network consists of two sets of multipath networks (minimum 2, maximum 8), where the output feature maps of one path are concatenated with the input feature maps of the other path so that the features are reusable and abundant. We also replace the 3×3 standard convolution of each path with a randomly selected dilated convolution, which has the effect of increasing the receptive field. The proposed network lowers the number of floating point operations (FLOPs) and parameters by more than 50% and the classification error by 0.8% as compared to the state-of-the-art. We show that the proposed network is efficient.

Download Full-text

Split to Be Slim: An Overlooked Redundancy in Vanilla Convolution

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/442 ◽

2020 ◽

Cited By ~ 1

Author(s):

Qiulin Zhang ◽

Zhuqing Jiang ◽

Qishuo Lu ◽

Jia'nan Han ◽

Zhengxin Zeng ◽

...

Keyword(s):

Feature Fusion ◽

State Of The Art ◽

Experimental Results ◽

Light Weight ◽

Feature Maps ◽

Plug And Play ◽

Input Feature

Many effective solutions have been proposed to reduce the redundancy of models for inference acceleration. Nevertheless, common approaches mostly focus on eliminating less important filters or constructing efficient operations, while ignoring the pattern redundancy in feature maps. We reveal that many feature maps within a layer share similar but not identical patterns. However, it is difficult to identify if features with similar patterns are redundant or contain essential details. Therefore, instead of directly removing uncertain redundant features, we propose a split based convolutional operation, namely SPConv, to tolerate features with similar patterns but require less computation. Specifically, we split input feature maps into the representative part and the uncertain redundant part, where intrinsic information is extracted from the representative part through relatively heavy computation while tiny hidden details in the uncertain redundant part are processed with some light-weight operation. To recalibrate and fuse these two groups of processed features, we propose a parameters-free feature fusion module. Moreover, our SPConv is formulated to replace the vanilla convolution in a plug-and-play way. Without any bells and whistles, experimental results on benchmarks demonstrate SPConv-equipped networks consistently outperform state-of-the-art baselines in both accuracy and inference time on GPU, with FLOPs and parameters dropped sharply.

Download Full-text

Computational Methods and Tools for Decision Support in Biomedicine

Handbook of Research on Advanced Techniques in Diagnostic Imaging and Biomedical Applications ◽

10.4018/978-1-60566-314-2.ch001 ◽

2009 ◽

pp. 1-17

Author(s):

Ioannis Dimou ◽

Michalis Zervakis ◽

David Lowe ◽

Manolis Tsiknakis

Keyword(s):

Decision Making ◽

Decision Support ◽

Computational Methods ◽

Pattern Analysis ◽

State Of The Art ◽

Biomedical Informatics ◽

Feature Space ◽

Diagnostic Tools ◽

Methodological Issues ◽

Medical Problems

The automation of diagnostic tools and the increasing availability of extensive medical datasets in the last decade have triggered the development of new analytical methodologies in the context of biomedical informatics. The aim is always to explore a problem’s feature space, extract useful information and support clinicians in their time, volume, and accuracy demanding decision making tasks. From simple summarizing statistics to state-of-the-art pattern analysis algorithms, the underlying principles that drive most medical problems show trends that can be identified and taken into account to improve the usefulness of computerized medicine to the field-clinicians and ultimately to the patient. This chapter presents a thorough review of this field and highlights the achievements and shortcomings of each family of methods. The authors’ effort has been focused on methodological issues as to generalize useful conclusions based on the large number of notable, yet case-specific developments presented in the field.

Download Full-text

On-The-Fly Syntheziser Programming with Fuzzy Rule Learning

Entropy ◽

10.3390/e22090969 ◽

2020 ◽

Vol 22 (9) ◽

pp. 969

Author(s):

Iván Paz ◽

Àngela Nebot ◽

Francisco Mugica ◽

Enrique Romero

Keyword(s):

Real Time ◽

Cross Validation ◽

State Of The Art ◽

Rule Learning ◽

Fuzzy Rule ◽

Feature Space ◽

Maximum Volume ◽

Time Variations ◽

Fuzzy Rule Learning ◽

Inductive Rule Learning

This manuscript explores fuzzy rule learning for sound synthesizer programming within the performative practice known as live coding. In this practice, sound synthesis algorithms are programmed in real time by means of source code. To facilitate this, one possibility is to automatically create variations out of a few synthesizer presets. However, the need for real-time feedback makes existent synthesizer programmers unfeasible to use. In addition, sometimes presets are created mid-performance and as such no benchmarks exist. Inductive rule learning has shown to be effective for creating real-time variations in such a scenario. However, logical IF-THEN rules do not cover the whole feature space. Here, we present an algorithm that extends IF-THEN rules to hyperrectangles, which are used as the cores of membership functions to create a map of the input space. To generalize the rules, the contradictions are solved by a maximum volume heuristics. The user controls the novelty-consistency balance with respect to the input data using the algorithm parameters. The algorithm was evaluated in live performances and by cross-validation using extrinsic-benchmarks and a dataset collected during user tests. The model’s accuracy achieves state-of-the-art results. This, together with the positive criticism received from live coders that tested our methodology, suggests that this is a promising approach.

Download Full-text

Revisiting 3D chromatin architecture in cancer development and progression

Nucleic Acids Research ◽

10.1093/nar/gkaa747 ◽

2020 ◽

Vol 48 (19) ◽

pp. 10632-10647

Author(s):

Yuliang Feng ◽

Siim Pauklin

Keyword(s):

Phase Separation ◽

Cancer Progression ◽

Genome Organization ◽

State Of The Art ◽

Human Cancer ◽

Cancer Development ◽

Transcriptional Dysregulation ◽

Chromatin Architecture ◽

Novel Approaches ◽

Promoter Interaction

Abstract Cancer development and progression are demarcated by transcriptional dysregulation, which is largely attributed to aberrant chromatin architecture. Recent transformative technologies have enabled researchers to examine the genome organization at an unprecedented dimension and precision. In particular, increasing evidence supports the essential roles of 3D chromatin architecture in transcriptional homeostasis and proposes its alterations as prominent causes of human cancer. In this article, we will discuss the recent findings on enhancers, enhancer–promoter interaction, chromatin topology, phase separation and explore their potential mechanisms in shaping transcriptional dysregulation in cancer progression. In addition, we will propose our views on how to employ state-of-the-art technologies to decode the unanswered questions in this field. Overall, this article motivates the study of 3D chromatin architecture in cancer, which allows for a better understanding of its pathogenesis and develop novel approaches for diagnosis and treatment of cancer.

Download Full-text