Semantic Locality-Aware Deformable Network for Clothing Segmentation

Author(s):  
Wei Ji ◽  
Xi Li ◽  
Yueting Zhuang ◽  
Omar El Farouk Bourahla ◽  
Yixin Ji ◽  
...  

Clothing segmentation is a challenging vision problem typically addressed within a fine-grained semantic segmentation framework. Unlike conventional segmentation, clothing segmentation has domain-specific properties such as texture richness, diverse appearance variations, non-rigid geometric deformations, and small-sample learning. To address these properties, we propose a semantic locality-aware segmentation model, which adaptively pairs an original clothing image with a semantically similar (e.g., in appearance or pose) auxiliary exemplar retrieved by search. By considering the interactions between the clothing image and its exemplar, more intrinsic knowledge about the locality manifold structure of clothing images is discovered, making learning under the small-sample problem more stable and tractable. Furthermore, we present a CNN model based on deformable convolutions to extract non-rigid geometry-aware features from clothing images. Experimental results demonstrate the effectiveness of the proposed model against state-of-the-art approaches.
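The deformable convolutions mentioned in the abstract can be illustrated with a minimal sketch: each kernel tap samples the input at a learned fractional offset, recovered via bilinear interpolation, so the sampling grid can bend to follow non-rigid garment deformations. The helpers below are a plain-Python illustration, not the paper's implementation; all names and parameters are hypothetical:

```python
import math

def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (a list of rows) at fractional (y, x)."""
    h, w = len(img), len(img[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = y0 + 1, x0 + 1

    def px(r, c):
        # Zero-pad outside the image.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    wy, wx = y - y0, x - x0
    return (px(y0, x0) * (1 - wy) * (1 - wx) +
            px(y0, x1) * (1 - wy) * wx +
            px(y1, x0) * wy * (1 - wx) +
            px(y1, x1) * wy * wx)

def deformable_conv_at(img, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    `offsets` holds a learned (dy, dx) for each of the 9 kernel taps; with
    all offsets zero this reduces to an ordinary 3x3 convolution.
    """
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    out = 0.0
    for k, (ky, kx) in enumerate(taps):
        dy, dx = offsets[k]
        out += weights[k] * bilinear_sample(img, cy + ky + dy, cx + kx + dx)
    return out
```

With zero offsets the result matches a regular convolution, which is a useful sanity check when wiring the offset branch into a network.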

2020 ◽  
pp. 1-24
Author(s):  
Dequan Jin ◽  
Ziyan Qin ◽  
Murong Yang ◽  
Penghe Chen

We propose a novel neural model with lateral interaction for learning tasks. The model consists of two functional fields: an elementary field that extracts features and a high-level field that stores and recognizes patterns. Each field is composed of neurons with lateral interaction, and neurons in different fields are connected by rules of synaptic plasticity. The model is grounded in current research in cognition and neuroscience, making it more transparent and biologically explainable. We apply the proposed model to data classification and clustering. The corresponding algorithms share similar processes without requiring any parameter tuning or optimization. Numerical experiments validate that the proposed model is feasible across different learning tasks and superior to several state-of-the-art methods, especially in small-sample learning, one-shot learning, and clustering.
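The abstract does not give the field equations, but lateral interaction in neural fields is commonly modeled with a difference-of-Gaussians ("Mexican hat") kernel: short-range excitation, longer-range inhibition. The sketch below shows one Euler step of such a 1-D field; all parameter values are chosen for illustration only:

```python
import math

def lateral_kernel(d, sigma_e=1.0, sigma_i=3.0, a_e=1.0, a_i=0.5):
    """Difference-of-Gaussians lateral interaction as a function of
    neuron distance d: positive (excitatory) near zero, negative
    (inhibitory) farther away."""
    return (a_e * math.exp(-d * d / (2 * sigma_e ** 2)) -
            a_i * math.exp(-d * d / (2 * sigma_i ** 2)))

def lateral_step(activity, dt=0.1):
    """One Euler step of a 1-D neural field with lateral interaction:
    each neuron decays toward zero and receives weighted input from
    every other neuron through the kernel."""
    n = len(activity)
    new = []
    for i in range(n):
        lateral = sum(lateral_kernel(i - j) * activity[j] for j in range(n))
        new.append(activity[i] + dt * (-activity[i] + lateral))
    return new
```

The net effect is that an isolated activity bump reinforces itself while suppressing its surround, which is the mechanism such fields use to store and sharpen patterns.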


2020 ◽  
Author(s):  
Lin Cao ◽  
Xibao Huo ◽  
Yanan Guo ◽  
Yuying Shao ◽  
Kangning Du

Abstract: Face photo-sketch recognition refers to the process of matching sketches to photos. Recently, there has been growing interest in using convolutional neural networks to learn discriminative deep features. However, due to the large domain discrepancy and the high cost of acquiring sketches, the discriminative power of the deeply learned features is inevitably reduced. In this paper, we propose a discriminative center loss to learn domain-invariant features for face photo-sketch recognition. Specifically, two Mahalanobis distance matrices are introduced to enhance intra-class compactness while promoting inter-class separability. Moreover, a regularization technique is applied to the Mahalanobis matrices to alleviate the small-sample problem. Extensive experimental results on the e-PRIP dataset verify the effectiveness of the proposed discriminative center loss.
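The exact loss is not reproduced in the abstract. As a hedged sketch, a center loss under a learned Mahalanobis metric pulls each feature toward its class center via (x - c)^T M (x - c); the paired inter-class term (a second matrix pushing different centers apart) is omitted here for brevity, and all names are illustrative:

```python
def mahalanobis_sq(x, c, M):
    """Squared Mahalanobis distance (x - c)^T M (x - c) between a
    feature vector x and a class centre c under metric matrix M."""
    d = [xi - ci for xi, ci in zip(x, c)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def discriminative_center_loss(features, labels, centers, M_intra):
    """Intra-class compactness term of a centre loss: the average learned
    distance from each feature to the centre of its own class.  With
    M_intra = identity this reduces to the classic (Euclidean) centre loss;
    regularizing M_intra (e.g. M + lambda * I) keeps it well-conditioned
    when samples are scarce."""
    return sum(mahalanobis_sq(f, centers[y], M_intra)
               for f, y in zip(features, labels)) / len(features)
```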


Author(s):  
Yunhui Guo ◽  
Yandong Li ◽  
Liqiang Wang ◽  
Tajana Rosing

There is growing interest in designing models that can deal with images from different visual domains. If there exists a universal structure across visual domains that can be captured via a common parameterization, then we can use a single model for all domains rather than one model per domain. A model aware of the relationships between domains can also be trained to work on new domains with fewer resources. However, identifying the reusable structure in a model is not easy. In this paper, we propose a multi-domain learning architecture based on depthwise separable convolution. The proposed approach rests on the assumption that images from different domains share cross-channel correlations but have domain-specific spatial correlations. The proposed model is compact and has minimal overhead when applied to new domains. Additionally, we introduce a gating mechanism to promote soft sharing between domains. We evaluate our approach on the Visual Decathlon Challenge, a benchmark for testing the ability of multi-domain models. The experiments show that our approach achieves the highest score while requiring only 50% of the parameters compared with state-of-the-art approaches.
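The parameter savings behind this design are easy to make concrete. A k x k standard convolution costs k*k*C_in*C_out parameters, while a depthwise separable one costs k*k*C_in (the per-channel spatial part) plus C_in*C_out (the pointwise cross-channel part). Under the abstract's assumption, the pointwise bank can be shared across domains and only the cheap depthwise filters duplicated. The bookkeeping below is an illustration of that accounting, not the paper's exact scheme:

```python
def standard_conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise (k*k per input channel) + pointwise (1x1 channel mixing)."""
    return k * k * c_in + c_in * c_out

def multi_domain_params(k, c_in, c_out, n_domains):
    """Share the pointwise filters across all domains (cross-channel
    correlations assumed universal); keep only the cheap depthwise
    spatial filters per domain."""
    shared = c_in * c_out                  # one pointwise bank for everyone
    per_domain = k * k * c_in * n_domains  # domain-specific spatial filters
    return shared + per_domain
```

For a 3x3 layer with 64 input and 128 output channels, ten separate separable convolutions would cost 87,680 parameters, while sharing the pointwise bank brings ten domains down to 13,952.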


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 503 ◽  
Author(s):  
Nicola Altini ◽  
Giacomo Donato Cascarano ◽  
Antonio Brunetti ◽  
Francescomaria Marino ◽  
Maria Teresa Rocchetti ◽  
...  

The evaluation of kidney biopsies by expert pathologists is a crucial process for assessing whether a kidney is eligible for transplantation. An important step in this evaluation is the quantification of global glomerulosclerosis, the ratio between sclerotic glomeruli and the overall number of glomeruli. Since there is a shortage of organs available for transplantation, a quick and accurate assessment of global glomerulosclerosis is essential for retaining the largest number of eligible kidneys. In the present paper, the authors introduce a Computer-Aided Diagnosis (CAD) system to assess global glomerulosclerosis. The proposed tool is based on Convolutional Neural Networks (CNNs); in particular, the authors considered approaches based on semantic segmentation networks such as SegNet and DeepLab v3+. The dataset, provided by the Department of Emergency and Organ Transplantations (DETO) of Bari University Hospital, is composed of 26 kidney biopsies from 19 donors and contains 2344 non-sclerotic glomeruli and 428 sclerotic glomeruli. The proposed model achieves promising results in automatically detecting and classifying glomeruli, thus easing the burden on pathologists. We achieve high performance both at the pixel level, with a mean F-score higher than 0.81 and a Weighted Intersection over Union (IoU) higher than 0.97 for both the SegNet and DeepLab v3+ approaches, and at the object-detection level, with best F-scores of 0.924 for non-sclerotic glomeruli and 0.730 for sclerotic glomeruli.
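The two quantities this evaluation rests on, the global glomerulosclerosis ratio and the object-level F-score, reduce to simple formulas. The sketch below uses the standard definitions, not code from the paper:

```python
def global_glomerulosclerosis(n_sclerotic, n_non_sclerotic):
    """Ratio of sclerotic glomeruli to all detected glomeruli."""
    total = n_sclerotic + n_non_sclerotic
    return n_sclerotic / total if total else 0.0

def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall, as used for the
    object-level detection evaluation (tp/fp/fn counted per glomerulus)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

On the dataset counts quoted above (428 sclerotic, 2344 non-sclerotic), the dataset-level sclerosis ratio is about 15.4%.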


2021 ◽  
Vol 13 (12) ◽  
pp. 2268
Author(s):  
Hang Gong ◽  
Qiuxia Li ◽  
Chunlai Li ◽  
Haishan Dai ◽  
Zhiping He ◽  
...  

Hyperspectral images are widely used for classification due to their rich spectral information combined with spatial information. To handle the high dimensionality and high nonlinearity of hyperspectral images, deep learning methods based on convolutional neural networks (CNNs) are widely used in hyperspectral classification. However, most CNN structures are stacked vertically and use a single size of convolutional kernel or pooling layer, which cannot fully mine the multiscale information in hyperspectral images. When such networks meet the practical challenge of a limited labeled hyperspectral dataset (the "small sample problem"), classification accuracy and generalization ability are limited. In this paper, to tackle the small sample problem, we adapt semantic segmentation to pixel-level hyperspectral classification, given the comparability of the two tasks. A lightweight multiscale squeeze-and-excitation pyramid pooling network (MSPN) is proposed. It consists of a multiscale 3D CNN module, a squeeze-and-excitation module, and a pyramid pooling module with 2D CNNs. Such a hybrid 2D-3D-CNN MSPN framework can learn and fuse deeper hierarchical spatial–spectral features with fewer training samples. The proposed MSPN was tested on three publicly available hyperspectral classification datasets: Indian Pines, Salinas, and Pavia University. Using 5%, 0.5%, and 0.5% of the training samples of the three datasets, the classification accuracies of the MSPN were 96.09%, 97%, and 96.56%, respectively. In addition, we selected the latest dataset with higher spatial resolution, WHU-Hi-LongKou, as a further challenge. Using only 0.1% of the training samples, we achieved a 97.31% classification accuracy, far superior to state-of-the-art hyperspectral classification methods.
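The squeeze-and-excitation module in the MSPN can be sketched independently of the 3D CNN around it: global average pooling "squeezes" each channel to a scalar, a small two-layer bottleneck ending in a sigmoid produces per-channel gates, and the gates rescale the channels. The toy dense weights below are placeholders, not trained values:

```python
import math

def squeeze_excite(feature_maps, w1, w2):
    """Squeeze-and-excitation over a list of 2-D channel feature maps.

    w1 (hidden x C) and w2 (C x hidden) are the bottleneck's dense
    weights; ReLU after the first layer, sigmoid after the second.
    """
    # Squeeze: global average pool each channel to one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excite: bottleneck ReLU layer, then sigmoid gates, one per channel.
    hidden = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z))) for wi in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(wij * hj
                                        for wij, hj in zip(wi, hidden))))
             for wi in w2]
    # Scale: reweight every pixel of each channel by its gate.
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, feature_maps)]
```

The module adds only two small dense layers per block, which is what keeps the MSPN lightweight while letting it emphasize informative spectral channels.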


2021 ◽  
Author(s):  
Yin Guo ◽  
Limin Li

Two-sample independent tests are widely used in case-control studies to identify significant changes or differences, for example, to identify key pathogenic genes by comparing gene expression levels in normal and disease cells. However, due to the high cost of data collection or labelling, many studies face the small-sample problem, for which traditional two-sample tests often lose power. We propose a novel rank-based nonparametric test, WMW-A, for the small-sample problem by introducing a three-sample statistic through an auxiliary sample. Combining the case, control and auxiliary samples, we construct a three-sample WMW-A statistic based on the gap between the average ranks of the case and control samples within the combined sample. By assuming that the auxiliary sample follows a mixture of the case and control distributions, we analyze the theoretical properties of the WMW-A statistic and approximate its theoretical power. Extensive simulation experiments and real applications on microarray gene expression datasets show that the WMW-A test can significantly improve test power for two-sample problems with small sample sizes, using either available unlabelled auxiliary data or generated auxiliary data.
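The core of the statistic, as described in the abstract, is the gap between the average ranks of the case and control samples after pooling them with the auxiliary sample. A sketch using mid-ranks for ties (the paper's exact normalisation may differ):

```python
def ranks(values):
    """Mid-ranks (1-based, ties averaged) of each value in the list."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def wmw_a(case, control, auxiliary):
    """Unnormalised WMW-A-style statistic: the gap between the average
    ranks of the case and control samples inside the pooled
    case + control + auxiliary sample."""
    pooled = list(case) + list(control) + list(auxiliary)
    r = ranks(pooled)
    n1, n2 = len(case), len(control)
    mean_case = sum(r[:n1]) / n1
    mean_control = sum(r[n1:n1 + n2]) / n2
    return mean_case - mean_control
```

With an empty auxiliary sample this reduces to the rank gap of the classical Wilcoxon-Mann-Whitney test; the auxiliary observations refine the rank positions of both labelled groups.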


Author(s):  
Xue Lin ◽  
Qi Zou ◽  
Xixia Xu

Human-object interaction (HOI) detection is important for understanding human-centric scenes and is challenging due to subtle differences between fine-grained actions and multiple co-occurring interactions. Most approaches tackle these problems by considering multi-stream information, sometimes introducing extra knowledge, and suffer from a huge combination space and the non-interactive pair domination problem. In this paper, we propose an Action-Guided attention mining and Relation Reasoning (AGRR) network to solve these problems. Relation reasoning on human-object pairs is performed by exploiting contextual compatibility consistency among pairs to filter out non-interactive combinations. To better discriminate the subtle differences between fine-grained actions, an action-aware attention mechanism based on class activation maps is proposed to mine the features most relevant for recognizing HOIs. Extensive experiments on the V-COCO and HICO-DET datasets demonstrate the effectiveness of the proposed model compared with state-of-the-art approaches.
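The class activation map underlying such action-aware attention is a weighted sum of the final feature maps, using the classifier weights of the action class of interest; high values mark the regions most responsible for that prediction. A minimal illustration, not the AGRR implementation:

```python
def class_activation_map(feature_maps, class_weights):
    """CAM for one action class: sum the channel feature maps weighted by
    that class's classifier weights.  feature_maps is a list of 2-D
    channels; class_weights has one scalar per channel."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for wc, ch in zip(class_weights, feature_maps):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wc * ch[i][j]
    return cam
```

The resulting map can then serve as an attention mask: spatial locations with large CAM values are the ones worth mining features from when discriminating fine-grained actions.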


Author(s):  
G. Bellitto ◽  
F. Proietto Salanitri ◽  
S. Palazzo ◽  
F. Rundo ◽  
D. Giordano ◽  
...  

Abstract: In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated from features extracted at different abstraction levels. We augment the base hierarchical learning mechanism with two techniques, one for domain adaptation and one for domain-specific learning. For the former, we encourage the model to learn hierarchical general features in an unsupervised manner, using gradient reversal at multiple scales, to enhance generalization on datasets for which no annotations are provided during training. For domain specialization, we employ domain-specific operations (namely priors, smoothing and batch normalization), specializing the learned features on individual datasets to maximize performance. Our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction. When the base hierarchical model is augmented with domain-specific modules, performance improves further, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching the second-best results on the other two. When we instead test it in an unsupervised domain adaptation setting, enabling hierarchical gradient reversal layers, we obtain performance comparable to the supervised state of the art. Source code, trained models and example outputs are publicly available at https://github.com/perceivelab/hd2s.
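The gradient reversal layer used for unsupervised domain adaptation is an identity map on the forward pass whose backward pass negates (and scales) the incoming gradient, so the shared features are trained to confuse a domain discriminator. A framework-free sketch with a hand-written backward in place of autograd:

```python
class GradReverse:
    """Gradient reversal layer: identity forward, negated and scaled
    gradient backward.  Placing one between a feature extractor and a
    domain classifier makes the extractor maximize the domain loss the
    classifier is minimizing, pushing features toward domain invariance."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off between task loss and domain confusion

    def forward(self, x):
        return x  # pass features through unchanged

    def backward(self, upstream_grad):
        # Flip the sign (and scale by lambda) of the gradient flowing
        # back into the feature extractor.
        return [-self.lam * g for g in upstream_grad]
```

In an autograd framework the same behaviour is implemented as a custom function whose backward returns the negated gradient; the hierarchical variant in the paper inserts one such layer at each scale of the conspicuity hierarchy.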

