A CNN Model for Human Parsing Based on Capacity Optimization

Yalong Jiang; Zheru Chi

doi:10.3390/app9071330

Applied Sciences

10.3390/app9071330

2019

Vol 9 (7)

pp. 1330

Cited By ~ 1

Author(s):

Yalong Jiang

Zheru Chi

Keyword(s):

Neural Networks

Computational Efficiency

Semantic Information

State Of The Art

Depth Estimation

Baseline Model

Computational Burden

Proposed Model

Saliency Prediction

Benchmark Solutions

Although state-of-the-art performance has been achieved in pixel-specific tasks such as saliency prediction and depth estimation, convolutional neural networks (CNNs) still perform unsatisfactorily in human parsing, where the semantic information of detailed regions must be perceived despite variations in viewpoint, pose, and occlusion. In this paper, we propose to improve the robustness of human parsing modules by introducing a depth-estimation module. A novel scheme is proposed for integrating the depth-estimation module with the human-parsing module, and the robustness of the overall model is improved with automatically obtained depth labels. Computational efficiency, another major concern, is also discussed. Our proposed human parsing module with 24 layers achieves performance similar to that of the baseline CNN model with over 100 layers, and the overall model has fewer parameters than the baseline model. Furthermore, we propose to reduce the computational burden by replacing a conventional CNN layer with a stack of simplified sub-layers, which further reduces the overall number of trainable parameters. Experimental results show that integrating the two modules improves human parsing without additional human labeling. The proposed model outperforms the benchmark solutions, and its capacity is better matched to the complexity of the task.
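The abstract does not say what the simplified sub-layers are. As a rough illustration of why replacing one convolution with a stack of cheaper sub-layers cuts trainable parameters, the sketch below compares a standard k×k convolution with a depthwise-separable stack (a depthwise k×k convolution followed by a pointwise 1×1 convolution). The separable stack is a swapped-in example, not necessarily the authors' design, the layer size is hypothetical, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of one k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_stack_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise k x k conv followed by a pointwise 1 x 1 conv (no bias).

    Illustrative stand-in for a "stack of simplified sub-layers"; the paper's
    actual sub-layer design may differ.
    """
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3  # hypothetical layer size
    std = standard_conv_params(c_in, c_out, k)
    sep = separable_stack_params(c_in, c_out, k)
    print(std, sep, round(std / sep, 2))
```

For this hypothetical 256-to-256-channel 3×3 layer, the count drops from 589,824 to 67,840 weights, roughly an 8.7× reduction; the saving in any real model depends on the sub-layers actually used.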

A New Click-Through Rates Prediction Model Based on Deep&Cross Network

Algorithms

10.3390/a13120342

2020

Vol 13 (12)

pp. 342

Author(s):

Guojing Huang

Qingliang Chen

Congjian Deng

Keyword(s):

Neural Networks

Prediction Model

Deep Neural Networks

Prediction Models

State Of The Art

Online Advertising

Optimization Technique

Proposed Model

Great Progress

Online Advertisement

With the development of e-commerce, online advertising has thrived and gradually developed into a new business model, for which click-through rate (CTR) prediction is the essential driving technology. Given a user, a commodity, and a scenario, a CTR model predicts the probability that the user clicks an online advertisement. Recently, great progress has been made by introducing deep neural networks (DNNs) into CTR prediction. To further advance DNN-based CTR prediction models, this paper introduces a new model, FO-FTRL-DCN, which builds on the well-known Deep&Cross Network (DCN) and augments it with the recent Follow The Regularized Leader (FTRL) optimization technique for DNNs. Extensive comparative experiments on the iPinYou datasets show that the proposed model outperforms other state-of-the-art baselines and generalizes better across the different datasets in the benchmark.
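The model builds on the cross network of DCN. In the original DCN formulation (Wang et al., 2017), each cross layer computes x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, so l stacked layers model feature crosses up to degree l+1 with only 2d parameters per layer. The NumPy sketch below implements that formula only; the FTRL optimizer used by FO-FTRL-DCN is not shown, and the input vector is a hypothetical dense embedding.

```python
import numpy as np


def cross_layer(x0: np.ndarray, xl: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl, all vectors of length d."""
    return x0 * np.dot(xl, w) + b + xl


def cross_network(x0: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Apply stacked cross layers; every layer re-uses the original input x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_layers = 4, 3
    x0 = rng.normal(size=d)  # hypothetical dense input (e.g. concatenated embeddings)
    weights = [rng.normal(size=d) for _ in range(n_layers)]
    biases = [np.zeros(d) for _ in range(n_layers)]
    print(cross_network(x0, weights, biases))
```

The residual term x_l keeps each layer close to the identity, which is why deep cross stacks remain easy to train.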

Appearance and Motion Enhancement for Video-Based Person Re-Identification

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i07.6802

2020

Vol 34 (07)

pp. 11394-11401

Author(s):

Shuzhao Li

Huimin Yu

Haoji Hu

Keyword(s):

Semantic Information

State Of The Art

Complex Model

The State

Final Model

Backbone Network

Proposed Model

Art Performance

Attribute Recognition

In this paper, we propose an Appearance and Motion Enhancement Model (AMEM) for video-based person re-identification, which enriches the appearance and motion information contained in the backbone network in a more interpretable way. Concretely, an Appearance Enhancement Module (AEM) exploits human attribute recognition under the supervision of pseudo labels to enrich appearance and semantic information. A Motion Enhancement Module (MEM) is designed to capture identity-discriminative walking patterns by predicting future frames. Although training involves a complex model with several auxiliary modules, only the backbone plus two small branches are kept for similarity evaluation, yielding a simple but effective final model. Extensive experiments on three popular video-based person ReID benchmarks demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance compared with existing methods.
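At test time only the backbone and two small branches are kept for similarity evaluation, but the abstract does not say how their outputs are combined. The sketch below assumes, hypothetically, that the three feature vectors are concatenated, L2-normalised, and compared by cosine similarity, a common choice in ReID retrieval; the function names are illustrative.

```python
import numpy as np


def embed(backbone_feat, branch_a, branch_b):
    """Concatenate per-sequence features (hypothetical fusion) and L2-normalise each row."""
    f = np.concatenate([backbone_feat, branch_a, branch_b], axis=1)
    norms = np.maximum(np.linalg.norm(f, axis=1, keepdims=True), 1e-12)  # avoid division by zero
    return f / norms


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity, one row per query."""
    sims = query @ gallery.T  # rows are unit vectors, so the dot product is the cosine
    return np.argsort(-sims, axis=1)
```

Because the auxiliary training modules are dropped, retrieval cost at test time depends only on the size of these kept embeddings.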

Group-Wise Dynamic Dropout Based on Latent Semantic Variations

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i07.6782

2020

Vol 34 (07)

pp. 11229-11236

Author(s):

Zhiwei Ke

Zhiwei Wen

Weicheng Xie

Yi Wang

Linlin Shen

Keyword(s):

Neural Networks

Deep Neural Networks

Semantic Information

State Of The Art

Classification Performance

Network Robustness

Feature Detectors

Data Points

Adversarial Examples

Public Datasets

Dropout regularization has been widely used in deep neural networks to combat overfitting. It works by training a network to be more robust on information-degraded data points, which improves generalization. Conventional dropout and its variants are usually applied to individual hidden units in a layer to break up co-adaptations of feature detectors. In this paper, we propose an adaptive dropout that reduces co-adaptations in a group-wise manner, guided by coarse semantic information, to improve feature discriminability. In particular, we show that adjusting the dropout probability based on local feature densities not only improves classification performance significantly but also, in some cases, enhances network robustness against adversarial examples. The proposed approach was evaluated against the baseline and several state-of-the-art adaptive dropout methods on four public datasets: Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.
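A minimal sketch of group-wise dropout with per-group probabilities: units are partitioned into groups, one Bernoulli keep/drop decision is made per group, and kept units are rescaled by 1/(1-p) (inverted dropout) so expected activations are unchanged. The abstract does not specify how groups are formed or how local feature density maps to a probability; the `density_to_drop_prob` helper is a hypothetical linear mapping, and its direction (denser groups dropped more often) is an assumption.

```python
import numpy as np


def density_to_drop_prob(density, p_min=0.1, p_max=0.5):
    """Hypothetical mapping: min-max normalise densities, then interpolate linearly.

    Assumes denser groups should be dropped more often; the paper's rule may differ.
    """
    density = np.asarray(density, dtype=float)
    span = density.max() - density.min()
    norm = (density - density.min()) / span if span > 0 else np.zeros_like(density)
    return p_min + (p_max - p_min) * norm


def group_dropout(h, groups, drop_probs, rng):
    """Drop whole groups of units.

    h: (d,) activations; groups: (d,) integer group id of each unit;
    drop_probs: (G,) drop probability per group.
    """
    keep_probs = 1.0 - np.asarray(drop_probs, dtype=float)
    # One Bernoulli draw per group, shared by every unit in that group.
    group_kept = rng.random(keep_probs.shape[0]) < keep_probs
    unit_kept = group_kept[groups]
    unit_keep_prob = keep_probs[groups]
    # Inverted-dropout scaling; groups with keep probability 0 are simply zeroed.
    scale = np.divide(1.0, unit_keep_prob, out=np.zeros_like(unit_keep_prob),
                      where=unit_keep_prob > 0)
    return h * unit_kept * scale
```

Usage, with hypothetical densities for two groups of three units: `group_dropout(np.ones(6), np.array([0, 0, 0, 1, 1, 1]), density_to_drop_prob([0.2, 0.9]), np.random.default_rng(0))`. Sharing one draw per group is what distinguishes this from unit-wise dropout: correlated units are removed together.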

A Novel Architecture to Classify Histopathology Images Using Convolutional Neural Networks

Applied Sciences

10.3390/app10082929

2020

Vol 10 (8)

pp. 2929

Cited By ~ 2

Author(s):

Ibrahem Kandel

Mauro Castelli

Keyword(s):

Neural Network

Neural Networks

State Of The Art

Treatment Plan

Tissue Structure

Activation Functions

Proposed Model

Histopathology Images

Fully Connected

Histopathology is the study of tissue structure under the microscope to determine whether cells are normal or abnormal. It is a very important examination for determining a patient's treatment plan. Classifying histopathology images is difficult even for an experienced pathologist, and a second opinion is often needed. Convolutional neural networks (CNNs), a particular type of deep learning architecture, have obtained outstanding results in computer vision tasks such as image classification. In this paper, we propose a novel CNN architecture to classify histopathology images. The proposed model consists of 15 convolution layers and two fully connected layers. Different activation functions were compared, under two different optimizers, to identify the most effective one. The publicly available PatchCamelyon dataset was used to train and evaluate the proposed model; it consists of 220,000 annotated images for training and 57,000 unannotated images for testing. The proposed model achieved higher performance than state-of-the-art architectures, with an AUC of 95.46%.
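The model is evaluated by AUC (area under the ROC curve). As a reminder of what that number measures, the sketch below computes ROC AUC from its rank interpretation: the probability that a randomly chosen positive patch receives a higher score than a randomly chosen negative one, with ties counted as one half. It is O(P·N) and meant for illustration; a library routine such as scikit-learn's `roc_auc_score` is preferable in practice.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 95.46% therefore means that, for a random tumour/non-tumour pair of patches, the model ranks the tumour patch higher about 95% of the time.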

Hybrid pooling with wavelets for convolutional neural networks

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-219223

2022

pp. 1-10

Author(s):

Daniel Trevino-Sanchez

Vicente Alarcon-Aquino

Keyword(s):

Neural Networks

Convolutional Neural Networks

State Of The Art

Computational Cost

Relevant Information

Accuracy Improvement

Proposed Model

Benchmark Datasets

Augmentation Techniques

High Computational Cost

Detecting and classifying objects correctly is a constant challenge: recognizing them at different scales and in different scenarios, sometimes cropped or badly lit, is not an easy task. Convolutional neural networks (CNNs) have become a widely applied technique because they are fully trainable and well suited to feature extraction. However, the growing number of CNN applications constantly demands higher accuracy. Initially, accuracy improvements relied on large datasets, augmentation techniques, and complex algorithms, which may have a high computational cost. Feature extraction, however, is known to be the heart of the problem. As a result, other approaches combine different technologies to extract better features and improve accuracy without requiring more powerful hardware. In this paper, we propose a hybrid pooling method that incorporates multiresolution analysis within the CNN layers to reduce the feature map size without losing details. To prevent relevant information from being lost during downsampling, an existing pooling method is combined with the wavelet transform, keeping those details "alive" and enriching other stages of the CNN. Better-quality features in turn improve CNN accuracy. To validate this study, ten pooling methods, including the proposed one, are tested on four benchmark datasets. The results are compared with four of the evaluated methods, which are considered state of the art.
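The abstract combines an existing pooling method with a wavelet transform but does not name the pooling method or the fusion rule. The sketch below is a hypothetical instance: 2×2 max pooling stacked with the approximation sub-band of a single-level orthonormal 2-D Haar transform, so the output keeps both the strongest activation and a smoothed summary of each 2×2 block. The Haar detail sub-bands are also returned, since they carry the information that plain downsampling discards.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2-D Haar transform of an (H, W) array with even H and W.

    Returns the approximation sub-band and three detail sub-bands, each (H/2, W/2).
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0        # equals twice the block mean
    detail_cols = (a - b + c - d) / 2.0   # left-minus-right column differences
    detail_rows = (a + b - c - d) / 2.0   # top-minus-bottom row differences
    detail_diag = (a - b - c + d) / 2.0   # diagonal differences
    return approx, detail_cols, detail_rows, detail_diag


def hybrid_pool(x):
    """Hypothetical hybrid pooling: stack 2x2 max pooling with the Haar approximation band."""
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    blocks = np.stack([x[0::2, 0::2], x[0::2, 1::2], x[1::2, 0::2], x[1::2, 1::2]])
    max_pooled = blocks.max(axis=0)
    approx, *details = haar_dwt2(x)
    return np.stack([max_pooled, approx]), details
```

Because the Haar transform is orthonormal, the four sub-bands together preserve the energy of the input, so no information is lost at this step even though each band is half the spatial size.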

Learning Structured Text Representations

Transactions of the Association for Computational Linguistics

10.1162/tacl_a_00005

2018

Vol 6

pp. 63-75

Cited By ~ 12

Author(s):

Yang Liu

Mirella Lapata

Keyword(s):

Neural Networks

State Of The Art

Neural Model

Document Modeling

Proposed Model

Parsing Algorithm

Document Representations

In this paper, we focus on learning structure-aware document representations from data without recourse to a discourse parser or additional annotations. Drawing inspiration from recent efforts to empower neural networks with a structural bias (Cheng et al., 2016; Kim et al., 2017), we propose a model that can encode a document while automatically inducing rich structural dependencies. Specifically, we embed a differentiable non-projective parsing algorithm into a neural model and use attention mechanisms to incorporate the structural biases. Experimental evaluations across different tasks and datasets show that the proposed model achieves state-of-the-art results on document modeling tasks while inducing intermediate structures which are both interpretable and meaningful.
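The abstract does not name the parsing algorithm. A standard way to make non-projective dependency parsing differentiable is Kirchhoff's matrix-tree theorem as adapted by Koo et al. (2007), which yields all edge marginals in closed form from a single matrix inverse; those marginals can then serve as structured attention weights. The NumPy sketch below implements the single-root version under that assumption, with hypothetical score matrices as input.

```python
import numpy as np


def tree_marginals(edge_scores, root_scores):
    """Marginal probabilities of dependency edges under a single-root tree distribution.

    edge_scores[i, j]: score of head i -> modifier j; root_scores[j]: score of j as root.
    Returns (edge_marg[i, j], root_marg[j]).
    """
    n = edge_scores.shape[0]
    A = np.exp(edge_scores) * (1.0 - np.eye(n))  # edge weights, no self-loops
    r = np.exp(root_scores)                      # root weights
    L = -A.copy()
    L[np.diag_indices(n)] = A.sum(axis=0)        # Laplacian: total incoming weight per node
    L_bar = L.copy()
    L_bar[0, :] = r                              # replace first row with root weights
    L_inv = np.linalg.inv(L_bar)
    not_first = (np.arange(n) != 0).astype(float)
    # a_ij = (1 - delta(j,0)) A_ij Linv_jj - (1 - delta(i,0)) A_ij Linv_ji
    term1 = A * np.diag(L_inv)[None, :] * not_first[None, :]
    term2 = A * L_inv.T * not_first[:, None]
    edge_marg = term1 - term2
    root_marg = r * L_inv[:, 0]                  # a_j^root = r_j Linv_j0
    return edge_marg, root_marg
```

A quick sanity check: every word has exactly one head, so for each column the edge marginals plus the root marginal sum to 1, and exactly one word is the root.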

Biomedical document triage using a hierarchical attention-based capsule network

BMC Bioinformatics

10.1186/s12859-020-03673-5

2020

Vol 21 (S13)

Author(s):

Jian Wang

Mengying Li

Qishuai Diao

Hongfei Lin

Zhihao Yang

...

Keyword(s):

Neural Networks

Information Extraction

Precision Medicine

State Of The Art

Attention Mechanism

Feature Representation

Experimental Results

Biomedical Domain

Proposed Model

Document Triage

Background: Biomedical document triage is the foundation of biomedical information extraction, which is important to precision medicine. Recently, several neural-network-based methods have been proposed to classify biomedical documents automatically. In the biomedical domain, documents are often very long and contain complicated sentences, and current methods still struggle to capture important features across sentences.

Results: In this paper, we propose a hierarchical attention-based capsule model for biomedical document triage. The proposed model employs a hierarchical attention mechanism and capsule networks to capture valuable features across sentences and construct a final latent feature representation for a document. We evaluated our model on three public corpora.

Conclusions: Experimental results showed that both the hierarchical attention mechanism and capsule networks are helpful for the biomedical document triage task. Our method proved highly competitive with, or superior to, other state-of-the-art methods.
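The abstract does not give the attention equations. The sketch below uses the widely used additive attention-pooling form from hierarchical attention networks (Yang et al., 2016), u_t = tanh(W h_t + b), α_t = softmax(u_t · c), s = Σ_t α_t h_t, applied first over the words of each sentence and then over the resulting sentence vectors. Parameter names and shapes are hypothetical, and the capsule layers of the proposed model are not shown.

```python
import numpy as np


def attention_pool(H, W, b, c):
    """Additive attention pooling over the rows of H (T x d).

    W: (a, d), b: (a,), c: (a,) context vector. Returns (pooled vector, weights).
    """
    u = np.tanh(H @ W.T + b)        # (T, a) hidden attention representation
    scores = u @ c                  # (T,) relevance of each position
    scores = scores - scores.max()  # numerical stability for the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha


def document_vector(sentences, word_params, sent_params):
    """Word-level pooling inside each sentence, then sentence-level pooling over sentences."""
    sent_vecs = np.stack([attention_pool(S, *word_params)[0] for S in sentences])
    doc_vec, _ = attention_pool(sent_vecs, *sent_params)
    return doc_vec
```

The two levels share the same pooling function but have separate parameters, so word importance and sentence importance are learned independently, which is what lets the model weight informative sentences in long documents.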

Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction

International Journal of Computer Vision

10.1007/s11263-021-01519-y

2021

Author(s):

G. Bellitto

F. Proietto Salanitri

S. Palazzo

F. Rundo

D. Giordano

...

Keyword(s):

Multiple Scales

Domain Adaptation

State Of The Art

Feature Learning

Hierarchical Learning

Domain Specific

Proposed Model

Saliency Prediction

Video Saliency

Abstraction Levels

In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated from features extracted at different abstraction levels. We equip the base hierarchical learning mechanism with two techniques, one for domain adaptation and one for domain-specific learning. For the former, we encourage the model to learn general hierarchical features in an unsupervised manner using gradient reversal at multiple scales, which enhances generalization on datasets for which no annotations are provided during training. For domain specialization, we employ domain-specific operations (namely, priors, smoothing, and batch normalization) that specialize the learned features on individual datasets in order to maximize performance. Our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction. When the base hierarchical model is equipped with domain-specific modules, performance improves further: it outperforms state-of-the-art models on three of the five metrics on the DHF1K benchmark and achieves the second-best results on the other two. When we instead test it in an unsupervised domain adaptation setting by enabling the hierarchical gradient reversal layers, we obtain performance comparable to the supervised state of the art. Source code, trained models and example outputs are publicly available at https://github.com/perceivelab/hd2s.
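The unsupervised domain-adaptation setting relies on gradient reversal (Ganin and Lempitsky, 2015): the layer is the identity in the forward pass and multiplies incoming gradients by -λ in the backward pass, so the feature extractor is pushed to confuse a domain classifier while the classifier itself trains normally. The framework-free sketch below makes both passes explicit with a toy scalar extractor; λ, the class name, and the toy loss are illustrative, and the authors' released code is the authoritative implementation.

```python
import numpy as np


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lambda on the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def feature_weight_grad(w, x, target, grl):
    """Gradient of a toy squared domain loss w.r.t. a scalar extractor weight w, through the GRL."""
    f = grl.forward(w * x)                 # feature passes through unchanged
    dloss_df = 2.0 * (f - target)          # d/df of (f - target)^2
    return grl.backward(dloss_df) * x      # reversed gradient, then chain rule through f = w * x
```

Descending this reversed gradient moves w so that the domain loss increases, which is exactly the adversarial signal that makes features less domain-specific.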

SalSAC: A Video Saliency Prediction Model with Shuffled Attentions and Correlation-Based ConvLSTM

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i07.6927

2020

Vol 34 (07)

pp. 12410-12417

Cited By ~ 2

Author(s):

Xinyi Wu

Zhenyao Wu

Jinglin Zhang

Lili Ju

Song Wang

Keyword(s):

Neural Network

Neural Networks

Prediction Model

Convolutional Neural Networks

State Of The Art

Dynamic Aspect

Static Information

Saliency Prediction

Multi Level

Video Saliency

The performance of predicting human fixations in videos has improved greatly with the development of convolutional neural networks (CNNs). In this paper, we propose a novel end-to-end neural network, “SalSAC”, for video saliency prediction, which uses CNN-LSTM-Attention as its basic architecture and exploits both static and dynamic information. To better represent the static information of each frame, we first extract multi-level features of the same size from different layers of the encoder CNN and compute the corresponding multi-level attention maps; we then randomly shuffle these attention maps among levels and multiply them with the extracted multi-level features, respectively. In this way, we leverage attention consistency across different layers to improve the robustness of the network. For the dynamic aspect, we propose a correlation-based ConvLSTM that balances the influence of the current and preceding frames on the prediction. Experimental results on the DHF1K, Hollywood2 and UCF-sports datasets show that SalSAC outperforms many existing state-of-the-art methods.
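The shuffled-attention step can be sketched as follows: given same-sized feature maps from L encoder levels and one attention map per level, the attention maps are randomly permuted across levels before the element-wise product. How the attention maps are computed, and whether shuffling is also applied at test time, are not stated in the abstract; the sketch takes the maps as given and treats shuffling as a training-time operation (an assumption).

```python
import numpy as np


def shuffled_attention(features, attentions, rng, training=True):
    """features: (L, C, H, W) multi-level features; attentions: (L, 1, H, W) per-level maps.

    During training, attention maps are permuted across levels before being multiplied
    element-wise with the features; otherwise each level uses its own map (assumption).
    Returns the attended features and the permutation used.
    """
    n_levels = features.shape[0]
    perm = rng.permutation(n_levels) if training else np.arange(n_levels)
    return features * attentions[perm], perm  # (L, 1, H, W) broadcasts over channels
```

Requiring every level to work with any level's attention map is what encourages the attention maps to agree across layers.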

Distant Supervision for Relation Extraction with Sentence Selection and Interaction Representation

Wireless Communications and Mobile Computing

10.1155/2021/8889075

2021

Vol 2021

pp. 1-16

Author(s):

Tiantian Chen

Nianbin Wang

Hongbin Wang

Haomin Zhan

Keyword(s):

Large Scale

Semantic Information

State Of The Art

Relation Extraction

Semantic Features

Distant Supervision

Word Level

Proposed Model

Relation Prediction

Better Than

Distant supervision (DS), which automatically generates large-scale labeled data, has been widely used for relation extraction (RE). However, DS suffers from a wrong-labeling problem that degrades RE performance. In addition, existing methods lack useful semantic features for some positive training instances. To address these problems, we propose a novel RE model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern method based on relation trigger words that acts as a sentence selector, filtering out noisy sentences to alleviate the wrong-labeling problem. After clean instances are obtained, we build an interaction representation with an entity-pair-based word-level attention mechanism that dynamically increases the weights of words related to the entity pair, providing more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 in F1-score on a widely used dataset, showing that it performs significantly better than state-of-the-art RE systems.
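A minimal sketch of a trigger-word sentence selector: a distantly labeled sentence is kept only if it contains at least one trigger word associated with its relation label. The trigger lexicon, the relation names, the tokenization, and the choice to keep relations without a lexicon entry (such as "NA") are all hypothetical; the paper's pattern method is likely richer than a bag-of-words match.

```python
import re

# Hypothetical trigger lexicon: relation label -> trigger words.
TRIGGERS = {
    "/people/person/place_of_birth": {"born", "birthplace", "native"},
    "/business/company/founders": {"founded", "founder", "co-founded", "established"},
}


def tokenize(sentence: str) -> set:
    """Lower-cased word tokens made of letters, digits, and hyphens."""
    return set(re.findall(r"[a-z0-9-]+", sentence.lower()))


def select_sentences(instances, triggers=TRIGGERS):
    """Keep (sentence, relation) pairs whose sentence contains a trigger for that relation.

    Relations without a trigger list (e.g. "NA") are kept unchanged, an assumption made
    so that the filter only removes sentences it has evidence against.
    """
    kept = []
    for sentence, relation in instances:
        words = triggers.get(relation)
        if words is None or tokenize(sentence) & words:
            kept.append((sentence, relation))
    return kept
```

For example, a sentence labeled place_of_birth that only says "Obama visited Honolulu." has no birth trigger and is filtered out as a likely wrong label.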
