Multi-Output Learning Based on Multimodal GCN and Co-Attention for Image Aesthetics and Emotion Analysis

Haotian Miao; Yifei Zhang; Daling Wang; Shi Feng

doi:10.3390/math9121437

Mathematics ◽

10.3390/math9121437 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1437

Author(s):

Haotian Miao ◽

Yifei Zhang ◽

Daling Wang ◽

Shi Feng

Keyword(s):

Image Processing ◽

Pattern Recognition ◽

Deep Learning ◽

Feature Representation ◽

Convolutional Network ◽

Massive Growth ◽

Mathematical Problems ◽

Aesthetics Assessment ◽

The Aesthetic ◽

High Level

With the development of social networks and intelligent terminals, it is becoming more convenient to share and acquire images. The massive growth in the number of social images has raised the demand for automatic image processing, especially from the aesthetic and emotional perspectives. Both aesthetics assessment and emotion recognition require the computer to simulate high-level visual perception and understanding, which belongs to the field of image processing and pattern recognition. However, existing methods often ignore the prior knowledge of images and the intrinsic relationships between the aesthetic and emotional perspectives. Recently, machine learning and deep learning have become powerful methods for solving mathematical problems in computing, such as image processing and pattern recognition. Both images and abstract concepts can be converted into numerical matrices, and the mapping relations between them can then be established mathematically on computers. In this work, we propose an end-to-end multi-output deep learning model based on a multimodal Graph Convolutional Network (GCN) and co-attention for joint aesthetic and emotion analysis. In our model, a stacked multimodal GCN is proposed to encode the features under the guidance of the correlation matrix, and a co-attention module is designed to help the aesthetics and emotion feature representations learn from each other interactively. Experimental results indicate that our proposed model achieves competitive performance on the IAE dataset. Promising results on the AVA and ArtPhoto datasets also demonstrate the generalization ability of our model.
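As a rough illustration of the graph-convolution step mentioned in this abstract, the sketch below applies one GCN layer of the form H' = ReLU(Â H W), where Â is a symmetrically normalized correlation matrix with self-loops. That normalization is the commonly used one and is an assumption here: the abstract does not say how the correlation matrix is built or normalized. The matrices and the names `correlation`, `node_features`, and `layer_weights` are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, non-negative matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

# Hypothetical 3-node correlation graph (assumed values, not from the paper).
correlation = [[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]]
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_weights = [[1.0, -1.0], [0.5, 1.0]]

hidden = gcn_layer(correlation, node_features, layer_weights)
```

Stacking several such layers, each with its own weight matrix, would give a "stacked" GCN in the general sense; the multimodal and co-attention parts of the model are not covered by this sketch.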

Recognizing Ancient Characters from Tamil Palm Leaf Manuscripts using Convolution Based Deep Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5842.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 6873-6880

Keyword(s):

Neural Network ◽

Neural Networks ◽

Image Processing ◽

Pattern Recognition ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Character Recognition ◽

Optical Character Recognition ◽

Convolutional Network ◽

Palm Leaf

Palm leaf manuscripts have been one of the ancient writing media, but their content needs to be re-inscribed on a new set of leaves. This study provides a way to save the contents of palm leaf manuscripts by recognizing the handwritten Tamil characters in them and storing them digitally. Character recognition is one of the most essential fields of pattern recognition and image processing. Generally, optical character recognition is the electronic translation of images of typewritten or handwritten text into machine-editable text. Handwritten Tamil character recognition has been one of the challenging and active areas of research in pattern recognition and image processing. In this study, an attempt was made to identify Tamil handwritten characters without explicit feature extraction, using convolutional neural networks. The convolutional neural networks recognize and classify Tamil palm leaf manuscript characters from segmented character images. The convolutional neural network is a deep learning approach that does not need separate feature extraction and is also a fast approach to character recognition. In the proposed system, every character is scaled to the required number of pixels. The scaled characters have a predetermined number of pixels, and these pixels are used as the features for neural network training. The trained network is employed for recognition and classification. The convolutional network model consists of a convolution layer, a ReLU layer, a pooling layer, and a fully connected layer. An ancient Tamil character dataset of 60 classes has been created. The results show that the proposed approach yields better recognition rates than schemes based on feature extraction for handwritten character recognition. The accuracy of the proposed approach is 97%, which shows that it is effective in recognizing ancient characters.
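The layer sequence named in this abstract (convolution, ReLU, pooling, fully connected) can be sketched as a forward pass on a tiny single-channel image. This is only a toy illustration: the 5x5 image, the edge kernel, and the two-class weights are made-up values, and nothing here reflects the actual architecture or training used in the study.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def fully_connected(fmap, weights, bias):
    """Flatten the feature map and compute one score per output class."""
    flat = [v for row in fmap for v in row]
    return [sum(w * x for w, x in zip(ws, flat)) + b
            for ws, b in zip(weights, bias)]

# Hypothetical 5x5 character image containing a single vertical stroke.
image = [[0, 1, 0, 0, 0] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]           # responds to left-to-right rises
class_weights = [[1, 0, 1, 0], [0, 1, 0, 1]]  # two made-up classes
class_bias = [0, 0]

pooled = max_pool(relu(conv2d(image, edge_kernel)))
scores = fully_connected(pooled, class_weights, class_bias)
predicted_class = scores.index(max(scores))
```

In a trained network the kernel and the fully connected weights are learned from labeled character images instead of being fixed by hand.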

Bubble Pattern Recognition from Particle Image Velocimetry (PIV) Images using a Deep-Learning-Based Image Processing Technique

10.26678/abcm.cobem2021.cob2021-0893 ◽

2021 ◽

Author(s):

Rafael Franklin Lazaro de Cerqueira ◽

Marco Antônio Cerutti ◽

Emilio Paladino

Keyword(s):

Image Processing ◽

Pattern Recognition ◽

Particle Image Velocimetry ◽

Deep Learning ◽

Particle Image ◽

Processing Technique ◽

Image Processing Technique ◽

Image Velocimetry

Image Enhancement Techniques Using Particle Swarm Optimization Technique

Advances in Computational Intelligence and Robotics - Handbook of Research on Swarm Intelligence in Engineering ◽

10.4018/978-1-4666-8291-7.ch010 ◽

2015 ◽

pp. 327-347 ◽

Cited By ~ 2

Author(s):

V. Santhi ◽

B. K. Tripathy

Keyword(s):

Image Processing ◽

Pattern Recognition ◽

Particle Swarm Optimization ◽

Real Time ◽

Quality Enhancement ◽

Swarm Optimization ◽

Image Processing Techniques ◽

Real Time Applications ◽

High Level ◽

Processing Techniques

Image quality enhancement is considered one of the basic requirements for high-level image processing techniques that demand good-quality images. High-level image processing techniques include feature extraction, morphological processing, pattern recognition, automation engineering, and many more. Many classical methods are available for enhancing the quality of images, and they can be carried out in either the spatial domain or the frequency domain. In real-time applications, however, quality enhancement by classical approaches may not serve the purpose. Computational intelligence needs to be combined with the classical approaches to meet the requirements of real-time applications. In recent years, Particle Swarm Optimization (PSO) has been considered one of the newer optimization techniques, and it is used extensively in image processing and pattern recognition applications. In this chapter, image enhancement is treated as an optimization problem, and different methods of solving it through PSO are discussed in detail.
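To make the idea of treating enhancement as an optimization problem concrete, the sketch below uses a basic PSO to search for a gamma-correction exponent. The fitness function (distance of mean brightness from 0.5), the search range, and the inertia and acceleration constants are all assumptions chosen for illustration; the chapter's own objective functions and parameters are not given in this abstract.

```python
import random

def gamma_correct(pixels, gamma):
    """Apply gamma correction to intensities in [0, 1]."""
    return [p ** gamma for p in pixels]

def fitness(pixels, gamma, target_mean=0.5):
    """Lower is better: distance of the enhanced mean brightness from a target."""
    enhanced = gamma_correct(pixels, gamma)
    return abs(sum(enhanced) / len(enhanced) - target_mean)

def pso_gamma(pixels, n_particles=10, n_iters=40, lo=0.1, hi=3.0, seed=0):
    """Search for a gamma value with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                                  # personal bests
    best_fit = [fitness(pixels, g) for g in pos]
    g_best = best_pos[best_fit.index(min(best_fit))]   # global best
    inertia, c_personal, c_global = 0.7, 1.5, 1.5      # assumed constants
    for _ in range(n_iters):
        for i in range(n_particles):
            vel[i] = (inertia * vel[i]
                      + c_personal * rng.random() * (best_pos[i] - pos[i])
                      + c_global * rng.random() * (g_best - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep inside bounds
            fit = fitness(pixels, pos[i])
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i]
        g_best = best_pos[best_fit.index(min(best_fit))]
    return g_best

# Hypothetical dark image, flattened to a list of intensities.
dark_pixels = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
best_gamma = pso_gamma(dark_pixels)
enhanced_pixels = gamma_correct(dark_pixels, best_gamma)
```

A practical enhancement objective would more likely combine several measures such as contrast, entropy, or edge content, with one swarm dimension per tunable parameter.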

Development of a deep learning-based image processing technique for bubble pattern recognition and shape reconstruction in dense bubbly flows

Chemical Engineering Science ◽

10.1016/j.ces.2020.116163 ◽

2021 ◽

Vol 230 ◽

pp. 116163

Author(s):

Rafael F.L. Cerqueira ◽

Emilio E. Paladino

Keyword(s):

Image Processing ◽

Pattern Recognition ◽

Deep Learning ◽

Processing Technique ◽

Shape Reconstruction ◽

Bubbly Flows ◽

Image Processing Technique

Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images

Remote Sensing ◽

10.3390/rs13245100 ◽

2021 ◽

Vol 13 (24) ◽

pp. 5100

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Image Processing ◽

Deep Learning ◽

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Semantic Segmentation ◽

Landsat 8 ◽

Convolutional Network ◽

Image Labeling ◽

Feature Pyramid

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that improves the semantic segmentation network in two ways. First, utilizing the pretrained Swin Transformer (SwinTF) under Vision Transformer (ViT) as a backbone, the model weights downstream tasks by joining task layers upon the pretrained encoder. Second, three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), are applied to our DL network to perform pixel-level segmentation. The results are compared with other state-of-the-art (SOTA) image labeling methods, such as global convolutional network (GCN) and ViT. Extensive experiments show that our Swin Transformer (SwinTF) with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score) and the Thailand North Landsat-8 corpus (63.12% F1 score), and achieved competitive results on ISPRS Vaihingen. Moreover, both of our best proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with ViT supervised pre-training on ImageNet-1K on the Thailand Landsat-8 and ISPRS Vaihingen corpora.
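The F1 scores quoted in this abstract combine precision and recall over labeled pixels. The sketch below computes F1 for one class on two small label maps. It is a minimal example on invented data; how the paper averages F1 across classes or images is not stated in the abstract and is not assumed here.

```python
def f1_score(predicted, reference, positive=1):
    """Pixel-level F1 for one class on two equally sized label maps."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for pred, ref in zip(pred_row, ref_row):
            if pred == positive and ref == positive:
                tp += 1          # correctly labeled pixel
            elif pred == positive:
                fp += 1          # predicted positive, reference disagrees
            elif ref == positive:
                fn += 1          # missed positive pixel
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 2x3 segmentation masks (1 = target class, 0 = background).
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
reference_mask = [[1, 0, 0],
                  [0, 1, 1]]

score = f1_score(predicted_mask, reference_mask)
```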

Road Object Detection: A Comparative Study of Deep Learning-Based Algorithms

Electronics ◽

10.3390/electronics10161932 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1932

Author(s):

Malik Haris ◽

Adam Glowacz

Keyword(s):

Image Processing ◽

Deep Learning ◽

Object Detection ◽

Real Time ◽

Large Scale ◽

Single Shot ◽

Automated Driving ◽

Convolutional Network ◽

Image Processing Algorithms ◽

Processing Algorithms

Automated driving and vehicle safety systems need object detection. Object detection must be accurate overall, robust to weather and environmental conditions, and able to run in real time. Consequently, these systems require image processing algorithms to inspect the contents of images. This article compares the accuracy of five major image processing algorithms: Region-based Fully Convolutional Network (R-FCN), Mask Region-based Convolutional Neural Networks (Mask R-CNN), Single Shot Multi-Box Detector (SSD), RetinaNet, and You Only Look Once v4 (YOLOv4). In this comparative analysis, we used the large-scale Berkeley Deep Drive (BDD100K) dataset. Their strengths and limitations are analyzed based on parameters such as accuracy (with and without occlusion and truncation), computation time, and the precision-recall curve. The comparison given in this article is helpful in understanding the pros and cons of standard deep learning-based algorithms operating under real-time deployment restrictions. We conclude that YOLOv4 outperforms the other algorithms in accurately detecting difficult road target objects under complex road scenarios and weather conditions in an identical testing environment.
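Precision and recall for a detector are usually obtained by matching predicted boxes to ground-truth boxes using intersection over union (IoU). The sketch below shows one common greedy matching scheme; the 0.5 IoU threshold, the box format `(x1, y1, x2, y2)`, and the sample boxes are assumptions, and the article's exact evaluation protocol is not described in this abstract.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedily match detections (score, box) to ground truth, best score first."""
    matched = set()
    true_positives = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical boxes for one image.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (1, 1, 10, 10)),
              (0.8, (50, 50, 60, 60)),
              (0.6, (20, 20, 30, 31))]

precision, recall = precision_recall(detections, ground_truth)
```

Sweeping a confidence cutoff over the detections and recomputing these two values at each cutoff traces out a precision-recall curve.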

Image Enhancement Techniques Using Particle Swarm Optimization Technique

Human-Computer Interaction ◽

10.4018/978-1-4666-8789-9.ch039 ◽

2015 ◽

pp. 860-878

Author(s):

V. Santhi ◽

B. K. Tripathy

Keyword(s):

Image Processing ◽

Pattern Recognition ◽

Particle Swarm Optimization ◽

Real Time ◽

Quality Enhancement ◽

Swarm Optimization ◽

Image Processing Techniques ◽

Real Time Applications ◽

High Level ◽

Processing Techniques

Predicting functions of maize proteins using graph convolutional network

BMC Bioinformatics ◽

10.1186/s12859-020-03745-6 ◽

2020 ◽

Vol 21 (S16) ◽

Author(s):

Guangjie Zhou ◽

Jun Wang ◽

Xiangliang Zhang ◽

Maozu Guo ◽

Guoxian Yu

Keyword(s):

Deep Learning ◽

Amino Acid ◽

Protein Function ◽

Structural Information ◽

Semantic Representation ◽

Model Organism ◽

Amino Acid Sequences ◽

Feature Representation ◽

Convolutional Network ◽

Go Terms

Background: Maize (Zea mays ssp. mays L.) is the most widely grown and highest-yield crop in the world, as well as an important model organism for fundamental research on gene function. The functions of maize proteins are annotated using the Gene Ontology (GO), which has more than 40000 terms and organizes them in a directed acyclic graph (DAG). It is a huge challenge to accurately annotate a maize protein with relevant GO terms from such a large number of candidates. Some deep learning models have been proposed to predict protein function, but the effectiveness of these approaches is unsatisfactory. One major reason is that they inadequately utilize the GO hierarchy. Results: To use the knowledge encoded in the GO hierarchy, we propose a deep Graph Convolutional Network (GCN) based model (DeepGOA) to predict GO annotations of proteins. DeepGOA first quantifies the correlations (or edges) between GO terms and updates the edge weights of the DAG by leveraging GO annotations and hierarchy, then learns the semantic representation and latent inter-relations of GO terms by applying GCN on the updated DAG. Meanwhile, a Convolutional Neural Network (CNN) is used to learn the feature representation of amino acid sequences with respect to the semantic representations. After that, DeepGOA computes the dot product of the two representations, which enables the whole network to be trained end-to-end coherently. Extensive experiments show that DeepGOA can effectively integrate GO structural information and amino acid information, and then annotate proteins accurately. Conclusions: Experiments on the Maize PH207 inbred line and a Human protein sequence dataset show that DeepGOA outperforms the state-of-the-art deep learning based methods. The ablation study shows that GCN can employ the knowledge of GO and boost the performance. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=DeepGOA.
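The final scoring step described here, a dot product between a protein's sequence representation and each GO term's semantic representation, can be sketched as follows. The sigmoid that turns each dot product into a score in (0, 1) is an assumption added for illustration, and the vectors and term names (`term_a`, `term_b`, `term_c`) are hypothetical placeholders, not real GO identifiers or DeepGOA outputs.

```python
import math

def sigmoid(x):
    """Squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def score_terms(protein_vector, term_embeddings):
    """Score every term by the dot product with the protein representation."""
    return {
        term: sigmoid(sum(p * t for p, t in zip(protein_vector, embedding)))
        for term, embedding in term_embeddings.items()
    }

# Hypothetical 2-D representations (in the model these would come from
# the CNN over the sequence and the GCN over the term graph).
protein_vector = [1.0, -1.0]
term_embeddings = {
    "term_a": [2.0, 0.0],
    "term_b": [0.0, 2.0],
    "term_c": [1.0, 1.0],
}

scores = score_terms(protein_vector, term_embeddings)
ranked_terms = sorted(scores, key=scores.get, reverse=True)
```

Because the score is a differentiable function of both representations, a loss on it can in principle propagate gradients into both sub-networks, which is what makes end-to-end training possible.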

Image Cropping with Composition and Saliency Aware Aesthetic Score Map

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6889 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12104-12111

Author(s):

Yi Tu ◽

Li Niu ◽

Weijie Zhao ◽

Dawei Cheng ◽

Liqing Zhang

Keyword(s):

Deep Learning ◽

Convolutional Network ◽

Aesthetic Quality ◽

Aesthetic Evaluation ◽

Fully Convolutional Network ◽

Intrinsic Mechanism ◽

Real World Applications ◽

Benchmark Datasets ◽

Score Map ◽

The Aesthetic

Aesthetic image cropping is a practical but challenging task that aims at finding the crops with the highest aesthetic quality in an image. Recently, many deep learning methods have been proposed to address this problem, but they did not reveal the intrinsic mechanism of aesthetic evaluation. In this paper, we propose an interpretable image cropping model to unveil the mystery. For each image, we use a fully convolutional network to produce an aesthetic score map, which is shared among all candidate crops during crop-level aesthetic evaluation. Then, we require the aesthetic score map to be both composition-aware and saliency-aware. In particular, the same region is assigned different aesthetic scores based on its relative positions in different crops. Moreover, a visually salient region is supposed to have more sensitive aesthetic scores so that our network can learn to place salient objects at more suitable positions. Such an aesthetic score map can be used to localize aesthetically important regions in an image, which sheds light on the composition rules learned by our model. We show the competitive performance of our model in the image cropping task on several benchmark datasets, and also demonstrate its generality in real-world applications.
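One way to see why a score map shared among candidate crops is efficient: once the map exists, each crop can be scored from it without re-running the network. The sketch below scores crops by their mean map value using a summed-area table. This is a deliberate simplification under stated assumptions: it ignores the composition-aware (position-dependent) and saliency-aware parts of the model, and the map values and candidate crops are invented.

```python
def integral_image(score_map):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(score_map), len(score_map[0])
    table = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            table[r + 1][c + 1] = (score_map[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    return table

def crop_score(table, top, left, bottom, right):
    """Mean map value over rows top..bottom-1 and columns left..right-1."""
    total = (table[bottom][right] - table[top][right]
             - table[bottom][left] + table[top][left])
    return total / ((bottom - top) * (right - left))

def best_crop(score_map, candidates):
    """Pick the candidate (top, left, bottom, right) with the highest mean score."""
    table = integral_image(score_map)  # computed once, shared by all crops
    return max(candidates, key=lambda crop: crop_score(table, *crop))

# Hypothetical 3x4 aesthetic score map and three candidate crops.
score_map = [[0.1, 0.2, 0.1, 0.0],
             [0.2, 0.9, 0.8, 0.1],
             [0.1, 0.7, 0.6, 0.0]]
candidates = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 2, 3, 4)]

chosen = best_crop(score_map, candidates)
```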

Object Detection Based on Faster R-CNN

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c2186.0210321 ◽

2021 ◽

Vol 10 (3) ◽

pp. 72-76

Author(s):

M. Sushma Sri ◽

B. Rajendra Naik ◽

K. Jaya Sankar

Keyword(s):

Neural Networks ◽

Image Processing ◽

Deep Learning ◽

Object Detection ◽

Deep Neural Networks ◽

Rapid Development ◽

Simple Algorithm ◽

Average Precision ◽

Rapid Improvement ◽

High Level

In recent years, there has been rapid improvement in object detection for video analysis and image processing applications. Locating a desired object has become an important task, and numerous object detection methods have evolved. There has also been rapid development in deep learning, which offers high-level processing, extracts deeper features, and is more reliable and flexible than conventional techniques. In this article, the authors propose object detection with deep neural networks and faster region-based convolutional neural network (Faster R-CNN) methods, providing a simple algorithm with better accuracy and mean average precision.
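Mean average precision, the metric named in this abstract, averages a per-class average precision (AP). The sketch below uses the plain, non-interpolated AP over a ranked list of detections; detection benchmarks often use interpolated variants, so this is an illustrative assumption and not the article's exact protocol. The class names and hit lists are made up.

```python
def average_precision(ranked_hits, n_ground_truth):
    """Non-interpolated AP.

    ranked_hits: True/False per detection, sorted by descending confidence,
    where True means the detection matched a ground-truth object.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_ground_truth if n_ground_truth else 0.0

def mean_average_precision(per_class):
    """Average the AP over classes; per_class maps name -> (hits, n_ground_truth)."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)

# Hypothetical results for two classes.
per_class_results = {
    "car": ([True, False, True], 2),
    "person": ([False, True], 1),
}

map_value = mean_average_precision(per_class_results)
```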