ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition

Author(s): Yuan Yuan, Zhitong Xiong, Qi Wang

RGB image classification has achieved significant performance improvements with the resurgence of deep convolutional neural networks. However, mono-modal deep models for RGB images still have several limitations when applied to RGB-D scene recognition. 1) Images for scene classification usually contain more than one typical object with flexible spatial distributions, so object-level local features should be considered in addition to the global scene representation. 2) Multi-modal features in RGB-D scene classification are still under-utilized: simply combining modality-specific features suffers from the semantic gaps between modalities. 3) Most existing methods neglect the complex relationships among features from multiple modalities. Considering these limitations, this paper proposes an adaptive cross-modal (ACM) feature learning framework based on graph convolutional neural networks for RGB-D scene recognition. To make better use of modality-specific cues, this approach mines the intra-modality relationships among the selected local features within each modality. To leverage multi-modal knowledge more effectively, the proposed approach models the inter-modality relationships between the two modalities through a cross-modal graph (CMG). We evaluate the proposed method on two public RGB-D scene classification datasets, SUN-RGBD and NYUD V2, on which it achieves state-of-the-art performance.
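The abstract does not spell out how the cross-modal graph is built or how features propagate over it. As a minimal sketch under stated assumptions: local features from the RGB and depth streams (assumed already projected to a common dimension) become the nodes of one graph, edges are weighted by non-negative cosine similarity, a hypothetical `inter_weight` parameter scales the inter-modality edges relative to the intra-modality ones, and one standard symmetric-normalized graph-convolution step mixes the node features. None of these choices is taken from the paper.

```python
import numpy as np


def cosine_adjacency(x: np.ndarray) -> np.ndarray:
    """Edge weights from pairwise cosine similarity, clipped at zero (assumed rule)."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return np.maximum(unit @ unit.T, 0.0)


def gcn_layer(a: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)


def cross_modal_gcn(rgb: np.ndarray, depth: np.ndarray, w: np.ndarray,
                    inter_weight: float = 1.0) -> np.ndarray:
    """Treat RGB and depth local features as nodes of a single graph.

    The RGB-RGB and depth-depth blocks of the adjacency are the intra-modality
    edges; the off-diagonal blocks are the inter-modality (cross-modal) edges,
    scaled by the hypothetical inter_weight parameter.
    """
    n_rgb = rgb.shape[0]
    nodes = np.vstack([rgb, depth])
    a = cosine_adjacency(nodes)
    np.fill_diagonal(a, 0.0)  # self-loops are re-added inside gcn_layer
    a[:n_rgb, n_rgb:] *= inter_weight
    a[n_rgb:, :n_rgb] *= inter_weight
    return gcn_layer(a, nodes, w)


rng = np.random.default_rng(0)
rgb_feats = rng.normal(size=(4, 16))    # 4 hypothetical RGB local features
depth_feats = rng.normal(size=(4, 16))  # 4 hypothetical depth local features
weights = rng.normal(size=(16, 8))      # hypothetical learnable projection
out = cross_modal_gcn(rgb_feats, depth_feats, weights)
scene_vector = out.mean(axis=0)         # mean-pool nodes into one scene descriptor
print(out.shape, scene_vector.shape)    # (8, 8) (8,)
```

Setting `inter_weight=0.0` disconnects the two modalities, so each stream is convolved only over its own intra-modality edges; raising it lets depth cues flow into RGB nodes and vice versa.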

Author(s): Ridha Ilyas Bendjillali, Mohammed Beladgham, Khaled Merit, Abdelmalik Taleb-Ahmed

Over the last decade, facial recognition techniques have been considered among the most important research fields in biometric technology. In this research paper, we present a Face Recognition (FR) system divided into three steps: the Viola-Jones face detection algorithm, facial image enhancement using a Modified Contrast Limited Adaptive Histogram Equalization (M-CLAHE) algorithm, and feature learning for classification. For feature learning and classification, the proposed system uses the VGG16, ResNet50, and Inception-v3 Convolutional Neural Network (CNN) architectures. Our experiments were performed on the Extended Yale B and CMU PIE face databases. Comparison with other methods on both databases shows the robustness and effectiveness of the proposed approach: the Inception-v3 architecture achieved recognition rates of 99.44% on Extended Yale B and 99.89% on CMU PIE.
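The abstract does not describe what the M-CLAHE modification changes. For reference, the step at the heart of standard CLAHE is sketched below for a single tile: the tile's histogram is clipped at a limit, the clipped mass is spread evenly over all bins, and the resulting cumulative distribution becomes a grey-level lookup table. Full CLAHE repeats this per tile and bilinearly interpolates between neighboring tiles' tables, which is omitted here, and the `clip_limit` convention (a multiple of the mean bin count) is an assumption.

```python
import numpy as np


def clipped_equalization_lut(tile: np.ndarray, clip_limit: float = 2.0,
                             n_bins: int = 256) -> np.ndarray:
    """Grey-level lookup table for one uint8 tile, built the CLAHE way.

    clip_limit is a multiple of the mean bin count (assumed convention).
    """
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    limit = clip_limit * tile.size / n_bins
    excess = np.maximum(hist - limit, 0.0).sum()
    # Clip each bin at the limit, then spread the clipped mass evenly over all
    # bins, so the total count (tile.size) is preserved.
    hist = np.minimum(hist, limit) + excess / n_bins
    cdf = np.cumsum(hist) / hist.sum()
    # Clipping caps the CDF slope, i.e. how strongly contrast can be stretched.
    return np.round(cdf * (n_bins - 1)).astype(np.uint8)


rng = np.random.default_rng(0)
# Hypothetical low-contrast tile: grey levels squeezed into [100, 140).
tile = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
lut = clipped_equalization_lut(tile, clip_limit=2.0)
enhanced = lut[tile]  # apply the lookup table pixel-wise
print(int(tile.min()), int(tile.max()), int(enhanced.min()), int(enhanced.max()))
```

A smaller `clip_limit` flattens the clipped histogram further and so limits noise amplification in uniform regions such as skin, at the cost of a weaker contrast stretch.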


2020, Vol 12 (7), pp. 1092
Author(s): David Browne, Michael Giering, Steven Prestwich

Scene classification is an important aspect of image and video understanding and segmentation. However, remote-sensing scene classification is a challenging image recognition task, partly because of limited training data, which causes deep-learning Convolutional Neural Networks (CNNs) to overfit. Another difficulty is that images often have very different scales and orientations (viewing angles). Yet another is that the resulting networks may be very large, again making them prone to overfitting and unsuitable for deployment on memory- and energy-limited devices. We propose an efficient deep-learning approach to tackle these problems. We use transfer learning to compensate for the lack of data, and data augmentation to handle varying scale and orientation. To reduce network size, we use a novel unsupervised learning approach based on k-means clustering, applied to all parts of the network: most network reduction methods rely on computationally expensive supervised learning and apply only to the convolutional or the fully connected layers, not both. In experiments, we set new standards in classification accuracy on four remote-sensing and two scene-recognition image datasets.
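The abstract says only that network reduction is based on unsupervised k-means clustering and applies to both convolutional and fully connected layers. One plausible reading, assumed here rather than taken from the paper, is to cluster a layer's units (conv filters, or rows of a fully connected weight matrix) with k-means and keep just the unit nearest each centroid. The sketch uses plain Lloyd's k-means with k-means++ seeding; `select_representative_units` is a hypothetical helper.

```python
import numpy as np


def _sq_dists(x: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from every row of x to every row of c."""
    return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with k-means++ seeding; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[[rng.integers(len(x))]]
    while len(centroids) < k:
        # k-means++: sample the next seed with probability proportional to D^2.
        d2 = _sq_dists(x, centroids).min(axis=1)
        centroids = np.vstack([centroids, x[rng.choice(len(x), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = _sq_dists(x, centroids).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, _sq_dists(x, centroids).argmin(axis=1)


def select_representative_units(weights: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical reduction rule: keep the unit nearest each k-means centroid.

    weights has the unit axis first, e.g. (out_channels, in_channels, kh, kw)
    for a conv layer or (out_features, in_features) for a fully connected one.
    """
    flat = weights.reshape(weights.shape[0], -1)
    centroids, labels = kmeans(flat, k)
    keep = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            dist = ((flat[members] - centroids[j]) ** 2).sum(axis=1)
            keep.append(members[dist.argmin()])
    return np.sort(np.array(keep))


rng = np.random.default_rng(1)
prototypes = rng.normal(size=(3, 8, 3, 3))
# 12 hypothetical 8x3x3 filters: 4 near-duplicates of each of 3 prototypes.
filters = np.repeat(prototypes, 4, axis=0) + 0.01 * rng.normal(size=(12, 8, 3, 3))
kept = select_representative_units(filters, k=3)
reduced = filters[kept]  # the layer shrinks from 12 to 3 output channels
print(kept, reduced.shape)
```

In a real network the next layer's matching input channels would also have to be dropped or merged after this step; that bookkeeping is omitted.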


2021, pp. 1-13
Author(s): Xiang-Min Liu, Jian Hu, Deborah Simon Mwakapesa, Y.A. Nanehkaran, Yi-Min Mao, ...

Deep convolutional neural networks (DCNNs), with their complex network structure and powerful feature learning and feature expression capabilities, have achieved remarkable success in many large-scale recognition tasks. However, given the demands on memory overhead and response time, along with the growing scale of data, DCNNs face three non-trivial challenges in a big data environment: excessive network parameters, slow convergence, and inefficient parallelism. To tackle these three problems, this paper develops a deep convolutional neural network optimization algorithm (PDCNNO) in the MapReduce framework. The proposed method first prunes the network to obtain a compressed network, effectively reducing redundant parameters. Next, a conjugate gradient method based on a modified secant equation (CGMSE) is developed in the Map phase to further accelerate the convergence of the network. Finally, a load balancing strategy based on a regulated load rate (LBRLA) is proposed in the Reduce phase to quickly achieve equal grouping of data, thus improving the parallel performance of the system. We compared the PDCNNO algorithm with other algorithms on three datasets: SVHN, EMNIST Digits, and ILSVRC2012. The experimental results show that our algorithm not only reduces the space and time overhead of network training but also achieves a good speedup ratio in a big data environment.
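The abstract does not say which pruning criterion PDCNNO applies before the Map phase. Purely as an illustration, and not as the authors' method, the sketch below uses global magnitude pruning, a common baseline: one threshold is chosen over all layers so that a given fraction of the smallest-magnitude weights is set to zero. The CGMSE optimizer and the LBRLA load-balancing step are not modeled, and the layer shapes are hypothetical.

```python
import numpy as np


def magnitude_prune(layers: list[np.ndarray], sparsity: float) -> list[np.ndarray]:
    """Global magnitude pruning: zero the `sparsity` fraction of smallest |w|.

    One threshold is shared by all layers, so layers with many small weights
    lose more of them. Returns new arrays; the inputs are left untouched.
    """
    magnitudes = np.concatenate([np.abs(w).ravel() for w in layers])
    k = int(sparsity * magnitudes.size)  # number of weights to remove
    if k == 0:
        return [w.copy() for w in layers]
    threshold = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    return [np.where(np.abs(w) > threshold, w, 0.0) for w in layers]


rng = np.random.default_rng(0)
conv = rng.normal(size=(16, 3, 3, 3))  # hypothetical conv kernel
fc = rng.normal(size=(10, 16))         # hypothetical fully connected layer
pruned = magnitude_prune([conv, fc], sparsity=0.5)
total = conv.size + fc.size
zeros = sum(int((p == 0).sum()) for p in pruned)
print(f"{zeros}/{total} weights removed")
```

Zeroed weights only reduce memory and communication once stored in a sparse format, which matters when each Map task receives a copy of the model; that conversion is outside this sketch.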

