PlanetScope Radiometric Normalization and Sentinel-2 Super-Resolution (2.5 m): A Straightforward Spectral-Spatial Fusion of Multi-Satellite Multi-Sensor Images Using Residual Convolutional Neural Networks

2020 ◽  
Vol 12 (15) ◽  
pp. 2366
Author(s):  
Nicolas Latte ◽  
Philippe Lejeune

Sentinel-2 (S2) imagery is used in many research areas and for diverse applications. Its spectral resolution and quality are high, but its spatial resolution, at best 10 m, is not sufficient for fine-scale analysis. A novel method was thus proposed to super-resolve S2 imagery to 2.5 m. For a given S2 tile, the 10 S2 bands (four at 10 m and six at 20 m) were fused with additional images acquired at higher spatial resolution by the PlanetScope (PS) constellation. The radiometric inconsistencies between PS microsatellites were normalized. Radiometric normalization and super-resolution were achieved simultaneously using state-of-the-art super-resolution residual convolutional neural networks adapted to the particularities of S2 and PS imagery (including masks of clouds and shadows). The method is described in detail, from image selection and downloading to neural network architecture, training, and prediction. The quality was thoroughly assessed visually (photointerpretation) and quantitatively, confirming that the proposed method is highly accurate both spatially and spectrally. The method is also robust and can be applied to S2 images acquired worldwide at any date.
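
To make the fusion idea concrete, here is a minimal sketch of a residual-CNN fusion network in PyTorch. It assumes the S2 bands have already been resampled to the 2.5 m PS grid; the band counts, layer sizes, and names (FusionSRNet, ResidualBlock) are illustrative, not the paper's exact architecture.

```python
# Minimal sketch: S2 bands (pre-resampled to the PS grid) are concatenated
# with PS bands, and the network predicts a residual correction on top of
# the upsampled S2 input (global residual learning).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(x)))

class FusionSRNet(nn.Module):
    def __init__(self, s2_bands=10, ps_bands=4, feats=64, n_blocks=8):
        super().__init__()
        self.head = nn.Conv2d(s2_bands + ps_bands, feats, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(feats) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(feats, s2_bands, 3, padding=1)

    def forward(self, s2_up, ps):
        # s2_up: S2 bands resampled to 2.5 m; ps: PS bands at 2.5 m
        x = self.head(torch.cat([s2_up, ps], dim=1))
        return s2_up + self.tail(self.body(x))  # predict only the correction
```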

In this paper, we identify infant cry signals and the reasons behind the cries of infants aged 0–6 months. Detection of baby cry signals is an essential pre-processing step for various cry-analysis applications aimed at caregivers, such as emotion detection, since cry signals carry information about a baby's well-being and can be understood to some extent by experienced parents and experts. We train and validate a neural network architecture for baby cry detection and also test the network with fastai. The trained network yields a model that can predict the reason behind a cry sound; only cry sounds are recognized, and the user is alerted automatically. We created a web application that detects and responds to different causes of crying, including hunger, tiredness, discomfort, and belly pain.
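
As a rough illustration of such a pipeline (not the authors' implementation), the sketch below classifies short cry clips into the four causes listed above using a small CNN over mel-spectrograms; the sample rate, layer sizes, and label names are assumptions.

```python
# Hypothetical cry-cause classifier over mel-spectrograms (assumes torchaudio).
import torch
import torch.nn as nn
import torchaudio

LABELS = ["hunger", "tired", "discomfort", "bellypain"]  # from the abstract

class CryClassifier(nn.Module):
    def __init__(self, n_classes=len(LABELS)):
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, waveform):                # waveform: (batch, samples)
        spec = self.mel(waveform).unsqueeze(1)  # -> (batch, 1, mels, frames)
        return self.net(spec)                   # logits over the four causes
```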


Author(s):  
Н.А. Полковникова ◽  
Е.В. Тузинкевич ◽  
А.Н. Попов

The article considers computer vision technologies based on deep convolutional neural networks. Neural networks are particularly effective for solving problems that are difficult to formalize. A convolutional neural network architecture was developed for the recognition and classification of marine objects in images. In the course of the research, a retrospective analysis of computer vision technologies was performed and a number of problems associated with the use of neural networks were identified: the vanishing gradient, overfitting, and computational complexity. To address these problems, the proposed architecture uses the ReLU activation function, training of only randomly selected neurons, and normalization to simplify the network. The ReLU, LeakyReLU, Exponential ReLU, and SOFTMAX activation functions used in the network were compared in Matlab R2020a. Based on the convolutional neural network, a program for marine object recognition was implemented in the Visual C# programming language in the MS Visual Studio integrated development environment. The program is designed for automated identification of marine objects; it performs detection (locating objects in an image) and recognition with a high probability of detection.
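
As a rough analogue of that comparison (in PyTorch rather than Matlab, with "Exponential ReLU" taken here as ELU), the snippet below evaluates the activations on a small input range:

```python
# Compare activation functions on a shared input range.
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 7)
for name, fn in [("ReLU", nn.ReLU()),
                 ("LeakyReLU", nn.LeakyReLU(0.1)),
                 ("ELU", nn.ELU()),                 # exponential for x < 0
                 ("Softmax", nn.Softmax(dim=0))]:   # normalizes to a distribution
    print(f"{name:10s}", fn(x))
```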


2019 ◽  
Vol 11 (22) ◽  
pp. 2635 ◽  
Author(s):  
Massimiliano Gargiulo ◽  
Antonio Mazza ◽  
Raffaele Gaetano ◽  
Giuseppe Ruello ◽  
Giuseppe Scarpa

Images provided by the ESA Sentinel-2 mission are rapidly becoming the main source of information for the entire remote sensing community, thanks to their unprecedented combination of spatial, spectral, and temporal resolution, as well as their associated open access policy. Due to a sensor design trade-off, images are acquired (and delivered) at different spatial resolutions (10, 20 and 60 m) according to specific sets of wavelengths, with only the four visible and near-infrared bands provided at the highest resolution (10 m). Although this is not a limiting factor in general, many emerging applications could benefit from a resolution enhancement of the 20 m bands, motivating the development of specific super-resolution methods. In this work, we propose to leverage Convolutional Neural Networks (CNNs) to provide a fast, scalable method for the single-sensor fusion of Sentinel-2 (S2) data, whose aim is to provide a 10 m super-resolution of the original 20 m bands. Experimental results demonstrate that the proposed solution outperforms most state-of-the-art methods, including other deep learning-based ones, with a considerable saving in computational burden.
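
A minimal sketch of how the input to such a single-sensor fusion CNN could be assembled, assuming the six 20 m bands are bilinearly upsampled to the 10 m grid and stacked with the four 10 m bands; the function name and interpolation mode are assumptions, not the paper's exact preprocessing.

```python
# Build a 10-band fusion input from the two native S2 resolutions.
import torch
import torch.nn.functional as F

def make_fusion_input(bands_10m, bands_20m):
    # bands_10m: (B, 4, H, W); bands_20m: (B, 6, H/2, W/2)
    up = F.interpolate(bands_20m, size=bands_10m.shape[-2:],
                       mode="bilinear", align_corners=False)
    return torch.cat([bands_10m, up], dim=1)  # (B, 10, H, W) CNN input
```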


IoT ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 222-235
Author(s):  
Guillaume Coiffier ◽  
Ghouthi Boukli Hacene ◽  
Vincent Gripon

Deep Neural Networks are state-of-the-art in a large number of machine learning challenges. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas the spatial resolution of the input is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations is performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network to its fullest, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsampling, and shortcut connections ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with less than 40 k parameters in total, 74.3% on CIFAR-100 with less than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.
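
The recursive use of a single convolution can be sketched as follows; this is a simplified illustration of the ThriftyNet idea, with the iteration count, channel width, and downsampling schedule chosen arbitrarily rather than taken from the paper.

```python
# One shared convolution applied recursively, with per-iteration
# normalization, a shortcut, and periodic downsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThriftySketch(nn.Module):
    def __init__(self, channels=128, n_iter=12, n_classes=10):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # the one shared layer
        self.norms = nn.ModuleList([nn.BatchNorm2d(channels) for _ in range(n_iter)])
        self.fc = nn.Linear(channels, n_classes)
        self.n_iter = n_iter

    def forward(self, x):
        h = self.embed(x)
        for t in range(self.n_iter):
            h = h + F.relu(self.norms[t](self.conv(h)))  # shortcut keeps expressivity
            if t % 4 == 3:                               # periodic downsampling
                h = F.max_pool2d(h, 2)
        return self.fc(h.mean(dim=(2, 3)))               # global average pool + classify
```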


2021 ◽  
Vol 15 ◽  
Author(s):  
Xuan Chen ◽  
Xiaopeng Yuan ◽  
Gaoming Fu ◽  
Yuanyong Luo ◽  
Tao Yue ◽  
...  

Convolutional Neural Networks (CNNs) are effective and mature in the field of classification, while Spiking Neural Networks (SNNs) are energy-saving thanks to the sparsity of their data flow and their event-driven working mechanism. Previous work demonstrated that CNNs can be converted into equivalent Spiking Convolutional Neural Networks (SCNNs) without obvious accuracy loss, including different functional layers such as Convolutional (Conv), Fully Connected (FC), Avg-pooling, Max-pooling, and Batch-Normalization (BN) layers. To reduce inference latency, existing research has mainly concentrated on normalizing weights to increase the firing rate of neurons; other approaches modify the training phase or alter the network architecture. However, little attention has been paid to the end of the inference phase. From this new perspective, this paper presents four stopping criteria as low-cost plug-ins to reduce the inference latency of SCNNs. The proposed methods are validated using the MATLAB and PyTorch platforms with Spiking-AlexNet on the CIFAR-10 dataset and Spiking-LeNet-5 on the MNIST dataset. Simulation results reveal that, compared to the state-of-the-art methods, the proposed method can shorten the average inference latency of Spiking-AlexNet from 892 to 267 time steps (almost 3.34 times faster), with accuracy declining from 87.95% to 87.72%. With our methods, four types of Spiking-LeNet-5 need only 24–70 time steps per image with an accuracy decline of no more than 0.1%, whereas models without our methods require 52–138 time steps, almost 1.92 to 3.21 times slower.
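
As an illustration of the general idea (not necessarily one of the paper's four criteria), the sketch below halts inference once the gap between the two highest accumulated output spike counts exceeds a fixed margin; the snn_step interface and the margin value are assumptions.

```python
# Stop SCNN inference early once the output decision is stable enough.
import torch

def run_with_early_stop(snn_step, x, max_steps=300, margin=5):
    # snn_step(x, t) -> output spikes at time step t, shape (n_classes,)
    counts = torch.zeros(0)
    for t in range(max_steps):
        out = snn_step(x, t)
        counts = out if counts.numel() == 0 else counts + out
        top2 = torch.topk(counts, 2).values
        if top2[0] - top2[1] >= margin:          # winner is far enough ahead
            return counts.argmax().item(), t + 1  # (prediction, latency used)
    return counts.argmax().item(), max_steps
```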


2016 ◽  
Vol 10 (03) ◽  
pp. 379-397 ◽  
Author(s):  
Hilal Ergun ◽  
Yusuf Caglar Akyuz ◽  
Mustafa Sert ◽  
Jianquan Liu

Visual concept recognition has been an active research field over the last decade. Reflecting this attention, deep learning architectures are showing great promise in various computer vision domains, including image classification, object detection, event detection, and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures in terms of both running time and accuracy. In our proposed visual concept recognition system, we first discuss important properties of the popular convolutional network architectures under consideration. We then describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners, and using these best practices we propose efficient fusion mechanisms for both single and multiple network models. We achieve state-of-the-art results on benchmark datasets while keeping computational costs low, and we show that such results can be reached without resorting to extensive data augmentation techniques.
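
A hedged sketch of feature extraction at multiple levels of abstraction, using forward hooks on a pretrained torchvision ResNet-18 (assumed available); the choice of layers and the concatenation-based fusion are illustrative, not the paper's exact method.

```python
# Extract and fuse features from two depths of a pretrained backbone.
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feats = {}
backbone.layer2.register_forward_hook(lambda m, i, o: feats.update(mid=o))
backbone.layer4.register_forward_hook(lambda m, i, o: feats.update(deep=o))

def extract(x):
    backbone.eval()
    with torch.no_grad():
        backbone(x)                               # hooks capture intermediate maps
    pooled = [f.mean(dim=(2, 3)) for f in (feats["mid"], feats["deep"])]
    return torch.cat(pooled, dim=1)               # (B, 128 + 512) fused descriptor
```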


Author(s):  
Tuan Hoang ◽  
Thanh-Toan Do ◽  
Tam V. Nguyen ◽  
Ngai-Man Cheung

This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights. However, this approach results in a mismatch: gradient descent updates the full-precision weights, but not the quantized weights. To address this issue, we propose a novel method that enables direct updating of the quantized weights, with learnable quantization levels, to minimize the cost function using gradient descent. Second, to obtain low bit-width activations, existing works treat all channels equally. However, the activation quantizers can be biased toward a few channels with high variance. To address this issue, we propose a method that takes into account the quantization errors of individual channels. With this approach, we can learn activation quantizers that minimize the quantization error in the majority of channels. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the image classification task, using AlexNet, ResNet, and MobileNetV2 architectures on the CIFAR-100 and ImageNet datasets.
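
As a simplified illustration of directly learnable quantization levels (not the paper's exact procedure), the sketch below snaps weights to the nearest level and uses a straight-through estimator so that gradients reach both the levels and the full-precision weights.

```python
# Quantize weights to learnable levels with a straight-through estimator (STE).
import torch
import torch.nn as nn

class LearnableQuant(nn.Module):
    def __init__(self, n_levels=4):
        super().__init__()
        self.levels = nn.Parameter(torch.linspace(-1.0, 1.0, n_levels))

    def forward(self, w):
        dist = (w.unsqueeze(-1) - self.levels).abs()  # distance to each level
        q = self.levels[dist.argmin(dim=-1)]          # nearest level; grads reach levels
        return q + w - w.detach()                     # forward = q; backward also updates w
```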


Author(s):  
Zhongying Deng ◽  
Xiaojiang Peng ◽  
Yu Qiao

Heterogeneous Face Recognition (HFR) is a challenging task due to the large modality discrepancy as well as insufficient training images in certain modalities. In this paper, we propose a new two-branch network architecture, termed Residual Compensation Networks (RCN), to learn separate features for different modalities in HFR. The RCN incorporates a residual compensation (RC) module and a modality discrepancy loss (MD loss) into traditional convolutional neural networks. The RC module reduces the modality discrepancy by adding compensation to one of the modalities so that its representation can be close to the other modality. The MD loss alleviates the modality discrepancy by minimizing the cosine distance between different modalities. In addition, we explore different architectures and positions for the RC module, and evaluate different transfer learning strategies for HFR. Extensive experiments on the IIIT-D Viewed Sketch, Forensic Sketch, CASIA NIR-VIS 2.0, and CUHK NIR-VIS datasets show that our RCN significantly outperforms other state-of-the-art methods.
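
A minimal sketch of the two-branch idea under stated assumptions: a shared backbone, a residual compensation step applied to one modality, and an MD loss based on cosine distance. The feature dimension and the MLP used for compensation are illustrative.

```python
# Two-branch features with residual compensation and a modality-discrepancy loss.
import torch.nn as nn
import torch.nn.functional as F

class RCNSketch(nn.Module):
    def __init__(self, backbone, feat_dim=512):
        super().__init__()
        self.backbone = backbone                      # shared feature extractor
        self.rc = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                nn.Linear(feat_dim, feat_dim))

    def forward(self, x_vis, x_nir):
        f_vis = self.backbone(x_vis)
        f_nir = self.backbone(x_nir)
        f_nir = f_nir + self.rc(f_nir)                # compensate one modality
        md_loss = 1 - F.cosine_similarity(f_vis, f_nir, dim=1).mean()
        return f_vis, f_nir, md_loss                  # add md_loss to the training objective
```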


Author(s):  
M. U. Müller ◽  
N. Ekhtiari ◽  
R. M. Almeida ◽  
C. Rieke

Abstract. Super-resolution aims at increasing image resolution by algorithmic means and has progressed in recent years due to advances in the fields of computer vision and deep learning. Convolutional Neural Networks based on a variety of architectures have been applied to the problem, e.g., autoencoders and residual networks. While most research focuses on the processing of photographs consisting only of RGB color channels, little work can be found concentrating on multi-band, analytic satellite imagery. Satellite images often include a panchromatic band, which has higher spatial resolution but lower spectral resolution than the other bands. In the field of remote sensing, there is a long tradition of applying pan-sharpening to satellite images, i.e., bringing the multispectral bands to the higher spatial resolution by merging them with the panchromatic band. To our knowledge, no super-resolution approaches so far take advantage of the panchromatic band. In this paper we propose a method to train state-of-the-art CNNs using pairs of lower-resolution multispectral and high-resolution pan-sharpened image tiles in order to create super-resolved analytic images. The derived quality metrics show that the method improves the information content of the processed images. We compare the results created by four CNN architectures, with RedNet30 performing best.
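
The training-pair construction can be sketched as follows: the lower-resolution multispectral tile is upsampled to the pan-sharpened grid to form the network input, with the pan-sharpened tile as the target. The function name and interpolation choice are assumptions.

```python
# Pair a lower-resolution multispectral tile with its pan-sharpened target.
import torch.nn.functional as F

def make_pair(multispectral, pansharpened):
    # multispectral: (B, C, h, w); pansharpened: (B, C, H, W) with H > h
    x = F.interpolate(multispectral, size=pansharpened.shape[-2:],
                      mode="bilinear", align_corners=False)
    return x, pansharpened  # (network input, super-resolution target)
```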


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1398
Author(s):  
Taian Guo ◽  
Tao Dai ◽  
Ling Liu ◽  
Zexuan Zhu ◽  
Shu-Tao Xia

Convolutional Neural Networks (CNNs) have been widely used in video super-resolution (VSR). Most existing VSR methods focus on how to utilize the information of multiple frames while neglecting the correlations of the intermediate features, thus limiting the feature expression of the models. To address this problem, we propose a novel Scale-and-Attention-Aware (SAA) network that applies different attention to streams of different temporal lengths, while further exploring both spatial and channel attention on separate streams with a newly proposed Criss-Cross Channel Attention Module (C3AM). Experiments on public VSR datasets demonstrate the superiority of our method over other state-of-the-art methods in terms of both quantitative and qualitative metrics.
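
Since the paper's C3AM is not reproduced here, the sketch below shows a generic squeeze-and-excitation-style channel attention module of the kind such networks build on; the reduction ratio is arbitrary.

```python
# Generic channel attention: rescale each channel by a learned weight
# derived from globally pooled features.
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                              # squeeze: (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # excite: per-channel rescaling
```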

