Pansformers: Transformer-Based Self-Attention Network for Pansharpening

2021 ◽  
Author(s):  
Nithin G R ◽  
Nitish Kumar M ◽  
Venkateswaran Narasimhan ◽  
Rajanikanth Kakani ◽  
Ujjwal Gupta ◽  
...  

Pansharpening is the task of creating a High-Resolution Multi-Spectral (HRMS) image by extracting pixel details from the High-Resolution Panchromatic image and infusing them into the Low-Resolution Multi-Spectral (LRMS) image. With the boom in the amount of satellite image data, researchers have replaced traditional approaches with deep learning models. However, existing deep learning models are not built to capture intricate pixel-level relationships. Motivated by the recent success of self-attention mechanisms in computer vision tasks, we propose Pansformers, a transformer-based self-attention architecture that computes band-wise attention. We further improve the attention network by introducing a Multi-Patch Attention mechanism, which operates on non-overlapping, local patches of the image. Our model infuses relevant local details from the Panchromatic image while preserving the spectral integrity of the MS image. We show that our Pansformer model significantly improves the performance metrics and the output image quality on imagery from two satellite distributions, IKONOS and LANDSAT-8.
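The idea of attention restricted to non-overlapping local patches can be illustrated with a small NumPy sketch. This is not the authors' architecture: it assumes a single MS band already upsampled to the panchromatic grid, uses raw pixel values as one-dimensional features with no learned projections, and assumes that queries come from the MS band while keys and values come from the panchromatic image. The names `split_patches`, `merge_patches` and `patch_attention` are hypothetical.

```python
import numpy as np

def split_patches(img, p):
    """Split an (H, W) array into non-overlapping p x p patches, shape (n, p*p)."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "H and W must be multiples of the patch size"
    return (img.reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))

def merge_patches(patches, h, w, p):
    """Inverse of split_patches: rebuild the (H, W) array from (n, p*p) patches."""
    return (patches.reshape(h // p, w // p, p, p)
                   .transpose(0, 2, 1, 3)
                   .reshape(h, w))

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention(ms_band, pan, p):
    """Attention computed independently inside each p x p patch.

    Assumption for illustration: queries are MS pixels, keys and values
    are PAN pixels, and the feature dimension is 1 (so the usual
    1/sqrt(d) scaling equals 1 and is omitted).
    """
    h, w = ms_band.shape
    q = split_patches(ms_band, p)[..., None]   # (n, p*p, 1)
    k = split_patches(pan, p)[..., None]       # (n, p*p, 1)
    v = k
    scores = q @ k.transpose(0, 2, 1)          # (n, p*p, p*p)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    out = (weights @ v)[..., 0]                # (n, p*p)
    return merge_patches(out, h, w, p)

# Band-wise use: apply the same routine to each band of a (bands, H, W) cube.
rng = np.random.default_rng(0)
ms_cube = rng.random((4, 8, 8))
pan_img = rng.random((8, 8))
fused = np.stack([patch_attention(band, pan_img, 4) for band in ms_cube])
```

Because the softmax weights are computed per patch, a change to the panchromatic image inside one patch cannot affect the output in any other patch, which is the locality property the abstract describes.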


2021 ◽  
Vol 13 (5) ◽  
pp. 992
Author(s):  
Dan López-Puigdollers ◽  
Gonzalo Mateo-García ◽  
Luis Gómez-Chova

The systematic monitoring of the Earth using optical satellites is limited by the presence of clouds. Accurately detecting these clouds is necessary to exploit satellite image archives in remote sensing applications. Despite many developments, cloud detection remains an unsolved problem with room for improvement, especially over bright surfaces and thin clouds. Recently, advances in cloud masking using deep learning have shown significant boosts in cloud detection accuracy. However, these works are validated in heterogeneous manners, and the comparison with operational threshold-based schemes is not consistent among many of them. In this work, we systematically compare deep learning models trained on Landsat-8 images on different Landsat-8 and Sentinel-2 publicly available datasets. Overall, we show that deep learning models exhibit a high detection accuracy when trained and tested on independent images from the same Landsat-8 dataset (intra-dataset validation), outperforming operational algorithms. However, the performance of deep learning models is similar to operational threshold-based ones when they are tested on different datasets of Landsat-8 images (inter-dataset validation) or datasets from a different sensor with similar radiometric characteristics such as Sentinel-2 (cross-sensor validation). The results suggest that (i) the development of cloud detection methods for new satellites can be based on deep learning models trained on data from similar sensors and (ii) there is a strong dependence of deep learning models on the dataset used for training and testing, which highlights the necessity of standardized datasets and procedures for benchmarking cloud detection models in the future.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2611
Author(s):  
Andrew Shepley ◽  
Greg Falzon ◽  
Christopher Lawson ◽  
Paul Meek ◽  
Paul Kwan

Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is costly in time and resources, particularly in the context of camera trapping. Deep learning models have been used for this task but are often not suited to specific applications because they generalise poorly to new environments and perform inconsistently. Models need to be developed for specific species cohorts and environments, but the technical skills required to do so are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application that allows non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single-class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, and (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets.
Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.


2021 ◽  
Vol 66 (1) ◽  
pp. 175-187
Author(s):  
Duong Phung Thai ◽  
Son Ton

Using practical methods, satellite image processing methods, a vegetation cover classification system and an interpretation key for the study area, and classification with post-classification processing, this research shows how to exploit and process multi-temporal satellite images to evaluate changes in forest area. Landsat 4, 5 TM and Landsat 8 OLI remote sensing image data were used to evaluate the changes in the area of mangrove forests (RNM) in Ca Mau province in the periods 1988 - 1998, 1998 - 2013, 2013 - 2018, and 1988 - 2018. The image interpretation results for 1988, 1998, 2013 and 2018, together with the overlay of these maps, show that in the 30-year period from 1988 to 2018 the total area of mangroves in Ca Mau province decreased by 28%, from 71,093.3 ha in 1988 to 51,363.5 ha in 2018, a loss of 19,729.8 ha. Mangroves recovered at roughly half the rate at which they disappeared. Specifically, from 1988 to 2018, mangroves disappeared from an area of 42,534.9 hectares and appeared on a new area of 22,805 hectares; only 12,154.5 hectares of mangroves remained unchanged. The fluctuation of mangrove area in Ca Mau province is related to deforestation to dig shrimp ponds, coastal erosion, the formation of mangroves on new coastal alluvial lands and soil dunes in estuaries, and the planting of new mangroves in inefficient shrimp ponds.
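The net-change figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the areas stated in the abstract; the variable names are illustrative.

```python
# Areas in hectares, as reported in the abstract.
area_1988 = 71093.3
area_2018 = 51363.5
lost = 42534.9      # area where mangroves disappeared, 1988 to 2018
gained = 22805.0    # area where mangroves newly appeared, 1988 to 2018

# Net change from the two total areas.
net_change = area_2018 - area_1988            # about -19729.8 ha
percent_change = net_change / area_1988 * 100  # about -27.75 %, reported as 28 %

# Net change from the gain and loss figures (agrees to within rounding).
net_from_gain_loss = gained - lost             # about -19729.9 ha

# Loss relative to gain, the basis of the "2 times" comparison.
loss_to_gain_ratio = lost / gained             # about 1.87

print(round(net_change, 1), round(percent_change, 2),
      round(net_from_gain_loss, 1), round(loss_to_gain_ratio, 2))
```

The two routes to the net change differ by about 0.1 ha, which is consistent with rounding in the reported areas.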


Author(s):  
Made Arya Bhaskara Putra ◽  
I Wayan Nuarsa ◽  
I Wayan Sandi Adnyana

Rice is an important commodity that must always be available, so estimating rice production before harvest is essential for assessing food availability. Remote sensing with the Landsat 8 satellite is one technology that can be used for this purpose. The aims of this study were (1) to obtain a model for estimating rice production from Landsat 8 image analysis, and (2) to determine the accuracy of the model obtained from Landsat 8. The research area is located in three sub-districts in Klungkung Regency. The analysis was conducted using single-band analysis and vegetation index analysis of Landsat 8 satellite imagery. The rice production estimation model was developed by finding the relationship between satellite image data and rice production data. The final stage was an accuracy test of the rice production estimation model, using a t test and regression analysis. The results showed that: (1) rice production can be estimated between 67 and 77 days after planting; (2) there was a positive correlation between the NDVI (Normalized Difference Vegetation Index) value and rice yield; (3) the rice production estimation model is $y = 2.0442\,e^{1.8787x}$ (where x is the NDVI value from Landsat 8 and y is rice production); (4) the model accuracy test showed that the model is suitable for predicting rice production, with an accuracy level of 89.29% and a standard error of the production estimate of ± 0.443 ton/ha. Based on these results, it can be concluded that Landsat 8 satellite imagery can be used to estimate rice production with an accuracy level of 89.29%. The results are expected to serve as a reference for estimating rice production in Klungkung Regency.
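As a minimal sketch, the reported exponential model can be evaluated directly from an NDVI value. The coefficients come from the abstract; the function names are hypothetical, the NDVI formula is the standard (NIR - Red) / (NIR + Red) definition (bands 5 and 4 on Landsat 8 OLI), and treating the output as ton/ha is an assumption based on the units given for the standard error.

```python
import math

def ndvi(nir, red):
    """Standard NDVI from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)

def estimated_rice_production(ndvi_value):
    """Evaluate the reported model y = 2.0442 * exp(1.8787 * x).

    x is the Landsat 8 NDVI value; y is rice production
    (assumed here to be in ton/ha).
    """
    return 2.0442 * math.exp(1.8787 * ndvi_value)

# Example with made-up reflectance values, for illustration only.
x = ndvi(nir=0.45, red=0.09)
print(f"NDVI = {x:.3f}, estimated production = {estimated_rice_production(x):.2f}")
```

Because the exponent coefficient is positive, the estimate increases monotonically with NDVI, matching the positive correlation reported in result (2).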


2020 ◽  
Vol 6 (11) ◽  
pp. 125
Author(s):  
Albert Comelli ◽  
Claudia Coronnello ◽  
Navdeep Dahiya ◽  
Viviana Benfante ◽  
Stefano Palmucci ◽  
...  

Background: The aim of this work is to identify an automatic, accurate, and fast deep learning segmentation approach, applied to the parenchyma, using a very small dataset of high-resolution computed tomography images of patients with idiopathic pulmonary fibrosis. In this way, we aim to enhance the methodology performed by healthcare operators in radiomics studies where operator-independent segmentation methods must be used to correctly identify the target and, consequently, the texture-based prediction model. Methods: Two deep learning models were investigated: (i) U-Net, already used in many biomedical image segmentation tasks, and (ii) E-Net, used for image segmentation tasks in self-driving cars, where hardware availability is limited and accurate segmentation is critical for user safety. Our small image dataset is composed of 42 studies of patients with idiopathic pulmonary fibrosis, of which only 32 were used for the training phase. We compared the performance of the two models in terms of the similarity of their segmentation outcome with the gold standard and in terms of their resource requirements. Results: E-Net can be used to obtain accurate (Dice similarity coefficient = 95.90%), fast (20.32 s), and clinically acceptable segmentation of the lung region. Conclusions: We demonstrated that deep learning models can be efficiently applied to rapidly segment and quantify the parenchyma of patients with pulmonary fibrosis, without any radiologist supervision, in order to produce user-independent results.
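The Dice similarity coefficient used to score the segmentations is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a gold-standard mask B. The following is a minimal sketch on flattened binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the study.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two equal-length binary masks.

    Each mask is a flat sequence of 0/1 (or False/True) values,
    one per pixel or voxel.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# Toy example: the prediction covers 2 pixels, the gold standard 1, overlap 1.
predicted = [1, 1, 0, 0]
gold = [1, 0, 0, 0]
print(dice_coefficient(predicted, gold))  # 2*1 / (2+1)
```

A value of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all, so the reported 95.90% indicates near-complete agreement with the gold standard.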


2019 ◽  
Vol 136 ◽  
pp. 06032
Author(s):  
Kun Ding ◽  
Chen Yang ◽  
Chuan-hua Zhu ◽  
Yong Zhang ◽  
Hui Zhang ◽  
...  

Total phosphorus (TP) in water is an important indicator of the water environment and water ecology. If the concentration exceeds the standard, it directly leads to eutrophication. Daily monitoring of total phosphorus in water bodies has become an important item on the environmental protection agenda, but routine testing involves a heavy workload. We used satellite remote sensing technology to extract image data and establish a mathematical model, which was used to invert the total phosphorus concentration in water. Taking the Ring River as an example, we sampled and measured the TP value at different times and used Landsat-8 image data to establish a semi-empirical regression model. The calculation results show that the model's error relative to the measured data is within a controllable range. The method is simple to operate, saves resources, manpower and money, and can accurately reflect the actual TP status of the water body.


Author(s):  
Mohammad Shahab Uddin ◽  
Jiang Li

Deep learning models are data driven. For example, the most popular convolutional neural network (CNN) models used for image classification or object detection require large labeled databases for training to achieve competitive performance. This requirement is not difficult to satisfy in the visible domain, since many labeled video and image databases are available nowadays. However, because infrared (IR) cameras are less widely used, the availability of labeled infrared video or image databases is limited. Therefore, training deep learning models in the infrared domain is still challenging. In this chapter, we applied the pix2pix generative adversarial network (Pix2Pix GAN) and cycle-consistent GAN (Cycle GAN) models to convert visible videos to infrared videos. The Pix2Pix GAN model requires visible-infrared image pairs for training, while the Cycle GAN relaxes this constraint and requires only unpaired images from both domains. We applied the two models to an open-source database of visible and infrared videos provided by the Signal, Multimedia and Telecommunications Laboratory at the Federal University of Rio de Janeiro. We evaluated the conversion results with performance metrics including Inception Score (IS), Frechet Inception Distance (FID) and Kernel Inception Distance (KID). Our experiments suggest that the cycle-consistent GAN is more effective than the pix2pix GAN for generating IR images from optical images.

