Generative Adversarial Networks for Visible to Infrared Video Conversion

Author(s):  
Mohammad Shahab Uddin ◽  
Jiang Li

Deep learning models are data driven. For example, the most popular convolutional neural network (CNN) models used for image classification or object detection require large labeled databases for training to achieve competitive performance. This requirement is not difficult to satisfy in the visible domain, since many labeled video and image databases are available nowadays. However, given the lower popularity of infrared (IR) cameras, the availability of labeled infrared video or image databases is limited, and training deep learning models in the infrared domain therefore remains challenging. In this chapter, we applied the pix2pix generative adversarial network (Pix2Pix GAN) and cycle-consistent GAN (Cycle GAN) models to convert visible videos to infrared videos. The Pix2Pix GAN model requires visible-infrared image pairs for training, while the Cycle GAN relaxes this constraint and requires only unpaired images from both domains. We applied the two models to an open-source database in which visible and infrared videos are provided by the Signal, Multimedia and Telecommunications Laboratory at the Federal University of Rio de Janeiro. We evaluated conversion results using performance metrics including the Inception Score (IS), Fréchet Inception Distance (FID) and Kernel Inception Distance (KID). Our experiments suggest that the cycle-consistent GAN is more effective than the pix2pix GAN for generating IR images from optical images.
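Of the three metrics named above, KID is the simplest to state: it is an unbiased estimate of the squared maximum mean discrepancy (MMD) between Inception features of real and generated images under a polynomial kernel. The sketch below is a minimal illustration of that statistic, with random vectors standing in for real Inception features; the `kid_score` helper name is ours, not from the chapter.

```python
import numpy as np

def polynomial_kernel(x, y, degree=3, gamma=None, coef0=1.0):
    """Polynomial kernel k(x, y) = (gamma * <x, y> + coef0) ** degree."""
    if gamma is None:
        gamma = 1.0 / x.shape[1]
    return (gamma * x @ y.T + coef0) ** degree

def kid_score(real_feats, fake_feats):
    """Unbiased MMD^2 estimate with a degree-3 polynomial kernel,
    the statistic underlying the Kernel Inception Distance."""
    m, n = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Exclude diagonal self-similarity terms for the unbiased estimator.
    return ((k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
            + (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
            - 2.0 * k_rf.mean())

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 64))        # stand-ins for Inception features
fake_close = rng.normal(size=(200, 64))  # same distribution -> KID near 0
fake_far = rng.normal(loc=2.0, size=(200, 64))  # shifted -> large KID
print(kid_score(real, fake_close), kid_score(real, fake_far))
```

A lower KID means the generated IR frames are statistically closer to real IR frames, which is how the chapter ranks Cycle GAN above Pix2Pix GAN.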

2019 ◽  
Vol 8 (7) ◽  
pp. 294
Author(s):  
Han Zheng ◽  
Zanyang Cui ◽  
Xingchen Zhang

Driving modes play vital roles in understanding the stochastic nature of a railway system and can support studies of automatic driving and capacity utilization optimization. Integrated trajectory data containing information such as GPS trajectories and gear changes can be good proxies in the study of driving modes. However, in the absence of labeled data, discovering driving modes is challenging. In this paper, instead of classical models (railway-specific feature extraction and classical clustering), we used five deep unsupervised learning models to overcome this difficulty. In these models, adversarial autoencoders and stacked autoencoders are used as feature extractors, along with generative adversarial network-based and Kullback–Leibler (KL) divergence-based networks as clustering models. An experiment based on real and artificial datasets showed the following: (i) the proposed deep learning models outperform the classical models by 27.64% on average; (ii) integrated trajectory data can improve the accuracy of unsupervised learning by approximately 13.78%; and (iii) the different performance rankings of the models under indices computed with and without labeled data demonstrate the insufficiency of our current understanding of the existing modes. This study also analyzes the relationship between the discovered modes and railway carrying capacity.


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 29
Author(s):  
Manas Bazarbaev ◽  
Tserenpurev Chuluunsaikhan ◽  
Hyoseok Oh ◽  
Ga-Ae Ryu ◽  
Aziz Nasridinov ◽  
...  

Product quality is a major concern in manufacturing. In the metal processing industry, low-quality products must be remanufactured, which requires additional labor, money, and time. Therefore, user-controllable variables for machines and raw material compositions are key factors for ensuring product quality. In this study, we propose a method for generating the time-series working patterns of the control variables for metal-melting induction furnaces and continuous casting machines, thus improving product quality by aiding machine operators. We used an auxiliary classifier generative adversarial network (AC-GAN) model to generate time-series working patterns of two processes depending on product type and additional material data. To check accuracy, the difference between the generated time-series data of the model and the ground truth data was calculated. Specifically, the proposed model results were compared with those of other deep learning models: multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), and gated recurrent unit (GRU). It was demonstrated that the proposed model outperformed the other deep learning models. Moreover, the proposed method generated different time-series data for different inputs, whereas the other deep learning models generated the same time-series data.
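The key to generating different working patterns for different inputs is the AC-GAN's conditioning: the class label (here, the product type) is fed to the generator alongside the noise vector. A hedged sketch with an untrained toy linear generator; the weights `W` and the dimensions are illustrative stand-ins, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, noise_dim, seq_len = 3, 8, 16

# Hypothetical generator weights; a real AC-GAN would learn these.
W = rng.normal(size=(noise_dim + n_classes, seq_len))

def generate(product_type, noise):
    """AC-GAN-style conditional generation: the class label is
    concatenated with the noise vector before the generator runs,
    so different product types yield different working patterns."""
    onehot = np.zeros(n_classes)
    onehot[product_type] = 1.0
    return np.tanh(np.concatenate([noise, onehot]) @ W)

z = rng.normal(size=noise_dim)
series_a = generate(0, z)  # same noise, different product type...
series_b = generate(1, z)  # ...gives a different time-series pattern
print(series_a.shape, np.abs(series_a - series_b).max())
```

The auxiliary-classifier head of the discriminator (not shown) is what forces the generator to actually respect the label rather than ignore it.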


2020 ◽  
Vol 12 (6) ◽  
pp. 2475 ◽  
Author(s):  
Jae-joon Chung ◽  
Hyun-Jung Kim

This paper elucidates the development of a deep learning–based driver assistant that can prevent driving accidents arising from drowsiness. As a precursor to this assistant, the relationship between drivers' sensation of sleep deprivation during long journeys and CO2 concentrations in vehicles is established. Multimodal signals are collected by the assistant using five sensors that measure the levels of CO, CO2, and particulate matter (PM), as well as the temperature and humidity. These signals are then transmitted to a server via the Internet of Things, and a deep neural network utilizes this information to analyze the air quality in the vehicle. The deep network employs long short-term memory (LSTM), skip-generative adversarial network (GAN), and variational auto-encoder (VAE) models to build an air quality anomaly detection model: the LSTM serves as the supervised deep learning component, while the GAN and VAE provide semi-supervised learning. The purpose of this assistant is to provide vehicle air quality information, such as PM alerts and sleep-deprived driving alerts, to drivers in real time and thereby prevent accidents.
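A VAE flags anomalies by scoring each reading with its reconstruction error plus the KL divergence of the encoder's latent distribution from the prior; readings unlike the training data score high. A minimal numpy sketch of that score, with made-up sensor values and latent statistics (the paper's actual network and thresholds are not reproduced here):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder, closed form."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))

def anomaly_score(x, x_recon, mu, logvar, beta=1.0):
    """VAE anomaly score: reconstruction error plus weighted KL term.
    High scores flag air-quality readings unlike the training data."""
    recon = np.sum((x - x_recon) ** 2)
    return recon + beta * gaussian_kl(mu, logvar)

# Toy sensor vector: [CO, CO2, PM, temperature, humidity], normalised.
normal_x = np.array([0.1, 0.2, 0.1, 0.5, 0.4])
good_recon = normal_x + 0.01           # normal data reconstructs well
bad_recon = normal_x + 0.8             # anomalous data reconstructs poorly
mu, logvar = np.zeros(3), np.zeros(3)  # latent stats matching the prior
print(anomaly_score(normal_x, good_recon, mu, logvar))
print(anomaly_score(normal_x, bad_recon, mu, logvar))
```

In deployment, a score above a calibrated threshold would trigger the in-vehicle alert.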


2021 ◽  
Vol 13 (16) ◽  
pp. 3257
Author(s):  
Mohammad Shahab Uddin ◽  
Reshad Hoque ◽  
Kazi Aminul Islam ◽  
Chiman Kwan ◽  
David Gribben ◽  
...  

To apply powerful deep-learning-based algorithms for object detection and classification in infrared videos, more training data are needed to build high-performance models. However, in many surveillance applications, one may have far more optical videos than infrared videos. This lack of IR video datasets can be mitigated if optical-to-infrared video conversion is possible. In this paper, we present a new approach for converting optical videos to infrared videos using deep learning. The basic idea is to focus on target areas using an attention generative adversarial network (attention GAN), which preserves the fidelity of target areas. The approach does not require paired images. The performance of the proposed attention GAN has been demonstrated using objective and subjective evaluations. Most importantly, the impact of the attention GAN has been demonstrated through improved target detection and classification performance on real infrared videos.
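One common way attention GANs preserve target fidelity is to reweight the reconstruction (cycle-consistency) loss by the attention mask, so errors inside target regions cost more than errors in the background; the paper's exact formulation may differ, so the sketch below is an illustrative assumption, not the authors' loss:

```python
import numpy as np

def attention_weighted_l1(x, x_recon, attn):
    """Cycle-consistency L1 loss reweighted by an attention mask so that
    reconstruction errors in target areas are penalised more heavily."""
    weights = 1.0 + attn          # background weight 1.0, target areas 2.0
    return np.mean(weights * np.abs(x - x_recon))

rng = np.random.default_rng(3)
frame = rng.uniform(size=(32, 32))            # one optical frame (toy)
attn = np.zeros((32, 32))
attn[12:20, 12:20] = 1.0                      # hypothetical mask on a target

err_patch = rng.uniform(0.05, 0.1, size=(8, 8))
recon_bg = frame.copy()
recon_bg[0:8, 0:8] += err_patch               # same error in the background...
recon_tg = frame.copy()
recon_tg[12:20, 12:20] += err_patch           # ...versus on the target region
print(attention_weighted_l1(frame, recon_bg, attn),
      attention_weighted_l1(frame, recon_tg, attn))
```

With this weighting, identical pixel errors are penalised twice as heavily when they fall on the target, which is what keeps targets faithful through the conversion.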


2020 ◽  
Author(s):  
Erdi Acar ◽  
Engin Şahin ◽  
İhsan Yılmaz

Computerized Tomography (CT) has a prognostic role in the early diagnosis of COVID-19 because it gives both fast and accurate results. This is very important in helping clinicians decide on quick isolation and appropriate patient treatment. In this study, we combine methods such as segmentation, data augmentation and the generative adversarial network (GAN) to improve the effectiveness of learning models. We obtain the best performance, with 99% accuracy, for lung segmentation. Using the above improvements, we obtain the highest rates of accuracy (99.8%), precision (99.8%), recall (99.8%), F1-score (99.8%) and ROC AUC (99.9979%) among the deep learning methods in this paper. We also compare popular deep learning-based frameworks such as VGG16, VGG19, Xception, ResNet50, ResNet50V2, DenseNet121, DenseNet169, InceptionV3 and InceptionResNetV2 for automatic COVID-19 classification. DenseNet169 achieves the best performance among these deep convolutional neural networks, with 99.8% accuracy. The second-best learner is InceptionResNetV2, with an accuracy of 99.65%. The third-best learners are Xception and InceptionV3, each with an accuracy of 99.60%.


Information ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 249
Author(s):  
Xin Jin ◽  
Yuanwen Zou ◽  
Zhongbing Huang

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented and their features extracted, and some important information may be lost during feature extraction, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. To address the insufficient number and imbalanced distribution of original images, we used the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) for data augmentation, together with a residual network (ResNet), one of the most widely used deep learning classification networks, for image classification. Our method classified cell cycle images more effectively, reaching an accuracy of 83.88%; compared with an accuracy of 79.40% in previous experiments, our accuracy increased by 4.48%. On another dataset used to verify the model, our accuracy increased by 12.52% over previous results. These results show that our cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images and could help address the low classification accuracy in biomedical imaging caused by insufficient numbers and imbalanced distributions of original images.
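The "GP" in WGAN-GP is a gradient penalty: the critic's gradient norm is pushed towards 1 at random points interpolated between real and generated batches. Real implementations compute that gradient with autograd; in the numpy sketch below we use a toy linear critic f(x) = w·x, whose gradient is just w, so the penalty can be written out explicitly. The critic and data are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 10
w = rng.normal(size=dim)  # toy linear critic f(x) = w . x, so grad_x f = w

def gradient_penalty(real, fake, lam=10.0):
    """WGAN-GP penalty: sample points on lines between real and fake
    batches and push the critic's gradient norm there towards 1."""
    eps = rng.uniform(size=(len(real), 1))
    x_hat = eps * real + (1 - eps) * fake   # random interpolates
    # For the linear critic the gradient at every x_hat is simply w;
    # a real implementation would use autograd here.
    grads = np.tile(w, (len(x_hat), 1))
    norms = np.linalg.norm(grads, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

real = rng.normal(size=(32, dim))            # real cell-image features (toy)
fake = rng.normal(loc=1.0, size=(32, dim))   # generated features (toy)
print(gradient_penalty(real, fake))
```

The penalty is added to the critic loss with weight `lam` (10 in the original WGAN-GP paper), which stabilises training enough to make GAN-based augmentation of small, imbalanced datasets practical.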


2021 ◽  
Vol 63 (9) ◽  
pp. 529-533
Author(s):  
Jiali Zhang ◽  
Yupeng Tian ◽  
LiPing Ren ◽  
Jiaheng Cheng ◽  
JinChen Shi

Reflection in images is common, and the removal of complex noise such as image reflection is still being explored. The problem is difficult and ill-posed, not only because there is no known mixing function but also because there are no constraints in the output space (the processed image). When detecting defects on metal surfaces using infrared thermography, reflection from smooth metal surfaces can easily affect the final detection results, so it is essential to remove reflection interference from infrared images. With the continuing application and expansion of neural networks in the field of image processing, researchers have tried to apply neural networks to remove image reflection. However, they have mainly focused on reflection removal in visible images; it is believed that no researchers have applied neural networks to remove reflection interference in infrared images. In this paper, the authors introduce the concept of a conditional generative adversarial network (cGAN) and propose an end-to-end trained network based on it with two types of loss: perceptual loss and adversarial loss. A self-built infrared reflection image dataset from an infrared camera is used. The experimental results demonstrate the effectiveness of this GAN for removing infrared image reflection.
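The two losses combine into a single weighted objective for the generator: a perceptual term comparing prediction and target in a pretrained feature space (typically VGG features), plus an adversarial term on the discriminator's scores. A minimal sketch, with a random linear map standing in for the pretrained feature extractor and the weight `lam` chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(5)
feat_W = rng.normal(size=(64, 16))  # stand-in for a pretrained feature net

def perceptual_loss(pred, target):
    """L2 distance in a feature space; real systems use VGG features."""
    return np.mean((pred @ feat_W - target @ feat_W) ** 2)

def adversarial_loss(d_scores_fake):
    """Non-saturating generator loss: -log D(G(x)) on sigmoid scores."""
    return -np.mean(np.log(d_scores_fake + 1e-8))

def total_loss(pred, target, d_scores_fake, lam=100.0):
    """Weighted cGAN objective of the kind the paper describes."""
    return lam * perceptual_loss(pred, target) + adversarial_loss(d_scores_fake)

clean = rng.uniform(size=(4, 64))                # reflection-free targets (toy)
output = clean + rng.normal(0, 0.05, (4, 64))    # generator output (toy)
d_fake = np.array([0.4, 0.6, 0.5, 0.7])          # discriminator scores
print(total_loss(output, clean, d_fake))
```

The perceptual term keeps the de-reflected image close to the ground truth in content, while the adversarial term pushes it towards the statistics of real reflection-free infrared images.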


2021 ◽  
Author(s):  
James Howard ◽  
◽  
Joe Tracey ◽  
Mike Shen ◽  
Shawn Zhang ◽  
...  

Borehole image logs are used to identify the presence and orientation of fractures, both natural and induced, found in reservoir intervals. The contrast in electrical or acoustic properties between the rock matrix and fluid-filled fractures is sufficiently large that sub-resolution features can be detected by these image logging tools. The resolution of these image logs depends on the design and operation of the tools and is generally in the millimeter-per-pixel range, so the quantitative measurement of actual fracture width remains problematic. An artificial intelligence (AI)-based workflow combines the statistical information obtained from a machine learning (ML) segmentation process with a multiple-layer neural network that defines a deep learning process to enhance fractures in a borehole image. These new images allow for a more robust analysis of fracture widths, especially those that are sub-resolution. The images from a BHTV log were first segmented into rock and fluid-filled fractures using an ML segmentation tool that applied multiple image processing filters to capture information describing patterns in fracture-rock distribution based on nearest-neighbor behavior. The ML analysis was trained by users to identify these two components over a short interval in the well, and the regression-model-based coefficients were then applied to the remaining log. Based on the training, each pixel was assigned a probability value between 1.0 (fracture) and 0.0 (pure rock), with most pixels assigned one of these two values. Intermediate probabilities represented pixels at the edge of a rock-fracture interface or the presence of one or more sub-resolution fractures within the rock. The probability matrix produced a map, or image, of the distribution of probabilities that determined whether a given pixel in the image was a fracture or partially filled with a fracture.
The deep learning neural network was based on a conditional generative adversarial network (cGAN) approach in which the probability map was first encoded and combined with a noise vector that acted as a seed for diverse feature generation. This combination was used to generate new images representing the BHTV response. The second layer of the neural network, the adversarial or discriminator portion, determined whether the generated images were representative of the actual BHTV response by comparing them with actual images from the log and producing an output probability of real versus fake. This probability was then used to train the generator and discriminator models, which were then applied to the entire log. Several scenarios were run with different probability maps. The enhanced BHTV images brought out fractures observed in the core photos that were less obvious in the original BHTV log, through enhanced continuity and improved resolution of fracture widths.
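The generator's input construction described above (encoded probability map plus a noise seed) can be sketched in a few lines. Everything here is a toy stand-in: the encoder and generator are untrained random linear maps, and the dimensions are arbitrary. The point is only to show how the same probability map with different noise seeds yields diverse synthetic BHTV responses:

```python
import numpy as np

rng = np.random.default_rng(6)

def encode(prob_map, enc_W):
    """Toy encoder: flatten the fracture-probability map to a latent code."""
    return np.tanh(prob_map.ravel() @ enc_W)

def generate(prob_map, enc_W, gen_W, noise_dim=8):
    """cGAN-style generation as the workflow describes: the encoded
    probability map is combined with a fresh noise vector that seeds
    diverse synthetic BHTV responses for the same map."""
    z = rng.normal(size=noise_dim)
    latent = np.concatenate([encode(prob_map, enc_W), z])
    return np.tanh(latent @ gen_W)

side = 16
enc_W = rng.normal(size=(side * side, 12), scale=0.1)
gen_W = rng.normal(size=(12 + 8, side * side), scale=0.1)

prob_map = np.zeros((side, side))
prob_map[:, 7:9] = 1.0        # a near-vertical fracture, probability 1.0

img_a = generate(prob_map, enc_W, gen_W)
img_b = generate(prob_map, enc_W, gen_W)  # same map, new noise seed
print(img_a.shape, np.allclose(img_a, img_b))
```

In the trained workflow, the discriminator's real-versus-fake probability is what shapes these weights so the generated responses match real log images.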


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3913 ◽  
Author(s):  
Mingxuan Li ◽  
Ou Li ◽  
Guangyi Liu ◽  
Ce Zhang

With the recent explosive growth of deep learning, automatic modulation recognition has undergone rapid development. Most newly proposed methods depend on large numbers of labeled samples. We are committed to using fewer labeled samples to perform automatic modulation recognition in the cognitive radio domain. Here, a semi-supervised learning method based on adversarial training is proposed, called the signal classifier generative adversarial network. Most prior methods based on this technology involve computer vision applications. However, we improve the existing generative adversarial network structure by adding an encoder network and a signal spatial transform module, allowing our framework to address radio signal processing tasks more efficiently. These two technical improvements effectively avoid the nonconvergence and mode collapse problems caused by the complexity of radio signals. Simulation results show that, compared with well-known deep learning methods, our method improves classification accuracy on a synthetic radio frequency dataset by 0.1% to 12%. In addition, we verify the advantages of our method in a semi-supervised scenario and obtain a significant increase in accuracy compared with traditional semi-supervised learning methods.
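The standard semi-supervised GAN classifier (which the paper's architecture builds on; its encoder and spatial-transform additions are not reproduced here) gives the discriminator K+1 outputs: K modulation classes plus one "generated" class, so unlabeled signals still train the network through the real-versus-generated split. A minimal sketch of that output head:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def unsupervised_real_prob(logits):
    """In a semi-supervised GAN classifier the discriminator has K+1
    outputs: K modulation classes plus one 'generated' class. The
    probability a sample is real is the mass on the first K classes."""
    p = softmax(logits)
    return 1.0 - p[..., -1]

# K = 4 modulation classes (e.g. BPSK, QPSK, 8PSK, 16QAM) + 1 fake class.
logits_real = np.array([2.0, 0.1, 0.0, -0.5, -3.0])  # confident: real, class 0
logits_fake = np.array([0.0, 0.0, 0.0, 0.0, 4.0])    # confident: generated
print(unsupervised_real_prob(logits_real), unsupervised_real_prob(logits_fake))
```

Labeled samples train the first K outputs with ordinary cross-entropy, while unlabeled and generated samples train only the real/fake split, which is how the method gets by with fewer labels.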


2021 ◽  
Author(s):  
Nithin G R ◽  
Nitish Kumar M ◽  
Venkateswaran Narasimhan ◽  
Rajanikanth Kakani ◽  
Ujjwal Gupta ◽  
...  

Pansharpening is the task of creating a High-Resolution Multi-Spectral (HRMS) image by extracting and infusing pixel details from the High-Resolution Panchromatic image into the Low-Resolution Multi-Spectral (LRMS) image. With the boom in the amount of satellite image data, researchers have replaced traditional approaches with deep learning models. However, existing deep learning models are not built to capture intricate pixel-level relationships. Motivated by the recent success of self-attention mechanisms in computer vision tasks, we propose Pansformers, a transformer-based self-attention architecture that computes band-wise attention. A further improvement is made to the attention network by introducing a Multi-Patch Attention mechanism, which operates on non-overlapping local patches of the image. Our model successfully infuses relevant local details from the Panchromatic image while preserving the spectral integrity of the MS image. We show that our Pansformer model significantly improves the performance metrics and output image quality on imagery from two satellite distributions, IKONOS and LANDSAT-8.
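Restricting self-attention to non-overlapping local patches means each pixel attends only to pixels inside its own patch, which keeps the cost linear in the number of patches rather than quadratic in the whole image. The sketch below illustrates that restriction on a single toy band with the simplest possible attention (queries, keys and values all equal to the pixel intensities); the paper's actual projections and band-wise attention are not reproduced:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_patch_attention(img, patch=4):
    """Self-attention restricted to non-overlapping local patches, in the
    spirit of a Multi-Patch Attention mechanism: each pixel attends only
    to pixels within its own patch."""
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            x = img[i:i+patch, j:j+patch].reshape(-1, 1)  # tokens: pixels
            attn = softmax(x @ x.T)                       # q = k = v = x
            out[i:i+patch, j:j+patch] = (attn @ x).reshape(patch, patch)
    return out

rng = np.random.default_rng(7)
band = rng.uniform(size=(8, 8))   # one band of a toy MS image
print(multi_patch_attention(band).shape)
```

For an H x W image split into p x p patches, full attention costs O((HW)^2) while patch-local attention costs O(HW * p^2), which is what makes the mechanism tractable on large satellite tiles.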

