Cross-Spectral Local Descriptors via Quadruplet Network

Author(s):  
Cristhian A. Aguilera ◽  
Angel D. Sappa ◽  
Cristhian Aguilera ◽  
Ricardo Toledo

This paper presents a novel CNN-based architecture, referred to as Q-Net, to learn local feature descriptors that are useful for matching image patches from two different spectral bands. Given correctly matched and non-matching cross-spectral image pairs, a quadruplet network is trained to map input image patches to a common Euclidean space, regardless of the input spectral band. Our approach is inspired by the recent success of triplet networks in the visible spectrum, but is adapted for cross-spectral scenarios, where for each matching pair there are always two possible non-matching patches, one for each spectrum. Experimental evaluations on a public cross-spectral VIS-NIR dataset show that the proposed approach improves on the state of the art. Moreover, the proposed technique can also be used in mono-spectral settings, obtaining performance similar to that of triplet network descriptors while requiring less training data.
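To make the quadruplet idea concrete, here is a minimal PyTorch sketch of a hinge-style quadruplet loss over a matching VIS-NIR pair and one non-matching patch per spectrum; the margin value and the hardest-negative formulation are illustrative assumptions, not necessarily the exact Q-Net objective.

```python
# Hedged sketch of a quadruplet embedding loss for cross-spectral patches.
# The margin and the min-over-negatives choice are assumptions, not the
# exact Q-Net formulation.
import torch
import torch.nn.functional as F

def quadruplet_loss(f_vis, f_nir, f_vis_neg, f_nir_neg, margin=1.0):
    """f_* are embeddings of a matching VIS/NIR pair plus one
    non-matching patch from each spectrum."""
    d_pos = F.pairwise_distance(f_vis, f_nir)        # matching pair
    d_neg1 = F.pairwise_distance(f_vis, f_nir_neg)   # VIS vs. wrong NIR
    d_neg2 = F.pairwise_distance(f_nir, f_vis_neg)   # NIR vs. wrong VIS
    d_neg = torch.min(d_neg1, d_neg2)                # hardest of the two negatives
    # pull the match together, push the hardest negative beyond the margin
    return F.relu(margin + d_pos - d_neg).mean()
```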

Author(s):  
Mingyang Liang ◽  
Xiaoyang Guo ◽  
Hongsheng Li ◽  
Xiaogang Wang ◽  
You Song

Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any depth or disparity supervision. The estimated depth provides additional information complementary to the original images, which can be helpful for other vision tasks such as tracking, recognition and detection. However, there are large appearance variations between images from different spectral bands, which is a challenge for cross-spectral stereo matching. Existing deep unsupervised stereo matching methods are sensitive to these appearance variations and do not perform well on cross-spectral data. We propose a novel unsupervised cross-spectral stereo matching framework based on image-to-image translation. First, a style adaptation network transforms images across different spectral bands through cycle consistency and adversarial learning, during which appearance variations are minimized. Then, a stereo matching network is trained with image pairs from the same spectrum using a view reconstruction loss. Finally, the estimated disparity is used to supervise the spectral translation network in an end-to-end manner. Moreover, a novel style adaptation network, F-cycleGAN, is proposed to improve the robustness of spectral translation. Our method can tackle appearance variations and enhance the robustness of unsupervised cross-spectral stereo matching. Experimental results show that our method achieves good performance without using depth supervision or explicit semantic information.
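To make the view-reconstruction step concrete, below is a minimal PyTorch sketch that warps the right image into the left view with the predicted disparity and penalizes the photometric error; the rectified-stereo convention and tensor shapes are assumptions, not taken from the paper.

```python
# Hedged sketch of a view-reconstruction loss for unsupervised stereo.
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disp):
    """right: (B, C, H, W); disp: (B, 1, H, W) left-view disparity in pixels."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.float().to(right.device).expand(b, h, w)
    ys = ys.float().to(right.device).expand(b, h, w)
    x_src = xs - disp.squeeze(1)                    # shift samples by disparity
    grid = torch.stack([2 * x_src / (w - 1) - 1,    # normalize to [-1, 1]
                        2 * ys / (h - 1) - 1], dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def view_reconstruction_loss(left, right, disp):
    return (warp_right_to_left(right, disp) - left).abs().mean()
```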


2017 ◽  
Vol 3 (2) ◽  
pp. 811-814 ◽  
Author(s):  
Erik Rodner ◽  
Marcel Simon ◽  
Joachim Denzler

Abstract. We present an automated approach for rating HER2 over-expression in whole-slide images of breast cancer histology slides. The slides have a very high resolution, and only a small part of each is relevant for the rating. Our approach is based on Convolutional Neural Networks (CNNs), which model the whole computer vision pipeline, from feature extraction to classification, with a single parameterized model. CNN models have led to significant breakthroughs in many vision applications and have shown promising results for medical tasks. However, the required size of the training data is still an issue. Our CNN models are pre-trained on a large set of non-medical image datasets, which prevents over-fitting to the small annotated dataset available in our case. We assume the probe is selected in the data with just a single mouse click defining a point of interest. This is reasonable, especially for slices acquired together with another sample. We sample image patches around the point of interest and obtain bilinear features by passing them through a CNN and encoding the output of the last convolutional layer with its second-order statistics. Our approach ranked second in the HER2 contest held by the University of Warwick, achieving 345 points compared to the 348 points of the winning team. In addition to pure classification, our approach also allows for localization of the parts of the slice relevant for visual detection of HER2 over-expression.
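The bilinear encoding of the last convolutional layer can be sketched as follows; the VGG16 backbone and the signed-square-root normalization are illustrative assumptions rather than the authors' exact configuration.

```python
# Hedged sketch of bilinear (second-order) pooling of CNN features.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.vgg16(weights="DEFAULT").features.eval()  # assumed backbone

def bilinear_features(patch):                  # patch: (B, 3, H, W)
    fmap = backbone(patch)                     # (B, C, h, w)
    b, c, h, w = fmap.shape
    x = fmap.reshape(b, c, h * w)
    gram = torch.bmm(x, x.transpose(1, 2)) / (h * w)  # (B, C, C) second-order stats
    feat = gram.reshape(b, -1)
    feat = torch.sign(feat) * torch.sqrt(feat.abs() + 1e-12)  # signed sqrt
    return F.normalize(feat, dim=1)            # L2 normalization
```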


Author(s):  
C. Koetsier ◽  
T. Peters ◽  
M. Sester

Abstract. Estimating vehicle poses is crucial for generating precise movement trajectories from (surveillance) camera data. Additionally, for real-time applications this task has to be solved in an efficient way. In this paper we introduce a deep convolutional neural network for pose estimation of vehicles from image patches. For a given 2D image patch, our approach estimates the 2D coordinates of the image point representing the exact center ground point (cx, cy) and the orientation of the vehicle, represented by the elevation angle (e) of the camera with respect to the vehicle's center ground point and the azimuth rotation (a) of the vehicle with respect to the camera. To train an accurate model, a large and diverse training dataset is needed, and collecting and labeling such a large amount of data is very time-consuming and expensive. Owing to the lack of a sufficient amount of real training data, we furthermore show that rendered 3D vehicle models with artificially generated textures are nearly adequate for training.
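A minimal sketch of a regression head for the four targets might look as follows; the sin/cos encoding of the azimuth (to avoid the discontinuity at the 0°/360° wrap-around) and the feature dimension are assumptions for illustration.

```python
# Hedged sketch of a pose regression head for (cx, cy, elevation, azimuth).
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    def __init__(self, in_features=512):       # assumed backbone feature size
        super().__init__()
        self.fc = nn.Linear(in_features, 5)    # cx, cy, e, sin(a), cos(a)

    def forward(self, feats):
        out = self.fc(feats)
        cx, cy, elev = out[:, 0], out[:, 1], out[:, 2]
        azimuth = torch.atan2(out[:, 3], out[:, 4])  # radians in (-pi, pi]
        return cx, cy, elev, azimuth
```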


2016 ◽  
Vol 22 (1) ◽  
pp. 95-107 ◽  
Author(s):  
Eder Paulo Moreira ◽  
Márcio de Morisson Valeriano ◽  
Ieda Del Arco Sanches ◽  
Antonio Roberto Formaggio

The full potential of spectral vegetation indices (VIs) can only be evaluated after removing topographic, atmospheric and soil background effects from radiometric data. Among these, the topographic effect has barely been investigated in the context of VIs, despite the current availability of correction methods and digital elevation models (DEMs). In this study, we performed topographic correction on Landsat 5 TM spectral bands and evaluated the topographic effect on four VIs: NDVI, RVI, EVI and SAVI. The evaluation was based on analyses of the mean and standard deviation of the VIs and of TM band 4 (near-infrared), and on linear regression analyses between these variables and the cosine of the solar incidence angle on the terrain surface (cos i). The results indicated that VIs are less sensitive to the topographic effect than the uncorrected spectral band. Among the VIs, NDVI and RVI were less sensitive to the topographic effect than EVI and SAVI. All VIs proved to be fully independent of the topographic effect only after correction. It can be concluded that topographic correction is required for a consistent reduction of the topographic effect on VIs computed over rugged terrain.
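The regression-based evaluation can be sketched in a few lines of NumPy; the cos i expression is the standard one from slope, aspect and solar geometry, while the array names are placeholders.

```python
# Hedged sketch: regress a band or VI against cos(i); a slope near zero
# indicates insensitivity to the topographic effect.
import numpy as np

def cos_i(slope, aspect, sun_zenith, sun_azimuth):
    """Cosine of the solar incidence angle on the terrain; angles in radians."""
    return (np.cos(sun_zenith) * np.cos(slope)
            + np.sin(sun_zenith) * np.sin(slope) * np.cos(sun_azimuth - aspect))

def sensitivity(values, ci):
    """Least-squares slope of pixel values against cos(i)."""
    slope, _intercept = np.polyfit(ci.ravel(), values.ravel(), 1)
    return slope

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-12)   # epsilon guards zero denominators
```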


2021 ◽  
Vol 4 (1) ◽  
pp. 71-79
Author(s):  
Borys Igorovych Tymchenko

Nowadays, means of preventive management in various spheres of human life are actively developing. The task of automated screening is to detect hidden problems at an early stage without human intervention, while the cost of responding to them is still low. Visual inspection is often used to perform a screening task, and deep artificial neural networks are especially popular in image processing. One of the main problems when working with them is the need for a large amount of well-labeled training data. In automated screening systems, available neural network approaches have limitations on the reliability of predictions due to the lack of accurately labeled training data, as obtaining quality labels from professionals is very expensive, and sometimes not possible at all. There is therefore a contradiction between the increasing requirements for the precision of neural network predictions without increasing the time spent, on the one hand, and the need to reduce the cost of labeling training data, on the other. In this paper, we propose a parametric model of the segmentation dataset, which can be used to generate training data for model selection and benchmarking, and a multi-task learning method for training and inference of deep neural networks for semantic segmentation. Based on the proposed method, we develop a semi-supervised approach for segmentation of salient regions for a classification task. The main advantage of the proposed method is that it uses semantically similar, more general tasks that have better labeling than the original one, which allows the cost of the labeling process to be reduced. We propose to use classification as a task more general than semantic segmentation: whereas semantic segmentation aims to assign a class to each pixel in the input image, classification assigns a single class to all of the pixels in the input image. We evaluate our methods using the proposed dataset model, observing a Dice score improvement of seventeen percent. Additionally, we evaluate the robustness of the proposed method to different amounts of label noise and observe consistent improvement over the baseline version.
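One way to realize the described coupling of the two tasks is to derive image-level logits from the dense segmentation head and mix the two losses; the global average pooling and the weighting below are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a multi-task loss joining segmentation (when masks exist)
# with the better-labeled image-level classification task.
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_mask, cls_target, alpha=0.5):
    """seg_logits: (B, K, H, W); seg_mask: (B, H, W) long, or None when only
    image-level labels exist; cls_target: (B,) long."""
    cls_logits = seg_logits.mean(dim=(2, 3))       # image-level logits via pooling
    l_cls = F.cross_entropy(cls_logits, cls_target)
    if seg_mask is None:
        return l_cls                               # weakly labeled batch
    l_seg = F.cross_entropy(seg_logits, seg_mask)
    return alpha * l_seg + (1 - alpha) * l_cls
```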


2021 ◽  
Author(s):  
Toshitaka Hayashi ◽  
Hamido Fujita

One-class classification (OCC) is a classification problem in which the training data include only one class. In such a problem, two types of classes exist, a seen class and an unseen class, and distinguishing them is a challenge. The One-class Image Transformation Network (OCITN) is an OCC algorithm for image data in which an image transformation network (ITN) is trained. The ITN aims to transform every input image into a single image, called the goal image, and the model error of the ITN is computed as a distance metric between the ITN output and the goal image. OCITN accuracy depends on the goal image, and finding an appropriate goal image is challenging. In this paper, 234 goal images are experimented with in OCITN using the CIFAR-10 dataset. Experimental results are analyzed with three image metrics: image entropy, similarity with seen images, and image derivatives.
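The OCITN training and scoring idea can be sketched as follows; the toy network and the mean-squared distance are illustrative assumptions consistent with the description above.

```python
# Hedged sketch of OCITN: map every seen-class image to a fixed goal image,
# then score test images by their distance to that goal.
import torch
import torch.nn as nn

itn = nn.Sequential(                 # toy ITN for 32x32 CIFAR-10 images
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

def train_step(batch, goal, opt):
    opt.zero_grad()
    loss = ((itn(batch) - goal) ** 2).mean()   # distance to the goal image
    loss.backward()
    opt.step()
    return loss.item()

def anomaly_score(x, goal):
    with torch.no_grad():                      # high score = likely unseen class
        return ((itn(x) - goal) ** 2).mean(dim=(1, 2, 3))
```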


2019 ◽  
Vol 6 (2) ◽  
pp. 144-147
Author(s):  
M. Gnybida ◽  
Ch. Ruempler ◽  
V. R. T. Narayanan

C₄F₇N and C₄F₇N-CO₂ mixtures are considered as alternatives to SF₆ for use in medium voltage gas insulated switchgear (GIS) applications, due to the low global warming potential and good dielectric properties of C₄F₇N. The current work is focused on the calculation of radiative properties (absorption coefficients) of C₄F₇N-CO₂ thermal plasma and on computational fluid dynamics (CFD) simulations of free-burning C₄F₇N-CO₂ arcs that are stabilized by natural convection. The absorption coefficients of C₄F₇N-CO₂ plasma used in the CFD model are derived from spectral absorption coefficients by Planck averaging. An optimization procedure has been applied to find the optimal number of spectral bands as well as the spectral band interval boundaries. Radiation and flow model results for C₄F₇N-CO₂ in comparison to SF₆ and air are provided and discussed.
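Planck averaging of a spectral absorption coefficient over a band can be sketched numerically; the formula is the standard Planck mean, and the wavelength grid and input arrays are placeholders.

```python
# Hedged sketch of band-wise Planck averaging of absorption coefficients.
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def planck(lam, T):
    """Spectral radiance B_lambda(T); lam in metres, T in kelvin."""
    return (2 * H * C**2 / lam**5) / np.expm1(H * C / (lam * KB * T))

def planck_mean(kappa, lam, T):
    """Planck-averaged absorption coefficient over the band covered by lam."""
    b = planck(lam, T)
    return np.trapz(kappa * b, lam) / np.trapz(b, lam)
```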


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6647
Author(s):  
Xiang Yan ◽  
Syed Zulqarnain Gilani ◽  
Hanlin Qin ◽  
Ajmal Mian

Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods resort to adding Gaussian blur to focused images to simulate defocus, thereby generating data (with ground truth) for supervised learning. Moreover, they classify pixels as 'focused' or 'defocused' and use the classified results to construct the fusion weight maps, which then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The proposed approach uses a CNN architecture trained to perform fusion without the need for ground-truth fused images. The CNN exploits the image structural similarity (SSIM), a metric that is widely accepted for fused image quality evaluation, to calculate the loss. Moreover, we use the standard deviation of a local window of the image to automatically estimate the importance of the source images in the final fused image when designing the loss function. Our network can accept images of variable sizes and hence we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
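The loss described above can be sketched as SSIM terms weighted by the local standard deviation of each source; the pytorch-msssim package is an assumed dependency here, and averaging the std maps globally is a simplification of the per-window weighting.

```python
# Hedged sketch of an unsupervised, std-weighted SSIM fusion loss.
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed third-party dependency

def local_std(x, k=7):
    mu = F.avg_pool2d(x, k, stride=1, padding=k // 2)
    var = F.avg_pool2d(x * x, k, stride=1, padding=k // 2) - mu ** 2
    return var.clamp_min(0).sqrt()

def fusion_loss(fused, src1, src2):
    s1, s2 = local_std(src1).mean(), local_std(src2).mean()
    w1, w2 = s1 / (s1 + s2), s2 / (s1 + s2)   # importance of each source
    return (w1 * (1 - ssim(fused, src1, data_range=1.0))
            + w2 * (1 - ssim(fused, src2, data_range=1.0)))
```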


Symmetry ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 1163 ◽  
Author(s):  
John Stewart Fabila-Carrasco ◽  
Fernando Lledó

In this article, we analyze the spectrum of discrete magnetic Laplacians (DML) on an infinite covering graph G̃ → G = G̃/Γ with (Abelian) lattice group Γ and periodic magnetic potential β̃. We give sufficient conditions for the existence of spectral gaps in the spectrum of the DML and study how these depend on β̃. The magnetic potential can be interpreted as a control parameter for the spectral bands and gaps. We apply these results to describe the spectral band/gap structure of polymers (polyacetylene) and nanoribbons in the presence of a constant magnetic field.
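For reference, one common convention for the discrete magnetic Laplacian on a graph is shown below; normalizations differ across the literature, so this form is an assumption rather than necessarily the one used in the article.

```latex
% One common (degree-normalized) convention for the DML; the normalization
% is an assumption and varies between papers.
(\Delta_{\beta}\varphi)(v)
  = \varphi(v)
  - \frac{1}{\deg(v)} \sum_{e\colon v \to u} e^{\,i\beta(e)}\,\varphi(u)
```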


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2639
Author(s):  
Quan T. Ngo ◽  
Seokhoon Yoon

Facial expression recognition (FER) is a challenging problem in the fields of pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise in building an automatic deep CNN-based FER model. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, and due to a lack of training data and an intrinsic imbalance in the existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques, but also proposes a novel loss function called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves the intra-class compactness and the inter-class separability by learning a class center for each emotion class. It also takes the imbalance in a facial expression dataset into account by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained on the task of face identification with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned using the proposed loss function to recognize eight basic facial emotions from the AffectNet database of facial expression, valence, and arousal computing in the wild. Experiments on the AffectNet real-world facial dataset demonstrate that our method outperforms the baseline CNN models that use either weighted-softmax loss or center loss.
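A class-weighted, center-style loss in the spirit of the description above can be sketched as follows; the exact weighted-cluster formulation in the paper may differ.

```python
# Hedged sketch of a class-weighted center loss: each sample is pulled
# toward a learned center of its emotion class, with rare classes upweighted.
import torch
import torch.nn as nn

class WeightedClusterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, class_counts):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # inverse-frequency weights from each class's share of the data
        self.register_buffer("weights", counts.sum() / (len(counts) * counts))

    def forward(self, feats, labels):
        diff = feats - self.centers[labels]           # distance to own center
        return (self.weights[labels] * diff.pow(2).sum(dim=1)).mean()
```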

