Single- and Cross-Modality Near Duplicate Image Pairs Detection via Spatial Transformer Comparing CNN

Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 255
Author(s):  
Yi Zhang ◽  
Shizhou Zhang ◽  
Ying Li ◽  
Yanning Zhang

Recently, both single-modality and cross-modality near-duplicate image detection tasks have received wide attention in the pattern recognition and computer vision communities. Existing deep neural network-based methods have achieved remarkable performance on this task. However, most of these methods focus on learning each image of the pair separately and therefore make limited use of the information shared between the two near-duplicate images. In this paper, to make more use of the correlations between image pairs, we propose a spatial transformer comparing convolutional neural network (CNN) model to compare near-duplicate image pairs. Specifically, we first propose a comparing CNN framework, which is equipped with a cross-stream to fully learn the correlation information between image pairs while still considering the features of each image. Furthermore, to deal with the local deformations caused by cropping, translation, scaling, and non-rigid transformations, we additionally introduce a spatial transformer comparing CNN model by incorporating a spatial transformer module into the comparing CNN architecture. To demonstrate the effectiveness of the proposed method on both the single-modality and cross-modality (Optical-InfraRed) near-duplicate image pair detection tasks, we conduct extensive experiments on three popular benchmark datasets, namely CaliforniaND (ND means near duplicate), Mir-Flickr Near Duplicate, and TNO Multi-band Image Data Collection. The experimental results show that the proposed method can achieve superior performance compared with many state-of-the-art methods on both tasks.
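The spatial transformer module named in this abstract can be illustrated with a small, dependency-free sketch. It assumes the standard affine formulation: a 2x3 matrix `theta` maps normalized target coordinates in [-1, 1] to source coordinates, followed by bilinear sampling with zero padding. In a trained model `theta` would be predicted by a localization network, which is omitted here. The function name `affine_sample` is hypothetical and nothing below is taken from the paper's implementation.

```python
import math

def affine_sample(image, theta):
    """Warp a 2-D grayscale image (list of rows, at least 2x2) with an affine map.

    theta is a 2x3 matrix that maps normalized target coordinates (x, y, 1),
    each in [-1, 1], to normalized source coordinates. Hypothetical helper.
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Zero padding outside the image.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalized target coordinates of output pixel (i, j).
            xt = -1.0 + 2.0 * j / (w - 1)
            yt = -1.0 + 2.0 * i / (h - 1)
            # Source coordinates selected by the affine transform.
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Convert back to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            ax, ay = px - x0, py - y0
            # Bilinear interpolation over the four neighbouring pixels.
            top = (1 - ax) * pixel(y0, x0) + ax * pixel(y0, x0 + 1)
            bottom = (1 - ax) * pixel(y0 + 1, x0) + ax * pixel(y0 + 1, x0 + 1)
            out[i][j] = (1 - ay) * top + ay * bottom
    return out

# The identity transform leaves the image unchanged.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

With `IDENTITY` the output equals the input; a nonzero third column in `theta` translates the sampling grid, which is how such a module can compensate for cropping or shifts between the two images of a pair.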

Author(s):  
Chen Qi ◽  
Shibo Shen ◽  
Rongpeng Li ◽  
Zhifeng Zhao ◽  
Qing Liu ◽  
...  

Abstract Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities like sensing, imaging, classification, recognition, etc. However, the computationally intensive nature of DNNs makes them difficult to apply on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm can achieve efficient end-to-end training that directly transfers a redundant neural network to a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to reduce its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.
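To make the idea of pruning to a targeted compression rate concrete, here is a minimal sketch. It uses plain magnitude pruning on a flat weight list, which is a much simpler stand-in technique and not the end-to-end scheme the abstract describes. The function name and the example weights are illustrative assumptions.

```python
def prune_by_magnitude(weights, compression_rate):
    """Zero out the smallest-magnitude fraction of the weights.

    compression_rate is the targeted fraction of weights to remove (0..1).
    Stand-in illustration only, not the paper's algorithm.
    """
    n_drop = round(len(weights) * compression_rate)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

pruned = prune_by_magnitude([0.5, -0.1, 0.03, -2.0, 0.7], 0.6)
# Three of the five weights (60%) are now zero; the two largest survive.
```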


2021 ◽  
Vol 2050 (1) ◽  
pp. 012006
Author(s):  
Xili Dai ◽  
Chunmei Ma ◽  
Jingwei Sun ◽  
Tao Zhang ◽  
Haigang Gong ◽  
...  

Abstract Training deep neural networks from only a few examples has been an interesting topic that motivated few-shot learning. In this paper, we study the fine-grained image classification problem in a challenging few-shot learning setting, and propose the Self-Amplificated Network (SAN), a method based on meta-learning to tackle this problem. The SAN model consists of three parts: the Encoder, Amplification, and Similarity Modules. The Encoder Module encodes a fine-grained image input into a feature vector. The Amplification Module amplifies subtle differences between fine-grained images using a self-attention mechanism composed of multi-head attention. The Similarity Module measures how similar the query image and the support set are in order to determine the classification result. In-depth experiments on three benchmark datasets show that our network achieves superior performance over the competing baselines.


2021 ◽  
Author(s):  
Mengke Li ◽  
Yiu-ming Cheung ◽  
Yang Lu

Long-tailed data is still a big challenge for deep neural networks, even though they have achieved great success on balanced data. We observe that vanilla training on long-tailed data with cross-entropy loss makes the instance-rich head classes severely squeeze the spatial distribution of the tail classes, which leads to difficulty in classifying tail class samples. Furthermore, the original cross-entropy loss can propagate gradients only briefly, because the gradient in softmax form rapidly approaches zero as the logit difference increases. This phenomenon is called softmax saturation. It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby solving the distorted embedding space of long-tailed problems. To this end, this paper proposes the Gaussian clouded logit adjustment, which perturbs different class logits with Gaussian noise of varied amplitude. We define the amplitude of perturbation as cloud size and set relatively large cloud sizes for tail classes. A large cloud size reduces softmax saturation, thereby making tail class samples more active and enlarging the embedding space. To alleviate the bias in the classifier, we accordingly propose the class-based effective number sampling strategy with classifier re-training. Extensive experiments on benchmark datasets validate the superior performance of the proposed method.
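Softmax saturation and the effect of perturbing a logit can be shown numerically. The sketch below uses the standard fact that the gradient of cross-entropy with respect to the logits is `softmax(z) - one_hot(target)`. The perturbation rule in `cloud_logits` (subtracting `cloud_size * |eps|` from the target logit, with `eps` drawn from a standard normal) is an assumption made for illustration; the abstract does not specify the exact form, and the function names are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad(logits, target):
    """Gradient of cross-entropy w.r.t. the logits: softmax(z) - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

def cloud_logits(logits, target, cloud_sizes, rng):
    """ASSUMED rule: lower the target logit by cloud_sizes[target] * |eps|."""
    eps = abs(rng.gauss(0.0, 1.0))
    perturbed = list(logits)
    perturbed[target] -= cloud_sizes[target] * eps
    return perturbed

# Saturation: the gradient on the target logit shrinks as the logit gap grows.
for gap in (1.0, 5.0, 10.0):
    print(gap, abs(ce_grad([gap, 0.0], 0)[0]))

# A larger cloud size for a (hypothetical) tail class 0 keeps its gradient alive.
rng = random.Random(0)
clouded = cloud_logits([10.0, 0.0], 0, [3.0, 0.5], rng)
print(abs(ce_grad(clouded, 0)[0]))
```

Because the assumed perturbation only ever lowers the target logit, the gradient magnitude on that logit can never be smaller than in the unperturbed case.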


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Narjes Rohani ◽  
Changiz Eslahchi

Abstract Drug-Drug Interaction (DDI) prediction is one of the most critical issues in drug development and health. Proposing appropriate computational methods for predicting unknown DDIs with high precision is challenging. We proposed "NDD: Neural network-based method for drug-drug interaction prediction" for predicting unknown DDIs using various information about drugs. Multiple drug similarities based on drug substructure, target, side effect, off-label side effect, pathway, transporter, and indication data are calculated. First, NDD uses a heuristic similarity selection process and then integrates the selected similarities with a nonlinear similarity fusion method to obtain high-level features. Afterward, it uses a neural network for interaction prediction. The similarity selection and similarity integration parts of NDD have been proposed in previous studies of other problems. Our novelty is to combine these parts with a new neural network architecture and apply these approaches in the context of DDI prediction. We compared NDD with six machine learning classifiers and six state-of-the-art graph-based methods on three benchmark datasets. NDD achieved superior performance in cross-validation, with AUPR ranging from 0.830 to 0.947, AUC from 0.954 to 0.994, and F-measure from 0.772 to 0.902. Moreover, cumulative evidence in case studies on numerous drug pairs further confirms the ability of NDD to predict unknown DDIs. The evaluations corroborate that NDD is an efficient method for predicting unknown DDIs. The data and implementation of NDD are available at https://github.com/nrohani/NDD.


2021 ◽  
Vol 13 (14) ◽  
pp. 2656
Author(s):  
Furong Shi ◽  
Tong Zhang

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during the down-sampling process, which results in discontinuous segmentation or inaccurate segmentation boundaries. In order to compensate for the loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency constraint losses were designed based on the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on the multi-scale features, one regression loss and two classification losses were used for predicting the distance-transform map, segmentation, and boundary. Two inter-task consistency-loss functions were constructed to ensure the consistency between distance maps and masks, and the consistency between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method achieved superior performance over the recent state-of-the-art models.
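The relationship between a mask, its distance map, and its boundary map, which the consistency losses exploit, can be sketched on a toy binary mask. The code below computes exact targets with hard checks, whereas the network in the abstract learns soft predictions under differentiable losses; the city-block distance, the 4-neighbourhood, and all function names are illustrative assumptions.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def distance_map(mask):
    """City-block distance from each building pixel (1) to the nearest
    background pixel (0). Assumes the mask has at least one background pixel."""
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[r][c] == 0 else None for c in range(w)] for r in range(h)]
    queue = deque((r, c) for r in range(h) for c in range(w) if mask[r][c] == 0)
    while queue:  # breadth-first search outward from the background
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def boundary_map(mask):
    """Mark building pixels that touch a background pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1:
                for dr, dc in NEIGHBOURS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] == 0:
                        out[r][c] = 1
    return out

def consistent(mask, dist, boundary):
    """Mask is where distance > 0; boundary is where distance == 1."""
    return all(
        (mask[r][c] == 1) == (dist[r][c] > 0)
        and (boundary[r][c] == 1) == (dist[r][c] == 1)
        for r in range(len(mask))
        for c in range(len(mask[0]))
    )
```

For a filled 3x3 building inside a 5x5 image, only the centre pixel has distance 2; every other building pixel lies on the boundary with distance 1, and `consistent` returns `True`.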


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Fuyong Xing ◽  
Yuanpu Xie ◽  
Xiaoshuang Shi ◽  
Pingjun Chen ◽  
Zizhao Zhang ◽  
...  

Abstract Background: Nucleus or cell detection is a fundamental task in microscopy image analysis and supports many other quantitative studies such as object counting, segmentation, tracking, etc. Deep neural networks are emerging as a powerful tool for biomedical image computing; in particular, convolutional neural networks have been widely applied to nucleus/cell detection in microscopy images. However, almost all models are tailored for specific datasets and their applicability to other microscopy image data remains unknown. Some existing studies casually learn and evaluate deep neural networks on multiple microscopy datasets, but there are still several critical, open questions to be addressed. Results: We analyze the applicability of deep models specifically for nucleus detection across a wide variety of microscopy image data. More specifically, we present a fully convolutional network-based regression model and extensively evaluate it on large-scale digital pathology and microscopy image datasets, which consist of 23 organs (or cancer diseases) and come from multiple institutions. We demonstrate that for a specific target dataset, training with images from the same types of organs might usually be necessary for nucleus detection. Although the images can be visually similar due to the same staining technique and imaging protocol, deep models learned with images from different organs might not deliver desirable results and would require model fine-tuning to be on a par with those trained with target data. We also observe that training with a mixture of target and other/non-target data does not always mean a higher accuracy of nucleus detection, and it might require proper data manipulation during model training to achieve good performance. Conclusions: We conduct a systematic case study on deep models for nucleus detection in a wide variety of microscopy images, aiming to address several important but previously understudied questions. We present and extensively evaluate an end-to-end, pixel-to-pixel fully convolutional regression network and report a few significant findings, some of which might not have been reported in previous studies. The model performance analysis and observations would be helpful to nucleus detection in microscopy images.


2021 ◽  
Author(s):  
Gerald Eichstädt ◽  
John Rogers ◽  
Glenn Orton ◽  
Candice Hansen

We derive Jupiter's zonal vorticity profile from JunoCam images, with Juno's polar orbit allowing the observation of latitudes that are difficult to observe from Earth or from equatorial flybys. We identify cyclonic local vorticity maxima near 77.9°, 65.6°, 59.3°, 50.9°, 42.4°, and 34.3°S planetocentric at a resolution of ~1°, based on analyzing selected JunoCam image pairs taken during the 16 Juno perijove flybys 15-30. We identify zonal anticyclonic local vorticity maxima near 80.7°, 73.8°, 62.1°, 56.4°, 46.9°, 38.0°, and 30.7°S. These results agree with the known zonal wind profile below 64°S, and reveal novel structure further south, including a prominent cyclonic band centered near 66°S. The anticyclonic vorticity maximum near 73.8°S represents a broad and skewed fluctuating anticyclonic band between ~69.0° and ~76.5°S, and is hence poorly defined. This band may even split temporarily into two or three bands. The cyclonic vorticity maximum near 77.9°S appears to be fairly stable during these flybys, probably representing irregular cyclonic structures in the region. The area between ~82° and 90°S is relatively small and close to the terminator, resulting in poor statistics, but generally shows a strongly cyclonic mean vorticity, representing the well-known circumpolar cyclone cluster.

The latitude range between ~30°S and ~85°S was particularly well observed, allowing observation periods lasting several hours. For each considered perijove we selected a pair of images separated by about 30 - 60 minutes. We derived high-passed and contrast-normalized south polar equidistant azimuthal maps of Jupiter's cloud tops. They were used to derive maps of local rotation at a resolution of ~1° latitude by stereo-corresponding Monte-Carlo-distributed and Gauss-weighted round tiles for each image pair considered. Only the rotation portion of the stereo correspondence between tiles was used to sample the vorticity maps. For each image pair, we rendered ~40 vorticity maps with different Monte-Carlo runs. The standard deviation of the resulting statistics provided a criterion to define a valid area of the mean vorticity map. Averaging vorticities along circles centered on the south pole returned a zonal vorticity profile for each of the perijoves considered. Averaging the resulting zonal vorticity profiles built the basis for a discussion of the mean profile.

JunoCam also images the northern hemisphere, at higher resolution but with coverage restricted to a briefer time span and smaller area due to the nature of Juno's elliptical orbit, which will restrict our ability to obtain zonal vorticity profiles.
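The step of averaging vorticities along circles centered on the south pole can be sketched as ring-wise binning of a polar map. In an equidistant azimuthal projection, the distance from the pole in the map is proportional to the angular distance from the pole, so each ring corresponds to a latitude band. The function name, the `None` convention for invalid pixels, and the parameters are assumptions for illustration, not the authors' pipeline.

```python
import math

def zonal_profile(vorticity, pole, deg_per_pixel, bin_deg=1.0):
    """Average a polar-projected vorticity map over rings around the pole.

    vorticity: 2-D list of floats; None marks pixels outside the valid area.
    pole: (row, col) position of the south pole in the map.
    Returns {ring_index: mean vorticity}; ring i covers angular distances
    from i * bin_deg up to (i + 1) * bin_deg degrees from the pole.
    """
    sums, counts = {}, {}
    for r, row in enumerate(vorticity):
        for c, value in enumerate(row):
            if value is None:
                continue  # skip pixels flagged as invalid
            radius_deg = math.hypot(r - pole[0], c - pole[1]) * deg_per_pixel
            ring = int(radius_deg // bin_deg)
            sums[ring] = sums.get(ring, 0.0) + value
            counts[ring] = counts.get(ring, 0) + 1
    return {ring: sums[ring] / counts[ring] for ring in sorted(sums)}
```

Ring `i` corresponds to planetocentric latitudes between `90 - i * bin_deg` and `90 - (i + 1) * bin_deg` degrees south; averaging the per-perijove profiles ring by ring would then give a mean profile.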


Author(s):  
Abhijeet Bhattacharya ◽  
Tanmay Baweja ◽  
S. P. K. Karri

The electroencephalogram (EEG) is the most promising and efficient technique for studying epilepsy and recording the electrical activity of the brain. Automated screening of epilepsy through data-driven algorithms reduces doctors' manual workload in diagnosing epilepsy. New algorithms are biased towards either signal processing or deep learning, each of which has its own advantages and disadvantages. The proposed pipeline is an end-to-end automated seizure prediction framework that combines Fourier transform feature extraction with a deep learning-based transformer model, a blend of signal processing and deep learning that automatically identifies the attentive regions in EEG signals for effective screening. The proposed pipeline has demonstrated superior performance on the benchmark dataset, with average sensitivity and false-positive rate per hour (FPR/h) of 98.46%, 94.83% and 0.12439, 0, respectively. The proposed work shows strong results on the benchmark datasets and considerable potential in clinics as a support system for medical experts monitoring patients.
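The Fourier transform feature extraction step can be illustrated with a naive discrete Fourier transform over one signal window. This sketch covers only the signal-processing half of such a pipeline; the transformer model is not shown, and the window length, sampling rate, and function names are assumptions rather than details from the abstract.

```python
import cmath
import math

def magnitude_spectrum(window):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 of a real-valued window."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(window, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC bin in the window."""
    spectrum = magnitude_spectrum(window)
    k = max(range(1, len(spectrum)), key=lambda i: spectrum[i])  # skip bin 0 (DC)
    return k * sample_rate_hz / len(window)

# One second of a synthetic 8 Hz oscillation sampled at 64 Hz (made-up signal).
window = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
features = magnitude_spectrum(window)  # 33 magnitudes, one per frequency bin
```

In practice a fast Fourier transform would replace the quadratic-time loop; the per-bin magnitudes of consecutive windows are the kind of feature sequence a transformer could attend over.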


2021 ◽  
Vol 7 ◽  
pp. e571
Author(s):  
Nurdan Ayse Saran ◽  
Murat Saran ◽  
Fatih Nar

In the last decade, deep learning has been applied to a wide range of problems with tremendous success. This success mainly comes from large data availability, increased computational power, and theoretical improvements in the training phase. As the dataset grows, the real world is better represented, making it possible to develop a model that can generalize. However, creating a labeled dataset is expensive and time-consuming, and in some domains it is challenging if not infeasible. Therefore, researchers have proposed data augmentation methods to increase dataset size and variety by creating variations of the existing data. For image data, variations can be obtained by applying color or spatial transformations, either alone or in combination. Color transformations perform linear or nonlinear operations on the entire image or on patches to create variations of the original image. The current color-based augmentation methods are usually based on image processing methods that apply color transformations such as equalizing, solarizing, and posterizing. Nevertheless, these color-based data augmentation methods are not guaranteed to create plausible variations of the image. This paper proposes a novel distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution. We achieved this by defining a regularized density decreasing direction to create paths from the original pixels’ color to the distribution tails. The proposed method provides superior performance compared to existing data augmentation methods, as shown in a transfer learning scenario on the UC Merced Land-use, Intel Image Classification, and Oxford-IIIT Pet datasets for classification and segmentation tasks.
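Two of the existing color transformations named above, solarizing and posterizing, can be sketched on raw 8-bit channel values. The definitions below follow the common image-processing conventions (invert values at or above a threshold; keep only the highest bits) and illustrate the baseline methods, not the distribution-preserving method the paper proposes. Function names and defaults are illustrative.

```python
def solarize(pixels, threshold=128):
    """Invert every 8-bit value at or above the threshold."""
    return [p if p < threshold else 255 - p for p in pixels]

def posterize(pixels, bits=4):
    """Keep only the `bits` most significant bits of each 8-bit value."""
    mask = 0xFF & ~((1 << (8 - bits)) - 1)
    return [p & mask for p in pixels]

row = [0, 100, 200, 255]
solarized = solarize(row)            # bright values are flipped to dark ones
posterized = posterize(row, bits=2)  # values collapse onto four levels
```

Both operations act on each value independently of how colors are distributed in the image, which is one way to see why the results are not guaranteed to look plausible.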

