Self-Supervised Learning and Network Architecture Search for Domain-Adaptive Landmark Detector

2020 ◽  
Author(s):  
Kanji Tanaka

Fine-tuning a deep convolutional neural network (DCN) as a place-class detector (PCD) is a direct method of realizing domain-adaptive visual place recognition (VPR). Although the PCD model is effective, it requires a considerable amount of class-specific training examples and class-set maintenance in long-term, large-scale VPR scenarios. Therefore, we propose to employ a DCN as a landmark-class detector (LCD), which allows us to distinguish exponentially large numbers of different places by combining multiple landmarks and, furthermore, to select a stable part of the scene (such as buildings) as landmark classes to reduce the need for class-set maintenance. However, two important questions remain: 1) How should we mine training examples (landmark objects) when we have no domain-specific object detector? 2) How should we fine-tune the architecture and parameters of the DCN for a new domain-specific landmark set? To answer these questions, we present a self-supervised landmark mining approach for collecting pseudo-labeled landmark examples, and then consider network architecture search (NAS) for the LCD task, which has a significantly larger search space than typical NAS applications such as PCD. Extensive verification experiments demonstrate the superiority of the proposed framework over previous LCD methods with hand-crafted architectures and/or non-adaptive parameters, and a 90% reduction in NAS cost compared with a naive NAS implementation.
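The combinatorial benefit of an LCD over a PCD can be illustrated with a minimal sketch (hypothetical landmark-class IDs and place names, not the paper's implementation): a place is keyed by the set of landmark classes detected in it, so an N-class landmark detector can in principle index N^k configurations of k landmark slots.

```python
# With an N-class landmark detector, a place described by k landmark slots
# can take N**k distinct configurations, so the place vocabulary grows
# exponentially with k even for a modest landmark-class set.
def num_distinct_places(num_landmark_classes: int, landmarks_per_place: int) -> int:
    return num_landmark_classes ** landmarks_per_place


# Hypothetical lookup: a place is keyed by the tuple of landmark classes
# detected in it, in canonical (sorted) order, so detection order is irrelevant.
def place_signature(detected_landmark_classes):
    return tuple(sorted(detected_landmark_classes))


place_db = {}
place_db[place_signature([3, 17, 42])] = "campus_gate"  # made-up class IDs
```

With only 100 landmark classes and 3 landmarks per place, a million place configurations are already distinguishable.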



2020 ◽  
Vol 34 (05) ◽  
pp. 7780-7788
Author(s):  
Siddhant Garg ◽  
Thuy Vu ◽  
Alessandro Moschitti

We propose TandA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it on a large, high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, a well-known inference task in Question Answering. We built a large-scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving impressive MAP scores of 92% and 94.3%, respectively, which largely outperform the previous highest scores of 83.4% and 87.5%. We empirically show that TandA generates more stable and robust models, reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TandA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TandA in an industrial setting, using domain-specific datasets subject to different types of noise.
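The two-step transfer-and-adapt recipe can be sketched in miniature (a toy numpy logistic-regression stand-in for a Transformer; all data, dimensions, and hyper-parameters here are made up): the same fine-tuning routine is run twice, first on a large general-task dataset, then on a small target-domain one.

```python
import numpy as np


def finetune(w, X, y, lr=0.1, steps=200):
    """One fine-tuning stage: gradient descent on logistic loss, starting
    from the weights w produced by the previous stage."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w


rng = np.random.default_rng(0)
d = 8
w_pretrained = rng.normal(size=d)              # stands in for the pre-trained model

# Stage 1 (transfer): large, high-quality general-task dataset.
X_gen = rng.normal(size=(500, d))
y_gen = (X_gen[:, 0] > 0).astype(float)
w_transfer = finetune(w_pretrained, X_gen, y_gen)

# Stage 2 (adapt): small target-domain dataset with a shifted decision rule.
X_tgt = rng.normal(size=(50, d))
y_tgt = (X_tgt[:, 0] + 0.2 * X_tgt[:, 1] > 0).astype(float)
w_final = finetune(w_transfer, X_tgt, y_tgt)
```

Because stage 2 starts from the transferred weights rather than from scratch, the small target dataset only has to nudge an already-sensible model.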


Author(s):  
Kanji Tanaka

With recent progress in large-scale map maintenance and long-term map learning, detecting changes on a large-scale map from a visual image captured by a mobile robot has become a problem of increasing criticality. In this paper, we present an efficient approach to change-classifier learning; specifically, the proposed approach employs a collection of place-specific change classifiers. Our approach requires the memorization of only the training examples (rather than the classifiers themselves), which can be further compressed in the form of bag-of-words (BoW). Furthermore, the proposed approach can incorporate the most recent map into the classifiers by straightforwardly adding or deleting a few training examples that correspond to them. The proposed algorithm is applied and evaluated on a practical long-term, cross-season change detection system that consists of a large number of place-specific object-level change classifiers.
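The memorize-examples-only design can be sketched as follows (a hypothetical nearest-neighbor classifier over BoW histograms, not the paper's exact formulation): because only training examples are stored, a map update reduces to adding or deleting a few of them.

```python
from collections import Counter


class PlaceChangeClassifier:
    """Memorizes BoW training examples for one place; classification is
    nearest-neighbor over the stored examples, so incorporating the most
    recent map means adding or deleting a few examples, not retraining."""

    def __init__(self):
        self.examples = []  # list of (BoW histogram, label) pairs

    def add_example(self, words, label):
        self.examples.append((Counter(words), label))

    def delete_example(self, index):
        del self.examples[index]

    @staticmethod
    def _dist(a, b):
        # L1 distance between BoW histograms.
        return sum(abs(a[k] - b[k]) for k in set(a) | set(b))

    def classify(self, words):
        query = Counter(words)
        _, label = min(((self._dist(bow, query), lbl) for bow, lbl in self.examples),
                       key=lambda t: t[0])
        return label
```

The visual words here are hypothetical stand-ins for quantized local descriptors.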


2021 ◽  
Author(s):  
Dejin Xun ◽  
Deheng Chen ◽  
Yitian Zhou ◽  
Volker M. Lauschke ◽  
Rui Wang ◽  
...  

Deep learning-based cell segmentation is increasingly utilized in cell biology and molecular pathology, owing to the massive accumulation of diverse large-scale datasets and excellent performance in cell representation. However, the development of specialized algorithms has long been hampered by a paucity of annotated training data, whereas the performance of generalist algorithms is limited without experiment-specific calibration. Here, we present a deep learning-based tool called Scellseg, consisting of a novel pre-trained network architecture and a contrastive fine-tuning strategy. In comparison to four commonly used algorithms, Scellseg achieved higher average precision on three diverse datasets with no need for dataset-specific configuration. Interestingly, based on a shot-data-scale experiment, we found that eight images are sufficient for model tuning to achieve satisfactory performance. We also developed a graphical user interface that integrates annotation, fine-tuning, and inference, allowing biologists to easily specialize their own segmentation models and analyze data at the single-cell level.
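The average-precision comparison presumably uses an instance-matching AP of the kind common in cell segmentation benchmarks; a minimal sketch (masks as pixel sets, greedy IoU matching at a fixed threshold; these details are assumptions, not taken from the paper):

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as iterables of pixels."""
    a, b = set(mask_a), set(mask_b)
    return len(a & b) / len(a | b)


def average_precision(pred_masks, true_masks, thresh=0.5):
    """AP = TP / (TP + FP + FN) at a fixed IoU threshold, with greedy
    one-to-one matching of predicted cells to ground-truth cells."""
    unmatched = list(range(len(true_masks)))
    tp = 0
    for p in pred_masks:
        best = max(unmatched, key=lambda i: iou(p, true_masks[i]), default=None)
        if best is not None and iou(p, true_masks[best]) >= thresh:
            tp += 1
            unmatched.remove(best)
    fp = len(pred_masks) - tp
    fn = len(true_masks) - tp
    return tp / (tp + fp + fn)
```

A missed cell and a spurious cell are penalized symmetrically, which is why this metric rewards calibration to a specific experiment.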


Neurology ◽  
2017 ◽  
Vol 88 (16) ◽  
pp. 1546-1555 ◽  
Author(s):  
Roza M. Umarova ◽  
Lena Beume ◽  
Marco Reisert ◽  
Christoph P. Kaller ◽  
Stefan Klöppel ◽  
...  

Objective: To distinguish white matter remodeling directly induced by a stroke lesion from that evoked by remote network dysfunction, using spatial neglect as a model.

Methods: We examined 24 visual neglect/extinction patients and 17 control patients, combining comprehensive analyses of diffusion tensor metrics and global fiber tracking with neuropsychological testing in the acute (6.3 ± 0.5 days poststroke) and chronic (134 ± 7 days poststroke) stroke phases.

Results: Compared to stroke controls, patients with spatial neglect/extinction displayed longitudinal white matter alterations with 2 defining signatures: (1) perilesional degenerative changes characterized by congruently reduced fractional anisotropy and increased radial diffusivity (RD), axial diffusivity, and mean diffusivity, all suggestive of direct axonal damage by the lesion and therefore nonspecific to the impaired attention network; and (2) transneuronal changes characterized by increased RD in contralesional frontoparietal and bilateral occipital connections, suggestive of primary periaxonal involvement; these changes were distinctly related to the degree of unrecovered neglect symptoms in chronic stroke, hence emerging as network-specific alterations.

Conclusions: The present data show how stroke entails global alterations of lesion-spared network architecture over time. Sufficiently large lesions of widely interconnected association cortex induce distinct, large-scale structural reorganization in domain-specific network connections. Besides their relevance to unrecovered domain-specific symptoms, these effects might also explain mechanisms of domain-general deficits in stroke patients, pointing to potential targets for therapeutic intervention.
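The diffusion tensor metrics named in the results (FA, MD, AD, RD) follow from the tensor eigenvalues by standard formulas, sketched here (eigenvalues assumed sorted, λ1 ≥ λ2 ≥ λ3):

```python
import numpy as np


def dti_metrics(l1, l2, l3):
    """Standard diffusion-tensor metrics from sorted eigenvalues l1 >= l2 >= l3."""
    md = (l1 + l2 + l3) / 3.0                       # mean diffusivity
    ad = l1                                         # axial diffusivity
    rd = (l2 + l3) / 2.0                            # radial diffusivity
    fa = np.sqrt(1.5 * ((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
                 / (l1 ** 2 + l2 ** 2 + l3 ** 2))   # fractional anisotropy
    return fa, md, ad, rd
```

An isotropic voxel (equal eigenvalues) gives FA = 0, while fully anisotropic diffusion along one axis gives FA = 1, which is why reduced FA with increased RD is read as loss of directional (axonal) structure.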


Author(s):  
Swapnali Gavali ◽  
Dr. Bashirahamad Momin

Landmark recognition is a type of object recognition problem that has not been well solved. Classical object recognition techniques cannot be applied directly, due to the large number of landmarks and a highly imbalanced dataset. This paper presents the application of a triplet network to large-scale landmark-based visual place recognition. By fine-tuning a pre-trained convolutional neural network (CNN) and minimizing a triplet loss, the triplet network can learn an appropriate metric so that the most similar images can be retrieved with the k-nearest neighbor (KNN) algorithm. The performance of the proposed method is evaluated on a real-world landmark recognition dataset.


2018 ◽  
Author(s):  
Fabian H. Sinz ◽  
Alexander S. Ecker ◽  
Paul G. Fahey ◽  
Edgar Y. Walker ◽  
Erick Cobos ◽  
...  

Abstract: To better understand the representations in visual cortex, we need to generate better predictions of neural activity in awake animals presented with their ecological input: natural video. Despite recent advances in models for static images, models for predicting responses to natural video are scarce, and standard linear-nonlinear models perform poorly. We developed a new deep recurrent network architecture that predicts the inferred spiking activity of thousands of mouse V1 neurons simultaneously recorded with two-photon microscopy, while accounting for confounding factors such as the animal's gaze position and brain state changes related to running state and pupil dilation. Powerful system identification models provide an opportunity to gain insight into cortical functions through in silico experiments that can subsequently be tested in the brain. However, in many cases this approach requires that the model be able to generalize to stimulus statistics it was not trained on, such as band-limited noise and other parameterized stimuli. We investigated these domain transfer properties in our model and found that our model trained on natural images is able to correctly predict the orientation tuning of neurons in response to artificial noise stimuli. Finally, we show that we can fully generalize from movies to noise and maintain high predictive performance on both stimulus domains by fine-tuning only the final layer's weights of a network otherwise trained on natural movies. The converse, however, is not true.
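Fine-tuning only the final layer's weights amounts to fitting a new linear readout on frozen features; a minimal numpy sketch (toy stand-ins for the network body, stimuli, and responses, nothing from the actual model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen network body: weights fixed after training on the original domain.
W_body = rng.normal(size=(10, 32))


def frozen_features(x):
    """Stands in for the fixed, movie-trained network body."""
    return np.tanh(x @ W_body)


# Fine-tune only the final layer on the new stimulus domain ("noise"):
X_noise = rng.normal(size=(200, 10))        # toy noise stimuli
y_noise = X_noise[:, :3].sum(axis=1)        # toy neural responses

F = frozen_features(X_noise)
# Least-squares fit of the readout alone; W_body is never touched.
w_readout, *_ = np.linalg.lstsq(F, y_noise, rcond=None)
```

Keeping the body frozen means the new domain only has to supply enough data to fit one linear map, not the whole network.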


2020 ◽  
Vol 2020 (10) ◽  
pp. 181-1-181-7
Author(s):  
Takahiro Kudo ◽  
Takanori Fujisawa ◽  
Takuro Yamaguchi ◽  
Masaaki Ikehara

Image deconvolution has recently become an important issue. It has two kinds of approaches: non-blind and blind. Non-blind deconvolution is the classic image deblurring problem, which assumes that the PSF is known and spatially invariant. Recently, convolutional neural networks (CNNs) have been used for non-blind deconvolution. Though CNNs can deal with complex changes in unknown images, some conventional CNN-based methods can only handle small PSFs and do not consider the large PSFs found in the real world. In this paper, we propose a non-blind deconvolution framework based on a CNN that can remove large-scale ringing in a deblurred image. Our method has three key points. The first is that our network architecture is able to preserve both large and small features in the image. The second is that the training dataset is created to preserve details. The third is that we extend the images to minimize the effects of large ringing at the image borders. In our experiments, we used three kinds of large PSFs and observed high-precision results from our method, both quantitatively and qualitatively.
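The third key point, extending images so that border ringing lands in disposable margins, can be sketched with replicate padding (a pad width of half the PSF support is an assumption here; the paper's extension scheme may differ):

```python
import numpy as np


def extend_for_deconvolution(image, psf_size):
    """Replicate-pad the image by half the PSF support on every side, so
    boundary-induced ringing falls in the padding rather than the image."""
    pad = psf_size // 2
    return np.pad(image, pad, mode="edge")


def crop_after_deconvolution(extended, psf_size):
    """Discard the same margin after deblurring, keeping only the original
    field of view."""
    pad = psf_size // 2
    return extended[pad:-pad, pad:-pad]
```

Replicate (`edge`) padding avoids the sharp step a zero-padded border would introduce, which is exactly what excites large-scale ringing under a large PSF.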


2021 ◽  
Vol 13 (9) ◽  
pp. 5108
Author(s):  
Navin Ranjan ◽  
Sovit Bhandari ◽  
Pervez Khan ◽  
Youn-Sik Hong ◽  
Hoon Kim

The transportation system, especially the road network, is the backbone of any modern economy. However, with rapid urbanization, congestion levels have surged drastically, directly affecting the quality of urban life, the environment, and the economy. In this paper, we propose (i) an inexpensive and efficient traffic congestion pattern analysis algorithm, based on image processing, which identifies the group of roads in a network that suffers from recurring congestion; and (ii) a deep neural network architecture, formed from a convolutional autoencoder, which learns both spatial and temporal relationships from a sequence of image data to predict the city-wide grid congestion index. Our experiments show that both algorithms are efficient: the pattern analysis relies only on basic arithmetic operations, while the prediction algorithm outperforms two other deep neural networks (a convolutional recurrent autoencoder and ConvLSTM) in large-scale traffic network prediction performance. A case study was conducted on a dataset from the city of Seoul.
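The claim that pattern analysis needs only basic arithmetic can be illustrated with a sketch (a hypothetical per-cell speed grid; both thresholds are made up): a grid cell suffers recurring congestion if its speed falls below a threshold in more than a given fraction of frames.

```python
import numpy as np


def recurring_congestion(frames, speed_thresh, freq_thresh=0.5):
    """frames: (T, H, W) array of average speeds per grid cell over T frames.
    A cell is congested in a frame if its speed is below speed_thresh; it is
    a recurring-congestion cell if that happens in more than freq_thresh of
    all frames. Only comparisons and one mean are required."""
    congested = frames < speed_thresh        # (T, H, W) boolean map per frame
    frequency = congested.mean(axis=0)       # per-cell congestion rate over time
    return frequency > freq_thresh           # (H, W) recurring-congestion mask
```

No learning is involved, which is what makes this half of the pipeline inexpensive compared with the autoencoder-based predictor.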


2021 ◽  
Vol 3 (2) ◽  
pp. 299-317
Author(s):  
Patrick Schrempf ◽  
Hannah Watson ◽  
Eunsoo Park ◽  
Maciej Pajak ◽  
Hamish MacKinnon ◽  
...  

Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which are time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to extract many labels in parallel. However, if we rely on purely data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that "likely" indicates positivity), and thus fails to generalise to rarer cases that have not been learned or where the heuristics break down (e.g., "likely represents prominent VR space or lacunar infarct", which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies and to teach the model rules for labelling difficult cases by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture (i.e., PubMedBERT), we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
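Template creation for data synthesis can be sketched as follows (toy templates, findings, and labels; the real system draws entities from medical ontologies and targets the paper's label schema): each template encodes a labelling rule, and filling it with entities yields training examples that teach that rule explicitly.

```python
# Hypothetical templates pairing a report phrasing with the label it should
# receive; the "uncertain" template encodes the differential-diagnosis rule.
TEMPLATES = [
    ("There is {finding}.", "positive"),
    ("No evidence of {finding}.", "negative"),
    ("Likely represents {finding} or {other}.", "uncertain"),
]

FINDINGS = ["lacunar infarct", "subdural haematoma"]  # made-up entity list


def synthesize():
    """Fill every template with every finding (and, where needed, a second
    distinct finding) to produce (text, finding, label) training examples."""
    examples = []
    for template, label in TEMPLATES:
        for finding in FINDINGS:
            others = ([f for f in FINDINGS if f != finding]
                      if "{other}" in template else [None])
            for other in others:
                text = template.format(finding=finding, other=other)
                examples.append((text, finding, label))
    return examples
```

Because the label travels with the template, rare constructions like hedged differentials can be over-sampled at will instead of waiting for them to appear in real reports.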

