scholarly journals Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations

Author(s):  
Max Losch ◽  
Mario Fritz ◽  
Bernt Schiele

AbstractToday’s deep learning systems deliver high performance based on end-to-end training but are notoriously hard to inspect. We argue that there are at least two reasons making inspectability challenging: (i) representations are distributed across hundreds of channels and (ii) a unifying metric quantifying inspectability is lacking. In this paper, we address both issues by proposing Semantic Bottlenecks (SB), which can be integrated into pretrained networks, to align channel outputs with individual visual concepts and introduce the model agnostic Area Under inspectability Curve (AUiC) metric to measure the alignment. We present a case study on semantic segmentation to demonstrate that SBs improve the AUiC up to six-fold over regular network outputs. We explore two types of SB-layers in this work. First, concept-supervised SB-layers (SSB), which offer inspectability w.r.t. predefined concepts that the model is demanded to rely on. And second, unsupervised SBs (USB), which offer equally strong AUiC improvements by restricting distributedness of representations across channels. Importantly, for both SB types, we can recover state of the art segmentation performance across two different models despite a drastic dimensionality reduction from 1000s of non aligned channels to 10s of semantics-aligned channels that all downstream results are based on.

Energies ◽  
2021 ◽  
Vol 14 (13) ◽  
pp. 3800
Author(s):  
Sebastian Krapf ◽  
Nils Kemmerzell ◽  
Syed Khawaja Haseeb Khawaja Haseeb Uddin ◽  
Manuel Hack Hack Vázquez ◽  
Fabian Netzler ◽  
...  

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and for accelerating new installations. State of the art uses 3D data to conduct potential analyses with high spatial resolution, limiting the study area to places with available 3D data. Recent advances in deep learning allow the required roof information from aerial images to be extracted. Furthermore, most publications consider the technical photovoltaic potential, and only a few publications determine the photovoltaic economic potential. Therefore, this paper extends state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve an Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper’s methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, this poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dominik Jens Elias Waibel ◽  
Sayedali Shetab Boushehri ◽  
Carsten Marr

Abstract Background Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application. Results We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented. Conclusions With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.


2018 ◽  
Author(s):  
Alexey A. Shvets ◽  
Alexander Rakhlin ◽  
Alexandr A. Kalinin ◽  
Vladimir I. Iglovikov

AbstractSemantic segmentation of robotic instruments is an important problem for the robot-assisted surgery. One of the main challenges is to correctly detect an instrument’s position for the tracking and pose estimation in the vicinity of surgical scenes. Accurate pixel-wise instrument segmentation is needed to address this challenge. In this paper we describe our deep learning-based approach for robotic instrument segmentation. Our approach demonstrates an improvement over the state-of-the-art results using several novel deep neural network architectures. It addressed the binary segmentation problem, where every pixel in an image is labeled as an instrument or background from the surgery video feed. In addition, we solve a multi-class segmentation problem, in which we distinguish between different instruments or different parts of an instrument from the background. In this setting, our approach outperforms other methods for automatic instrument segmentation thereby providing state-of-the-art results for these problems. The source code for our solution is made publicly available.


2019 ◽  
Vol 11 (6) ◽  
pp. 684 ◽  
Author(s):  
Maria Papadomanolaki ◽  
Maria Vakalopoulou ◽  
Konstantinos Karantzalos

Deep learning architectures have received much attention in recent years demonstrating state-of-the-art performance in several segmentation, classification and other computer vision tasks. Most of these deep networks are based on either convolutional or fully convolutional architectures. In this paper, we propose a novel object-based deep-learning framework for semantic segmentation in very high-resolution satellite data. In particular, we exploit object-based priors integrated into a fully convolutional neural network by incorporating an anisotropic diffusion data preprocessing step and an additional loss term during the training process. Under this constrained framework, the goal is to enforce pixels that belong to the same object to be classified at the same semantic category. We compared thoroughly the novel object-based framework with the currently dominating convolutional and fully convolutional deep networks. In particular, numerous experiments were conducted on the publicly available ISPRS WGII/4 benchmark datasets, namely Vaihingen and Potsdam, for validation and inter-comparison based on a variety of metrics. Quantitatively, experimental results indicate that, overall, the proposed object-based framework slightly outperformed the current state-of-the-art fully convolutional networks by more than 1% in terms of overall accuracy, while intersection over union results are improved for all semantic categories. Qualitatively, man-made classes with more strict geometry such as buildings were the ones that benefit most from our method, especially along object boundaries, highlighting the great potential of the developed approach.


Author(s):  
Alex Dexter ◽  
Spencer A. Thomas ◽  
Rory T. Steven ◽  
Kenneth N. Robinson ◽  
Adam J. Taylor ◽  
...  

AbstractHigh dimensionality omics and hyperspectral imaging datasets present difficult challenges for feature extraction and data mining due to huge numbers of features that cannot be simultaneously examined. The sample numbers and variables of these methods are constantly growing as new technologies are developed, and computational analysis needs to evolve to keep up with growing demand. Current state of the art algorithms can handle some routine datasets but struggle when datasets grow above a certain size. We present a training deep learning via neural networks on non-linear dimensionality reduction, in particular t-distributed stochastic neighbour embedding (t-SNE), to overcome prior limitations of these methods.One Sentence SummaryAnalysis of prohibitively large datasets by combining deep learning via neural networks with non-linear dimensionality reduction.


2021 ◽  
Vol 13 (24) ◽  
pp. 5100
Author(s):  
Teerapong Panboonyuen ◽  
Kulsawasd Jitkajornwanich ◽  
Siam Lawawirojwong ◽  
Panu Srestasathiern ◽  
Peerapon Vateekul

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that is capable of improving the semantic segmentation network in two ways. First, utilizing the pre-training Swin Transformer (SwinTF) under Vision Transformer (ViT) as a backbone, the model weights downstream tasks by joining task layers upon the pretrained encoder. Secondly, decoder designs are applied to our DL network with three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), to perform pixel-level segmentation. The results are compared with other image labeling state of the art (SOTA) methods, such as global convolutional network (GCN) and ViT. Extensive experiments show that our Swin Transformer (SwinTF) with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score), Thailand North Landsat-8 corpus (63.12% F1 score), and competitive results on ISPRS Vaihingen. Moreover, both our best-proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with supervised pre-training ViT on the ImageNet-1K in the Thailand, Landsat-8, and ISPRS Vaihingen corpora.


Information ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 51 ◽  
Author(s):  
Melanie Mitchell

Today’s AI systems sorely lack the essence of human intelligence: Understanding the situations we experience, being able to grasp their meaning. The lack of humanlike understanding in machines is underscored by recent studies demonstrating lack of robustness of state-of-the-art deep-learning systems. Deeper networks and larger datasets alone are not likely to unlock AI’s “barrier of meaning”; instead the field will need to embrace its original roots as an interdisciplinary science of intelligence.


2022 ◽  
Author(s):  
Yujian Mo ◽  
Yan Wu ◽  
Xinneng Yang ◽  
Feilin Liu ◽  
Yujun Liao

Author(s):  
Bo Chen ◽  
Hua Zhang ◽  
Yonglong Li ◽  
Shuang Wang ◽  
Huaifang Zhou ◽  
...  

Abstract An increasing number of detection methods based on computer vision are applied to detect cracks in water conservancy infrastructure. However, most studies directly use existing feature extraction networks to extract cracks information, which are proposed for open-source datasets. As the cracks distribution and pixel features are different from these data, the extracted cracks information is incomplete. In this paper, a deep learning-based network for dam surface crack detection is proposed, which mainly addresses the semantic segmentation of cracks on the dam surface. Particularly, we design a shallow encoding network to extract features of crack images based on the statistical analysis of cracks. Further, to enhance the relevance of contextual information, we introduce an attention module into the decoding network. During the training, we use the sum of Cross-Entropy and Dice Loss as the loss function to overcome data imbalance. The quantitative information of cracks is extracted by the imaging principle after using morphological algorithms to extract the morphological features of the predicted result. We built a manual annotation dataset containing 1577 images to verify the effectiveness of the proposed method. This method achieves the state-of-the-art performance on our dataset. Specifically, the precision, recall, IoU, F1_measure, and accuracy achieve 90.81%, 81.54%, 75.23%, 85.93%, 99.76%, respectively. And the quantization error of cracks is less than 4%.


2018 ◽  
Vol 20 (22) ◽  
pp. 5158-5168 ◽  
Author(s):  
Weifeng Peng ◽  
Fang Yao ◽  
Jianghuai Hu ◽  
Yao Liu ◽  
Zheng Lu ◽  
...  

This work first introduced the protein-based l-tyrosine to thermosets, and proved that the l-tyrosine building-block can confer materials with high performance.


Sign in / Sign up

Export Citation Format

Share Document