Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance

Mapping Intimacies ◽

10.5194/egusphere-egu21-12065 ◽

2021 ◽

Author(s):

Benjamin Kellenberger ◽

Devis Tuia ◽

Dan Morris

Keyword(s):

Deep Learning ◽

Active Learning ◽

Expert Knowledge ◽

Semantic Segmentation ◽

Third Party ◽

Learning Models ◽

Web Browser ◽

Web Based ◽

Model Training ◽

Bounding Boxes

Ecological research such as wildlife censuses increasingly relies on data at the terabyte scale. For example, modern camera trap datasets contain millions of images, and annotating them with species, bounding boxes, and the like requires prohibitive amounts of manual labour. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but it demands extensive coding and expert knowledge.

In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. First, AIDE is a web-based annotation suite for image labelling that supports concurrent access and scales up to cloud deployments. Second, it tightly integrates deep learning models into the annotation process through active learning [7]: models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system in which users label only what is required, which saves time and reduces fatigue-induced errors.

Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.

AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without writing a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements. To complete the package, AIDE offers evaluation of both user annotations and model predictions, access control, customisable model training, and more, all through the web browser.

AIDE is fully open source and available at https://github.com/microsoft/aerial_wildlife_detection.
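
The selection step of an active learning loop, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not say which selection criterion AIDE uses. The example below assumes entropy-based uncertainty sampling, one common choice, and all function names, image ids, and probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_review(predictions, k):
    """Return the ids of the k unlabelled images with the most uncertain predictions.

    `predictions` maps a (hypothetical) image id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda img: entropy(predictions[img]), reverse=True)
    return ranked[:k]

# Hypothetical model outputs for three unlabelled camera-trap images.
predictions = {
    "img_001.jpg": [0.98, 0.01, 0.01],  # confident prediction
    "img_002.jpg": [0.40, 0.35, 0.25],  # highly uncertain prediction
    "img_003.jpg": [0.70, 0.20, 0.10],  # moderately uncertain prediction
}
to_label = select_for_review(predictions, k=2)
```

In a full loop, the images in `to_label` would be shown to annotators, the model would be retrained on the new labels, and the ranking would be repeated on the remaining pool.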

AUTOMATED MARINE OIL SPILL DETECTION USING DEEP LEARNING INSTANCE SEGMENTATION MODEL

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b3-2020-1271-2020 ◽

2020 ◽

Vol XLIII-B3-2020 ◽

pp. 1271-1276

Author(s):

S. T. Yekeen ◽

A.-L. Balogun

Keyword(s):

Deep Learning ◽

Oil Spill ◽

Semantic Segmentation ◽

Learning Models ◽

Marine Oil ◽

Conventional Machine ◽

Feature Pyramid ◽

Model Training ◽

Instance Segmentation ◽

Better Than

Abstract. This study developed a novel deep learning instance segmentation model for oil spills based on the Mask Region-based Convolutional Neural Network (Mask R-CNN), a state-of-the-art computer vision model. After several pre-processing steps, a total of 2882 images containing oil spills, look-alikes, ships, and land areas were acquired. These images were split into 88% for training and 12% for testing, that is, 2530 and 352 images respectively. The model was trained by transfer learning, using a ResNet-101 backbone pre-trained on COCO data in combination with a Feature Pyramid Network (FPN) architecture for feature extraction, for 30 epochs at a learning rate of 0.001. The model’s performance was evaluated using precision, recall, and F1-measure, with values of 0.964, 0.969, and 0.968 respectively, which is higher than that of other existing models. The study concluded that, for this specialized task, the developed deep learning instance segmentation model (Mask R-CNN) performs better than conventional machine learning models and semantic segmentation deep learning models in the detection and segmentation of marine oil spills.
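
For readers unfamiliar with the three metrics named in the abstract above, the following sketch shows how precision, recall, and F1-measure are derived from detection counts. The counts are hypothetical and are not taken from the study, and the abstract does not state how its figures were aggregated across classes.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed objects.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```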

A Deep Learning-Based Framework for Automated Extraction of Building Footprint Polygons from Very High-Resolution Aerial Imagery

Remote Sensing ◽

10.3390/rs13183630 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3630

Author(s):

Ziming Li ◽

Qinchuan Xin ◽

Ying Sun ◽

Mengying Cao

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Learning Models ◽

Building Footprint ◽

Bounding Boxes ◽

Very High ◽

Segmentation Models

Accurate building footprint polygons provide essential data for a wide range of urban applications. While deep learning models have been proposed to extract pixel-based building areas from remote sensing imagery, the direct vectorization of pixel-based building maps often leads to building footprint polygons with irregular shapes that are inconsistent with real building boundaries, making them difficult to use in geospatial analysis. In this study, we propose a novel deep learning-based framework for automated extraction of building footprint polygons (DLEBFP) from very high-resolution aerial imagery by combining deep learning models for different tasks. Our approach uses the U-Net, Cascade R-CNN, and Cascade CNN deep learning models to obtain building segmentation maps, building bounding boxes, and building corners, respectively, from very high-resolution remote sensing images. We use Delaunay triangulation to construct building footprint polygons from the detected building corners, constrained by the building bounding boxes and building segmentation maps. Experiments on the Wuhan University building dataset and the ISPRS Vaihingen dataset indicate that DLEBFP performs well in extracting high-quality building footprint polygons. Compared with other semantic segmentation models and the vector map generalization method, DLEBFP achieves pixel-level mapping accuracies comparable to those of semantic segmentation models and generates building footprint polygons with concise edges and vertices and regular shapes that are close to the reference data. The promising performance indicates that our method has the potential to extract accurate building footprint polygons from remote sensing images for applications in geospatial analysis.

Self-Attention in Reconstruction Bias U-Net for Semantic Segmentation of Building Rooftops in Optical Remote Sensing Images

Remote Sensing ◽

10.3390/rs13132524 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2524

Author(s):

Ziyi Chen ◽

Dilong Li ◽

Wentao Fan ◽

Haiyan Guan ◽

Cheng Wang ◽

...

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Semantic Segmentation ◽

Extraction Methods ◽

The Self ◽

Optical Remote Sensing ◽

Building Extraction ◽

Learning Models ◽

Remote Sensing Images ◽

Segmentation Methods

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. In recent research, the self-attention module has attracted considerable attention in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of reconstruction bias. Tipping the balance between the encoding and decoding abilities, i.e., making the decoding network much more complex than the encoding network, reinforces the semantic segmentation ability. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines both. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% for the WHU and Massachusetts Building datasets, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.
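
The IoU (Intersection over Union) score reported in the abstract above can be sketched for a pair of binary building masks as follows. The masks here are tiny, flattened, hypothetical examples, and the paper's exact evaluation protocol is not given in the abstract.

```python
def iou(pred, target):
    """Intersection over Union for two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

# Hypothetical 6-pixel masks: 1 = building, 0 = background.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
score = iou(pred, target)  # 2 shared building pixels out of 4 in the union
```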

A Survey on Various Available Object Detection Models and Application In Automatic License Plate Detection

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/05222 ◽

2021 ◽

Vol 23 (06) ◽

pp. 47-57

Author(s):

Aditya Kulkarni ◽

Manali Munot ◽

Sai Salunkhe ◽

Shubham Mhaske ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Image Databases ◽

License Plate ◽

Learning Models ◽

Python Language ◽

Performance Accuracy ◽

License Plate Detection ◽

Bounding Boxes ◽

Complex Images

With developments in technology, from serial to parallel computing, GPUs, AI, and deep learning models, a series of tools for processing complex images has been developed. The main focus of this research is to compare various algorithms (pre-trained models) and their contributions to processing complex images in terms of performance, accuracy, time, and limitations. The pre-trained models we use are CNN, R-CNN, R-FCN, and YOLO. These models are Python-based and use libraries such as TensorFlow and OpenCV together with free image databases (Microsoft COCO and PASCAL VOC 2007/2012). They aim not only at object detection but also at building bounding boxes around the appropriate locations. Through this review, we get a clearer picture of these models and their performance, and a good idea of which models are suited to which situations.

Semantic Segmentation of Urban Buildings Using a High-Resolution Network (HRNet) with Channel and Spatial Attention Gates

Remote Sensing ◽

10.3390/rs13163087 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3087

Author(s):

Seonkyeong Seong ◽

Jaewan Choi

Keyword(s):

Deep Learning ◽

High Resolution ◽

Spatial Attention ◽

Semantic Segmentation ◽

Aerial Images ◽

Building Extraction ◽

Learning Models ◽

Urban Buildings

In this study, building extraction from aerial images was performed using csAG-HRNet, which combines HRNet-v2 with channel and spatial attention gates. HRNet-v2 consists of transition and fusion processes based on subnetworks at various resolutions. The channel and spatial attention gates were applied in the network to learn important features efficiently. A channel attention gate assigns weights according to the importance of each channel, and a spatial attention gate assigns weights according to the importance of each pixel position across all channels. In csAG-HRNet, csAG modules consisting of a channel attention gate and a spatial attention gate were applied to each subnetwork of the stage and fusion modules in the HRNet-v2 network. Experiments on two datasets confirmed that, compared to existing deep learning models, csAG-HRNet could minimize false detections based on the shapes of large buildings and small nonbuilding objects.

HIVE: Hierarchical Information Visualization for Explainability

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/709 ◽

2021 ◽

Author(s):

Yi-Ning Juan ◽

Yi-Shyuan Chiang ◽

Shang-Chuan Liu ◽

Ming-Feng Tsai ◽

Chuan-Ju Wang

Keyword(s):

Deep Learning ◽

Information Visualization ◽

Financial Statements ◽

Use Cases ◽

Special Focus ◽

Learning Models ◽

Web Based ◽

Risk Ranking ◽

Interactive Tool ◽

Ranking Model

In this demonstration, we develop an interactive tool, HIVE, to demonstrate the ability and versatility of an explainable risk ranking model, with a special focus on financial use cases. HIVE is a web-based tool that provides users with automatically highlighted financial statements and is designed to make comparing statements more efficient. Moreover, with the proposed tool, users can find related reports with ease, and we believe that HIVE can benefit both academics and practitioners in finance, as they can work with deep learning models using their newly gained insights.

Comparative Analysis of Semantic Segmentation by Using Deep Learning Models on Retinal Vessel

10.1007/978-981-16-5348-3_25 ◽

2021 ◽

pp. 313-322

Author(s):

Twinkle Tiwari ◽

Mukesh Saraswat

Keyword(s):

Deep Learning ◽

Comparative Analysis ◽

Retinal Vessel ◽

Semantic Segmentation ◽

Learning Models

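APLIKASI SEMANTIC SEGMENTATION UNTUK EKSTRAKSI FITUR BANGUNAN PADA PETA RUPABUMI SKALA BESAR [Application of Semantic Segmentation for Building Feature Extraction on Large-Scale Topographic Maps]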

Seminar Nasional Geomatika ◽

10.24895/sng.2020.0-0.1207 ◽

2021 ◽

pp. 923

Author(s):

Prayudha Hartanto ◽

Nugroho Purwono ◽

Danang Budi Susetyo ◽

Fahrul Hidayat ◽

Mochamad Irwan Hariyono

Keyword(s):

Deep Learning ◽

Semantic Segmentation ◽

Learning Rate ◽

Cross Entropy ◽

Model Training

Artificial intelligence is a cutting-edge innovation that relies on computers to recognize and predict objects of interest, in this case building features on large-scale Rupabumi Indonesia (RBI) topographic maps. The technology has a very broad scope, and this study discusses an application of one of the most complex branches of artificial intelligence, namely deep learning. The deep learning method used to extract building features for large-scale RBI maps is semantic segmentation, in which objects are not only detected but also segmented along their edges without regard to individual building units, so that building features form one segment and non-building features form another. The semantic segmentation algorithm chosen is U-Net, divided into the Small Unet and Full Unet architectures. The Small Unet architecture uses 18 convolutional layers, whereas Full Unet uses 19. The training data are UAV data of the Geospatial Information Agency (Badan Informasi Geospasial, BIG) office area, aerial photographs from Wuhan University, and aerial photographs of the city of Austin, Texas. The training-to-testing data ratio is 80%:20%, with a learning rate of 10^-4, the Adam optimizer, and binary cross-entropy as the loss function. Model training was carried out with TensorFlow running on the Google Colaboratory platform. The Small Unet architecture yielded a model loss of 0.119, a pixel accuracy of 0.932, and a mean Intersection over Union (IoU) of 0.698. The Full Unet architecture gave relatively better results, namely 0.112, 0.943, and 0.773 for model loss, pixel accuracy, and IoU respectively.
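
The binary cross-entropy loss named in the abstract above can be written out in a few lines. This is a plain-Python sketch of the standard formula with hypothetical labels and predicted probabilities. It is not the study's TensorFlow training code, and the clipping constant `eps` is an assumption made here to avoid taking the logarithm of zero.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical per-pixel labels (1 = building) and predicted building probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.3]
loss = binary_cross_entropy(labels, probs)
```

Lower values indicate predictions closer to the labels, which is the sense in which the abstract compares the model loss of the two architectures.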

Light deep learning models enriched with Entangled features for RGB-D semantic segmentation

Robotics and Autonomous Systems ◽

10.1016/j.robot.2021.103862 ◽

2021 ◽

pp. 103862

Author(s):

Matteo Terreran ◽

Stefano Ghidoni

Keyword(s):

Deep Learning ◽

Semantic Segmentation ◽

Learning Models

A merged molecular representation learning for molecular properties prediction with a web-based service

Scientific Reports ◽

10.1038/s41598-021-90259-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hyunseob Kim ◽

Jeongcheol Lee ◽

Sunil Ahn ◽

Jongsuk Ruth Lee

Keyword(s):

Deep Learning ◽

Quantitative Estimation ◽

Chemical Properties ◽

Representation Learning ◽

Fine Tuning ◽

Learning Models ◽

Web Based ◽

Property Prediction ◽

Matrix Embedding ◽

Molecular Properties Prediction

Abstract. Deep learning has brought dramatic progress in molecular property prediction, which is crucial in the field of drug discovery, using various representations such as fingerprints, SMILES, and graphs. In particular, SMILES is used in various deep learning models via character-based approaches. However, SMILES has the limitation that it does not readily reflect chemical properties. In this paper, we propose a new self-supervised method that learns SMILES and the chemical contexts of molecules simultaneously while pre-training the Transformer. The key to our model is learning structures with adjacency matrix embedding and learning logics that can infer descriptors via Quantitative Estimation of Drug-likeness prediction in pre-training. As a result, our method improves the generalization of the data and achieves the best average performance on benchmark downstream tasks. Moreover, we develop a web-based fine-tuning service so that our model can be used on various tasks.
