Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance

Author(s):  
Benjamin Kellenberger ◽  
Devis Tuia ◽  
Dan Morris

<p>Ecological research like wildlife censuses increasingly relies on data on the scale of Terabytes. For example, modern camera trap datasets contain millions of images that require prohibitive amounts of manual labour to be annotated with species, bounding boxes, and the like. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but involves expansive coding and expert knowledge.</p><p>In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. In a first instance, AIDE is a web-based annotation suite for image labelling with support for concurrent access and scalability, up to the cloud. In a second instance, it tightly integrates deep learning models into the annotation process through active learning [7], where models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system where users only need to label what is required, which saves time and decreases errors due to fatigue.</p><p><img src="https://contentmanager.copernicus.org/fileStorageProxy.php?f=gnp.0402be60f60062057601161/sdaolpUECMynit/12UGE&app=m&a=0&c=131251398e575ac9974634bd0861fadc&ct=x&pn=gnp.elif&d=1" alt=""></p><p><em>Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.</em></p><p>AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without having to write a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements. To complete the package, AIDE offers both user annotation and model prediction evaluation, access control, customisable model training, and more, all through the web browser.</p><p>AIDE is fully open source and available under https://github.com/microsoft/aerial_wildlife_detection.</p><p> </p><p><strong>References</strong></p>

Author(s):  
S. T. Yekeen ◽  
A.-L. Balogun

Abstract. This study developed a novel deep learning oil spill instance segmentation model using Mask-Region-based Convolutional Neural Network (Mask R-CNN) model which is a state-of-the-art computer vision model. A total of 2882 imageries containing oil spill, look-alike, ship, and land area after conducting different pre-processing activities were acquired. These images were subsequently sub-divided into 88% training and 12% for testing, equating to 2530 and 352 images respectively. The model training was conducted using transfer learning on a pre-trained ResNet 101 with COCO data as a backbone in combination with Feature Pyramid Network (FPN) architecture for the extraction of features at 30 epochs with 0.001 learning rate. The model’s performance was evaluated using precision, recall, and F1-measure which shows a higher performance than other existing models with value of 0.964, 0.969 and 0.968 respectively. As a specialized task, the study concluded that the developed deep learning instance segmentation model (Mask R-CNN) performs better than conventional machine learning models and semantic segmentation deep learning models in detection and segmentation of marine oil spill.


2021 ◽  
Vol 13 (18) ◽  
pp. 3630
Author(s):  
Ziming Li ◽  
Qinchuan Xin ◽  
Ying Sun ◽  
Mengying Cao

Accurate building footprint polygons provide essential data for a wide range of urban applications. While deep learning models have been proposed to extract pixel-based building areas from remote sensing imagery, the direct vectorization of pixel-based building maps often leads to building footprint polygons with irregular shapes that are inconsistent with real building boundaries, making it difficult to use them in geospatial analysis. In this study, we proposed a novel deep learning-based framework for automated extraction of building footprint polygons (DLEBFP) from very high-resolution aerial imagery by combining deep learning models for different tasks. Our approach uses the U-Net, Cascade R-CNN, and Cascade CNN deep learning models to obtain building segmentation maps, building bounding boxes, and building corners, respectively, from very high-resolution remote sensing images. We used Delaunay triangulation to construct building footprint polygons based on the detected building corners with the constraints of building bounding boxes and building segmentation maps. Experiments on the Wuhan University building dataset and ISPRS Vaihingen dataset indicate that DLEBFP can perform well in extracting high-quality building footprint polygons. Compared with the other semantic segmentation models and the vector map generalization method, DLEBFP is able to achieve comparable mapping accuracies with semantic segmentation models on a pixel basis and generate building footprint polygons with concise edges and vertices with regular shapes that are close to the reference data. The promising performance indicates that our method has the potential to extract accurate building footprint polygons from remote sensing images for applications in geospatial analysis.


2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. Among recent research, the self-attention module has called up a storm in many fields, including building extraction. However, most current deep learning models loading with the self-attention module still lose sight of the reconstruction bias’s effectiveness. Through tipping the balance between the abilities of encoding and decoding, i.e., making the decoding network be much more complex than the encoding network, the semantic segmentation ability will be reinforced. To remedy the research weakness in combing self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines self-attention and reconstruction-bias modules. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network will pay more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used for increasing the reconstruction ability. We test our model on two open available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% for the WHU and Massachusetts Building datasets, respectively. Compared with several recently famous semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.


2021 ◽  
Vol 23 (06) ◽  
pp. 47-57
Author(s):  
Aditya Kulkarni ◽  
◽  
Manali Munot ◽  
Sai Salunkhe ◽  
Shubham Mhaske ◽  
...  

With the development in technologies right from serial to parallel computing, GPU, AI, and deep learning models a series of tools to process complex images have been developed. The main focus of this research is to compare various algorithms(pre-trained models) and their contributions to process complex images in terms of performance, accuracy, time, and their limitations. The pre-trained models we are using are CNN, R-CNN, R-FCN, and YOLO. These models are python language-based and use libraries like TensorFlow, OpenCV, and free image databases (Microsoft COCO and PAS-CAL VOC 2007/2012). These not only aim at object detection but also on building bounding boxes around appropriate locations. Thus, by this review, we get a better vision of these models and their performance and a good idea of which models are ideal for various situations.


2021 ◽  
Vol 13 (16) ◽  
pp. 3087
Author(s):  
Seonkyeong Seong ◽  
Jaewan Choi

In this study, building extraction in aerial images was performed using csAG-HRNet by applying HRNet-v2 in combination with channel and spatial attention gates. HRNet-v2 consists of transition and fusion processes based on subnetworks according to various resolutions. The channel and spatial attention gates were applied in the network to efficiently learn important features. A channel attention gate assigns weights in accordance with the importance of each channel, and a spatial attention gate assigns weights in accordance with the importance of each pixel position for the entire channel. In csAG-HRNet, csAG modules consisting of a channel attention gate and a spatial attention gate were applied to each subnetwork of stage and fusion modules in the HRNet-v2 network. In experiments using two datasets, it was confirmed that csAG-HRNet could minimize false detections based on the shapes of large buildings and small nonbuilding objects compared to existing deep learning models.


Author(s):  
Yi-Ning Juan ◽  
Yi-Shyuan Chiang ◽  
Shang-Chuan Liu ◽  
Ming-Feng Tsai ◽  
Chuan-Ju Wang

In this demonstration, we develop an interactive tool, HIVE, to demonstrate the ability and versatility of an explainable risk ranking model with a special focus on financial use cases. HIVE is a web-based tool that provides users with automated highlighted financial statements, and HIVE is designed for making comparing statements rather more efficient. Moreover, with the proposed tool, users can find related reports at ease, and we believe that HIVE can benefit both academics and practitioners in finance as they can work around deep learning models with their newly gained insights.


2021 ◽  
pp. 923
Author(s):  
Prayudha Hartanto ◽  
Nugroho Purwono ◽  
Danang Budi Susetyo ◽  
Fahrul Hidayat ◽  
Mochamad Irwan Hariyono

Teknologi kecerdasan buatan adalah sebuah inovasi mutakhir yang mengandalkan peran komputer untuk mengenali dan memprediksi berbagai objek yang menjadi perhatian, dalam hal ini adalah fitur bangunan pada peta Rupabumi Indonesia (RBI) skala besar. Teknologi ini memiliki cakupan yang sangat luas, dan dalam penelitian ini akan dibahas aplikasi salah satu cabang kecerdasan buatan yang paling kompleks, yakni deep learning. Metode deep learning yang digunakan dalam ekstraksi fitur bangunan pada peta RBI skala besar adalah semantic segmentation, dimana objek tidak hanya dideteksi, namun juga disegmentasi bagian tepinya tanpa memperhatikan unit satuan bangunan sehingga diperoleh hasil-hasil berupa fitur bangunan dalam satu kesatuan segmen dan fitur selain bangunan menjadi segmen lainnya. Algoritma semantic segmentation yang dipilih adalah Unet yang dibagi ke dalam arsitektur Small Unet dan Full Unet. Arsitektur Small Unet menggunakan 18 layer konvolusi sedangkan Full Unet menggunakan 19 layer. Data training yang digunakan adalah data UAV wilayah Kantor Badan Informasi Geospasial (BIG), foto udara Wuhan University, dan foto udara kota Austin Texas. Rasio data training-testing yang digunakan adalah 80%:20%, dengan learning rate 10-4, fungsi optimasi Adams dan fungsi loss binary cross-entropy. Proses pembuatan model (training) dilakukan menggunakan perangkat lunak Tensorflow yang dijalankan dalam platform Google Colaboratory. Arsitektur Small Unet memberikan hasil 0,119 untuk model loss; 0,932 untuk akurasi piksel dan 0,698 untuk mean Intersection over Union (IoU). Sementara itu arsitektur Full Unet memberikan hasil yang relatif lebih baik yakni 0,112; 0,943; dan 0,773 masing-masing untuk model loss, akurasi piksel dan IoU.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hyunseob Kim ◽  
Jeongcheol Lee ◽  
Sunil Ahn ◽  
Jongsuk Ruth Lee

AbstractDeep learning has brought a dramatic development in molecular property prediction that is crucial in the field of drug discovery using various representations such as fingerprints, SMILES, and graphs. In particular, SMILES is used in various deep learning models via character-based approaches. However, SMILES has a limitation in that it is hard to reflect chemical properties. In this paper, we propose a new self-supervised method to learn SMILES and chemical contexts of molecules simultaneously in pre-training the Transformer. The key of our model is learning structures with adjacency matrix embedding and learning logics that can infer descriptors via Quantitative Estimation of Drug-likeness prediction in pre-training. As a result, our method improves the generalization of the data and achieves the best average performance by benchmarking downstream tasks. Moreover, we develop a web-based fine-tuning service to utilize our model on various tasks.


Sign in / Sign up

Export Citation Format

Share Document