Reducing Manual Supervision Required for Biodiversity Monitoring with Self-Supervised Learning

Author(s):  
Omiros Pantazis ◽  
Gabriel Brostow ◽  
Kate Jones ◽  
Oisin Mac Aodha

Recent years have ushered in a vast array of low-cost and reliable sensors that are capable of capturing large quantities of audio and visual information from the natural world. In the case of biodiversity monitoring, camera traps (i.e. remote cameras that take images when movement is detected (Kays et al. 2009)) have shown themselves to be particularly effective tools for the automated monitoring of the presence and activity of different animal species. However, this ease of deployment comes at a cost, as even a small-scale camera trapping project can result in hundreds of thousands of images that need to be reviewed. Until recently, this review process was an extremely time-consuming endeavor. It required domain experts to manually inspect each image to determine if it contained a species of interest and to identify, where possible, which species was present. Fortunately, in the last five years, advances in machine learning have resulted in a new suite of algorithms that are capable of automatically performing image classification tasks like species classification. The effectiveness of deep neural networks (Norouzzadeh et al. 2018), coupled with transfer learning (tuning a model that is pretrained on a larger dataset (Willi et al. 2018)), has resulted in high levels of accuracy on camera trap images.

However, camera trap images exhibit unique challenges that are typically not present in standard benchmark datasets used in computer vision. For example, objects of interest are often heavily occluded, the appearance of a scene can change dramatically over time due to changes in weather and lighting, and while the overall number of images can be large, the variation in locations is often limited (Schneider et al. 2020). Combined, these challenges mean that reaching high performance on species classification requires collecting a large amount of annotated data to train the deep models. This again takes a significant amount of time for each project, time that could be better spent addressing the ecological or conservation questions of interest.

Self-supervised learning is a paradigm in machine learning that attempts to forgo the need for manual supervision by instead learning informative representations directly from images, e.g. transforming an image in two different ways without impacting the semantics of the included object, and learning by imposing similarity between the two transformations. This is a tantalizing proposition for camera trap data, as it has the potential to drastically reduce the amount of time required to annotate data. The current performance of these methods on standard computer vision benchmarks is encouraging, as it suggests that self-supervised models have begun to reach the accuracy of their fully supervised counterparts for tasks like classifying everyday objects in images (Chen et al. 2020). However, existing self-supervised methods can struggle when applied to tasks that contain highly similar, i.e. fine-grained, object categories such as different species of plants and animals (Van Horn et al. 2021). To this end, we explore the effectiveness of self-supervised learning when applied to camera trap imagery.
We show that these methods can be used to train image classifiers with a significant reduction in manual supervision. Furthermore, we extend this analysis by showing that, with some careful design considerations, off-the-shelf self-supervised methods can be made to learn even more effective image representations for automated species classification. We show that exploiting cues at training time related to where and when a given image was captured can result in further improvements in classification performance. We demonstrate, across several different camera trapping datasets, that it is possible to achieve similar, and sometimes even superior, accuracy to fully supervised transfer learning-based methods using ten times less manual supervision. Finally, we discuss some of the limitations of the outlined approaches and their implications for automated species classification from images.
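As a rough illustration of the idea of exploiting where and when an image was captured, the hypothetical Python sketch below selects a contrastive positive partner from the same camera location within a short time window. The field names and the two-hour heuristic are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import datetime
import random

def select_context_positive(anchor, images, max_hours=2.0):
    """Pick a positive partner for `anchor` from the same camera location,
    captured within `max_hours`. A hypothetical heuristic for building
    context-aware positive pairs, not the paper's exact rule."""
    candidates = [
        img for img in images
        if img["location"] == anchor["location"]
        and img["id"] != anchor["id"]
        and abs((img["datetime"] - anchor["datetime"]).total_seconds()) <= max_hours * 3600
    ]
    # Fall back to standard augmentation-only positives when no neighbor exists.
    return random.choice(candidates) if candidates else None

# Toy usage with assumed metadata fields (id, location, datetime).
imgs = [
    {"id": 0, "location": "cam_A", "datetime": datetime.datetime(2020, 5, 1, 6, 0)},
    {"id": 1, "location": "cam_A", "datetime": datetime.datetime(2020, 5, 1, 6, 45)},
    {"id": 2, "location": "cam_B", "datetime": datetime.datetime(2020, 5, 1, 6, 50)},
]
print(select_context_positive(imgs[0], imgs))  # -> image 1 (same camera, 45 min apart)
```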

Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 2
Author(s):  
Ashish Jaiswal ◽  
Ashwin Ramesh Babu ◽  
Mohammad Zaki Zadeh ◽  
Debapriya Banerjee ◽  
Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by the different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
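As a concrete illustration of this pull-together/push-apart objective, the following PyTorch sketch implements the widely used NT-Xent loss from SimCLR-style contrastive methods; a minimal version, not any specific paper's exact code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross entropy) loss.
    z1, z2: (N, D) embeddings of two augmented views of the same N samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)          # (2N, D) stacked views
    sim = z @ z.t() / temperature           # pairwise cosine similarities
    n = z1.size(0)
    # Mask out self-similarity so a sample cannot be its own positive.
    sim.fill_diagonal_(float('-inf'))
    # The positive for sample i is its other view, at index (i + n) mod 2N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Two augmented views of a batch of 8 samples, 128-d embeddings.
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
```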


2018 ◽  
Vol 7 (2.7) ◽  
pp. 614 ◽  
Author(s):  
M Manoj krishna ◽  
M Neelima ◽  
M Harshali ◽  
M Venu Gopala Rao

Image classification is a classical problem in image processing, computer vision, and machine learning. In this paper we study image classification using deep learning. We use the AlexNet convolutional neural network architecture for this purpose. Four test images are selected from the ImageNet database for classification. We cropped the images to various portions of their area and conducted experiments. The results show the effectiveness of deep learning-based image classification using AlexNet.
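For readers wanting to reproduce this kind of experiment, a minimal sketch using torchvision's pretrained AlexNet is shown below. It assumes a recent torchvision (0.13+ weights API) and a hypothetical local file test_image.jpg, and is not the authors' exact setup.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load an ImageNet-pretrained AlexNet and the standard preprocessing pipeline.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),          # cropping, as the paper varies crop areas
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("test_image.jpg")).unsqueeze(0)  # hypothetical file
with torch.no_grad():
    probs = model(img).softmax(dim=1)
print(probs.topk(5))  # top-5 ImageNet class indices and confidences
```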


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2684 ◽  
Author(s):  
Obed Tettey Nartey ◽  
Guowu Yang ◽  
Sarpong Kwadwo Asare ◽  
Jinzhao Wu ◽  
Lady Nadia Frempong

Traffic sign recognition is a classification problem that poses challenges for computer vision and machine learning algorithms. Although both computer vision and machine learning techniques have constantly been improved to solve this problem, the sudden rise in the number of unlabeled traffic signs has made it even more challenging. Large-scale data collation and labeling are tedious and expensive tasks that demand much time, expert knowledge, and fiscal resources to satisfy the hunger of deep neural networks. Aside from that, unbalanced data poses a further challenge to computer vision and machine learning algorithms achieving better performance. These problems raise the need for algorithms that can fully exploit a large amount of unlabeled data, use a small amount of labeled samples, and be robust to data imbalance when building an efficient and high-quality classifier. In this work, we propose a novel semi-supervised classification technique that is robust to small and unbalanced data. The framework integrates weakly-supervised learning and self-training with self-paced learning to generate attention maps to augment the training set, and utilizes a novel pseudo-label generation and selection algorithm to generate and select pseudo-labeled samples. The method improves performance by: (1) normalizing the class-wise confidence levels to prevent the model from ignoring hard-to-learn samples, thereby solving the imbalanced data problem; (2) jointly learning a model and optimizing pseudo-labels generated on unlabeled data; and (3) enlarging the training set to satisfy the hunger of deep learning models. Extensive evaluations on two public traffic sign recognition datasets demonstrate the effectiveness of the proposed technique and provide a potential solution for practical applications.
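The class-wise confidence normalization in point (1) can be sketched in a few lines of NumPy. This illustrates the general idea, not the paper's exact pseudo-label generation and selection algorithm.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Select pseudo-labeled samples after normalizing confidence per class.
    `probs`: (N, C) softmax outputs on unlabeled data. A sketch of the
    class-wise normalization idea, not the paper's exact algorithm."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = np.zeros(len(probs), dtype=bool)
    for c in np.unique(preds):
        mask = preds == c
        # Rescale by the class's top confidence so easy classes do not
        # crowd out hard-to-learn ones at a single global threshold.
        norm_conf = conf[mask] / conf[mask].max()
        keep[mask] = norm_conf >= threshold
    return np.flatnonzero(keep), preds[keep]

probs = np.random.dirichlet(np.ones(4), size=100)   # fake softmax outputs
idx, labels = select_pseudo_labels(probs)
```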


2020 ◽  
Vol 23 (6) ◽  
pp. 1172-1191
Author(s):  
Artem Aleksandrovich Elizarov ◽  
Evgenii Viktorovich Razinkov

Recently, reinforcement learning has been developing actively as a direction of machine learning. As a consequence, attempts are being made to use reinforcement learning for solving computer vision problems, in particular the problem of image classification, which is currently among the most pressing tasks of artificial intelligence. The article proposes a method for image classification in the form of a deep neural network trained using reinforcement learning. The idea of the developed method comes down to solving the problem of a contextual multi-armed bandit using various strategies for achieving a compromise between exploration and exploitation, together with reinforcement learning algorithms. Strategies such as ε-greedy, ε-softmax, and ε-decay-softmax, the UCB1 method, and reinforcement learning algorithms such as DQN, REINFORCE, and A2C are considered. An analysis of the influence of various parameters on the efficiency of the method is carried out, and options for further development of the method are proposed.
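A minimal sketch of the ε-greedy strategy in this contextual-bandit view of classification is shown below; it is illustrative only, with the per-class value estimates assumed to come from the deep network.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """Contextual-bandit action selection: with probability epsilon pick a
    random class (exploration), otherwise the highest-value class (exploitation)."""
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(q_values)))
    return int(np.argmax(q_values))

# One bandit step: the network scores each class for an image (the context),
# epsilon-greedy picks a label, and a 1/0 reward (correct/incorrect) trains it.
q = np.array([0.1, 0.7, 0.2])         # toy per-class value estimates
action = epsilon_greedy(q, epsilon=0.1)
reward = 1.0 if action == 1 else 0.0  # assume the true label is class 1
```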


Author(s):  
Luís Fabrício de F. Souza ◽  
Gabriel Bandeira Holanda ◽  
Francisco Hércules dos S. Silva ◽  
Shara S.A. Alves ◽  
Pedro Pedrosa Rebouças Filho

2014 ◽  
Vol 602-605 ◽  
pp. 1933-1937
Author(s):  
Lian Fa Wu

In recent years, Internet traffic classification using machine learning has been a hot topic, and supervised learning methods, including the Support Vector Machine (SVM), have been used to identify Internet traffic in many papers. Supervised learning methods need many labeled instances to train a classification model, but labeling the instances is difficult because much traffic is encrypted. Semi-supervised learning methods can use both labeled and unlabeled instances to train the classification model, which makes them well suited to P2P traffic identification. The transductive support vector machine (TSVM) is one of the typical semi-supervised learning methods. Based on theoretical analysis and experiments, we compared the accuracy of TSVM and SVM. The experimental results show that the semi-supervised methods have some advantages for the identification of P2P traffic.
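Scikit-learn does not ship a TSVM, but its SelfTrainingClassifier wrapped around an SVM illustrates the same semi-supervised recipe of mixing labeled and unlabeled flows. A minimal sketch with synthetic flow features follows.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy flow features (e.g. packet sizes, inter-arrival statistics); unlabeled
# flows get the label -1, per scikit-learn's convention.
X = np.random.rand(200, 5)
y = np.full(200, -1)
y[:40] = np.random.randint(0, 2, 40)   # only 40 flows carry labels

# Self-training SVM: a scikit-learn stand-in for the paper's TSVM, which
# likewise exploits unlabeled flows (TSVM itself is not in scikit-learn).
clf = SelfTrainingClassifier(SVC(probability=True), threshold=0.8)
clf.fit(X, y)
print(clf.predict(X[:5]))
```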


2021 ◽  
Vol 13 (5) ◽  
pp. 858
Author(s):  
Joshua C.O. Koh ◽  
German Spangenberg ◽  
Surya Kant

Automated machine learning (AutoML) has been heralded as the next wave in artificial intelligence with its promise to deliver high-performance end-to-end machine learning pipelines with minimal effort from the user. However, despite AutoML showing great promise for computer vision tasks, to the best of our knowledge, no study has used AutoML for image-based plant phenotyping. To address this gap in knowledge, we examined the application of AutoML for image-based plant phenotyping using wheat lodging assessment with unmanned aerial vehicle (UAV) imagery as an example. The performance of an open-source AutoML framework, AutoKeras, in image classification and regression tasks was compared to transfer learning using modern convolutional neural network (CNN) architectures. For image classification, which classified plot images as lodged or non-lodged, transfer learning with Xception and DenseNet-201 achieved the best classification accuracy of 93.2%, whereas AutoKeras had a 92.4% accuracy. For image regression, which predicted lodging scores from plot images, transfer learning with DenseNet-201 had the best performance (R2 = 0.8303, root mean-squared error (RMSE) = 9.55, mean absolute error (MAE) = 7.03, mean absolute percentage error (MAPE) = 12.54%), followed closely by AutoKeras (R2 = 0.8273, RMSE = 10.65, MAE = 8.24, MAPE = 13.87%). In both tasks, AutoKeras models had up to 40-fold faster inference times compared to the pretrained CNNs. AutoML has significant potential to enhance plant phenotyping capabilities applicable in crop breeding and precision agriculture.
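For orientation, a minimal AutoKeras run for the classification task looks roughly like the sketch below; dummy arrays stand in for the plot images, and the parameters are illustrative rather than the study's settings.

```python
import numpy as np
import autokeras as ak

# Dummy stand-ins for UAV plot images and lodged/non-lodged labels.
x_train = np.random.rand(32, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, 32)

# AutoKeras searches over candidate architectures automatically.
clf = ak.ImageClassifier(max_trials=2, overwrite=True)
clf.fit(x_train, y_train, epochs=1)
print(clf.evaluate(x_train, y_train))
# The lodging-score task would use ak.ImageRegressor analogously.
```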


Author(s):  
Sara Beery ◽  
Dan Morris ◽  
Siyu Yang ◽  
Marcel Simon ◽  
Arash Norouzzadeh ◽  
...  

Camera traps are heat- or motion-activated cameras placed in the wild to monitor and investigate animal populations and behavior. They are used to locate threatened species, identify important habitats, monitor sites of interest, and analyze wildlife activity patterns. At present, the time required to manually review images severely limits productivity. Additionally, ~70% of camera trap images are empty, due to a high rate of false triggers. Previous work has shown good results on automated species classification in camera trap data (Norouzzadeh et al. 2018), but further analysis has shown that these results do not generalize to new cameras or new geographic regions (Beery et al. 2018). Additionally, these models will fail to recognize any species they were not trained on. In theory, it is possible to re-train an existing model in order to add missing species, but in practice, this is quite difficult and requires just as much machine learning expertise as training models from scratch. Consequently, very few organizations have successfully deployed machine learning tools for accelerating camera trap image annotation.

We propose a different approach to applying machine learning to camera trap projects, combining a generalizable detector with project-specific classifiers. We have trained an animal detector that is able to find and localize (but not identify) animals, even species not seen during training, in diverse ecosystems worldwide. See Fig. 1 for examples of the detector run over camera trap data covering a diverse set of regions and species, unseen at training time. By first finding and localizing animals, we are able to drastically reduce the time spent filtering empty images, and dramatically simplify the process of training species classifiers, because we can crop images to individual animals (and thus classifiers need only worry about animal pixels, not background pixels).

With this detector model as a powerful new tool, we have established a modular pipeline for on-boarding new organizations and building project-specific image processing systems. We break our pipeline into four stages:

1. Data ingestion: First we transfer images to the cloud, either by uploading to a drop point or by mailing an external hard drive. Data comes in a variety of formats; we convert each data set to the COCO Camera Traps format, i.e. we create a JavaScript Object Notation (JSON) file that encodes the annotations and the image locations within the organization's file structure (see the sketch after this list).

2. Animal detection: We next run our (generic) animal detector on all the images to locate animals. We have developed an infrastructure for efficiently running this detector on millions of images, dividing the load over multiple nodes. We find that a single detector works for a broad range of regions and species. If the detection results (as validated by the organization) are not sufficiently accurate, it is possible to collect annotations for a small set of their images and fine-tune the detector. Typically these annotations would be fed back into a new version of the general detector, improving results for subsequent projects.

3. Species classification: Using species labels provided by the organization, we train a (project-specific) classifier on the cropped-out animals.

4. Applying the system to new data: We use the general detector and the project-specific classifier to power tools facilitating accelerated verification and image review, e.g. visualizing the detections, selecting images for review based on model confidence, etc.

The aim of this presentation is to present a new approach to structuring camera trap projects, and to formalize discussion around the steps that are required to successfully apply machine learning to camera trap images. The work we present is available at http://github.com/microsoft/cameratraps, and we welcome new collaborating organizations.
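For concreteness, here is a minimal sketch of writing a COCO Camera Traps-style JSON file for stage 1. The field layout follows the public format in the linked repository, though the exact keys here should be treated as illustrative.

```python
import json

# A minimal database record in the COCO Camera Traps style described above.
db = {
    "info": {"version": "1.0", "description": "Example camera trap dataset"},
    "images": [{
        "id": "img_0001",
        "file_name": "site_A/cam3/IMG_0001.JPG",   # path within the org's file structure
        "location": "site_A_cam3",                 # useful for location-aware splits
        "datetime": "2019-06-01 06:32:00",
    }],
    "categories": [{"id": 1, "name": "puma"}],
    "annotations": [{
        "id": "ann_0001",
        "image_id": "img_0001",
        "category_id": 1,
    }],
}

with open("dataset.json", "w") as f:
    json.dump(db, f, indent=2)
```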


Sebatik ◽  
2020 ◽  
Vol 24 (2) ◽  
pp. 300-306
Author(s):  
Muhamad Jaelani Akbar ◽  
Mochamad Wisuda Sardjono ◽  
Margi Cahyanti ◽  
Ericks Rachmat Swedia

Vegetables are plant-based foodstuffs that typically have a high water content and are consumed fresh or after minimal processing. The diversity of vegetables found around the world leads to a corresponding diversity in how vegetables are classified. A digital approach is therefore needed to recognize vegetable types quickly and easily. In this study, seven types of vegetables were used: broccoli, corn, long beans, bitter melon, purple eggplant, tomato, and cabbage. The dataset consists of 941 vegetable images covering the 7 types, plus 131 images of vegetable types not present in the dataset, as well as 291 non-vegetable images. To classify the vegetable types, the Convolutional Neural Network (CNN) algorithm is used, part of a new and rapidly developing branch of Machine Learning. CNN is one of the algorithms of Deep Learning and performs well in Computer Vision tasks, including image classification. Testing was carried out on five Android-based mobile devices. Python was used as the programming language in designing this mobile application, with the TensorFlow module used for training and testing the data. The final test accuracy obtained for recognizing vegetable types was 98.1%, with one test result, the classification of corn, reaching an accuracy of 99.98049%.
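A minimal TensorFlow/Keras CNN of the kind described might look like the sketch below; the layer sizes and image resolution are illustrative assumptions, not the paper's exact architecture.

```python
import tensorflow as tf

# A small CNN for 7 vegetable classes plus an "other" class (8 outputs).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(8, activation="softmax"),   # 7 vegetables + other
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)  # training as described
```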

