Robust Machine Learning Classification of Unlabeled Biological Data: A case study with herbaria sheets

Author(s):  
Jonathan Koss ◽  
Anthony Jiang ◽  
Patrick Sweeney ◽  
Nelson Rios ◽  
Aaron Dollar

There is much excitement across a broad range of biological disciplines over the prospect of using deep learning and similar modern statistical methods to label research data. The extensive time, effort, and cost required for humans to label a dataset drastically limits the type and amount of data that can be reasonably utilized, and is currently a major bottleneck to the extensive application of biological datasets such as specimen imagery, video, and audio recordings. While a number of researchers have shown how deep convolutional neural networks (CNNs) can be trained to classify image data with 80-90% accuracy, that range of accuracy is still too low for most research applications. Furthermore, applying these classifiers to new, unlabeled data from a dataset other than the one used for training would likely result in even lower accuracy. As a result, these classifiers have still not generally been applied to unlabeled data, which is where they could be most useful. In this talk, we will present a method for determining a confidence metric on the classifications (i.e., "labels") predicted by a deep CNN classifier, which can inform a user whether to trust a particular automatic label or to discard it, thereby giving a reasonable and straightforward way to label a previously unlabeled dataset with high confidence. Essentially, the approach allows an imperfect classification method to be used productively, saving an enormous amount of time and effort and/or greatly increasing the amount of data that can be reasonably utilized. In this work, the training dataset consisted of records of flowering plant species that collectively exhibited a range of reproductive morphologies, represented multiple taxonomic groups, and could be easily scored by humans for reproductive condition by examination of specimen images. The records were labeled as reproductive, budding, flowering, and/or fruiting. All of the data and images were obtained from the Consortium of Northeastern Herbaria (CNH) portal. Two unscored datasets were used to evaluate the classifiers: one contained the same taxa as the training dataset, and the second contained all remaining flowering plant taxa in the CNH portal database that were not included in the other two datasets. Records of families with obscure flowers (i.e., flowers that lack petals and sepals or have vestigial structures) were excluded. To label the reproductive state of the plants, we trained one deep CNN classifier, using the Xception architecture, for the binary classification of each state (e.g., budding vs. not budding). This method and architecture were chosen because of their success in similar image-classification tasks. Each of these networks takes an image of a herbarium sheet as input and outputs a value in the interval [0,1]. This output is typically thresholded to generate a binary label, but we found it could also serve as an approximate measure of confidence in the network's classification. By treating this value as a confidence metric, we can feed a large unlabeled dataset to the classifier, trust the labels assigned with high confidence, and leave the remainder unlabeled. After training, the four classifiers (reproductive, budding, flowering, fruiting) achieved 85-90% accuracy compared to expert-labeled data.
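As a concrete illustration, a minimal sketch of one such binary classifier is given below, built on the Xception backbone available in tf.keras.applications; the input size, classification head, and training settings are illustrative assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# One binary classifier per reproductive state (e.g., budding vs. not budding).
# Xception backbone as named in the abstract; head and input size are assumptions.
base = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3)
)
x = layers.GlobalAveragePooling2D()(base.output)
score = layers.Dense(1, activation="sigmoid")(x)  # output in [0, 1]

model = Model(inputs=base.input, outputs=score)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```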
However, as described above, the real value of these approaches lies in labeling previously unlabeled data, thus helping to replace expensive and time-consuming human labor. We then applied our confidence-based approach to a collection of 600k images and were able to label 35-70% of the samples at a chosen confidence threshold of 95%. In other words, we could use the high-confidence labels and simply leave the remaining, unclassifiable samples unlabeled. The data from these samples could then be labeled manually or, if appropriate, not labeled at all.
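A minimal sketch of this confidence-gated labeling step, assuming a trained binary classifier that emits sigmoid scores in [0, 1]; the 0.95 threshold mirrors the abstract, but the function name and the -1 "unlabeled" convention are illustrative.

```python
import numpy as np

def confidence_gated_labels(scores, threshold=0.95):
    """Accept a binary label only where the sigmoid output is far enough
    from 0.5 to be trusted; leave the rest unlabeled (-1) for manual review."""
    scores = np.asarray(scores, dtype=float)
    # Confidence in the predicted class: score for positives, 1 - score for negatives.
    confidence = np.where(scores >= 0.5, scores, 1.0 - scores)
    labels = (scores >= 0.5).astype(int)
    labels[confidence < threshold] = -1
    return labels

# Example: only the first and last samples clear the 95% bar.
print(confidence_gated_labels([0.98, 0.60, 0.40, 0.01]))  # [ 1 -1 -1  0]
```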

Author(s):  
Xiaojun Lu ◽  
Yue Yang ◽  
Weilin Zhang ◽  
Qi Wang ◽  
Yang Wang

Face verification for unrestricted faces in the wild is a challenging task. This paper proposes a face-verification method based on two deep convolutional neural networks (CNNs). In this work, one CNN is supervised by an identification signal, while the other is trained with a combination of semi-verification and identification signals. To estimate the semi-verification loss at low computational cost, a circle composed of all faces is used to select face pairs from the pairwise samples. For face normalization, we propose using different facial landmarks to address problems caused by pose. The final face representation is formed by concatenating the features of the two deep CNNs after PCA reduction; moreover, each feature is a combination of multi-scale representations obtained through auxiliary classifiers. For the final verification, we adopt the face representation of only one region and one resolution of a face, combined with a Joint Bayesian classifier. Experiments show that our method can extract effective face representations from a small training dataset, and our algorithm achieves 99.71% verification accuracy on the LFW dataset.
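A minimal sketch of the fusion step described above, reducing each network's embedding with PCA and concatenating the results; the embedding dimensions, component counts, and random stand-in features are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-ins for embeddings from the two CNNs (one supervised by identification,
# one by the combined semi-verification + identification signal).
emb_a = rng.normal(size=(1000, 512))
emb_b = rng.normal(size=(1000, 512))

# Reduce each embedding separately, then concatenate into the final representation.
pca_a = PCA(n_components=128).fit(emb_a)
pca_b = PCA(n_components=128).fit(emb_b)
fused = np.hstack([pca_a.transform(emb_a), pca_b.transform(emb_b)])
print(fused.shape)  # (1000, 256)
```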


2020 ◽  
Vol 2020 (10) ◽  
pp. 28-1-28-7 ◽  
Author(s):  
Kazuki Endo ◽  
Masayuki Tanaka ◽  
Masatoshi Okutomi

Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most research in image classification focuses only on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image-restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which a degraded image and an additional degradation parameter are used for classification. The proposed classification network has two inputs: the degraded image and the degradation parameter. An estimation network for the degradation parameters is also incorporated when the degradation parameters of degraded images are unknown. The experimental results show that the proposed method outperforms a straightforward approach in which the classification network is trained with degraded images only.
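A minimal Keras sketch of a two-input classifier of the kind described, with one branch for the degraded image and one for the scalar degradation parameter, merged before the classification head; the layer sizes and input shape are illustrative assumptions, not the paper's architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Branch 1: the degraded image.
image_in = layers.Input(shape=(32, 32, 3), name="degraded_image")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Branch 2: the known (or separately estimated) degradation parameter,
# e.g. a noise level or compression quality factor.
param_in = layers.Input(shape=(1,), name="degradation_parameter")
p = layers.Dense(16, activation="relu")(param_in)

# Merge the two inputs before the classification head.
merged = layers.Concatenate()([x, p])
out = layers.Dense(10, activation="softmax")(merged)

model = Model(inputs=[image_in, param_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```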


2020 ◽  
Vol 13 (5) ◽  
pp. 508-523 ◽  
Author(s):  
Guan‐Hua Huang ◽  
Chih‐Hsuan Lin ◽  
Yu‐Ren Cai ◽  
Tai‐Been Chen ◽  
Shih‐Yen Hsu ◽  
...  

2021 ◽  
Vol 79 ◽  
pp. 52-58
Author(s):  
Arnaldo Stanzione ◽  
Renato Cuocolo ◽  
Francesco Verde ◽  
Roberta Galatola ◽  
Valeria Romeo ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Song-Quan Ong ◽  
Hamdan Ahmad ◽  
Gomesh Nair ◽  
Pradeep Isawasan ◽  
Abdul Hafiz Ab Majid

Classification of Aedes aegypti (Linnaeus) and Aedes albopictus (Skuse) by humans remains challenging. We propose a highly accessible method to develop a deep learning (DL) model and implement the model for mosquito image classification by using hardware that could regulate the development process. In particular, we constructed a dataset of 4120 images of Aedes mosquitoes that were older than 12 days, by which age their common morphological features had disappeared, and we illustrate how to set up supervised deep convolutional neural networks (DCNNs) with hyperparameter adjustment. The model was first applied by deploying it externally, in real time, on three different generations of mosquitoes, and its accuracy was compared with human expert performance. Our results show that both the learning rate and the number of epochs significantly affected accuracy, and that the best-performing hyperparameters achieved an accuracy of more than 98% at classifying mosquitoes, showing no significant difference from human-level performance. We demonstrate the feasibility of constructing a DCNN model with this method and deploying it externally on mosquitoes in real time.
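A minimal sketch of the kind of learning-rate/epoch sweep the abstract singles out, written with Keras; the grid values, stand-in CNN, and commented-out training call are illustrative assumptions, not the authors' setup.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(learning_rate):
    # Small stand-in CNN; the paper's DCNN and input size will differ.
    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # Ae. aegypti vs. Ae. albopictus
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Sweep the two hyperparameters reported to matter most: learning rate and epochs.
for lr in (1e-3, 1e-4):
    for epochs in (10, 30):
        model = build_model(lr)
        # history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
        # Keep the (lr, epochs) pair with the best validation accuracy.
```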


2020 ◽  
Vol 65 (6) ◽  
pp. 759-773
Author(s):  
Segu Praveena ◽  
Sohan Pal Singh

Early detection and diagnosis of leukaemia is a trending topic in medical applications, aimed at reducing the death toll of patients with acute lymphoblastic leukaemia (ALL). Detecting ALL requires analysing white blood cells (WBCs), for which blood smear images are employed. This paper proposes a new technique for the segmentation and classification of acute lymphoblastic leukaemia. The proposed method of automatic leukaemia detection is based on a Deep Convolutional Neural Network (Deep CNN) trained using an optimization algorithm named the Grey wolf-based Jaya Optimization Algorithm (GreyJOA), which is developed from the Grey Wolf Optimizer (GWO) and the Jaya Optimization Algorithm (JOA) and improves global convergence. Initially, the input image is pre-processed and segmented using the Sparse Fuzzy C-Means (Sparse FCM) clustering algorithm. Then, features such as Local Directional Patterns (LDP) and colour histogram-based features are extracted from the segments of the pre-processed input image. Finally, the extracted features are passed to the Deep CNN for classification. Experimental evaluation of the method on images from the ALL-IDB2 database shows that the proposed method achieved a maximal accuracy, sensitivity, and specificity of 0.9350, 0.9528, and 0.9389, respectively.
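The GreyJOA optimizer and Sparse FCM segmentation are bespoke to this paper, but the colour histogram feature step is standard; a minimal sketch is given below using OpenCV, with the bin count and the synthetic stand-in image as illustrative assumptions.

```python
import numpy as np
import cv2  # OpenCV

def colour_histogram_features(image_bgr, bins=32):
    """Per-channel colour histogram, normalized and flattened into a single
    feature vector, as a stand-in for the colour features extracted from
    each segmented white blood cell region."""
    feats = []
    for channel in range(3):
        hist = cv2.calcHist([image_bgr], [channel], None, [bins], [0, 256])
        hist = hist.ravel() / (hist.sum() + 1e-8)  # normalize to a distribution
        feats.append(hist)
    return np.concatenate(feats)  # shape: (3 * bins,)

# Example on a synthetic image; in the pipeline this would be a
# Sparse-FCM-segmented cell region.
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(colour_histogram_features(img).shape)  # (96,)
```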

