Loss Architecture Search for Few-Shot Object Recognition

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Jun Yue ◽  
Zelang Miao ◽  
Yueguang He ◽  
Nianchun Du

Few-shot object recognition, which exploits a set of well-labeled data to build a classifier for new classes that have only several samples per class, has received extensive attention from the machine learning community. In this paper, we investigate the problem of designing an optimal loss function for few-shot object recognition and propose a novel few-shot object recognition system that includes the following three steps: (1) generate a loss function architecture using a recurrent neural network (generator); (2) train a base embedding network with the generated loss function on a training set; (3) fine-tune the base embedding network using the few-shot instances from a validation set to obtain the accuracy and use it as a reward signal to update the generator. This procedure is repeated and implemented in the reinforcement learning framework for finding the best loss architecture such that the embedding network yields the highest validation accuracy. Our key insight is to create a search space of the loss function architectures and evaluate the quality of a particular loss function on the dataset of interest. We conduct experiments on three popular datasets for few-shot learning. The results show that the proposed approach achieves better performance than state-of-the-art methods.
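
The three-step loop described above can be sketched as a toy REINFORCE search. This is a minimal illustration under loud assumptions, not the authors' system: the recurrent generator is replaced by a single softmax policy over four hypothetical loss names, and steps (2) and (3) are replaced by a stub, `stub_validation_accuracy`, whose accuracy values are made up.

```python
import math
import random

# Hypothetical search space: each "loss architecture" is only a label here.
CANDIDATE_LOSSES = ["cross_entropy", "margin", "contrastive", "triplet"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stub_validation_accuracy(loss_name):
    """Stand-in for steps (2) and (3): train, fine-tune, measure accuracy.

    The numbers below are invented for illustration only.
    """
    fake_accuracy = {
        "cross_entropy": 0.50,
        "margin": 0.55,
        "contrastive": 0.60,
        "triplet": 0.70,
    }
    return fake_accuracy[loss_name]


def search(num_iters=200, lr=0.5, seed=0, evaluate=stub_validation_accuracy):
    """Update a softmax policy with REINFORCE, using accuracy as the reward."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATE_LOSSES)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(num_iters):
        probs = softmax(logits)
        # Step (1): the generator samples a loss architecture.
        idx = rng.choices(range(len(probs)), weights=probs)[0]
        # Steps (2) and (3): obtain the reward (stubbed here).
        reward = evaluate(CANDIDATE_LOSSES[idx])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # d log pi(idx) / d logits[k] = 1[k == idx] - probs[k]
        for k in range(len(logits)):
            grad = (1.0 if k == idx else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for name, p in zip(CANDIDATE_LOSSES, search()):
        print(f"{name}: {p:.3f}")
```

In a real system the `evaluate` callable would be the expensive part, since each call trains and fine-tunes an embedding network.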

Author(s):  
Amelia Zafra

The multiple-instance problem is a difficult machine learning problem that appears in cases where knowledge about training examples is incomplete. In this problem, the teacher labels examples that are sets (also called bags) of instances. The teacher does not label whether an individual instance in a bag is positive or negative. The learning algorithm needs to generate a classifier that will correctly classify unseen examples (i.e., bags of instances). This learning framework is receiving growing attention in the machine learning community, and since it was introduced by Dietterich, Lathrop, and Lozano-Perez (1997), a wide range of tasks have been formulated as multi-instance problems. Among these tasks, we can cite content-based image retrieval (Chen, Bi, & Wang, 2006) and annotation (Qi & Han, 2007), text categorization (Andrews, Tsochantaridis, & Hofmann, 2002), web index page recommendation (Zhou, Jiang, & Li, 2005; Xue, Han, Jiang, & Zhou, 2007), and drug activity prediction (Dietterich et al., 1997; Zhou & Zhang, 2007). In this chapter we introduce MOG3P-MI, a multiobjective grammar-guided genetic programming algorithm for multi-instance problems. In this algorithm, based on SPEA2, individuals represent classification rules that determine whether a bag is positive or negative. The quality of each individual is evaluated according to two quality indexes: sensitivity and specificity. Both measures have been adapted to the multi-instance learning (MIL) setting. Computational experiments show that MOG3P-MI is a robust classification algorithm across different domains: it achieves competitive results and produces classifiers made of simple rules, which adds comprehensibility and simplicity to the knowledge discovery process and makes it a suitable method for solving MIL problems (Zafra & Ventura, 2007).
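
To make the two quality indexes concrete, the sketch below computes sensitivity and specificity at the bag level. It assumes the standard multi-instance hypothesis (a bag is positive if at least one of its instances satisfies the rule), which the abstract does not state explicitly, and the function names and the threshold rule are hypothetical. It shows only the evaluation step, not the genetic programming search.

```python
def classify_bag(bag, instance_rule):
    """Label a bag positive if any instance satisfies the rule.

    Assumption: the standard multi-instance hypothesis.
    """
    return any(instance_rule(x) for x in bag)


def sensitivity_specificity(bags, labels, instance_rule):
    """Return (sensitivity, specificity) measured over bags, not instances."""
    tp = fn = tn = fp = 0
    for bag, is_positive in zip(bags, labels):
        predicted = classify_bag(bag, instance_rule)
        if is_positive and predicted:
            tp += 1
        elif is_positive and not predicted:
            fn += 1
        elif not is_positive and predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: each bag is a list of one-dimensional instances.
    bags = [[1, 5], [2, 3], [7], [0, 1]]
    labels = [True, True, False, False]
    print(sensitivity_specificity(bags, labels, lambda x: x > 4))
```

A multiobjective search would treat the two returned values as separate objectives rather than combining them into one score.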


1997 ◽  
Vol 08 (02) ◽  
pp. 173-179 ◽  
Author(s):  
Soheil Shams

The task of visual object recognition is often complicated by the fact that a single 3-D object can undergo a number of transformations that substantially alter its projection onto a 2-D surface, such as the retina. Such transformations include translation of the object in the visual field, changes in the size of the object, its orientation in the 2-D plane, and the viewing perspective. For a general pattern recognition system to detect and recognize an object after such transformations, it must be able to associate widely differing patterns with the same object label. In this paper, a novel self-organizing model, called the Multiple Elastic Modules (MEM), is presented, which attempts to solve this problem by searching a multi-dimensional space in which each axis is defined by one of the transformations (e.g., scale, translation, rotation). A particular object of a specific size, orientation, and spatial location is mapped onto a single point in this space. Distortions and minor variations in an object's image expand this point to a small localized area in this multi-dimensional space. Such a powerful representation scheme comes at the cost of high computational demand due to the combinatorially large search space. The MEM approach addresses this cost by efficiently partitioning the solution space so that the most promising areas are searched for the correct match. Simulation results are presented on detecting a stick-figure object under translation, distortion, scale, and rotation transformations in a cluttered background.
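
The transformation space can be illustrated with a naive exhaustive baseline. This is explicitly not the MEM model: it is a brute-force grid search that treats each pose as one point `(tx, ty, scale, theta)` and assumes that point correspondences between template and image are known. All names are hypothetical. The grid size is the product of the axis lengths, which is the combinatorial cost that a partitioning scheme aims to avoid.

```python
import math
from itertools import product


def transform(points, tx, ty, scale, theta):
    """Apply rotation by theta, uniform scaling, then translation to 2-D points."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
        for x, y in points
    ]


def mismatch(a, b):
    """Sum of squared distances between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def exhaustive_search(template, observed, txs, tys, scales, thetas):
    """Return the grid pose (tx, ty, scale, theta) with the smallest mismatch."""
    return min(
        product(txs, tys, scales, thetas),
        key=lambda pose: mismatch(transform(template, *pose), observed),
    )


if __name__ == "__main__":
    template = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    observed = transform(template, 2, -1, 2.0, math.pi / 2)
    grid = dict(
        txs=[-2, -1, 0, 1, 2],
        tys=[-2, -1, 0, 1, 2],
        scales=[0.5, 1.0, 2.0],
        thetas=[0.0, math.pi / 2, math.pi, 3 * math.pi / 2],
    )
    print(exhaustive_search(template, observed, **grid))
```

Even this small grid evaluates 5 x 5 x 3 x 4 = 300 poses, and each additional axis or finer resolution multiplies that count.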


Author(s):  
Phat Nguyen Huu ◽  
Loc Hoang Bao ◽  
Hoang Lai The

Much research has been carried out over the last two decades on object recognition, shape matching, and pattern recognition in the field of computer vision. Face recognition is one of the important issues in object recognition and computer vision. Many face image datasets, related competitions, and evaluation programs have encouraged innovation, producing more powerful facial recognition technology with promising results. In recent years, we have witnessed tremendous improvements in face recognition performance from complex deep neural network architectures trained on millions of face images. Face recognition is one of the most important biometrics and still faces many challenges, such as pose variation and illumination variation. Achieving the desired performance in real deployments depends on many factors, one of the main ones being the quality of the input image. Facial recognition systems installed outdoors are regularly affected by adverse weather such as haze and fog. Haze dramatically degrades the visibility of outdoor images captured in inclement weather and affects many high-level computer vision tasks, such as detection and recognition. In this paper, we propose a preprocessing method that removes haze from input images, enhancing their quality to improve the effectiveness and recognition rate of face identification based on a Convolutional Neural Network (CNN), using available datasets and our self-built data. Applying the proposed method to an outdoor face recognition system improved the system accuracy from 90.53% to 98.14%. The results show that the proposed method improves image quality compared with other traditional methods.


2015 ◽  
Vol 65 ◽  
pp. 691-700
Author(s):  
Esraa Elhariri ◽  
Nashwa El-Bendary ◽  
Aboul Ella Hassanien ◽  
Vaclav Snasel

2021 ◽  
Vol 11 (11) ◽  
pp. 4758
Author(s):  
Ana Malta ◽  
Mateus Mendes ◽  
Torres Farinha

Maintenance professionals and other technical staff regularly need to learn to identify new parts in car engines and other equipment. The present work proposes a model of a task assistant based on a deep learning neural network. A YOLOv5 network is used for recognizing some of the constituent parts of an automobile. A dataset of car engine images was created and eight car parts were marked in the images. Then, the neural network was trained to detect each part. The results show that YOLOv5s is able to successfully detect the parts in real time video streams, with high accuracy, thus being useful as an aid to train professionals learning to deal with new equipment using augmented reality. The architecture of an object recognition system using augmented reality glasses is also designed.


2021 ◽  
Vol 11 (6) ◽  
pp. 2838
Author(s):  
Nikitha Johnsirani Venkatesan ◽  
Dong Ryeol Shin ◽  
Choon Sung Nam

In the pharmaceutical field, early detection of lung nodules is indispensable for increasing patient survival. We can enhance the quality of the medical images by intensifying the radiation dose. However, a high radiation dose can provoke cancer, which forces experts to use limited radiation. Using abrupt radiation generates noise in CT scans. We propose an optimal Convolutional Neural Network model in which Gaussian noise is removed for better classification and increased training accuracy. Experimental demonstration on the LUNA16 dataset of size 160 GB shows that our proposed method exhibits superior results. Classification accuracy, specificity, sensitivity, precision, recall, F1 measurement, and area under the ROC curve (AUC) are taken as evaluation metrics of model performance. We conducted a performance comparison of our proposed model on several platforms, namely Apache Spark, GPU, and CPU, to reduce the training time without compromising accuracy. Our results show that Apache Spark, integrated with a deep learning framework, is suitable for parallel training computation with high accuracy.


Author(s):  
Ke Wang ◽  
Qingwen Xue ◽  
Jian John Lu

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalanced nature of driving data, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle's longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of predicted probability.
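
As a rough illustration of what a cost-sensitive cross-entropy loss does under class imbalance, the sketch below weights the minority (risky) class more heavily in a binary cross-entropy. The exact form of the paper's loss is not given in the abstract, so the function name, the `pos_weight` and `neg_weight` parameters, and the inverse-frequency weight in the usage example are all assumptions. In the described framework such weights would be tuned by Bayesian optimization rather than fixed by hand.

```python
import math


def weighted_bce(y_true, p_pred, pos_weight=1.0, neg_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with a separate weight per class.

    y_true: iterable of 0/1 labels (1 = risky driver, the minority class).
    p_pred: iterable of predicted probabilities for class 1.
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1.0 - p)
        count += 1
    return total / count


if __name__ == "__main__":
    # Assumed heuristic: weight positives by the inverse class ratio,
    # using the 4.29% risky-driver share reported in the abstract.
    pos_weight = (1.0 - 0.0429) / 0.0429
    y = [1, 0, 0, 0]
    p = [0.3, 0.1, 0.2, 0.05]
    print(weighted_bce(y, p))                         # unweighted
    print(weighted_bce(y, p, pos_weight=pos_weight))  # misses on risky drivers cost more
```

With `pos_weight` greater than 1, a low predicted probability for a truly risky driver contributes more to the loss than an equally wrong prediction for a safe driver.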

