Association Loss and Self-Discovery Cross-Camera Anchors Detection for Unsupervised Video-Based Person Re-Identification

Author(s):  
Xiuhuan Yuan ◽  
Hua Han ◽  
Li Huang

With the continuous development of camera networks, surveillance video has become a mainstream data source, which greatly promotes the development of cross-camera person re-identification (Re-ID). However, supervised learning requires substantial effort to manually label cross-camera pairwise training data, so it lacks scalability and practicality in real video surveillance, where well-labeled pairs of positive and negative samples are rarely available under each camera. To address these problems, we set judgment conditions using an association ranking method to self-discover positive and negative tracklet pairs for anchors without any pairwise ID labels, thereby defining a triplet loss. To optimize the association loss for learning effective discriminative features, adaptive weights are added to the triplet loss according to how easy or hard the samples are, yielding an Adaptive Weighted Conditional Triplet Loss. In addition, to increase the accuracy of self-discovering cross-camera anchors, that is, of successfully mining mutually best-matched tracklets across cameras and merging them, we use the top-ranked result from the intra-camera ranking list as a self-matched query sample, which double-checks the degree of match between top-ranked results. Finally, we establish a new Association Loss and Self-Discovery Learning (ALSL) model that is trained end to end. We train the model on three standard datasets, PRID2011, iLIDS-VID and MARS, and the experimental results show that the rank-1 accuracy of ALSL is better than that of several strong unsupervised video-based person Re-ID methods.
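The idea of weighting a triplet loss by sample difficulty can be illustrated with a minimal sketch. The abstract does not give the exact form of the Adaptive Weighted Conditional Triplet Loss, so the sigmoid weighting, the `margin` and `temperature` parameters, and the function names below are assumptions chosen for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_triplet_loss(anchor, positive, negative, margin=0.3, temperature=1.0):
    """Triplet loss scaled by a hypothetical difficulty-dependent weight."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    violation = d_ap - d_an + margin
    # Assumed weighting: a sigmoid of the margin violation, so harder
    # triplets (larger violation) receive a weight closer to 1.
    weight = 1.0 / (1.0 + math.exp(-violation / temperature))
    return weight * max(0.0, violation)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_negative = [2.0, 0.0]   # far from the anchor: margin already satisfied
hard_negative = [0.2, 0.0]   # close to the anchor: margin violated

easy_loss = weighted_triplet_loss(anchor, positive, easy_negative)
hard_loss = weighted_triplet_loss(anchor, positive, hard_negative)
```

In this sketch the easy triplet contributes no loss, while the hard triplet contributes a positive, weighted loss.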

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1757
Author(s):  
María J. Gómez-Silva ◽  
Arturo de la Escalera ◽  
José M. Armingol

Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between observations of people with a deep neural model. Nevertheless, the differences in their specifications and, consequently, in the characteristics and constraints of the training data available for each task, give rise to the need for different learning approaches for each of them. This article offers a comparative view of the Double-Margin-Contrastive and the Triplet loss functions, and analyzes the benefits and drawbacks of applying each of them to learn an Appearance Affinity model for Tracking and Re-Identification. A batch of experiments has been conducted, and the results support the hypothesis drawn from the presented study: the Triplet loss function is more effective than the Contrastive one when a Re-Id model is learnt and, conversely, in the MOT domain, the Contrastive loss can better discriminate between pairs of images that do or do not render the same person.
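The two loss families compared here can be contrasted in a short sketch. The forms below are commonly used textbook variants operating on precomputed distances. The article's exact formulation may differ, and the margin values and function names are assumptions for illustration.

```python
def double_margin_contrastive(distance, same_person, pos_margin=0.5, neg_margin=1.5):
    """Pairwise loss: positives are pulled inside pos_margin,
    negatives are pushed beyond neg_margin (assumed squared-hinge form)."""
    if same_person:
        return max(0.0, distance - pos_margin) ** 2
    return max(0.0, neg_margin - distance) ** 2


def triplet(d_anchor_pos, d_anchor_neg, margin=0.5):
    """Relative loss: the negative must be farther from the anchor
    than the positive by at least `margin`."""
    return max(0.0, d_anchor_pos - d_anchor_neg + margin)


# A positive pair inside its margin and a negative pair outside its margin cost nothing.
close_positive = double_margin_contrastive(0.3, same_person=True)
far_negative = double_margin_contrastive(2.0, same_person=False)
# Pairs at distance 1.0 violate both margins by 0.5.
far_positive = double_margin_contrastive(1.0, same_person=True)
close_negative = double_margin_contrastive(1.0, same_person=False)

satisfied_triplet = triplet(0.4, 1.2)
violated_triplet = triplet(1.0, 1.2)
```

The contrastive loss constrains absolute distances per pair, whereas the triplet loss only constrains the relative ordering within each triplet.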


Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Yongjian Wu ◽  
Feiyue Huang ◽  
...  

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at image level rather than precisely at object level, and are therefore disturbed by background clutter. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that addresses these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (1,000 times training speedup compared to the triplet loss) and discriminative feature learning by a "centralized" global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features "within" the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning can reinforce each other. We evaluate the performance of the proposed scheme on widely used benchmarks including CUB200-2011 and CARS196. We have reported significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196, and 3.7% on CUB200-2011.


2020 ◽  
Vol 34 (07) ◽  
pp. 11029-11036
Author(s):  
Jiabo Huang ◽  
Qi Dong ◽  
Shaogang Gong ◽  
Xiatian Zhu

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning that needs massive labelled training data, which dramatically limits their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the performance superiority of the proposed method over the state-of-the-art unsupervised learning models on six common image recognition benchmarks: MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.


Author(s):  
Victa Sari Dwi Kurniati ◽  
Kankamon Suthum

This study is an action research project that investigates the implementation of autonomous learning using a self-discovery technique with self-prepared worksheets in an extensive reading class. The study was conducted in the context of a classroom teaching and learning situation. Self-discovery learning is believed to be effective in helping students improve their reading skills because the technique requires students to be autonomous learners. The study used the action research model by Kemmis and Taggart. However, the researcher did not use cyclic treatment; instead, the treatment was implemented as one large cycle over one semester, because it needed to run for the whole semester to obtain a reliable result. Autonomous learning using the self-discovery technique with self-prepared worksheets improved the students' achievement in extensive reading, as indicated by the post-test scores compared with the pre-test scores. Nevertheless, both positive and negative effects were found during the acting and observing phases. On the positive side, the students became more active in class and could explore the texts freely, which led to better communication among them. On the negative side, the class became noisy because the students had to discuss in their groups, and the lecturer carried a heavier burden in finding, selecting and providing the texts for the class activities. Keywords: autonomous learning, self-prepared worksheet, extensive reading.


Author(s):  
Robert Planas ◽  
Nicholas Oune ◽  
Ramin Bostanabad

Emulation plays an indispensable role in engineering design. However, the majority of emulation methods are formulated for interpolation, and their performance deteriorates significantly in extrapolation. In this paper, we develop a method for extrapolation by integrating Gaussian processes (GPs) and evolutionary programming (EP). Our underlying assumption is that there is a set of free-form parametric bases that can model the data source reasonably well. Consequently, if we can find these bases from training data over a region, we can make predictions outside of that region. To find these bases systematically and efficiently, we start by learning a GP without any parametric mean function. Then, a rich dataset is generated by this GP and subsequently used in EP to find some parametric bases. Afterwards, we retrain the GP using the bases found by EP. This retraining essentially allows us to validate and/or correct the discovered bases via maximum likelihood estimation. By iterating between GP and EP, we robustly and efficiently find the underlying bases that can be used for extrapolation. We validate our approach on a host of analytical problems in the absence or presence of noise. We also study an engineering example on finding the constitutive law of a composite microstructure.


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6612
Author(s):  
Guoyong Zhang ◽  
Zhaohui Tang ◽  
Jin Zhang ◽  
Weihua Gui

Visual perception-based methods are a promising means of capturing the surface damage state of wire ropes and hence provide a potential way to monitor their condition. Previous methods mainly concentrated on handcrafted feature-based flaw representation, with a classifier constructed to perform fault recognition. However, the appearance of outdoor wire ropes is seriously affected by noise such as lubricating oil, dust, and light. In addition, in real applications, it is difficult to prepare a sufficient amount of flaw data to train a fault classifier. To address these issues, this study proposes a new flaw detection method based on the convolutional denoising autoencoder (CDAE) and Isolation Forest (iForest). The CDAE is first trained using an image reconstruction loss. Then, it is fine-tuned to minimize a cost function that penalizes the iForest-based flaw score difference between normal data and flaw data. Real hauling rope images of mine cableways were used to test the effectiveness and advantages of the newly developed method. Comparisons of various methods showed that the CDAE-iForest method performed better in discriminative feature learning and flaw isolation with a small amount of flaw training data.


2021 ◽  
Vol 22 (1) ◽  
pp. 103-117
Author(s):  
Amir Hossein Danesh ◽  
Hossein Shirgahi

Although research on social networks is progressing rapidly, both the positive and negative effects of this area should be evaluated. One problem is that social networks are very broad and anyone can exert influence on them, which can raise the issue of people with differing beliefs. Therefore, determining how much to trust various resources on social networks, especially resources with no previous history on the web, is one of the main challenges in this field. In this paper, we present a method for predicting trust in a social network from structural similarities using a neural network. In this method, the web-of-trust data set is first converted into a structural similarity data set based on the similarity of the trustors and trustees. Then, part of the created data set is used as training data for a multilayer perceptron neural network, and the trained network is evaluated on the test data. With the proposed method, the MSE is less than 0.01, an improvement of more than 0.02 over previous methods. Based on the obtained results, the proposed method provides acceptable accuracy.


2016 ◽  
Vol 26 (1) ◽  
pp. 203-213
Author(s):  
Bartłomiej Stasiak ◽  
Jędrzej Mońko ◽  
Adam Niewiadomski

The problem of note onset detection in musical signals is considered. The proposed solution is based on known approaches in which an onset detection function is defined on the basis of spectral characteristics of audio data. In our approach, several onset detection functions are used simultaneously to form an input vector for a multi-layer non-linear perceptron, which learns to detect onsets in the training data. This is in contrast to standard methods based on thresholding the onset detection functions with a moving average or a moving median. Our approach is also different from most of the current machine-learning-based solutions in that we explicitly use the onset detection functions as an intermediate representation, which may therefore be easily replaced with a different one, e.g., to match the characteristics of a particular audio data source. The results obtained for a database containing annotated onsets for 17 different instruments and ensembles are compared with state-of-the-art solutions.
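For reference, the standard baseline that this work contrasts itself with, thresholding an onset detection function with a moving median, can be sketched as follows. This illustrates the baseline only, not the perceptron-based method. The window size, the `delta` offset, the local-maximum rule, and the sample values are assumptions.

```python
from statistics import median


def pick_onsets(odf, half_window=2, delta=0.1):
    """Return indices where the onset detection function (ODF) is a local
    maximum that exceeds a moving-median threshold plus a fixed offset."""
    onsets = []
    for i, value in enumerate(odf):
        lo = max(0, i - half_window)
        hi = min(len(odf), i + half_window + 1)
        threshold = median(odf[lo:hi]) + delta
        left = odf[i - 1] if i > 0 else float("-inf")
        right = odf[i + 1] if i + 1 < len(odf) else float("-inf")
        if value > threshold and value >= left and value > right:
            onsets.append(i)
    return onsets


# Hypothetical ODF values with two clear peaks.
odf = [0.0, 0.1, 0.9, 0.2, 0.1, 0.1, 0.8, 0.3, 0.1, 0.0]
onsets = pick_onsets(odf)  # frames 2 and 6 are reported as onsets
```

Because the threshold adapts to the local median, a single global threshold does not have to fit both quiet and loud passages.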


2019 ◽  
Vol 9 (15) ◽  
pp. 3133
Author(s):  
Sanghyun Seo ◽  
Juntae Kim

Traditional supervised learning depends on the labels of the training data, so class labels that are not included in the training data cannot be recognized properly. Zero-shot learning, which can recognize unseen classes that were not used in training, is therefore gaining research interest. One approach to zero-shot learning is to embed visual data such as images, together with rich semantic data related to the text labels of the visual data, into a common vector space and perform zero-shot cross-modal retrieval on newly input unseen-class data. This paper proposes a hierarchical semantic loss and a confidence estimator to perform zero-shot learning on visual data more efficiently. The hierarchical semantic loss improves learning efficiency by using hierarchical knowledge to select the negative sample of the triplet loss, and the confidence estimator estimates a confidence score to determine whether an input belongs to a seen class or an unseen class. These methods improve the performance of zero-shot learning by adjusting the distances from a semantic vector to a visual vector when performing zero-shot cross-modal retrieval. Experimental results show that the proposed method can improve the performance of zero-shot learning in terms of hit@k accuracy.
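The hit@k metric mentioned above counts a query as correct when its true label appears among the top k retrieved labels. A minimal sketch follows. The function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def hit_at_k(ranked_labels, true_labels, k):
    """Fraction of queries whose true label is within the top-k ranked labels."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_labels, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical retrieval results: one ranked label list per query.
ranked = [
    ["cat", "dog", "fox"],
    ["car", "bus", "bike"],
    ["oak", "elm", "ash"],
]
truth = ["dog", "bike", "pine"]

hit1 = hit_at_k(ranked, truth, 1)
hit2 = hit_at_k(ranked, truth, 2)
hit3 = hit_at_k(ranked, truth, 3)
```

Increasing k can only keep hit@k the same or raise it, which is why results are usually reported for several values of k.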

