Feature Re-Learning for Video Recommendation

Author(s):  
Chanjal C

Predicting the relevance between two given videos with respect to their visual content is a key component of content-based video recommendation and retrieval. Applications include video recommendation, video annotation, category or near-duplicate video retrieval, video copy detection, and so on. Previous works estimate video relevance from the textual content of videos, which leads to poor performance. The proposed method is feature re-learning for video relevance prediction, which focuses on visual content to predict the relevance between two videos. A given feature is projected into a new space by an affine transformation. Unlike previous works that use a standard triplet ranking loss, the projection process is optimized by a novel negative-enhanced triplet ranking loss. To generate more training data, a data augmentation strategy that works directly on video features is proposed. This multi-level augmentation strategy benefits the feature re-learning and can be applied flexibly to frame-level or video-level features. The loss function also considers the absolute similarity of positive pairs to supervise the feature re-learning process, and a new formula is introduced for video relevance computation.
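
The abstract names the ingredients of the loss without giving it in closed form. The PyTorch sketch below is one plausible reading, not the paper's implementation: an affine projection trained with a triplet ranking term plus absolute terms on the negative and positive pairs. The names FeatureProjector, alpha, and beta are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureProjector(nn.Module):
    """Affine transformation that re-learns an off-the-shelf video feature."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=True)  # affine map: Wx + b

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)  # unit-norm, cosine-friendly

def re_learning_loss(anchor, pos, neg, margin=0.2, alpha=0.5, beta=0.5):
    # Cosine similarities; inputs are assumed unit-normalised.
    sim_pos = (anchor * pos).sum(-1)
    sim_neg = (anchor * neg).sum(-1)
    ranking = F.relu(margin - sim_pos + sim_neg)  # relative (triplet) constraint
    neg_term = F.relu(sim_neg)                    # push negatives toward zero
    pos_term = 1.0 - sim_pos                      # pull positives toward one
    return (ranking + alpha * neg_term + beta * pos_term).mean()
```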

2019 · Vol 8 (9) · pp. 390
Author(s):  
Kun Zheng ◽  
Mengfei Wei ◽  
Guangmin Sun ◽  
Bilal Anas ◽  
Yu Li

Vehicle detection based on very high-resolution (VHR) remote sensing images is beneficial in many fields such as military surveillance, traffic control, and social/economic studies. However, the intricate details of vehicles and their surrounding background in VHR images require sophisticated analysis based on massive data samples, while the amount of reliably labeled training data is limited. In practice, data augmentation is often leveraged to solve this conflict. The traditional data augmentation strategy uses a combination of rotation, scaling, flipping, and similar transformations, and has limited capability to capture the essence of the feature distribution or to provide data diversity. In this study, we propose a learning method named Vehicle Synthesis Generative Adversarial Networks (VS-GANs) to generate annotated vehicles from remote sensing images. The proposed framework has one generator and two discriminators, which try to synthesize realistic vehicles and learn the background context simultaneously. The method can quickly generate high-quality annotated vehicle data samples and greatly helps in the training of vehicle detectors. Experimental results show that the proposed framework can synthesize vehicles and their background images with variations and different levels of detail. Compared with traditional data augmentation methods, the proposed method significantly improves the generalization capability of vehicle detectors. Finally, the contribution of VS-GANs to vehicle detection in VHR remote sensing images was validated in experiments on the UCAS-AOD and NWPU VHR-10 datasets using up-to-date target detection frameworks.
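
To make the one-generator/two-discriminator wiring concrete, here is a minimal sketch of a generator objective under that design, assuming standard non-saturating GAN losses; the paper's exact objectives are not given in the abstract, and the discriminator roles below are an illustrative reading.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_vehicle_fake_logits, d_context_fake_logits):
    # The generator tries to fool both discriminators at once: one scores
    # vehicle realism, the other scores consistency with the background.
    real = torch.ones_like(d_vehicle_fake_logits)
    loss_vehicle = F.binary_cross_entropy_with_logits(d_vehicle_fake_logits, real)
    loss_context = F.binary_cross_entropy_with_logits(d_context_fake_logits, real)
    return loss_vehicle + loss_context
```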


2019 · Vol 63 (7) · pp. 1017-1030
Author(s):  
Zhenjun Tang ◽  
Lv Chen ◽  
Heng Yao ◽  
Xianquan Zhang ◽  
Chunqiang Yu

Video hashing is a novel technique in multimedia processing and finds applications in video retrieval, video copy detection, anti-piracy search, and video authentication. In this paper, we propose a robust video hashing scheme based on the discrete cosine transform (DCT) and non-negative matrix factorization (NMF). The proposed video hashing extracts secure features from a normalized video via random partition and dominant DCT coefficients, and exploits NMF to learn a compact representation from the secure features. Experiments with 2050 videos are carried out to validate the efficiency of the proposed video hashing. The results show that the proposed video hashing is robust to many digital operations and achieves good discrimination. Receiver operating characteristic (ROC) curve comparisons illustrate that the proposed video hashing outperforms some state-of-the-art algorithms in the trade-off between robustness and discrimination.
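
As a rough illustration of the described pipeline (random partition, dominant DCT coefficients, NMF compaction, binarisation), the following sketch uses NumPy, SciPy, and scikit-learn; all parameter values and the binarisation rule are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import NMF

def video_hash(frames, n_groups=16, n_coeffs=8, seed=7):
    # frames: (T, H, W) grayscale array from a normalised video.
    rng = np.random.default_rng(seed)                 # key-controlled randomness
    order = rng.permutation(len(frames))              # secret random partition
    groups = np.array_split(frames[order], n_groups)
    feats = []
    for g in groups:
        coeffs = dctn(g.mean(axis=0), norm='ortho')   # 2-D DCT of the group mean
        mags = np.sort(np.abs(coeffs).ravel())[::-1]  # dominant coefficients first
        feats.append(mags[:n_coeffs])
    V = np.array(feats)                               # non-negative by construction
    W = NMF(n_components=2, init='random', random_state=seed).fit_transform(V)
    code = W.ravel()
    return (code > code.mean()).astype(np.uint8)      # binary hash of length 32
```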


Symmetry · 2021 · Vol 13 (8) · pp. 1497
Author(s):  
Harold Achicanoy ◽  
Deisy Chaves ◽  
Maria Trujillo

Deep learning applications in computer vision require large volumes of representative data to obtain state-of-the-art results, due to the massive number of parameters to optimise in deep models. However, in industrial applications data are limited and asymmetrically distributed, owing to rare cases, legal restrictions, and high image-acquisition costs. Data augmentation based on deep generative adversarial networks, such as StyleGAN, has arisen as a way to create training data with symmetric distributions that may improve the generalisation capability of the resulting models. StyleGAN generates highly realistic images in a variety of domains and can serve as a data augmentation strategy, but it requires a large amount of data to build image generators. Thus, transfer learning in conjunction with generative models is used to build models from small datasets. However, the impact of the domain of the pre-trained generative model used in transfer learning has not been reported. In this paper, we evaluate a StyleGAN generative model with transfer learning on different application domains (training on paintings, portraits, Pokémon, bedrooms, and cats) to generate target images with different levels of content variability: bean seeds (low variability), faces of subjects between 5 and 19 years old (medium variability), and charcoal (high variability). We used the first version of StyleGAN because of the large number of publicly available pre-trained models. The Fréchet Inception Distance (FID) was used to evaluate the quality of the synthetic images. We found that StyleGAN with transfer learning produced good-quality images, making it an alternative for generating realistic synthetic images in the evaluated domains.
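
The Fréchet Inception Distance itself has a standard closed form, shown below in NumPy over pre-extracted Inception activations; the feature-extraction step is omitted here.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feat_real, feat_fake):
    # Inputs: (n_samples, dim) arrays of Inception activations for real
    # and generated images.
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):      # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))
```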


Author(s):  
Chengwei Chen ◽  
Yuan Xie ◽  
Shaohui Lin ◽  
Ruizhi Qiao ◽  
Jian Zhou ◽  
...  

Novelty detection is the process of determining whether a query example differs from the learned training distribution. Previous methods based on generative adversarial networks and self-supervised approaches suffer from unstable training, mode dropping, and low discriminative ability. We overcome these problems by introducing a novel decoder-encoder framework. First, a generative network (decoder) learns the representation by mapping an initialized latent vector to an image. In particular, this vector is initialized by considering the entire distribution of the training data, to avoid the problem of mode dropping. Second, a contrastive network (encoder) aims to "learn to compare" through mutual information estimation, which directly helps the generative network obtain a more discriminative representation by using a negative data augmentation strategy. Extensive experiments show that our model has significant superiority over cutting-edge novelty detectors and achieves new state-of-the-art results on various novelty detection benchmarks, e.g. CIFAR10 and DCASE. Moreover, our model is more stable to train in a non-adversarial manner, compared to other adversarial novelty detection methods.
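
The abstract does not specify the mutual information estimator; an InfoNCE-style contrastive loss is one common choice and is sketched below, with negatively augmented samples serving as explicit negatives. The shapes and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_query, z_positive, z_neg_aug, tau=0.1):
    # z_query, z_positive: (B, D) embeddings of an image and its reconstruction;
    # z_neg_aug: (N, D) embeddings of negatively augmented (distorted) images.
    z_query = F.normalize(z_query, dim=-1)
    z_positive = F.normalize(z_positive, dim=-1)
    z_neg_aug = F.normalize(z_neg_aug, dim=-1)
    pos = (z_query * z_positive).sum(-1, keepdim=True) / tau   # (B, 1)
    neg = z_query @ z_neg_aug.T / tau                          # (B, N)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(z_query.size(0), dtype=torch.long,
                         device=z_query.device)                # positive at index 0
    return F.cross_entropy(logits, labels)
```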


Author(s):  
Y. Xie ◽  
K. Schindler ◽  
J. Tian ◽  
X. X. Zhu

Deep learning models achieve excellent semantic segmentation results for airborne laser scanning (ALS) point clouds, if sufficient training data are provided. Increasing amounts of annotated data are becoming publicly available thanks to contributors from all over the world. However, models trained on a specific dataset typically exhibit poor performance on other datasets. That is, there are significant domain shifts, as data captured in different environments or by distinct sensors have different distributions. In this work, we study this domain shift and potential strategies to mitigate it, using two popular ALS datasets: the ISPRS Vaihingen benchmark from Germany and the LASDU benchmark from China. We compare different training strategies for cross-city ALS point cloud semantic segmentation. In our experiments, we analyse three factors that may lead to domain shift and affect learning: point cloud density, LiDAR intensity, and the role of data augmentation. Moreover, we evaluate a well-known standard method of domain adaptation, Deep CORAL (Sun and Saenko, 2016). In our experiments, adapting the point cloud density and appropriate data augmentation both help to reduce the domain gap and improve segmentation accuracy. In contrast, intensity features can bring an improvement within a dataset, but deteriorate generalisation across datasets. Deep CORAL does not further improve accuracy over the simple adaptation of density and data augmentation, although it can mitigate the impact of improperly chosen point density, intensity features, and further dataset biases such as lack of diversity.
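
Deep CORAL minimises the distance between the second-order statistics of source and target features; a compact PyTorch version of that published loss reads as follows.

```python
import torch

def coral_loss(source_feats, target_feats):
    # Align the covariances of source and target feature batches of shape (n, d).
    d = source_feats.size(1)

    def cov(x):
        x = x - x.mean(dim=0, keepdim=True)
        return x.T @ x / (x.size(0) - 1)

    diff = cov(source_feats) - cov(target_feats)
    return (diff ** 2).sum() / (4.0 * d * d)
```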


2020 ◽  
Vol 2020 (14) ◽  
pp. 248-1-248-7
Author(s):  
Lan Fu ◽  
Hongkai Yu ◽  
Megna Shah ◽  
Jeff Simmons ◽  
Song Wang

Accurately and rapidly detecting the locations of the cores of large-scale dendrites from 2D sectioned microscopic images helps quantify the microstructure of material components. This provides a critical link between the processing and the properties of the material, and such a tool could be a critical part of a quality control procedure for manufacturing these components. In this paper, we propose to use Faster R-CNN, a convolutional neural network (CNN) model that balances detection accuracy and computational efficiency, to detect dendrite cores with complex shapes. However, training CNN models usually requires a large number of images annotated with ground-truth locations of dendrite cores, which are usually obtained by highly labor-intensive manual annotation. In this paper, we leverage the crystallographic symmetry of dendrite cores for data augmentation: the cross sections of dendrite cores show near (though not perfect) four-fold rotational symmetry, so we can rotate the image around the center of a dendrite core by specified angles to construct new training data without additional manual annotations. We conduct a series of experiments, and the results show the effectiveness of the Faster R-CNN method with the proposed data augmentation strategy. In particular, we find that we can reduce the number of manually annotated training images by 75% while maintaining the same detection accuracy for dendrite cores.
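
A minimal sketch of the described augmentation, assuming OpenCV and reflective border handling (both illustrative choices): rotate each image about an annotated core so the existing label carries over.

```python
import cv2

def rotate_about_core(image, core_xy, angle_deg):
    # Rotate the image about an annotated core location; near four-fold
    # symmetry means the rotated core keeps its label and position.
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D(core_xy, angle_deg, 1.0)
    return cv2.warpAffine(image, M, (w, h), borderMode=cv2.BORDER_REFLECT)

# Three extra samples per annotation, no new manual labels:
# augmented = [rotate_about_core(img, (cx, cy), a) for a in (90, 180, 270)]
```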


Information · 2020 · Vol 11 (5) · pp. 255
Author(s):  
Yu Li ◽  
Xiao Li ◽  
Yating Yang ◽  
Rui Dong

One important issue that affects the performance of neural machine translation is the scale of available parallel data. For low-resource languages, the amount of parallel data is insufficient, which results in poor translation quality. In this paper, we propose a diversity data augmentation method that does not use extra monolingual data. We expand the training data by generating diverse pseudo-parallel data on both the source and target sides. To generate diverse data, a restricted sampling strategy is employed at the decoding steps. Finally, we filter and merge the original data and the synthetic parallel corpus to train the final model. In experiments, the proposed approach achieved an improvement of 1.96 BLEU points on the IWSLT2014 German–English translation task, which was used to simulate a low-resource language pair. Our approach also consistently and substantially obtained improvements of 1.0 to 2.0 BLEU on three other low-resource translation tasks: English–Turkish, Nepali–English, and Sinhala–English.
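
The abstract does not define "restricted sampling" precisely; top-k sampling at each decoding step is one plausible instantiation, sketched below with illustrative k and temperature values.

```python
import torch
import torch.nn.functional as F

def restricted_sample(logits, k=10, temperature=1.0):
    # logits: (batch, vocab) scores for the next token at one decoding step.
    topk_logits, topk_ids = logits.topk(k, dim=-1)          # keep k best candidates
    probs = F.softmax(topk_logits / temperature, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)        # sample within top-k
    return topk_ids.gather(-1, choice).squeeze(-1)          # (batch,) token ids
```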


2019 · Vol 9 (6) · pp. 1128
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, and unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. With the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some deep learning-based target detection methods, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods relies on numerous labeled samples, and given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, with the following highlights. (1) Objects are detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) built on VGG16 is constructed and trained using unlabeled post-disaster images; as a transfer learning strategy, the weights of the SSD model are initialized from the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise, are used to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy improves overall accuracy by 10% compared with an SSD trained from scratch, and that the data augmentation strategies improve mAP and mF1 by 72% and 20%, respectively. Finally, the method was further verified on another dataset, from Hurricane Irma, confirming its feasibility.
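
A hedged sketch of the four listed augmentations using OpenCV; the probabilities, angle range, kernel size, and noise level are illustrative, not the paper's settings.

```python
import cv2
import numpy as np

def augment(image, rng=None):
    # Mirror, rotate, blur, and add noise with illustrative parameters.
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        image = cv2.flip(image, 1)                          # horizontal mirror
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), 1.0)
    image = cv2.warpAffine(image, M, (w, h))                # small random rotation
    if rng.random() < 0.5:
        image = cv2.GaussianBlur(image, (5, 5), 1.0)        # Gaussian blur
    noise = rng.normal(0.0, 5.0, image.shape)               # additive Gaussian noise
    return np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```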


2019 · Vol 2019 · pp. 1-9
Author(s):  
Michał Klimont ◽  
Mateusz Flieger ◽  
Jacek Rzeszutek ◽  
Joanna Stachera ◽  
Aleksandra Zakrzewska ◽  
...  

Hydrocephalus is a common neurological condition that can have traumatic ramifications and can be lethal without treatment. Currently, radiologists must spend a vast amount of time assessing the volume of cerebrospinal fluid (CSF) by manual segmentation of Computed Tomography (CT) images, and some of the segmentations are prone to radiologist bias and high intraobserver variability. To improve this, researchers are exploring methods to automate the process, which would enable faster and more unbiased results. In this study, we propose the application of a U-Net convolutional neural network to automatically segment CSF in brain CT scans. U-Net is a neural network that has proven successful in various interdisciplinary segmentation tasks. We optimised training using state-of-the-art methods, including the "1cycle" learning rate policy, transfer learning, a generalized dice loss function, mixed float precision, self-attention, and data augmentation. Even though the study was performed with a limited amount of data (80 CT images), our experiment shows near human-level performance: a mean dice score of 0.917 with a standard deviation of 0.0352 in cross-validation across the training data, and a mean dice score of 0.9506 on a separate test set. To our knowledge, these results are better than those of any known method for CSF segmentation in hydrocephalic patients, which is promising for practical applications.
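
The generalized dice loss has a common published form with inverse-square class weighting; a compact PyTorch version of that form is sketched below (the paper's exact variant is not stated in the abstract).

```python
import torch

def generalized_dice_loss(probs, target, eps=1e-6):
    # probs: (B, C, H, W) softmax outputs; target: (B, C, H, W) one-hot masks.
    dims = (0, 2, 3)                                 # sum over batch and space
    w = 1.0 / (target.sum(dims) ** 2 + eps)          # inverse-square class weights
    intersect = (w * (probs * target).sum(dims)).sum()
    union = (w * (probs + target).sum(dims)).sum()
    return 1.0 - 2.0 * intersect / (union + eps)
```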

