Unsupervised Learning of Low Dimensional Satellite Image Representations via Variational Autoencoders

Author(s):  
Silvia Valero ◽  
Ferran Agullo ◽  
Jordi Inglada


Entropy ◽
2020 ◽  
Vol 22 (11) ◽  
pp. 1290
Author(s):  
Hongjuan Gao ◽  
Guohua Geng ◽  
Sheng Zeng

Computer-aided classification serves as the basis of virtual cultural relic management and display. The majority of existing cultural relic classification methods require labelled samples; in practical applications, however, category labels are often missing or the samples are unevenly distributed across categories. To solve this problem, we propose a 3D cultural relic classification method based on a low-dimensional descriptor and unsupervised learning. First, the scale-invariant heat kernel signature (Si-HKS) was computed. The heat kernel signature describes the heat flow between any two vertices of a 3D shape, where heat diffusion is governed by the heat equation. Secondly, the Bag-of-Words (BoW) mechanism was utilized to transform the Si-HKS descriptor into a low-dimensional feature tensor, named the SiHKS-BoW descriptor. Finally, we applied an unsupervised learning algorithm, called MKDSIF-FCM, to conduct the classification task. A dataset consisting of 3D models from 41 Tang tri-color Hu terracotta figurines was utilized to validate the effectiveness of the proposed method. A series of experiments demonstrated that the SiHKS-BoW descriptor along with the MKDSIF-FCM algorithm achieved the best classification accuracy, up to 99.41%, providing a practical solution for real cases with missing category labels and unevenly distributed categories. The present work promotes the application of virtual reality in digital projects and enriches the content of digital archaeology.
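As a rough illustration of the pipeline above, the sketch below builds a BoW histogram from per-vertex descriptors and clusters the resulting features with a plain fuzzy c-means. It assumes the Si-HKS descriptors are already computed as one (num_vertices, num_scales) array per shape; the standard fuzzy c-means here is only a stand-in for the paper's MKDSIF-FCM variant, whose exact formulation is not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_histogram(descriptors, codebook):
    """Quantize per-vertex descriptors against a codebook and return the
    normalized word-frequency histogram (the SiHKS-BoW-style feature)."""
    # Nearest codeword for every vertex descriptor.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))   # soft memberships, rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.T.sum(axis=1, keepdims=True)  # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))              # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)        # renormalize over clusters
    return centers, U

# Hypothetical usage: `shapes` is a list of per-shape Si-HKS arrays.
# codebook = KMeans(n_clusters=64).fit(np.vstack(shapes)).cluster_centers_
# X = np.array([bow_histogram(s, codebook) for s in shapes])
# _, U = fuzzy_cmeans(X, c=num_categories)
# labels = U.argmax(axis=1)                      # hard assignment per shape
```

The codebook size (64) and fuzziness exponent (m=2) are illustrative defaults, not values from the paper.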


Author(s):  
Nistor Grozavu ◽  
Nicoleta Rogovschi ◽  
Guenael Cabanes ◽  
Andres Troya-Galvis ◽  
Pierre Gancarski

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Jack Goffinet ◽  
Samuel Brudner ◽  
Richard Mooney ◽  
John Pearson

Increases in the scale and complexity of behavioral data pose an increasing challenge for data analysis. A common strategy involves replacing entire behaviors with small numbers of handpicked, domain-specific features, but this approach suffers from several crucial limitations. For example, handpicked features may miss important dimensions of variability, and correlations among them complicate statistical testing. Here, by contrast, we apply the variational autoencoder (VAE), an unsupervised learning method, to learn features directly from data and quantify the vocal behavior of two model species: the laboratory mouse and the zebra finch. The VAE converges on a parsimonious representation that outperforms handpicked features on a variety of common analysis tasks, enables the measurement of moment-by-moment vocal variability on the timescale of tens of milliseconds in the zebra finch, provides strong evidence that mouse ultrasonic vocalizations do not cluster as is commonly believed, and captures the similarity of tutor and pupil birdsong with qualitatively higher fidelity than previous approaches. In all, we demonstrate the utility of modern unsupervised learning approaches to the quantification of complex and high-dimensional vocal behavior.
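For readers unfamiliar with the method, the following is a minimal VAE sketch in PyTorch. The authors' model is convolutional and operates on syllable spectrograms; this fully connected version over flattened spectrogram patches is a simplified stand-in, and the input size, latent size, and loss weighting are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=128 * 128, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)      # posterior mean
        self.logvar = nn.Linear(512, latent_dim)  # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# After training, the learned low-dimensional features are the posterior
# means: features = model(flattened_spectrograms)[1]
```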


Author(s):  
Qi Li ◽  
Jieping Ye ◽  
Chandra Kambhamettu

Visual media data such as images are the raw data representation for many important applications, such as image retrieval (Mikolajczyk & Schmid, 2001), video classification (Lin & Hauptmann, 2002), facial expression recognition (Wang & Ahuja, 2003), and face recognition (Zhao, Chellappa, Phillips & Rosenfeld, 2003). Reducing the dimensionality of raw visual media data is highly desirable, since high dimensionality may severely degrade the effectiveness and efficiency of retrieval algorithms. To obtain a low-dimensional representation of visual media data, we can start by selecting good low-level features, such as colors, textures, and interest pixels (Swain & Ballard, 1991; Gevers & Smeulders, 1998; Schmid, Mohr & Bauckhage, 2000). Pixels of an image may hold different interest strengths according to a specific filtering or convolution technique. Pixels of high interest strength are expected to be more repeatable and stable than pixels of low interest strength across various imaging conditions, such as rotation, lighting, and scaling. Interest pixel mining aims to detect the set of pixels with the best repeatability across imaging conditions. (An algorithm for interest pixel mining is called a detector.) Interest pixel mining can be formulated as two steps: i) interest strength assignment via a specific filtering technique; and ii) candidate selection. The second step, candidate selection, plays an important role in preventing the output interest pixels from being jammed into a small number of image regions, which is necessary to achieve the best repeatability.

Based on interest pixels, various image representations can be derived. A straightforward scheme is to represent an image as a collection of the local appearances (the intensities of neighboring pixels) of its interest pixels (Schmid & Mohr, 1997). By ignoring the spatial relationships of interest pixels, this "unstructured" representation requires no image alignment, i.e., it does not require establishing pixel-to-pixel correspondence between imaged objects via transformations such as rotation, translation, and scaling. Furthermore, the unstructured representation is very robust with respect to outlier regions in retrieval applications. However, the retrieval cost under the unstructured representation is extremely high. In the context of face recognition, feature distributions have been introduced to capture both global and local information of faces (Li, Ye & Kambhamettu, 2006a). A limitation of feature distributions is the assumption of image alignment. A promising trend in interest-pixel-based representations is to build a graph or tree representation for each image and measure the similarity of two images by the edit distance between their graphs or trees (Zhang & Shasha, 1989); as we will see in a later section, this trend is strongly supported by a recently proposed interest pixel mining method (Li, Ye & Kambhamettu, 2008).
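The two-step formulation lends itself to a compact sketch. Below, the Harris corner response (one common choice of interest strength, not necessarily the filtering used in the cited work) handles step one, and a greedy selection with a suppression radius handles step two, keeping candidates from clustering in a few regions; the radius and candidate count are illustrative parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_strength(img, sigma=1.5, k=0.05):
    """Step i: assign an interest strength per pixel (Harris response)."""
    ix, iy = sobel(img, axis=1), sobel(img, axis=0)
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2          # high values = corner-like pixels

def select_candidates(strength, n=100, radius=8):
    """Step ii: pick up to n strong pixels, skipping any within `radius`
    of an already chosen pixel, which spreads them across the image."""
    flat = np.argsort(strength, axis=None)[::-1]          # strongest first
    order = np.dstack(np.unravel_index(flat, strength.shape))[0]
    chosen = []
    for y, x in order:
        if all((y - cy) ** 2 + (x - cx) ** 2 >= radius ** 2
               for cy, cx in chosen):
            chosen.append((int(y), int(x)))
            if len(chosen) == n:
                break
    return chosen

# Usage: pixels = select_candidates(harris_strength(gray_image.astype(float)))
```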


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Ruairidh M. Battleday ◽  
Joshua C. Peterson ◽  
Thomas L. Griffiths

Human categorization is one of the most important and successful targets of cognitive modeling, with decades of model development and assessment using simple, low-dimensional artificial stimuli. However, it remains unclear how these findings relate to categorization in more natural settings, involving complex, high-dimensional stimuli. Here, we take a step towards addressing this question by modeling human categorization over a large behavioral dataset, comprising more than 500,000 judgments over 10,000 natural images from ten object categories. We apply a range of machine learning methods to generate candidate representations for these images, and show that combining rich image representations with flexible cognitive models captures human decisions best. We also find that in the high-dimensional representational spaces these methods generate, simple prototype models can perform comparably to the more complex memory-based exemplar models dominant in laboratory settings.
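To make the model contrast concrete, here is a minimal sketch of the two model classes over generic feature vectors (e.g., image representations extracted by a neural network). The exponential similarity kernel follows the standard GCM-style formulation; the sensitivity parameter `beta` is an illustrative free parameter, and neither function reproduces the paper's exact fitted models.

```python
import numpy as np

def prototype_scores(x, X_train, y_train, beta=1.0):
    """Score each category by similarity of x to the category mean."""
    cats = np.unique(y_train)
    protos = np.array([X_train[y_train == c].mean(axis=0) for c in cats])
    d = np.linalg.norm(protos - x, axis=1)      # distance to each prototype
    return cats, np.exp(-beta * d)              # higher = more likely category

def exemplar_scores(x, X_train, y_train, beta=1.0):
    """Score each category by summed similarity of x to its stored exemplars."""
    cats = np.unique(y_train)
    d = np.linalg.norm(X_train - x, axis=1)     # distance to every exemplar
    sims = np.exp(-beta * d)
    return cats, np.array([sims[y_train == c].sum() for c in cats])

# Categorization probabilities follow by normalizing either score vector,
# e.g. probs = scores / scores.sum()
```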

