Methods for Binary Multidimensional Scaling

2002 ◽  
Vol 14 (5) ◽  
pp. 1195-1232 ◽  
Author(s):  
Douglas L. T. Rohde

Multidimensional scaling (MDS) is the process of transforming a set of points in a high-dimensional space to a lower-dimensional one while preserving the relative distances between pairs of points. Although effective methods have been developed for solving a variety of MDS problems, they mainly depend on the vectors in the lower-dimensional space having real-valued components. For some applications, the training of neural networks in particular, it is preferable or necessary to obtain vectors in a discrete, binary space. Unfortunately, MDS into a low-dimensional discrete space appears to be a significantly harder problem than MDS into a continuous space. This article introduces and analyzes several methods for performing approximately optimized binary MDS.
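
To make the problem concrete, here is a minimal sketch of binary MDS as a discrete optimization. It is not one of the article's methods: the raw-stress objective, the greedy bit-flip search, and all function names (`hamming_distances`, `stress`, `greedy_bit_flip`) are assumptions chosen for illustration. Points are embedded as 0/1 vectors, and single bits are flipped whenever doing so lowers the mismatch between target distances and Hamming distances.

```python
import numpy as np

def hamming_distances(B):
    """Pairwise Hamming distances between the rows of a 0/1 matrix B of shape (n, k)."""
    return (B[:, None, :] != B[None, :, :]).sum(axis=2)

def stress(target, B):
    """Raw stress: sum over pairs of squared (target - Hamming) differences."""
    H = hamming_distances(B).astype(float)
    # The distance matrix is symmetric, so halve the sum to count each pair once.
    return float(((target - H) ** 2).sum() / 2)

def greedy_bit_flip(target, k, sweeps=10, seed=0):
    """Illustrative local search: keep a bit flip only if it lowers the stress."""
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    B = rng.integers(0, 2, size=(n, k))
    best = stress(target, B)
    for _ in range(sweeps):
        improved = False
        for i in range(n):
            for j in range(k):
                B[i, j] ^= 1              # tentatively flip one bit
                s = stress(target, B)
                if s < best:
                    best, improved = s, True
                else:
                    B[i, j] ^= 1          # revert the flip
        if not improved:
            break                         # local minimum reached
    return B, best

# Toy problem: target distances come from hidden 4-bit codes for 8 points.
rng = np.random.default_rng(42)
true_codes = rng.integers(0, 2, size=(8, 4))
target = hamming_distances(true_codes).astype(float)
codes, final_stress = greedy_bit_flip(target, k=4)
```

A search of this kind can stall in local minima, which is consistent with the abstract's remark that the discrete problem appears harder than its continuous counterpart.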

2015 ◽  
Vol 7 (3) ◽  
pp. 275-279 ◽  
Author(s):  
Agnė Dzidolikaitė

The paper analyzes a global optimization problem. To solve this problem, a multidimensional scaling algorithm is combined with a genetic algorithm. Using multidimensional scaling, we search for projections of multidimensional data in a lower-dimensional space and try to preserve the dissimilarities of the set being analyzed. Using genetic algorithms, we obtain not just one local solution but a whole population of optimal points. Different optimal points give different images. Looking at several images of the multidimensional data, an expert can notice some properties of the given data. In the paper, a genetic algorithm is applied to multidimensional scaling, glass data is visualized, and certain properties are noted.

(Translated from the Lithuanian summary.) A global optimization problem is analyzed. It is defined as the optimization of a nonlinear objective function of continuous variables over a feasible region. Various algorithms are used for optimization. Exact algorithms usually find an exact solution, but this can take a very long time. Often a good solution is wanted within an acceptable amount of time. In that case other, heuristic algorithms, also called heuristics, can be used. One such heuristic is the genetic algorithm, which mimics the evolution that takes place in nature. Genetic algorithms are built from evolutionary operators: inheritance, mutation, selection, and recombination. They can find sufficiently good solutions to problems for which no exact algorithms exist. Genetic algorithms are also applicable to visualizing data by multidimensional scaling. Multidimensional scaling searches for projections of multidimensional data in a lower-dimensional space while trying to preserve the similarities or dissimilarities of the analyzed set. Genetic algorithms yield not a single local solution but a whole population of optima. Different optima correspond to different images. Seeing several variants of the multidimensional data, an expert can discern more of its properties. In the article, a genetic algorithm is applied to multidimensional scaling. It is shown that the multidimensional scaling algorithm can be combined with a genetic algorithm and used to visualize multidimensional data.


Author(s):  
Wen-Ji Zhou ◽  
Yang Yu ◽  
Min-Ling Zhang

In multi-label classification tasks, labels are commonly related to each other. It is well recognized that utilizing label relationships is essential to multi-label learning. One way to utilize label relationships is to map labels to a lower-dimensional space of uncorrelated labels, where the relationships can be encoded in the mapping. Previous linear mapping methods commonly result in regression subproblems in the lower-dimensional label space. In this paper, we show that mapping to a low-dimensional multi-label regression problem can be worse than mapping to a classification problem, since regression requires a more complex model than classification. We then propose the binary linear compression (BILC) method, which results in a binary label space and thus leads to classification subproblems. Experiments on several multi-label datasets show that employing classification in the embedded space results in much simpler models than regression, leading to smaller structural risk. The proposed methods are also shown to be superior to some state-of-the-art approaches.


2014 ◽  
Vol 10 (S306) ◽  
pp. 68-71
Author(s):  
Giuseppe Vinci ◽  
Peter Freeman ◽  
Jeffrey Newman ◽  
Larry Wasserman ◽  
Christopher Genovese

Abstract: The incredible variety of galaxy shapes cannot be summarized by human-defined discrete classes of shapes without causing a possibly large loss of information. Dictionary learning and sparse coding allow us to reduce the high-dimensional space of shapes into a manageable low-dimensional continuous vector space. Statistical inference can be done in the reduced space via probability distribution estimation and manifold estimation.


2021 ◽  
Vol 11 (15) ◽  
pp. 6963
Author(s):  
Jan Y. K. Chan ◽  
Alex Po Leung ◽  
Yunbo Xie

Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. Motivated by upper error bounds, we approximate the kernel matrix and distances in a lower-dimensional space R^d before the kernel k-means clustering. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities of kernel k-means with Lloyd’s algorithm and of centroid initialization with k-means++ are known to be O(nkD) and Θ(nkD), respectively, where n is the number of data points, D the dimensionality of the input feature vectors, and k the number of clusters. The proposed method reduces the computational complexity of the kernel computation of kernel k-means from O(n^2 D) to O(n^2 d), and that of the subsequent computation for k-means with Lloyd’s algorithm and centroid initialization from O(nkD) to O(nkd). Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality d = 200 is 2 to 26 times, with very little performance degradation (less than one percent) in general.
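
The dimensionality reduction step can be illustrated with a minimal sketch. This is an assumption-laden example rather than the paper's implementation: it uses a dense Gaussian projection matrix and a linear kernel, and the function name `random_projection` and the toy sizes are hypothetical. It shows how n points in D dimensions are mapped to d dimensions so that a Gram matrix costs O(n^2 d) instead of O(n^2 D).

```python
import numpy as np

def random_projection(X, d, seed=0):
    """Project the rows of X from D to d dimensions with a Gaussian random matrix."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    # Entries are N(0, 1/d), so squared norms are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))
    return X @ R

# Toy data (assumed sizes): n = 50 points, D = 2000 features, reduced to d = 200.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2000))
Z = random_projection(X, d=200)

K_full = X @ X.T   # linear-kernel Gram matrix in the original space, O(n^2 D)
K_proj = Z @ Z.T   # approximate Gram matrix in the projected space, O(n^2 d)

# Relative error of the squared norms (the Gram matrix diagonal) after projection.
norm_err = np.abs(np.diag(K_proj) - np.diag(K_full)) / np.diag(K_full)
```

The approximation error shrinks as d grows, which is the trade-off between speed and clustering quality that the abstract reports for d = 200.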


2021 ◽  
pp. 1-12
Author(s):  
Jian Zheng ◽  
Jianfeng Wang ◽  
Yanping Chen ◽  
Shuping Chen ◽  
Jingjin Chen ◽  
...  

Neural networks can approximate data because they contain many compact non-linear layers. In high-dimensional space, due to the curse of dimensionality, the data distribution becomes sparse, making it difficult to provide sufficient information. Hence, the task becomes even harder when neural networks approximate data in high-dimensional space. To address this issue, two deviations are derived according to the Lipschitz condition: the deviation of neural networks trained using high-dimensional functions, and the deviation of high-dimensional functions approximating data. The purpose of doing this is to improve the ability of neural networks to approximate high-dimensional space. Experimental results show that neural networks trained using high-dimensional functions outperform those trained using data in their capability to approximate data in high-dimensional space. We find that neural networks trained using high-dimensional functions are more suitable for high-dimensional space than those trained using data, so there is no need to retain sufficient data for neural network training. Our findings suggest that, in high-dimensional space, tuning the hidden layers of neural networks hardly has a substantial positive effect on the precision of data approximation.


2019 ◽  
Vol 43 (4) ◽  
pp. 653-660 ◽  
Author(s):  
M.V. Gashnikov

Adaptive multidimensional signal interpolators are developed. These interpolators take into account the presence and direction of boundaries of flat signal regions in each local neighborhood based on the automatic selection of the interpolating function for each signal sample. The selection of the interpolating function is performed by a parameterized rule, which is optimized in a parametric lower dimensional space. The dimension reduction is performed using rank filtering of local differences in the neighborhood of each signal sample. The interpolating functions of adaptive interpolators are written for the multidimensional, three-dimensional and two-dimensional cases. The use of adaptive interpolators in the problem of compression of multidimensional signals is also considered. Results of an experimental study of adaptive interpolators for real multidimensional signals of various types are presented.


Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Abstract
Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions.
Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace.
Availability and implementation: https://github.com/smelton/SMD.
Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 218 (1) ◽  
pp. 45-56 ◽  
Author(s):  
C Nur Schuba ◽  
Jonathan P Schuba ◽  
Gary G Gray ◽  
Richard G Davy

Summary: We present a new approach to estimate 3-D seismic velocities along a target interface. This approach uses an artificial neural network trained with user-supplied geological and geophysical input features derived from both a 3-D seismic reflection volume and a 2-D wide-angle seismic profile that were acquired from the Galicia margin, offshore Spain. The S-reflector detachment fault was selected as the interface of interest. The neural network, in the form of a multilayer perceptron, was employed with an autoencoder and a regression layer. The autoencoder was trained using a set of input features from the 3-D reflection volume. This set of features included the reflection amplitude and instantaneous frequency at the interface of interest, time-thicknesses of overlying major layers, and ratios of major layer time-thicknesses to the total time-depth of the interface. The regression model was trained to estimate the seismic velocities of the crystalline basement and mantle from these features. The ‘true’ velocities were obtained from an independent full-waveform inversion along a 2-D wide-angle seismic profile contained within the 3-D data set. The autoencoder compressed the vector of inputs into a lower-dimensional space, and the regression layer was then trained in that lower-dimensional space to estimate velocities above and below the targeted interface. This model was trained on 50 networks with different initializations. A total of 37 networks reached the minimum achievable error of 2 per cent. The low standard deviation (<300 m s^-1) between different networks and the low errors on velocity estimations demonstrate that the input features were sufficient to capture variations in the velocity above and below the targeted S-reflector. This regression model was then applied to the 3-D reflection volume, where velocities were predicted over an area of ~400 km^2. This approach provides an alternative way to obtain velocities across a 3-D seismic survey from a deep non-reflective lithology (e.g. upper mantle), where conventional reflection velocity estimations can be unreliable.


2020 ◽  
pp. 105971232092291
Author(s):  
Guido Schillaci ◽  
Antonio Pico Villalpando ◽  
Verena V Hafner ◽  
Peter Hanappe ◽  
David Colliaux ◽  
...  

This work presents an architecture that generates curiosity-driven goal-directed exploration behaviours for an image sensor of a microfarming robot. The architecture combines deep neural networks for offline unsupervised learning of low-dimensional features from images with online learning of shallow neural networks representing the inverse and forward kinematics of the system. The artificial curiosity system assigns interest values to a set of pre-defined goals and drives the exploration towards those that are expected to maximise the learning progress. We propose the integration of an episodic memory in intrinsic motivation systems to face catastrophic forgetting issues, typically experienced when performing online updates of artificial neural networks. Our results show that adopting an episodic memory system not only prevents the computational models from quickly forgetting knowledge that has been previously acquired but also provides new avenues for modulating the balance between plasticity and stability of the models.

