ON MULTIDIMENSIONAL SCALING WITH EUCLIDEAN AND CITY BLOCK METRICS

2006 ◽  
Vol 12 (1) ◽  
pp. 69-75 ◽  
Author(s):  
Antanas Žilinskas ◽  
Julius Žilinskas

Experimental sciences collect large amounts of data, and a variety of techniques are available for eliciting information from them. Frequently, statistical analysis must be combined with the experience and intuition of researchers, whose heuristic abilities are best developed for patterns in spaces of dimensionality up to three. Multidimensional scaling (MDS) addresses the problem of how objects represented by proximity data can be represented by points in a low-dimensional space. MDS methods are implemented as the optimization of a stress function that measures how well the distances between the respective points fit the proximity data. Since the optimization problem is multimodal, a global optimization method should be used; in the present paper, an evolutionary metaheuristic is combined with a local search algorithm. The experimental results show the influence of the metric that defines distances in the considered spaces on the results of multidimensional scaling. Data sets with known and unknown structure and of different dimensionality (up to 512 variables) have been visualized.
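To make the optimization target concrete, the following minimal sketch computes a raw stress value under both metrics discussed above (the function names and the raw-stress form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def stress(points, deltas, metric="euclidean"):
    """Raw stress: squared mismatch between the given proximities
    (deltas) and the pairwise distances of the embedded points."""
    n = len(points)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = points[i] - points[j]
            if metric == "cityblock":
                d = np.abs(diff).sum()          # L1 / city-block distance
            else:
                d = np.sqrt((diff ** 2).sum())  # L2 / Euclidean distance
            s += (deltas[i, j] - d) ** 2
    return s
```

A global optimizer such as the paper's evolutionary metaheuristic would then search over the point coordinates to minimize this stress, with a local search refining each candidate.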

2003 ◽  
Vol 2 (1) ◽  
pp. 68-77 ◽  
Author(s):  
Alistair Morrison ◽  
Greg Ross ◽  
Matthew Chalmers

The term ‘proximity data’ refers to data sets within which it is possible to assess the similarity of pairs of objects. Multidimensional scaling (MDS) is applied to such data and attempts to map high-dimensional objects onto a low-dimensional space while preserving these similarity relations. Standard MDS techniques have in the past suffered from high computational complexity and, as such, could not feasibly be applied to data sets of more than a few thousand objects. Through a novel hybrid approach based upon stochastic sampling, interpolation and spring models, we have designed an algorithm that runs in O(N√N) time. Using Chalmers’ 1996 O(N²) spring model as a benchmark for the evaluation of our technique, we compare layout quality and run times on sets of synthetic and real data. Our algorithm executes significantly faster than Chalmers’ 1996 algorithm while producing superior layouts. In reducing complexity and run time, we allow the visualisation of data sets of previously infeasible size. Our results indicate that our method is a solid foundation for the interactive and visual exploration of data.
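The spring-model component can be illustrated by a single force-directed refinement step; the full hybrid additionally lays out a √N sample first and interpolates the remaining points (this sketch is a generic spring step under assumed parameters, not the authors' exact formulation):

```python
import numpy as np

def spring_step(pos, deltas, lr=0.05):
    """One refinement pass: each pair exerts a force proportional to the
    mismatch between its layout distance and its desired proximity."""
    new = pos.copy()
    for i in range(len(pos)):
        diff = pos[i] - pos                    # vectors from each point to i
        dist = np.linalg.norm(diff, axis=1)
        dist[i] = 1.0                          # avoid divide-by-zero on self
        err = dist - deltas[i]                 # > 0: the pair is too far apart
        force = -(err / dist)[:, None] * diff  # pull or push along each pair
        new[i] += lr * force.sum(axis=0)       # the self term contributes zero
    return new
```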


2021 ◽  
Author(s):  
Stefan Canzar ◽  
Van Hoan Do ◽  
Slobodan Jelic ◽  
Soeren Laue ◽  
Domagoj Matijevic ◽  
...  

Metric multidimensional scaling is one of the classical methods for embedding data into a low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets, such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitive, and alternative methods such as PCA are widely used instead. Here, we propose a neural-network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between the high- and low-dimensional spaces that can place previously unseen cells in the same embedding.
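A minimal sketch of the idea (architecture and loss are assumptions; this is not the authors' implementation): train a small network to map points to 2-D so that pairwise distances within each mini-batch are preserved, which keeps the per-step cost far below the O(n²) of classical metric MDS and yields a reusable mapping for unseen points.

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    def __init__(self, d_in, d_out=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, d_out),
        )

    def forward(self, x):
        return self.net(x)

def mds_loss(x_batch, model):
    """Squared mismatch between input-space and embedding-space distances."""
    z = model(x_batch)
    d_hi = torch.cdist(x_batch, x_batch)  # distances in the input space
    d_lo = torch.cdist(z, z)              # distances in the embedding
    return ((d_hi - d_lo) ** 2).mean()
```

Once trained, `model(x_new)` embeds previously unseen cells without re-running the optimization.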


2019 ◽  
Vol 15 (3) ◽  
pp. 346-358
Author(s):  
Luciano Barbosa

Purpose: Matching instances of the same entity, a task known as entity resolution, is a key step in the process of data integration. This paper aims to propose a deep learning network that learns different representations of Web entities for entity resolution.

Design/methodology/approach: To match Web entities, the proposed network learns the following representations of entities: embeddings, which are vector representations of the words in the entities in a low-dimensional space; convolutional vectors from a convolutional layer, which capture short-distance patterns in word sequences in the entities; and bag-of-word vectors, created by a bow layer that learns weights for words in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that identifies a possible match. In addition to those features, the classifier also uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.

Findings: The proposed approach was evaluated on two commercial and two academic entity resolution benchmark data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to its competitors on the academic data sets.

Originality/value: No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.
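Given the three learned representations, the classifier's similarity features can be sketched as follows (`emb`, `conv` and `bow` are placeholder functions standing in for the network's learned encoders, which this sketch does not reproduce):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def pair_features(e1, e2, emb, conv, bow):
    """One similarity feature per representation of the entity pair."""
    return np.array([
        cosine(emb(e1), emb(e2)),    # word-embedding similarity
        cosine(conv(e1), conv(e2)),  # convolutional-vector similarity
        cosine(bow(e1), bow(e2)),    # weighted bag-of-words similarity
    ])
```

The resulting feature vector, together with the pair-level inverse-document-frequency signal, would feed the binary match/non-match classifier.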


2017 ◽  
Vol 29 (4) ◽  
pp. 1053-1102 ◽  
Author(s):  
Hossein Soleimani ◽  
David J. Miller

Many classification tasks require both labeling objects and determining label associations for parts of each object. Example applications include labeling segments of images or determining relevant parts of a text document when the training labels are available only at the image or document level. This task is usually referred to as multi-instance (MI) learning, where the learner typically receives a collection of labeled (or sometimes unlabeled) bags, each containing several segments (instances). We propose a semisupervised MI learning method for multilabel classification. Most MI learning methods treat instances in each bag as independent and identically distributed samples. However, in many practical applications, instances are related to each other and should not be considered independent. Our model discovers a latent low-dimensional space that captures structure within each bag. Further, unlike many other MI learning methods, which are primarily developed for binary classification, we model multiple classes jointly, thus also capturing possible dependencies between different classes. We develop our model within a semisupervised framework, which leverages both labeled and, typically, a larger set of unlabeled bags for training. We develop several efficient inference methods for our model. We first introduce a Markov chain Monte Carlo method for inference, which can handle arbitrary relations between bag labels and instance labels, including the standard hard-max MI assumption. We also develop an extension of our model that uses stochastic variational Bayes methods for inference, and thus scales better to massive data sets. Experiments show that our approach outperforms several MI learning and standard classification methods on both bag-level and instance-level label prediction. All code for replicating our experiments is available from https://github.com/hsoleimani/MLTM.
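The standard hard-max MI assumption mentioned above has a compact form: a bag carries a class label exactly when at least one of its instances does. A toy illustration:

```python
import numpy as np

def bag_labels(instance_labels):
    """Hard-max MI assumption: instance_labels is a binary
    (num_instances, num_classes) matrix; the bag label for each class
    is the maximum over the bag's instances."""
    return instance_labels.max(axis=0)

# A three-instance bag over four classes:
y_inst = np.array([[0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 1, 0, 0]])
print(bag_labels(y_inst))  # -> [0 1 0 1]
```

The paper's MCMC inference can handle arbitrary relations between bag and instance labels, with this hard-max rule as the standard special case.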


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Shuxia Wang

Aiming at the problem that indoor target localization algorithms based on received signal strength indication (RSSI) in the IoT environment are susceptible to interference and exhibit large fluctuations, an indoor localization algorithm combining RSSI and nonmetric multidimensional scaling (RSSI-NMDS) is proposed. First, Gaussian filtering is performed on the multiple sets of received RSSI signals to eliminate abnormal fluctuations of the RSSI. Then, based on the RSSI data, a dissimilarity matrix is constructed, and the relative coordinates of the nodes in a low-dimensional space are obtained by solving the NMDS problem. Finally, according to the actual coordinates of the reference nodes, a coordinate transformation is performed with a planar four-parameter model, yielding the positions of the nodes in the actual coordinate system. Simulation results show that the proposed method is highly robust to RSSI perturbations and achieves high positioning accuracy.
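The three-stage pipeline can be sketched as follows (the filter threshold, transform fitting and library calls are assumptions; the non-metric MDS step uses scikit-learn rather than the paper's solver):

```python
import numpy as np
from sklearn.manifold import MDS

def gaussian_filter(samples, k=1.0):
    """Step 1: keep RSSI samples within k standard deviations of the mean."""
    m, s = samples.mean(), samples.std()
    kept = samples[np.abs(samples - m) <= k * s]
    return kept.mean() if len(kept) else m

def align(rel, anchors_rel, anchors_true):
    """Step 3: planar four-parameter (scale, rotation, two translations)
    transform, fitted on the reference nodes by least squares."""
    mu_a, mu_b = anchors_rel.mean(0), anchors_true.mean(0)
    a, b = anchors_rel - mu_a, anchors_true - mu_b
    u, sig, vt = np.linalg.svd(a.T @ b)
    rot = u @ vt                        # best-fit rotation (may reflect)
    scale = sig.sum() / (a ** 2).sum()
    return scale * (rel - mu_a) @ rot + mu_b

# Step 2: with `dissim` an (n, n) dissimilarity matrix built from the
# filtered RSSI values, recover relative coordinates by non-metric MDS:
#   rel = MDS(n_components=2, metric=False,
#             dissimilarity="precomputed").fit_transform(dissim)
```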


2002 ◽  
Vol 14 (5) ◽  
pp. 1195-1232 ◽  
Author(s):  
Douglas L. T. Rohde

Multidimensional scaling (MDS) is the process of transforming a set of points in a high-dimensional space to a lower-dimensional one while preserving the relative distances between pairs of points. Although effective methods have been developed for solving a variety of MDS problems, they mainly depend on the vectors in the lower-dimensional space having real-valued components. For some applications, the training of neural networks in particular, it is preferable or necessary to obtain vectors in a discrete, binary space. Unfortunately, MDS into a low-dimensional discrete space appears to be a significantly harder problem than MDS into a continuous space. This article introduces and analyzes several methods for performing approximately optimized binary MDS.
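A simple baseline in this vein is stochastic bit-flipping: start from random binary codes and accept any single-bit change that reduces the stress between the target distances and the Hamming distances of the codes (an illustrative strategy, not necessarily one of the article's methods; `deltas` is assumed to be scaled to the Hamming range):

```python
import numpy as np

def binary_mds(deltas, n_bits, iters=5000, seed=0):
    """Greedy bit-flipping for approximate binary MDS."""
    rng = np.random.default_rng(seed)
    n = len(deltas)
    codes = rng.integers(0, 2, size=(n, n_bits))

    def stress():
        ham = (codes[:, None, :] != codes[None, :, :]).sum(-1)
        return ((deltas - ham) ** 2).sum()

    best = stress()
    for _ in range(iters):
        i, b = rng.integers(n), rng.integers(n_bits)
        codes[i, b] ^= 1          # flip one bit
        s = stress()
        if s < best:
            best = s              # keep the improving flip
        else:
            codes[i, b] ^= 1      # revert
    return codes
```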


2006 ◽  
Vol 12 (4) ◽  
pp. 353-359 ◽  
Author(s):  
Antanas Žilinskas ◽  
Julius Žilinskas

Multidimensional scaling addresses the problem of representing objects, specified by proximity data, as points in a low-dimensional embedding space. The problem is reduced to the optimization of an accuracy measure of how well the distances between the respective points fit the proximity data. A three-dimensional embedding space is considered in the present paper. Images of data of different dimensionality are discussed, as well as the dependence of visualization accuracy on the dimensionality of the embedding space and the complexity of the data.
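The dependence of accuracy on embedding dimensionality is easy to observe directly (an illustrative check using scikit-learn's SMACOF optimizer and toy data, not the paper's experiments):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
data = rng.normal(size=(60, 8))        # toy high-dimensional data
for dim in (1, 2, 3, 4):
    mds = MDS(n_components=dim, random_state=0).fit(data)
    print(dim, round(mds.stress_, 2))  # stress falls as dimensionality grows
```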


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Tie-Jun Li ◽  
Chih-Cheng Chen ◽  
Jian-jun Liu ◽  
Gui-fang Shao ◽  
Christopher Chun Ki Chan

We apply terahertz (THz) time-domain spectroscopy imaging technology to perform nondestructive detection on three industrial ceramic matrix composite (CMC) samples and one silicon slice with defects. In terms of spectrum recognition, a low-resolution THz spectrum image results in ineffective recognition of sample defect features. Therefore, in this article, we propose a spectrum clustering recognition model based on t-distributed stochastic neighbor embedding (t-SNE) to address this ineffective sample defect recognition. First, we propose a model that recognizes a reduced-dimensional clustering of the different spectrums drawn from the imaging spectrum data sets, in order to judge, in a low-dimensional space, whether a sample includes a feature indicating a defect. Second, we improve computational efficiency by mapping spectrum data samples from the high-dimensional space to the low-dimensional space with a manifold learning algorithm (t-SNE). Finally, to make sample features visually observable in the low-dimensional space, we use a conditional probability distribution to measure the distance-invariant similarity. Comparative experiments indicate that our model can judge the existence of sample defect features through spectrum clustering, serving as a predetection process for image analysis.
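The clustering-as-predetection step can be sketched with off-the-shelf components (the random spectra and the choice of k-means are stand-ins; the paper's model builds the low-dimensional map with its own t-SNE formulation):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

spectra = np.random.rand(500, 128)  # e.g., 500 pixels x 128 spectral bins
low = TSNE(n_components=2, perplexity=30).fit_transform(spectra)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(low)
# A cluster that separates cleanly from the bulk in the 2-D map would
# flag candidate defect pixels before any full image analysis.
```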


Author(s):  
Lukas Miklautz ◽  
Lena G. M. Bauer ◽  
Dominik Mautz ◽  
Sebastian Tschiatschek ◽  
Christian Böhm ◽  
...  

Deep clustering techniques combine representation learning with clustering objectives to improve their performance. Among existing deep clustering techniques, autoencoder-based methods are the most prevalent ones. While they achieve promising clustering results, they suffer from an inherent conflict between preserving details, as expressed by the reconstruction loss, and finding similar groups by ignoring details, as expressed by the clustering loss. This conflict leads to brittle training procedures, dependence on trade-off hyperparameters and less interpretable results. We propose our framework, ACe/DeC, which is compatible with Autoencoder Centroid based Deep Clustering methods and automatically learns a latent representation consisting of two separate spaces. The clustering space captures all cluster-specific information and the shared space explains general variation in the data. This separation resolves the above-mentioned conflict and allows our method to learn both detailed reconstructions and cluster-specific abstractions. We evaluate our framework with extensive experiments to show several benefits: (1) cluster performance – on various data sets we outperform relevant baselines; (2) no hyperparameter tuning – this improved performance is achieved without introducing new clustering-specific hyperparameters; (3) interpretability – isolating the cluster-specific information in a separate space is advantageous for data exploration and for interpreting the clustering results; and (4) dimensionality of the embedded space – we automatically learn a low-dimensional space for clustering. Our ACe/DeC framework isolates cluster information and increases stability and interpretability while improving cluster performance.
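The split-latent-space idea can be sketched as follows (layer sizes and split dimensions are assumptions, not the ACe/DeC architecture): the encoder output is partitioned so that only the clustering half feeds the clustering objective, while the full code drives reconstruction.

```python
import torch
import torch.nn as nn

class SplitAE(nn.Module):
    def __init__(self, d_in, d_cluster=10, d_shared=10):
        super().__init__()
        d_lat = d_cluster + d_shared
        self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))
        self.d_cluster = d_cluster

    def forward(self, x):
        z = self.enc(x)
        z_cluster = z[:, :self.d_cluster]  # cluster-specific information
        z_shared = z[:, self.d_cluster:]   # general variation in the data
        return self.dec(z), z_cluster, z_shared
```

The reconstruction loss sees the full code, so details are preserved; the clustering loss sees only `z_cluster`, so the two objectives no longer compete over the same dimensions.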


2020 ◽  
Author(s):  
Eric Johnson ◽  
William Kath ◽  
Madhav Mani

Single-cell RNA sequencing (scRNA-seq) experiments often measure thousands of genes, making them high-dimensional data sets. As a result, dimensionality reduction (DR) algorithms such as t-SNE and UMAP are necessary for data visualization. However, the use of DR methods in other tasks, such as for cell-type detection or developmental trajectory reconstruction, is stymied by unquantified non-linear and stochastic deformations in the mapping from the high- to low-dimensional space. In this work, we present a statistical framework for the quantification of embedding quality so that DR algorithms can be used with confidence in unsupervised applications. Specifically, this framework generates a local assessment of embedding quality by statistically integrating information across embeddings. Furthermore, the approach separates biological signal from noise via the construction of an empirical null hypothesis. Using this approach on scRNA-seq data reveals biologically relevant structure and suggests a novel “spectral” decomposition of data. We apply the framework to several data sets and DR methods, illustrating its robustness and flexibility as well as its widespread utility in several quantitative applications.
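A toy version of the local quality assessment (not the authors' statistic): score each point by how well its k nearest neighbors are preserved, average the score across several embeddings, and compare against a null built from data whose structure has been destroyed by shuffling.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def knn_sets(x, k):
    """Index sets of each point's k nearest neighbors (self excluded)."""
    idx = NearestNeighbors(n_neighbors=k + 1).fit(x).kneighbors(
        x, return_distance=False)[:, 1:]
    return [set(row) for row in idx]

def preservation(high, low, k=15):
    """Per-point fraction of high-dimensional neighbors kept in the embedding."""
    hi, lo = knn_sets(high, k), knn_sets(low, k)
    return np.array([len(a & b) / k for a, b in zip(hi, lo)])

def quality(data, k=15, n_embeddings=3):
    """Average per-point preservation over several embedding runs."""
    runs = [TSNE(n_components=2, random_state=s).fit_transform(data)
            for s in range(n_embeddings)]
    return np.mean([preservation(data, r, k) for r in runs], axis=0)

# Empirical null: shuffle each column (gene) independently to destroy
# structure, then score the shuffled data the same way; points whose
# quality exceeds the null carry signal rather than noise.
```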

