A Manifold-Based Dimension Reduction Algorithm Framework for Noisy Data Using Graph Sampling and Spectral Graph

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Tao Yang ◽  
Dongmei Fu ◽  
Jintao Meng ◽  
Marcin Mrugalski

This paper proposes a new manifold-based dimension reduction algorithm framework. It can deal with the dimension reduction problem of data with noise and give the dimension reduction results with the deviation values caused by noise interference. Commonly used manifold learning methods are sensitive to noise in the data. Mean computation, a denoising method, is an important step in data preprocessing but leads to a loss of local structural information. In addition, it is difficult to measure the accuracy of the dimension reduction of noisy data. Thus, manifold learning methods often transform the data into an approximately smooth manifold structure; however, practical data from the physical world may not meet the requirements. The proposed framework follows the idea of the localization of manifolds and uses graph sampling to determine some local anchor points from the given data. Subsequently, the specific range of localities is determined using graph spectral analysis, and the density within each local range is estimated to obtain the distribution parameters. Then, manifold-based dimension reduction with distribution parameters is established, and the deviation values in each local range are measured and further extended to all data. Thus, our proposed framework gives a measurement method for deviation caused by noise.

2019 ◽  
Author(s):  
Levi John Wolf ◽  
Elijah Knaap

Dimension reduction is one of the oldest concerns in geographical analysis. Despite significant, longstanding attention in geographical problems, recent advances in non-linear techniques for dimension reduction, called manifold learning, have not been adopted in classic data-intensive geographical problems. More generally, machine learning methods for geographical problems often focus on applying standard machine learning algorithms to geographic data, rather than applying true "spatially-correlated learning," in the words of Kohonen. As such, we suggest a general way to incentivize geographical learning in machine learning algorithms, and link it to many past methods that introduced geography into statistical techniques. We develop a specific instance of this by specifying two geographical variants of Isomap, a non-linear dimension reduction, or "manifold learning," technique. We also provide a method for assessing what is added by incorporating geography and estimate the manifold's intrinsic geographic scale. To illustrate the concepts and provide interpretable results, we conduct a dimension reduction on the geographical and high-dimensional structure of social and economic data on Brooklyn, New York. Overall, this paper's main endeavor--defining and explaining a way to "geographize" many machine learning methods--yields interesting and novel results for manifold learning and the estimation of intrinsic geographical scale in unsupervised learning.
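For readers unfamiliar with Isomap, the following is a minimal sketch of the plain, non-geographical algorithm using NumPy only: build a k-nearest-neighbour graph, approximate geodesic distances with shortest paths, then apply classical multidimensional scaling. It does not reproduce the two geographical variants the abstract mentions, whose details are not given here; the function name `isomap` and its parameters are illustrative assumptions.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    """Plain Isomap sketch: kNN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Keep only edges to each point's k nearest neighbours (symmetrised).
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        G[i, idx[i]] = D[i, idx[i]]
        G[idx[i], i] = D[i, idx[i]]

    # Floyd-Warshall: shortest paths along the graph approximate geodesics.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS: double-centre the squared distances and eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

On points sampled along a curve, a one-component embedding from this sketch should recover the position along the curve up to sign and shift. The Floyd-Warshall loop is cubic in the number of samples, so this is suitable only for small illustrative data sets.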


Author(s):  
Sheng Ding ◽  
Li Chen ◽  
Jun Li

This chapter addresses the problems in hyperspectral image classification using local manifold learning methods. A manifold is a nonlinear low dimensional subspace that is supported by data samples. Manifolds can be exploited in developing robust feature extraction and classification methods. The manifold coordinates are derived from local manifold learning (LLE, LE) methods for multiple data sets. With a proper selection of parameters and a sufficient number of features, the manifold learning methods combined with k-nearest neighbor classification produced an efficient and accurate data representation that yields higher classification accuracies than linear dimension reduction (PCA) methods for hyperspectral images.
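As a point of reference, the linear baseline mentioned above (PCA followed by k-nearest-neighbor classification) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the chapter's experimental setup: the helper names `pca_fit` and `knn_predict` are hypothetical, and labels are assumed to be non-negative integers.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return projected training data, the training mean, and the projection matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T
    return Xc @ W, mean, W

def knn_predict(Z_train, y_train, Z_test, k=3):
    """Majority vote among the k nearest training points (integer labels >= 0)."""
    d = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])
```

Test samples are projected with the training mean and matrix, for example `(X_test - mean) @ W`, before calling `knn_predict`. Swapping `pca_fit` for a manifold embedding would give the kind of comparison the chapter describes.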


2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

The variational graph autoencoder, which can encode structural information and attribute information in the graph into low-dimensional representations, has become a powerful method for studying graph-structured data. However, most existing methods based on the variational (graph) autoencoder assume that the prior of the latent variables is the standard normal distribution, which encourages all nodes to gather around 0. This prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge therefore becomes a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we exploit the noninformative prior as the prior distribution of latent variables. This prior enables the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
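The pull toward 0 under a standard normal prior can be seen in the usual closed-form KL regularizer for a diagonal Gaussian posterior, KL(N(mu, sigma^2) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2). The sketch below evaluates that standard term only; it is not the noninformative prior proposed in the paper, whose form the abstract does not give, and the function name is an illustrative assumption.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

# The penalty is zero only at mu = 0, sigma = 1 and grows quadratically with mu,
# which is why every node embedding is drawn toward the origin.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
print(kl_to_standard_normal([2.0, 0.0], [0.0, 0.0]))
```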


2021 ◽  
Vol 108 (Supplement_3) ◽  
Author(s):  
J Bote ◽  
J F Ortega-Morán ◽  
C L Saratxaga ◽  
B Pagador ◽  
A Picón ◽  
...  

INTRODUCTION New non-invasive technologies for improving early diagnosis of colorectal cancer (CRC) are demanded by clinicians. Optical Coherence Tomography (OCT) provides sub-surface structural information and offers diagnosis capabilities of colon polyps, further improved by machine learning methods. Databases of OCT images are necessary to facilitate algorithm development and testing.
MATERIALS AND METHODS A database has been acquired from rat colonic samples with a Thorlabs OCT system with 930 nm centre wavelength that provides 1.2 kHz A-scan rate, 7 μm axial resolution in air, 4 μm lateral resolution, 1.7 mm imaging depth in air, 6 mm × 6 mm FOV, and 107 dB sensitivity. The colon from anaesthetised animals has been excised and samples have been extracted and preserved for ex-vivo analysis with the OCT equipment.
RESULTS This database consists of OCT 3D volumes (C-scans) and 2D images (B-scans) of murine samples from: 1) healthy tissue, for ground-truth comparison (18 samples; 66 C-scans; 17,478 B-scans); 2) hyperplastic polyps, obtained from an induced colorectal hyperplastic murine model (47 samples; 153 C-scans; 42,450 B-scans); 3) neoplastic polyps (adenomatous and adenocarcinomatous), obtained from clinically validated Pirc F344/NTac-Apcam1137 rat model (232 samples; 564 C-scans; 158,557 B-scans); and 4) unknown tissue (polyp adjacent, presumably healthy) (98 samples; 157 C-scans; 42,070 B-scans).
CONCLUSIONS A novel extensive ex-vivo OCT database of a murine CRC model has been obtained and will be openly published for the research community. It can be used for classification/segmentation machine learning methods, for correlation between OCT features and histopathological structures, and for developing new non-invasive in-situ methods of diagnosis of colorectal cancer.


Author(s):  
Diana Mateus ◽  
Christian Wachinger ◽  
Selen Atasoy ◽  
Loren Schwarz ◽  
Nassir Navab

Computer aided diagnosis is often confronted with processing and analyzing high dimensional data. One way to deal with such data is dimensionality reduction. This chapter focuses on manifold learning methods to create low dimensional data representations adapted to a given application. From pairwise non-linear relations between neighboring data points, manifold learning algorithms first approximate the low dimensional manifold on which the data lie with a graph; then, they find a non-linear map to embed this graph into a low dimensional space. Since the explicit pairwise relations and the neighborhood system can be designed according to the application, manifold learning methods are very flexible and allow easy incorporation of domain knowledge. The authors describe different assumptions and design elements that are crucial to building successful low dimensional data representations with manifold learning for a variety of applications. In particular, they discuss examples for visualization, clustering, classification, registration, and human-motion modeling.
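The two-step recipe described above (approximate the manifold with a neighbourhood graph, then embed the graph) can be sketched with Laplacian eigenmaps, one common instance of it. The sketch below is an assumption-laden illustration rather than the chapter's own method: it uses binary k-nearest-neighbour weights, NumPy only, and a hypothetical function name.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=5, n_components=2):
    """Embed samples via eigenvectors of the normalised graph Laplacian."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Step 1: approximate the manifold with a symmetric kNN graph (binary weights).
    W = np.zeros((n, n))
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Step 2: embed the graph with the symmetric normalised Laplacian.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order

    # Skip the trivial eigenvector (eigenvalue 0, assuming a connected graph)
    # and rescale so the columns solve the generalised problem L y = lambda * Deg y.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
```

The weight matrix `W` is where domain knowledge would enter: replacing the binary weights or the Euclidean distance with an application-specific similarity changes the graph, and therefore the embedding, without changing the second step.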

