nonlinear dimension
Recently Published Documents


Total documents: 64 (five years: 15)
H-index: 12 (five years: 1)

2021 ◽  
Author(s):  
Hong-He Xu ◽  
Zhi-Bin Niu ◽  
Yan-Sen Chen ◽  
Xuan Ma ◽  
Xiao-Jing Tong ◽  
...  

Abstract. Multi-elemental and multi-dimensional data are increasingly important in data-driven research, as is the case in modern palaeontology, in which visual examination of every fossil specimen, by experts or someday by artificial intelligence, plays a crucial and fundamental role. We here release an integrated image dataset of 113 Ordovician to Silurian graptolite species or subspecies that are significant in global stratigraphy and shale gas exploration. The dataset contains 1550 high-resolution graptolite specimen images and scientific information related to each specimen, e.g., its taxonomic, geologic, and geographic information and related references. We develop a tool, FSIDvis (Fossil Specimen Image Dataset Visualiser), to facilitate human-interactive exploration of the attribute-rich image dataset. A nonlinear dimension reduction technique, t-SNE (t-Distributed Stochastic Neighbor Embedding), is employed to project the images into two-dimensional space to visualise and explore their similarities. Our dataset potentially contributes to the analysis of global biostratigraphic correlations and could improve shale gas exploration efficiency by supporting the development of an image-based automated classification model. All images are available from https://doi.org/10.5281/zenodo.5205216 (Xu, 2021).


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Sajjad Jahanbakhsh Gudakahriz ◽  
Amir Masoud Eftekhari Moghadam ◽  
Fariborz Mahmoudi

Opinion texts are now published rapidly on websites and social networks by many users, as short texts, in high volumes, and across many fields. Because these texts reflect the opinions of many users, processing and analysing them, for example by clustering, can be very useful in a variety of applications including politics, industry, commerce, and economics. The high dimensionality of text representations decreases the efficiency of clustering, and an effective solution to this challenge is to reduce the dimensionality of the texts. Manifold learning is a powerful tool for nonlinear dimension reduction of high-dimensional data. Therefore, in this paper, to increase the efficiency of opinion text clustering, manifold learning is used to reduce the dimensions of the represented opinion texts based on sentiment and semantics and to extract their intrinsic dimensions. Then, the clustering algorithm is applied to the dimension-reduced opinion texts. The proposed approach clusters opinion texts with simultaneous consideration of sentiment and semantics, which has received very little attention in previous work. This type of clustering helps users of opinion texts to obtain more useful information from them and also provides more accurate summaries in applications such as the summarization of opinion texts. Experimental results on three datasets show better performance of the proposed approach on opinion texts in terms of important measures for evaluating clustering efficiency. An improvement of about 9% in accuracy is observed on the third dataset when clustering based on sentiment and semantics.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xiao Liao ◽  
WeiJia Wang ◽  
Wei Wang ◽  
Chong Liang

Image matching is a method of matching by analyzing the gray scale and texture information of the reference image and the image to be matched. First, because the scale-invariant feature transform (SIFT) algorithm has a long descriptor computation time and poor real-time performance, a nonlinear dimension reduction method based on locally linear embedding (LLE) is proposed to preserve the nonlinear information in the original data space as much as possible, shorten the running time of the algorithm, and improve the matching accuracy. Second, because the Euclidean distance requires a large amount of computation in the matching process, the Manhattan distance is proposed to calculate the similarity between the reference image and the image to be matched, so as to further reduce the running time. Experimental results show that the improved LLE-SIFT algorithm has a high matching rate and improves the matching speed.
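The matching step described above can be illustrated with a minimal sketch. It is not the authors' LLE-SIFT implementation: the function name `match_descriptors`, the descriptor length of 16, and the synthetic data are assumptions made for illustration only. The sketch finds, for each query descriptor, the nearest reference descriptor under the Manhattan (L1) distance, with the Euclidean distance available for comparison.

```python
import numpy as np


def match_descriptors(reference, query, metric="manhattan"):
    """Return, for each query descriptor, the index of and distance to
    its nearest reference descriptor.

    reference: array of shape (n_ref, d); query: array of shape (n_query, d).
    """
    # Pairwise differences, shape (n_query, n_ref, d).
    diff = query[:, None, :] - reference[None, :, :]
    if metric == "manhattan":
        dist = np.abs(diff).sum(axis=2)          # sum of absolute differences
    elif metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))  # needs squares and a square root
    else:
        raise ValueError(f"unknown metric: {metric}")
    nearest = dist.argmin(axis=1)
    return nearest, dist[np.arange(len(query)), nearest]


# Synthetic stand-ins for dimension-reduced descriptors (assumed length 16).
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 16))
query = reference[[3, 7, 11]] + 0.01 * rng.normal(size=(3, 16))

indices, distances = match_descriptors(reference, query, metric="manhattan")
```

The Manhattan distance avoids the squaring and square root of the Euclidean distance, which is the source of the speed-up the abstract refers to; whether the saving matters in practice depends on descriptor length and implementation.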


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Miao Zhang ◽  
Yiwen Liu ◽  
Hua Zhou ◽  
Joseph Watkins ◽  
Jin Zhou

Abstract. Background: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce to analyze population structure of low-depth sequencing data. Results: The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common. Conclusions: We apply to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The package is available at https://github.com/yiwenstat/MCPCA_PopGen.
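For readers unfamiliar with the objective named above, the Ky Fan k-norm of a matrix is the sum of its k largest singular values. The following is a minimal sketch of that quantity only, not of the package's optimization over transformations; the function name `ky_fan_norm` and the synthetic dosage matrix are assumptions made for illustration.

```python
import numpy as np


def ky_fan_norm(matrix, k):
    """Ky Fan k-norm: the sum of the k largest singular values."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(singular_values[:k]))


# Synthetic "dosages" in [0, 2]: 50 samples by 10 loci (illustrative only).
rng = np.random.default_rng(0)
dosages = rng.uniform(0.0, 2.0, size=(50, 10))

# Sample covariance between loci, then the variance captured by the top 2 axes.
covariance = np.cov(dosages, rowvar=False)
top2 = ky_fan_norm(covariance, k=2)
```

For a covariance matrix, which is symmetric positive semidefinite, the Ky Fan k-norm equals the sum of the k largest eigenvalues, i.e. the variance explained by the first k principal components.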


Author(s):  
S. Schmitz ◽  
U. Weidner ◽  
H. Hammer ◽  
A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods, and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE), and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of 2D embeddings is performed. In addition, a quantitative analysis is provided for evaluating the preservation of information in low-dimensional representations with respect to the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.


2021 ◽  
Vol 17 (3) ◽  
pp. e1008741
Author(s):  
Ya-Wei Eileen Lin ◽  
Tal Shnitzer ◽  
Ronen Talmon ◽  
Franz Villarroel-Espindola ◽  
Shruti Desai ◽  
...  

Imaging Mass Cytometry (IMC) combines laser ablation and mass spectrometry to quantitate metal-conjugated primary antibodies incubated in intact tumor tissue slides. This strategy allows spatially-resolved multiplexing of dozens of simultaneous protein targets with 1 μm resolution. Each slide is a spatial assay consisting of high-dimensional multivariate observations (m-dimensional feature space) collected at different spatial positions and capturing data from a single biological sample or even representative spots from multiple samples when using tissue microarrays. Often, each of these spatial assays could be characterized by several regions of interest (ROIs). To extract meaningful information from the multi-dimensional observations recorded at different ROIs across different assays, we propose to analyze such datasets using a two-step graph-based approach. We first construct for each ROI a graph representing the interactions between the m covariates and compute an m-dimensional vector characterizing the steady state distribution among features. We then use all these m-dimensional vectors to construct a graph between the ROIs from all assays. This second graph is subjected to a nonlinear dimension reduction analysis, retrieving the intrinsic geometric representation of the ROIs. Such a representation provides the foundation for efficient and accurate organization of the different ROIs that correlates with their phenotypes. Theoretically, we show that when the ROIs have a particular bi-modal distribution, the new representation gives rise to a better distinction between the two modalities compared to the maximum a posteriori (MAP) estimator. We applied our method to predict the sensitivity of lung cancer subjects to treatment with PD-1 axis blockers based on IMC data, achieving 97.3% average accuracy on two IMC datasets. 
This serves as empirical evidence that the graph-of-graphs approach enables us to integrate multiple ROIs and the intra-relationships between the features at each ROI, giving rise to an informative representation that is strongly associated with the phenotypic state of the entire image.
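The two-step idea described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the authors' method: the Gaussian kernel, the median bandwidth heuristic, the diffusion-map style embedding, and all function names (`gaussian_affinity`, `steady_state`, `roi_signature`, `diffusion_embedding`) are choices made here for the example, and the data are synthetic.

```python
import numpy as np


def gaussian_affinity(points):
    """Symmetric Gaussian-kernel affinity between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    scale = np.median(d2[d2 > 0])  # assumed bandwidth heuristic
    return np.exp(-d2 / scale)


def steady_state(affinity, n_iter=500):
    """Stationary distribution of the random walk on the graph (power iteration)."""
    transition = affinity / affinity.sum(axis=1, keepdims=True)  # row-stochastic
    pi = np.full(len(affinity), 1.0 / len(affinity))
    for _ in range(n_iter):
        pi = pi @ transition
    return pi / pi.sum()


def roi_signature(roi):
    """Step 1: graph between the m features of one ROI, then its steady state.

    roi: array of shape (n_positions, m). Returns a vector of length m.
    """
    return steady_state(gaussian_affinity(roi.T))


def diffusion_embedding(signatures, n_components=2):
    """Step 2: graph between ROIs, embedded with a diffusion-map style spectrum."""
    affinity = gaussian_affinity(signatures)
    degree = affinity.sum(axis=1)
    symmetric = affinity / np.sqrt(np.outer(degree, degree))
    values, vectors = np.linalg.eigh(symmetric)
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    right = vectors / np.sqrt(degree)[:, None]  # eigenvectors of the random walk
    # Skip the first (constant) eigenvector; scale the rest by their eigenvalues.
    return right[:, 1:n_components + 1] * values[1:n_components + 1]


# Synthetic data: 8 ROIs, each with 200 positions and m = 6 features.
rng = np.random.default_rng(0)
rois = [rng.normal(size=(200, 6)) * rng.uniform(0.5, 2.0, size=6) for _ in range(8)]
signatures = np.array([roi_signature(roi) for roi in rois])
embedding = diffusion_embedding(signatures)
```

Each row of `embedding` is a low-dimensional coordinate for one ROI; in a real analysis these coordinates would be passed to a classifier or inspected visually.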


Author(s):  
Ahmed Lasisi ◽  
Nii Attoh-Okine

Track geometry parameters from rail track inspection are regulated within unique safety limits for different track classes. This paper focuses on developing an index that combines safety and track quality, motivated by the inefficiency of performing corrective maintenance between routine maintenance cycles when federal geometry limits are violated. This combination is achievable by summarizing multivariate track geometry parameters, as an improvement on previous linear approaches to the problem of inefficient track geometry maintenance programs. The use of nonlinear dimension reduction (t-distributed Stochastic Neighbor Embedding, t-SNE) for Hybrid Track Quality Index development and the influence of time-based parameters on track quality are evaluated in this study. Results show that the probability of geometry defects is correlated with principal components, but t-SNE gave the best prediction on train-test splits despite its poor performance on a blind validation set. The absence of observable correlation between track geometry and acceleration data calls for further investigation.


2020 ◽  
Vol 1 (3) ◽  
Author(s):  
Mahwish Yousaf ◽  
Tanzeel U. Rehman ◽  
Li Jing
