Semantic Visualization with Neighborhood Graph Regularization

2016 ◽  
Vol 55 ◽  
pp. 1091-1133 ◽  
Author(s):  
Tuan M. V. Le ◽  
Hady W. Lauw

Visualization of high-dimensional data, such as text documents, is useful for mapping out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization directly reduce this into a visualizable two or three dimensions. Recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics through topic modeling. While aiming for a good fit between the model parameters and the observed data, previous approaches have not considered the local consistency among data instances. We consider the problem of semantic visualization by jointly modeling topics and visualization on the intrinsic document manifold, modeled using a neighborhood graph. Each document has both a topic distribution and a visualization coordinate. Specifically, we propose an unsupervised probabilistic model, called SEMAFORE, which aims to preserve the manifold in the lower-dimensional spaces through a neighborhood regularization framework designed for the semantic visualization task. To validate the efficacy of SEMAFORE, comprehensive experiments on a number of real-life text datasets of news articles and Web pages show that the proposed method outperforms the state-of-the-art baselines on objective evaluation metrics.
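
The core idea is a Laplacian-style penalty that keeps neighboring documents close in the low-dimensional layout. A minimal sketch of such a regularizer follows; this is not the authors' SEMAFORE implementation, and the function name and hyperparameters (`k`, `lam`) are illustrative assumptions.

```python
# Minimal sketch of neighborhood-graph regularization for visualization:
# penalize pairs of documents that are neighbors in word space but end up
# far apart in the 2-D visualization space.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def neighborhood_regularizer(coords, X, k=10, lam=1.0):
    """Laplacian-style penalty: lam * sum_{i~j} w_ij * ||coords_i - coords_j||^2.

    coords : (n, 2) visualization coordinates (model parameters)
    X      : (n, d) bag-of-words document vectors (observed data)
    """
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    W = (0.5 * (W + W.T)).tocoo()      # symmetrize the neighborhood graph
    sq = np.sum((coords[W.row] - coords[W.col]) ** 2, axis=1)
    return lam * np.sum(W.data * sq)
```

In a joint model of this kind, a penalty like the one above would be added to the negative log-likelihood, so that parameter updates trade off data fit against local consistency on the document manifold.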

2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Hongchao Song ◽  
Zhuqing Jiang ◽  
Aidong Men ◽  
Bo Yang

Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance for each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar, and every sample may look like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph- (K-NNG-) based anomaly detectors. Benefiting from its capacity for nonlinear mapping, the DAE is first trained to learn the intrinsic features of the high-dimensional dataset and represent it in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets randomly sampled from the whole dataset, and the final prediction combines the outputs of all the detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves detection accuracy and reduces computational complexity.
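
The following sketch illustrates the two-stage design under stated assumptions: layer sizes, the number of detectors, `k`, and the subset fraction are illustrative choices, not the paper's settings, and the scoring rule (average k-th-neighbor distance) is one common KNN-based score.

```python
# Hedged sketch of the hybrid detector: a deep autoencoder compresses the
# data, then several k-NN detectors, each built on a random subset of the
# encoded training set, average their k-th-neighbor distances as the score.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import NearestNeighbors

class DAE(nn.Module):
    def __init__(self, d_in, d_hid=64, d_code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_code))
        self.dec = nn.Sequential(nn.Linear(d_code, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_in))
    def forward(self, x):
        return self.dec(self.enc(x))

def train_dae(X, epochs=50, lr=1e-3):
    model, loss_fn = DAE(X.shape[1]), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    xb = torch.tensor(X, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(xb), xb)   # reconstruction loss
        loss.backward()
        opt.step()
    return model

def ensemble_knn_scores(Z_train, Z_test, n_detectors=10, subset=0.5, k=5):
    """Average k-th nearest-neighbor distance over randomly sampled subsets."""
    rng = np.random.default_rng(0)
    scores = np.zeros(len(Z_test))
    for _ in range(n_detectors):
        idx = rng.choice(len(Z_train), int(subset * len(Z_train)),
                         replace=False)
        nn_ = NearestNeighbors(n_neighbors=k).fit(Z_train[idx])
        dists, _ = nn_.kneighbors(Z_test)
        scores += dists[:, -1]          # distance to the k-th neighbor
    return scores / n_detectors         # larger score => more anomalous
```

After training, the compact codes would be obtained with `model.enc(torch.tensor(X, dtype=torch.float32)).detach().numpy()` and passed to `ensemble_knn_scores`.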


2013 ◽  
Vol 4 (1) ◽  
pp. 114-119 ◽  
Author(s):  
Richa Dhiman ◽  
Sheveta Vashisht ◽  
Kapil Sharma

In this research, we use clustering and classification methods to mine tax data and extract information about tax audits, applying the hybrid algorithms K-MEANS, SOM, and HAC from clustering and CHAID and C4.5 from decision trees. These produce better results than the traditional algorithms when compared on a tax dataset. The clustering methods group similar records so that their features or properties can be extracted easily, while the decision-tree methods choose the optimal decisions for extracting valuable information from samples of the tax dataset. This approach is able to find clusters in large high-dimensional spaces efficiently and is suitable for clustering in the full-dimensional space as well as in subspaces. Experiments on both synthetic data and real-life data show that the technique is effective and also scales well to large high-dimensional datasets.
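
A minimal sketch of such a cluster-then-classify pipeline is shown below. Note that scikit-learn ships CART rather than C4.5 or CHAID, so `DecisionTreeClassifier` stands in here, and the tax records and audit labels are placeholder data, not the paper's dataset.

```python
# Hedged sketch: cluster tax records, then learn audit-decision rules with a
# decision tree that sees cluster membership as an extra feature.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 6))                       # placeholder tax records
clusters = KMeans(n_clusters=4, random_state=0).fit_predict(X)

X_aug = np.column_stack([X, clusters])         # append cluster label
y = (X[:, 0] > 0.8).astype(int)                # placeholder "audit" label
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4).fit(X_tr, y_tr)
print("held-out accuracy:", tree.score(X_te, y_te))
```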


1997 ◽  
Vol 84 (1) ◽  
pp. 176-178
Author(s):  
Frank O'Brien

The author's population density index (PDI) model is extended to three-dimensional distributions. A derived formula is presented that allows for the calculation of the lower and upper bounds of density in three-dimensional space for any finite lattice.


2021 ◽  
pp. 1-12
Author(s):  
Jian Zheng ◽  
Jianfeng Wang ◽  
Yanping Chen ◽  
Shuping Chen ◽  
Jingjin Chen ◽  
...  

Neural networks can approximate data because they contain many compact non-linear layers. In high-dimensional space, the curse of dimensionality makes the data distribution sparse, so the data cannot provide sufficient information; approximating data in high-dimensional space with neural networks therefore becomes even harder. To address this issue, we use the Lipschitz condition to derive two deviations: the deviation of neural networks trained using high-dimensional functions, and the deviation of high-dimensional functions approximating data. The purpose is to improve the ability of neural networks to approximate high-dimensional data. Experimental results show that neural networks trained using high-dimensional functions outperform those trained directly on data when approximating data in high-dimensional space. We find that neural networks trained on high-dimensional functions are better suited to high-dimensional space than those trained on data, so there is no need to retain large amounts of data for network training. Our findings suggest that in high-dimensional space, tuning the hidden layers of a neural network has little positive effect on the precision of data approximation.
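
A hedged illustration of the comparison the abstract describes: an MLP fitted to plentiful samples of a known high-dimensional function versus one fitted to a small fixed dataset. The target function, sample sizes, and architecture below are illustrative assumptions, not the authors' experimental setup.

```python
# Sketch: in high dimensions, a network trained on dense samples of a smooth
# (Lipschitz) function can generalize better than one trained on sparse data.
import numpy as np
from sklearn.neural_network import MLPRegressor

d = 50                                          # high-dimensional input
f = lambda X: np.sin(X).sum(axis=1)             # stand-in Lipschitz target

rng = np.random.default_rng(0)
X_small = rng.uniform(-1, 1, (200, d))          # sparse fixed dataset
X_dense = rng.uniform(-1, 1, (20000, d))        # samples drawn from the function
X_test = rng.uniform(-1, 1, (2000, d))

net_data = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                        random_state=0).fit(X_small, f(X_small))
net_func = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                        random_state=0).fit(X_dense, f(X_dense))

for name, net in [("trained on data", net_data),
                  ("trained on function", net_func)]:
    err = np.mean((net.predict(X_test) - f(X_test)) ** 2)
    print(f"{name}: test MSE = {err:.4f}")
```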


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 645
Author(s):  
Muhammad Farooq ◽  
Sehrish Sarfraz ◽  
Christophe Chesneau ◽  
Mahmood Ul Hassan ◽  
Muhammad Ali Raza ◽  
...  

Expectiles have gained considerable attention in recent years due to their wide applications in many areas. In this study, the k-nearest neighbours approach, together with the asymmetric least squares loss function, called ex-kNN, is proposed for computing expectiles. First, the effect of various distance measures on ex-kNN in terms of test error and computational time is evaluated. It is found that the Canberra, Lorentzian, and Soergel distance measures lead to the minimum test error, whereas Euclidean, Canberra, and Average of (L1,L∞) lead to a low computational cost. Second, the performance of ex-kNN is compared with the existing packages er-boost and ex-svm for computing expectiles on nine real-life examples. Depending on the nature of the data, ex-kNN showed two to ten times better performance than er-boost and comparable performance with ex-svm regarding test error. Computationally, ex-kNN is found to be two to five times faster than ex-svm and much faster than er-boost, particularly in the case of high-dimensional data.
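
A minimal sketch of the ex-kNN idea (not the authors' package): find a query's k nearest neighbors, then compute the tau-expectile of their responses by minimizing the asymmetric least squares loss. The fixed-point iteration below follows from the expectile's first-order condition (a weighted mean with weights tau above the current estimate and 1-tau below); `k`, `tau`, and the Canberra metric are illustrative choices, the last because the paper found Canberra among the best-performing distances.

```python
# Sketch of kNN-based expectile prediction via asymmetric least squares.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def expectile(y, tau=0.8, tol=1e-8, max_iter=100):
    """tau-expectile of y: fixed point of an asymmetrically weighted mean."""
    mu = y.mean()
    for _ in range(max_iter):
        w = np.where(y > mu, tau, 1.0 - tau)   # asymmetric squared-error weights
        mu_new = np.average(y, weights=w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def ex_knn_predict(X_train, y_train, X_query, k=15, tau=0.8,
                   metric="canberra"):
    """Predict the tau-expectile from each query's k nearest neighbors."""
    nn_ = NearestNeighbors(n_neighbors=k, metric=metric).fit(X_train)
    _, idx = nn_.kneighbors(X_query)
    return np.array([expectile(y_train[i], tau) for i in idx])
```

For tau = 0.5 the inner loop reduces to the ordinary mean of the neighbors, recovering standard kNN regression.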


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4290
Author(s):  
Dongmei Zhang ◽  
Yuyang Zhang ◽  
Bohou Jiang ◽  
Xinwei Jiang ◽  
Zhijiang Kang

Reservoir history matching is a well-known inverse problem for production prediction, in which the many uncertain parameters of a reservoir numerical model are optimized by minimizing the misfit between simulated and historical production data. The Gaussian Process (GP) has shown promising performance for assisted history matching because it is an efficient nonparametric, nonlinear model with few parameters to tune automatically. The recently introduced combination of Gaussian Process proxy models with Variogram Analysis of Response Surface-based sensitivity analysis (GP-VARS) uses forward and inverse GP proxy models together with VARS-based sensitivity analysis to optimize the high-dimensional reservoir parameters. However, the inverse GP solution (GPIS) in GP-VARS is unsatisfactory, especially when there are numerous reservoir parameters, because the mapping from low-dimensional misfits to high-dimensional uncertain reservoir parameters can be poorly modeled by a GP. To improve the performance of GP-VARS, in this paper we propose Gaussian Process proxy models with Latent Variable Models and VARS-based sensitivity analysis (GPLVM-VARS), in which a Gaussian Process Latent Variable Model (GPLVM)-based inverse solution (GPLVMIS) replaces the GP-based GPIS, with the inputs and outputs of GPIS reversed. The experimental results demonstrate the effectiveness of the proposed GPLVM-VARS in terms of accuracy and complexity. The source code of the proposed GPLVM-VARS is available at https://github.com/XinweiJiang/GPLVM-VARS.
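
The sketch below shows only the forward/inverse GP proxy idea behind GP-VARS; a full GPLVM inverse, as in the paper's GPLVM-VARS, needs a latent-variable library such as GPy. The parameter dimensions and the toy misfit function are illustrative assumptions.

```python
# Hedged sketch: GP proxies mapping reservoir parameters -> misfit (forward)
# and, naively, misfit -> parameters (inverse, the part GPLVM-VARS replaces).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, (200, 30))            # sampled reservoir parameters
misfit = np.sum((theta - 0.5) ** 2, axis=1, keepdims=True)  # toy misfit

kernel = ConstantKernel() * RBF()
gp_fwd = GaussianProcessRegressor(kernel=kernel).fit(theta, misfit)
gp_inv = GaussianProcessRegressor(kernel=kernel).fit(misfit, theta)

# Inverse proxy: ask which parameters should yield a (near-)zero misfit,
# then check them with the forward proxy.
theta_hat = gp_inv.predict(np.array([[0.0]]))
print("predicted misfit at proposed parameters:", gp_fwd.predict(theta_hat))
```

The paper's point is that this low-to-high-dimensional inverse map is exactly where a plain GP struggles, which is what the GPLVM-based inverse solution is designed to remedy.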


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Yisong Lin ◽  
Xuefeng Wang ◽  
Hao Hu ◽  
Hui Zhao

Taking the feeder service for the port of Kotka as an example, this study proposes a multi-objective optimization model for feeder network design. Departing from the usual single-objective evaluation system, the objective of feeder network design is extended to include allocation cost, intra-Europe cargo revenue, equipment balance, sailing cycle, allocation utilization, service-route competitiveness, and stability. A three-stage control system is presented, and a numerical experiment based on a container liner's real-life data was conducted to verify the mathematical model and the control system. The numerical experiment revealed that the three-stage control system is effective and practical, and that the research ideas are applicable with satisfactory effect.
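
One simple way to compare candidate route plans against several objectives at once is a weighted-sum scalarization, sketched below. The objective names are taken from the abstract, but the weights, senses, and the `plan_metrics` structure are hypothetical; the paper's actual three-stage control system is far more detailed.

```python
# Hedged sketch: scalarize the listed feeder-design objectives into one score.
OBJECTIVES = {            # name: (weight, sense); +1 maximize, -1 minimize
    "allocation_cost":        (0.25, -1),
    "intra_europe_revenue":   (0.20, +1),
    "equipment_balance":      (0.15, +1),
    "sailing_cycle":          (0.10, -1),
    "allocation_utilization": (0.15, +1),
    "route_competitiveness":  (0.10, +1),
    "stability":              (0.05, +1),
}

def score(plan_metrics: dict) -> float:
    """Weighted-sum score of a candidate feeder network plan."""
    return sum(w * s * plan_metrics[name]
               for name, (w, s) in OBJECTIVES.items())
```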


Signals ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 434-455
Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias into the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods tune the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise in each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in the KF gain during speech absence and the sensitivity metric does so during speech presence, to achieve better noise reduction. The noise variance and the speech model parameters serve as a speech activity detector. The reduced-bias Kalman gain enables the KF to suppress the noise significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that speech enhanced by the proposed method exhibits higher quality and intelligibility than that of some benchmark methods.
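
A hedged sketch of the pre-whitening step follows: estimate LPCs from the estimated noise, whiten the noisy frame with the resulting FIR filter, then compute the speech LPCs that would parameterize the KF. The SPP noise estimator is replaced by a caller-supplied `noise_estimate`, and the LPC order is an illustrative choice.

```python
# Sketch of noise-driven pre-whitening before speech LPC estimation.
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc(frame, order):
    """Autocorrelation-method LPC: solve the Yule-Walker equations."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate([[1.0], -a])          # A(z) = 1 - sum a_k z^-k

def prewhiten(noisy_frame, noise_estimate, order=10):
    a_noise = lpc(noise_estimate, order)        # whitening-filter coefficients
    whitened = lfilter(a_noise, [1.0], noisy_frame)
    return whitened, lpc(whitened, order)       # frame + speech LPCs for the KF
```

Filtering through the noise's inverse LPC filter flattens the noise spectrum, which is what makes the subsequent speech LPC estimates, and hence the KF gain, less biased.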

