Application of Modified OPTICS Algorithm in E-Commerce Sites Classification and Evaluation

2016 ◽  
Vol 14 (1) ◽  
pp. 64-75
Author(s):  
Zhuoxi Yu ◽  
YuJia Jin ◽  
Milan Parmar ◽  
Limin Wang

In the era of the network economy, the operational efficiency of e-commerce sites is closely tied to the development of enterprises. How to evaluate e-commerce sites has therefore become a hot topic. Because the evaluation indices of e-commerce sites are high-dimensional and inhomogeneous, the new method combines PCA with an improved OPTICS algorithm to classify and evaluate e-commerce demonstration enterprise websites. First, PCA is used to reduce the dimension of the high-dimensional data. Second, to address the limitation of the OPTICS algorithm in dealing with sparse points, the improved OPTICS algorithm is used to cluster the low-dimensional data, evaluate the effectiveness of the e-commerce sites, and make suggestions.
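
The two-stage pipeline described in this abstract (dimension reduction, then density-based clustering) can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard `OPTICS` from scikit-learn, not the authors' modified variant, and the data matrix, the number of indicators, and all parameter values are hypothetical stand-ins.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for site evaluation indices:
# 60 sites x 12 indicators, drawn around three made-up group centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 12))
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 12)) for c in centers])

# Step 1: standardize the indicators, then reduce the dimension with PCA.
X_scaled = StandardScaler().fit_transform(X)
X_low = PCA(n_components=3).fit_transform(X_scaled)

# Step 2: cluster the low-dimensional data with standard OPTICS.
# A label of -1 marks points that OPTICS treats as noise.
labels = OPTICS(min_samples=5).fit_predict(X_low)

# Summarize how many sites fall under each label.
unique, counts = np.unique(labels, return_counts=True)
print(dict(zip(unique.tolist(), counts.tolist())))
```

Standardizing before PCA is a design choice of this sketch so that indicators on different scales contribute comparably; the abstract does not say whether the authors do the same.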

2018 ◽  
Vol 24 (4) ◽  
pp. 225-247 ◽  
Author(s):  
Xavier Warin

A new method based on nesting Monte Carlo is developed to solve high-dimensional semi-linear PDEs. Depending on the type of non-linearity, different schemes are proposed and theoretically studied: variance errors are given, and it is shown that the bias of the schemes can be controlled. The limitation of the method is that the maturity or the Lipschitz constants of the non-linearity should not be too high, in order to avoid an explosion of the computational time. Many numerical results are given in high dimension for cases where analytical solutions are available or where some solutions can be computed by deep-learning methods.


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Anomaly detection has recently attracted considerable attention from data mining researchers as its use has grown steadily in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high-dimensional data poses exceptional challenges because of the curse of dimensionality and the resulting resemblance between distant and neighboring points. Traditional outlier detection algorithms and techniques have been applied to the full feature space. Conventional methods concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a large number of dimensions. Finding anomalies in a high-dimensional data set becomes very difficult and laborious when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic property: the relative difference between the distances to the nearest and farthest observations approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores deviation among all data points and embeds its findings inside well-established density-based techniques. This state-of-the-art technique opens a new direction of research toward resolving the inherent problems of high-dimensional data in which outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
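
The distance-concentration effect mentioned in this abstract can be illustrated with a small experiment. The sketch below is not the authors' technique; it only measures, for uniformly random points (an assumption made here for illustration), how the relative gap between the farthest and nearest neighbor of a query shrinks as the dimension grows. The function name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random
    query point to n_points uniform random points in [0, 1)^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The printed contrast should fall sharply as the dimension rises.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(200, dim), 3))
```

A small contrast means the nearest and farthest neighbors are almost equally far away, which is why distance-based and density-based outlier scores lose discriminating power in the full feature space.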


2020 ◽  
Vol 49 (3) ◽  
pp. 421-437
Author(s):  
Genggeng Liu ◽  
Lin Xie ◽  
Chi-Hua Chen

Dimensionality reduction plays an important role in data processing for machine learning and data mining, making the processing of high-dimensional data more efficient. It extracts a low-dimensional feature representation of high-dimensional data, and an effective method not only retains most of the useful information in the original data but also removes useless noise. Dimensionality reduction methods can be applied to all types of data, especially image data. Although supervised learning has achieved good results in dimensionality reduction, its performance depends on the number of labeled training samples. As the amount of information on the internet grows, labeling data becomes more difficult and requires more resources. Therefore, using unsupervised learning to learn the features of data has important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied on text data, so that mapping high-dimensional features to low-dimensional features becomes efficient and the low-dimensional features retain as much of the main information as possible. Low-dimensional features obtained by different dimensionality reduction methods are compared with the results of the variational auto-encoder (VAE), and the method shows significant improvement over the other methods compared.


Author(s):  
Bharat Gupta ◽  
Durga Toshniwal

In high-dimensional data, a large number of outliers are embedded in low-dimensional subspaces; these are known as projected outliers. Most existing outlier detection techniques are unable to find these projected outliers because they detect abnormal patterns in the full data space. Outlier detection in high-dimensional data has therefore become an important research problem. In this paper, we propose an approach for outlier detection in high-dimensional data. We modify the existing SPOT approach by adding three new concepts: adaptation of the Sparse Sub-Space Template (SST), different combinations of PCS parameters, and a set of non-outlying cells for the testing data set.


2021 ◽  
Author(s):  
Klaus Eckelt ◽  
Andreas Hinterreiter ◽  
Patrick Adelberger ◽  
Conny Walchshofer ◽  
Vaishali Dhanoa ◽  
...  

In this work, we propose an interactive visual approach for the exploration of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups is accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups, thereby creating graphs that represent visual data stories. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.


Author(s):  
Jing Wang ◽  
Jinglin Zhou ◽  
Xiaolu Chen

Industrial data variables are high-dimensional and strongly nonlinearly correlated. Traditional multivariate statistical monitoring methods, such as PCA, PLS, CCA, and FDA, are only suitable for processing high-dimensional data with linear correlation. Kernel mapping is the most common technique for dealing with nonlinearity: it projects the original data from the low-dimensional space into a high-dimensional space through appropriate kernel functions so as to achieve linear separability in the new space. However, projecting from a low dimension to a high dimension contradicts the practical requirement of reducing the dimensionality of the data, so kernel-based methods inevitably increase the complexity of data processing.
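
The kernel mapping idea in this abstract can be made concrete with a small worked example. The sketch below is an illustration chosen here, not taken from the authors: it uses the homogeneous degree-2 polynomial kernel on 2-D inputs, whose explicit feature map into 3-D is known, and two concentric circles as a toy data set. All function names and the threshold 2.5 are hypothetical choices for this example.

```python
import math


def phi(x):
    """Explicit 3-D feature map for the kernel k(x, y) = (x . y)^2
    on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel, evaluated in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def circle(radius, n):
    """n evenly spaced points on a circle centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# Two concentric circles: no straight line separates them in 2-D.
inner = circle(1.0, 12)
outer = circle(2.0, 12)

# In the lifted space a linear rule suffices:
# w . phi(x) = x1^2 + x2^2, compared against a threshold of 2.5.
w = (1.0, 0.0, 1.0)


def lifted_score(x):
    return dot(w, phi(x))


if __name__ == "__main__":
    x, y = inner[1], outer[4]
    # The kernel equals the inner product in the lifted space.
    print(poly2_kernel(x, y), dot(phi(x), phi(y)))
    print(all(lifted_score(p) < 2.5 for p in inner),
          all(lifted_score(p) > 2.5 for p in outer))
```

The example also shows the cost the abstract points to: the lifted representation has more coordinates than the original data, which runs against the goal of dimensionality reduction.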


2013 ◽  
Vol 6 (3) ◽  
pp. 441-448 ◽  
Author(s):  
Sajid Nagi ◽  
Dhruba Kumar Bhattacharyya ◽  
Jugal K. Kalita

When clustering high-dimensional data, traditional clustering methods are found to be lacking because they consider all of the dimensions of the dataset when discovering clusters, whereas only some of the dimensions are relevant. This may give rise to subspaces within the dataset where clusters may be found. Using feature selection, we can remove irrelevant and redundant dimensions by analyzing the entire dataset. The problem of automatically identifying clusters that exist in multiple and possibly overlapping subspaces of high-dimensional data, allowing better clustering of the data points, is known as subspace clustering. There are two major approaches to subspace clustering based on search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches start by finding low-dimensional dense regions and then use them to form clusters. Based on a survey of subspace clustering, we identify the challenges and issues involved in clustering gene expression data.
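
The bottom-up strategy described in this abstract can be sketched with a tiny grid-based example, loosely in the spirit of grid methods such as CLIQUE. It is a simplified illustration, not any specific published algorithm: values are assumed to lie in [0, 1), and the function names, bin count, density threshold, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def bin_of(value, n_bins):
    """Index of the equal-width grid cell containing value in [0, 1)."""
    return min(int(value * n_bins), n_bins - 1)


def dense_units_1d(points, n_bins, min_count):
    """Return the set of (dimension, bin) cells that hold at least
    min_count points."""
    counts = Counter()
    for p in points:
        for d, v in enumerate(p):
            counts[(d, bin_of(v, n_bins))] += 1
    return {unit for unit, c in counts.items() if c >= min_count}


def dense_units_2d(points, n_bins, min_count):
    """Bottom-up step: join dense 1-D units from different dimensions
    and keep only the 2-D cells that are themselves dense."""
    units = sorted(dense_units_1d(points, n_bins, min_count))
    result = set()
    for (d1, b1), (d2, b2) in combinations(units, 2):
        if d1 == d2:
            continue  # a 2-D unit needs two distinct dimensions
        count = sum(
            1 for p in points
            if bin_of(p[d1], n_bins) == b1 and bin_of(p[d2], n_bins) == b2
        )
        if count >= min_count:
            result.add(((d1, b1), (d2, b2)))
    return result


# Toy data: 20 points clustered in dimensions 0 and 1,
# while dimension 2 is spread evenly and carries no cluster.
points = [(0.11 + 0.004 * i, 0.81 + 0.004 * i, i / 20) for i in range(20)]

if __name__ == "__main__":
    print(dense_units_1d(points, n_bins=5, min_count=10))
    print(dense_units_2d(points, n_bins=5, min_count=10))
```

Only cells that are dense in one dimension are combined into higher-dimensional candidates, which is what keeps the bottom-up search from enumerating every subspace.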


Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Clustering in high-dimensional spaces is a recurrent problem in many domains. High dimensionality affects the time complexity, space complexity, scalability, and accuracy of clustering methods. High-dimensional non-linear data usually lie in different low-dimensional subspaces hidden in the original space. As high-dimensional objects appear almost alike, new approaches for clustering are required. This research has focused on developing mathematical models, techniques, and clustering algorithms specifically for high-dimensional data. With growth in the fields of communication and technology, there is tremendous growth in high-dimensional data spaces. As the number of dimensions of high-dimensional non-linear data increases, many clustering techniques begin to suffer from the curse of dimensionality, degrading the quality of the results. In high-dimensional non-linear data, the data become very sparse and distance measures become increasingly meaningless. The principal challenge for clustering high-dimensional data is to overcome the "curse of dimensionality". This research work concentrates on devising an enhanced algorithm for clustering high-dimensional non-linear data.

