Opinion Texts Clustering Using Manifold Learning Based on Sentiment and Semantics Analysis

Opinion texts are now published rapidly on websites and social networks by many users, usually as short texts, in high volumes, and across many fields. Because these texts reflect the opinions of many users, processing and analyzing them, for example by clustering, can be very useful in a variety of applications, including politics, industry, commerce, and economics. The high dimensionality of text representations reduces clustering efficiency, and an effective remedy for this challenge is reducing the dimensions of the texts. Manifold learning is a powerful tool for nonlinear dimension reduction of high-dimensional data. Therefore, to increase the efficiency of opinion-text clustering, this paper uses manifold learning to reduce the dimensions of the represented opinion texts based on sentiment and semantics and to extract their intrinsic dimensions. A clustering algorithm is then applied to the dimension-reduced opinion texts. The proposed approach clusters opinion texts while considering sentiment and semantics simultaneously, which has received very little attention in previous work. This type of clustering helps users of opinion texts obtain more useful information from them and provides more accurate summaries in applications such as opinion-text summarization. Experimental results on three datasets show that the proposed approach performs better on opinion texts in terms of important clustering evaluation measures. On the third dataset, with clustering based on both sentiment and semantics, accuracy improves by about 9%.
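The pipeline described above (represent texts as vectors, reduce their dimension with manifold learning, then cluster the low-dimensional points) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' method: the abstract names neither the manifold learner nor the clustering algorithm, so Isomap (k-nearest-neighbour graph, Floyd-Warshall geodesic distances, classical MDS) and k-means with farthest-point initialisation are swapped in. The 20-dimensional "sentiment + topic" feature vectors are hypothetical synthetic data, and `n_neighbors=16` is chosen only so that the neighbourhood graph of this toy set is connected.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Isomap: kNN graph -> geodesic (shortest-path) distances -> classical MDS."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)
    # Keep only edges from each point to its n_neighbors nearest neighbours.
    G = np.full((n, n), np.inf)
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)  # make the graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: relax every pair of nodes through each intermediate node.
    for via in range(n):
        G = np.minimum(G, G[:, via:via + 1] + G[via:via + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Hypothetical features: dims 0-9 = sentiment (+1 positive, -1 negative),
# dims 10-14 = topic A, dims 15-19 = topic B.
rng = np.random.default_rng(0)
X, groups = [], []
for g, (polarity, topic) in enumerate([(1, 0), (1, 1), (-1, 0), (-1, 1)]):
    center = np.zeros(20)
    center[:10] = polarity
    center[10 + 5 * topic:15 + 5 * topic] = 2.0
    X.append(center + 0.3 * rng.standard_normal((8, 20)))
    groups += [g] * 8
X, groups = np.vstack(X), np.array(groups)

Y = isomap(X, n_neighbors=16, n_components=2)  # 20-D -> 2-D
labels = kmeans(Y, k=4)  # one cluster per (sentiment, topic) pair
```

Clustering on the 2-D embedding separates texts by sentiment and topic at once, which is the kind of joint grouping the abstract targets; on real opinion data the feature construction and the choice of learner would matter far more than in this toy case.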

Failure Mode Recognition Clustering Algorithm Based on Manifold Learning

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.2126 ◽

2012 ◽

Vol 263-266 ◽

pp. 2126-2130 ◽

Cited By ~ 1

Author(s):

Zhi Gang Lou ◽

Hong Zhao Liu

Keyword(s):

Manifold Learning ◽

Clustering Algorithm ◽

Learning Algorithm ◽

Spectral Characteristics ◽

Normal Operation ◽

High Dimensional ◽

Dimensional Manifold ◽

Pattern Clustering ◽

Oil Pipeline ◽

Adjustable Valve

Manifold learning is a new unsupervised learning method whose main purpose is to discover the intrinsic structure of the data sets it is given. Applied to high-dimensional, nonlinear fault samples, it identifies the low-dimensional manifold embedded in the high-dimensional data space and can thus effectively reveal the essential characteristics needed for fault identification. Among the many fault types, some failures resemble certain normal operations of the equipment, which leads to misjudgment. In oil pipeline transportation, for example, normal operations such as regulating a pipeline pump, adjusting a valve, or switching a pump produce spectral characteristics similar to those of a pipeline leak, so leaks are easily misidentified. This paper applies a manifold learning algorithm to fault pattern clustering and recognition and evaluates the algorithm experimentally.
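The abstract does not say which manifold learning algorithm is used, so the sketch below swaps in Laplacian eigenmaps, one standard choice, to show the embed-then-cluster idea. It is a minimal NumPy sketch, assuming a dense heat-kernel graph with a hypothetical bandwidth `sigma=1.0`; the 8-dimensional "normal" and "leak" feature vectors are synthetic and hypothetical, and the sketch does not reproduce the harder case of near-identical spectra described above.

```python
import numpy as np


def laplacian_eigenmaps(X, n_components=1, sigma=1.0):
    """Laplacian eigenmaps embedding using the normalised graph Laplacian."""
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-D2 / (2.0 * sigma ** 2))  # heat-kernel affinities
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # L_sym = I - D^{-1/2} W D^{-1/2}; an eigenvector v of L_sym gives
    # y = D^{-1/2} v, a solution of the generalised problem L y = lambda D y.
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0) and keep the next ones.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]


# Hypothetical 8-dimensional spectral features for two operating states.
rng = np.random.default_rng(1)
normal = 0.3 * rng.standard_normal((15, 8))
leak = 1.5 + 0.3 * rng.standard_normal((15, 8))
X = np.vstack([normal, leak])
truth = np.array([0] * 15 + [1] * 15)

y = laplacian_eigenmaps(X, n_components=1)[:, 0]
clusters = (y > 0).astype(int)  # the sign of the 1-D embedding splits the states
```

Because the graph between the two states is only weakly connected, the first nontrivial eigenvector is nearly constant on each state with opposite signs, so a simple threshold acts as the clustering step.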

Preface to the special issue, CMAM 2011, no. 3.

Computational Methods in Applied Mathematics ◽

10.2478/cmam-2011-0014 ◽

2011 ◽

Vol 11 (3) ◽

pp. 272

Author(s):

Ivan Gavrilyuk ◽

Boris Khoromskij ◽

Eugene Tyrtyshnikov

Keyword(s):

Separation Of Variables ◽

Numerical Algorithms ◽

Multilevel Methods ◽

High Dimensional ◽

Special Issue ◽

High Dimensions ◽

Guiding Principle ◽

Tensor Methods ◽

Closed World ◽

The One

In recent years, multidimensional numerical simulations with tensor-structured data formats have been recognized as the basic concept for breaking the "curse of dimensionality". Modern applications of tensor methods include challenging high-dimensional problems in material sciences, bioscience, stochastic modeling, signal processing, machine learning, data mining, financial mathematics, etc. The guiding principle of tensor methods is the approximation of multivariate functions and operators with some separation of variables, which keeps the computational process in a low-parametric tensor-structured manifold. Tensor structures had been widely used as models of data and discussed in the contexts of differential geometry, mechanics, algebraic geometry, data analysis, etc., before tensor methods recently penetrated numerical computation. On the one hand, the existing tensor representation formats remained of limited use in many high-dimensional problems because of a lack of sufficiently reliable and fast software. On the other hand, for moderate-dimensional problems (e.g., in "ab initio" quantum chemistry) as well as for selected model problems of very high dimensions, applying the traditional canonical and Tucker formats in combination with ideas from multilevel methods has led to new efficient algorithms. The recent progress in tensor numerical methods has been achieved with new representation formats now known as "tensor-train representations" and "hierarchical Tucker representations". Note that the formats themselves could have been picked up earlier from the literature on the modeling of quantum systems. Until 2009, they lived in a closed world of quantum theory publications and never trespassed into the territory of numerical analysis.
The tremendous progress of the very recent years is visible both in applications of the new tensor tools and in the development of these tools and the study of their approximation and algebraic properties. This special issue treats tensors as a basis for efficient numerical algorithms in various modern applications, with special emphasis on the new representation formats.
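The guiding principle above, approximating a multivariate function with separated variables so that it stays in a low-parametric tensor-structured manifold, can be illustrated with the TT-SVD algorithm, which builds a tensor-train representation by sequential reshapes and truncated SVDs. This is a minimal NumPy sketch, not code from the special issue; the relative singular-value cutoff `rel_tol` is an assumed parameter. The test function sin(x1 + x2 + x3) is chosen because the addition formula sin(a + b) = sin(a)cos(b) + cos(a)sin(b) separates it, so its tensor-train ranks are 2.

```python
import numpy as np


def tt_svd(tensor, rel_tol=1e-10):
    """Tensor-train decomposition by sequential truncated SVDs."""
    shape = tensor.shape
    cores, ranks = [], [1]
    rest = tensor.copy()
    for k in range(len(shape) - 1):
        # Unfold: (previous rank * current mode) x (all remaining modes).
        rest = rest.reshape(ranks[-1] * shape[k], -1)
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        r = max(1, int(np.sum(s > rel_tol * s[0])))  # truncated rank
        cores.append(u[:, :r].reshape(ranks[-1], shape[k], r))
        rest = s[:r, None] * vt[:r]  # carry S V^T to the next step
        ranks.append(r)
    cores.append(rest.reshape(ranks[-1], shape[-1], 1))
    ranks.append(1)
    return cores, ranks


def tt_full(cores):
    """Contract the cores back into a full tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([g.shape[1] for g in cores])


x = np.linspace(0.0, 1.0, 10)
X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
T = np.sin(X1 + X2 + X3)  # 10 x 10 x 10 = 1000 entries

cores, ranks = tt_svd(T)
rel_err = np.linalg.norm(tt_full(cores) - T) / np.linalg.norm(T)
n_params = sum(g.size for g in cores)
```

Here the cores hold 1·10·2 + 2·10·2 + 2·10·1 = 80 numbers instead of 1000, and the gap grows exponentially with the number of variables, which is the point of breaking the curse of dimensionality.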

A Fast Clustering Algorithm for Large-scale and High Dimensional Data

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.00859 ◽

2009 ◽

Vol 35 (7) ◽

pp. 859-866

Author(s):

Ming LIU ◽

Xiao-Long WANG ◽

Yuan-Chao LIU

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

High Dimensional Data ◽

High Dimensional

Nonlinear dimension reduction for conditional quantiles

Advances in Data Analysis and Classification ◽

10.1007/s11634-021-00439-6 ◽

2021 ◽

Author(s):

Eliana Christou ◽

Annabel Settle ◽

Andreas Artemiou

Keyword(s):

Dimension Reduction ◽

Conditional Quantiles ◽

Nonlinear Dimension Reduction ◽

Nonlinear Dimension

An Efficient Two Stage Clustering Algorithm for Signed Social Networks

2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE) ◽

10.1109/icraie51050.2020.9358276 ◽

2020 ◽

Author(s):

Deepti ◽

Ajay Khunteta ◽

Ajit Noonia

Keyword(s):

Social Networks ◽

Clustering Algorithm ◽

Two Stage ◽

Signed Social Networks

A meta-heuristic density-based subspace clustering algorithm for high-dimensional data

Soft Computing ◽

10.1007/s00500-021-05973-1 ◽

2021 ◽

Author(s):

Parul Agarwal ◽

Shikha Mehta ◽

Ajith Abraham

Keyword(s):

Clustering Algorithm ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional

Efficient Vector Partitioning Algorithms for Graph Clustering

Journal of Data Intelligence ◽

10.26421/jdi1.2-1 ◽

2020 ◽

Vol 1 (2) ◽

pp. 101-123

Author(s):

Hiroaki Shiokawa ◽

Yasunori Futamura

Keyword(s):

Social Networks ◽

Large Scale ◽

Clustering Algorithm ◽

Ground Truth ◽

Graph Clustering ◽

Mining Communities ◽

Fine Grained ◽

Efficient Vector ◽

Public Datasets ◽

Many Core

This paper addresses the problem of finding clusters in graph-structured data such as Web graphs and social networks. Graph clustering is one of the fundamental techniques for understanding the structures present in complex graphs. In the Web and data mining communities, modularity-based graph clustering algorithms have been used successfully in many applications. However, it is difficult for modularity-based methods to find fine-grained clusters hidden in large-scale graphs; these methods fail to reproduce the ground truth. In this paper, we present a novel modularity-based algorithm, CAV, that yields better clustering results than the traditional algorithm. The proposed algorithm incorporates cohesiveness-aware vector partitioning into graph spectral analysis to improve clustering accuracy. This paper also presents P-CAV, an extension of CAV that further improves clustering speed by using thread-based parallelization on a many-core CPU. Our extensive experiments on synthetic and public datasets demonstrate that our approaches outperform state-of-the-art approaches.
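The abstract does not describe cohesiveness-aware vector partitioning in enough detail to reproduce, so the sketch below shows only two textbook ingredients it mentions: the modularity score Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j) and spectral analysis of the modularity matrix B = A − k kᵀ / (2m), here in the form of Newman's leading-eigenvector bisection. It is a minimal NumPy sketch and not CAV or P-CAV; the six-node graph is a hypothetical example.

```python
import numpy as np


def modularity(A, labels):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    k = A.sum(axis=1)  # node degrees
    two_m = k.sum()  # 2m = sum of degrees
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)


def leading_eigenvector_split(A):
    """Bisect the graph by the sign of the leading eigenvector of B."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()  # modularity matrix
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    return (vecs[:, -1] > 0).astype(int)


# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = leading_eigenvector_split(A)
Q = modularity(A, labels)  # 5/14, about 0.357
```

With m = 7 edges, each triangle contributes 3/7 − (7/14)² = 5/28, giving Q = 5/14. Methods in this family struggle with the fine-grained clusters the abstract describes because maximizing Q tends to merge small communities in large graphs.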

A Multi-Step Nonlinear Dimension-Reduction Approach with Applications to Big Data

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2018.2876848 ◽

2019 ◽

Vol 31 (12) ◽

pp. 2249-2261 ◽

Cited By ~ 1

Author(s):

R. Krishnan ◽

V. A. Samaranayake ◽

S. Jagannathan

Keyword(s):

Big Data ◽

Dimension Reduction ◽

Nonlinear Dimension Reduction ◽

Nonlinear Dimension ◽

Reduction Approach

Improved Fuzzy FCM-LI Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.765-767.670 ◽

2013 ◽

Vol 765-767 ◽

pp. 670-673

Author(s):

Li Bo Hou

Keyword(s):

Real Time ◽

Clustering Algorithm ◽

Feature Analysis ◽

Cluster Center ◽

High Dimensional ◽

Fuzzy C Means ◽

Sample Data ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Np Hard Problem

The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, its iterative process requires many calculations, especially when the feature vectors are high-dimensional; applying the clustering algorithm directly to partition such data is not only inefficient but may also lead to the "curse of dimensionality." To address this problem, this paper analyzes the behavior of the fuzzy C-means clustering algorithm on high-dimensional features, where finding the cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of fuzzy C-means clustering in high-dimensional feature analysis, the paper combines it with the landmark Isomap (L-ISOMAP) algorithm and proposes an improved algorithm, FCM-LI. The samples are first analyzed in a preliminary clustering; using these clustering results and the correlation of the sample data, L-ISOMAP reduces the dimension, and further analysis on this basis yields the final results. Finally, experimental results demonstrate the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.
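The iterative cost mentioned above comes from the two alternating FCM updates: memberships u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m−1)) and centers c_i = Σ_j u_ij^m x_j / Σ_j u_ij^m, each of which touches every sample and every feature. A minimal NumPy sketch of standard FCM follows; it does not include the L-ISOMAP stage of FCM-LI, and the fuzzifier `m=2.0`, the tolerance, the random initialisation, and the two-blob data are assumptions chosen for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard FCM: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    W = U ** m
    centers = (W @ X) / W.sum(axis=1, keepdims=True)  # centers for final U
    return centers, U


# Hypothetical 10-dimensional data: two well-separated blobs of 20 samples.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (20, 10)), rng.normal(5.0, 0.5, (20, 10))])
centers, U = fuzzy_c_means(X, c=2)
hard = U.argmax(axis=0)  # hard labels from the fuzzy memberships
```

Every iteration costs O(c² · n + c · n · d) for n samples of dimension d, so reducing d first, as FCM-LI does with L-ISOMAP, shrinks the distance computations directly.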

PCA-K-Means Based Clustering Algorithm for High Dimensional and Overlapping Spectra Signals

2018 Ninth International Conference on Intelligent Control and Information Processing (ICICIP) ◽

10.1109/icicip.2018.8606667 ◽

2018 ◽

Author(s):

Nian Zhang ◽

Keenan Leatham ◽

Jiang Xiong ◽

Jing Zhong

Keyword(s):

Clustering Algorithm ◽

High Dimensional