Opinion Texts Clustering Using Manifold Learning Based on Sentiment and Semantics Analysis

2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Sajjad Jahanbakhsh Gudakahriz ◽  
Amir Masoud Eftekhari Moghadam ◽  
Fariborz Mahmoudi

Opinion texts are now published rapidly on websites and social networks by many users, typically as short texts, in high volumes, and across many fields. Because these texts reflect the opinions of many users, processing and analyzing them, for example by clustering, can be very useful in applications including politics, industry, commerce, and economics. The high dimensionality of text representations reduces clustering efficiency, and an effective remedy is to reduce the dimensionality of the texts. Manifold learning is a powerful tool for nonlinear dimension reduction of high-dimensional data. In this paper, to increase the efficiency of opinion text clustering, manifold learning is used to reduce the dimensions of the represented opinion texts based on sentiment and semantics and to extract their intrinsic dimensions. A clustering algorithm is then applied to the dimension-reduced opinion texts. The proposed approach clusters opinion texts with simultaneous consideration of sentiment and semantics, which has received very little attention in previous work. This type of clustering helps users of opinion texts obtain more useful information from them and also provides more accurate summaries in applications such as opinion text summarization. Experimental results on three datasets show that the proposed approach performs better on opinion texts in terms of important measures of clustering efficiency. An improvement of about 9% in accuracy is observed on the third dataset for clustering based on sentiment and semantics.

2012 ◽  
Vol 263-266 ◽  
pp. 2126-2130 ◽  
Author(s):  
Zhi Gang Lou ◽  
Hong Zhao Liu

Manifold learning is a new unsupervised learning method whose main purpose is to find the inherent regularities of a generated data set. Applied to high-dimensional nonlinear fault samples, it identifies the low-dimensional manifold embedded in the high-dimensional data space and can thereby effectively uncover the essential characteristics of the data for fault identification. Among the many types of fault, some normal equipment operations resemble failures and are prone to misjudgment. In oil pipeline transportation, for example, normal operations such as regulating pumps, adjusting valves, and switching pumps produce spectral characteristics similar to those of a pipeline leakage fault, so leakage is easily misidentified. This paper applies a manifold learning algorithm to fault pattern clustering and recognition and evaluates the algorithm through experiments.


2011 ◽  
Vol 11 (3) ◽  
pp. 272
Author(s):  
Ivan Gavrilyuk ◽  
Boris Khoromskij ◽  
Eugene Tyrtyshnikov

In recent years, multidimensional numerical simulations with tensor-structured data formats have been recognized as the basic concept for breaking the "curse of dimensionality". Modern applications of tensor methods include challenging high-dimensional problems in material sciences, bioscience, stochastic modeling, signal processing, machine learning, data mining, financial mathematics, etc. The guiding principle of tensor methods is to approximate multivariate functions and operators with some separation of variables so that the computational process stays in a low-parametric tensor-structured manifold. Tensor structures had been widely used as models of data and discussed in the contexts of differential geometry, mechanics, algebraic geometry, data analysis, etc. before tensor methods recently penetrated into numerical computations. On the one hand, the existing tensor representation formats remained of limited use in many high-dimensional problems because of the lack of sufficiently reliable and fast software. On the other hand, for moderate-dimensional problems (e.g. in "ab-initio" quantum chemistry) as well as for selected model problems of very high dimension, the application of the traditional canonical and Tucker formats in combination with the ideas of multilevel methods has led to new efficient algorithms. The recent progress in tensor numerical methods has been achieved with new representation formats now known as "tensor-train representations" and "hierarchical Tucker representations". Note that the formats themselves could have been picked up earlier from the literature on the modeling of quantum systems. Until 2009 they lived in a closed world of quantum theory publications and never crossed into the territory of numerical analysis.
The tremendous progress of the very recent years is visible both in applications of the new tensor tools and in the development of these tools and the study of their approximation and algebraic properties. This special issue treats tensors as a basis for efficient numerical algorithms in various modern applications, with special emphasis on the new representation formats.
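For orientation, the "separation of variables" idea behind the canonical and tensor-train formats named above can be written out as follows. This is the standard textbook form rather than notation taken from the special issue; the symbols $R$, $r_k$, $u^{(j)}_k$, and $G_k$ are introduced here for illustration only.

```latex
% Canonical format: a sum of R fully separable terms.
\[
  A(i_1, \dots, i_d) \approx \sum_{k=1}^{R}
    u^{(1)}_k(i_1)\, u^{(2)}_k(i_2) \cdots u^{(d)}_k(i_d)
\]

% Tensor-train format: a chain of small cores G_k linked by
% auxiliary indices \alpha_k of size r_k (the TT ranks).
\[
  A(i_1, \dots, i_d) \approx
    \sum_{\alpha_1=1}^{r_1} \cdots \sum_{\alpha_{d-1}=1}^{r_{d-1}}
    G_1(i_1, \alpha_1)\, G_2(\alpha_1, i_2, \alpha_2) \cdots
    G_d(\alpha_{d-1}, i_d)
\]
```

With $n$ values per index, a full tensor has $n^d$ entries, whereas the canonical format stores $dnR$ numbers and the tensor-train format at most $dnr^2$ with $r = \max_k r_k$, which is how such formats can sidestep the exponential growth in $d$ when the ranks stay small.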


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU

2020 ◽  
Vol 1 (2) ◽  
pp. 101-123
Author(s):  
Hiroaki Shiokawa ◽  
Yasunori Futamura

This paper addresses the problem of finding clusters in graph-structured data such as Web graphs and social networks. Graph clustering is one of the fundamental techniques for understanding the structures present in such complex graphs. In the Web and data mining communities, modularity-based graph clustering algorithms have been used successfully in many applications. However, it is difficult for modularity-based methods to find fine-grained clusters hidden in large-scale graphs; the methods fail to reproduce the ground truth. In this paper, we present a novel modularity-based algorithm, CAV, that shows better clustering results than the traditional algorithm. The proposed algorithm incorporates cohesiveness-aware vector partitioning into graph spectral analysis to improve clustering accuracy. This paper also presents a novel efficient algorithm, P-CAV, that further improves the clustering speed of CAV; P-CAV is an extension of CAV that uses thread-based parallelization on a many-core CPU. Our extensive experiments on synthetic and public datasets demonstrate the superior performance of our approaches over state-of-the-art approaches.
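As background for the term "modularity-based", the sketch below computes the standard Newman modularity score of a given partition of an undirected, unweighted graph. It is not the CAV or P-CAV algorithm, whose details the abstract does not give; the function name `modularity` and the example graph are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its community label
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal_edges[community[u]] += 1

    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[community[node]] += k

    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Illustrative graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, roughly 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph at the bridge scores higher than placing every node in one community, which is the kind of signal a modularity-based method optimizes.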


2013 ◽  
Vol 765-767 ◽  
pp. 670-673
Author(s):  
Li Bo Hou

The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, its iterative process requires a large amount of computation, especially when the feature vectors are high-dimensional; clustering such data directly is not only inefficient but may also run into the "curse of dimensionality". To address this problem, this paper analyzes the behavior of fuzzy C-means clustering on high-dimensional features, where determining the cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of fuzzy C-means clustering in high-dimensional feature analysis, the paper combines it with the landmark isometric mapping (L-ISOMAP) algorithm and proposes an improved algorithm, FCM-LI. The samples are first given a preliminary analysis; then, using the clustering results and the correlation of the sample data, L-ISOMAP is applied to reduce the dimension, and further analysis on that basis yields the final results. Finally, experimental results demonstrate the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.
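To make the iterative cost mentioned above concrete, here is a minimal sketch of plain fuzzy C-means, alternating the usual membership and center updates. It does not include the L-ISOMAP step or any other part of FCM-LI; the function names, the caller-supplied initial centers, and the default fuzzifier `m=2.0` are assumptions made for illustration.

```python
import math


def update_memberships(points, centers, m):
    """u[j][i] = d_ji^(-2/(m-1)) / sum_k d_jk^(-2/(m-1)) for point j, center i."""
    exponent = -2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
        else:
            inv = [d ** exponent for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])
    return memberships


def update_centers(points, memberships, m):
    """Each center is the mean of all points weighted by membership ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for c in range(n_clusters):
        weights = [row[c] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=100, tol=1e-9):
    """Alternate the two updates until the centers move less than tol."""
    centers = [list(c) for c in init_centers]
    memberships = update_memberships(points, centers, m)
    for _ in range(n_iter):
        new_centers = update_centers(points, memberships, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        memberships = update_memberships(points, centers, m)
        if shift < tol:
            break
    return centers, memberships


# Illustrative data: two small, well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(points, init_centers=[(0, 0), (10, 10)])
```

Every iteration computes a distance between each point and each center over all feature dimensions, so the cost per iteration grows with the dimension; that is the motivation the abstract gives for reducing the dimension before clustering.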

