Visualizing text similarities from a graph-based SOM

2015 ◽  
Vol 14 (7) ◽  
pp. 5877-5886
Author(s):  
Khalid Kahloot ◽  
Mohammad A. Mikki ◽  
Akram A. ElKhatib

Text in articles is based on the expert opinion of a large number of people, including the views of the authors. These views are shaped by cultural or community aspects, which makes extracting information from text very difficult. This paper shows how to use a modified graph-based Self-Organizing Map (SOM) to display text similarities. Text similarities are extracted from an article using Google's PageRank algorithm. Sentences from an input article are represented as a graph model instead of a vector space model. The resulting graph can be shown as a visual animation of the execution of eight well-known graph algorithms, with animation speed control. The resulting graph is then used as input to the SOM, whose clustering algorithm is used to construct knowledge from the text data. According to the similarity measure, an adjustable number of the most similar sentences are arranged in visual form. In addition, this paper presents a wide variety of text search options. We compared our project with well-known clustering and visualization projects in terms of purity, entropy, and F-measure. Our project showed acceptable results and was superior to the other projects in most cases.
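The idea of ranking sentences with PageRank over a sentence graph can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Jaccard word-overlap edge weight, the damping factor of 0.85, and the function names `tokenize`, `build_sentence_graph`, and `pagerank` are all assumptions made for the example.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_sentence_graph(sentences):
    """Build an undirected weighted graph: one node per sentence.

    Edge weight is the Jaccard overlap of the two word sets
    (an assumed similarity measure, chosen for simplicity).
    """
    tokens = [tokenize(s) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        union = tokens[i] | tokens[j]
        if not union:
            continue
        weight = len(tokens[i] & tokens[j]) / len(union)
        if weight > 0:
            graph[i][j] = weight
            graph[j][i] = weight
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an adjacency dict."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in graph}
    out_weight = {node: sum(graph[node].values()) for node in graph}
    for _ in range(iterations):
        # Rank held by nodes without edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in graph if out_weight[node] == 0)
        new_rank = {}
        for node in graph:
            incoming = sum(
                rank[nb] * w / out_weight[nb] for nb, w in graph[node].items()
            )
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "the dog sat on the mat",
        "quantum physics",
    ]
    ranks = pagerank(build_sentence_graph(sentences))
    for index in sorted(ranks, key=ranks.get, reverse=True):
        print(round(ranks[index], 3), sentences[index])
```

Sentences that share vocabulary with many others accumulate rank, while a sentence with no overlap keeps only the baseline share.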

2014 ◽  
Vol 556-562 ◽  
pp. 3945-3948
Author(s):  
Xin Qing Geng ◽  
Hong Yan Yang ◽  
Feng Mei Tao

This paper applies the dynamic self-organizing map algorithm to determine the number of clusters. The text feature vector is obtained using the vector space model (VSM) and the TF-IDF method. The number of clusters is obtained by the dynamic self-organizing map, and the threshold GT controls the network’s growth. Compared with the traditional fuzzy clustering algorithm, the proposed algorithm achieves higher precision. An example demonstrates the effectiveness of the proposed algorithm.
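As a rough illustration of how a text feature vector can be built with the vector space model and TF-IDF, consider the sketch below. The whitespace tokenizer, the unsmoothed weighting `tf * log(N / df)`, and the function name `tfidf_vectors` are assumptions for the example; the abstract does not state which TF-IDF variant the paper uses.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            [
                (counts[term] / length) * math.log(n_docs / doc_freq[term])
                for term in vocabulary
            ]
        )
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = tfidf_vectors(["text som", "text fuzzy"])
    print(vocab)
    print(vecs)
```

A term that occurs in every document receives a weight of zero under this variant, so only discriminating terms contribute to the vector.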


Author(s):  
U. K. Sridevi ◽  
N. Nagaveni

Clustering is an important technique for finding relevant content in a document collection, and it also reduces the search space. Current clustering research emphasizes the development of more efficient clustering methods without considering domain knowledge and the user’s needs. In recent years, the semantics of documents have been utilized in document clustering. The work discussed here focuses on a clustering model to which an ontology approach is applied. The major challenge is to use background knowledge in the similarity measure. This paper presents an ontology-based document annotation and clustering system. A semi-automatic document annotation and concept weighting scheme is used to create an ontology-based knowledge base. The Particle Swarm Optimization (PSO) clustering algorithm can be applied to obtain the clustering solution. Clustering accuracy was computed before and after combining the ontology with the Vector Space Model (VSM). The proposed ontology-based framework gives improved performance and better clustering than the traditional vector space model. The results obtained using the ontology were significant and promising.


2013 ◽  
Vol 325-326 ◽  
pp. 1489-1492
Author(s):  
Tie Qi Li ◽  
Wen Shuo Zhang

Finding useful information in such a huge volume of information has become a problem. To deal with hierarchical relations in text data, a novel method, called automatic non-negative matrix factorization for hierarchical clustering, is proposed for text mining. We use the vector space model as the research foundation and mainly discuss two problems: feature selection and weight calculation. The experimental results on real data sets demonstrate that our method outperforms, on average, all six of the other methods.


2013 ◽  
Vol 427-429 ◽  
pp. 2449-2453
Author(s):  
Rong Ze Xia ◽  
Yan Jia ◽  
Hu Li

Traditional supervised classification methods such as the support vector machine (SVM) can achieve high performance in text categorization. However, the samples must first be hand-labeled before classification, which is a time-consuming task. Unsupervised methods such as k-means can also be used for text categorization. However, traditional k-means is easily affected by a few isolated observations. In this paper, we propose a new text categorization method. First, we improve the traditional k-means clustering algorithm. The improved k-means is used to cluster vectors in our vector space model. We then use the SVM to categorize the vectors preprocessed by the improved k-means. The experiments show that our algorithm can outperform the traditional SVM text categorization method.
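For reference, the traditional k-means step that the paper takes as its starting point can be sketched as below. This is plain Lloyd-style k-means, not the improved variant, whose details the abstract does not give; the SVM stage is omitted. The random initialization from the data points and the names `squared_distance` and `kmeans` are assumptions for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Traditional k-means: returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(points, k=2)
    print(labels, centers)
```

Because each centroid is a plain mean, a single distant observation can drag it away from the bulk of its cluster, which is the sensitivity to isolated observations the abstract refers to.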


2013 ◽  
Vol 475-476 ◽  
pp. 968-971
Author(s):  
Hai Xue Liu ◽  
Rui Jun Yang ◽  
Wen Ju Li ◽  
Wan Jun Yu ◽  
Wei Lu

In this paper, we present an improved text clustering algorithm. It not only maintains the self-organizing features of the SOM network but also compensates for the poor clustering caused by inadequate selection in the K-means algorithm. First, the data is preprocessed to form a vector space model for subsequent processing. Then, we analyze the features of the original clustering algorithm and the SOM algorithm, and design an improved SOM clustering algorithm to overcome the low stability and poor quality of the original algorithm. The experimental results indicate that the improved algorithm has higher accuracy and better stability than the original algorithm.


2013 ◽  
Vol 655-657 ◽  
pp. 1000-1004
Author(s):  
Chen Guang Yan ◽  
Yu Jing Liu ◽  
Jin Hui Fan

The SOM (Self-Organizing Map) algorithm is an unsupervised clustering method. This paper introduces an improved algorithm based on SOM neural network clustering. It presents the basic theory of SOM for data clustering. To address practical problems of SOM in applications, the algorithm improves the selection of initial weights and the scope of the neighborhood parameters. Finally, simulation results in Matlab show that the improved clustering algorithm increases the accuracy and computational efficiency of data clustering and speeds up convergence.
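The basic SOM training loop that such improvements build on can be sketched as follows. This is the standard algorithm under assumed choices, not the paper's improved version: uniform random initial weights, a Gaussian neighborhood, linearly decaying learning rate and radius, and the names `best_matching_unit` and `train_som` are all illustrative.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    # Assumed initialization: uniform random weights in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)  # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2 * radius ** 2))
                # Pull the node's weights toward the sample.
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
    som = train_som(data, rows=1, cols=4)
    for sample in data:
        print(sample, "->", best_matching_unit(som, sample))
```

The initial weights and the neighborhood schedule are exactly the two places the abstract says the improved algorithm changes, so they are kept as explicit, replaceable choices here.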


2019 ◽  
Vol 63 (3) ◽  
pp. 469-478
Author(s):  
Na Su ◽  
Shujuan Ji ◽  
Jimin Liu

Microblogging is a popular form of social networking in which hot topics propagate rapidly online. Real-time topic detection not only helps in understanding public opinion but also brings high commercial value. We design a method for real-time microblog data analysis in order to detect popular long-lasting events as well as emerging events. First, a frequent-item mining algorithm for microblog data streams is proposed to count approximate word frequencies. This algorithm can find the words that are frequent over a period of time. Second, the window size of the monitored words is adjusted dynamically according to the duration and evolution of events. Finally, new topics and trends of existing topics can be detected using a dynamic clustering algorithm based on the vector space model. Experimental results show that the proposed algorithms can improve performance in terms of running time and accuracy.
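Approximate word-frequency counting on a stream can be illustrated with the Misra-Gries summary. This is a stand-in technique chosen for the example; the abstract does not name the frequent-item algorithm the paper proposes. The function name `misra_gries` and the parameter `k` are assumptions.

```python
def misra_gries(stream, k):
    """Misra-Gries summary using at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n
    is guaranteed to appear in the result; each reported count
    underestimates the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    words = "a b a c a d a e a f".split()
    print(misra_gries(words, k=3))
```

The memory used is bounded by `k` regardless of stream length, which is what makes this family of algorithms suitable for real-time streams.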


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 873 ◽  
Author(s):  
Amin Ullah ◽  
Kilichbek Haydarov ◽  
Ijaz Ul Haq ◽  
Khan Muhammad ◽  
Seungmin Rho ◽  
...  

The exponential growth in population and its overall reliance on electrical and electronic devices have increased the demand for energy production. This calls for precise energy management systems that can forecast consumers’ usage for future policymaking. Embedded smart sensors attached to electricity meters and home appliances enable power suppliers to effectively analyze energy usage in order to generate and distribute electricity to residential areas based on their level of energy consumption. Therefore, this paper proposes a clustering-based analysis of energy consumption to categorize consumers’ electricity usage into different levels. First, a deep autoencoder that transfers the low-dimensional energy consumption data to high-level representations was trained. Second, the high-level representations were fed into an adaptive self-organizing map (SOM) clustering algorithm. Afterward, the levels of electricity consumption were established by conducting a statistical analysis of the clustered data. Finally, the results were visualized in graphs and calendar views, and the predicted levels of energy consumption were plotted over the city map, providing a compact overview to the providers for energy utilization analysis.



