Visualizing text similarities from a graph-based SOM

This paper applies the dynamic self-organizing map algorithm to determining the number of clusters. Text feature vectors are built with the vector space model (VSM) and TF-IDF weighting, and the number of clusters is then obtained by the dynamic self-organizing map, whose growth is controlled by the threshold GT. Compared with the traditional fuzzy clustering algorithm, the proposed algorithm achieves higher precision, and an example demonstrates its effectiveness.
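A minimal Python sketch of the two stages described above, under stated assumptions: documents arrive already tokenised, TF-IDF uses the plain tf × log(N/df) form, and the growth rule (spawn a new unit whenever the best-matching unit is farther than GT in cosine distance) is a simplified, hypothetical stand-in for the paper's dynamic SOM, not its published procedure. The helper names `tfidf_vectors`, `grow_units` and `assign` are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (dict: term -> weight) per tokenised doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def grow_units(vectors, gt, lr=0.5, epochs=5):
    """Hypothetical growth rule: add a unit when the best match is farther than gt
    (cosine distance); otherwise move the winning unit toward the input."""
    units = [dict(vectors[0])]
    for _ in range(epochs):
        for v in vectors:
            sims = [cosine(v, u) for u in units]
            best = max(range(len(units)), key=sims.__getitem__)
            if 1.0 - sims[best] > gt:
                units.append(dict(v))  # grow the network
            else:
                u = units[best]
                for t in set(u) | set(v):
                    u[t] = u.get(t, 0.0) + lr * (v.get(t, 0.0) - u.get(t, 0.0))
    return units


def assign(vectors, units):
    """Label each document with its most similar unit; len(units) is the cluster count."""
    return [max(range(len(units)), key=lambda i: cosine(v, units[i])) for v in vectors]
```

Lowering `gt` lets the network grow more units, so the number of clusters falls out of the threshold rather than being fixed in advance.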

An Ontology Based Model for Document Clustering

International Journal of Intelligent Information Technologies ◽

10.4018/jiit.2011070105 ◽

2011 ◽

Vol 7 (3) ◽

pp. 54-69 ◽

Cited By ~ 13

Author(s):

U. K. Sridevi ◽

N. Nagaveni

Keyword(s):

Vector Space ◽

Domain Knowledge ◽

Clustering Algorithm ◽

Document Clustering ◽

Vector Space Model ◽

Search Space ◽

Space Model ◽

Clustering Model ◽

Document Collection ◽

Improved Performance

Clustering is an important technique for finding relevant content in a document collection, and it also reduces the search space. Current clustering research emphasizes developing more efficient clustering methods without considering domain knowledge and users’ needs. In recent years, the semantics of documents have been exploited in document clustering. This work focuses on a clustering model in which an ontology-based approach is applied; the major challenge is to incorporate background knowledge into the similarity measure. The paper presents an ontology-based document annotation and clustering system. A semi-automatic document annotation and concept weighting scheme is used to create an ontology-based knowledge base, and the Particle Swarm Optimization (PSO) clustering algorithm is then applied to obtain the clustering solution. Clustering accuracy is computed before and after combining the ontology with the Vector Space Model (VSM). The proposed ontology-based framework gives improved performance and better clustering than the traditional vector space model, and the results obtained with the ontology are significant and promising.
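The abstract does not give the PSO formulation, so the following is a sketch of the common global-best variant in which each particle encodes k centroids and fitness is the mean distance from each point to its nearest centroid. The inputs would be the document vectors (for example ontology-weighted VSM vectors); here they are plain numeric tuples. The inertia and acceleration values (`w=0.72`, `c1=c2=1.49`) are assumed, commonly used settings, and all function names are hypothetical.

```python
import math
import random


def quantization_error(centroids, data):
    """PSO fitness: mean distance from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(x, c) for c in centroids) for x in data) / len(data)


def pso_cluster(data, k, n_particles=10, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO in which each particle is a list of k candidate centroids."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every particle on k distinct data points.
    pos = [[list(p) for p in rng.sample(data, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [quantization_error(p, data) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # inertia + pull toward personal best + pull toward global best
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = quantization_error(pos[i], data)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest, gbest_fit


def assign_labels(data, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j])) for x in data]
```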

A Sequence Clustering Algorithm for Detecting Software Vulnerabilities Based on Vector Space Model

International Journal on Advances in Information Sciences and Service Sciences ◽

10.4156/aiss.vol4.issue16.30 ◽

2012 ◽

Vol 4 (16) ◽

pp. 258-264

Author(s):

Yanyan WANG ◽

Yanning WANG ◽

Jiadong REN

Keyword(s):

Vector Space ◽

Clustering Algorithm ◽

Vector Space Model ◽

Space Model ◽

Software Vulnerabilities ◽

Sequence Clustering

The Automatic Non-Negative Matrix Factorization of the Hierarchy Clustering Method

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.325-326.1489 ◽

2013 ◽

Vol 325-326 ◽

pp. 1489-1492

Author(s):

Tie Qi Li ◽

Wen Shuo Zhang

Keyword(s):

Matrix Factorization ◽

Vector Space Model ◽

Real Data ◽

Data Sets ◽

Text Data ◽

Space Model ◽

Hierarchical Relations ◽

Weight Calculation ◽

Novel Method ◽

Non Negative Matrix Factorization

Finding useful information in such a huge volume of information has become a problem. To deal with hierarchical relations in text data, a novel text-mining method called automatic non-negative matrix factorization of the hierarchy clustering is proposed. Taking the vector space model as its foundation, the work mainly discusses two problems: feature selection and weight calculation. Experimental results on real data sets show that, on average, the method outperforms all six other methods it was compared against.
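As a rough illustration of the building block, the sketch below factorises a non-negative term-document matrix V ≈ WH with the standard Lee-Seung multiplicative updates and assigns each document to its dominant latent topic. It is a flat NMF in plain Python; the abstract does not describe how the paper's automatic method chooses the rank or builds the hierarchy, so those parts are not shown, and the function names are hypothetical.

```python
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, r, iters=300, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimising ||V - WH||_F^2.
    V is terms x documents (non-negative); W is terms x r, H is r x documents."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        wt = transpose(w)
        # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(r)]
        ht = transpose(h)
        # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(n)]
    return w, h


def doc_clusters(h):
    """Assign each document (column of H) to the latent topic with the largest coefficient."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]
```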

A Text Categorization Method Based on SVM and Improved K-Means

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.427-429.2449 ◽

2013 ◽

Vol 427-429 ◽

pp. 2449-2453

Author(s):

Rong Ze Xia ◽

Yan Jia ◽

Hu Li

Keyword(s):

Support Vector Machine ◽

Vector Space ◽

High Performance ◽

Supervised Classification ◽

Text Categorization ◽

Clustering Algorithm ◽

Vector Space Model ◽

Classification Method ◽

Support Vector ◽

Space Model

Traditional supervised classification methods such as the support vector machine (SVM) can achieve high performance in text categorization. However, the samples must first be labeled by hand before classification, which is a time-consuming task. Unsupervised methods such as k-means can also be used for text categorization, but traditional k-means is easily affected by a few isolated observations. In this paper, we propose a new text categorization method. First, we improve the traditional k-means clustering algorithm, and the improved k-means is used to cluster the vectors of our vector space model. Then, an SVM is used to categorize the vectors preprocessed by the improved k-means. Experiments show that our algorithm can outperform the traditional SVM text categorization method.
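The abstract does not say how k-means was improved. One hypothetical way to reduce the influence of isolated observations is to filter them out before clustering, as sketched below: `remove_isolated` drops points with too few neighbours within a radius (both parameters are assumptions), and `kmeans` uses deterministic farthest-first seeding, which makes the outlier problem easy to see.

```python
import math


def remove_isolated(points, radius, min_neighbors=1):
    """Hypothetical pre-filter: keep only points with at least min_neighbors
    other points within radius."""
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: math.dist(p, centroids[j]))].append(p)
        # Empty clusters keep their previous centroid.
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    groups = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    raw = groups + [(50, 50)]
    print(kmeans(raw, 2))                               # the outlier captures a centroid
    print(kmeans(remove_isolated(raw, radius=2.0), 2))  # the two real groups are recovered
```

On the raw points, farthest-first seeding picks the outlier (50, 50) as a centroid and the two real groups merge into one cluster; after filtering, both groups are recovered. The resulting clusters could then supply labels for an SVM trained with a standard library such as scikit-learn's `SVC`; that stage is omitted here.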

Research on Clustering Analysis Based on SOM

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.475-476.968 ◽

2013 ◽

Vol 475-476 ◽

pp. 968-971

Author(s):

Hai Xue Liu ◽

Rui Jun Yang ◽

Wen Ju Li ◽

Wan Jun Yu ◽

Wei Lu

Keyword(s):

Clustering Analysis ◽

Clustering Algorithm ◽

Poor Quality ◽

Original Algorithm ◽

Space Model ◽

Som Algorithm ◽

Som Network ◽

Som Clustering ◽

Selection Of

In this paper, we present an improved text clustering algorithm. It retains the self-organizing properties of the SOM network while compensating for the poor clustering caused by inadequate selection in the K-means algorithm. First, the data are preprocessed into a vector space model for subsequent processing. Then, we analyze the characteristics of the original clustering algorithm and of the SOM algorithm, and design an improved SOM clustering algorithm that overcomes the low stability and poor quality of the original algorithm. Experimental results indicate that the improved algorithm achieves higher accuracy and better stability than the original algorithm.
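One possible reading of the hybrid described above, sketched under assumptions: a small 1-D SOM is trained first and its prototype vectors replace K-means' random initial centres. The schedule (linearly decaying learning rate and Gaussian neighbourhood radius) and the names `train_som_1d` and `kmeans_seeded` are hypothetical and are not taken from the paper.

```python
import math
import random


def train_som_1d(data, n_units, epochs=30, lr0=0.5, seed=0):
    """Train a 1-D SOM chain; learning rate and neighbourhood radius decay linearly."""
    rng = random.Random(seed)
    protos = [list(p) for p in rng.sample(data, n_units)]
    radius0 = max(n_units / 2, 1.0)
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = t / total
            lr, radius = lr0 * (1 - frac), radius0 * (1 - frac) + 0.01
            bmu = min(range(n_units), key=lambda i: math.dist(x, protos[i]))
            for i in range(n_units):
                # Gaussian neighbourhood over positions along the chain
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                protos[i] = [p + lr * h * (xi - p) for p, xi in zip(protos[i], x)]
            t += 1
    return protos


def kmeans_seeded(data, centroids, iters=100):
    """Lloyd's k-means started from the given centroids instead of random picks."""
    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[min(range(k), key=lambda j: math.dist(x, centroids[j]))].append(x)
        new = [tuple(sum(v) / len(v) for v in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids
```

Usage: `kmeans_seeded(data, train_som_1d(data, k))`, where the SOM stage supplies k starting centres that already reflect the data's layout.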

An Improving Algorithm Based on SOM Clustering and its Applications

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.655-657.1000 ◽

2013 ◽

Vol 655-657 ◽

pp. 1000-1004

Author(s):

Chen Guang Yan ◽

Yu Jing Liu ◽

Jin Hui Fan

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Network Clustering ◽

Self Organizing Map ◽

Clustering Method ◽

Map Algorithm ◽

Som Neural Network ◽

Simulation Results ◽

Som Clustering ◽

Selection Of

The SOM (Self-Organizing Map) algorithm is an unsupervised clustering method. This paper introduces an improved algorithm based on SOM neural network clustering and presents the basic theory of SOM for data clustering. To address SOM's practical problems in applications, the algorithm improves the selection of the initial weights and the neighborhood range parameters. Finally, simulation results in Matlab show that the improved clustering algorithm raises the accuracy and computational efficiency of data clustering and speeds up convergence.
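The abstract names two tuned components, the initial weights and the neighbourhood range, without giving details. The sketch below shows where each enters a 2-D SOM, assuming a deterministic initialisation that spreads the weights evenly over the data's bounding box and exponential decay of both the learning rate and the neighbourhood radius. These choices, the default values and the function names are illustrative assumptions, not the paper's rules; for brevity it handles 2-D inputs only.

```python
import math


def init_grid(data, rows, cols):
    """Hypothetical deterministic initialisation: spread 2-D weights evenly over the
    data's bounding box instead of drawing them at random."""
    xs, ys = [p[0] for p in data], [p[1] for p in data]

    def lerp(lo, hi, i, n):
        return lo if n == 1 else lo + (hi - lo) * i / (n - 1)

    return {(r, c): [lerp(min(xs), max(xs), c, cols), lerp(min(ys), max(ys), r, rows)]
            for r in range(rows) for c in range(cols)}


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(x, weights[node]))


def train_som(data, rows, cols, epochs=40, lr0=0.5, lr_end=0.01, sigma_end=0.5):
    """2-D SOM whose learning rate and neighbourhood radius both decay exponentially."""
    weights = init_grid(data, rows, cols)
    sigma0 = max(rows, cols) / 2  # initial neighbourhood range (assumed)
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            frac = t / total
            lr = lr0 * (lr_end / lr0) ** frac
            sigma = sigma0 * (sigma_end / sigma0) ** frac
            bmu = best_matching_unit(weights, x)
            for node, wv in list(weights.items()):
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                weights[node] = [wi + lr * h * (xi - wi) for wi, xi in zip(wv, x)]
            t += 1
    return weights
```

Because the initialisation is deterministic, repeated runs on the same data give the same map, which is one way to address run-to-run instability of random initial weights.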

Real-Time Topic Detection with Dynamic Windows

The Computer Journal ◽

10.1093/comjnl/bxz042 ◽

2019 ◽

Vol 63 (3) ◽

pp. 469-478

Author(s):

Na Su ◽

Shujuan Ji ◽

Jimin Liu

Keyword(s):

Data Analysis ◽

Real Time ◽

Data Stream ◽

Clustering Algorithm ◽

Vector Space Model ◽

Topic Detection ◽

Dynamic Clustering ◽

Improve Performance ◽

Space Model ◽

Frequent Items

Microblogging is a popular social network in which hot topics propagate rapidly online. Real-time topic detection not only helps in understanding public opinion but also brings high commercial value. We design a method for real-time microblog data analysis that detects popular long-lasting events as well as emerging events. First, a frequent-item mining algorithm over the microblog data stream is proposed to count approximate word frequencies; it finds the words that remain frequent over a period of time. Second, the window size of each monitored word is adjusted dynamically according to the duration and evolution of events. Finally, new topics and trends of existing topics are detected using a dynamic clustering algorithm based on the vector space model. Experimental results show that the proposed algorithms can improve performance in terms of running time and accuracy.
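The paper's own frequent-item algorithm is not specified in the abstract. As a stand-in, the sketch below implements the well-known Space-Saving algorithm, which keeps at most k counters over a word stream: each stored count overestimates the true frequency by at most the recorded error, and any word occurring more than N/k times in a stream of N words is guaranteed to be monitored. The dynamic window adjustment is not shown, and the function names are hypothetical.

```python
def space_saving(stream, k):
    """Space-Saving summary: at most k monitored words, each mapped to (count, max_overestimate)."""
    counters = {}
    for word in stream:
        if word in counters:
            count, err = counters[word]
            counters[word] = (count + 1, err)
        elif len(counters) < k:
            counters[word] = (1, 0)
        else:
            # Evict the word with the smallest count; the newcomer inherits that count.
            victim = min(counters, key=lambda w: counters[w][0])
            min_count, _ = counters.pop(victim)
            counters[word] = (min_count + 1, min_count)
    return counters


def guaranteed_frequent(counters, threshold):
    """Words whose lower-bound frequency (count - error) reaches threshold."""
    return [w for w, (count, err) in counters.items() if count - err >= threshold]
```

Memory stays bounded by k regardless of vocabulary size, which is what makes this family of algorithms suitable for a live microblog stream.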

Deep Learning Assisted Buildings Energy Consumption Profiling Using Smart Meter Data

Sensors ◽

10.3390/s20030873 ◽

2020 ◽

Vol 20 (3) ◽

pp. 873 ◽

Cited By ~ 8

Author(s):

Amin Ullah ◽

Kilichbek Haydarov ◽

Ijaz Ul Haq ◽

Khan Muhammad ◽

Seungmin Rho ◽

...

Keyword(s):

Energy Consumption ◽

Clustering Algorithm ◽

Energy Utilization ◽

Self Organizing Map ◽

Residential Areas ◽

Energy Management Systems ◽

Consumption Data ◽

Som Clustering ◽

Low Dimensional ◽

High Level

The exponential growth in population and its overall reliance on electrical and electronic devices have increased the demand for energy production. This calls for precise energy management systems that can forecast consumers’ usage for future policymaking. Embedded smart sensors attached to electricity meters and home appliances enable power suppliers to analyze energy usage effectively, so that electricity can be generated and distributed to residential areas according to their level of energy consumption. Therefore, this paper proposes a clustering-based analysis of energy consumption that categorizes consumers’ electricity usage into different levels. First, a deep autoencoder was trained to transform the low-dimensional energy consumption data into high-level representations. Second, the high-level representations were fed into an adaptive self-organizing map (SOM) clustering algorithm. Afterward, the levels of electricity consumption were established by conducting statistical analysis on the clustered data. Finally, the results were visualized in graphs and calendar views, and the predicted levels of energy consumption were plotted over the city map, giving providers a compact overview for energy utilization analysis.
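The step that turns clusters into consumption levels can be sketched as ranking clusters by their mean consumption. The code assumes cluster labels have already been produced (for example by the SOM stage) and that there is one level name per cluster; the level names and the use of the mean as the ranking statistic are assumptions, since the abstract does not specify the statistical analysis.

```python
from statistics import mean


def consumption_levels(consumption, labels, level_names=("low", "medium", "high")):
    """Map each cluster label to a level name by ranking clusters on mean consumption."""
    by_cluster = {}
    for value, label in zip(consumption, labels):
        by_cluster.setdefault(label, []).append(value)
    ranked = sorted(by_cluster, key=lambda c: mean(by_cluster[c]))  # lowest mean first
    if len(ranked) != len(level_names):
        raise ValueError("need exactly one level name per cluster")
    return {c: level_names[i] for i, c in enumerate(ranked)}
```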

An Ontology Based Model for Document Clustering

Organizational Efficiency through Intelligent Information Technologies ◽

10.4018/978-1-4666-2047-6.ch013 ◽

2012 ◽

pp. 199-215

Author(s):

Sridevi U. K. ◽

Nagaveni N.

Keyword(s):

Vector Space ◽

Domain Knowledge ◽

Clustering Algorithm ◽

Document Clustering ◽

Vector Space Model ◽

Search Space ◽

Space Model ◽

Before And After ◽

Document Collection ◽

Improved Performance
