A New Vector Space Model Exploiting Semantic Correlations of Social Annotations for Web Page Clustering

Author(s):  
Xiwu Gu ◽  
Xianbing Wang ◽  
Ruixuan Li ◽  
Kunmei Wen ◽  
Yufei Yang ◽  
...  
2013 ◽  
Vol 846-847 ◽  
pp. 1801-1804
Author(s):  
Li Wei ◽  
Ling Zhang ◽  
Hua Mei Li ◽  
Xiao Zhou Chen

Chinese web page classification has become an active research area in data mining. This paper proposes a Chinese web page classification algorithm based on the vector space model. The algorithm applies supervised machine learning to build a web page classifier. For feature extraction it combines text frequency with other methods, and it improves the traditional TF-IDF weighting formula. The results show that the classifier is feasible and effective.
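The abstract does not give the improved TF-IDF formula. As a point of reference, the sketch below shows only the classic weighting that it starts from, w(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The function name `tfidf_vectors` and the toy token lists are illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by tf * log(N / df), the classic TF-IDF scheme (not the paper's variant)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


# Hypothetical, already-tokenized documents.
docs = [
    ["web", "page", "classification"],
    ["web", "crawler", "music"],
    ["music", "theme", "music"],
]
for vec in tfidf_vectors(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

A term that occurs in every document gets weight 0 because log(N / N) = 0. Modified TF-IDF schemes usually adjust exactly this behaviour or the raw tf factor.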


2014 ◽  
Vol 543-547 ◽  
pp. 2957-2960 ◽  
Author(s):  
Xiu Xia Chen ◽  
Wen Qian Shang

This paper designs an automatic web crawler system that crawls music resources on the Internet. It first presents the system architecture and the function of each module, then describes the detailed design of each module, and finally describes in detail the key technologies and algorithms used in the system: the χ2 statistic for selecting feature words, the TF-IDF algorithm for weighting feature words, and the vector space model for computing the relevance of a web page to the music theme.
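The χ2 feature-selection step can be sketched with the standard 2×2 contingency form, χ2(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). Here A and B count documents that contain term t inside and outside class c, and C and D count documents that lack t inside and outside c. This is a minimal illustration under that textbook definition. The function names, the binary "music"/"other" labels and the toy corpus are hypothetical, and the abstract does not say how the crawler applies the statistic.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic between `term` and class `target` over a labeled corpus.

    docs: list of token lists; labels: the class label of each document.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # in class, contains term
        elif has_term:
            b += 1  # other class, contains term
        elif in_class:
            c += 1  # in class, lacks term
        else:
            d += 1  # other class, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, target, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = {t for tokens in docs for t in tokens}
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, target), t))
    return ranked[:k]


# Hypothetical labeled pages.
docs = [["music", "mp3"], ["music", "lyrics"], ["news", "sports"], ["news", "weather"]]
labels = ["music", "music", "other", "other"]
print(select_features(docs, labels, "music", 2))
```

χ2 scores strong negative association as highly as positive association. In this toy corpus, "news" therefore ranks as high as "music" for the music class.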


Author(s):  
Anthony Anggrawan ◽  
Azhari

Information Retrieval is the process of searching for documents that meet a user's information need, expressed as a query. This research uses the Vector Space Model to determine the percentage similarity of each student's assignment. The application is implemented in PHP with a MySQL database. Results are presented as a ranking of documents by their similarity to the query, with a mean average precision (MAP) of 0.874. This value reflects how accurately the application's rankings match the experts' assessments, based on an evaluation of 5 queries against 25 sample documents. The similarity computation takes longer as more assignments are compared, so its running time depends on the number of submitted assignments.
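The MAP figure can be made concrete with a short sketch. It assumes the common non-interpolated definition: for each query, average precision@k at every rank k where a relevant document appears, divide by the number of relevant documents, then take the mean over queries. The abstract does not state which AP variant was used. The function names and the two example queries are hypothetical.

```python
def average_precision(ranking, relevant):
    """Sum precision@k at each rank k holding a relevant document, divided by |relevant|."""
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings and expert relevance judgments for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),                    # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 3))
```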


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is a facility that provides information and knowledge resources and helps readers obtain information for academic purposes. Because a library holds a huge number of books, readers often have difficulty finding the books they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it lacks a recommendation feature that helps readers find books relevant to a specific book they have chosen. The application uses the Vector Space Model to represent documents as vectors, and it bases its recommendations on the similarity of book descriptions. In testing with a single-language sample of relevant books, the application achieved an F-measure of 55% with a cosine similarity threshold of 0.1. Both the book descriptions and the variety of languages affect the F-measure obtained.
Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
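The following sketch shows the thresholded recommendation and its F-measure evaluation. It recommends every other book whose cosine similarity to the chosen book is at least 0.1, then scores the result against a set of known relevant books. For brevity it uses raw term counts from whitespace-split descriptions in place of the paper's Porter stemming and TF-IDF weighting. The book ids, descriptions, relevance set and function names are all hypothetical.

```python
import math
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query_id, vectors, threshold=0.1):
    """Ids of other books whose similarity to the chosen book is >= threshold."""
    q = vectors[query_id]
    return {bid for bid, vec in vectors.items()
            if bid != query_id and cosine(q, vec) >= threshold}


def f_measure(recommended, relevant):
    """Harmonic mean of precision and recall of a recommendation set."""
    tp = len(recommended & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(recommended)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical catalogue descriptions (raw counts stand in for TF-IDF weights).
descriptions = {
    "b1": "data mining with python",
    "b2": "python programming for data analysis",
    "b3": "history of ancient rome",
    "b4": "mining engineering handbook",
}
vectors = {bid: Counter(text.split()) for bid, text in descriptions.items()}
recs = recommend("b1", vectors)
print(sorted(recs), round(f_measure(recs, {"b2"}), 3))
```

With a threshold as low as 0.1, a single shared word ("mining") is enough to pull in an unrelated book. This illustrates why the wording of book descriptions affects the F-measure.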

