Research on Content Analysis Algorithm of Focused Crawler Based on LBTF-IDF

2014 ◽  
Vol 971-973 ◽  
pp. 1722-1725
Author(s):  
Jun Luo ◽  
You Li Lu ◽  
Chen Xi Lin

This paper examines correlation analysis based on the vector space model. For the two-class classification case, it makes a joint comparison of feature selection methods to find the one best suited to a focused crawler. It then analyzes and verifies the LBTF-IDF algorithm, which uses an improved weight calculation method.
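As a rough illustration of vector-space relevance scoring for a focused crawler, the sketch below weights terms with plain TF-IDF and compares a page to a topic description by cosine similarity. It does not implement LBTF-IDF, whose improved weighting is not described in the abstract; the function names, the sample texts, and the whitespace tokenization are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical topic description and two candidate pages.
topic = "focused crawler topic relevance".split()
page_on = "crawler computes topic relevance of page".split()
page_off = "cooking recipes for pasta sauce".split()

vecs = tfidf_vectors([topic, page_on, page_off])
score_on = cosine(vecs[0], vecs[1])   # shares terms with the topic
score_off = cosine(vecs[0], vecs[2])  # shares no terms with the topic
```

A crawler built this way would follow links from pages whose score exceeds some cutoff; the cutoff itself would have to be tuned and is not given in the abstract.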

2003 ◽  
Vol 18 (1) ◽  
pp. 114-117 ◽  
Author(s):  
Ling Zhang ◽  
FanYuan Ma ◽  
YunMing Ye ◽  
JianGuo Chen

2021 ◽  
Vol 16 (7) ◽  
pp. 1107-1114
Author(s):  
Xiongli Li ◽  
Fei Xiao ◽  
Youlin Hu ◽  
Huikai Peng

Traditional methods for recognizing the topological relationship between households in a station area suffer from low accuracy and incomplete recognition results. To address these problems, a method is proposed for identifying the topological relationships of household line changes in low-voltage station areas based on a correlation analysis algorithm and a probabilistic decision method. The BIRCH method is used to cluster the topological relationship characteristics of household line changes in the low-voltage station area. These characteristics are obtained through clustering parameter initialization, clustering implementation, and clustering evaluation, and the user phases in the topological relationship are identified from the feature clustering results. The correlation analysis method is then used to analyze the similarity of the voltage sequences of the points to be identified and the comprehensive similarity of all the faults of the target distribution transformer and the auxiliary distribution transformer, and a similarity threshold is set to determine whether the points to be identified belong to the same station area. Finally, the identification of the topological relationship of household line changes in the low-voltage station area is completed with the probabilistic decision-making method. The experimental results show that this method can identify the topological relationship not only for a single distribution transformer outage but also for multiple distribution transformer outages. The identification accuracy is high and the identification loss function is low, which indicates that the identification results of this method are reliable and comprehensive.
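The similarity-threshold step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, so Pearson correlation is assumed here, and the threshold of 0.9, the function names, and the voltage readings are hypothetical. The BIRCH clustering and probabilistic decision stages are not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def same_station_area(meter, transformer, threshold=0.9):
    """Assumed rule: a meter belongs to the transformer's station area
    when its voltage sequence correlates with it at or above the threshold."""
    return pearson(meter, transformer) >= threshold


# Hypothetical voltage readings (volts) sampled at the same instants.
transformer = [230.1, 229.4, 228.7, 229.9, 231.2, 230.5]
meter_a = [229.0, 228.4, 227.5, 228.8, 230.1, 229.3]  # tracks the transformer
meter_b = [230.0, 230.9, 229.1, 228.2, 229.0, 230.8]  # unrelated fluctuations

in_area_a = same_station_area(meter_a, transformer)
in_area_b = same_station_area(meter_b, transformer)
```

With these sample values, `meter_a` follows the transformer voltage almost exactly and passes the threshold, while `meter_b` does not.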


2012 ◽  
Vol 182-183 ◽  
pp. 1728-1732
Author(s):  
Bin Xia ◽  
Peng Yan Guo ◽  
Hong Bo Qiao ◽  
Rui Gao

Automobile information on the Internet is growing fast with the rapid development of automobile information systems. However, general search engines cannot meet the increasing demand for accurate search of automobile information. This paper reports the design and implementation of a vertical search engine for automobile information that adopts the vector space model to identify the automobile subject and combines content analysis with link analysis. The search engine was shown to produce more reasonable and effective results, thereby increasing search accuracy.


Author(s):  
Anthony Anggrawan ◽  
Azhari

Information retrieval is the task of finding documents that meet a user's need based on the user's query. This research uses the Vector Space Model method to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. The results are presented by ranking the similarity of documents to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the examination done by experts; it was obtained from an evaluation with 5 queries compared against 25 sample documents. The more assignments are compared, the more time the similarity computation needs, so the processing time depends on the number of assignments submitted.
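For readers unfamiliar with the metric, mean average precision can be computed as in the sketch below. The document identifiers and relevance judgments are invented for illustration and are not the study's data.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and expert judgments.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], {"d6"}),              # AP = 1/2
]
map_score = mean_average_precision(runs)
```

A value close to 1 means relevant documents are consistently ranked near the top across queries.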


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is a facility that provides information and knowledge resources and supports readers academically. The large number of books a library holds often makes it difficult for readers to find the ones they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it has no recommendation feature to help readers find books relevant to a specific book they choose. An application has been developed that uses the Vector Space Model to represent documents as vectors. Recommendations in this application are based on the similarity of the book descriptions. In testing with a one-language sample of relevant books, the F-Measure obtained is 55% with a cosine similarity threshold of 0.1. The book descriptions and the variety of languages affect the F-Measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
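The evaluation described above can be illustrated with a small sketch: books whose cosine similarity to the chosen book meets the threshold are recommended, and the F-Measure compares that set with the books judged relevant. The similarity scores, book names, and relevance judgments below are hypothetical; only the 0.1 threshold comes from the abstract.

```python
def recommend(similarities, threshold=0.1):
    """Return the books whose similarity score meets the threshold."""
    return {book for book, score in similarities.items() if score >= threshold}


def f_measure(recommended, relevant):
    """Harmonic mean of precision and recall (F1)."""
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical cosine similarities between one chosen book and four others.
similarities = {"book_a": 0.42, "book_b": 0.15, "book_c": 0.08, "book_d": 0.11}
relevant = {"book_a", "book_c", "book_d"}  # hypothetical expert judgment

picked = recommend(similarities)      # book_a, book_b, book_d
score = f_measure(picked, relevant)   # precision 2/3, recall 2/3
```

Lowering the threshold tends to raise recall at the cost of precision, which is why the threshold is reported alongside the F-Measure.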


2019 ◽  
Vol 11 (1) ◽  
pp. 01025-1-01025-5 ◽  
Author(s):  
N. A. Borodulya ◽  
◽  
R. O. Rezaev ◽  
S. G. Chistyakov ◽  
E. I. Smirnova ◽  
...  
