Fast Text Categorization Based on a Novel Class Space Model

Author(s):  
Yingfan Gao ◽  
Runbo Ma ◽  
Yushu Liu

2007 ◽  
Vol 2 (1) ◽  
pp. 14-22 ◽  
Author(s):  
Wa'el Musa Hadi ◽  
Fadi Thabtah ◽  
Salahideen Mousa ◽  
Samer Al Hawari ◽  
Ghassan Kanaan ◽  
...  


Author(s):  
Makoto Suzuki ◽  
Naohide Yamagishi ◽  
Yi-Ching Tsai ◽  
Takashi Ishida ◽  
Masayuki Goto


2014 ◽  
Vol 989-994 ◽  
pp. 1541-1546
Author(s):  
Tie Bin Liu

Event-driven investment has gained great importance and popularity. Because timely and effective information is essential to successful investment, the automated categorization of documents into predefined labels has received increasing attention in recent years. This paper implements a new text document classifier by integrating the K-nearest neighbour (KNN) classification approach with the vector space model (VSM). By screening feature items and weighting key items, the proposed classifier turns financial information text into an N-dimensional vector and identifies positive and negative information, thereby optimizing the classification. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, and it scales well for investors engaged in event-driven securities investment.
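As an illustration of the general idea (not the paper's implementation), a KNN classifier over a vector space model can be sketched in a few lines. The TF-IDF weighting, the cosine similarity, the toy training set `TRAIN`, and all function names below are assumptions made for this sketch; the abstract does not specify the feature screening or weighting scheme actually used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): terms found in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query_tokens, train_vectors, train_labels, idf, k=3):
    """Label a query by majority vote among its k most similar training documents."""
    # Terms unseen during training are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data standing in for labeled financial news.
TRAIN = [
    ("profit rises on strong sales".split(), "positive"),
    ("company wins major contract".split(), "positive"),
    ("shares surge after profit beat".split(), "positive"),
    ("profit falls on weak sales".split(), "negative"),
    ("company faces fraud lawsuit".split(), "negative"),
    ("shares plunge after loss warning".split(), "negative"),
]
VECTORS, IDF = tfidf_vectors([tokens for tokens, _ in TRAIN])
LABELS = [label for _, label in TRAIN]
```

Calling `knn_predict("profit rises strong".split(), VECTORS, LABELS, IDF)` votes among the three nearest training documents. Because KNN stores its training examples rather than fitting parameters, new labeled documents can be appended without retraining, although in this sketch the IDF table would then need to be recomputed.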



2013 ◽  
Vol 427-429 ◽  
pp. 2449-2453
Author(s):  
Rong Ze Xia ◽  
Yan Jia ◽  
Hu Li

Traditional supervised classification methods such as the support vector machine (SVM) can achieve high performance in text categorization. However, the samples must first be hand-labeled before classification, which is a time-consuming task. Unsupervised methods such as k-means can also be used for text categorization, but traditional k-means is easily affected by isolated observations. In this paper, we propose a new text categorization method. First, we improve the traditional k-means clustering algorithm and use the improved version to cluster vectors in our vector space model. We then use the SVM to categorize the vectors preprocessed by the improved k-means. Experiments show that our algorithm outperforms the traditional SVM text categorization method.
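The abstract does not say how k-means was improved. One plausible way to reduce the influence of isolated observations, shown here purely as an assumption, is to leave points that lie farther than a threshold from every centroid out of the centroid update. The function name `trimmed_kmeans` and the `max_dist` parameter are hypothetical, and the subsequent SVM step is not shown.

```python
import math

def trimmed_kmeans(points, k, max_dist, iters=20):
    """K-means variant (assumed, not the paper's): points farther than
    max_dist from their nearest centroid are marked -1 and skipped
    when centroids are recomputed."""
    # Deterministic initialization from the first k points; a poor choice
    # if one of them happens to be an outlier.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iters):
        assignments = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            j = min(range(k), key=dists.__getitem__)
            assignments.append(j if dists[j] <= max_dist else -1)
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                # Coordinate-wise mean of the cluster members.
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep the previous centroid if the cluster is empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments

# Two tight groups plus one isolated observation at (100, 100).
POINTS = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (100, 100)]
CENTROIDS, ASSIGNMENTS = trimmed_kmeans(POINTS, k=2, max_dist=5.0)
```

In this toy run the isolated point receives the label `-1` and neither centroid is pulled toward it, whereas plain k-means would assign it to a cluster and shift that cluster's centroid.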



2009 ◽  
Vol 18 (02) ◽  
pp. 239-272 ◽  
Author(s):  
SUJEEVAN ASEERVATHAM

Kernels are widely used in Natural Language Processing as similarity measures within inner-product based learning methods like the Support Vector Machine. The Vector Space Model (VSM) is extensively used for the spatial representation of the documents. However, it is purely a statistical representation. In this paper, we present a Concept Vector Space Model (CVSM) representation which uses linguistic prior knowledge to capture the meanings of the documents. We also propose a linear kernel and a latent kernel for this space. The linear kernel takes advantage of the linguistic concepts whereas the latent kernel combines statistical and linguistic concepts. Indeed, the latter kernel uses latent concepts extracted by the Latent Semantic Analysis (LSA) in the CVSM. The kernels were evaluated on a text categorization task in the biomedical domain. The Ohsumed corpus, well known for being difficult to categorize, was used. The results have shown that the CVSM improves performance compared to the VSM.
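A rough sketch of why a concept-level representation can help: two documents that share no surface terms may still map to the same concepts, so a linear kernel computed in concept space scores them as similar. The dictionary `TERM_TO_CONCEPT` below is a hypothetical stand-in for the linguistic prior knowledge the abstract mentions, and the functions are illustrative, not the paper's kernels.

```python
import math
from collections import Counter

# Hypothetical term-to-concept mapping; a real system would draw on a
# curated linguistic resource rather than a hand-written dictionary.
TERM_TO_CONCEPT = {
    "heart": "CARDIAC", "cardiac": "CARDIAC", "myocardial": "CARDIAC",
    "attack": "INFARCTION", "infarction": "INFARCTION",
    "tumor": "NEOPLASM", "tumour": "NEOPLASM", "neoplasm": "NEOPLASM",
}

def concept_vector(tokens):
    """Count the concepts that a document's known terms map to."""
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)

def linear_kernel(u, v):
    """Inner product of two sparse concept vectors."""
    return sum(w * v.get(c, 0) for c, w in u.items())

def normalized_kernel(u, v):
    """Linear kernel scaled to [0, 1] for non-negative count vectors."""
    denom = math.sqrt(linear_kernel(u, u) * linear_kernel(v, v))
    return linear_kernel(u, v) / denom if denom else 0.0

DOC_A = concept_vector("heart attack".split())
DOC_B = concept_vector("myocardial infarction".split())
DOC_C = concept_vector("brain tumor".split())
```

Here `DOC_A` and `DOC_B` have no terms in common yet produce identical concept vectors, while `DOC_C` shares no concept with either. A purely term-based VSM would treat all three documents as mutually orthogonal.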



Author(s):  
Makoto Suzuki ◽  
Naohide Yamagishi ◽  
Takashi Ishida ◽  
Masayuki Goto ◽  
Shigeichi Hirasawa


Methodology ◽  
2006 ◽  
Vol 2 (1) ◽  
pp. 24-33 ◽  
Author(s):  
Susan Shortreed ◽  
Mark S. Handcock ◽  
Peter Hoff

Recent advances in latent space and related random effects models hold much promise for representing network data. The inherent dependency between ties in a network makes modeling data of this type difficult. In this article we consider a recently developed latent space model that is particularly appropriate for the visualization of networks. We suggest a new estimator of the latent positions and perform two network analyses, comparing four alternative estimators. We demonstrate a method of checking the validity of the positional estimates. These estimators are implemented via a package in the freeware statistical language R. The package allows researchers to efficiently fit the latent space model to data and to visualize the results.
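The abstract does not give the model's formula. One common formulation of a latent space model for networks, the latent distance model, assumes that the log-odds of a tie between nodes i and j equals an intercept minus the Euclidean distance between their latent positions. The sketch below evaluates that likelihood for an undirected binary network; it is written in Python for illustration only (the package described in the abstract is for R), and the function names, the toy network, and the intercept value are assumptions.

```python
import math

def tie_probability(z_i, z_j, alpha):
    """P(tie) under the assumed model: logit(p) = alpha - ||z_i - z_j||."""
    eta = alpha - math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(adjacency, positions, alpha):
    """Log-likelihood of an undirected binary network given latent positions.
    Ties are treated as independent conditional on the positions."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            p = tie_probability(positions[i], positions[j], alpha)
            total += math.log(p) if adjacency[i][j] else math.log(1.0 - p)
    return total

# Toy network: nodes 0 and 1 are tied, node 2 is isolated.
ADJ = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
GOOD = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]  # tied nodes placed close together
BAD = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0)]   # tied nodes placed far apart
```

With the same intercept, `log_likelihood(ADJ, GOOD, 1.0)` exceeds `log_likelihood(ADJ, BAD, 1.0)`, which is the sense in which estimated positions place connected nodes near one another and make the fitted coordinates useful for visualization.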





1985 ◽  
Vol 46 (C8) ◽  
pp. C8-421-C8-425
Author(s):  
J. F. Sadoc ◽  
R. Mosseri

