Local and Global Latent Semantic Analysis for Text Categorization

2014, Vol. 4(3), pp. 1-13
Author(s):  
Khadoudja Ghanem

In this paper, the authors propose a semantic approach to document categorization. The idea is to create, for each category, a semantic index (a representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering step. A second, global LSA is then applied to a term-class matrix to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in information retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on the popular 20 Newsgroups corpus. The results show the effectiveness of the method compared with the classic KNN and SVM classifiers as well as with methods reported in the literature: the new method achieves high precision and recall, and classification accuracy is significantly improved.
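For readers who want to see the global-LSA step concretely, the sketch below builds a term-class matrix by summing TF-IDF vectors per category, factors it with a truncated SVD, and classifies a query by folding it into the latent space, IR-style. It is a minimal illustration on an invented toy corpus, not the paper's full pipeline; in particular, the local-LSA-plus-clustering step that refines each class's semantic index is omitted.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy training corpus: two categories, two documents each.
train_docs = [
    "the spacecraft launched into orbit around the moon",
    "nasa announced a new space mission to orbit mars",
    "the team won the championship game last night",
    "the striker scored a late goal in the final match",
]
train_labels = ["space", "space", "sport", "sport"]

# Build TF-IDF document vectors and sum them per category to obtain a
# term-class matrix (one column per class) for the global-LSA step.
vec = TfidfVectorizer()
X = vec.fit_transform(train_docs).toarray()              # docs x terms
classes = sorted(set(train_labels))
term_class = np.stack(
    [X[[i for i, y in enumerate(train_labels) if y == c]].sum(axis=0)
     for c in classes],
    axis=1,
)                                                        # terms x classes

# Truncated SVD of the term-class matrix -- the LSA step. With only two
# classes, k = 2 is full rank; real data would use a much smaller k
# relative to the vocabulary.
k = 2
U, s, Vt = np.linalg.svd(term_class, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T                   # Vk: classes x k
class_vecs = Vk * sk                                     # latent class vectors

def classify(text):
    """Fold the query document into the latent space and return the
    most similar class by cosine similarity, IR-style."""
    q = vec.transform([text]).toarray().ravel()          # term vector
    q_hat = (Uk.T @ q) / sk                              # SVD fold-in
    sims = class_vecs @ q_hat / (
        np.linalg.norm(class_vecs, axis=1) * np.linalg.norm(q_hat) + 1e-12
    )
    return classes[int(np.argmax(sims))]

print(classify("rocket launched into orbit"))            # expected: space
print(classify("a goal was scored in the game"))         # expected: sport
```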


2014, Vol. 22(2), pp. 291-319
Author(s):  
SHUDONG HAO ◽  
YANYAN XU ◽  
DENGFENG KE ◽  
KAILE SU ◽  
HENGLI PENG

Writing in language tests is regarded as an important indicator of test takers' language skills. As Chinese language tests have become popular, scoring the large number of essays they generate has become a heavy and expensive task for test organizers. In the past several years, efforts have been made to develop automated scoring systems for simplified Chinese essays, reducing both cost and evaluation time. In this paper, we introduce SCESS (automated Simplified Chinese Essay Scoring System), which is based on Weighted Finite-State Automata (WFSA) and uses Incremental Latent Semantic Analysis (ILSA) to deal with a large number of essays. First, SCESS uses an n-gram language model to construct a WFSA for text pre-processing; at this stage, the system integrates a Confusing-Character Table, a Part-Of-Speech Table, beam search, and heuristic search to perform automated word segmentation and correction of essays. Experimental results show that this pre-processing procedure is effective, with a Recall Rate of 88.50%, a Detection Precision of 92.31%, and a Correction Precision of 88.46%. After text pre-processing, SCESS uses ILSA to perform automated essay scoring. We have carried out experiments comparing ILSA with the traditional LSA method on corpora of essays from the MHK test (the Chinese proficiency test for minorities); the results indicate that ILSA has a significant advantage over LSA in both running time and memory usage. Furthermore, SCESS itself is quite effective, with a scoring performance of 89.50%.
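ILSA's advantage over batch LSA comes from not re-factorizing the whole term-essay matrix for every new essay. A standard way to get this behavior is SVD fold-in, which projects each new document into the latent space learned once from the initial corpus. The sketch below uses fold-in on random toy data as an illustrative stand-in; the paper's exact ILSA update may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Batch phase: factor the initial term-essay matrix once.
A = rng.random((500, 200))                 # terms x essays (toy data)
k = 50
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]                   # fixed latent basis

def fold_in(term_vec):
    """Project a new essay's term vector into the existing k-dim LSA
    space: e_hat = Sigma_k^{-1} U_k^T e. Costs O(terms * k) per essay,
    with no re-factorization and no growth in memory."""
    return (Uk.T @ term_vec) / sk

# Incremental phase: score each incoming essay against a reference
# vector (e.g., an average of high-scoring training essays).
reference = fold_in(A[:, :20].mean(axis=1))
new_essay = rng.random(500)                # term counts of a new essay
e = fold_in(new_essay)
cos = e @ reference / (np.linalg.norm(e) * np.linalg.norm(reference))
print(f"similarity to reference: {cos:.3f}")
```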


Author(s):  
Anne Kao ◽  
Steve Poteet ◽  
Jason Wu ◽  
William Ferng ◽  
Rod Tjoelker ◽  
...  

Latent Semantic Analysis (LSA), known as Latent Semantic Indexing (LSI) when applied to information retrieval, has been a major analysis approach in text mining. It extends the vector space model of information retrieval, representing documents as numerical vectors but using a more sophisticated mathematical approach to characterize their essential features and reduce the dimensionality of the search space. This chapter summarizes several major approaches to this dimensionality reduction, each with its own strengths and weaknesses, and describes recent breakthroughs and advances. It shows how the constructs and products of LSA applications can be made user-interpretable and reviews applications of LSA beyond information retrieval, in particular text information visualization. While the major application of LSA is text mining, it is also highly applicable to cross-language information retrieval, Web mining, and the analysis of text transcribed from speech and of textual information in video.
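As an illustration of the user-interpretability point, listing the highest-weighted terms on each latent dimension is one common way to expose LSA's constructs to users. Below is a minimal sketch with an invented corpus, not an example from the chapter.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "aircraft engine maintenance and inspection records",
    "engine vibration was reported during the flight",
    "cabin crew safety training procedures updated",
    "emergency evacuation training scheduled for the crew",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)                    # docs x terms

# TruncatedSVD on TF-IDF vectors is the usual LSA pipeline;
# components_ holds each latent dimension's term loadings.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = svd.fit_transform(X)                # documents in latent space

terms = np.array(vec.get_feature_names_out())
for i, comp in enumerate(svd.components_):
    top = terms[np.argsort(np.abs(comp))[::-1][:4]]
    print(f"dimension {i}: {', '.join(top)}")
# The 2-D doc_vecs can also be plotted directly, giving a simple
# text-visualization map of the collection.
```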


2012, pp. 174-190
Author(s):  
Michael W. Berry ◽  
Reed Esau ◽  
Bruce Kiefer

Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Advances in office technology have made documents ever easier to create, and the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods from text mining and information retrieval have been put to use in eDiscovery to help tame this volume, but the results have been uneven. This chapter looks at the historical bias of the collection process and examines how tools such as classifiers, latent semantic analysis, and non-negative matrix factorization deal with its nuances.
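Of the tools named, non-negative matrix factorization is perhaps the easiest to illustrate: it factors the term-document matrix into non-negative topic factors whose top terms a reviewer can inspect directly. A minimal sketch with invented documents, not the authors' pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "please review the attached merger contract draft",
    "revised contract terms for the merger are attached",
    "lunch on friday at the usual place?",
    "are we still on for lunch this friday",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Factor the term-document matrix into non-negative parts: W says how
# strongly each document loads on each topic, and H gives each topic's
# term weights -- both directly inspectable by a human reviewer.
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

terms = np.array(vec.get_feature_names_out())
for t, row in enumerate(H):
    top = terms[np.argsort(row)[::-1][:3]]
    print(f"topic {t}: {', '.join(top)}")
print(W.round(2))   # e.g., route documents loading on the contract
                    # topic to legal review, deprioritize the rest
```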

