Latent Semantic Analysis and Beyond

Author(s):  
Anne Kao, Steve Poteet, Jason Wu, William Ferng, Rod Tjoelker, ...

Latent Semantic Analysis (LSA), known as Latent Semantic Indexing (LSI) when applied to information retrieval, has been a major analysis approach in text mining. It extends the vector space model of information retrieval: documents are still represented as numerical vectors, but a more sophisticated mathematical approach is used to characterize the essential features of the documents and to reduce the number of features in the search space. This chapter summarizes several major approaches to this dimensionality reduction, each with its own strengths and weaknesses, and describes recent breakthroughs and advances. It shows how the constructs and products of LSA applications can be made user-interpretable, and it reviews applications of LSA beyond information retrieval, in particular to text information visualization. While the major application of LSA is text mining, it is also highly applicable to cross-language information retrieval, Web mining, and the analysis of text transcribed from speech and of textual information in video.
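The dimensionality reduction at the core of LSA can be sketched as a truncated singular value decomposition of a term-document matrix. The terms and counts below are invented purely for illustration, not data from the chapter:

```python
import numpy as np

# Hypothetical toy term-document matrix: rows = terms, columns = documents.
# Documents 0 and 2 use aviation terms; documents 1 and 3 use legal terms.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],   # "engine"
    [1.0, 0.0, 2.0, 0.0],   # "wing"
    [0.0, 3.0, 0.0, 1.0],   # "court"
    [0.0, 1.0, 0.0, 2.0],   # "judge"
])

# SVD: A = U @ diag(s) @ Vt; truncating to the k largest singular values
# keeps the essential features and discards the rest of the search space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T   # one k-dimensional vector per document

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_same_topic = cosine(doc_vecs[0], doc_vecs[2])   # both aviation
sim_cross_topic = cosine(doc_vecs[0], doc_vecs[1])  # aviation vs. legal
```

In the reduced space the two aviation documents end up nearly parallel while the cross-topic pair is nearly orthogonal, which is the effect LSA exploits for retrieval.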


2012, pp. 174-190
Author(s):  
Michael W. Berry, Reed Esau, Bruce Kiefer

Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Advances in office technology have made documents ever easier to create, and the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods from text mining and information retrieval have been applied in eDiscovery to help tame the volume of data, but the results have been uneven. This chapter looks at the historical bias of the collection process and examines how tools such as classifiers, latent semantic analysis, and non-negative matrix factorization deal with its nuances.


In this study, we propose an automatic single-document text summarization technique that combines Latent Semantic Analysis (LSA) with a diversity constraint. The technique uses query-based sentence ranking. Since we do not assume an information retrieval (IR) setting, we generate the query automatically using TF-IDF (Term Frequency-Inverse Document Frequency): the query vector is built from the terms with the highest IDF values. LSA applies vectorial semantics to analyze the relationships between documents in a corpus, or between sentences within a document and the key terms they carry, producing a set of concepts that link the documents and terms; in this way it represents the latent structure of the documents. Latent Semantic Indexing (LSI) is used to score the sentences of the document and rank them. Traditionally, the highest-scoring sentences are chosen for the summary; here we also compute the diversity between the chosen sentences, since a good summary should exhibit a maximum level of diversity, and produce the final summary accordingly. The proposed technique is evaluated on OpinosisDataset1.0.
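The query-generation and diversity-selection steps described above can be sketched as follows. The sentences, the five-term query size, and the 0.5 diversity threshold are all assumptions for illustration, and plain TF-IDF cosine similarity stands in for the LSI sentence scores:

```python
import math
from collections import Counter

# Hypothetical mini-document, one sentence per string.
sentences = [
    "the engine showed abnormal vibration during the test flight",
    "vibration in the engine was traced to a loose mounting bolt",
    "the crew reported clear weather for the entire flight",
    "a loose bolt in the engine mount caused the abnormal vibration",
]

N = len(sentences)
df = Counter(t for s in sentences for t in set(s.split()))
idf = {t: math.log(N / df[t]) for t in df}

# Generate the query from the highest-IDF terms (5 is an arbitrary choice).
query_terms = sorted(idf, key=idf.get, reverse=True)[:5]
query_vec = {t: idf[t] for t in query_terms}

def tfidf_vec(s):
    tf = Counter(s.split())
    return {t: tf[t] * idf[t] for t in tf}

def cos_sim(v1, v2):
    num = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return num / (n1 * n2) if n1 and n2 else 0.0

# Rank sentences against the query, then select greedily, skipping any
# sentence too similar to one already chosen (the diversity constraint).
ranked = sorted(range(N), reverse=True,
                key=lambda i: cos_sim(tfidf_vec(sentences[i]), query_vec))
summary, chosen = [], []
for i in ranked:
    v = tfidf_vec(sentences[i])
    if all(cos_sim(v, c) < 0.5 for c in chosen):
        summary.append(i)
        chosen.append(v)
    if len(summary) == 2:
        break
```

The greedy loop is one simple way to enforce diversity; the chapter's actual selection may differ.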


2014, Vol 4 (3), pp. 1-13
Author(s):  
Khadoudja Ghanem

In this paper, the authors propose a semantic approach to document categorization. The idea is to create a semantic index (a representative term vector) for each category by performing a local Latent Semantic Analysis (LSA) followed by a clustering step. A second, global application of LSA is then performed on a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in information retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on the popular 20 Newsgroups corpus. The results show the effectiveness of the method compared both with classic kNN and SVM classifiers and with methods reported in the literature: the new method achieves high precision and recall rates, and classification accuracy is significantly improved.
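The global-LSA classification step can be sketched over a toy term-class matrix. The terms, classes, and weights below are invented stand-ins for the semantic indexes that the paper's local LSA and clustering stage would produce:

```python
import numpy as np

terms = ["goal", "match", "stock", "market", "player", "trade"]
classes = ["sports", "finance"]

# Hypothetical term-class matrix: each column is a class's semantic index
# (a representative term vector), invented here for illustration.
A = np.array([
    [3.0, 0.0],   # goal
    [2.0, 0.0],   # match
    [0.0, 3.0],   # stock
    [0.0, 2.0],   # market
    [2.0, 0.0],   # player
    [0.5, 2.0],   # trade
])

# Global LSA: SVD of the term-class matrix, truncated to k dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
class_vecs = Vt[:k, :].T           # one latent-space vector per class

def classify(term_counts):
    # Fold the query document into the latent space (standard LSI fold-in),
    # then return the class with the highest cosine similarity.
    q_hat = np.diag(1.0 / s[:k]) @ U[:, :k].T @ term_counts
    sims = (class_vecs @ q_hat) / (
        np.linalg.norm(class_vecs, axis=1) * np.linalg.norm(q_hat))
    return classes[int(np.argmax(sims))]

# A document mentioning only "goal", "match", and "player".
pred = classify(np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0]))
```

This mirrors how LSI answers a retrieval query, except that the retrieved items are classes rather than documents.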


2012, Vol 12 (1), pp. 34-48
Author(s):  
Ch. Aswani Kumar, M. Radvansky, J. Annapurna

Latent Semantic Indexing (LSI), a variant of the classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationships between data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and support information retrieval. Notably, both LSI and FCA operate on data represented as matrices. The objective of this paper is to systematically analyze VSM, LSI, and FCA for the task of IR using standard and real-life datasets.
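The FCA side of the comparison can be made concrete on a tiny formal context; the objects, attributes, and incidence relation below are assumptions for illustration. Each formal concept is a pair (extent, intent) in which the documents in the extent share exactly the terms in the intent:

```python
from itertools import combinations

# Hypothetical formal context: objects = documents, attributes = terms.
objects = ["d1", "d2", "d3"]
attributes = ["lsa", "svd", "lattice"]
incidence = {
    "d1": {"lsa", "svd"},
    "d2": {"lsa", "svd", "lattice"},
    "d3": {"lattice"},
}

def common_attrs(objs):
    """Attributes shared by every object in objs (the ' operator on objects)."""
    if not objs:
        return set(attributes)
    return set.intersection(*(incidence[o] for o in objs))

def common_objs(attrs):
    """Objects possessing every attribute in attrs (the ' operator on attributes)."""
    return {o for o in objects if attrs <= incidence[o]}

# Brute-force concept enumeration (fine for tiny contexts): close every
# object subset under the two derivation operators.
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(objects, r):
        intent = common_attrs(set(objs))
        extent = common_objs(intent)
        concepts.add((frozenset(extent), frozenset(intent)))
```

Ordering these concepts by inclusion of their extents yields the concept lattice that FCA uses as its conceptual hierarchy.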

