Term Categorization Using Latent Semantic Analysis for Intelligent Query Processing

With the rapid growth of social networks, a huge volume of short texts is generated within fractions of a second. Understanding and categorizing these texts for effective query processing is considered one of the vital challenges in the field of Natural Language Processing. The objective is to retrieve only relevant documents by categorizing the short texts. In the proposed method, terms are categorized by means of Latent Semantic Analysis (LSA). The novel method focuses on applying semantic enrichment to term categorization, with the aim of augmenting unstructured data items to achieve faster and more intelligent query processing in a big data environment. Document retrieval can therefore be made more effective through flexible query term mapping.
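
As a rough illustration of this kind of pipeline (not the authors' implementation; the toy corpus, the parameters and the use of scikit-learn are assumptions for the example), a term-document matrix can be factored with truncated SVD so that short queries are mapped into the same latent space and matched against documents by semantic similarity:

```python
# Minimal sketch: LSA over a toy corpus, then query-to-document matching
# in the latent space. All data and parameters are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "stock markets fall as investors react to the rate hike",
    "new smartphone release boosts tech company shares",
    "team wins championship after dramatic overtime finish",
    "central bank signals further interest rate increases",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)              # documents x terms

svd = TruncatedSVD(n_components=2, random_state=0)
doc_topics = svd.fit_transform(X)               # documents in the latent space
term_topics = svd.components_.T                 # terms in the latent space

# Map a short query into the same space and rank documents by similarity.
query_topics = svd.transform(vectorizer.transform(["interest rate decision"]))
scores = cosine_similarity(query_topics, doc_topics)[0]
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```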

This article examines the method of latent semantic analysis (LSA), its advantages, disadvantages, and the possibility of further adapting it for use on arrays of unstructured data, which make up most of the information that Internet users deal with. To extract context-dependent word meanings through the statistical processing of large sets of textual data, LSA operates on numeric word-by-text matrices, whose rows correspond to words and whose columns correspond to texts. Words are grouped into themes, and text units are represented in the theme space, by applying one of two matrix decompositions to the data matrix: singular value decomposition or non-negative matrix factorization. LSA studies have shown that the word and text similarities obtained in this way closely match human judgments. Based on the methods described above, the author has developed and proposed a new way of finding semantic links between unstructured data, namely posts on social networks. The method is based on latent semantic and frequency analyses and involves processing the retrieved search results, splitting each remaining text (post) into separate words, taking a window of n words to the right and left of each word, counting the number of occurrences of each term, and consulting a pre-built semantic resource (dictionary, ontology, RDF schema, ...). The developed method and algorithm have been tested on six well-known social networks, with which interaction occurs through the API of each network. The average score of the author's results exceeded that of each network's own search. The results obtained in the course of this dissertation can be used in the development of recommendation, search and other systems related to the search, categorization and filtering of information.
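
A compressed sketch of the counting step described above is given below, assuming a toy list of posts, a window of n = 2 words, and NumPy's SVD in place of a full LSA toolkit; the semantic resource (dictionary, ontology, RDF schema) is omitted:

```python
# Hedged sketch: split each post into words, count co-occurrences within a
# window of n words to the right and left, and factor the matrix with SVD
# (non-negative matrix factorization could be substituted). Toy data only.
import numpy as np

posts = [
    "concert tickets on sale for the summer music festival",
    "festival lineup announced with several rock bands",
    "city marathon route changed due to road works",
]
window = 2  # n words to the right and left

vocab = sorted({w for p in posts for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

for post in posts:
    tokens = post.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[index[word], index[tokens[j]]] += 1

# Rows of U (scaled by the singular values) give word vectors in the theme space.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :2] * S[:2]
print(word_vectors[index["festival"]])
```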


PLoS ONE, 2017, Vol 12 (5), pp. e0177523
Author(s): Mingxi Zhang, Pohan Li, Wei Wang

Natural Language Processing uses word embeddings to map words into vectors. The context vector is one such technique; it gives the importance of terms in the document corpus. Context vectors can be derived using various methods such as neural networks, latent semantic analysis, and knowledge-base methods. This paper proposes a novel system, an enhanced context vector machine called eCVM, which is able to determine context phrases and their importance. eCVM uses latent semantic analysis, the existing context vector machine, dependency parsing, named entities, topics from Latent Dirichlet Allocation, and various forms of words such as nouns, adjectives and verbs for building the context. eCVM uses the context vector and the PageRank algorithm to find the importance of a term in a document and is tested on the BBC News dataset. Results of eCVM are compared with the state of the art for context derivation. The proposed system shows improved performance over existing systems on standard evaluation parameters.
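
One ingredient of such a pipeline, ranking terms via PageRank over a term co-occurrence graph, might be sketched as follows; the documents, the crude length-based stop-word filter and the use of networkx are assumptions for illustration, and the full eCVM additionally relies on LSA, dependency parsing, named entities and LDA topics:

```python
# Illustrative sketch: build a term co-occurrence graph per document and use
# PageRank scores as a proxy for term importance. Toy data, not the eCVM code.
import itertools
import networkx as nx

documents = [
    "the prime minister announced a new budget for public health",
    "public health spending rises in the new budget",
    "opposition criticises the budget announcement",
]

graph = nx.Graph()
for doc in documents:
    tokens = [t for t in doc.split() if len(t) > 3]   # crude stop-word filter
    for a, b in itertools.combinations(set(tokens), 2):
        weight = graph.get_edge_data(a, b, {"weight": 0})["weight"]
        graph.add_edge(a, b, weight=weight + 1)

scores = nx.pagerank(graph, weight="weight")
for term, score in sorted(scores.items(), key=lambda item: -item[1])[:5]:
    print(f"{score:.3f}  {term}")
```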


Reusing code with or without modification is a common process in building large codebases of system software such as Linux, gcc, and the JDK. This process is referred to as software cloning or forking. Developers often find it difficult to port bug fixes across a large code base from one language to another during software porting. Many approaches exist for identifying software clones within the same language, but these may not help developers involved in porting; hence there is a need for a cross-language clone detector. This paper uses a Natural Language Processing (NLP) approach based on latent semantic analysis, using singular value decomposition, to find cross-language clones in neighboring languages covering all four types of clones. It takes code (C, C++ or Java) as input and matches all the neighboring code clones in a static repository in terms of the frequency of matched lines.
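
A very rough sketch of measuring cross-language similarity with LSA is shown below; the snippets, the identifier-level tokenization and the scikit-learn calls are assumptions for illustration, and a real clone detector would normalise keywords and identifiers per language:

```python
# Hedged sketch: treat code snippets as documents, build a TF-IDF matrix over
# identifier/keyword tokens, reduce with truncated SVD (LSA) and compare
# snippets from different languages by cosine similarity. Toy snippets only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

snippets = [
    "int sum(int a, int b) { return a + b; }",                             # C
    "public static int sum(int a, int b) { return a + b; }",               # Java
    "void sort(std::vector<int>& v) { std::sort(v.begin(), v.end()); }",   # C++
]

vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]+")
X = vectorizer.fit_transform(snippets)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

print(cosine_similarity(lsa[0:1], lsa[1:2]))   # C sum() vs Java sum()
print(cosine_similarity(lsa[0:1], lsa[2:3]))   # C sum() vs C++ sort()
```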


2017, Vol 16 (2), pp. 179-217
Author(s): Panagiotis Mazis, Andrianos Tsekrekos

Purpose: The purpose of this paper is to analyze the content of the statements that are released by the Federal Open Market Committee (FOMC) after its meetings, identify the main textual associative patterns in the statements and examine their impact on the US treasury market.

Design/methodology/approach: Latent semantic analysis (LSA), a language processing technique that allows recognition of the textual associative patterns in documents, is applied to all the statements released by the FOMC between 2003 and 2014, so as to identify the main textual "themes" used by the Committee in its communication to the public. The importance of the main identified "themes" is tracked over time, before examining their (collective and individual) effect on treasury market yield volatility via time-series regression analysis.

Findings: FOMC statements incorporate multiple, multifaceted and recurring textual themes, six of which characterize most of the communicated monetary policy in the sample period. The themes are statistically significant in explaining the variation in three-month, two-year, five-year and ten-year treasury yields, even after controlling for monetary policy uncertainty and the concurrent economic outlook.

Research limitations/implications: The main research implication of the study is that LSA can successfully identify the most economically significant themes underlying the Fed's communication, as the latter is expressed in monetary policy statements. The findings would be strengthened if the analysis were repeated using intra-day (tick-by-tick or five-minute) data on treasury yields.

Social implications: The findings are consistent with the notion that the move to "increased transparency" by the Fed is important and meaningful for financial and capital markets, as suggested by the significant effect that the most important identified textual themes have on treasury yield volatility.

Originality/value: This paper makes a timely contribution to a fairly recent stream of research that combines specific textual and statistical techniques to conduct content analysis. To the best of the authors' knowledge, this study is the first that applies LSA to the statements released by the FOMC.
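
Under strong simplifying assumptions, the theme-extraction step could look like the sketch below (toy stand-in statements, two themes instead of the paper's six, scikit-learn in place of whatever toolkit the authors used); the resulting statement-by-theme loadings are what would enter the time-series regressions:

```python
# Hedged sketch: extract recurring "themes" from policy statements with LSA.
# The statements are toy stand-ins, not actual FOMC text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

statements = [
    "inflation remains elevated and the committee raised the target rate",
    "labor market conditions improved while inflation moderated",
    "the committee will continue reducing its holdings of treasury securities",
    "economic activity expanded at a moderate pace amid tight credit conditions",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(statements)
svd = TruncatedSVD(n_components=2, random_state=0)
loadings = svd.fit_transform(X)   # statement-by-theme matrix (regressors)

terms = vectorizer.get_feature_names_out()
for k, component in enumerate(svd.components_):
    top = component.argsort()[::-1][:4]
    print(f"theme {k}: {[terms[i] for i in top]}")
```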


2021
Author(s): João Alberto da Silva Amaral, Fernando Buarque Lima Neto

This paper proposes a semantic Natural Language Processing (NLP) approach to assist in the automated characterization of information relevant to compliance activities. In this context, the Latent Semantic Analysis (LSA) technique was used to assist in the dimensionality reduction process. The results were obtained by submitting two databases to the model: a database of audit reports issued by the State General Secretariat of Management (SCGE – Secretaria da Controladoria-Geral do Estado, in Portuguese) of Pernambuco between 2010 and 2019, and a base of appellate decisions issued by the Brazilian Federal Accountability Office (TCU – Tribunal de Contas da União, in Portuguese) in 2019. The performance of two dimensionality reduction methods was evaluated: TF-IDF and LSA. To validate the results, K-means was used as the clustering technique, and the silhouette coefficient was used to find the best number of clusters for a given data sample. LSA combined with K-means presented the best performance on both databases, achieving its strongest results on the TCU base of appellate decisions.
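
A minimal sketch of that pipeline follows, assuming placeholder texts in place of the audit reports and appellate decisions and scikit-learn implementations of TF-IDF, LSA, K-means and the silhouette coefficient:

```python
# Hedged sketch: TF-IDF, LSA for dimensionality reduction, K-means clustering,
# and the silhouette coefficient to choose the number of clusters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

texts = [
    "irregularity found in the procurement of medical supplies",
    "procurement contract awarded without public bidding",
    "payroll audit revealed duplicate salary payments",
    "duplicate payments identified in personnel expenses",
    "construction contract exceeded the approved budget",
    "budget overrun detected in road construction works",
]

X = TfidfVectorizer(stop_words="english").fit_transform(texts)
X_lsa = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)

best_k, best_score = None, -1.0
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_lsa)
    score = silhouette_score(X_lsa, labels)
    if score > best_score:
        best_k, best_score = k, score
print(f"best k = {best_k}, silhouette = {best_score:.3f}")
```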


2020, pp. 1162-1177
Author(s): Eya Boukchina, Sehl Mellouli, Emna Menif

Citizens' participation is a form of democracy in which citizens are part of the decision-making process with regard to the development of their society. With the emergence of today's Information and Communication Technologies, citizens can participate in these processes by submitting input through digital media such as social media platforms or dedicated websites. Through these different channels, a large quantity of data, in different forms (text, image, video), can be generated. This data needs to be processed in order to extract valuable information that can be used by a city's decision-makers. This paper presents natural language processing techniques to extract valuable information from comments posted by citizens. It applies Latent Semantic Analysis to a corpus of citizens' comments to automatically identify the subjects that citizens raised.
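
As an illustration only (made-up comments, two topics, and gensim's LSI implementation of LSA as an assumed toolkit), surfacing subjects from a small set of citizen comments might look like:

```python
# Illustrative sketch: LSA (via gensim's LSI model) over citizen comments to
# surface the subjects they raise. Comments and topic count are made up.
from gensim import corpora, models

comments = [
    "the park near the school needs better lighting at night",
    "street lighting is broken on the main avenue",
    "please add more bike lanes downtown",
    "bike lanes on the bridge would reduce traffic",
]

tokenized = [comment.lower().split() for comment in comments]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]

lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
for topic_id, topic in lsi.print_topics(num_topics=2, num_words=4):
    print(topic_id, topic)
```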

