Exploring the Feasibility and Accuracy of Latent Semantic Analysis Based Text Mining Techniques to Detect Similarity between Patent Documents and Scientific Publications

Author(s):  
Tom Magerman ◽  
Bart Van Looy ◽  
Xiaoyan Song
2014 ◽  
Vol 3 (2(69)) ◽  
pp. 36
Author(s):  
Андрей Сергеевич Коляда ◽  
Виктор Дмитриевич Гогунский

Author(s):  
Anne Kao

Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI), when applied to information retrieval, has been a major analysis approach in text mining. It is an extension of the vector space method in information retrieval, representing documents as numerical vectors but using a more sophisticated mathematical approach to characterize the essential features of the documents and reduce the number of features in the search space. This chapter summarizes several major approaches to this dimensionality reduction, each of which has strengths and weaknesses, and it describes recent breakthroughs and advances. It shows how the constructs and products of LSA applications can be made user-interpretable and reviews applications of LSA beyond information retrieval, in particular, to text information visualization.


Author(s):  
Anne Kao ◽  
Steve Poteet ◽  
Jason Wu ◽  
William Ferng ◽  
Rod Tjoelker ◽  
...  

Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI), when applied to information retrieval, has been a major analysis approach in text mining. It is an extension of the vector space method in information retrieval, representing documents as numerical vectors but using a more sophisticated mathematical approach to characterize the essential features of the documents and reduce the number of features in the search space. This chapter summarizes several major approaches to this dimensionality reduction, each of which has strengths and weaknesses, and it describes recent breakthroughs and advances. It shows how the constructs and products of LSA applications can be made user-interpretable and reviews applications of LSA beyond information retrieval, in particular, to text information visualization. While the major application of LSA is for text mining, it is also highly applicable to cross-language information retrieval, Web mining, and analysis of text transcribed from speech and textual information in video.
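
As a rough illustration of the dimensionality reduction described above, the following sketch builds a small term-document matrix and projects the documents into a k-dimensional latent space with a truncated singular value decomposition, one common way to compute LSA. The toy documents, the function names, and the choice of k = 2 are assumptions made for illustration; they do not come from the chapter.

```python
import numpy as np

# Toy corpus (hypothetical): two related documents and one unrelated one.
docs = [
    "patent claims a method for semantic indexing of documents",
    "scientific publication on semantic indexing of documents",
    "recipe for baking bread with yeast and flour",
]


def term_document_matrix(texts):
    """Return a (terms x documents) raw count matrix and the vocabulary."""
    vocab = sorted({word for text in texts for word in text.split()})
    row = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[row[word], j] += 1.0
    return matrix, vocab


def lsa_document_vectors(matrix, k):
    """Keep the k largest singular values and return one k-dim vector per document."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # shape: (documents, k)


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


A, vocab = term_document_matrix(docs)
doc_vectors = lsa_document_vectors(A, k=2)
sim_related = cosine(doc_vectors[0], doc_vectors[1])
sim_unrelated = cosine(doc_vectors[0], doc_vectors[2])
```

In this reduced space the two documents about semantic indexing should end up closer to each other than either is to the unrelated one. Real applications typically apply term weighting (for example TF-IDF) before the decomposition and choose k far larger than 2.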


2012 ◽  
pp. 174-190
Author(s):  
Michael W. Berry ◽  
Reed Esau ◽  
Bruce Kiefer

Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.


2021 ◽  
Vol 27 (1) ◽  
pp. 88-112
Author(s):  
Ekaterina I. DYUDIKOVA ◽  
Natal'ya N. KUNITSYNA

Subject. The digital economy emerged as a new generation of financial instruments able to counteract global challenges, such as cryptocurrencies, was invented and proliferated. Those who oppose the legitimization of digital assets and their integration into the payment infrastructure do not point out material advantages and support drastic transformations of the existing financial system. However, even though digital payments are considered very risky, the scope of cryptocurrency use continues to grow. The article presents the outcome of an intellectual text analysis of feedback left by users of electronic banking and digital cryptocurrency systems. In doing so, we determined to what extent they are satisfied with various systems. Objectives. The study is intended to provide the theoretical and methodological rationale for, and to test in practice, a model that determines key themes in unstructured big data and allows automatic evaluation of user satisfaction with various payment systems. Methods. We used formal logic, a systems approach, methods of comparative analysis, text mining, and latent semantic analysis. Results. We analyzed reviews uploaded to www.banki.ru and www.otzovik.ru through parsing, stop word elimination, stemming, and probabilistic thematic modeling based on latent semantic analysis. We assessed to what extent users are satisfied with various systems by examining their reviews through text tone analysis, the k-nearest neighbor algorithm, and automated scoring of unrated reviews. Conclusions and Relevance. Text mining of unstructured big data shows that digital platforms, notwithstanding their infancy and high risks, already largely satisfy social needs in comparison with electronic banking systems, which supports the reasonableness of integrating them into the payment system to unlock their potential.
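
The review-scoring steps named in this abstract (stop word elimination, stemming, and k-nearest-neighbor scoring of unrated reviews) might be sketched as follows. This is a simplified assumption, not the authors' model: the stop word list, the crude suffix-stripping stand-in for a stemmer, the sample reviews, and every function name are hypothetical, and the latent semantic analysis projection is omitted, so similarity is computed directly on word counts.

```python
import math
from collections import Counter

# Illustrative stop word list (hypothetical, far shorter than a real one).
STOP_WORDS = {"the", "a", "is", "and", "to", "was", "of", "it", "my", "i"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, and crudely stem."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    # Stand-in for a real stemmer: remove a trailing "ing" from longer words.
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t for t in tokens]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_score(unrated_text, rated_reviews, k=2):
    """Predict a score as the mean score of the k most similar rated reviews."""
    query = Counter(preprocess(unrated_text))
    ranked = sorted(
        rated_reviews,
        key=lambda item: cosine(query, Counter(preprocess(item[0]))),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical rated reviews as (text, score on a 1-5 scale).
rated = [
    ("fast transfer and low fees", 5),
    ("transfer was fast, support helpful", 5),
    ("slow transfer and hidden fees", 1),
    ("support ignored my complaint", 1),
    ("app keeps crashing", 2),
]

predicted = knn_score("fast transfer, low fees, helpful support", rated, k=2)
```

Averaging the neighbors' scores is only one possible aggregation; a majority vote or a similarity-weighted mean would fit the same outline.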


Symmetry ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 868
Author(s):  
Sarfaraz Hashemkhani Zolfani ◽  
Arman Derakhti

This study presents a new approach to criteria selection and weighting within a multi-disciplinary framework. Criteria weighting has developed into the most attractive area of Multi-Attribute Decision Making (MADM). Although many ideas have been proposed over the last decades, the literature shows little real diversity among them. This study takes an outside-the-box view and presents a new approach that uses big data and text mining in a Prospective MADM (PMADM) outline. PMADM is a hybrid, interconnected concept between the Futures Studies and MADM fields. Text mining, which is known as a useful tool in Futures Studies, is applied to create a widespread pilot system for weighting and criteria selection in the PMADM outline. Latent Semantic Analysis (LSA), an influential method within the general concept of text mining, is applied to show how the output of a data warehouse, which in this case is Scopus, can lead to the final criteria selection and weighting of the criteria.


2016 ◽  
Vol 32 (1) ◽  
pp. 67-86
Author(s):  
Jian Guan ◽  
Alan S. Levitan ◽  
Sandeep Goyal

Big Data presents a tremendous challenge for the accounting profession today. This challenge is characterized by, among other things, the explosive growth of unstructured data, such as text. In recent years, new text-mining methods have emerged to turn unstructured textual data into actionable information. A critical role of accounting information systems (AIS) research is to help the accounting profession assess and utilize these methodologies in an accounting context. This paper introduces latent semantic analysis (LSA), a text-mining approach that discovers latent structures in unstructured textual data, to the AIS research community. An LSA-based approach is used to analyze AIS research as published in the Journal of Information Systems (JIS) over the last 30 years. JIS research serves as an appropriate domain of analysis because of a perceived need to contextualize the scope of AIS research. The research themes and trends resulting from this analysis contribute to a better understanding of this identity.

