Text data analysis using Latent Dirichlet Allocation: an application to FOMC transcripts

2020 ◽  
Vol 28 (1) ◽  
pp. 38-42
Author(s):  
Hali Edison ◽  
Hector Carcel
Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1843-1851 ◽  
Author(s):  
Anping Zhao ◽  
Suresh Manandhar ◽  
Lei Yu

Discovering the dependency structure among topics is an important enabler for realizing the full potential of text data analysis and for supporting advanced intelligent text-analysis applications. The proposed framework combines complex network analysis with the Latent Dirichlet Allocation (LDA) model to discover topic relationship networks. The approach identifies the topics of the text data with LDA and then discovers the graphical semantic structure of the intrinsic association dependencies between topics. This exploits the association dependencies between topics and also leverages a series of upper-level semantic topics covered by the text data. Evaluation and experimental analysis show that the proposed method is effective and feasible: the topics and the relationships between them can be detected by this approach. It also provides complete semantic interpretation.
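The idea of linking LDA topics into a network can be illustrated with a minimal sketch. The abstract does not specify how edges between topics are computed, so the cosine-similarity rule, the `topic_network` function, the `threshold` parameter, and the small `doc_topic` matrix below are all assumptions made for illustration, not the authors' method. The matrix stands in for the document-topic proportions an LDA model would produce.

```python
from itertools import combinations
from math import sqrt


def topic_network(doc_topic, threshold=0.5):
    """Return {(i, j): weight} for topic pairs whose usage across
    documents has cosine similarity >= threshold (an assumed edge rule)."""
    n_topics = len(doc_topic[0])
    # One column per topic: how strongly each document uses that topic.
    columns = [[row[k] for row in doc_topic] for k in range(n_topics)]
    edges = {}
    for i, j in combinations(range(n_topics), 2):
        dot = sum(a * b for a, b in zip(columns[i], columns[j]))
        norm = sqrt(sum(a * a for a in columns[i])) * sqrt(
            sum(b * b for b in columns[j])
        )
        weight = dot / norm if norm else 0.0
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


# Hypothetical document-topic proportions (each row sums to 1).
doc_topic = [
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
]

edges = topic_network(doc_topic, threshold=0.5)
for (i, j), weight in edges.items():
    print(f"topic {i} -- topic {j}: {weight:.2f}")
```

In this toy input only topics 0 and 1 co-occur in the same documents, so they are the only pair joined by an edge. A real pipeline would take the matrix from a fitted LDA model and might use a different association measure.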


Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

Manual methods cannot practically process huge amounts of structured and semi-structured data. This study aims to solve the problem of processing such data with machine learning algorithms. We collected text data on public opinion about the company with crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to group the keywords into different topics. The topic keywords are used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared on the new word discovery task. The experimental results show that the Word2vec algorithm, which is based on a machine learning model, achieves the highest accuracy, recall and F-value.
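As a rough illustration of one of the compared baselines, pointwise mutual information (PMI) scores an adjacent token pair by how much more often it occurs than chance would predict: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below is an assumption-laden example, not the study's implementation: the `bigram_pmi` function, the `min_count` cutoff, and the toy English sentence are hypothetical.

```python
from collections import Counter
from math import log2


def bigram_pmi(tokens, min_count=2):
    """Score adjacent token pairs by PMI; high scores suggest the pair
    behaves like a single unit (a candidate new word or phrase)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus, already tokenized.
tokens = "deep learning beats rules and deep learning beats counts".split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs with high PMI could then be filtered against a seed dictionary, such as the topic keywords described above, before being accepted as new words.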


Author(s):  
Imad Rahal ◽  
Baoying Wang ◽  
James Schnepf

Since the invention of the printing press, text has been the predominant mode for collecting, storing and disseminating a vast, rich range of information. With the unprecedented increase of electronic storage and dissemination, document collections have grown rapidly, increasing the need to manage and analyze this form of data in spite of its unstructured or semistructured form. Text-data analysis (Hearst, 1999) has emerged as an interdisciplinary research area at the junction of several older fields such as machine learning, natural language processing, and information retrieval (Grobelnik, Mladenic, & Milic-Frayling, 2000). It is sometimes viewed as an adapted form of a very similar research field that has also emerged recently, namely data mining, which focuses primarily on structured data mostly represented in relational tables or multidimensional cubes. This article provides an overview of the various research directions in text-data analysis. After the “Introduction,” the “Background” section describes a ubiquitous text-data representation model along with the preprocessing steps employed to achieve better text-data representations and applications. The focal section, “Text-Data Analysis,” presents a detailed treatment of various text-data analysis subprocesses such as information extraction, information retrieval and information filtering, document clustering and document categorization. The article closes with a “Future Trends” section followed by a “Conclusion” section.

