Text data analysis using Latent Dirichlet Allocation: an application to FOMC transcripts

Discovering the dependency structure among topics is an important enabler for realizing the full potential of text data analysis and for supporting advanced analytical applications. The proposed framework combines complex network analysis with the Latent Dirichlet Allocation (LDA) model to discover a topic relationship network. It identifies the topics of the text data using LDA and then uncovers the graph structure of the intrinsic association dependencies between those topics. This exploits the dependencies between topics and also surfaces a set of higher-level semantic topics covered by the text data. Evaluation and experimental analysis indicate that the proposed method is effective and feasible: topics and the relationships between them can be detected by this approach. It also provides a complete semantic interpretation.
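The abstract does not say how the association dependency between topics is measured. As a minimal sketch, assuming topics are linked when they are jointly prominent in the same documents, a topic relationship network could be derived from LDA document-topic distributions as follows. The function name `topic_edges`, the thresholds `presence` and `min_weight`, and the toy `doc_topic` matrix are all hypothetical and stand in for the output of a fitted LDA model.

```python
from itertools import combinations


def topic_edges(doc_topic, presence=0.2, min_weight=2):
    """Link topics that are jointly prominent in at least `min_weight` documents.

    doc_topic: list of per-document topic distributions (each row sums to ~1).
    presence:  minimum probability for a topic to count as present (assumed value).
    """
    counts = {}
    for dist in doc_topic:
        # Topics whose probability in this document reaches the threshold.
        active = [k for k, p in enumerate(dist) if p >= presence]
        # Every pair of active topics co-occurs once in this document.
        for i, j in combinations(active, 2):
            counts[(i, j)] = counts.get((i, j), 0) + 1
    # Keep only edges supported by enough documents.
    return {pair: w for pair, w in counts.items() if w >= min_weight}


# Hypothetical document-topic matrix: 4 documents, 3 topics.
doc_topic = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.25, 0.70],
    [0.45, 0.10, 0.45],
]
edges = topic_edges(doc_topic)
print(edges)
```

The resulting dictionary maps topic pairs to edge weights and can be passed to any graph library for the complex network analysis step. Co-occurrence counting is only one possible choice; correlation or mutual information between topic columns would fit the same interface.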

A Comparative Analysis of Climate Change and Green Policy Issues: Focusing on Text Data Analysis for Each Period of Korean Government

Journal of Environmental Policy and Administration ◽

10.15301/jepa.2021.29.3.1 ◽

2021 ◽

Vol 29 (3) ◽

pp. 1-47

Author(s):

Cheon-hwan Lee ◽

Hansu Hwang ◽

SeJin An ◽

Eunchang Lee

Keyword(s):

Climate Change ◽

Data Analysis ◽

Comparative Analysis ◽

Text Data ◽

Policy Issues ◽

Green Policy ◽

Korean Government ◽

Text Data Analysis

A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.2.3811 ◽

2020 ◽

Vol 15 (2) ◽

Author(s):

Jia Luo ◽

Dongwen Yu ◽

Zong Dai

Keyword(s):

Machine Learning ◽

Fuzzy Clustering ◽

Latent Dirichlet Allocation ◽

Learning Model ◽

Machine Learning Algorithms ◽

Text Data ◽

Huge Data ◽

Machine Learning Model ◽

N Gram ◽

Dirichlet Allocation

Manual methods cannot realistically process huge amounts of structured and semi-structured data. This study aims to solve the problem of processing huge data with machine learning algorithms. We collected public-opinion text data about the company using crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into topics. The topic keywords are then used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared on the new word discovery task. The experimental results show that the Word2vec algorithm, which is based on a machine learning model, achieves the highest accuracy, recall, and F-value.
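Of the compared methods, PMI (pointwise mutual information) is the simplest to illustrate. The sketch below scores adjacent token pairs as new-word candidates; it is an assumption about how a PMI baseline might look, not the authors' implementation. The function name `pmi_scores`, the `min_count` cutoff, and the sample sentence are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent token pairs by PMI = log2(p(a, b) / (p(a) * p(b)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        # Rare pairs give unreliable estimates, so skip them.
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical tokenized text; a real corpus would come from the crawler.
tokens = "the central bank raised rates while the central bank watched inflation".split()
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs with high PMI occur together more often than their individual frequencies would predict, which makes them candidates for multi-token new words. A seed dictionary such as the topic keywords described above could be used to filter the candidates further.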

CORPUS-BASED CONCEPTUAL COMPONENTIAL ANALYSIS OF THE RELIGIOUS TEXT DATA ANALYSIS. PART I.

DEVELOPMENT OF PHILOLOGY AND LINGUISTICS AT THE MODERN HISTORICAL PERIOD ◽

10.36059/978-966-397-146-9/84-98 ◽

2019 ◽

pp. 84-98

Author(s):

N. M. Popovych

Keyword(s):

Data Analysis ◽

Text Data ◽

Religious Text ◽

Componential Analysis ◽

Text Data Analysis

A Primer on Text-Data Analysis

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch496 ◽

2011 ◽

pp. 3111-3118 ◽

Cited By ~ 1

Author(s):

Imad Rahal ◽

Baoying Wang ◽

James Schnepf

Keyword(s):

Information Retrieval ◽

Data Analysis ◽

Language Processing ◽

Information Filtering ◽

Data Representation ◽

Research Area ◽

Research Field ◽

Text Data ◽

Document Collections ◽

Text Data Analysis

Since the invention of the printing press, text has been the predominant mode for collecting, storing and disseminating a vast, rich range of information. With the unprecedented increase of electronic storage and dissemination, document collections have grown rapidly, increasing the need to manage and analyze this form of data in spite of its unstructured or semistructured form. Text-data analysis (Hearst, 1999) has emerged as an interdisciplinary research area at the junction of a number of older fields such as machine learning, natural language processing, and information retrieval (Grobelnik, Mladenic, & Milic-Frayling, 2000). It is sometimes viewed as an adapted form of a very similar research field that has also emerged recently, namely, data mining, which focuses primarily on structured data mostly represented in relational tables or multidimensional cubes. This article provides an overview of the various research directions in text-data analysis. After the “Introduction,” the “Background” section describes a ubiquitous text-data representation model along with the preprocessing steps employed to achieve better text-data representations and applications. The focal section, “Text-Data Analysis,” presents a detailed treatment of various text-data analysis subprocesses such as information extraction, information retrieval and information filtering, document clustering and document categorization. The article closes with a “Future Trends” section followed by a “Conclusion” section.
