Machine Learning and Finance: A Review Using Latent Dirichlet Allocation Technique (LDA)

2020 ◽  
Author(s):  
Ahmed Sameer El Khatib


Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

Manual methods cannot feasibly process today's huge volumes of structured and semi-structured data. This study aims to solve the problem of processing such data through machine learning algorithms. We collected text data on companies' public opinion through web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into topics. The topic keywords serve as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-gram, PMI, and Word2vec were used for comparative testing. The experimental results show that the machine learning-based Word2vec algorithm achieves the highest accuracy, recall, and F-value.
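The comparison above pits association rules, N-gram, PMI, and Word2vec against one another for new word discovery. A minimal sketch of the PMI baseline, on a toy token list (the corpus and the `min_count` cutoff are illustrative, not the paper's actual data or settings): adjacent token pairs with high pointwise mutual information are treated as multi-word "new word" candidates.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=1):
    """Score adjacent token pairs by pointwise mutual information.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ); repeated high-PMI pairs
    are candidate multi-word terms. A min_count cutoff filters the
    rare-pair noise PMI is known for.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = ["machine", "learning", "is", "fun",
          "machine", "learning", "is", "hard",
          "machine", "learning"]
scores = pmi_scores(tokens, min_count=2)
# Only pairs seen at least twice survive the cutoff; the repeatedly
# co-occurring ("machine", "learning") scores a positive PMI.
```

This is the simplest of the four baselines; Word2vec, which the abstract reports as the strongest, instead learns dense embeddings and finds candidates by vector similarity.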


2021 ◽  
pp. 52-58
Author(s):  
Hachem Harouni Alaoui ◽  
Elkaber Hachem ◽  
Cherif Ziti

As ever more information is digitized and stored in many forms (web pages, scientific articles, books, etc.), the task of discovering information has become increasingly challenging. The need for new IT tools to retrieve and organize these vast amounts of information grows step by step. Furthermore, e-learning platforms are developing to meet the intended needs of students. The aim of this article is to use machine learning to determine the appropriate actions that support the learning process, and Latent Dirichlet Allocation (LDA) to find the topics contained in the links proposed in a learning session. Our purpose is also to introduce a course that adapts to the student's efforts and reduces irrelevant recommendations (those that do not fit the needs of the adult student) through topic modeling algorithms.


10.2196/14401 ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. e14401 ◽  
Author(s):  
Bach Xuan Tran ◽  
Carl A Latkin ◽  
Noha Sharafeldin ◽  
Katherina Nguyen ◽  
Giang Thu Vu ◽  
...  

Background Artificial intelligence (AI)–based therapeutics, devices, and systems are vital innovations in cancer control; particularly, they allow for diagnosis, screening, precise estimation of survival, informing therapy selection, and scaling up treatment services in a timely manner. Objective The aim of this study was to analyze the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. Methods An exploratory factor analysis was conducted to identify research domains emerging from abstract contents. The Jaccard similarity index was utilized to identify the most frequently co-occurring terms. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. Results From 1991 to 2018, the number of studies examining the application of AI in cancer care has grown to 3555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volume of publications include (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. Noticeably, this classification has revealed topics examining the incremental effectiveness of AI applications, the quality of life, and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and AI in various clinical practices. Conclusions The research landscapes show that the development of AI in cancer care is focused on not only improving prediction in cancer screening and AI-assisted therapeutics but also on improving other corresponding areas such as precision and personalized medicine and patient-reported outcomes.
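The Methods above use the Jaccard similarity index to find the most frequently co-occurring terms. A minimal sketch of that index over the sets of documents each term appears in (the terms and document ids are toy data, not the study's corpus):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy data: the set of abstract ids each term occurs in.
docs_with = {
    "machine learning": {1, 2, 3, 5},
    "neural network": {2, 3, 5, 7},
    "screening": {4, 6},
}
sim = jaccard(docs_with["machine learning"],
              docs_with["neural network"])
# intersection {2, 3, 5} has 3 ids, union {1, 2, 3, 5, 7} has 5 -> 0.6
```

Term pairs with high Jaccard scores co-occur in many of the same abstracts, which is how co-occurring term clusters emerge from this kind of bibliometric analysis.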


Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
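The two distributions described above (each document as a distribution over topics, each topic as a distribution over words) can be sketched outside Stata as well; a minimal illustration using scikit-learn's `LatentDirichletAllocation` on a toy corpus (the documents and the choice of two topics are illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market returns volatility",
    "market risk portfolio returns",
    "gene expression tumor cells",
    "tumor cells cancer screening",
]
# LDA works on word counts, not raw text.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row: one document's probability distribution over the 2 topics.
doc_topics = lda.transform(counts)
# Normalizing each row of components_ gives each topic's probability
# distribution over the vocabulary.
topic_words = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Inspecting the highest-probability words per row of `topic_words` is the usual way to label the topics, mirroring what `ldagibbs` reports in Stata.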


2019 ◽  
Author(s):  
Bach Xuan Tran ◽  
Carl A. Latkin ◽  
Noha Sharafeldin ◽  
Katherina Nguyen ◽  
Giang Thu Vu ◽  
...  

BACKGROUND Artificial intelligence (AI)-based therapeutics, devices, and systems are vital innovations in cancer control. OBJECTIVE This study analyzes the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. METHODS Exploratory factor analysis was applied to identify research domains emerging from the contents of the abstracts. The Jaccard similarity index was utilized to identify terms most frequently co-occurring with each other. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. RESULTS The number of studies applying AI to cancer during 1991-2018 grew to 3,555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volumes of publications include 1) machine learning, 2) comparative effectiveness evaluation of AI-assisted medical therapies, and 3) AI-based prediction. Noticeably, this classification has revealed topics examining the incremental effectiveness of AI applications and the quality of life and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and artificial intelligence in various clinical practices. CONCLUSIONS The research landscapes show that the development of AI in cancer is focused not only on improving prediction in cancer screening and AI-assisted therapeutics, but also on other corresponding areas such as precision and personalized medicine and patient-reported outcomes.


Author(s):  
Ahmed Sameer El Khatib

The aim of this paper is to provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modelling approach to make sense of this diverse body of research spanning the disciplines of finance, economics, computer science, and decision sciences. Through this topic modelling approach, the Latent Dirichlet Allocation (LDA) technique, we extract 14 coherent research topics that are the focus of the 6,148 academic articles analysed from 1990 to 2019. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning approaches into their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modelling for deep comprehension of a body of literature, especially when that literature involves diverse multi-disciplinary actors.


Author(s):  
Fangyuan Zhao ◽  
Xuebin Ren ◽  
Shusen Yang ◽  
Xinyu Yang

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering the hidden semantic architecture of text datasets, and it plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak sensitive information from the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, this paper studies privacy-preserving algorithms for LDA model training. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
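The locally private setting above means each contributor perturbs their own data before it leaves their device. A minimal sketch of k-ary randomized response, a standard local-DP primitive, applied to a single word id (the abstract does not specify the paper's actual mechanism, so this is illustrative only; the vocabulary size and epsilon are made up):

```python
import math
import random

def randomized_response(word_id, vocab_size, epsilon, rng=random):
    """Report a word id under epsilon-local differential privacy.

    With probability e^eps / (e^eps + k - 1) report the true id,
    otherwise report one of the k - 1 other ids uniformly at random.
    """
    p_true = math.exp(epsilon) / (math.exp(epsilon) + vocab_size - 1)
    if rng.random() < p_true:
        return word_id
    other = rng.randrange(vocab_size - 1)
    return other if other < word_id else other + 1  # skip the true id

rng = random.Random(0)
reports = [randomized_response(3, vocab_size=10, epsilon=2.0, rng=rng)
           for _ in range(1000)]
# With epsilon = 2 the true id is reported with probability
# e^2 / (e^2 + 9), about 0.45, so id 3 dominates the noisy reports.
```

An aggregator can debias the reported frequencies to recover approximate word counts without ever seeing any contributor's true words, which is the kind of guarantee a locally private LDA trainer builds on.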


Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yuyan Luo ◽  
Tao Tong ◽  
Xiaoxu Zhang ◽  
Zheng Yang ◽  
Ling Li

Purpose: In the era of information overload, the density of tourism information and the increasingly sophisticated information needs of consumers have created information confusion for tourists and scenic-area managers. The study aims to help scenic-area managers determine the strengths and weaknesses in the development process of scenic areas and to solve the practical problem of tourists' difficulty in quickly and accurately obtaining the destination image of a scenic area and finding a scenic area that meets their needs.

Design/methodology/approach: The study uses a variety of machine learning methods, namely, the latent Dirichlet allocation (LDA) theme extraction model, the term frequency-inverse document frequency (TF-IDF) weighting method, and sentiment analysis. This work also incorporates the probabilistic hesitant fuzzy algorithm (PHFA) from multi-attribute decision-making to form an enhanced tourism destination image mining and analysis model based on visitor expression information. The model is intended to help managers and visitors identify the strengths and weaknesses in the development of scenic areas. Jiuzhaigou is used as an example for empirical analysis.

Findings: In the study, a complete model for the mining analysis of tourism destination image was constructed, and 24,222 online reviews of Jiuzhaigou, China were analyzed. The results revealed a total of 10 attributes and 100 attribute elements. From the identified attributes, three negative attributes were identified, namely, crowdedness, tourism cost, and accommodation environment. The study provides suggestions for tourists to select attractions and offers recommendations and improvement measures for Jiuzhaigou in terms of crowd control and post-disaster reconstruction.

Originality/value: Previous research in this area has used small sample data for qualitative analysis. The current study fills this gap in the literature by proposing a machine learning method that incorporates PHFA, combining ideas from management and multi-attribute decision theory. In addition, the study considers visitors' emotions and thematic preferences from the perspective of their expressed information, based on which the tourism destination image is analyzed. Optimization strategies are provided to help managers of scenic spots in their decision-making.
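The TF-IDF weighting step in the methodology above can be sketched with a few lines of standard-library Python (the review snippets are toy data, and idf(t) = log(N / df(t)) is one common variant; the study's exact weighting formula may differ):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights: tf(t, d) * log(N / df(t)).

    Terms frequent in one review but rare across the collection get
    the highest weights, which is what makes them good attribute cues.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

reviews = [
    ["scenery", "beautiful", "crowded"],
    ["crowded", "queue", "expensive"],
    ["scenery", "lake", "beautiful"],
]
w = tf_idf(reviews)
# "crowded" appears in 2 of 3 reviews while "queue" appears in only 1,
# so within the second review "queue" outweighs "crowded".
```

Weighted terms like these would then feed the LDA and sentiment stages of the pipeline described above.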

