Machine Learning and Finance: A Review Using Latent Dirichlet Allocation Technique (LDA)

2020 ◽  
Author(s):  
Ahmed Sameer El Khatib


Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

Manual methods cannot feasibly process today's huge volumes of structured and semi-structured data. This study aims to solve the problem of processing such data through machine learning algorithms. We collected text data on companies' public opinion through web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into topics. The topic keywords serve as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-gram, PMI, and Word2vec were used for comparative testing. The experimental results show that the machine learning-based Word2vec algorithm achieves the highest accuracy, recall, and F-value.
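The comparison above pits association rules, N-gram, PMI, and Word2vec against one another for new word discovery. A minimal sketch of the PMI baseline, on a toy token list (the corpus and the `min_count` cutoff are illustrative, not the paper's actual data or settings): adjacent token pairs with high pointwise mutual information are treated as multi-word "new word" candidates.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=1):
    """Score adjacent token pairs by pointwise mutual information.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ); repeated high-PMI pairs
    are candidate multi-word terms. A min_count cutoff filters the
    rare-pair noise PMI is known for.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = ["machine", "learning", "is", "fun",
          "machine", "learning", "is", "hard",
          "machine", "learning"]
scores = pmi_scores(tokens, min_count=2)
# Only pairs seen at least twice survive the cutoff; the repeatedly
# co-occurring ("machine", "learning") scores a positive PMI.
```

This is the simplest of the four baselines; Word2vec, which the abstract reports as the strongest, instead learns dense embeddings and finds candidates by vector similarity.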


2021 ◽  
pp. 52-58
Author(s):  
Hachem Harouni Alaoui ◽  
Elkaber Hachem ◽  
Cherif Ziti

As ever more information is digitized and stored in many forms (web pages, scientific articles, books, etc.), the task of discovering information has become increasingly challenging. The need for new IT tools to retrieve and organize these vast amounts of information grows step by step. Furthermore, e-learning platforms are developing to meet the intended needs of students. The aim of this article is to use machine learning to determine the appropriate actions that support the learning process, and Latent Dirichlet Allocation (LDA) to find the topics contained in the links proposed in a learning session. Our purpose is also to introduce a course that adapts to the student's efforts and reduces irrelevant recommendations (those that do not fit the needs of the adult student) through topic modeling algorithms.


10.2196/14401 ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. e14401 ◽  
Author(s):  
Bach Xuan Tran ◽  
Carl A Latkin ◽  
Noha Sharafeldin ◽  
Katherina Nguyen ◽  
Giang Thu Vu ◽  
...  

Background Artificial intelligence (AI)–based therapeutics, devices, and systems are vital innovations in cancer control; particularly, they allow for diagnosis, screening, precise estimation of survival, informing therapy selection, and scaling up treatment services in a timely manner. Objective The aim of this study was to analyze the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. Methods An exploratory factor analysis was conducted to identify research domains emerging from abstract contents. The Jaccard similarity index was utilized to identify the most frequently co-occurring terms. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. Results From 1991 to 2018, the number of studies examining the application of AI in cancer care has grown to 3555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volume of publications include (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. Noticeably, this classification has revealed topics examining the incremental effectiveness of AI applications, the quality of life, and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and AI in various clinical practices. Conclusions The research landscapes show that the development of AI in cancer care is focused on not only improving prediction in cancer screening and AI-assisted therapeutics but also on improving other corresponding areas such as precision and personalized medicine and patient-reported outcomes.
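The Methods above use the Jaccard similarity index to find the most frequently co-occurring terms. A minimal sketch of that index over the sets of documents each term appears in (the terms and document ids are toy data, not the study's corpus):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy data: the set of abstract ids each term occurs in.
docs_with = {
    "machine learning": {1, 2, 3, 5},
    "neural network": {2, 3, 5, 7},
    "screening": {4, 6},
}
sim = jaccard(docs_with["machine learning"],
              docs_with["neural network"])
# intersection {2, 3, 5} has 3 ids, union {1, 2, 3, 5, 7} has 5 -> 0.6
```

Term pairs with high Jaccard scores co-occur in many of the same abstracts, which is how co-occurring term clusters emerge from this kind of bibliometric analysis.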


Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
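The two distributions described above (each document as a distribution over topics, each topic as a distribution over words) can be sketched outside Stata as well; a minimal illustration using scikit-learn's `LatentDirichletAllocation` on a toy corpus (the documents and the choice of two topics are illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market returns volatility",
    "market risk portfolio returns",
    "gene expression tumor cells",
    "tumor cells cancer screening",
]
# LDA works on word counts, not raw text.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row: one document's probability distribution over the 2 topics.
doc_topics = lda.transform(counts)
# Normalizing each row of components_ gives each topic's probability
# distribution over the vocabulary.
topic_words = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Inspecting the highest-probability words per row of `topic_words` is the usual way to label the topics, mirroring what `ldagibbs` reports in Stata.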


2019 ◽  
Author(s):  
Bach Xuan Tran ◽  
Carl A. Latkin ◽  
Noha Sharafeldin ◽  
Katherina Nguyen ◽  
Giang Thu Vu ◽  
...  

BACKGROUND Artificial intelligence (AI)-based therapeutics, devices, and systems are vital innovations in cancer control. OBJECTIVE This study analyzes the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. METHODS Exploratory factor analysis was applied to identify research domains emerging from the contents of the abstracts. The Jaccard similarity index was utilized to identify terms most frequently co-occurring with each other. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. RESULTS The number of studies applying AI to cancer during 1991-2018 grew to 3,555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volumes of publications include 1) machine learning, 2) comparative effectiveness evaluation of AI-assisted medical therapies, and 3) AI-based prediction. Noticeably, this classification has revealed topics examining the incremental effectiveness of AI applications and the quality of life and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and artificial intelligence in various clinical practices. CONCLUSIONS The research landscapes show that the development of AI in cancer is focused not only on improving prediction in cancer screening and AI-assisted therapeutics, but also on other corresponding areas such as precision and personalized medicine and patient-reported outcomes.


Author(s):  
Ahmed Sameer El Khatib

The aim of this paper is to provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modelling approach to make sense of this diverse body of research spanning the disciplines of finance, economics, computer science, and decision sciences. Through this topic modelling approach, the Latent Dirichlet Allocation (LDA) technique, we extract 14 coherent research topics that are the focus of the 6,148 academic articles analysed from 1990 to 2019. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning approaches into their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modelling for deep comprehension of a body of literature, especially when that literature involves diverse multi-disciplinary actors.


Author(s):  
Fangyuan Zhao ◽  
Xuebin Ren ◽  
Shusen Yang ◽  
Xinyu Yang

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering the hidden semantic architecture of text datasets, and it plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak sensitive information from the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, this paper studies privacy-preserving algorithms for LDA model training. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
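The locally private setting above means each contributor perturbs their own data before it leaves their device. A minimal sketch of k-ary randomized response, a standard local-DP primitive, applied to a single word id (the abstract does not specify the paper's actual mechanism, so this is illustrative only; the vocabulary size and epsilon are made up):

```python
import math
import random

def randomized_response(word_id, vocab_size, epsilon, rng=random):
    """Report a word id under epsilon-local differential privacy.

    With probability e^eps / (e^eps + k - 1) report the true id,
    otherwise report one of the k - 1 other ids uniformly at random.
    """
    p_true = math.exp(epsilon) / (math.exp(epsilon) + vocab_size - 1)
    if rng.random() < p_true:
        return word_id
    other = rng.randrange(vocab_size - 1)
    return other if other < word_id else other + 1  # skip the true id

rng = random.Random(0)
reports = [randomized_response(3, vocab_size=10, epsilon=2.0, rng=rng)
           for _ in range(1000)]
# With epsilon = 2 the true id is reported with probability
# e^2 / (e^2 + 9), about 0.45, so id 3 dominates the noisy reports.
```

An aggregator can debias the reported frequencies to recover approximate word counts without ever seeing any contributor's true words, which is the kind of guarantee a locally private LDA trainer builds on.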


Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yuyan Luo ◽  
Tao Tong ◽  
Xiaoxu Zhang ◽  
Zheng Yang ◽  
Ling Li

Purpose: In the era of information overload, the density of tourism information and the increasingly sophisticated information needs of consumers have created information confusion for tourists and scenic-area managers. The study aims to help scenic-area managers determine the strengths and weaknesses in the development process of scenic areas and to solve the practical problem of tourists' difficulty in quickly and accurately obtaining the destination image of a scenic area and finding a scenic area that meets their needs.

Design/methodology/approach: The study uses a variety of machine learning methods, namely, the latent Dirichlet allocation (LDA) theme extraction model, the term frequency-inverse document frequency (TF-IDF) weighting method, and sentiment analysis. This work also incorporates the probabilistic hesitant fuzzy algorithm (PHFA) from multi-attribute decision-making to form an enhanced tourism destination image mining and analysis model based on visitor expression information. The model is intended to help managers and visitors identify the strengths and weaknesses in the development of scenic areas. Jiuzhaigou is used as an example for empirical analysis.

Findings: In the study, a complete model for the mining analysis of tourism destination image was constructed, and 24,222 online reviews of Jiuzhaigou, China were analyzed. The results revealed a total of 10 attributes and 100 attribute elements. From the identified attributes, three negative attributes were identified, namely, crowdedness, tourism cost, and accommodation environment. The study provides suggestions for tourists to select attractions and offers recommendations and improvement measures for Jiuzhaigou in terms of crowd control and post-disaster reconstruction.

Originality/value: Previous research in this area has used small sample data for qualitative analysis. The current study fills this gap in the literature by proposing a machine learning method that incorporates PHFA, combining ideas from management and multi-attribute decision theory. In addition, the study considers visitors' emotions and thematic preferences from the perspective of their expressed information, based on which the tourism destination image is analyzed. Optimization strategies are provided to help managers of scenic spots in their decision-making.
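The TF-IDF weighting step in the methodology above can be sketched with a few lines of standard-library Python (the review snippets are toy data, and idf(t) = log(N / df(t)) is one common variant; the study's exact weighting formula may differ):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights: tf(t, d) * log(N / df(t)).

    Terms frequent in one review but rare across the collection get
    the highest weights, which is what makes them good attribute cues.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

reviews = [
    ["scenery", "beautiful", "crowded"],
    ["crowded", "queue", "expensive"],
    ["scenery", "lake", "beautiful"],
]
w = tf_idf(reviews)
# "crowded" appears in 2 of 3 reviews while "queue" appears in only 1,
# so within the second review "queue" outweighs "crowded".
```

Weighted terms like these would then feed the LDA and sentiment stages of the pipeline described above.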

