Topic prediction and knowledge discovery based on integrated topic modeling and deep neural networks approaches

2021 ◽  
pp. 1-17
Author(s):  
Zeinab Shahbazi ◽  
Yung-Cheol Byun

Understanding real-world short texts has become an essential task in recent research. Document deduction analysis and latent coherent topics are important aspects of this process. Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) have been proposed to model large volumes of information and documents. The main problems with this type of text are limited information, word relationships, sparsity, and knowledge extraction. To overcome these issues, knowledge discovery and machine learning techniques integrated with topic modeling were proposed. Knowledge discovery was applied to extract hidden information and enlarge the dataset available for further analysis. Machine learning techniques, an Artificial Neural Network (ANN) and Long Short-Term Memory (LSTM), are integrated to anticipate topic movements. The LSTM layers are fed with the latent topic distributions learned by a pre-trained Latent Dirichlet Allocation (LDA) model. We summarize the main techniques applied in short-text topic modeling and propose three categories, based on the Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, using a representative design, and analyze the performance of all categories on different tasks. Finally, the proposed system is evaluated against state-of-the-art methods on real-world datasets, compared with long-document topic modeling algorithms, and used to create a classification framework that incorporates further knowledge and represents it in the machine learning pipeline.
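The first stage of the pipeline described above, deriving per-document latent topic distributions to feed into downstream LSTM layers, can be sketched as follows. This is a minimal illustration using scikit-learn's `LatentDirichletAllocation` (an assumed library choice, not the authors' implementation); the toy corpus is invented, and the LSTM stage itself is omitted.

```python
# Sketch: derive per-document topic distributions with a trained LDA model,
# as would be fed (in temporal order) to LSTM layers in the pipeline above.
# Corpus and parameters are illustrative, not the authors' data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural networks learn topic representations",
    "stock market prices moved sharply today",
    "deep learning models topic distributions",
    "markets react to interest rate news",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row of theta is a probability distribution over the 2 topics;
# these vectors would form the LSTM input sequence.
theta = lda.transform(counts)
print(theta.shape)        # (4, 2)
print(theta.sum(axis=1))  # each row sums to ~1.0
```

In the full pipeline these topic vectors, ordered in time, would be stacked into sequences and passed to an LSTM to predict how topic prevalence evolves.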

2019 ◽  
Vol 8 (2) ◽  
pp. 4833-4837

Technology is growing day by day, and its influence on our daily lives is reaching new heights in the digitized world. Most people are drawn to social media, where even minute details are posted every second; some even post about suicide-related issues. This paper addresses the issue of suicide by predicting suicide-related posts on social media and analyzing their semantics. With the help of machine learning techniques and semantic analysis of sentiments, the prediction and classification of suicide risk are performed. The approach is a four-tier model, which is beneficial because it processes twitter4J data using the Weka tool and applies it to WordNet. Precision and accuracy are verified as the parameters for the performance efficiency of the procedure. We also address the lack of terminological resources by providing a phase that generates vocabulary records.


10.2196/23957 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e23957
Author(s):  
Chengda Zheng ◽  
Jia Xue ◽  
Yumin Sun ◽  
Tingshao Zhu

Background During the COVID-19 pandemic in Canada, Prime Minister Justin Trudeau provided updates on the novel coronavirus and the government’s responses to the pandemic in his daily briefings from March 13 to May 22, 2020, delivered on the official Canadian Broadcasting Corporation (CBC) YouTube channel. Objective The aim of this study was to examine comments on Canadian Prime Minister Trudeau’s COVID-19 daily briefings by YouTube users and track these comments to extract the changing dynamics of the opinions and concerns of the public over time. Methods We used machine learning techniques to longitudinally analyze a total of 46,732 English YouTube comments that were retrieved from 57 videos of Prime Minister Trudeau’s COVID-19 daily briefings from March 13 to May 22, 2020. A natural language processing model, latent Dirichlet allocation, was used to choose salient topics among the sampled comments for each of the 57 videos. Thematic analysis was used to classify and summarize these salient topics into different prominent themes. Results We found 11 prominent themes, including strict border measures, public responses to Prime Minister Trudeau’s policies, essential work and frontline workers, individuals’ financial challenges, rental and mortgage subsidies, quarantine, government financial aid for enterprises and individuals, personal protective equipment, Canada and China’s relationship, vaccines, and reopening. Conclusions This study is the first to longitudinally investigate public discourse and concerns related to Prime Minister Trudeau’s daily COVID-19 briefings in Canada. This study contributes to establishing a real-time feedback loop between the public and public health officials on social media. Hearing and reacting to real concerns from the public can enhance trust between the government and the public to prepare for future health emergencies.


2020 ◽  
Vol 12 (6) ◽  
pp. 2544
Author(s):  
Alice Consilvio ◽  
José Solís-Hernández ◽  
Noemi Jiménez-Redondo ◽  
Paolo Sanetti ◽  
Federico Papa ◽  
...  

The objective of this study is to show the applicability of machine learning and simulative approaches to the development of decision support systems for railway asset management. These techniques are applied within the generic framework developed and tested in the In2Smart project. The framework is composed of different building blocks, showing the complete process from data collection and knowledge extraction to real-world decisions. The application of the framework to two real-world case studies is described: the first deals with strategic earthworks asset management, while the second considers the tactical and operational planning of track-circuit maintenance. Although different methodologies are applied and different planning levels are considered, both case studies follow the same general framework, demonstrating the generality of the approach. The potential of combining machine learning techniques with simulative approaches to replicate real processes is shown by evaluating the key performance indicators employed within the considered asset management process. Finally, the results of the validation are reported, as well as the human–machine interfaces developed for output visualization.


2018 ◽  
Vol 30 (11) ◽  
pp. 3386-3411 ◽  
Author(s):  
Eunhye (Olivia) Park ◽  
Bongsug Chae ◽  
Junehee Kwon

Purpose This paper aims to identify the intellectual structure of four leading hospitality journals over 40 years by applying a mixed-method approach, using both machine learning and traditional statistical analyses. Design/methodology/approach Abstracts from all 4,139 articles published in four top hospitality journals were analyzed using structured topic modeling and inferential statistics. Topic correlation and community detection were applied to identify the strengths of correlations and sub-groups of topics. Trend visualization and regression analysis were used to quantify the effects of the metadata (i.e. year of publication and journal) on topic proportions. Findings The authors found 50 topics and eight subgroups in the hospitality journals. Different evolutionary patterns in topic popularity were demonstrated, providing insight into popular research topics over time. Significant differences in topical proportions were found across the four leading hospitality journals, suggesting different foci of research topics in each journal. Research limitations/implications Combining machine learning techniques with traditional statistics demonstrated potential for discovering valuable insights from big text data in hospitality and tourism research contexts. The findings of this study may serve as a guide to understanding trends in the research field as well as the progress of specific areas or subfields. Originality/value This is the first attempt to apply topic modeling to academic publications and explore the effects of article metadata in the hospitality literature.
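The regression step described above, quantifying the effect of publication year on a topic's proportion, can be illustrated with a minimal ordinary-least-squares fit. The numbers below are synthetic, not the study's estimates; the only point is the mechanic of regressing topic share on metadata.

```python
import numpy as np

# Illustrative data: the proportion of one topic in articles by year
# (synthetic values, not taken from the study).
years = np.array([2000, 2005, 2010, 2015, 2020], dtype=float)
topic_prop = np.array([0.05, 0.08, 0.12, 0.15, 0.19])

# OLS fit of topic proportion on year: a positive slope marks a topic
# whose share of the literature is growing over time ("hot" topic).
slope, intercept = np.polyfit(years, topic_prop, 1)
print(round(slope, 4))  # 0.007 per year for this toy data
```

In the study this kind of fit is run per topic, with journal as an additional covariate, to separate journal-specific foci from field-wide trends.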


2017 ◽  
Vol 7 (1.1) ◽  
pp. 143 ◽  
Author(s):  
J. Deepika ◽  
T. Senthil ◽  
C. Rajan ◽  
A. Surendar

With the rapid development of technology and automation, human history has been substantially reshaped. Computing has moved from large mainframes to PCs to the cloud as the volume of available data has grown. This has happened due to the advent of many tools and practices that enabled the next generation of computing. A large number of techniques have been developed to automate such computing, and research has moved toward training computers to behave with human-like intelligence. Here the diversity of machine learning comes into play for knowledge discovery. Machine learning (ML) is applied in many areas such as medicine, marketing, telecommunications, stock trading, health care, and so on. This paper reviews the foundations of machine learning algorithms and their types and flavors, together with R code and Python scripts for each machine learning technique where possible.
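In the spirit of the per-technique Python scripts the review describes, a minimal supervised-learning example might look like the following. The choice of scikit-learn, the k-nearest-neighbours algorithm, and the Iris dataset are all illustrative assumptions, not taken from the paper.

```python
# Minimal supervised-learning sketch: a k-nearest-neighbours classifier
# on a standard toy dataset (illustrative; not the paper's own scripts).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(acc)  # high accuracy is typical on this split
```

The same fit/score pattern carries over to most of the algorithm families such a review covers (trees, SVMs, ensembles), which is what makes a survey with runnable snippets practical.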


Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.


Author(s):  
Yining Xu ◽  
Xinran Cui ◽  
Yadong Wang

Tumor metastasis is the major cause of mortality from cancer. From this perspective, detecting cancer gene expression and transcriptome changes is important for exploring the molecular mechanisms of tumor metastasis and the associated cellular events. Precisely estimating a patient's cancer state and prognosis is the key challenge in developing a patient's therapeutic schedule. In recent years, a variety of data mining and machine learning techniques have contributed widely to the analysis of real-world gene expression data, supplying computational models that support decision-making and predict tumor outcomes. Nevertheless, the limitations of real-world data severely restrict model predictive performance, and the complexity of the data makes it difficult to extract vital features. Moreover, the efficacy of standard machine learning pipelines is far from satisfactory, despite the diverse feature selection strategies that have been applied. To address these problems, we developed a directed relation-graph convolutional network to provide an advanced feature extraction strategy. We first constructed a gene regulation network and extracted gene expression features using a relational graph convolutional network. The high-dimensional features of each sample were treated as image pixels, and a convolutional neural network was implemented to predict the risk of metastasis for each patient. Ten cross-validation runs on 1,779 cases from The Cancer Genome Atlas show that our model's performance (area under the curve, AUC = 0.837; area under the precision-recall curve, AUPRC = 0.717) exceeds that of an existing network-based method (AUC = 0.707, AUPRC = 0.555).


Author(s):  
R. Derbanosov ◽  
M. Bakhanova

Probabilistic topic modeling is a tool for statistical text analysis that can give us information about the inner structure of a large corpus of documents. The most popular models, Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation, produce topics in the form of discrete distributions over the set of all words of the corpus. They build topics using an iterative algorithm that starts from some random initialization and optimizes a loss function. One of the main problems of topic modeling is sensitivity to random initialization, meaning that different initial points can produce significantly different solutions. Several studies have shown that side information about documents may improve the overall quality of a topic model. In this paper, we consider the use of additional information in the context of the stability problem. We represent auxiliary information as an additional modality and use the BigARTM library to perform experiments on several text collections. We show that using side information as an additional modality improves topic stability without significant quality loss in the model.
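The initialization-sensitivity problem described above can be measured directly: fit the same model from two different random starts and compare the resulting topic-word distributions. The sketch below uses scikit-learn and best-match cosine similarity as an illustrative stability score (the paper itself uses BigARTM and its own quality metrics).

```python
# Sketch of topic (in)stability: fit LDA twice with different seeds and
# score how well the two runs' topics match (illustrative, not BigARTM).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = ["apple banana fruit", "banana fruit salad", "goal match football",
        "football match score", "fruit apple salad", "score goal team"]
counts = CountVectorizer().fit_transform(docs)

def topics(seed):
    """Topic-word distributions from one random initialization."""
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    lda.fit(counts)
    return lda.components_ / lda.components_.sum(axis=1, keepdims=True)

sim = cosine_similarity(topics(0), topics(1))
# Match each topic from run 0 to its closest topic in run 1; a low
# best-match similarity signals unstable topics.
stability = sim.max(axis=1).mean()
print(round(stability, 3))
```

Adding side information as an extra modality, as the paper proposes, aims to raise exactly this kind of run-to-run agreement without degrading model quality.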

