Recommending patents based on latent topics

Author(s):  
Ralf Krestel ◽  
Padhraic Smyth
2021 ◽  
pp. 016555152110077
Author(s):  
Sulong Zhou ◽  
Pengyu Kan ◽  
Qunying Huang ◽  
Janet Silbernagel

Natural disasters cause significant damage, casualties and economic losses. Twitter has been used to support prompt disaster response and management because people tend to communicate and spread information on public social media platforms during disaster events. To retrieve real-time situational awareness (SA) information from tweets, the most effective way to mine text is to use natural language processing (NLP). Among the advanced NLP models, the supervised approach can classify tweets into different categories to gain insight and leverage useful SA information from social media data. However, high-performing supervised models require domain knowledge to specify categories and involve costly labelling tasks. This research proposes a guided latent Dirichlet allocation (LDA) workflow to investigate temporal latent topics from tweets during a recent disaster event, the 2020 Hurricane Laura. With integration of prior knowledge, a coherence model, LDA topic visualisation and validation from official reports, our guided approach reveals that most tweets contain several latent topics during the 10-day period of Hurricane Laura. This result indicates that state-of-the-art supervised models have not fully utilised tweet information because they only assign each tweet a single label. In contrast, our model can not only identify emerging topics during different disaster events but also provide multilabel references to the classification schema. In addition, our results can help to quickly identify and extract SA information for responders, stakeholders and the general public so that they can adopt timely responsive strategies and wisely allocate resources during hurricane events.
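The contrast the abstract draws between single-label and multilabel assignment can be illustrated with a minimal sketch. It assumes each tweet already has a document-topic distribution from an LDA model; the function names, the example weights and the 0.25 threshold are hypothetical and are not taken from the paper.

```python
def single_label(doc_topics):
    """Index of the single most probable topic (what a one-label classifier keeps)."""
    return max(range(len(doc_topics)), key=doc_topics.__getitem__)


def multi_label(doc_topics, threshold=0.25):
    """Indices of every topic whose weight reaches the (assumed) threshold."""
    return [k for k, weight in enumerate(doc_topics) if weight >= threshold]


# Hypothetical topic mixture for one tweet; the weights sum to 1.
tweet_topics = [0.45, 0.40, 0.10, 0.05]

top_topic = single_label(tweet_topics)   # keeps only topic 0
all_topics = multi_label(tweet_topics)   # keeps topics 0 and 1
```

With a single label, the second topic, which carries almost as much weight as the first, would be discarded; the thresholded version retains both.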


2021 ◽  
Vol 11 (6) ◽  
pp. 303
Author(s):  
Seungsu Paek ◽  
Taehun Um ◽  
Namhyoung Kim

Recently, there has been growing educational interest in competency. Global organizations, such as the United Nations (UN) and the Organization for Economic Co-operation and Development (OECD), which are leading the discourse on education reform, are taking the lead in spreading awareness of competency education. Since 2015, the number of published articles on competency education has been rapidly increasing. This paper aims to provide significant implications for creating a sustainable future of competency education. A topic modeling method was used to empirically analyze latent topics and international research trends in 26,532 articles published on competency-based education (CBE). As a result of the analysis, 15 topics were derived, including “approach to competency development.” In addition, five topics, including “learning skills” and “teacher training,” were found to be hot topics with increasing article publication. The rapidly changing modern society is calling for a transformation in education. We hope that the results of this study pave the way for further research exploring new directions for education, such as competency education.


2014 ◽  
Vol 4 (1) ◽  
pp. 29-45 ◽  
Author(s):  
Rami Ayadi ◽  
Mohsen Maraoui ◽  
Mounir Zrigui

In this paper, the authors present a latent topic model to index and represent Arabic text documents in a way that reflects more of their semantics. Text representation in a language with highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach for analyzing and preprocessing Arabic text, then describe the stemming process. Finally, the latent model (LDA) is adapted to extract Arabic latent topics: the authors extract the significant topics of all texts, each theme is described by a particular distribution of descriptors, and each text is then represented as a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different topic numbers K (25, 50, 75, and 100), and the authors compare the results with classification in the full word space. The results show that classification in the reduced topic space outperforms, in terms of precision, recall and F-measure, both classification in the full word space and classification using LSI reduction.


2021 ◽  
Author(s):  
Faizah Faizah ◽  
Bor-Shen Lin

BACKGROUND The World Health Organization (WHO) declared COVID-19 a global pandemic on January 30, 2020. However, the pandemic is not over yet. Furthermore, in the first quarter of 2021, some countries faced the third wave of the pandemic. During this difficult time, the development of vaccines for COVID-19 has accelerated rapidly. Understanding the public perception of the COVID-19 vaccine from data collected on social media can widen the perspective on the state of the global pandemic. OBJECTIVE This study explores and analyzes the latent topics in COVID-19 vaccine tweets posted by individuals from various countries by using two-stage topic modeling. METHODS A two-stage analysis in topic modeling was proposed to investigate people’s reactions in five countries. The first stage is Latent Dirichlet Allocation, which produces the latent topics with the corresponding term distributions that help the investigators understand the main issues or opinions. The second stage then performs agglomerative clustering on the latent topics based on Hellinger distance, which merges close topics hierarchically into topic clusters to visualize those topics in either tree or graph views. RESULTS In general, the topic discussion regarding the COVID-19 vaccine in the five countries is similar. Topic themes such as "first vaccine" and "vaccine effect" dominate the public discussion. The remarkable point is that people in some countries have their own topic themes, such as "politician opinion" and "stay home" in Canada, "emergency" in India, and "blood clots" in the United Kingdom. The analysis also shows the most popular COVID-19 vaccine, which is gaining more public interest. CONCLUSIONS With LDA and hierarchical clustering, two-stage topic modeling is powerful for visualizing the latent topics and understanding the public perception regarding the COVID-19 vaccine.
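The second stage described in the METHODS can be sketched as follows. This is only an illustration under stated assumptions: the topic-term distributions are toy values, average linkage is an assumed merge criterion (the abstract does not name one), and the function names are hypothetical.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)


def agglomerate(topics, n_clusters):
    """Merge topics bottom-up (average linkage, an assumption) until n_clusters remain."""
    clusters = [[i] for i in range(len(topics))]

    def avg_dist(c1, c2):
        # Mean pairwise Hellinger distance between the members of two clusters.
        total = sum(hellinger(topics[i], topics[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy topic-term distributions (hypothetical; each row sums to 1).
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
    [0.05, 0.05, 0.25, 0.65],
]

clusters = agglomerate(topics, n_clusters=2)
```

Recording the order of merges, rather than stopping at a fixed number of clusters, would give the hierarchy needed for a tree view.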


2021 ◽  
Author(s):  
Samuel Duraivel ◽  
Lavanya R

Abstract This research paper explores the underlying factors that contribute toward vaccine hesitancy, resistance, and refusal. Using Latent Dirichlet Allocation (LDA), an unsupervised generative-probabilistic model, we generated latent topics from user-generated Reddit corpora on reasons for vaccine hesitancy. Although we hoped to explore the grounds for vaccine hesitancy across the globe, our findings suggest that the corpus used for analysis had been generated by users living predominantly in the United States. Observation of the topics generated by the LDA model led to the discovery of the following latent factors: (i) fear of risks and side effects, (ii) lack of trust in policymakers, (iii) religious belief, (iv) mass surveillance theories, (v) perception of vaccination as a precedent to totalitarianism, (vi) racial background pertaining to retrospective events of racial injustice, such as selective sterilization, (vii) a depopulation agenda fueled by theories affiliated with global warming and Extinction Rebellion, and (viii) perception of vaccination as a campaign to quell immigrant population growth, fueled by reports of coerced sterilization of immigrants in ICE detention.


2021 ◽  
Author(s):  
Dominic Ligot ◽  
Frances Claire Tayco ◽  
Mark Toledo ◽  
Carlos Nazareno ◽  
Denise Brennan-Rieder

Objectives. Infodemics of false information on social media are a growing societal problem, aggravated by the occurrence of the COVID-19 pandemic. The development of infodemics has characteristic resemblances to epidemics of infectious diseases. This paper presents several methodologies which aim to measure the extent and development of infodemics through the lens of epidemiology. Methods. Time-varying R was used as a measure of the infectiousness of the infodemic, topic modeling was used to create topic clouds and topic similarity heat maps, while network analysis was used to create directed and undirected graphs to identify super-spreader and multiple-carrier communities on social media. Results. Forty-two (42) latent topics were discovered. Reproductive trends for a specific topic were observed to have significantly higher peaks (Rt 4-5) than general misinformation (Rt 1-3). From a sample of social media misinformation posts, a total of 385 groups and 804 connections were found within the network, with the largest group having 1,643 shares and 1,063,579 interactions over a 12-month period. Conclusions. These approaches enable the measurement of the infectiousness of an infodemic, comparative analysis of infodemic topics, and identification of likely super-spreaders and multiple carriers on social media. The results of these analyses can form the basis for taking action to stem an ongoing spread of misinformation on social media and mitigate future infodemics. The methods are not confined to health misinformation and may be applied to other infodemics, such as conspiracy theories, political disinformation, and climate change denial.


Author(s):  
Dat Quoc Nguyen ◽  
Richard Billingsley ◽  
Lan Du ◽  
Mark Johnson

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Yue Li ◽  
Pratheeksha Nair ◽  
Xing Han Lu ◽  
Zhi Wen ◽  
Yuening Wang ◽  
...  
