Author–Subject–Topic model for reviewer recommendation

2018 ◽ Vol 45 (4) ◽ pp. 554-570
Author(s): Jian Jin ◽ Qian Geng ◽ Haikun Mou ◽ Chong Chen

Interdisciplinary studies are becoming increasingly popular, and the research domains of many experts are becoming more diverse. This makes it difficult to recommend experts to review interdisciplinary submissions. In this study, an Author–Subject–Topic (AST) model is proposed in two versions. In the model, reviewers’ subject information is embedded to analyse the topic distributions of submissions and of reviewers’ publications. The major difference between the AST and Author–Topic models lies in the introduction of a ‘Subject’ layer, which supervises the generation of hierarchical topics and allows subjects to be shared among authors. To evaluate the performance of the AST model, papers in Information System and Management (a typical interdisciplinary domain) in a famous Chinese academic library are investigated. Comparative experiments show the effectiveness of the AST model in topic distribution analysis and reviewer recommendation for interdisciplinary studies.

2019 ◽ Vol 26 (5) ◽ pp. 531-549
Author(s): Chuan Wu ◽ Evangelos Kanoulas ◽ Maarten de Rijke

Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.


PLoS ONE ◽ 2021 ◽ Vol 16 (4) ◽ pp. e0249622
Author(s): Zhi Wen ◽ Pratheeksha Nair ◽ Chih-Ying Deng ◽ Xing Han Lu ◽ Edward Moseley ◽ ...

Latent knowledge can be extracted from the electronic notes that are recorded during patient encounters with the health system. Using these clinical notes to decipher a patient’s underlying comorbidities, symptom burdens, and treatment courses is an ongoing challenge. Latent topic models, an efficient Bayesian method, can be used to model each patient’s clinical notes as “documents” and the words in the notes as “tokens”. However, standard latent topic models assume that all of the notes follow the same topic distribution, regardless of the type of note or the domain expertise of the author (such as doctors or nurses). We propose a novel application of latent topic modeling, using a multi-note topic model (MNTM) to jointly infer distinct topic distributions of notes of different types. We applied our model to clinical notes from the MIMIC-III dataset to infer distinct topic distributions over the physician and nursing note types. Based on manual assessments made by clinicians, we observed a significant improvement in topic interpretability using MNTM over the baseline single-note topic models that ignore the note types. Moreover, MNTM led to a significantly higher prediction accuracy for prolonged mechanical ventilation and mortality using only the first 48 hours of patient data. By correlating the patients’ topic mixture with hospital mortality and prolonged mechanical ventilation, we identified several diagnostic topics that are associated with poor outcomes. Because of its elegant and intuitive formation, we envision a broad application of our approach in mining multi-modality text-based healthcare information that goes beyond clinical notes. Code available at https://github.com/li-lab-mcgill/heterogeneous_ehr.


Author(s): Natalia Vasilievna Salomatina ◽ Irina Semenovna Kononenko ◽ Elena Anatolvna Sidorova ◽ Ivan Sergeevich Pimenov ◽ ...

This work analyses the efficiency of a recognition feature: whether argumentative statements belong to the same topic fragment of a text. The study aims to use this feature for the automatic recognition of argumentative structures in popular science texts written in Russian. The topic model of a text is constructed from superphrasal units (text fragments united by one topic), which are identified by detecting clusters of words and word combinations using scan statistics. Potential relations extracted from the topic models are verified against texts with manually annotated argumentation structures. The comparison between potential (topic-model-based) and manually constructed relations is performed automatically. Macro-averaged precision and recall are 48.6% and 76.2%, respectively.
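The macro-averaged scores reported above can be illustrated with a minimal sketch. It assumes the averaging unit is a single text and that a relation is a pair of statement identifiers; the abstract specifies neither, so the `Relation` type and both function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a relation links two statement identifiers.
Relation = Tuple[str, str]


def precision_recall(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Precision and recall of predicted relations against annotated ones for one text."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def macro_average(texts: Iterable[Tuple[Set[Relation], Set[Relation]]]) -> Tuple[float, float]:
    """Average per-text precision and recall, giving every text equal weight."""
    scores = [precision_recall(predicted, gold) for predicted, gold in texts]
    if not scores:
        raise ValueError("at least one text is required")
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Under this scheme a text with few relations counts as much as a text with many. Recall well above precision, as in the reported figures, would mean the topic-based candidates cover most annotated relations while also proposing many that annotators did not mark.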


Author(s): Hainan Zhang ◽ Yanyan Lan ◽ Liang Pang ◽ Hongshen Chen ◽ Zhuoye Ding ◽ ...

Topic drift is a common phenomenon in multi-turn dialogue. Therefore, an ideal dialogue generation model should be able to capture the topic information of each context, detect the relevant context, and produce appropriate responses accordingly. However, existing models usually use word- or sentence-level similarities to detect the relevant contexts, and so fail to capture topic-level relevance well. In this paper, we propose a new model, named STAR-BTM, to tackle this problem. First, the Biterm Topic Model is pre-trained on the whole training dataset. Then, the topic-level attention weights are computed based on the topic representation of each context. Finally, the attention weights and the topic distribution are used in the decoding process to generate the corresponding responses. Experimental results on both Chinese customer services data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods in terms of both metric-based and human evaluations.
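For readers unfamiliar with the Biterm Topic Model mentioned above: it is trained on biterms, that is, unordered pairs of words that co-occur in the same short text. The sketch below shows only that input-preparation step, assuming each dialogue context has already been tokenized; the function name `extract_biterms` is hypothetical, and nothing here reproduces the STAR-BTM attention or decoding stages.

```python
from itertools import combinations
from typing import List, Tuple


def extract_biterms(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return every unordered word pair (biterm) from one short context.

    Each pair is sorted so that (a, b) and (b, a) map to the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


if __name__ == "__main__":
    # Three tokens yield three biterms.
    print(extract_biterms(["wifi", "driver", "ubuntu"]))
```

Because a context of n tokens yields n(n-1)/2 biterms, even very short utterances contribute several co-occurrence observations, which is why this family of models suits short texts such as dialogue turns.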


2020 ◽ Vol 39 (4) ◽ pp. 727-742
Author(s): Joachim Büschken ◽ Greg M. Allenby

User-generated content in the form of customer reviews, blogs, and tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In the proposed model, topic carryover is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated prior to analyzing text data (i.e., preprocessing) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our models using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.


2019 ◽ Vol 1 (1) ◽ pp. 45-78
Author(s): Chankyung Pak

To disseminate their stories efficiently via social media, news organizations make decisions that resemble traditional editorial decisions. However, the decisions for social media may deviate from traditional ones because they are often made outside the newsroom and guided by audience metrics. This study focuses on selective link sharing as quasi-gatekeeping on Twitter, that is, conditioning a link sharing decision about news content. It illustrates how selective link sharing resembles and deviates from gatekeeping for the publication of news stories. Using a computational data collection method and a machine learning technique called Structural Topic Model (STM), this study shows that selective link sharing generates a different topic distribution between news websites and Twitter and thus significantly revokes the specialty of news organizations. This finding implies that emergent logic, which governs news organizations’ decisions for social media, can undermine the provision of diverse news.


2015 ◽ Vol 43 (2) ◽ pp. 94-97
Author(s): Xiaoxia Yao ◽ YongChao Zhao

Purpose – The purpose of this study is to describe and to demonstrate the value of a consortium purchase of the ProQuest Dissertations and Theses full-text database (PQDT) in China. Design/methodology/approach – The authors provide a first-person account based on their professional positions at the China Academic Library & Information System Administrative Center. Findings – The PQDT database has steadily increased the use of theses in China, with more institutions subscribing every year. The PQDT full-text database has become one of the most cost-effective databases cooperatively purchased in China. Originality/value – One of the few in-depth studies of the use of the PQDT database.


Author(s): Ximing Li ◽ Jiaojiao Zhang ◽ Jihong Ouyang

Conventional topic models suffer from a severe sparsity problem when facing extremely short texts such as social media posts. The Dirichlet multinomial mixture (DMM) family of models can handle the sparsity problem; however, these models are still very sensitive to ordinary and noisy words, resulting in inaccurate topic representations at the document level. In this paper, we alleviate this problem by preserving the local neighborhood structure of short texts, which allows topical signals to spread among neighboring documents and so corrects the inaccurate topic representations. This is achieved with variational manifold regularization, which constrains close short texts to have similar variational topic representations. Building on this idea, we propose a novel Laplacian DMM (LapDMM) topic model. During the document graph construction, we further use the word mover’s distance with word embeddings to measure document similarities at the semantic level. To evaluate LapDMM, we compare it against the state-of-the-art short text topic models on several traditional tasks. Experimental results demonstrate that LapDMM achieves very significant performance gains over baseline models, e.g., scores about 0.2 higher on clustering and classification tasks in many cases.


Symmetry ◽ 2019 ◽ Vol 11 (12) ◽ pp. 1486
Author(s): Zhinan Gou ◽ Zheng Huo ◽ Yuanzhen Liu ◽ Yi Yang

Supervised topic modeling has been successfully applied to document classification and tag recommendation in recent years. However, most existing models neglect the fact that topic terms differ in their ability to distinguish topics. In this paper, we propose a term frequency-inverse topic frequency (TF-ITF) method for constructing a supervised topic model, in which the weight of each topic term indicates its ability to distinguish topics. We conduct a series of experiments with both symmetric and asymmetric Dirichlet prior parameters. Experimental results demonstrate that a supervised topic model with TF-ITF outperforms several state-of-the-art supervised topic models.


2015 ◽ Vol 43 (4) ◽ pp. 182-188
Author(s): Qingkui Xi ◽ Qian Zhang ◽ Feng Ni ◽ Guiting Cha ◽ Ping Bao

Purpose – This paper aims to describe and analyse the interlibrary loan and document delivery (ILL/DD) in university libraries in Jiangsu Province, China, and to evaluate the service quality of one library as an example of how to improve. Design/methodology/approach – This paper first describes the ILL/DD of the Jiangsu Academic Library & Information System (JALIS). It then provides an analysis of the problems in JALIS ILL/DD and gives some suggestions for improvement. Finally, it evaluates the service quality of one library’s ILL/DD based on the analytic hierarchy process (AHP). Findings – It is found that JALIS ILL/DD can be done better via small consortia and discipline centres, and that AHP can be used to evaluate the service quality of a library’s ILL/DD. Social implications – More patrons can access better service, and the work effectiveness of librarians can be improved. Originality/value – This paper is helpful to librarians interested in ILL/DD or resource sharing in China.

