scholarly journals Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

2017 ◽  
Vol 5 ◽  
pp. 191-204 ◽  
Author(s):  
Jooyeon Kim ◽  
Dongwoo Kim ◽  
Alice Oh

Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find the prominent authors but these authority indices do not differentiate authority based on research topics. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, citations, and topical authority in a corpus of academic papers. Compared to previous models, LTAI differs in two main aspects. First, it explicitly models the generative process of the citations, rather than treating the citations as given. Second, it models each author’s influence on citations of a paper based on the topics of the cited papers, as well as the citing papers. We fit LTAI into four academic corpora: CORA, Arxiv Physics, PNAS, and Citeseer. We compare the performance of LTAI against various baselines, starting with the latent Dirichlet allocation, to the more advanced models including author-link topic model and dynamic author citation topic model. The results show that LTAI achieves improved accuracy over other similar models when predicting words, citations and authors of publications.

2018 ◽  
Vol 36 (3) ◽  
pp. 400-410 ◽  
Author(s):  
Debin Fang ◽  
Haixia Yang ◽  
Baojun Gao ◽  
Xiaojun Li

Purpose Discovering the research topics and trends from a large quantity of library electronic references is essential for scientific research. Current research of this kind mainly depends on human justification. The purpose of this paper is to demonstrate how to identify research topics and evolution in trends from library electronic references efficiently and effectively by employing automatic text analysis algorithms. Design/methodology/approach The authors used the latent Dirichlet allocation (LDA), a probabilistic generative topic model to extract the latent topic from the large quantity of research abstracts. Then, the authors conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics. Findings First, this paper discovers 32 significant research topics from the abstracts of 3,737 articles published in the six top accounting journals during the period of 1992-2014. Second, based on the document-topic distributions generated by LDA, the authors identified seven hot topics and six cold topics from the 32 topics. Originality/value The topics discovered by LDA are highly consistent with the topics identified by human experts, indicating the validity and effectiveness of the methodology. Therefore, this paper provides novel knowledge to the accounting literature and demonstrates a methodology and process for topic discovery with lower cost and higher efficiency than the current methods.


2021 ◽  
Vol 4 ◽  
pp. 205920432110488
Author(s):  
Fabian C. Moss ◽  
Martin Rohrmeier

Music analysis, in particular harmonic analysis, is concerned with the way pitches are organized in pieces of music, and a range of empirical applications have been developed, for example, for chord recognition or key finding. Naturally, these approaches rely on some operationalization of the concepts they aim to investigate. In this study, we take a complementary approach and discover latent tonal structures in an unsupervised manner. We use the topic model Latent Dirichlet Allocation and apply it to a large historical corpus of musical pieces from the Western classical tradition. This method conceives topics as distributions of pitch classes without assuming a priori that they correspond to either chords, keys, or other harmonic phenomena. To illustrate the generative process assumed by the model, we create an artificial corpus with arbitrary parameter settings and compare the sampled pieces to real compositions. The results we obtain by applying the topic model to the musical corpus show that the inferred topics have music-theoretically meaningful interpretations. In particular, topics cover contiguous segments on the line of fifths and mostly correspond to diatonic sets. Moreover, tracing the prominence of topics over the course of music history over [Formula: see text]600 years reflects changes in the ways pitch classes are employed in musical compositions and reveals particularly strong changes at the transition from common-practice to extended tonality in the 19th century.


2020 ◽  
Vol 125 (3) ◽  
pp. 2561-2595
Author(s):  
Xiaoyao Han

AbstractThis study investigated the evolution of library and information science (LIS) by analyzing research topics in LIS journal articles. The analysis is divided into five periods covering the years 1996–2019. Latent Dirichlet allocation modeling was used to identify underlying topics based on 14,035 documents. An improved data-selection method was devised in order to generate a dynamic journal list that included influential journals for each period. Results indicate that (a) library science has become less prevalent over time, as there are no top topic clusters relevant to library issues since the period 2000–2005; (b) bibliometrics, especially citation analysis, is highly stable across periods, as reflected by the stable subclusters and consistent keywords; and (c) information retrieval has consistently been the dominant domain with interests gradually shifting to model-based text processing. Information seeking and behavior is also a stable field that tends to be dispersed among various topics rather than presented as its own subject. Information systems and organizational activities have been continuously discussed and have developed a closer relationship with e-commerce. Topics that occurred only once have undergone a change of technological context from the networks and Internet to social media and mobile applications.


Author(s):  
Xi Liu ◽  
Yongfeng Yin ◽  
Haifeng Li ◽  
Jiabin Chen ◽  
Chang Liu ◽  
...  

AbstractExisting software intelligent defect classification approaches do not consider radar characters and prior statistics information. Thus, when applying these appaoraches into radar software testing and validation, the precision rate and recall rate of defect classification are poor and have effect on the reuse effectiveness of software defects. To solve this problem, a new intelligent defect classification approach based on the latent Dirichlet allocation (LDA) topic model is proposed for radar software in this paper. The proposed approach includes the defect text segmentation algorithm based on the dictionary of radar domain, the modified LDA model combining radar software requirement, and the top acquisition and classification approach of radar software defect based on the modified LDA model. The proposed approach is applied on the typical radar software defects to validate the effectiveness and applicability. The application results illustrate that the prediction precison rate and recall rate of the poposed approach are improved up to 15 ~ 20% compared with the other defect classification approaches. Thus, the proposed approach can be applied in the segmentation and classification of radar software defects effectively to improve the identifying adequacy of the defects in radar software.


10.2196/15511 ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. e15511 ◽  
Author(s):  
Bach Xuan Tran ◽  
Son Nghiem ◽  
Oz Sahin ◽  
Tuan Manh Vu ◽  
Giang Hai Ha ◽  
...  

Background Artificial intelligence (AI)–based technologies develop rapidly and have myriad applications in medicine and health care. However, there is a lack of comprehensive reporting on the productivity, workflow, topics, and research landscape of AI in this field. Objective This study aimed to evaluate the global development of scientific publications and constructed interdisciplinary research topics on the theory and practice of AI in medicine from 1977 to 2018. Methods We obtained bibliographic data and abstract contents of publications published between 1977 and 2018 from the Web of Science database. A total of 27,451 eligible articles were analyzed. Research topics were classified by latent Dirichlet allocation, and principal component analysis was used to identify the construct of the research landscape. Results The applications of AI have mainly impacted clinical settings (enhanced prognosis and diagnosis, robot-assisted surgery, and rehabilitation), data science and precision medicine (collecting individual data for precision medicine), and policy making (raising ethical and legal issues, especially regarding privacy and confidentiality of data). However, AI applications have not been commonly used in resource-poor settings due to the limit in infrastructure and human resources. Conclusions The application of AI in medicine has grown rapidly and focuses on three leading platforms: clinical practices, clinical material, and policies. AI might be one of the methods to narrow down the inequality in health care and medicine between developing and developed countries. Technology transfer and support from developed countries are essential measures for the advancement of AI application in health care in developing countries.


2020 ◽  
Vol 32 (4) ◽  
pp. 577-603
Author(s):  
Gustavo Cesário ◽  
Ricardo Lopes Cardoso ◽  
Renato Santos Aranha

PurposeThis paper aims to analyse how the supreme audit institution (SAI) monitors related party transactions (RPTs) in the Brazilian public sector. It considers definitions and disclosure policies of RPTs by international accounting and auditing standards and their evolution since 1980.Design/methodology/approachBased on archival research on international standards and using an interpretive approach, the authors investigated definitions and disclosure policies. Using a topic model based on latent Dirichlet allocation, the authors performed a content analysis on over 59,000 SAI decisions to assess how the SAI monitors RPTs.FindingsThe SAI investigates nepotism (a kind of RPT) and conflicts of interest up to eight times more frequently than related parties. Brazilian laws prevent nepotism and conflicts of interest, but not RPTs in general. Indeed, Brazilian public-sector accounting standards have not converged towards IPSAS 20, and ISSAI 1550 does not adjust auditing procedures to suit the public sector.Research limitations/implicationsThe SAI follows a legalistic auditing approach, indicating a need for regulation of related public-sector parties to improve surveillance. In addition to Brazil, other code law countries might face similar circumstances.Originality/valuePublic-sector RPTs are an under-investigated field, calling for attention by academics and standard-setters. Text mining and latent Dirichlet allocation, while mature techniques, are underexplored in accounting and auditing studies. Additionally, the Python script created to analyse the audit reports is available at Mendeley Data and may be used to perform similar analyses with minor adaptations.


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0243208
Author(s):  
Leacky Muchene ◽  
Wende Safari

Unsupervised statistical analysis of unstructured data has gained wide acceptance especially in natural language processing and text mining domains. Topic modelling with Latent Dirichlet Allocation is one such statistical tool that has been successfully applied to synthesize collections of legal, biomedical documents and journalistic topics. We applied a novel two-stage topic modelling approach and illustrated the methodology with data from a collection of published abstracts from the University of Nairobi, Kenya. In the first stage, topic modelling with Latent Dirichlet Allocation was applied to derive the per-document topic probabilities. To more succinctly present the topics, in the second stage, hierarchical clustering with Hellinger distance was applied to derive the final clusters of topics. The analysis showed that dominant research themes in the university include: HIV and malaria research, research on agricultural and veterinary services as well as cross-cutting themes in humanities and social sciences. Further, the use of hierarchical clustering in the second stage reduces the discovered latent topics to clusters of homogeneous topics.


Sign in / Sign up

Export Citation Format

Share Document