Discovering research topics from library electronic references using latent Dirichlet allocation

Purpose Discovering research topics and trends from a large quantity of library electronic references is essential for scientific research. Current research of this kind mainly depends on human justification. The purpose of this paper is to demonstrate how to identify research topics and the evolution of trends from library electronic references efficiently and effectively by employing automatic text analysis algorithms. Design/methodology/approach The authors used latent Dirichlet allocation (LDA), a probabilistic generative topic model, to extract latent topics from a large quantity of research abstracts. Then, the authors conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics. Findings First, this paper discovers 32 significant research topics from the abstracts of 3,737 articles published in six top accounting journals during the period 1992-2014. Second, based on the document-topic distributions generated by LDA, the authors identified seven hot topics and six cold topics among the 32 topics. Originality/value The topics discovered by LDA are highly consistent with the topics identified by human experts, indicating the validity and effectiveness of the methodology. Therefore, this paper provides novel knowledge to the accounting literature and demonstrates a methodology and process for topic discovery with lower cost and higher efficiency than the current methods.
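The regression step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact regression specification, so the version below assumes a simple linear trend: each topic's mean proportion per year is regressed on the year, and the sign of the slope hints at a rising (hot) or falling (cold) topic. The function names and the toy data are hypothetical, and a real analysis would also test each slope for statistical significance.

```python
from collections import defaultdict
from statistics import mean


def yearly_topic_means(doc_topics, years):
    """Average each topic's proportion over the documents published in each year."""
    by_year = defaultdict(list)
    for theta, year in zip(doc_topics, years):
        by_year[year].append(theta)
    # zip(*thetas) iterates over the topics (columns) of that year's documents.
    return {
        year: [mean(column) for column in zip(*thetas)]
        for year, thetas in sorted(by_year.items())
    }


def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(doc_topics, years):
    """Return one trend slope per topic; positive suggests hot, negative cold."""
    means = yearly_topic_means(doc_topics, years)
    xs = list(means)
    n_topics = len(doc_topics[0])
    return [ols_slope(xs, [means[year][k] for year in xs]) for k in range(n_topics)]


# Hypothetical toy data: four documents, two topics, one document per year.
doc_topics = [[0.8, 0.2], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8]]
years = [1992, 1993, 1994, 1995]
slopes = topic_trends(doc_topics, years)
```

In this toy example the first topic loses share every year and the second gains it, so the slopes have opposite signs. The input `doc_topics` stands in for the document-topic distributions that an LDA implementation would produce.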

The surveillance of a supreme audit institution on related party transactions

Journal of Public Budgeting Accounting & Financial Management ◽

10.1108/jpbafm-12-2019-0181 ◽

2020 ◽

Vol 32 (4) ◽

pp. 577-603

Author(s):

Gustavo Cesário ◽

Ricardo Lopes Cardoso ◽

Renato Santos Aranha

Keyword(s):

Public Sector ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Conflicts Of Interest ◽

International Standards ◽

Content Type ◽

Related Party Transactions ◽

The Public ◽

Audit Reports ◽

Dirichlet Allocation

Purpose This paper aims to analyse how the supreme audit institution (SAI) monitors related party transactions (RPTs) in the Brazilian public sector. It considers definitions and disclosure policies of RPTs by international accounting and auditing standards and their evolution since 1980. Design/methodology/approach Based on archival research on international standards and using an interpretive approach, the authors investigated definitions and disclosure policies. Using a topic model based on latent Dirichlet allocation, the authors performed a content analysis on over 59,000 SAI decisions to assess how the SAI monitors RPTs. Findings The SAI investigates nepotism (a kind of RPT) and conflicts of interest up to eight times more frequently than related parties. Brazilian laws prevent nepotism and conflicts of interest, but not RPTs in general. Indeed, Brazilian public-sector accounting standards have not converged towards IPSAS 20, and ISSAI 1550 does not adjust auditing procedures to suit the public sector. Research limitations/implications The SAI follows a legalistic auditing approach, indicating a need for regulation of related public-sector parties to improve surveillance. In addition to Brazil, other code law countries might face similar circumstances. Originality/value Public-sector RPTs are an under-investigated field, calling for attention by academics and standard-setters. Text mining and latent Dirichlet allocation, while mature techniques, are underexplored in accounting and auditing studies. Additionally, the Python script created to analyse the audit reports is available at Mendeley Data and may be used to perform similar analyses with minor adaptations.

Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00055 ◽

2017 ◽

Vol 5 ◽

pp. 191-204 ◽

Cited By ~ 2

Author(s):

Jooyeon Kim ◽

Dongwoo Kim ◽

Alice Oh

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Scientific Progress ◽

Joint Modeling ◽

Research Topics ◽

Scientific Publications ◽

Generative Process ◽

Author Citation ◽

Improved Accuracy ◽

Dirichlet Allocation

Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find the prominent authors, but these authority indices do not differentiate authority based on research topics. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, citations, and topical authority in a corpus of academic papers. Compared to previous models, LTAI differs in two main aspects. First, it explicitly models the generative process of the citations, rather than treating the citations as given. Second, it models each author’s influence on citations of a paper based on the topics of the cited papers, as well as the citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS, and Citeseer. We compare the performance of LTAI against various baselines, ranging from latent Dirichlet allocation to more advanced models, including the author-link topic model and the dynamic author citation topic model. The results show that LTAI achieves improved accuracy over other similar models when predicting words, citations and authors of publications.

Examining research topics with a dependency-based noun phrase extraction method: a case in accounting

Library Hi Tech ◽

10.1108/lht-12-2019-0247 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lei Lei ◽

Yaochen Deng ◽

Dilin Liu

Keyword(s):

Design Methodology ◽

Latent Dirichlet Allocation ◽

Noun Phrases ◽

Specific Area ◽

Large Set ◽

Important Research ◽

Research Topics ◽

Content Type ◽

Noun Phrase Extraction ◽

Dirichlet Allocation

Purpose Examining research topics in a specific area such as accounting is important to both novice and veteran researchers. The present study aims to identify the research topics in the area of accounting and to investigate the research trends by finding hot and cold topics among those identified in the field. Design/methodology/approach A new dependency-based method focusing on noun phrases, which efficiently extracts research topics from a large set of library data, was proposed. An AR(1) autoregressive model was used to identify topics that have received significantly more or less attention from the researchers. The data used in the study included a total of 4,182 abstracts published in six leading (or premier) accounting journals from 2000 to May 2019. Findings The study identified 48 important research topics across the examined period as well as eight hot topics and one cold topic from the 48 topics. Originality/value The research topics identified based on the dependency-based method are similar to those found with the technique of latent Dirichlet allocation (LDA) topic modelling. In addition, the method seems highly efficient, and the results are easier to interpret. Last, the research topics and trends found in the study provide a reference for researchers in the area of accounting.

Latent Dirichlet allocation-based temporal summarization

International Journal of Web Information Systems ◽

10.1108/ijwis-04-2018-0023 ◽

2019 ◽

Vol 15 (1) ◽

pp. 83-102 ◽

Cited By ~ 1

Author(s):

Ahmed Amir Tazibt ◽

Farida Aoughlis

Keyword(s):

Topic Modeling ◽

Design Methodology ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

External Source ◽

Decision Makers ◽

Content Type ◽

Available Information ◽

Different Sources ◽

Dirichlet Allocation

Purpose During crises such as accidents or disasters, an enormous volume of information is generated on the Web. Both the public and decision-makers often need to identify relevant and timely content, as soon as it appears online, that can help them understand what is happening and make the right decisions. However, relevant content can be scattered across document streams. The available information can also contain redundant content published by different sources. Therefore, the need for automatic construction of summaries that aggregate important, non-redundant and non-outdated pieces of information is becoming critical. Design/methodology/approach The aim of this paper is to present a new temporal summarization approach based on a popular topic model in the information retrieval field, the Latent Dirichlet Allocation. The approach consists of filtering documents over streams, extracting relevant parts of information and then using topic modeling to reveal their underlying aspects to extract the most relevant and novel pieces of information to be added to the summary. Findings The performance evaluation of the proposed temporal summarization approach based on Latent Dirichlet Allocation, performed on the TREC Temporal Summarization 2014 framework, clearly demonstrates its effectiveness in providing short and precise summaries of events. Originality/value Unlike most state-of-the-art approaches, the proposed method determines the importance of the pieces of information to be added to the summaries by relying solely on their representation in the topic space provided by Latent Dirichlet Allocation, without the use of any external source of evidence.

Evolution of research topics in LIS between 1996 and 2019: an analysis based on latent Dirichlet allocation topic model

Scientometrics ◽

10.1007/s11192-020-03721-0 ◽

2020 ◽

Vol 125 (3) ◽

pp. 2561-2595

Author(s):

Xiaoyao Han

Keyword(s):

Information Seeking ◽

Latent Dirichlet Allocation ◽

Information Science ◽

Topic Model ◽

Text Processing ◽

Library Science ◽

Research Topics ◽

Processing Information ◽

And Behavior ◽

Dirichlet Allocation

Abstract This study investigated the evolution of library and information science (LIS) by analyzing research topics in LIS journal articles. The analysis is divided into five periods covering the years 1996–2019. Latent Dirichlet allocation modeling was used to identify underlying topics based on 14,035 documents. An improved data-selection method was devised in order to generate a dynamic journal list that included influential journals for each period. Results indicate that (a) library science has become less prevalent over time, as there are no top topic clusters relevant to library issues since the period 2000–2005; (b) bibliometrics, especially citation analysis, is highly stable across periods, as reflected by the stable subclusters and consistent keywords; and (c) information retrieval has consistently been the dominant domain with interests gradually shifting to model-based text processing. Information seeking and behavior is also a stable field that tends to be dispersed among various topics rather than presented as its own subject. Information systems and organizational activities have been continuously discussed and have developed a closer relationship with e-commerce. Topics that occurred only once have undergone a change of technological context from the networks and Internet to social media and mobile applications.

Intelligent radar software defect classification approach based on the latent Dirichlet allocation topic model

EURASIP Journal on Advances in Signal Processing ◽

10.1186/s13634-021-00761-3 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Xi Liu ◽

Yongfeng Yin ◽

Haifeng Li ◽

Jiabin Chen ◽

Chang Liu ◽

...

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Recall Rate ◽

Defect Classification ◽

Software Defects ◽

Classification Approach ◽

Software Defect ◽

Model Combining ◽

Dirichlet Allocation

Abstract Existing intelligent software defect classification approaches do not consider radar characters and prior statistics information. Thus, when these approaches are applied to radar software testing and validation, the precision rate and recall rate of defect classification are poor, which affects the reuse effectiveness of software defects. To solve this problem, a new intelligent defect classification approach based on the latent Dirichlet allocation (LDA) topic model is proposed for radar software in this paper. The proposed approach includes the defect text segmentation algorithm based on the dictionary of radar domain, the modified LDA model combining radar software requirement, and the top acquisition and classification approach of radar software defect based on the modified LDA model. The proposed approach is applied to typical radar software defects to validate its effectiveness and applicability. The application results illustrate that the prediction precision rate and recall rate of the proposed approach are improved by up to 15–20% compared with the other defect classification approaches. Thus, the proposed approach can be applied effectively in the segmentation and classification of radar software defects to improve the adequacy of defect identification in radar software.
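The precision rate and recall rate reported in this abstract are standard classification metrics. A minimal sketch of how they are computed per defect class follows; the defect labels and the helper name `precision_recall` are hypothetical and are not taken from the paper.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical defect classes, for illustration only.
y_true = ["timing", "timing", "interface", "interface"]
y_pred = ["timing", "interface", "interface", "interface"]

timing_scores = precision_recall(y_true, y_pred, "timing")
interface_scores = precision_recall(y_true, y_pred, "interface")
```

Here one of the two "timing" defects is misclassified as "interface", so "timing" has perfect precision but only half recall, while "interface" has perfect recall but lower precision.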

Research progress and trend of leader member exchange based on social complex network and latent Dirichlet allocation topic model

2020 2nd International Conference on Economic Management and Model Engineering (ICEMME) ◽

10.1109/icemme51517.2020.00090 ◽

2020 ◽

Author(s):

Zhang Chunyang ◽

Ding Kun ◽

Zhang Chunbo ◽

Zhang Li

Keyword(s):

Complex Network ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Research Progress ◽

Leader Member Exchange ◽

Member Exchange ◽

Dirichlet Allocation

Citation context-based topic models: discovering cited and citing topics from full text

Library Hi Tech ◽

10.1108/lht-01-2021-0041 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lixue Zou ◽

Xiwen Liu ◽

Wray Buntine ◽

Yanli Liu

Keyword(s):

Full Text ◽

Topic Models ◽

Content Type ◽

Reference Information ◽

Citation Context ◽

Comparable Performance ◽

Automatic Text Analysis ◽

Collapsed Gibbs Sampling ◽

Source Of Information ◽

Automatic Text

Purpose Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms. Design/methodology/approach The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated. Findings The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references. Originality/value The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.
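The two evaluation measures named in this abstract, perplexity and symmetric Kullback–Leibler divergence, can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: perplexity is computed from per-token predictive probabilities, sKL is taken as the mean of the two directed divergences (some authors use the sum instead), and both distributions are assumed to be strictly positive wherever the other is.

```python
import math


def perplexity(token_probs):
    """Exponential of the negative mean log-probability of held-out tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Mean of KL(p || q) and KL(q || p), one common symmetric convention."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical topic-word distributions over a two-word vocabulary.
topic_a = [0.5, 0.5]
topic_b = [0.9, 0.1]
distance = symmetric_kl(topic_a, topic_b)

# A model that assigns probability 0.25 to every held-out token.
uniform_perplexity = perplexity([0.25] * 8)
```

Lower perplexity indicates that a model predicts held-out text better. How the sKL values are aggregated and interpreted in the paper is not stated in the abstract.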

Mining numerical measure of consumers’ product evaluation expressed in words based on latent Dirichlet allocation

Journal of Modelling in Management ◽

10.1108/jm2-07-2021-0163 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Ziang Wang ◽

Feng Yang

Keyword(s):

Latent Dirichlet Allocation ◽

Product Evaluation ◽

Online Reviews ◽

Product Evaluations ◽

Product Attributes ◽

Data Set ◽

Content Type ◽

Face To Face ◽

Online Retailers ◽

Dirichlet Allocation

Purpose Obtaining consumers’ product evaluations from massive online reviews has always been a hot topic for online retailers. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to acquire answers to some questions. For example, which product attributes will attract consumers? Or which step brings a better experience to consumers during the process of shopping? This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and provides a method to calculate the numerical measure of consumers’ product evaluation expressed in each word. Design/methodology/approach First, all possible pairs of reviews are organized as documents to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either a positive review or a non-positive review. The product evaluation reflected in consumers’ binary attitude is extended to each word that appears in the corpus. Finally, a variational optimization is introduced to calculate the parameters of the expanded LDA model. Findings The experimental results illustrate that the LDA model in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluation. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that have a higher ranking within each topic. Besides, the expanded model also has higher precision on the prediction set, which shows that breaking down the topics into two categories fits the data set better than the standard LDA model. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment. Originality/value This research provides a new method to calculate consumers’ product evaluation from reviews at the level of words. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluations quantitatively. Moreover, words can themselves serve as labels, and they can be ranked once a numerical measure is assigned. Online retailers can benefit from the results for label choosing, advertising or product recommendation.

Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8462003 ◽

2018 ◽

Cited By ~ 1

Author(s):

Kedar S. Prabhudesai ◽

Boyla O. Mainsah ◽

Leslie M. Collins ◽

Chandra S. Throckmorton

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Gaussian Mixture ◽

Dirichlet Allocation
