Determining the Number of Topics to Retain using Tools from Factor Analysis

2022 ◽  
pp. 1-14
Author(s):  
Finch Holmes

Abstract: Determining the optimal number of topics to retain in the conduct of topic modeling (TM) has received much attention over the last decade. Despite this work, issues remain regarding the best methods to use for making such determinations. Approaches involving the use of relatively simple statistics, most notably perplexity, have proven to be somewhat inconsistent. Recently, researchers have suggested the use of change in perplexity scores as a useful heuristic for determining the optimal number of topics to retain. The current study builds on this earlier work by assessing the utility of several methods borrowed from factor analysis and applied to statistics commonly used in topic modeling, including perplexity and Alpha. These new approaches are applied to several textual datasets and compared with more traditional methods for determining the number of topics to retain. Results of these analyses demonstrate that application of these methods borrowed from factor analysis does appear to be effective for identifying the number of topics to retain.
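The change-in-perplexity heuristic mentioned in the abstract can be illustrated with a small sketch. This is not the procedure evaluated in the study; it assumes one simple rule of this kind: compute the absolute change in perplexity per added topic between consecutive candidate models, and keep the first candidate whose rate of change is smaller than the next one. The function names and the perplexity values are hypothetical and chosen only for illustration.

```python
def rate_of_perplexity_change(topic_counts, perplexities):
    """Return |change in perplexity / change in k| for consecutive candidates."""
    rates = []
    for i in range(1, len(topic_counts)):
        dk = topic_counts[i] - topic_counts[i - 1]
        dp = perplexities[i] - perplexities[i - 1]
        rates.append(abs(dp / dk))
    return rates


def pick_num_topics(topic_counts, perplexities):
    """Pick the first k whose rate of change is smaller than the following rate.

    Assumed fallback: if no such point exists, return the k with the
    smallest rate of change.
    """
    rates = rate_of_perplexity_change(topic_counts, perplexities)
    # rates[i] describes the step that ends at topic_counts[i + 1].
    for i in range(len(rates) - 1):
        if rates[i] < rates[i + 1]:
            return topic_counts[i + 1]
    best = min(range(len(rates)), key=rates.__getitem__)
    return topic_counts[best + 1]


# Hypothetical perplexity values, invented purely for illustration.
candidate_k = [5, 10, 15, 20, 25]
perplexity = [1200.0, 950.0, 900.0, 820.0, 815.0]
chosen_k = pick_num_topics(candidate_k, perplexity)
print(chosen_k)
```

In practice the perplexity values would come from models fitted at each candidate number of topics, ideally evaluated on held-out documents; the rule above only post-processes those values.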

2013 ◽  
Vol 10 (1) ◽  
pp. 173-195 ◽  
Author(s):  
George Lagogiannis ◽  
Nikos Lorentzos ◽  
Alexander Sideridis

Indexing moving objects usually involves a large number of updates, caused by objects reporting their current position. In order to keep the present and past positions of the objects in secondary memory, each update introduces an I/O, and this process sometimes creates a bottleneck. In this paper we deal with the problem of minimizing the number of I/Os in such a way that queries concerning the present and past positions of the objects can be answered efficiently. In particular, we propose two new approaches that achieve an asymptotically optimal number of I/Os for performing the necessary updates. The approaches are based on the assumption that the primary memory suffices for storing the current positions of the objects.


2015 ◽  
Vol 1 (311) ◽  
Author(s):  
Piotr Tarka

Abstract: The objective of this article is a comparative analysis of Likert rating scales with 5, 7, 9 and 11 response categories, in the context of the factor extraction process in exploratory factor analysis (EFA). The problem addressed in the article relates primarily to methodological aspects: the selection of the optimal number of response categories for the measured items (constituting the Likert scale), and the identification of possible changes, differences or similarities (resulting from the impact of the four types of scales) in the extraction and determination of the appropriate number of factors in the EFA model.
Keywords: Exploratory factor analysis, Likert scale, experiment research, marketing


2020 ◽  
Author(s):  
Mashrekur Rahman ◽  
Grey Nearing ◽  
Jonathan Frame

Hydrologic research generates massive volumes of peer-reviewed literature across a plethora of evolving topics and sub-topics. It is becoming increasingly difficult for scientists and practitioners to synthesize and leverage the full body of scientific literature. Recent advances in computational linguistics and machine learning, including a variety of toolboxes for Natural Language Processing (NLP), help facilitate analysis of vast electronic corpora for a multitude of objectives. Research papers published as electronic text files in different journals offer windows into trending topics and developments, and NLP allows us to extract information and insight about these trends.

This project applies Latent Dirichlet Allocation (LDA) Topic Modeling for bibliometric analyses of all peer-reviewed articles in selected high-impact (Impact Factor > 0.9) journals in hydrology (Water Resources Research, Hydrology and Earth System Sciences, Journal of Hydrology, Hydrological Processes, Advances in Water Resources, Hydrological Sciences Journal, Journal of Hydrometeorology). Topic modeling uses statistical algorithms to extract semantic information from a collection of texts and has become an emerging quantitative method to assess substantial textual data. After acquiring all the papers published in the aforementioned journals and applying multiple pre-processing routines, including removing punctuation, nonsensical text and stopwords, tokenizing, stemming and lemmatization, the resultant corpus was fed to the LDA model for ‘learning’ latent intellectual topics. We achieved this using Gensim, an open-source Python library widely used for unsupervised semantic modeling with LDA. The optimal number of topics (k) and model hyperparameters were decided using coherence and perplexity values for multiple LDA models with varying k. The resulting generated topics are interpretable based on our prior knowledge of hydrology and related sub-disciplines. Comparative topic trend, term, and document level cluster analyses based on different time periods, journals and authors were performed. These analyses revealed topics such as climate change research gaining popularity in Hydrology over the last decade.

We aim to use these results combined with probability distribution between topics, journals and authors to create an interactive ontology map that is useful for research scientists and environmental consultants for exploring relevant literature based on topics and topic relationships. The primary objective of this work is to allow science practitioners to explore new branches and connections in the Hydrology literature, and to facilitate comprehensive and inclusive literature reviews. Second-order beneficiaries are decision and policy makers: the proposed project will provide insights into current research trends and help identify transitions and argumentative viewpoints in hydrologic research. The outcomes of this project will also serve as tools to facilitate effective science communication and aid in bridging gaps between scientists and stakeholders of their research.
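The pre-processing routines listed in this abstract can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' pipeline: the stopword list, the minimum token length, and the sample sentences are assumptions made for the example, and stemming and lemmatization are omitted because they normally rely on a dedicated NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are far larger).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def preprocess(text):
    """Lowercase, drop punctuation and digits, tokenize, and remove stopwords."""
    # Keeping only runs of letters removes punctuation and numbers in one step.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Dropping tokens of one or two letters is an assumed noise filter.
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def bag_of_words(tokens):
    """Count token occurrences, the usual input representation for LDA."""
    return Counter(tokens)


# Hypothetical sample documents, invented for illustration.
docs = [
    "Groundwater recharge in the arid basin is declining.",
    "Modelling of streamflow and groundwater: a review!",
]
corpus = [bag_of_words(preprocess(d)) for d in docs]
print(corpus)
```

A topic-modeling library would then take such per-document token counts as input when fitting models for each candidate number of topics.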


2016 ◽  
Vol 21 (1) ◽  
pp. 43-64 ◽  
Author(s):  
Ryan Light ◽  
Jeanine Cunningham

Social movement frames are dynamic, shifting and embedded within an already existent cultural milieu, a milieu that affects mobilization opportunities. In this article, we invoke the concept of the “cultural clearinghouse” to tackle how broader cultural structures translate to frames or influence frame resonance. Our illustrative case, the Nobel Peace Prize, along with our use of topic modeling, a computational technique that identifies commonalities between texts, offers an important methodological advance for social movement scholars interested in culture, frame formation and resonance, and dynamic approaches to social movement discourse. Our findings show how peace discourse, as represented by Peace Prize acceptance speeches, has increasingly become embedded within broader cultural emphases on globalization and neoliberalism, versus earlier Christian and global institutional schemas. We conclude by discussing the usefulness of our conceptual and methodological advance for movement scholars, with special attention to the coupling of new computational techniques and more traditional methods.


2020 ◽  
Vol 12 (16) ◽  
pp. 6673 ◽  
Author(s):  
Kiattipoom Kiatkawsin ◽  
Ian Sutherland ◽  
Jin-Young Kim

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on text reviews of other customers, managers are also increasingly gaining insight from customer reviews. Thus, the present study aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings from 185,695 Airbnb reviews for Hong Kong and 93,571 for Singapore, two long-term rival destinations, were compared. Hong Kong produced 12 total topics that can be categorized into four distinct groups, whereas Singapore’s optimal number of topics was only five. Topics produced from both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision to formulate managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide more insight beyond typical numeric ratings.


2016 ◽  
Vol 15 (4) ◽  
pp. 6672-6680
Author(s):  
Nabiollah Bayatmoghadam ◽  
Ashkan Sami

Users play a key role in software design. Designing software requires observing standard design principles, using templates, and using modern methods. Over the decades, development methods such as XP, and pair programming as one of the XP practices, have been used to design software. These methods were designed to enhance product quality, respond rapidly to market and customer needs, and overcome the weaknesses of traditional methods based on long-term programming and the waterfall method. Therefore, every programmer and developer can follow a series of processes for constructing computer software. These processes can change daily, and efficient processes may not be effective and useful, although they can still be considered processes. The main objective of the present study is a multi-factor analysis of pair programming based on the PSP methodology. A practical and analytical method and two PSP methods were applied in the investigation.


Author(s):  
Oleksandr Krasheninin ◽  
M. O. Myklaschuk

In recent years, the technical condition of railway equipment has reached a critical limit; this applies to railway cranes used both in depots and on the line. At the same time, the financial and material capabilities of the railways and of some stores are limited, requiring new approaches to maintaining the technical condition of railway equipment. The article presents one approach to optimizing repair stations in terms of their capacity, which requires determining the optimal number of repair stations.

