probabilistic latent semantic analysis Latest Research Papers

Geospatial data is an indispensable data resource for research and applications in many fields. The technologies and applications related to geospatial data are constantly advancing and updating, so identifying the technologies and applications among them will help foster and fund further innovation. Through topic analysis, new research hotspots can be discovered by understanding the whole development process of a topic. At present, the main methods to determine topics are peer review and bibliometrics, however they just review relevant literature or perform simple frequency analysis. This paper proposes a new topic discovery method, which combines a word embedding method, based on a pre-trained model, Bert, and a spherical k-means clustering algorithm, and applies the similarity between literature and topics to assign literature to different topics. The proposed method was applied to 266 pieces of literature related to geospatial data over the past five years. First, according to the number of publications, the trend analysis of technologies and applications related to geospatial data in several leading countries was conducted. Then, the consistency of the proposed method and the existing method PLSA (Probabilistic Latent Semantic Analysis) was evaluated by using two similar consistency evaluation indicators (i.e., U-Mass and NMPI). The results show that the method proposed in this paper can well reveal text content, determine development trends, and produce more coherent topics, and that the overall performance of Bert-LSA is better than PLSA using NPMI and U-Mass. This method is not limited to trend analysis using the data in this paper; it can also be used for the topic analysis of other types of texts.

Download Full-text

Topic Modeling as a Tool for Analyzing Library Chat Transcripts

Information Technology and Libraries ◽

10.6017/ital.v40i3.13333 ◽

2021 ◽

Vol 40 (3) ◽

Author(s):

HyunSeung Koh ◽

Mark Fienup

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Academic Library ◽

Qualitative Evaluation ◽

Probabilistic Latent Semantic Analysis ◽

Analysis Tool ◽

Chat Reference ◽

Better Than

Library chat services are an increasingly important communication channel to connect patrons to library resources and services. Analysis of chat transcripts could provide librarians with insights into improving services. Unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. As a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library’s chat reference data collected from April 10, 2015, to May 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. In this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics. Additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper depending on the subjective judgment on whether topics are aligned with frequently asked questions or easily inferable themes in academic library contexts. This study found that from a human’s qualitative evaluation, Probabilistic Latent Semantic Analysis (pLSA) produced more accurate and interpretable topics, which is not necessarily aligned with the findings of the quantitative evaluation with all three types of topic coherence metrics. Interestingly, the commonly used technique Latent Dirichlet Allocation (LDA) did not necessarily perform better than pLSA. Also, semi-supervised techniques with human-curated anchor words of Correlation Explanation (CorEx) or guided LDA (GuidedLDA) did not necessarily perform better than an unsupervised technique of Dirichlet Multinomial Mixture (DMM). Last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better than using only the initial question asked by the library patron across different techniques in increasing the quality of topic outcomes.

Download Full-text

Topic prediction and knowledge discovery based on integrated topic modeling and deep neural networks approaches

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202545 ◽

2021 ◽

pp. 1-17

Author(s):

Zeinab Shahbazi ◽

Yung-Cheol Byun

Keyword(s):

Machine Learning ◽

Knowledge Discovery ◽

Real World ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Machine Learning Techniques ◽

Probabilistic Latent Semantic Analysis ◽

Learning Techniques ◽

Dirichlet Allocation

Understanding the real-world short texts become an essential task in the recent research area. The document deduction analysis and latent coherent topic named as the important aspect of this process. Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) are suggested to model huge information and documents. This type of contexts’ main problem is the information limitation, words relationship, sparsity, and knowledge extraction. The knowledge discovery and machine learning techniques integrated with topic modeling were proposed to overcome this issue. The knowledge discovery was applied based on the hidden information extraction to increase the suitable dataset for further analysis. The integration of machine learning techniques, Artificial Neural Network (ANN) and Long Short-Term (LSTM) are applied to anticipate topic movements. LSTM layers are fed with latent topic distribution learned from the pre-trained Latent Dirichlet Allocation (LDA) model. We demonstrate general information from different techniques applied in short text topic modeling. We proposed three categories based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation using representative design and analysis of all categories’ performance in different tasks. Finally, the proposed system evaluates with state-of-art methods on real-world datasets, comprises them with long document topic modeling algorithms, and creates a classification framework that considers further knowledge and represents it in the machine learning pipeline.

Download Full-text

Verification of Probabilistic Latent Semantic Analysis Clustering Solution Stability and Proposal of Optimal Initial Values Setting Method

Social Computing and Social Media: Applications in Marketing, Learning, and Health - Lecture Notes in Computer Science ◽

10.1007/978-3-030-77685-5_11 ◽

2021 ◽

pp. 130-146

Author(s):

Shinnosuke Terasawa ◽

Kohei Otake ◽

Takashi Namatame

Keyword(s):

Latent Semantic Analysis ◽

Semantic Analysis ◽

Probabilistic Latent Semantic Analysis ◽

Solution Stability ◽

Initial Values

Download Full-text

Probabilistic latent semantic analysis of composite excitation-emission matrix fluorescence spectra of multicomponent system

Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy ◽

10.1016/j.saa.2020.118518 ◽

2020 ◽

Vol 239 ◽

pp. 118518

Author(s):

Keshav Kumar

Keyword(s):

Latent Semantic Analysis ◽

Fluorescence Spectra ◽

Semantic Analysis ◽

Multicomponent System ◽

Probabilistic Latent Semantic Analysis ◽

Excitation Emission Matrix

Download Full-text

The use of probabilistic latent semantic analysis to identify scientific subject spaces and to evaluate the completeness of covering the results of dissertation studies

Eastern-European Journal of Enterprise Technologies ◽

10.15587/1729-4061.2020.209886 ◽

2020 ◽

Vol 4 (4 (106)) ◽

pp. 21-28

Author(s):

Petro Lizunov ◽

Andrii Biloshchytskyi ◽

Alexander Kuchansky ◽

Yurii Andrashko ◽

Svitlana Biloshchytska

Keyword(s):

Latent Semantic Analysis ◽

Semantic Analysis ◽

Probabilistic Latent Semantic Analysis ◽

Scientific Subject

Download Full-text

Probabilistic Latent Semantic Analysis-Based Gear Fault Diagnosis Under Variable Working Conditions

IEEE Transactions on Instrumentation and Measurement ◽

10.1109/tim.2019.2925410 ◽

2020 ◽

Vol 69 (6) ◽

pp. 2845-2857 ◽

Cited By ~ 1

Author(s):

Chao Chen ◽

Fei Shen ◽

Jiawen Xu ◽

Ruqiang Yan

Keyword(s):

Fault Diagnosis ◽

Working Conditions ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Probabilistic Latent Semantic Analysis ◽

Gear Fault ◽

Gear Fault Diagnosis

Download Full-text

Renormalization Analysis of Topic Models

Entropy ◽

10.3390/e22050556 ◽

2020 ◽

Vol 22 (5) ◽

pp. 556

Author(s):

Sergei Koltcov ◽

Vera Ignatenko

Keyword(s):

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Optimal Number ◽

Probabilistic Latent Semantic Analysis ◽

Model Parameters ◽

Grid Search ◽

Renormalization Procedure ◽

Allocation Model ◽

Latent Dirichlet Allocation Model ◽

Dirichlet Allocation

In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of renormalization procedure with the Renyi entropy approach allows for quick searching of the optimal number of topics. In this paper, the renormalization procedure is developed for the probabilistic Latent Semantic Analysis (pLSA), and the Latent Dirichlet Allocation model with variational Expectation–Maximization algorithm (VLDA) and the Latent Dirichlet Allocation model with granulated Gibbs sampling procedure (GLDA). The experiments were conducted on two test datasets with a known number of topics in two different languages and on one unlabeled test dataset with an unknown number of topics. The paper shows that the renormalization procedure allows for finding an approximation of the optimal number of topics at least 30 times faster than the grid search without significant loss of quality.

Download Full-text

A reformulation of pLSA for uncertainty estimation and hypothesis testing in bio-imaging

Bioinformatics ◽

10.1093/bioinformatics/btaa270 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4080-4087

Author(s):

P D Tar ◽

N A Thacker ◽

S Deepaisarn ◽

J P B O’Connor ◽

A W McMahon

Keyword(s):

Hypothesis Testing ◽

Degrees Of Freedom ◽

Semantic Analysis ◽

High Sensitivity ◽

Supplementary Information ◽

Probabilistic Latent Semantic Analysis ◽

Source Image ◽

Model Parameters ◽

Bio Imaging ◽

Magnetic Resonance Imaging Mri

Abstract Motivation Probabilistic latent semantic analysis (pLSA) is commonly applied to describe mass spectra (MS) images. However, the method does not provide certain outputs necessary for the quantitative scientific interpretation of data. In particular, it lacks assessment of statistical uncertainty and the ability to perform hypothesis testing. We show how linear Poisson modelling advances pLSA, giving covariances on model parameters and supporting χ2 testing for the presence/absence of MS signal components. As an example, this is useful for the identification of pathology in MALDI biological samples. We also show potential wider applicability, beyond MS, using magnetic resonance imaging (MRI) data from colorectal xenograft models. Results Simulations and MALDI spectra of a stroke-damaged rat brain show MS signals from pathological tissue can be quantified. MRI diffusion data of control and radiotherapy-treated tumours further show high sensitivity hypothesis testing for treatment effects. Successful χ2 and degrees-of-freedom are computed, allowing null-hypothesis thresholding at high levels of confidence. Availability and implementation Open-source image analysis software available from TINA Vision, www.tina-vision.net. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

A two-step approach for interest estimation from gaze behavior in digital catalog browsing

Journal of Eye Movement Research ◽

10.16910/jemr.13.1.4 ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Kei Shimonishi ◽

Hiroaki Kawashima

Keyword(s):

Semantic Analysis ◽

Eye Gaze ◽

Probabilistic Latent Semantic Analysis ◽

Analysis Method ◽

Gaze Behavior ◽

Short Term ◽

Analysis Techniques ◽

Subsequent Step ◽

Conventional Behavior ◽

Term Analysis

While eye gaze data contain promising clues for inferring the interests of viewers of digital catalog content, viewers often dynamically switch their focus of attention. As a result, a direct application of conventional behavior analysis techniques, such as topic models, tends to be affected by items or attributes of little or no interest to the viewer. To overcome this limitation, we need to identify “when” the user compares items and to detect “which attribute types/values” reflect the user’s interest. This paper proposes a novel two-step approach to addressing these needs. Specifically, we introduce a likelihood-based short-term analysis method as the first step of the approach to simultaneously determine comparison phases of browsing and detect the attributes on which the viewer focuses, even when the attributes cannot be directly obtained from gaze points. Using probabilistic latent semantic analysis, we show that this short-term analysis step greatly improves the results of the subsequent step. The effectiveness of the framework is demonstrated in terms of the capability to extract combinations of attributes relevant to the viewer’s interest, which we call aspects, and also to estimate the interest described by these aspects.

Download Full-text

probabilistic latent semantic analysis
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Bert-Based Latent Semantic Analysis (Bert-LSA): A Case Study on Geospatial Data Technology and Application Trend Analysis

Topic Modeling as a Tool for Analyzing Library Chat Transcripts

Topic prediction and knowledge discovery based on integrated topic modeling and deep neural networks approaches

Verification of Probabilistic Latent Semantic Analysis Clustering Solution Stability and Proposal of Optimal Initial Values Setting Method

Probabilistic latent semantic analysis of composite excitation-emission matrix fluorescence spectra of multicomponent system

The use of probabilistic latent semantic analysis to identify scientific subject spaces and to evaluate the completeness of covering the results of dissertation studies

Probabilistic Latent Semantic Analysis-Based Gear Fault Diagnosis Under Variable Working Conditions

Renormalization Analysis of Topic Models

A reformulation of pLSA for uncertainty estimation and hypothesis testing in bio-imaging

A two-step approach for interest estimation from gaze behavior in digital catalog browsing

Export Citation Format

probabilistic latent semantic analysisRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Bert-Based Latent Semantic Analysis (Bert-LSA): A Case Study on Geospatial Data Technology and Application Trend Analysis

Topic Modeling as a Tool for Analyzing Library Chat Transcripts

Topic prediction and knowledge discovery based on integrated topic modeling and deep neural networks approaches

Verification of Probabilistic Latent Semantic Analysis Clustering Solution Stability and Proposal of Optimal Initial Values Setting Method

Probabilistic latent semantic analysis of composite excitation-emission matrix fluorescence spectra of multicomponent system

The use of probabilistic latent semantic analysis to identify scientific subject spaces and to evaluate the completeness of covering the results of dissertation studies

Probabilistic Latent Semantic Analysis-Based Gear Fault Diagnosis Under Variable Working Conditions

Renormalization Analysis of Topic Models

A reformulation of pLSA for uncertainty estimation and hypothesis testing in bio-imaging

A two-step approach for interest estimation from gaze behavior in digital catalog browsing

probabilistic latent semantic analysis
Recently Published Documents