Latent acoustic topic models for unstructured audio classification

Author(s):  
Samuel Kim ◽  
Panayiotis Georgiou ◽  
Shrikanth Narayanan

We propose the notion of latent acoustic topics to capture contextual information embedded within a collection of audio signals. The central idea is to learn a probability distribution over a set of latent topics for a given audio clip in an unsupervised manner, assuming that latent acoustic topics exist and that each audio clip can be described in terms of them. To this end, we use latent Dirichlet allocation (LDA) to implement the acoustic topic models over elemental acoustic units, referred to as acoustic words, and perform text-like audio signal processing. Experiments on audio tag classification with the BBC sound effects library demonstrate the usefulness of the proposed latent audio context modeling schemes. In particular, the proposed method is shown to be superior to other latent structure analysis methods, such as latent semantic analysis and probabilistic latent semantic analysis. We also demonstrate that topic models can be used as features complementary to content-based features, offering about 9% relative improvement in audio classification when combined with the traditional Gaussian mixture model (GMM)–Support Vector Machine (SVM) technique.
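
A minimal sketch of this pipeline, assuming frame-level MFCC-like features: k-means quantizes all frames into an "acoustic vocabulary", each clip becomes a histogram of acoustic-word counts, and LDA infers a topic mixture per clip. The random arrays stand in for real MFCC frames, and the vocabulary size and topic count are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Stand-in for per-clip MFCC frames: one (n_frames, n_dims) array per clip.
clips = [rng.normal(size=(500, 13)) for _ in range(20)]

# 1. Learn an "acoustic vocabulary" by clustering all frames.
n_words = 64
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(np.vstack(clips))

# 2. Represent each clip as a histogram of acoustic-word counts.
def bag_of_acoustic_words(frames):
    return np.bincount(kmeans.predict(frames), minlength=n_words)

X = np.array([bag_of_acoustic_words(c) for c in clips])

# 3. Fit LDA: each clip becomes a distribution over latent acoustic topics.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
theta = lda.fit_transform(X)  # (n_clips, n_topics) topic mixtures
print(theta[0].round(3))
```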

Author(s):  
M.A. Nokel ◽ 
N.V. Loukachevitch

The results of an experimental study of adding bigrams to topic models and taking into account the similarity between bigrams and unigrams are presented. A novel PLSA-SIM algorithm, a modification of the original PLSA (Probabilistic Latent Semantic Analysis) algorithm, is proposed. The proposed algorithm incorporates bigrams and takes into account the similarity between them and their unigram components. Various word association measures are analyzed for selecting top-ranked bigrams and integrating them into topic models. As target text collections, a Russian-language set of articles from electronic banking magazines, the English parts of the Europarl and JRC-Acquis parallel corpora, and the English digital archive of research papers in computational linguistics (ACL Anthology) are chosen. The computational experiments show that there exists a subgroup of tested measures that rank bigrams in such a way that their inclusion into the PLSA-SIM algorithm significantly improves the quality of topic models for all collections. A novel unsupervised iterative algorithm, PLSA-ITER, is also proposed for adding the most relevant bigrams. The computational experiments show a further improvement in the quality of topic models compared to the original PLSA algorithm.
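
PLSA-SIM itself is not publicly packaged, so the sketch below only illustrates the bigram-integration step under stated assumptions: bigrams are ranked by PMI (one of the association measures the paper tests), the top-ranked ones are appended to each document as single pseudo-tokens, and the counts are factorized with KL-divergence NMF, which is equivalent to plain PLSA up to normalization. The similarity weighting between bigrams and their unigram components is not reproduced.

```python
import math
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF

docs = [
    "central bank raises interest rate",
    "interest rate policy of the central bank",
    "parsing models for natural language",
    "natural language parsing with neural models",
]
tokens = [d.split() for d in docs]

# Rank bigrams by pointwise mutual information (PMI).
unigrams = Counter(t for doc in tokens for t in doc)
bigrams = Counter(tuple(doc[i:i + 2]) for doc in tokens for i in range(len(doc) - 1))
n = sum(unigrams.values())

def pmi(bg):
    return math.log((bigrams[bg] / n) /
                    ((unigrams[bg[0]] / n) * (unigrams[bg[1]] / n)))

top = sorted(bigrams, key=pmi, reverse=True)[:4]

# Append each top-ranked bigram as a single pseudo-token where it occurs.
expanded = []
for doc in tokens:
    extra = ["_".join(doc[i:i + 2]) for i in range(len(doc) - 1)
             if tuple(doc[i:i + 2]) in top]
    expanded.append(" ".join(doc + extra))

# KL-divergence NMF stands in for plain PLSA here.
X = CountVectorizer().fit_transform(expanded)
plsa_like = NMF(n_components=2, beta_loss="kullback-leibler",
                solver="mu", max_iter=500, random_state=0)
theta = plsa_like.fit_transform(X)  # document-topic weights
print(theta.round(2))
```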


Author(s):  
Christopher John Quinn ◽  
Matthew James Quinn ◽  
Alan Olinsky ◽  
John Thomas Quinn

This chapter provides an overview of a number of important issues related to studying user interactions in an online social network. The approach of social network analysis is detailed along with important basic concepts for network models. Different ways of indicating influence within a network are described through various measures such as degree centrality, betweenness centrality, and closeness centrality. Network structure as represented by cliques and components, with measures of connectedness defined by clustering and reciprocity, is also included. Given the large volume of data associated with social networks, the significance of data storage and sampling is discussed. Since verbal communication is significant within networks, textual analysis is reviewed with respect to classification techniques such as sentiment analysis and to topic modeling, specifically latent semantic analysis, probabilistic latent semantic analysis, latent Dirichlet allocation, and alternatives. Information diffusion is another important area covered in detail.
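
As an illustration, the snippet below computes the influence and connectedness measures named above with networkx on a toy directed interaction graph; the node names and edges are invented.

```python
import networkx as nx

# Toy directed network of user interactions (edges are invented).
G = nx.DiGraph([
    ("ann", "bob"), ("bob", "ann"), ("bob", "cal"),
    ("cal", "dee"), ("dee", "ann"), ("eve", "bob"),
])

print(nx.degree_centrality(G))           # share of nodes each user touches
print(nx.betweenness_centrality(G))      # brokerage on shortest paths
print(nx.closeness_centrality(G))        # inverse average distance to others
print(nx.reciprocity(G))                 # fraction of mutual ties
print(nx.clustering(G.to_undirected()))  # local connectedness
```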


Entropy ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. 660 ◽  
Author(s):  
Sergei Koltcov ◽  
Vera Ignatenko ◽  
Olessia Koltsova

Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method that partially solves the problem of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models, probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which is able to account for just one of the parameters of interest. We demonstrate that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters while simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can contribute to theory construction for machine learning, a rapidly developing field that currently lacks a consistent theoretical ground.
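
A hedged sketch of the idea: the function below implements the textbook Sharma–Mittal entropy S_{q,r}(p) = ((Σ_i p_i^q)^((1−r)/(1−q)) − 1)/(1 − r) and averages it over the topic-word distributions of LDA models fitted with different topic numbers. The synthetic counts, the (q, r) values, and the averaging choice are illustrative; the paper's exact estimator may differ.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def sharma_mittal(p, q=2.0, r=0.5):
    """Sharma-Mittal entropy of a distribution, assuming q != 1 and r != 1."""
    p = p[p > 0]
    return (np.sum(p ** q) ** ((1 - r) / (1 - q)) - 1) / (1 - r)

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(200, 500))  # toy document-term counts

for k in (5, 10, 20, 40):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    entropy = np.mean([sharma_mittal(row) for row in phi])  # average over topics
    print(k, round(entropy, 4))
```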


2020 ◽  
Vol 10 (3) ◽  
pp. 1125 ◽  
Author(s):  
Kai-Xu Han ◽  
Wei Chien ◽  
Chien-Ching Chiu ◽  
Yu-Ting Cheng

At present, mainstream sentiment analysis methods represented by the Support Vector Machine do not adequately consider the vocabulary and latent semantic information in the text, and sentiment analysis depends overly on the statistics of sentiment words. Thus, a Fisher kernel function based on Probabilistic Latent Semantic Analysis is proposed in this paper for sentiment analysis with a Support Vector Machine. The kernel is derived from the Probabilistic Latent Semantic Analysis model, so latent semantic information in the form of probability features can be used as classification features. This improves classification with the support vector machine and addresses the problem of ignoring latent semantic characteristics in text sentiment analysis. The results show that the proposed method clearly outperforms the comparison method.
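
A simplified sketch of the Fisher-kernel idea, not the paper's exact derivation: a pLSA-like factorization is fitted with KL-divergence NMF, the gradient of each document's log-likelihood with respect to its topic proportions is taken as a "Fisher score" feature vector, and an SVM is trained on those features. The Fisher information normalization and the full kernel construction are omitted, and the toy corpus is invented.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF
from sklearn.svm import SVC

docs = ["good great excellent film", "awful terrible boring film",
        "great acting excellent plot", "boring plot terrible acting"]
labels = [1, 0, 1, 0]
X = CountVectorizer().fit_transform(docs).toarray()

# pLSA approximated by KL-divergence NMF: X ~ theta @ phi.
nmf = NMF(n_components=2, beta_loss="kullback-leibler", solver="mu",
          max_iter=500, random_state=0)
theta = nmf.fit_transform(X)
phi = nmf.components_
theta /= np.clip(theta.sum(axis=1, keepdims=True), 1e-12, None)  # P(z|d)
phi /= phi.sum(axis=1, keepdims=True)                            # P(w|z)

# Fisher-style score: gradient of sum_w n(d,w) log sum_z P(w|z)P(z|d)
# with respect to the topic proportions P(z|d).
pwd = np.clip(theta @ phi, 1e-12, None)  # P(w|d)
scores = (X / pwd) @ phi.T               # (n_docs, n_topics)

clf = SVC(kernel="linear").fit(scores, labels)
print(clf.predict(scores))
```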


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 394
Author(s):  
Sergei Koltcov ◽  
Vera Ignatenko ◽  
Zeyd Boukhers ◽  
Steffen Staab

Topic modeling is a popular technique for clustering large collections of text documents. A variety of regularization types are used in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Rényi entropy, this approach is inspired by concepts from statistical physics, where an inferred topical structure of a collection can be considered an information statistical system residing in a non-equilibrium state. By testing our approach on four models, Probabilistic Latent Semantic Analysis (pLSA), Additive Regularization of Topic Models (BigARTM), Latent Dirichlet Allocation (LDA) with Gibbs sampling, and LDA with variational inference (VLDA), we first show that the minimum of Rényi entropy coincides with the "true" number of topics, as determined in two labelled collections. Simultaneously, we find that the Hierarchical Dirichlet Process (HDP) model, a well-known approach for topic number optimization, fails to detect this optimum. Next, we demonstrate that large values of the regularization coefficient in BigARTM significantly shift the entropy minimum away from the optimal topic number, an effect that is not observed for the hyper-parameters in LDA with Gibbs sampling. We conclude that regularization may introduce unpredictable distortions into topic models, which calls for further research.
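
A sketch under stated assumptions: the standard Rényi entropy S_q(p) = log(Σ_i p_i^q)/(1 − q) of the flattened topic-word matrix is tracked across topic numbers at two smoothing strengths, with the Dirichlet prior standing in for a regularization coefficient. The paper's estimator is built on a statistical-physics partition function and differs in detail; the data here are synthetic.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def renyi(p, q=2.0):
    """Standard Renyi entropy of a distribution, assuming q != 1."""
    p = p[p > 0]
    p = p / p.sum()
    return np.log(np.sum(p ** q)) / (1 - q)

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(300, 800))  # toy document-term counts

# Sweep topic numbers at two smoothing strengths; the Dirichlet prior
# acts here as a stand-in for a regularization coefficient.
for prior in (0.1, 5.0):
    for k in (5, 10, 20):
        lda = LatentDirichletAllocation(n_components=k, doc_topic_prior=prior,
                                        random_state=0).fit(X)
        print(prior, k, round(renyi(lda.components_.ravel()), 4))
```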


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text across a collection of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users if their lifestyles are highly similar. Modeling a user's daily life as life documents, it extracts lifestyles using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research discipline or differing subjective views, causing possible misinterpretations. There is an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software. Text mining methods can solve the problem of checking research papers semantically and automatically. The proposed method finds the similarity of text across the collection of documents using the Latent Dirichlet Allocation (LDA) algorithm together with Latent Semantic Analysis (LSA): an LSA-with-synonyms variant finds synonyms of text index-wise using the English WordNet dictionary, while an LSA-without-synonyms variant finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.
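
A rough sketch of the comparison, with LSA implemented as TF-IDF plus truncated SVD and WordNet synonyms looked up through NLTK; the paper's exact index-wise matching procedure is not reproduced, and the toy documents are invented. Requires the WordNet data (nltk.download("wordnet")).

```python
# Requires WordNet data: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def expand(text):
    """Append WordNet synonyms of each token (a few senses per token)."""
    out = []
    for tok in text.lower().split():
        out.append(tok)
        for syn in wn.synsets(tok)[:2]:
            out.extend(l.lower() for l in syn.lemma_names())
    return " ".join(out)

docs = ["the car drives on the road",
        "the automobile travels on the street",
        "topic models cluster documents"]

for use_synonyms in (False, True):
    texts = [expand(d) for d in docs] if use_synonyms else docs
    Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(
        TfidfVectorizer().fit_transform(texts))
    sim = cosine_similarity(Z[:1], Z[1:2])[0, 0]  # doc 0 vs. doc 1
    print("synonyms" if use_synonyms else "plain", round(sim, 3))
```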

