Probabilistic Topic Model: Latest Research Papers

A Probabilistic Topic Model based on Short Distance Co-occurrences

Expert Systems with Applications ◽

10.1016/j.eswa.2022.116518 ◽

2022 ◽

pp. 116518

Author(s):

Marziea Rahimi ◽

Morteza Zahedi ◽

Hoda Mashayekhi

Keyword(s):

Short Distance ◽

Topic Model ◽

Model Based ◽

Probabilistic Topic Model

PF: Website Fingerprinting Attack Using Probabilistic Topic Model

Security and Communication Networks ◽

10.1155/2021/3265300 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Hongcheng Zou ◽

Ziling Wei ◽

Jinshu Su ◽

Baokang Zhao ◽

Yusheng Xia ◽

...

Keyword(s):

Topic Model ◽

False Positive Rate ◽

True Positive Rate ◽

The Other ◽

Open World ◽

Closed World ◽

Probabilistic Topic Model ◽

High True Positive Rate ◽

Positive Rate ◽

Fingerprinting Attack

Website fingerprinting (WFP) attacks make it possible to identify the websites a user is browsing even under the protection of privacy-enhancing technologies (PETs). Previous studies show that most machine-learning attacks need multiple types of features as input, which entails tremendous feature-engineering work. We show an alternative: Probabilistic Fingerprinting (PF), a new website fingerprinting attack that leverages only one type of feature. These features are produced by PWFP, a mathematical model that, for the first time, combines a probabilistic topic model with WFP; it rests on the finding that a plain text and the sequence file generated from a traffic instance are essentially the same. Experimental results show that the proposed features are more distinguishing than existing ones. In a closed-world setting, PF attains better accuracy (up to 99.79%) than prior attacks on datasets gathered in Shadowsocks, SSH, and TLS scenarios. Even when the number of training instances drops to as few as 4, PF still reaches an accuracy above 90%. In the more realistic open-world setting, PF attains a high true positive rate (TPR) and Bayes detection rate (BDR) and a low false positive rate (FPR) in all evaluations, outperforming the other attacks. These results show that exploring new features is both meaningful and feasible for improving the accuracy of WFP attacks.
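To make the open-world metrics concrete, here is a minimal Python sketch under simplifying assumptions. It treats detection as binary (monitored vs. unmonitored), and the packet-to-token encoding in the hypothetical `trace_to_tokens` helper only illustrates the idea of reading a traffic trace as text; it is not PF's actual sequence-file format. The BDR is computed with Bayes' rule from TPR, FPR, and an assumed base rate of visits to monitored sites.

```python
def trace_to_tokens(packets):
    """Encode (direction, size) packets as word-like tokens, e.g. (1, 1500) -> "+1500".

    Hypothetical encoding for illustration; PF's own sequence format may differ.
    """
    return [f"{'+' if direction > 0 else '-'}{size}" for direction, size in packets]


def open_world_metrics(tp, fn, fp, tn, base_rate):
    """Return (TPR, FPR, BDR) for a binary monitored/unmonitored decision.

    tp, fn: monitored instances that were / were not flagged.
    fp, tn: unmonitored instances that were / were not flagged.
    base_rate: assumed prior probability that a visit is to a monitored site.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # Bayes' rule: P(monitored | flagged) = TPR*p / (TPR*p + FPR*(1 - p))
    hit = tpr * base_rate
    denom = hit + fpr * (1 - base_rate)
    bdr = hit / denom if denom else 0.0
    return tpr, fpr, bdr


tokens = trace_to_tokens([(1, 1500), (-1, 583), (-1, 1500)])
tpr, fpr, bdr = open_world_metrics(tp=90, fn=10, fp=5, tn=995, base_rate=0.01)
print(tokens, round(tpr, 3), round(fpr, 3), round(bdr, 3))
# prints: ['+1500', '-583', '-1500'] 0.9 0.005 0.645
```

With a 1% base rate, even a 0.5% FPR pulls the BDR of this toy classifier down to about 0.65, which is why open-world evaluations report BDR alongside TPR and FPR.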

SPUCL (Scientific Publication Classifier): A Human-Readable Labelling System for Scientific Publications

Applied Sciences ◽

10.3390/app11199154 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9154

Author(s):

Noemi Scarpato ◽

Alessandra Pieroni ◽

Michela Montorsi

Keyword(s):

Latent Dirichlet Allocation ◽

Scientific Literature ◽

Topic Model ◽

Scientific Publication ◽

Research Field ◽

Classifier Systems ◽

Scientific Publications ◽

Probabilistic Topic Model ◽

Probabilistic Machine Learning ◽

Labelling System

Critically assessing the scientific literature is a very challenging task: in general, it requires analysing many documents to define the state of the art of a research field and classifying them. Document classifier systems have tried to address this problem with different techniques, such as probabilistic, machine learning, and neural network models. One of the most popular document classification approaches is LDA (Latent Dirichlet Allocation), a probabilistic topic model. One of the main issues with LDA is that each retrieved topic is a collection of terms with their probabilities and has no human-readable form. This paper defines an approach that makes LDA topics comprehensible to humans by exploiting Word2Vec.
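One plausible way to turn an LDA topic into a readable label with word embeddings is to pick, from a list of candidate labels, the one whose vector is closest (by cosine similarity) to the probability-weighted centroid of the topic's terms. The sketch below assumes that scheme and uses hypothetical three-dimensional toy vectors in place of a trained Word2Vec model; it illustrates the idea and is not SPUCL's published algorithm.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
EMBEDDINGS = {
    "neuron": [0.9, 0.1, 0.0],
    "layer": [0.8, 0.2, 0.1],
    "gradient": [0.7, 0.3, 0.0],
    "protein": [0.0, 0.1, 0.9],
    "neural networks": [0.85, 0.2, 0.05],
    "genomics": [0.05, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def topic_centroid(topic, embeddings):
    """Probability-weighted mean embedding of a topic's (term, prob) pairs; skips unknown terms."""
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    for term, prob in topic:
        vec = embeddings.get(term)
        if vec is None:
            continue
        acc = [a + prob * x for a, x in zip(acc, vec)]
        total += prob
    return [a / total for a in acc] if total else acc


def label_topic(topic, candidate_labels, embeddings):
    """Return the candidate label whose embedding is most similar to the topic centroid."""
    centroid = topic_centroid(topic, embeddings)
    return max(candidate_labels, key=lambda lab: cosine(centroid, embeddings[lab]))


topic = [("neuron", 0.5), ("layer", 0.3), ("gradient", 0.2)]
print(label_topic(topic, ["neural networks", "genomics"], EMBEDDINGS))  # neural networks
```

Weighting by topic probability lets the dominant terms steer the label, while unknown terms are simply ignored rather than breaking the lookup.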

What Are the Salient and Memorable Green-Restaurant Attributes? Capturing Customer Perceptions From User-Generated Content

SAGE Open ◽

10.1177/21582440211031546 ◽

2021 ◽

Vol 11 (3) ◽

pp. 215824402110315

Author(s):

Eunhye Park ◽

Junehee Kwon ◽

Bongsug (Kevin) Chae ◽

Sung-Bum Kim

Keyword(s):

Qualitative Data ◽

Topic Model ◽

Online Reviews ◽

Marketing Strategies ◽

User Generated Content ◽

Customer Perceptions ◽

Modeling Methodology ◽

Probabilistic Topic Model ◽

Practical Implications ◽

Structural Topic Model

This study aims to survey user-generated content (UGC) from diners at certified green restaurants, discover the green images they recall, and demonstrate the usefulness of applying a probabilistic topic model to understand customers’ perceptions. Post-visit online reviews (N = 28,098), in the form of unstructured text from the TripAdvisor.com website, were used to find freely recalled green-restaurant images. These data were analyzed with a structural topic model (STM) algorithm to select 51 relevant image categories. These image categories were compared with the findings of previous studies to discover unique restaurant attributes. Furthermore, a topic-level network and a green-restaurant network were drawn to discover the most easily recalled image categories and their attributes. This machine-learning-based approach improved the reproducibility of unstructured data analysis, overcoming the subjectivity of qualitative data analysis. Theoretical and practical implications are offered for topic modeling methodology, along with marketing strategies for restaurateurs.
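The topic-level network mentioned above can be approximated by linking topics whose per-review proportions tend to rise and fall together. The sketch below assumes a document-topic proportion matrix (for example, from a fitted STM) and an arbitrary correlation threshold of 0.3. STM estimates topic correlations inside the model itself, so this post-hoc Pearson correlation is a simplification rather than the study's procedure, and the toy matrix is invented for illustration.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation; returns None when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else None


def topic_network(theta, threshold=0.3):
    """Return edges (i, j, r) between topics whose proportions correlate above threshold.

    theta: one row per document, one column per topic (rows sum to 1).
    threshold: hypothetical cut-off chosen for illustration.
    """
    n_topics = len(theta[0])
    columns = [[row[t] for row in theta] for t in range(n_topics)]
    edges = []
    for i, j in itertools.combinations(range(n_topics), 2):
        r = pearson(columns[i], columns[j])
        if r is not None and r > threshold:
            edges.append((i, j, round(r, 3)))
    return edges


theta = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
print(topic_network(theta))  # [(0, 1, 0.868)]
```

Topics 0 and 1 co-occur in the same toy reviews and get linked, while topic 2 is anti-correlated with both and stays isolated.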

Identifying User Interests In An Online Discussion Forum With Deep Learning

10.32920/ryerson.14654349.v1 ◽

2021 ◽

Author(s):

Nicholas Buhagiar ◽

Bahram Zahir ◽

Abdolreza Abhari

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Model Framework ◽

User Interests ◽

Online Discussion Forum ◽

Probabilistic Topic Model ◽

Average Accuracy ◽

Discussion Threads ◽

Validation Set ◽

Evaluation Metric

The probabilistic topic model Latent Dirichlet Allocation (LDA) was deployed to model the themes of discourse in discussion threads on the social media aggregation website Reddit. With discussion threads abstracted as vectors of topic weights, these vectors were fed into several neural network architectures, each with a different number of hidden layers, to train machine learning models that identify which discussions a given user would be interested in contributing to. Using accuracy as the evaluation metric to select the best-performing model framework on each user’s validation set, the selected models achieved an average accuracy of 66.1% on the test data for a sample of 30 users. Recommender systems were then built and analyzed for each user from the interest probabilities predicted by these neural networks.
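The per-user evaluation protocol described above (choose each user's best architecture by validation accuracy, then average the chosen models' test accuracy across users) can be sketched as follows. The data layout and all numbers are hypothetical placeholders, not results from the study.

```python
from statistics import mean


def select_per_user(results):
    """Pick, for each user, the candidate model with the highest validation accuracy.

    results: {user: [(hidden_layers, val_acc, test_acc), ...]} (hypothetical layout).
    Returns the chosen candidates and their mean test accuracy across users.
    """
    chosen = {user: max(cands, key=lambda c: c[1]) for user, cands in results.items()}
    avg_test = mean(c[2] for c in chosen.values())
    return chosen, avg_test


results = {
    "user_a": [(1, 0.60, 0.58), (2, 0.70, 0.66)],
    "user_b": [(1, 0.72, 0.70), (3, 0.65, 0.74)],
}
chosen, avg_test = select_per_user(results)
print(chosen, round(avg_test, 3))
```

Selection uses only validation scores: user_b's three-layer model has the higher test accuracy but is not chosen, which keeps the reported average an honest estimate rather than one tuned on the test data.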

Identifying User Interests In An Online Discussion Forum With Deep Learning

10.32920/ryerson.14654349 ◽

2021 ◽

Author(s):

Nicholas Buhagiar ◽

Bahram Zahir ◽

Abdolreza Abhari

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Model Framework ◽

User Interests ◽

Online Discussion Forum ◽

Probabilistic Topic Model ◽

Average Accuracy ◽

Discussion Threads ◽

Validation Set ◽

Evaluation Metric

The probabilistic topic model Latent Dirichlet Allocation (LDA) was deployed to model the themes of discourse in discussion threads on the social media aggregation website Reddit. With discussion threads abstracted as vectors of topic weights, these vectors were fed into several neural network architectures, each with a different number of hidden layers, to train machine learning models that identify which discussions a given user would be interested in contributing to. Using accuracy as the evaluation metric to select the best-performing model framework on each user’s validation set, the selected models achieved an average accuracy of 66.1% on the test data for a sample of 30 users. Recommender systems were then built and analyzed for each user from the interest probabilities predicted by these neural networks.

A Text-Mining Analysis on the Review of the Non-Financial Reporting Directive: Bringing Value Creation for Stakeholders into Accounting

Sustainability ◽

10.3390/su13020763 ◽

2021 ◽

Vol 13 (2) ◽

pp. 763

Author(s):

Simona Fiandrino ◽

Alberto Tonelli

Keyword(s):

Text Mining ◽

Financial Reporting ◽

Value Creation ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Current Debate ◽

The Core ◽

Probabilistic Topic Model ◽

Integrated Logic

The recent Review of the Non-Financial Reporting Directive (NFRD) aims to enhance the disclosure of adequate non-financial information (NFI) and improve accountability to stakeholders. This study focuses on this regulatory intervention and has a twofold objective: first, to understand the main underlying issues at stake; second, to suggest areas of possible amendment in light of current debates on sustainability accounting and accounting for stakeholders. In keeping with these aims, the research analyzes the documents annexed to the contribution on the Review of the NFRD through a text-mining analysis using the latent Dirichlet allocation (LDA) probabilistic topic model (PTM). Our findings highlight four main topics at the core of the current debate: quality of NFI, standardization, materiality, and assurance. The research suggests ways of improving managerial policies to achieve more comparable, relevant, and reliable information by bringing value creation for stakeholders into accounting. It further addresses an integrated logic of accounting for stakeholders that contributes to sustainable development.
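For readers unfamiliar with how an LDA model arrives at topics such as those above, here is a minimal collapsed Gibbs sampler in plain Python. It is a textbook sketch, not the study's pipeline: the toy documents, the two-topic setting, and the hyperparameters (alpha = 0.1, beta = 0.01, 200 sweeps) are assumptions chosen for illustration, and a real analysis would rely on an established library and proper text preprocessing.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (vocab, phi, theta), where phi[k][w] is
    P(word w | topic k) and theta[d][k] is P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # tokens assigned to each topic
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):            # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token's assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(n_topics)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


docs = [
    ["assurance", "audit", "assurance", "verification"],
    ["audit", "verification", "assurance"],
    ["materiality", "stakeholders", "materiality", "relevance"],
    ["stakeholders", "relevance", "materiality"],
]
vocab, phi, theta = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [w for _, w in top])
```

Each topic is read off as its highest-probability words, which is also why raw LDA output needs interpretation (or labelling) before it can be reported as themes.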

Evaluating individual genome similarity with a topic model

Bioinformatics ◽

10.1093/bioinformatics/btaa583 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4757-4764

Author(s):

Liran Juan ◽

Yongtian Wang ◽

Jingyi Jiang ◽

Qi Yang ◽

Guohua Wang ◽

...

Keyword(s):

Latent Dirichlet Allocation ◽

Rare Variants ◽

Topic Model ◽

Principal Component ◽

Data Access ◽

Supplementary Information ◽

Sequencing Technology ◽

Individual Genome ◽

Individual Level ◽

Probabilistic Topic Model

Motivation: Evaluating genome similarity among individuals is an essential step in data analysis. Advanced sequencing technology detects more and rarer variants across massive numbers of individual genomes, thus enabling individual-level genome similarity evaluation. However, current methodologies such as principal component analysis (PCA) cannot fully leverage rare variants and are difficult to interpret in terms of population genetics.

Results: Here, we introduce a probabilistic topic model, latent Dirichlet allocation, to evaluate individual genome similarity. A total of 2535 individuals from the 1000 Genomes Project (KGP) were used to demonstrate our method. Various aspects of variant choice and model parameter selection were studied. We found that relatively rare (0.001 < allele frequency < 0.175) and sparse (average interval > 20 000 bp) variants are more efficient for genome similarity evaluation. At least 100 000 such variants are necessary. In our results, the populations appear significantly less mixed and more cohesive in the visualization than in the PCA results. The global similarities among the KGP genomes are consistent with known geographical, historical and cultural factors.

Availability and implementation: The source code and data-access details are available at https://github.com/lrjuan/LDA_genome.

Supplementary information: Supplementary data are available at Bioinformatics online.
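To illustrate how genomes can be cast as documents for LDA, the sketch below filters variants to the allele-frequency window reported above and treats each individual as a bag of variant IDs. Two assumptions should be kept in mind: the greedy minimum-gap thinning is a simpler stand-in for the paper's average-interval criterion, and counting a variant once per alternate allele carried is a hypothetical encoding, not necessarily the one used in the authors' repository. The variant records are invented toy data.

```python
def select_variants(variants, af_low=0.001, af_high=0.175, min_gap=20_000):
    """Keep variants with af_low < AF < af_high, then greedily thin them so that
    consecutive kept positions on a chromosome are at least min_gap bp apart.

    variants: list of (chrom, pos, variant_id, allele_freq).
    Assumption: minimum-gap thinning replaces the paper's average-interval criterion.
    """
    kept = []
    last_pos = {}
    for chrom, pos, vid, af in sorted(variants):
        if not (af_low < af < af_high):
            continue
        if chrom in last_pos and pos - last_pos[chrom] < min_gap:
            continue
        kept.append(vid)
        last_pos[chrom] = pos
    return kept


def genome_to_document(genotypes, kept_ids):
    """Turn one individual's alternate-allele counts (0, 1 or 2) into a bag of words:
    each selected variant ID is repeated once per alternate allele (hypothetical encoding)."""
    keep = set(kept_ids)
    return [vid for vid, count in genotypes.items() if vid in keep for _ in range(count)]


variants = [
    ("1", 10_000, "rs1", 0.05),
    ("1", 15_000, "rs2", 0.10),   # passes AF filter but lies too close to rs1
    ("1", 40_000, "rs3", 0.30),   # too common
    ("1", 60_000, "rs4", 0.02),
    ("2", 5_000, "rs5", 0.0005),  # too rare
]
kept = select_variants(variants)
doc = genome_to_document({"rs1": 2, "rs2": 1, "rs4": 1}, kept)
print(kept, doc)  # ['rs1', 'rs4'] ['rs1', 'rs1', 'rs4']
```

The resulting per-individual token lists can then be fed to any LDA implementation in place of word lists.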

Understanding the University-Sustainability Link through Media: A Spanish Perspective

Sustainability ◽

10.3390/su12124830 ◽

2020 ◽

Vol 12 (12) ◽

pp. 4830 ◽

Cited By ~ 1

Author(s):

Cecilia Elizabeth Bayas Aldaz ◽

Jesus Rodriguez-Pomeda ◽

Leyla Angélica Sandoval Hamón ◽

Fernando Casani

Keyword(s):

Higher Education ◽

Social Perception ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Higher Education Institutions ◽

News Coverage ◽

News Sources ◽

University Funding ◽

Probabilistic Topic Model ◽

The Social

This article provides universities with a procedure for understanding the social perception of their sustainability activities through the analysis of news published in the print media. It identifies the Spanish news sources that have covered this issue the most and the topics that appear in that coverage. Using a probabilistic topic model called Latent Dirichlet Allocation, the study identifies the nine dominant topics in a corpus of more than seventeen thousand news items (approximately five and a quarter million words) drawn from a database of almost thirteen hundred national press sources published between 2014 and 2017. It also finds that the amount of news on sustainability and universities declined over the covered period. The nine identified topics point to the relevance of higher education institutions’ activities as drivers of sustainability, and the social perception encapsulated in the topics signals public interest in these activities. Finally, we find some interesting relationships between sustainable development, higher education institutions’ missions and behaviors, governmental policies, university funding and governance, social and economic innovation, and green campuses in terms of the overall goal of sustainability.

Probabilistic Topic Model for Context-Driven Visual Attention Understanding

IEEE Transactions on Circuits and Systems for Video Technology ◽

10.1109/tcsvt.2019.2909427 ◽

2020 ◽

Vol 30 (6) ◽

pp. 1653-1667 ◽

Cited By ~ 1

Author(s):

Miguel-Angel Fernandez-Torres ◽

Ivan Gonzalez-Diaz ◽

Fernando Diaz-de-Maria

Keyword(s):

Visual Attention ◽

Topic Model ◽

Probabilistic Topic Model
