probabilistic topic model
Recently Published Documents


TOTAL DOCUMENTS

68
(FIVE YEARS 24)

H-INDEX

9
(FIVE YEARS 3)

2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Hongcheng Zou ◽  
Ziling Wei ◽  
Jinshu Su ◽  
Baokang Zhao ◽  
Yusheng Xia ◽  
...  

A website fingerprinting (WFP) attack identifies the websites a user is browsing even when the user is protected by privacy-enhancing technologies (PETs). Previous studies show that most machine-learning attacks need multiple types of features as input, which entails substantial feature engineering. We show an alternative: Probabilistic Fingerprinting (PF), a new website fingerprinting attack that uses only one type of feature. These features are produced by a mathematical model, PWFP, that combines a probabilistic topic model with WFP for the first time, motivated by the finding that a plain text and the sequence file generated from a traffic instance are essentially the same. Experimental results show that the proposed features are more distinguishing than existing features. In a closed-world setting, PF attains better accuracy (up to 99.79%) than prior attacks on datasets gathered under Shadowsocks, SSH, and TLS, respectively. Moreover, even when the number of training instances drops to as few as 4, PF still reaches an accuracy above 90%. In the more realistic open-world setting, PF attains a high true positive rate (TPR) and Bayes detection rate (BDR) and a low false positive rate (FPR) in all evaluations, outperforming the other attacks. These results show that exploring new features is a meaningful and feasible way to improve the accuracy of WFP attacks.
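For readers unfamiliar with the open-world metrics named in this abstract, the Bayes detection rate is commonly defined as the probability that a trace really belongs to a monitored site given that the classifier flagged it. The sketch below computes it from a TPR, an FPR, and a base rate. The function name and the numeric inputs are hypothetical illustrations and are not taken from the paper.

```python
def bayes_detection_rate(tpr: float, fpr: float, prior_monitored: float) -> float:
    """Return P(monitored | flagged as monitored) via Bayes' rule.

    tpr             -- true positive rate, P(flagged | monitored)
    fpr             -- false positive rate, P(flagged | unmonitored)
    prior_monitored -- assumed base rate of monitored traffic, P(monitored)
    """
    if not 0.0 < prior_monitored < 1.0:
        raise ValueError("prior_monitored must lie strictly between 0 and 1")
    true_alarms = tpr * prior_monitored
    false_alarms = fpr * (1.0 - prior_monitored)
    total_alarms = true_alarms + false_alarms
    if total_alarms == 0.0:
        raise ValueError("classifier never flags anything; BDR is undefined")
    return true_alarms / total_alarms


# Hypothetical inputs chosen only to show the base-rate effect.
example_bdr = bayes_detection_rate(tpr=0.95, fpr=0.01, prior_monitored=0.05)
print(f"BDR = {example_bdr:.4f}")
```

Because the false-alarm term is weighted by the (usually much larger) share of unmonitored traffic, a classifier needs a very low FPR to keep the BDR high, which is why the three metrics are reported together.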


2021 ◽  
Vol 11 (19) ◽  
pp. 9154
Author(s):  
Noemi Scarpato ◽  
Alessandra Pieroni ◽  
Michela Montorsi

Critically assessing the scientific literature is a very challenging task; in general, it requires analysing and classifying many documents to define the state of the art of a research field. Document classification systems have tried to address this problem with different techniques, such as probabilistic, machine learning, and neural network models. One of the most popular document classification approaches is LDA (Latent Dirichlet Allocation), a probabilistic topic model. One of the main issues with LDA is that each retrieved topic is a collection of terms with their probabilities and does not have a human-readable form. This paper defines an approach that makes LDA topics comprehensible to humans by exploiting Word2Vec.
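To make concrete what "a collection of terms with their probabilities" looks like, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is an illustration only, not the authors' pipeline: the toy corpus, the hyperparameters `alpha` and `beta`, and the helper names `lda_gibbs` and `top_terms` are all assumptions, and the Word2Vec labelling step described in the abstract is not shown.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    Returns one dict per topic mapping each vocabulary term to its probability.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, term)
    topic_total = [0] * num_topics                            # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-topic term distributions; each sums to 1 over the vocabulary.
    topics = []
    for k in range(num_topics):
        denom = topic_total[k] + vocab_size * beta
        topics.append({w: (topic_word[k][word_id[w]] + beta) / denom for w in vocab})
    return topics


def top_terms(topic, n=3):
    """Return the n most probable (term, probability) pairs of one topic."""
    return sorted(topic.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical toy corpus, already tokenised.
DOCS = [
    "gene genome variant sequencing genome".split(),
    "variant allele genome population gene".split(),
    "restaurant review diner green menu".split(),
    "diner menu restaurant review customer".split(),
]

topics = lda_gibbs(DOCS, num_topics=2)
for k, topic in enumerate(topics):
    print(k, [(term, round(p, 3)) for term, p in top_terms(topic)])
```

Each printed topic is just a ranked list of terms and probabilities with no name attached, which is the readability gap the abstract describes.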


SAGE Open ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 215824402110315
Author(s):  
Eunhye Park ◽  
Junehee Kwon ◽  
Bongsug (Kevin) Chae ◽  
Sung-Bum Kim

This study aims to survey user-generated content (UGC) from diners in certified green restaurants, discover the green images they recall, and demonstrate the usefulness of applying a probabilistic topic model to comprehend customers’ perceptions. Postvisit online reviews (N = 28,098), in the form of unstructured texts from the TripAdvisor.com website, were used to find freely recalled green-restaurant images. These data were preprocessed with a structural topic model (STM) algorithm to select 51 relevant categories of images. These image categories were compared with the findings of previous studies to discover unique restaurant attributes. Furthermore, a topic-level network and a green-restaurant network were drawn to discover the most easily recallable image categories and their attributes. This machine-learning-based approach improved the reproducibility of unstructured data analyses, overcoming the subjectivity of qualitative data analysis. Theoretical and practical implications are offered for topic modeling methodology along with marketing strategies for restaurateurs.


2021 ◽  
Author(s):  
Nicholas Buhagiar ◽  
Bahram Zahir ◽  
Abdolreza Abhari

The probabilistic topic model Latent Dirichlet Allocation (LDA) was deployed to model the themes of discourse in discussion threads on the social media aggregation website Reddit. Discussion threads were abstracted as vectors of topic weights, and these vectors were fed into several neural network architectures, each with a different number of hidden layers, to train machine learning models that could identify which discussions a given user would be interested in contributing to. With accuracy as the evaluation metric for selecting the model framework that performed best on a given user’s validation set, the selected models achieved an average accuracy of 66.1% on the test data for a sample of 30 users. Using the probabilities of interest predicted by these neural networks, recommender systems were further built and analyzed for each user.


2021 ◽  
Vol 13 (2) ◽  
pp. 763
Author(s):  
Simona Fiandrino ◽  
Alberto Tonelli

The recent Review of the Non-Financial Reporting Directive (NFRD) aims to enhance adequate non-financial information (NFI) disclosure and improve accountability for stakeholders. This study focuses on this regulatory intervention and has a twofold objective: first, it aims to understand the main underlying issues at stake; second, it suggests areas of possible amendment considering the current debates on sustainability accounting and accounting for stakeholders. In keeping with these aims, the research analyzes the documents annexed to the contribution on the Review of the NFRD by conducting a text-mining analysis with the latent Dirichlet allocation (LDA) probabilistic topic model (PTM). Our findings highlight four main topics at the core of the current debate: quality of NFI, standardization, materiality, and assurance. The research suggests ways of improving managerial policies to achieve more comparable, relevant, and reliable information by bringing value creation for stakeholders into accounting. It further addresses an integrated logic of accounting for stakeholders that contributes to sustainable development.


2020 ◽  
Vol 36 (18) ◽  
pp. 4757-4764
Author(s):  
Liran Juan ◽  
Yongtian Wang ◽  
Jingyi Jiang ◽  
Qi Yang ◽  
Guohua Wang ◽  
...  

Motivation: Evaluating genome similarity among individuals is an essential step in data analysis. Advanced sequencing technology detects more and rarer variants for massive individual genomes, thus enabling individual-level genome similarity evaluation. However, the current methodologies, such as principal component analysis (PCA), lack the capability to fully leverage rare variants and are also difficult to interpret in terms of population genetics. Results: Here, we introduce a probabilistic topic model, latent Dirichlet allocation, to evaluate individual genome similarity. A total of 2535 individuals from the 1000 Genomes Project (KGP) were used to demonstrate our method. Various aspects of variant choice and model parameter selection were studied. We found that relatively rare (0.001 < allele frequency < 0.175) and sparse (average interval > 20 000 bp) variants are more efficient for genome similarity evaluation. At least 100 000 such variants are necessary. In our results, the populations show significantly less mixed and more cohesive visualization than the PCA results. The global similarities among the KGP genomes are consistent with known geographical, historical and cultural factors. Availability and implementation: The source code and data access are available at: https://github.com/lrjuan/LDA_genome. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 12 (12) ◽  
pp. 4830
Author(s):  
Cecilia Elizabeth Bayas Aldaz ◽  
Jesus Rodriguez-Pomeda ◽  
Leyla Angélica Sandoval Hamón ◽  
Fernando Casani

This article provides a procedure to universities for understanding the social perception of their activities in the sustainability field, through the analysis of news published in the printed media. It identifies the Spanish news sources that have covered this issue the most and the topics that appear in that news coverage. Using a probabilistic topic model called Latent Dirichlet Allocation, the study includes the nine dominant topics within a corpus with more than seventeen thousand published news items (totaling approximately five and a quarter million words) from a database of almost thirteen hundred national press sources between 2014 and 2017. The study identifies the news sources that published the most news on the issue. It is also found that the amount of news on sustainability and universities declined during the covered period. The nine identified topics point towards the relevance of higher education institutions’ activities as drivers of sustainability. The social perception encapsulated within the topics signals how the public is interested in these activities. Therefore, we find some interesting relationships between sustainable development, higher education institutions’ missions and behaviors, governmental policies, university funding and governance, social and economic innovation, and green campuses in terms of the overall goal of sustainability.


2020 ◽  
Vol 30 (6) ◽  
pp. 1653-1667
Author(s):  
Miguel-Angel Fernandez-Torres ◽  
Ivan Gonzalez-Diaz ◽  
Fernando Diaz-de-Maria
