A word embedding topic model for topic detection and summary in social networks

The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity of social networks. A further problem is that the sparse distribution of topics makes the topics unclear. To address these challenges, we propose a novel word embedding topic model, named Cbow Topic Model (CTM), that combines the topic model with the continuous bag-of-words (Cbow) word embedding method for topic detection and summary in social networks. We cluster similar words in the target social network text dataset by introducing the classic Cbow word vectorization method, which can effectively learn the internal relationships between words and reduce the dimensionality of the input texts. We employ the topic model to model short texts, which effectively weakens the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method that leverages similarity computing for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that the CTM method is superior to existing topic model methods.
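The abstract does not say which similarity measure is used, so the following is only a minimal sketch of one plausible reading of "similarity computing": a short text is assigned to the topic whose averaged word vector is closest by cosine similarity. The toy two-dimensional `embeddings` table and the helper names `average_vector` and `cosine` are hypothetical illustrations, not part of CTM.

```python
import math

def average_vector(words, embeddings):
    """Average the vectors of the words that have an embedding (None if none do)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy embeddings; real Cbow vectors would be learned from a corpus.
embeddings = {
    "rain": [1.0, 0.0],
    "storm": [0.8, 0.2],
    "goal": [0.0, 1.0],
    "match": [0.1, 0.9],
}

post = average_vector(["rain", "storm"], embeddings)
weather_topic = average_vector(["rain"], embeddings)
sport_topic = average_vector(["goal", "match"], embeddings)

weather_score = cosine(post, weather_topic)
sport_score = cosine(post, sport_topic)
```

Here the post about rain and storms scores higher against the weather topic than the sport topic, which is the behaviour a similarity-based detector would rely on.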

STABILITY OF TOPIC MODELING VIA MODALITY REGULARIZATION

Computational Linguistics and Intellectual Technologies ◽

10.28995/2075-7182-2020-19-198-210 ◽

2020 ◽

Author(s):

R. Derbanosov ◽

M. Bakhanova

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Topic Model ◽

Side Information ◽

Auxiliary Information ◽

Discrete Distributions ◽

Probabilistic Latent Semantic Analysis ◽

Probabilistic Topic Modeling ◽

Random Initialization

Probabilistic topic modeling is a tool for statistical text analysis that can give us information about the inner structure of a large corpus of documents. The most popular models, Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation, produce topics in the form of discrete distributions over the set of all words of the corpus. They build topics using an iterative algorithm that starts from some random initialization and optimizes a loss function. One of the main problems of topic modeling is sensitivity to random initialization, which means that different initial points produce significantly different solutions. Several studies have shown that side information about documents may improve the overall quality of a topic model. In this paper, we consider the use of additional information in the context of the stability problem. We represent auxiliary information as an additional modality and use the BigARTM library to perform experiments on several text collections. We show that using side information as an additional modality improves topic stability without significant loss of model quality.
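The abstract does not state how stability is measured. As a minimal sketch under that caveat, one simple proxy compares the top words of topics from two runs with different random seeds: each topic in the first run is matched to its most similar topic in the second run by Jaccard similarity, and the matches are averaged. The function names and the toy topics are hypothetical, and this is not claimed to be the paper's metric.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity between two collections of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def stability(run_a, run_b):
    """Average, over topics in run_a, of the best Jaccard match found in run_b."""
    best_matches = [max(jaccard(t, u) for u in run_b) for t in run_a]
    return sum(best_matches) / len(best_matches)

# Hypothetical top words from two runs started from different random seeds.
run_a = [["cat", "dog", "pet"], ["stock", "market", "trade"]]
run_b = [["market", "stock", "price"], ["dog", "cat", "pet"]]

score = stability(run_a, run_b)
```

A score of 1.0 would mean every topic reappears with identical top words; lower values indicate that the two initializations led to different solutions. Note that this proxy is not symmetric in its two arguments.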

Issues and Methods for Access, Storage, and Analysis of Data From Online Social Communities

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques ◽

10.4018/978-1-5225-3142-5.ch015 ◽

2018 ◽

pp. 402-432

Author(s):

Christopher John Quinn ◽

Matthew James Quinn ◽

Alan Olinsky ◽

John Thomas Quinn

Keyword(s):

Social Network ◽

Data Storage ◽

Latent Semantic Analysis ◽

Information Diffusion ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Online Social Network ◽

Network Models ◽

Probabilistic Latent Semantic Analysis ◽

User Interactions

This chapter provides an overview of a number of important issues related to studying user interactions in an online social network. The approach of social network analysis is detailed along with important basic concepts for network models. The different ways of indicating influence within a network are presented by describing various measures such as degree centrality, betweenness centrality, and closeness centrality. Network structure, as represented by cliques and components, with measures of connectedness defined by clustering and reciprocity, is also covered. Given the large volume of data associated with social networks, the significance of data storage and sampling is discussed. Since verbal communication is significant within networks, textual analysis is reviewed with respect to classification techniques such as sentiment analysis and with respect to topic modeling, specifically latent semantic analysis, probabilistic latent semantic analysis, latent Dirichlet allocation, and alternatives. Another important area covered in detail is information diffusion.
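Two of the influence measures named above can be illustrated with a minimal sketch, assuming a small, connected, undirected graph stored as an adjacency dictionary. Degree centrality is the fraction of other nodes a node is directly linked to, and closeness centrality is the reciprocal of the average shortest-path distance to the other nodes. The example graph and function names are hypothetical.

```python
from collections import deque

def degree_centrality(graph):
    """Number of neighbours divided by the number of other nodes."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable node count divided by the sum of shortest-path distances (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical path-shaped network: a - b - c - d
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

In this path graph the inner nodes `b` and `c` score higher on both measures than the end nodes, matching the intuition that they are better placed to reach the rest of the network.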

Multi-Task Topic Analysis Framework for Hallmarks of Cancer with Weak Supervision

Applied Sciences ◽

10.3390/app10030834 ◽

2020 ◽

Vol 10 (3) ◽

pp. 834

Author(s):

Erdenebileg Batbaatar ◽

Van-Huy Pham ◽

Keun Ho Ryu

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Topic Model ◽

Label Propagation ◽

Probabilistic Latent Semantic Analysis ◽

Hallmarks Of Cancer ◽

Weak Supervision ◽

Topic Analysis ◽

Cancer Data

The hallmarks of cancer represent an essential concept for discovering novel knowledge about cancer and for extracting the complexity of cancer. Due to the lack of topic analysis frameworks optimized specifically for cancer data, studies on topic modeling in cancer research still face a significant challenge. Recently, deep learning (DL) based approaches have been successfully employed to learn semantic and contextual information from scientific documents using word embeddings according to the hallmarks of cancer (HoC). However, those are only applicable to labeled data. Comparatively few documents are labeled by experts, whereas a massive number of unlabeled documents are available online. In this paper, we present a multi-task topic analysis (MTTA) framework to analyze cancer hallmark-specific topics from documents. The MTTA framework consists of three main subtasks: (1) cancer hallmark learning (CHL), used to learn cancer hallmarks on existing labeled documents; (2) weak label propagation (WLP), used to classify a large number of unlabeled documents with the model pre-trained in the CHL task; and (3) topic modeling (ToM), used to discover topics for each hallmark category. In the CHL task, we employed a convolutional neural network (CNN) with pre-trained word embeddings that represent semantic meanings obtained from a large unlabeled corpus. In the ToM task, we employed a latent topic model, such as latent Dirichlet allocation (LDA) or probabilistic latent semantic analysis (PLSA), to capture the semantic information learned by the CNN model for topic analysis. To evaluate the MTTA framework, we collected a large number of documents related to lung cancer in a case study. We also conducted a comprehensive performance evaluation of the MTTA framework, comparing it with several approaches.

Research on Sentiment Tendency and Evolution of Public Opinions in Social Networks of Smart City

Complexity ◽

10.1155/2020/9789431 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Yanni Liu ◽

Dongsheng Liu ◽

Yuwei Chen

Keyword(s):

Social Networks ◽

Social Network ◽

Sentiment Analysis ◽

Smart City ◽

Topic Model ◽

Rapid Development ◽

Mobile Internet ◽

Time Curve ◽

Public Opinions ◽

The Social

With the rapid development of the mobile Internet, the social network has become an important platform for users to receive, release, and disseminate information. In order to obtain more valuable information and implement effective supervision of public opinions, it is necessary to study the public opinions, sentiment tendency, and evolution of hot events in the social networks of a smart city. In view of social networks’ characteristics such as short text, rich topics, diverse sentiments, and timeliness, this paper conducts text modeling with word co-occurrence based on the topic model. In addition, sentiment computing and the time factor are incorporated to construct the dynamic topic-sentiment mixture model (TSTS). Then, four hot events were randomly selected from the microblog as datasets to evaluate the TSTS model in terms of topic feature extraction, sentiment analysis, and time change. The results show that the TSTS model is better than the traditional models in topic extraction and sentiment analysis. Meanwhile, by fitting the time curve of hot events, the change rules of comments in the social network are obtained.

A Study of Rumor Detection based on Social Network Topic Models Relationship

10.5753/brasnam.2020.11172 ◽

2020 ◽

Author(s):

Diogo Nolasco ◽

Jonice Oliveira

Keyword(s):

Social Networks ◽

Social Network ◽

Local Community ◽

Topic Model ◽

Fake News ◽

Detection Problem ◽

Model Method ◽

Show Evidence ◽

Rumor Detection ◽

Scientific Topic

The rumor detection problem on social networks has attracted considerable attention in recent years with the rise of concerns about fake news and disinformation. Most previous works focused on detecting rumors in individual messages, classifying whether a post or blog entry is a rumor or not. This paper proposes a method for rumor detection at the topic level that identifies whether a social topic related to a scientific topic is a rumor. We propose applying a topic model method to the social and scientific domains and correlating the topics found to detect those most prone to be rumors. Results for the Zika epidemic scenario show evidence that the least correlated topics contain a mix of rumors and local community discussions.

Marketing and social networks: a criterion for detecting opinion leaders

European Journal of Management and Business Economics ◽

10.1108/ejmbe-10-2017-020 ◽

2017 ◽

Vol 26 (3) ◽

pp. 347-366 ◽

Cited By ~ 19

Author(s):

Arnaldo Mario Litterio ◽

Esteban Alberto Nantes ◽

Juan Manuel Larrosa ◽

Liliana Julia Gómez

Keyword(s):

Social Networks ◽

Social Network ◽

Online Social Networks ◽

Semantic Analysis ◽

Point Of View ◽

Eigenvector Centrality ◽

Content Type ◽

Sporting Event ◽

Centrality Metrics ◽

Proposed Model

Purpose: The purpose of this paper is to use the practical application of tools provided by social network theory for the detection of potential influencers from the point of view of marketing within online communities. It proposes a method to detect significant actors based on centrality metrics. Design/methodology/approach: A matrix is proposed for the classification of the individuals that integrate a social network based on the combination of eigenvector centrality and betweenness centrality. The model is tested on a Facebook fan page for a sporting event. NodeXL is used to extract and analyze information. Semantic analysis and agent-based simulation are used to test the model. Findings: The proposed model is effective in detecting actors with the potential to efficiently spread a message in relation to the rest of the community, which is achieved from their position within the network. Social network analysis (SNA) and the proposed model, in particular, are useful to detect subgroups of components with particular characteristics that are not evident from other analysis methods. Originality/value: This paper approaches the application of SNA to online social communities from an empirical and experimental perspective. Its originality lies in combining information from two individual metrics to understand the phenomenon of influence. Online social networks are gaining relevance and the literature that exists in relation to this subject is still fragmented and incipient. This paper contributes to a better understanding of this phenomenon of networks and the development of better tools to manage it through the proposal of a novel method.
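The abstract does not give the cut-offs or cell labels of the classification matrix, so the following is only a rough sketch of the general idea: each individual is placed in one of four cells according to whether their eigenvector centrality and betweenness centrality lie above or below a threshold. The median threshold, the "high"/"low" labels, the function name, and the scores are all assumptions made for illustration.

```python
from statistics import median

def classify(eigenvector, betweenness):
    """Place each node in a (eigenvector level, betweenness level) cell.

    Assumption: a score strictly above the median counts as "high".
    """
    e_cut = median(eigenvector.values())
    b_cut = median(betweenness.values())
    cells = {}
    for node in eigenvector:
        e_level = "high" if eigenvector[node] > e_cut else "low"
        b_level = "high" if betweenness[node] > b_cut else "low"
        cells[node] = (e_level, b_level)
    return cells

# Hypothetical, precomputed centrality scores for four users.
eigenvector = {"u1": 0.9, "u2": 0.2, "u3": 0.6, "u4": 0.1}
betweenness = {"u1": 0.5, "u2": 0.7, "u3": 0.1, "u4": 0.0}

cells = classify(eigenvector, betweenness)
```

Users falling in the `("high", "high")` cell are both well connected to influential neighbours and positioned on many shortest paths, which is the kind of combined signal the two-metric matrix is meant to surface.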

Identifying target audience on enterprise social network

Industrial Management & Data Systems ◽

10.1108/imds-01-2018-0007 ◽

2019 ◽

Vol 119 (1) ◽

pp. 111-128 ◽

Cited By ~ 3

Author(s):

Jianhong Luo ◽

Xuwei Pan ◽

Shixiong Wang ◽

Yujing Huang

Keyword(s):

Social Media ◽

Social Network ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Information Service ◽

Target Audience ◽

Business Practices ◽

Content Type ◽

Enterprise Social Media ◽

Enterprise Social Network

Purpose: Delivering messages and information to potentially interested users is one of the distinguishing applications of the online enterprise social network (ESN). The purpose of this paper is to provide insights to better understand the repost preferences of users and provide a personalized information service in enterprise social media marketing. Design/methodology/approach: This is accomplished by constructing a target audience identification framework. A repost preference latent Dirichlet allocation (RPLDA) topic model is proposed to understand the online repost preferences of the mass of users toward different contents. A topic-oriented preference metric is proposed to measure the preference degree of individual users. A repost forecasting function is then formulated to identify the target audience. Findings: The empirical research shows the following: a total of 20 percent of the repost users in ESN represent the key active users who are particularly interested in the latent topics of messages in ESN, and they fit a Pareto distribution; and the target audience identification framework can successfully identify different target key users for messages with different latent topics. Practical implications: The findings should motivate marketing managers to improve the enterprise brand by identifying the key target audience in ESN and marketing in a way that truthfully reflects personalized preferences. Originality/value: This study runs counter to most current business practices, which tend to use simple popularity to seek important users. Adaptively and dynamically identifying the target audience appears to have considerable potential, especially in the rapidly growing area of enterprise social media information service.

Hashtag2Vec: Learning Hashtag Representation with Relational Hierarchical Embedding Model

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/480 ◽

2018 ◽

Cited By ~ 3

Author(s):

Jie Liu ◽

Zhicheng He ◽

Yalou Huang

Keyword(s):

Social Networks ◽

Information Retrieval ◽

Social Network ◽

Real World ◽

Heterogeneous Network ◽

Event Analysis ◽

Short Text ◽

Theme Discovery ◽

Real World Datasets ◽

Text Content

Hashtags have always been important elements in many social network platforms and micro-blog services. Semantic understanding of hashtags is a critical and fundamental task for many applications on social networks, such as event analysis, theme discovery, and information retrieval. However, this task is challenging due to the sparsity, polysemy, and synonymy of hashtags. In this paper, we investigate the problem of hashtag embedding by combining the short text content with the various heterogeneous relations in social networks. Specifically, we first establish a network with hashtags as its nodes. Hierarchically, each of the hashtag nodes is associated with a set of tweets and each tweet contains a set of words. Then we devise an embedding model, called Hashtag2Vec, which exploits the hashtag-hashtag, hashtag-tweet, tweet-word, and word-word relations in the hierarchical heterogeneous network. In addition to embedding the hashtags, our proposed framework is capable of embedding the short social texts as well. Extensive experiments are conducted on two real-world datasets, and the results demonstrate the effectiveness of the proposed method.

Filtering and Classifying Relevant Short Text with a Few Seed Words

Data and Information Management ◽

10.2478/dim-2019-0011 ◽

2019 ◽

Vol 3 (3) ◽

pp. 165-186 ◽

Cited By ~ 1

Author(s):

Chenliang Li ◽

Shiqian Chen ◽

Yan Qi

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

State Of The Art ◽

Superior Performance ◽

Support Vector ◽

Short Text ◽

Text Filtering ◽

Supervised Classifiers ◽

Real World Datasets ◽

Weakly Supervised

Filtering out irrelevant documents and classifying the relevant ones into topical categories is a de facto task in many applications. However, supervised learning solutions require extravagant human effort for document labeling. In this paper, we propose a novel seed-guided topic model for dataless short text classification and filtering, named SSCF. Without using any labeled documents, SSCF takes a few “seed words” for each category of interest, and conducts short text filtering and classification in a weakly supervised manner. To overcome the issues of data sparsity and imbalance, the short text collection is mapped to a collection of pseudo-documents, one for each word. SSCF infers two kinds of topics on pseudo-documents: category-topics and general-topics. Each category-topic is associated with one category of interest, covering the meaning of the latter. In SSCF, we devise a novel word relevance estimation process based on the seed words, for hidden topic inference. The dominating topic of a short text is identified through post inference and then used for filtering and classification. On two real-world datasets in two languages, experimental results show that our proposed SSCF consistently achieves better classification accuracy than state-of-the-art baselines. We also observe that SSCF can even outperform the supervised classifiers supervised latent Dirichlet allocation (sLDA) and support vector machine (SVM) on some testing tasks.

A New Vector Representation of Short Texts for Classification

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/2/12 ◽

2019 ◽

Vol 17 (2) ◽

pp. 241-249

Author(s):

Yangyang Li ◽

Bo Liu

Keyword(s):

Text Classification ◽

Web Search ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Classification Performance ◽

New Method ◽

Data Sets ◽

Text Data ◽

Short Text ◽

Space Model

Shortness and sparsity, together with synonyms and homonyms, are the main obstacles for short-text classification. In recent years, research on short-text classification has focused on expanding short texts but has barely guaranteed the validity of the expanded words. This study proposes a new method to weaken these effects without external knowledge. The proposed method analyses short texts by using the topic model based on Latent Dirichlet Allocation (LDA), represents each short text by using a vector space model, and presents a new method to adjust the vector of short texts. In the experiments, two open short-text data sets composed of Google News and web search snippets are utilised to evaluate the classification performance and prove the effectiveness of our method.
