Database “Pro-family (pronatalist) communities in the social network VKontakte”

The database contains an upload of text comments from the social network VKontakte in .csv format (UTF-8 encoding). The comments are collected from communities discussing pregnancy, childhood, motherhood, etc. The upload contains comments on posts with which users interacted. The absolute number of likes was used as the selection criterion (comments were collected where the number of likes is greater than or equal to 5). The text data was pre-processed (stemming and lemmatization). The data is suitable for thematic analysis (e.g. LDA, Latent Dirichlet Allocation), for modelling the graph structure of communities (the link_comment variable contains a unique post identifier, and link_author contains a unique user identifier), for sentiment analysis of statements, and for forming a dictionary of demographic connotation in Russian. Sentiment analysis of statements enables measuring the dynamics of “demographic temperature” in pro-family (pronatalist) communities.
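As a rough illustration of how the link_comment and link_author columns support graph modelling, the Python sketch below parses comment rows, re-checks the likes ≥ 5 rule, and builds a user-to-post mapping (a bipartite edge list). It is a sketch under assumptions: the `likes` column name, the comma delimiter, and the sample rows are hypothetical and not taken from the actual files.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

MIN_LIKES = 5  # threshold stated in the database description


def load_comments(text: str, min_likes: int = MIN_LIKES) -> list[dict]:
    """Parse CSV text and keep rows whose (assumed) 'likes' value is >= min_likes."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if int(row["likes"]) >= min_likes]


def user_post_edges(rows: list[dict]) -> dict[str, set[str]]:
    """Map each user (link_author) to the set of posts (link_comment) they commented on."""
    graph: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        graph[row["link_author"]].add(row["link_comment"])
    return dict(graph)


# Hypothetical sample; the real column set of the upload may differ.
SAMPLE = """link_comment,link_author,likes,text
post_1,user_a,7,пример
post_1,user_b,3,текст
post_2,user_a,12,комментарий
"""

rows = load_comments(SAMPLE)
edges = user_post_edges(rows)
print({user: sorted(posts) for user, posts in edges.items()})
# {'user_a': ['post_1', 'post_2']}
```

For the real upload, the text would be read with `open(path, encoding="utf-8", newline="")` before parsing. Users who share many posts in this mapping can then be linked to form a user-user projection of the community graph.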

Database “Childfree (antinatalist) communities in the social network VKontakte”

Population and Economics, 2021, Vol 5 (2), pp. 92-96. DOI: 10.3897/popecon.5.e70786

Author(s): Irina E. Kalabikhina, Evgeny P. Banin

Keyword(s): Social Network, Sentiment Analysis, Thematic Analysis, Latent Dirichlet Allocation, Absolute Amount, Graph Structure, Unique Identifier, Text Data, Structure Of Communities

The database contains an upload of text comments in Russian from the social network VKontakte in .csv format (UTF-8 encoding). The comments are collected from communities that discuss pregnancy, childhood, motherhood, paternity, etc. The upload contains comments on posts with which users interacted. The absolute number of likes is used as the selection criterion (comments are collected where the number of likes is greater than or equal to 5). The text data are processed (stemming and lemmatization). The data are suitable for thematic analysis (e.g. LDA, Latent Dirichlet Allocation), sentiment analysis of statements, modelling the graph structure of communities (the link_comment variable contains a unique identifier of the post, and link_author contains a unique user identifier), and forming a dictionary of demographic connotation in Russian. Sentiment analysis of statements enables measuring the dynamics of «demographic temperature» in antinatalist communities. The database is a supplement to the publication Kalabikhina IE, Banin EP (2020) Database «Pro-family (pronatalist) communities in the social network VKontakte». Population and Economics 4(3): 98–130. https://doi.org/10.3897/popecon.4.e60915.
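The abstract describes measuring «demographic temperature» through sentiment analysis but does not specify the method. One simple, hedged reading is a lexicon-based score averaged over time. The Python sketch below assumes a hypothetical lexicon of lemmas with polarity scores and hypothetical month labels for each comment; it is not the authors' procedure.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: lemma -> polarity in [-1, 1].
# A real dictionary of demographic connotation would have to be built from the data.
LEXICON = {"радость": 1.0, "любить": 0.8, "устать": -0.6, "страх": -0.9}


def comment_score(lemmas: list[str]) -> float | None:
    """Mean polarity of the lexicon lemmas in one comment, or None if none match."""
    hits = [LEXICON[w] for w in lemmas if w in LEXICON]
    return mean(hits) if hits else None


def temperature_by_month(comments: list[tuple[str, list[str]]]) -> dict[str, float]:
    """Average comment score per month label, skipping comments without lexicon hits."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for month, lemmas in comments:
        score = comment_score(lemmas)
        if score is not None:
            buckets[month].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Hypothetical lemmatized comments tagged with a month label.
data = [
    ("2020-01", ["радость", "ребенок"]),
    ("2020-01", ["устать", "страх"]),
    ("2020-02", ["любить", "семья"]),
    ("2020-02", ["погода"]),
]
temperature = temperature_by_month(data)
print(temperature)
```

Because the comments are already lemmatized, lexicon lookups can match lemmas directly. Comments with no lexicon hits are skipped rather than counted as neutral; that design choice changes the monthly averages and would need to be justified for real data.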

Predicting the Mental State of the Social Network Users based on the Latent Dirichlet Allocation and fastText

2021. DOI: 10.1109/idaacs53288.2021.9661061

Author(s): Igor Kotenko, Yash Sharma, Alexander Branitskiy

Keyword(s): Social Network, Mental State, Latent Dirichlet Allocation

A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

International Journal of Computers Communications & Control, 2020, Vol 15 (2). DOI: 10.15837/ijccc.2020.2.3811

Author(s): Jia Luo, Dongwen Yu, Zong Dai

Keyword(s): Machine Learning, Fuzzy Clustering, Latent Dirichlet Allocation, Learning Model, Machine Learning Algorithms, Text Data, Huge Data, Machine Learning Model, N-Gram

Manual methods cannot realistically process huge amounts of structured and semi-structured data. This study aims to solve the problem of processing such data with machine learning algorithms. We collected text data on public opinion about the company using crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to group the keywords into topics. The topic keywords were then used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared. The experimental results show that the Word2vec algorithm, based on a machine learning model, achieved the highest accuracy, recall, and F-value.
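Among the comparison methods, PMI (pointwise mutual information) scores how much more often two tokens appear together than independence would predict: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The Python sketch below applies this textbook definition to adjacent token pairs as new-word candidates. It is not the study's implementation; the `min_count` cut-off and the sample tokens are assumptions for illustration.

```python
from __future__ import annotations

import math
from collections import Counter


def bigram_pmi(tokens: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """PMI = log2(p(x, y) / (p(x) * p(y))) for adjacent token pairs seen >= min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # very rare pairs give unreliable, inflated PMI values
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


tokens = "new word is a new word in a text".split()
for pair, score in sorted(bigram_pmi(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

High-PMI pairs that recur often are plausible multiword terms; in a seed-dictionary setup such as the one described, they would still need to be checked against the topic keywords.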

Determining the proximity of groups in social networks based on text analysis using big data

Information Technology and Nanotechnology, 2019, pp. 521-526. DOI: 10.18287/1613-0073-2019-2416-521-526

Author(s): A S Mukhin, I A Rytsarev, R A Paringer, A V Kupriyanov, D V Kirsh

Keyword(s): Social Networks, Big Data, Social Network, Data Collection, Text Analysis, Software Tool, Text Data, Information Research

The article is devoted to determining the proximity of groups in social networks. The object of the study was data selected from the social network Vk. Text data were collected, processed, and analyzed. To obtain the necessary information, research was conducted on optimizing data collection from the social network Vk. A software tool has been developed that collects and subsequently processes the necessary data from the specified resources. Existing text-analysis algorithms, mainly for large volumes of text, were investigated and applied.
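The abstract does not name the proximity measure. One common choice, assumed here for illustration only, is cosine similarity between bag-of-words profiles of the groups. The Python sketch below pools each group's posts into term counts and compares them; the group names and texts are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def group_profile(posts: list[str]) -> Counter:
    """Bag-of-words profile of a group: lowercase whitespace tokens pooled over its posts."""
    return Counter(tok for post in posts for tok in post.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0 = no shared terms)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical groups and posts.
groups = {
    "group_a": ["football match tonight", "football fans"],
    "group_b": ["football results", "match report"],
    "group_c": ["baking bread recipe"],
}
profiles = {name: group_profile(posts) for name, posts in groups.items()}
cos_ab = cosine_similarity(profiles["group_a"], profiles["group_b"])
cos_ac = cosine_similarity(profiles["group_a"], profiles["group_c"])
print(round(cos_ab, 3), round(cos_ac, 3))  # 0.567 0.0
```

On real Vk data, the profiles would more likely be built from lemmatized text and weighted (for example with TF-IDF) so that very common words do not dominate the similarity.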

Filtering of Mobile Short Messaging Service Communication Using Latent Dirichlet Allocation with Social Network Analysis

Transactions on Engineering Technologies, 2014, pp. 671-686. DOI: 10.1007/978-94-017-8832-8_48

Author(s): Abiodun Modupe, Oludayo O. Olugbara, Sunday O. Ojo

Keyword(s): Social Network, Social Network Analysis, Network Analysis, Latent Dirichlet Allocation, Short Messaging Service

To the evolution model of network graph structure construction

Keldysh Institute Preprints, 2021, pp. 1-16. DOI: 10.20948/prepr-2021-24

Author(s): Yurii Nikolaevich Orlov, Alexander Seraphimovich Pankratov

Keyword(s): Social Network, High Dimension, Liouville Equation, Evolution Model, Structure Evolution, Graph Structure, Network Graph

This paper investigates the structure of a network graph. A social network of Russian towns is considered. It is shown that the distribution of vertex degrees is uniform. As a consequence, there is a high-dimensional region that is fully connected. The probabilities of special subgraphs are estimated. The Liouville equation is used to model the evolution of the graph structure.
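The empirical starting point for any claim about the distribution of vertex degrees is the degree distribution itself: the fraction of vertices having each degree. The Python sketch below computes it from an undirected edge list, assuming no duplicate edges or self-loops. The town labels are hypothetical, and isolated vertices are not counted because they never appear in an edge.

```python
from __future__ import annotations

from collections import Counter


def degree_distribution(edges: list[tuple[str, str]]) -> dict[int, float]:
    """Fraction of vertices with each degree in an undirected graph given as an edge list.

    Assumes no duplicate edges or self-loops; isolated vertices are not represented.
    """
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_vertices = len(degree)
    histogram = Counter(degree.values())
    return {k: histogram[k] / n_vertices for k in sorted(histogram)}


# Hypothetical links between four towns.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Comparing such a histogram against a flat profile over the observed degree range is one simple way to inspect the uniformity described in the abstract.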

Topic Categorization on Social Network Using Latent Dirichlet Allocation

Bonfring International Journal of Software Engineering and Soft Computing, 2018, Vol 8 (2), pp. 16-20. DOI: 10.9756/bijsesc.8390

Author(s): Ramyadharshni S.S., Pabitha Dr.P.

Keyword(s): Social Network, Latent Dirichlet Allocation

Text data analysis using Latent Dirichlet Allocation: an application to FOMC transcripts

Applied Economics Letters, 2020, Vol 28 (1), pp. 38-42. DOI: 10.1080/13504851.2020.1730748

Author(s): Hali Edison, Hector Carcel

Keyword(s): Data Analysis, Latent Dirichlet Allocation, Text Data, Text Data Analysis

A Comparative Automated Text Analysis of Airbnb Reviews in Hong Kong and Singapore Using Latent Dirichlet Allocation

Sustainability, 2020, Vol 12 (16), pp. 6673. DOI: 10.3390/su12166673

Author(s): Kiattipoom Kiatkawsin, Ian Sutherland, Jin-Young Kim

Keyword(s): Hong Kong, Text Analysis, Topic Modeling, Latent Dirichlet Allocation, Optimal Number, Text Data, Customer Reviews, Gaining Insight

Airbnb has emerged as a platform where unique accommodation options can be found. Because each combination of accommodation unit and host is unique, each listing offers a one-of-a-kind experience. As consumers increasingly rely on other customers' text reviews, managers are also increasingly gaining insight from those reviews. This study therefore aimed to extract such insights using latent Dirichlet allocation, an unsupervised topic modeling technique that extracts latent discussion topics from text data. Findings from 185,695 Airbnb reviews in Hong Kong and 93,571 in Singapore, two long-standing rival destinations, were compared. Hong Kong produced 12 topics in total, which can be categorized into four distinct groups, whereas Singapore's optimal number of topics was only five. Topics from both destinations covered the same range of attributes, but Hong Kong's 12 topics provide greater precision for formulating managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience, which provide insight beyond typical numeric ratings.
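The “optimal number of topics” in this study is the number of topics K, which LDA takes as a fixed input; the abstract does not state which criterion the authors used to choose it. For reference, the standard LDA generative process over D documents (document d having N_d words), K topics, and Dirichlet hyperparameters α and β can be written as follows, where θ_d are the topic proportions of document d, φ_k is the word distribution of topic k, z_{d,n} is the topic of the n-th word, and w_{d,n} is the observed word.

```latex
\begin{aligned}
\varphi_k &\sim \operatorname{Dirichlet}(\beta), & k &= 1, \dots, K,\\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), & d &= 1, \dots, D,\\
z_{d,n} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d), & n &= 1, \dots, N_d,\\
w_{d,n} \mid z_{d,n}, \varphi &\sim \operatorname{Categorical}(\varphi_{z_{d,n}}).
\end{aligned}
```

Fitting the model for several values of K and comparing a fit or coherence criterion is a common way such an “optimal” K is selected.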

Ldagibbs: A Command for Topic Modeling in Stata Using Latent Dirichlet Allocation

The Stata Journal: Promoting communications on statistics and Stata, 2018, Vol 18 (1), pp. 101-117. DOI: 10.1177/1536867x1801800107

Author(s): Carlo Schwarz

Keyword(s): Machine Learning, Probability Distribution, Topic Modeling, Latent Dirichlet Allocation, Topic Model, Topic Models, Text Documents, Text Data

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
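The description above (documents as probability distributions over topics, topics as distributions over words) can be made concrete with collapsed Gibbs sampling, the estimation approach the command's name refers to. The Python sketch below is an independent illustration, not the code or interface of ldagibbs; the symmetric priors `alpha` and `beta`, the iteration count, and the toy corpus of word ids are assumptions.

```python
from __future__ import annotations

import random


def lda_gibbs(docs: list[list[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              n_iter: int = 200, seed: int = 0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: each document is a list of word ids in range(vocab_size).
    Returns (theta, phi): document-topic and topic-word probabilities.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # tokens per (document, topic)
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    n_k = [0] * n_topics                                # tokens per topic
    z = []                                              # topic assigned to each token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final sample.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi


# Toy corpus over a 4-word vocabulary: words 0-1 and words 2-3 tend to co-occur.
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 1, 0], [3, 2, 2, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Each row of `theta` is a document's topic distribution and each row of `phi` is a topic's word distribution; both sum to one by construction. As in the article, the number of topics (`n_topics`) is chosen by the user.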