Database “Pro-family (pronatalist) communities in the social network VKontakte”

2020 ◽  
Vol 4 (3) ◽  
pp. 98-130
Author(s):  
Irina E. Kalabikhina ◽  
Evgeny P. Banin

The database contains an export of text comments from the social network VKontakte in .csv format (UTF-8 encoding). The comments were collected from communities discussing pregnancy, childhood, motherhood, etc. The export contains comments on posts with which users interacted. The absolute number of likes was used as the selection criterion (comments were collected where the number of likes is greater than or equal to 5). The text data were pre-processed (stemming and lemmatization). The data are suitable for topic analysis (e.g. LDA, Latent Dirichlet Allocation), for modelling the graph structure of communities (the link_comment variable contains a unique post identifier; link_author contains a unique user identifier), for sentiment analysis of statements, and for building a dictionary of demographic connotations in Russian. Sentiment analysis of statements makes it possible to measure the dynamics of “demographic temperature” in pro-family (pronatalist) communities.
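As a rough illustration of the graph modelling mentioned above, the following Python sketch filters comments by like count and links users who commented on the same post. Only link_comment and link_author are named in the abstract; the likes and text column names, the sample rows, and the helper functions are assumptions made for illustration and may not match the published files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the described layout. Only link_comment and
# link_author come from the abstract; "likes" and "text" are assumed names.
SAMPLE = """link_comment,link_author,likes,text
post1,userA,7,first comment
post1,userB,5,second comment
post2,userA,12,third comment
post2,userC,3,below threshold
"""


def load_comments(handle, min_likes=5):
    """Read CSV rows and keep those with at least min_likes likes."""
    reader = csv.DictReader(handle)
    return [row for row in reader if int(row["likes"]) >= min_likes]


def author_graph(rows):
    """Return weighted edges between users who commented on the same post."""
    authors_by_post = defaultdict(set)
    for row in rows:
        authors_by_post[row["link_comment"]].add(row["link_author"])

    edges = defaultdict(int)
    for authors in authors_by_post.values():
        ordered = sorted(authors)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                edges[(first, second)] += 1  # weight = number of shared posts
    return dict(edges)


rows = load_comments(io.StringIO(SAMPLE))
edges = author_graph(rows)
print(len(rows), edges)
```

For a real file, the in-memory sample would be replaced by something like `open(path, encoding="utf-8", newline="")`, matching the UTF-8 encoding stated in the description.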

2021 ◽  
Vol 5 (2) ◽  
pp. 92-96
Author(s):  
Irina E. Kalabikhina ◽  
Evgeny P. Banin

The database contains an export of text comments in Russian from the social network VKontakte in .csv format (UTF-8 encoding). The comments were collected from communities that discuss pregnancy, childhood, motherhood, paternity, etc. The export contains comments under posts with which users interacted. The absolute number of likes is used as the selection criterion (comments are collected where the number of likes is greater than or equal to 5). The text data are pre-processed (stemming and lemmatization). The data are suitable for topic analysis (e.g. LDA, Latent Dirichlet Allocation), sentiment analysis of statements, modelling the graph structure of communities (the link_comment variable contains a unique identifier of the post; link_author contains a unique user identifier), and building a dictionary of demographic connotations in Russian. Sentiment analysis of statements makes it possible to measure the dynamics of «demographic temperature» in antinatalist communities. The database is a supplement to the publication Kalabikhina IE, Banin EP (2020) Database «Pro-family (pronatalist) communities in the social network VKontakte». Population and Economics 4(3): 98–130. https://doi.org/10.3897/popecon.4.e60915.


Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

Manual methods cannot realistically process huge volumes of structured and semi-structured data. This study aims to address the problem of processing such data with machine learning algorithms. We collected public-opinion text data about the company with web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into topics. The topic keywords are then used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared. The experimental results show that the Word2vec algorithm, which is based on a machine learning model, achieves the highest accuracy, recall, and F-value.
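To make one of the compared baselines concrete, here is a minimal Python sketch of scoring adjacent word pairs by pointwise mutual information (PMI), a common way to surface candidate new words or phrases. It is not the authors' implementation; the function name, the min_count threshold, and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=2):
    """Score adjacent token pairs with PMI = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (hypothetical): already tokenized sentences.
sentences = [
    ["deep", "learning", "model"],
    ["deep", "learning", "works"],
    ["the", "model", "works"],
]
scores = bigram_pmi(sentences)
print(scores)
```

Pairs with a high score co-occur more often than their individual frequencies would predict, which is why they are treated as candidates for a single lexical unit.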


Author(s):  
A S Mukhin ◽  
I A Rytsarev ◽  
R A Paringer ◽  
A V Kupriyanov ◽  
D V Kirsh

The article is devoted to defining such groups in social networks. Data from the social network VK were chosen as the object of study. Text data were collected, processed, and analyzed. To obtain the necessary information, research was conducted on optimizing data collection from VK. A software tool was developed that collects and subsequently processes the necessary data from the specified resources. Existing text analysis algorithms, mainly those for large volumes of text, were investigated and applied.


2021 ◽  
pp. 1-16
Author(s):  
Yurii Nikolaevich Orlov ◽  
Alexander Seraphimovich Pankratov

This paper investigates the structure of a network graph. The social network between Russian towns is considered. It is shown that the distribution of vertex degrees is uniform. As a consequence, there is a high-dimensional, fully connected region. The probability of special subgraphs is estimated. The Liouville equation is used to model the evolution of the graph structure.


2020 ◽  
Vol 12 (16) ◽  
pp. 6673 ◽  
Author(s):  
Kiattipoom Kiatkawsin ◽  
Ian Sutherland ◽  
Jin-Young Kim

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on text reviews from other customers, managers are also increasingly gaining insight from customer reviews. The present study therefore aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings from 185,695 Airbnb reviews for Hong Kong and 93,571 for Singapore, two long-term rival destinations, were compared. Hong Kong produced 12 topics in total, which can be categorized into four distinct groups, whereas Singapore’s optimal number of topics was only five. Topics produced for both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision for formulating managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide insight beyond typical numeric ratings.


Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
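The two distributions described above can be illustrated with a small collapsed Gibbs sampler. This is a Python sketch under stated assumptions, not the ldagibbs command or its Stata syntax: the function name, the hyperparameter defaults (alpha, beta), and the toy documents are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Tiny collapsed Gibbs sampler for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assign = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # theta: per-document distribution over topics.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # phi: per-topic distribution over words.
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
         for v in range(n_words)]
        for t in range(num_topics)
    ]
    return theta, phi, vocab


# Toy documents (hypothetical), already tokenized.
docs = [
    ["baby", "sleep", "baby", "feeding"],
    ["sleep", "feeding", "baby"],
    ["graph", "vertex", "edge", "graph"],
    ["edge", "vertex", "graph"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` sums to one over topics and each row of `phi` sums to one over the vocabulary, which mirrors the description of documents as distributions over topics and topics as distributions over words.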

