Social Bots Detection via Fusing BERT and Graph Convolutional Networks

Symmetry ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 30
Author(s):  
Qinglang Guo ◽  
Haiyong Xie ◽  
Yangyang Li ◽  
Wen Ma ◽  
Chao Zhang

The online social media ecosystem is increasingly polluted by fake information and by content posted by malicious users, which has caused considerable harm. Social robot detection typically uses supervised classification based on hand-crafted feature extraction. However, these methods raise user-privacy concerns and ignore hidden feature information such as graph features, while semi-supervised algorithms remain underused. In this work, we symmetrically combine BERT and a Graph Convolutional Network (GCN) and propose BGSRD, a novel model for social robot detection that combines large-scale pretraining and transductive learning. BGSRD constructs a heterogeneous graph over the dataset and represents Twitter posts as nodes using BERT representations. The text graph convolutional network learns from a single text graph for the whole corpus, built mainly from word co-occurrence and document-word relations. The BERT and GCN modules can be trained jointly in BGSRD to get the best of both: label influence spreads between training data and unlabeled test data through graph convolution, so the model benefits from both large-scale pretraining on massive raw data and transductive learning of joint representations. Experiments show that BGSRD achieves better performance on a wide range of social robot detection datasets.
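To make the phrase "spread label influence through graph convolution" concrete, here is a minimal sketch of a single parameter-free propagation step using the standard symmetric normalization D^{-1/2}(A + I)D^{-1/2}. It is not the BGSRD implementation: the three-node toy graph, the one-dimensional features standing in for BERT embeddings, and the function names are all assumptions made for illustration.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix without self-loops."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(adj_norm, features):
    """One propagation step H' = A_norm @ H (no learned weights, no nonlinearity)."""
    n, d = len(features), len(features[0])
    return [
        [sum(adj_norm[i][k] * features[k][j] for k in range(n)) for j in range(d)]
        for i in range(n)
    ]

# Hypothetical toy graph: node 0 -- node 1 -- node 2 (a path).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Hypothetical 1-d features: only node 0 carries a signal.
features = [[1.0], [0.0], [0.0]]

a_norm = normalize_adjacency(adj)
h1 = propagate(a_norm, features)  # signal reaches node 1, not yet node 2
h2 = propagate(a_norm, h1)        # after two steps it reaches node 2
```

After one step node 2 still has a zero feature; after two steps it receives a share of node 0's signal, which is the sense in which stacked graph convolutions let information flow between labeled and unlabeled nodes.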

2021 ◽  
pp. 120633122110193
Author(s):  
Max Holleran

Brutalist architecture is an object of fascination on social media that has taken on new popularity in recent years. This article, drawing on 3,000 social media posts in Russian and English, argues that the buildings stand out for their arresting scale and their association with the expanding state in the 1960s and 1970s. In both North Atlantic and Eastern European contexts, the aesthetic was employed in publicly financed urban planning projects, creating imposing concrete structures for universities, libraries, and government offices. While some online social media users associate the style with the overreach of both socialist and capitalist governments, others are more nostalgic. They use Brutalist buildings as a means to start conversations about welfare state goals of social housing, free university, and other services. They also lament that many municipal governments no longer have the capacity or vision to take on large-scale projects of reworking the built environment to meet contemporary challenges.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised learning manner.
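The abstract does not give the entropy formulation used in the graphical model. As a rough illustration of the quantity involved, the sketch below computes plain Shannon entropy (in bits) over a distribution of interaction counts; the function name and the example counts are assumptions, not the authors' method.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution implied by non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing (0 * log 0 is taken as 0)
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical like counts for one media item, split across two user groups.
evenly_split = shannon_entropy([5, 5])    # maximal uncertainty for two outcomes
one_sided = shannon_entropy([10, 0])      # no uncertainty
```

An even split gives 1 bit and a fully one-sided split gives 0 bits, so higher values indicate more uncertainty about which group interacts with the item.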


2020 ◽  
Vol 2020 ◽  
pp. 1-7 ◽  
Author(s):  
Aboubakar Nasser Samatin Njikam ◽  
Huan Zhao

This paper introduces an extremely lightweight (around two hundred thousand parameters) and computationally efficient CNN architecture, named CharTeC-Net (Character-based Text Classification Network), for character-based text classification problems. This new architecture is composed of four building blocks for feature extraction. Each of these building blocks, except the last one, uses 1 × 1 pointwise convolutional layers to add more nonlinearity to the network and to increase the dimensions within each building block. In addition, shortcut connections are used in each building block to facilitate the flow of gradients over the network, but more importantly to ensure that the original signal present in the training data is shared across each building block. Experiments on eight standard large-scale text classification and sentiment analysis datasets demonstrate that CharTeC-Net outperforms baseline methods and yields competitive accuracy compared with state-of-the-art methods, although it has only between 181,427 and 225,323 parameters and weighs less than 1 megabyte.
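A quick parameter count shows why 1 × 1 pointwise convolutions are a cheap way to change channel dimensions. The sketch below uses the usual formula for a 1-D convolution layer (input channels × output channels × kernel size, plus one bias per output channel); the channel sizes are hypothetical and are not taken from CharTeC-Net.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of trainable parameters in a standard 1-D convolution layer."""
    weights = in_channels * out_channels * kernel_size
    return weights + (out_channels if bias else 0)

# Hypothetical channel sizes, chosen only for illustration.
pointwise = conv1d_params(64, 128, kernel_size=1)  # 64*128 + 128 = 8320
wide = conv1d_params(64, 128, kernel_size=3)       # 64*128*3 + 128 = 24704
```

With these assumed sizes, the pointwise layer needs roughly a third of the parameters of a kernel-size-3 layer while still doubling the channel dimension.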


2018 ◽  
Author(s):  
Guangyu Wang ◽  
Hongyan Yin ◽  
Boyang Li ◽  
Chunlei Yu ◽  
Fan Wang ◽  
...  

The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but only a limited number of species have high-quality sequences and annotations. Here we first characterize lncRNAs by contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between ORF (open reading frame) length and GC content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study is the first to differentially characterize lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
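The two features named in the abstract, ORF length and GC content, are simple to compute from a sequence. The sketch below shows one way to do so; it is not the LGC algorithm, whose model of the relationship between the two features is not described here. It assumes a DNA alphabet (T rather than U), scans only the three forward-strand reading frames, and counts the stop codon in the ORF length; all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def longest_orf_length(seq):
    """Length in nucleotides of the longest ATG..stop ORF on the forward strand.

    The stop codon is included in the length; ORFs without a stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in the same frame
    return best

# Hypothetical toy sequence: ATG AAA TAG in frame 0.
toy = "ATGAAATAG"
toy_features = (longest_orf_length(toy), gc_content(toy))
```

A classifier in the spirit of the abstract would then look at how these two numbers relate to each other for each transcript, rather than at either value alone.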


2015 ◽  
Vol 26 (02) ◽  
pp. 197-204 ◽  
Author(s):  
Rajeev C. Saxena ◽  
Ashton E. Lehmann ◽  
A. Ed Hight ◽  
Keith Darrow ◽  
Aaron Remenschneider ◽  
...  

Background: More than 200,000 individuals worldwide have received a cochlear implant (CI). Social media Websites may provide a paramedical community for those who possess or are interested in a CI. The utilization patterns of social media by the CI community, however, have not been thoroughly investigated. Purpose: The purpose of this study was to investigate participation of the CI community in social media Websites. Research Design: We conducted a systematic survey of online CI-related social media sources. Using standard search engines, the search terms cochlear implant, auditory implant, forum, and blog identified relevant social media platforms and Websites. Social media participation was quantified by indices of membership and posts. Study Sample: Social media sources included Facebook, Twitter, YouTube, blogs, and online forums. Each source was assigned one of six functional categories based on its description. Intervention: No intervention was performed. Data Collection and Analysis: We conducted all online searches in February 2014. Total counts of each CI-related social media source were summed, and descriptive statistics were calculated. Results: More than 350 sources were identified, including 60 Facebook groups, 36 Facebook pages, 48 Twitter accounts, 121 YouTube videos, 13 forums, and 95 blogs. The most active online communities were Twitter accounts, which totaled 35,577 members, and Facebook groups, which totaled 17,971 members. CI users participated in Facebook groups primarily for general information/support (68%). Online forums were the next most active online communities by membership. The largest forum contained approximately 9,500 topics with roughly 127,000 posts. CI users primarily shared personal stories through blogs (92%), Twitter (71%), and YouTube (62%). Conclusions: The CI community engages in the use of a wide range of online social media sources. The CI community uses social media for support, advocacy, rehabilitation information, research endeavors, and sharing of personal experiences. Future studies are needed to investigate how social media Websites may be harnessed to improve patient-provider relationships and potentially used to augment patient education.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249993
Author(s):  
Paul X. McCarthy ◽  
Xian Gong ◽  
Sina Eghbal ◽  
Daniel S. Falster ◽  
Marian-Andrei Rizoiu

Ever since the web began, the number of websites has been growing exponentially. These websites cover an ever-increasing range of online services that fill a variety of social and economic functions across a growing range of industries. Yet the networked nature of the web, combined with the economics of preferential attachment, increasing returns and global trade, suggests that over the long run a small number of competitive giants are likely to dominate each functional market segment, such as search, retail and social media. Here we perform a large-scale longitudinal study to quantify the distribution of attention given in the online environment to competing organisations. In two large online social media datasets, containing more than 10 billion posts and spanning more than a decade, we tally the volume of external links posted towards the organisations’ main domain name as a proxy for the online attention they receive. We also use the Common Crawl dataset, which contains the linkage patterns between more than a billion different websites, to study the patterns of link concentration over the past three years across the entire web. Lastly, we showcase the linking between economic, financial and market data by exploring the relationships between online attention on social media and the growth in enterprise value in the electric carmaker Tesla. Our analysis shows that although we observe consistent growth in all the macro indicators (the total amount of online attention, the number of organisations with an online presence, and the functions they perform), a smaller number of organisations account for an ever-increasing proportion of total user attention, usually with one large player dominating each function. These results highlight how evolution of the online economy involves innovation, diversity, and then competitive dominance.
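The link-tallying idea can be sketched in a few lines: count links per domain, then measure what share of all links goes to the top domain. This is only an illustration under stated assumptions. The study's actual pipeline for mapping links to organisations' main domains is not described in the abstract, the URLs below use reserved example domains, and the `www.` stripping is a simplification that ignores ports, user info, and other subdomains.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_counts(urls):
    """Count links per host, treating 'www.example.com' and 'example.com' as the same."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:  # skip strings that do not parse to a host
            counts[host] += 1
    return counts

def top_share(counts, k=1):
    """Fraction of all links that point to the k most-linked domains."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

# Hypothetical links extracted from posts.
links = [
    "https://www.example.com/a",
    "http://example.com/b",
    "https://example.org/",
    "https://example.com/c",
]
counts = domain_counts(links)
share = top_share(counts)  # 3 of 4 links point to example.com
```

Tracking how `top_share` changes over successive time windows is one simple way to express the concentration trend the abstract describes.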


GigaScience ◽  
2021 ◽  
Vol 10 (6) ◽  
Author(s):  
Sen Li ◽  
Zeyu Du ◽  
Xiangjie Meng ◽  
Yang Zhang

Motivation: Malaria, a mosquito-borne infectious disease affecting humans and other animals, is widespread in tropical and subtropical regions. Microscopy is the most common method for diagnosing the malaria parasite from stained blood smear samples. However, this technique is time consuming and must be performed by a well-trained professional, yet it remains prone to errors. Distinguishing the multiple growth stages of parasites remains an especially challenging task. Results: In this article, we develop a novel deep learning approach for the recognition of malaria parasites of various stages in blood smear images using a deep transfer graph convolutional network (DTGCN). To our knowledge, this is the first application of graph convolutional network (GCN) on multi-stage malaria parasite recognition in such images. The proposed DTGCN model is based on unsupervised learning by transferring knowledge learnt from source images that contain the discriminative morphology characteristics of multi-stage malaria parasites. This transferred information guarantees the effectiveness of the target parasite recognition. This approach first learns the identical representations from the source to establish topological correlations between source class groups and the unlabelled target samples. At this stage, the GCN is implemented to extract graph feature representations for multi-stage malaria parasite recognition. The proposed method showed higher accuracy and effectiveness in publicly available microscopic images of multi-stage malaria parasites compared to a wide range of state-of-the-art approaches. Furthermore, this method is also evaluated on a large-scale dataset of unseen malaria parasites and the Babesia dataset. Availability: Code and dataset are available at https://github.com/senli2018/DTGCN_2021 under an MIT license.


Author(s):  
Lewis Mitchell ◽  
Joshua Dent ◽  
Joshua Ross

It is widely accepted that different online social media platforms produce different modes of communication; however, the ways in which these modalities are shaped by the constraints of a particular platform remain difficult to quantify. On 7 November 2017 Twitter doubled the character limit for users to 280 characters, presenting a unique opportunity to study the response of this population to an exogenous change to the communication medium. Here we analyse a large dataset comprising 387 million English-language tweets (10% of all public tweets) collected over the period from September 2017 to January 2018 to quantify and explain large-scale changes in individual behaviour and communication patterns precipitated by the character-length change. Using statistical and natural language processing techniques we find that linguistic complexity increased after the change, with individuals writing at a significantly higher reading level. However, we find that some textual properties such as statistical language distribution remain invariant across the change, and are no different to writings in different online media. By fitting a generative mathematical model to the data we find a surprisingly slow response of the Twitter population to this exogenous change, with a substantial number of users taking a number of weeks to adjust to the new medium. In the talk we describe the model and Bayesian parameter estimation techniques used to make these inferences. Furthermore, we argue for mathematical models as an alternative exploratory methodology for "Big" social media datasets, empowering the researcher to make inferences about the human behavioural processes which underlie large-scale patterns and trends.


Author(s):  
Abishek Kashyap. S

Social networking has become essential and plays a growing role in everyday life for sharing information and knowledge. Social networks are also used to view everyday activities, photos, videos, political agendas, and propaganda. They are therefore becoming an important tool for staying up to date in a dynamic world. With large volumes of data being generated every second, there is growing concern about data protection and user privacy in social media networks. One major concern is "fake users", who misuse an authorized user’s information, such as photos and videos, without that user’s permission and disguise themselves as legitimate users. Many fake profiles are now created for fraudulent activities such as money making and malware, virus, or Trojan distribution, using user data with malicious intent. This paper proposes Java static watermarking. A Java static watermark is used in our social media website to associate each user's footprint with their unique ID, eliminating the core problem of fake users. Data stored in the cloud is likewise prone to cyber-attacks. This paper therefore also proposes integrating steganography methods to protect sensitive data on the public cloud server, in order to validate their viability and increased security. The algorithms used ensure that individual information is kept secret and transmitted securely, preserving user privacy.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Muhammad Rivza Adrian ◽  
Muhammad Papuandivitama Putra ◽  
Muhammad Hilman Rafialdy ◽  
Nur Aini Rakhmawati

The spread of COVID-19 in Indonesia has prompted local governments to act. Several local governments in Indonesia have enacted regulations to reduce the growth of COVID-19 cases by limiting public gatherings through Large-Scale Social Restrictions (LSSR). However, the implementation of the LSSR has drawn many comments from social media users, especially on Twitter. This research aims to analyze sentiment toward the implementation of the LSSR using tweets on the Twitter social media platform. A total of 466 tweets were extracted and split into training data and test data at a ratio of 7 to 3. The data were then processed with two algorithms for comparison, Support Vector Machine (SVM) and Random Forest, with the aim of obtaining the most accurate sentiment analysis results.
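For a sense of scale, a 7-to-3 split of 466 tweets leaves a fairly small test set. The sketch below shows one way such a split could be produced; the abstract does not say how the split was shuffled or rounded, so the rounding rule, the fixed seed, and the function name are assumptions.

```python
import random

def split_indices(n_items, train_ratio, seed=0):
    """Shuffle item indices reproducibly and split them into train and test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_ratio)  # assumed rounding rule
    return indices[:n_train], indices[n_train:]

# 466 tweets at a 7:3 ratio, as stated in the abstract.
train_idx, test_idx = split_indices(466, 0.7)
sizes = (len(train_idx), len(test_idx))  # (326, 140) under the assumed rounding
```

Under this rounding, about 326 tweets would be used for training and 140 for testing, so accuracy differences between SVM and Random Forest of a percentage point or two correspond to only a handful of tweets.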

