Tackling Cyber-Aggression: Identification and Fine-Grained Categorization of Aggressive Texts on Social Media using Weighted Ensemble of Transformers

2021 ◽  
Author(s):  
Omar Sharif ◽  
Mohammed Moshiul Hoque
Author(s):  
Yifan Gao ◽  
Yang Zhong ◽  
Daniel Preoţiuc-Pietro ◽  
Junyi Jessy Li

In computational linguistics, specificity quantifies how much detail is expressed in a text. It is an important characteristic of speaker intention and language style, and is useful in NLP applications such as summarization and argumentation mining. Yet to date, expert-annotated data for sentence-level specificity are scarce and confined to the news genre. In addition, existing systems that predict sentence specificity are classifiers trained to produce binary labels (general or specific). We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1–5) and a Pearson correlation of 0.73, significantly improving over baselines and previous sentence specificity prediction systems. We also present the first large-scale study revealing the social, temporal, and mental health factors underlying language specificity on social media.
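The abstract reports two regression metrics: mean absolute error (MAE) and Pearson correlation. As a rough illustration of how these two numbers are computed for 1–5 specificity ratings, here is a minimal standard-library sketch. The ratings below are invented toy values and do not come from the tweet dataset. The function names are hypothetical and are not taken from the authors' code.

```python
import math


def mean_absolute_error(gold, pred):
    """Mean of |gold_i - pred_i| over all rated items."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations divided by the product of the
    root sums of squared deviations (the 1/n factors cancel out)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical gold and predicted ratings on the 1-5 scale (invented, not from the dataset)
gold = [1.0, 2.5, 3.0, 4.5, 5.0]
pred = [1.4, 2.2, 3.3, 4.0, 4.6]
print(f"MAE = {mean_absolute_error(gold, pred):.3f}")
print(f"Pearson r = {pearson_r(gold, pred):.3f}")
```

Lower MAE means the predictions sit closer to the annotated ratings in absolute terms. Pearson r measures only whether predictions rise and fall together with the gold ratings. Because of this difference, the abstract reports both metrics.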


2018 ◽  
Vol 27 (3) ◽  
pp. 222-229 ◽  
Author(s):  
Faye Mishna ◽  
Cheryl Regehr ◽  
Ashley Lacombe-Duncan ◽  
Joanne Daciuk ◽  
Gwendolyn Fearing ◽  
...  

2020 ◽  
Vol 9 (9) ◽  
pp. 497
Author(s):  
Haydn Lawrence ◽  
Colin Robertson ◽  
Rob Feick ◽  
Trisalyn Nelson

Social media and other forms of volunteered geographic information (VGI) are frequently used as a source of fine-grained big data for research. Geographically referenced social media data are now routinely employed for a wide array of purposes, yet the spatial scales at which these data are relevant are typically unknown. For researchers to use VGI appropriately (for example, by aggregating it to areal units such as neighbourhoods to elicit key trends or demographic information), they need general methods for assessing its quality. In particular, such methods must explicitly link data quality to relevant spatial scales, because there are no accepted standards or sampling controls. We present a data quality metric, the Spatial-comprehensiveness Index (S-COM), which can delineate feasible study areas or spatial extents based on the quality of uneven and dynamic geographically referenced VGI. This scale-sensitive approach to analyzing VGI is demonstrated at different grains with data from two citizen science initiatives. The S-COM index can be used both to assess feasible study extents based on coverage, user heterogeneity, and density, and to find feasible sub-study areas within a larger, indefinite area. The results identified sub-study areas of VGI for focused analysis, supporting broader adoption of a similar methodology in multi-scale analyses of VGI.


2016 ◽  
Vol 35 (4) ◽  
pp. 444-461 ◽  
Author(s):  
Martin Hilbert ◽  
Javier Vásquez ◽  
Daniel Halpern ◽  
Sebastián Valenzuela ◽  
Eduardo Arriagada

The article analyzes the nature of communication flows during social conflicts on the digital platform Twitter. We gathered over 150,000 tweets from citizen protests for nine environmental social movements in Chile. Using a mixed-methods approach, we show that long-standing paradigms for social mobilization and participation are neither replicated nor replaced, but reshaped. On digital platforms, long-standing communication theories, such as the 1955 two-step flow model, remain valid, while direct one-step flows and more complex network flows are also present. For example, we show that two findings do not contradict each other. On one hand, social media participants mainly refer to intermediating amplifiers of communicated messages (39% of the mentions from participants go through this two-step communication flow). On the other hand, traditional media outlets and official protest voices receive 80–90% of their mentions directly from the same participants, through a one-step flow. Although this seems nonintuitive at first sight, Bayes’s theorem allows us to disentangle the different perspectives on the emerging communication channels. We identify the strategic importance of a group of amplifying intermediaries in local positions of the networks, who coexist with specialized voices and professional media outlets at the center of the global network. We also show that direct personalized messages represent merely 20% of the total communication. This shows that the fine-grained digital footprint from social media enables us to go beyond simplistic views of a single all-encompassing step-flow model for social communication. The resulting research agenda builds on long-standing theories with a new set of tools.
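The apparent paradox can be resolved by noting that the two percentages condition in opposite directions. The 39% figure is P(target is an intermediary | mention comes from a participant). The 80–90% figure is P(mention comes from a participant | target is a media outlet). Bayes' theorem links these two conditional probabilities. The following sketch uses invented mention counts, which are not the study's data, to show how a modest share of participant mentions can still account for most of the mentions that media outlets receive. The group names and counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical mention counts: source group -> target group -> number of mentions.
# These numbers are invented for illustration; they are NOT the study's data.
mentions = {
    "participant": {"intermediary": 390, "media": 100, "other": 510},
    "non_participant": {"intermediary": 60, "media": 20, "other": 420},
}


def target_given_source(counts, source, target):
    """P(target | source): share of a source group's mentions aimed at target."""
    row = counts[source]
    return Fraction(row[target], sum(row.values()))


def source_given_target(counts, source, target):
    """P(source | target) via Bayes' theorem: P(s | t) = P(t | s) * P(s) / P(t)."""
    total = sum(sum(row.values()) for row in counts.values())
    p_source = Fraction(sum(counts[source].values()), total)
    p_target = Fraction(sum(row[target] for row in counts.values()), total)
    return target_given_source(counts, source, target) * p_source / p_target


print(float(target_given_source(mentions, "participant", "intermediary")))  # share of participant mentions via intermediaries
print(float(target_given_source(mentions, "participant", "media")))         # share of participant mentions aimed at media
print(float(source_given_target(mentions, "participant", "media")))         # share of media mentions coming from participants
```

In this toy example, only 10% of participant mentions target media outlets. Participants nonetheless produce about 83% of all media mentions, because they generate most of the traffic overall.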


2021 ◽  
pp. 104893
Author(s):  
Mingxuan Dou ◽  
Yandong Wang ◽  
Yanyan Gu ◽  
Shihai Dong ◽  
Mengling Qiao ◽  
...  

2016 ◽  
Vol 18 (11) ◽  
pp. 2685-2702 ◽  
Author(s):  
Sonja Utz

This article uses a social capital framework to examine whether and how the use of three types of publicly accessible social media (LinkedIn, Twitter, Facebook) is related to professional informational benefits among a representative sample of Dutch online users. Professional informational benefits were conceptualized as (timely) access to relevant information and being referred to career opportunities. The effect of the content and structure of the respective online network on professional informational benefits was examined at a general level (users vs. non-users of a platform) and at a more fine-grained level (within users of a specific platform). Overall, users of LinkedIn and Twitter reported higher informational benefits than non-users, whereas Facebook users reported lower informational benefits. Posting about work and strategically selecting ties consistently predicted informational benefits. Network composition mattered most on LinkedIn, where both strong and weak ties predicted informational benefits. The results demonstrate the usefulness of the social capital framework.


Author(s):  
Zhaoxia Wang ◽  
Chee Seng Chong ◽  
Landy Lan ◽  
Yinping Yang ◽  
Seng Beng Ho ◽  
...  

2021 ◽  
Vol 11 (22) ◽  
pp. 10694
Author(s):  
Nora Alturayeif ◽  
Hamzah Luqman

The outbreak of coronavirus disease (COVID-19) has affected almost every country in the world and has had significant social and psychological effects on the population. Social media platforms are now widely used for emotional self-expression about current events, including the COVID-19 pandemic. Studying people’s emotions on social media is vital for understanding the pandemic's effect on mental health and, in turn, for protecting societies. This work investigates to what extent deep learning models can help in understanding society’s attitude toward the COVID-19 pandemic on social media. We employ two transformer-based models for fine-grained sentiment detection in Arabic tweets, taking into account that more than one emotion can co-exist in the same tweet. We also show how the textual representation of emojis can boost the performance of sentiment analysis. In addition, we propose a dynamically weighted loss function (DWLF) to handle the issue of imbalanced datasets. The proposed approach was evaluated on two datasets. The results demonstrate that the proposed BERT-based models with emoji replacement and the DWLF technique can improve sentiment detection in multi-dialect Arabic tweets, reaching a micro-F1 score of 0.72.
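The abstract does not specify how DWLF computes its weights, so the sketch below substitutes a generic scheme. This is an assumption made for illustration and is not the authors' method. In this scheme, positive-class weights are set inversely proportional to label frequency and are recomputed for each batch. They are then applied in a multi-label binary cross-entropy, and the result is scored with micro-averaged F1, the metric the abstract reports. All function names, the smoothing constant `eps`, the 0.5 decision threshold, and the toy batch are hypothetical.

```python
import math


def positive_label_weights(batch_labels, eps=1.0):
    """Per-label positive-class weights recomputed from each batch:
    w_j = (n + eps) / (positives_j + eps), so rarer labels get larger weights.
    eps smooths labels that have no positives in the batch."""
    n = len(batch_labels)
    num_labels = len(batch_labels[0])
    weights = []
    for j in range(num_labels):
        positives = sum(row[j] for row in batch_labels)
        weights.append((n + eps) / (positives + eps))
    return weights


def weighted_bce(probs, labels, weights, tiny=1e-12):
    """Multi-label binary cross-entropy in which the positive term of label j
    is scaled by weights[j]; averaged over all (example, label) pairs."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y, w in zip(p_row, y_row, weights):
            p = min(max(p, tiny), 1.0 - tiny)  # clip to avoid log(0)
            total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def micro_f1(pred, gold):
    """Micro-averaged F1 = 2TP / (2TP + FP + FN), pooled over all (example, label) pairs."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gold):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical batch: 4 tweets, 3 emotion labels (multi-label, invented data)
gold = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
probs = [[0.8, 0.1, 0.1], [0.7, 0.4, 0.2], [0.2, 0.1, 0.6], [0.6, 0.3, 0.1]]
weights = positive_label_weights(gold)
loss = weighted_bce(probs, gold, weights)
pred = [[int(p >= 0.5) for p in row] for row in probs]  # assumed 0.5 threshold
print(weights, round(loss, 4), micro_f1(pred, gold))
```

Recomputing the weights per batch lets the penalty for missing a rare emotion track the label distribution actually seen during training. Whether DWLF uses this rule or a different one cannot be determined from the abstract.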


2022 ◽  
Vol 40 (4) ◽  
pp. 1-28
Author(s):  
Peng Zhang ◽  
Baoxi Liu ◽  
Tun Lu ◽  
Xianghua Ding ◽  
Hansu Gu ◽  
...  

User-generated content (UGC) on social media is the direct expression of users’ interests, preferences, and opinions. User behavior prediction based on UGC has been increasingly investigated in recent years. Learning a person’s behavioral patterns on each social media site separately is less effective than cross-site user behavior prediction, which predicts user behavior jointly across multiple sites so that the sites complement each other and can yield more accurate results. However, cross-site user behavior prediction based on UGC is challenging due to the difficulty of cross-site data sampling, the complexity of UGC modeling, and the uncertainty of knowledge sharing among different sites. To address these problems, we propose a Cross-Site Multi-Task (CSMT) learning method to jointly predict user behavior on multiple social media sites. CSMT mainly derives from the hierarchical attention network and multi-task learning. With this method, the UGC on each social media site obtains fine-grained representations in terms of words, topics, posts, hashtags, and time slices, as well as the relations among them. The prediction tasks on different social media sites can then be implemented jointly and complement each other. On two cross-site datasets sampled from Weibo, Douban, Facebook, and Twitter, we demonstrate our method’s superiority over existing related methods on several classification metrics.
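CSMT is described as deriving from the hierarchical attention network. The basic building block of that architecture is attention pooling: each element (for example, a word) is scored against a learned context vector, the scores are softmax-normalized, and the weighted sum becomes the representation of the next level up (for example, a post). The sketch below shows only that generic pooling step, applied at two levels. It is not the CSMT model, and all parameter values are hypothetical toy numbers rather than trained weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_pool(H, W, b, context):
    """Generic hierarchical-attention pooling:
    u_i = tanh(W h_i + b); alpha = softmax(u_i . context); s = sum_i alpha_i h_i.
    H: n input vectors of size d; W: k x d matrix; b, context: length-k vectors.
    Returns the pooled vector s and the attention weights alpha."""
    u = [[math.tanh(z + bias) for z, bias in zip(matvec(W, h), b)] for h in H]
    scores = [sum(ui * ci for ui, ci in zip(vec, context)) for vec in u]
    alpha = softmax(scores)
    d = len(H[0])
    pooled = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(d)]
    return pooled, alpha


# Toy example with hypothetical parameters: three 2-d word vectors in one post.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
word_context = [1.0, 1.0]
post_vec, word_weights = attention_pool(words, W, b, word_context)

# Second level of the hierarchy: pool post vectors into one user vector
# (the same parameters are reused here only to keep the example short).
posts = [post_vec, [0.5, -0.5]]
user_vec, post_weights = attention_pool(posts, W, b, word_context)
print(word_weights, user_vec)
```

In a real hierarchical model, each level would have its own trained `W`, `b`, and context vector, and the inputs would be encoder hidden states rather than hand-written vectors.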

