social media text
Recently Published Documents



Author(s):  
Jiatong Meng ◽  
Yucheng Chen

Traditional models for predicting quasi-social relationship types obtain their results by analyzing and clustering the raw data directly. Their predictions are easily disturbed by noisy data, and their low processing efficiency and accuracy become more apparent as the amount of user data grows. To address these problems, this research constructs a prediction model of user quasi-social relationship type based on social media text big data. The collected social media text is first pre-processed to remove interference data that would degrade prediction accuracy. The interaction information in the text data is then mined using similarity calculation, and semantic analysis and sentiment annotation are performed on the information content. A prediction model of the user's quasi-social relationship type is built on a BP neural network. Performance tests show that the constructed model reaches an average prediction accuracy of 89.84%, with lower time complexity and higher processing efficiency than the traditional models it was compared against.
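The abstract above does not say which similarity measure is used to mine interaction information. As a minimal sketch only, assuming cosine similarity over bag-of-words counts (a common choice, not confirmed by the source), the step could look like the following; `tokenize` and `cosine_similarity` are illustrative names, and the example posts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented example posts: a fan comment and a reply that share vocabulary.
post = "loved the new episode tonight"
reply = "the new episode was great"
print(cosine_similarity(post, reply))
```

A pair of posts scoring above some threshold could then be treated as an interaction candidate; the threshold itself would have to be tuned on real data.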


Author(s):  
Edward Ombui ◽  
Lawrence Muchemi ◽  
Peter Wagacha

This study uses natural language processing to identify hate speech in codeswitched social media text. It trains nine models and tests how well they recognize hate speech in a human-annotated dataset of 50k items. The article proposes a novel hierarchical approach that uses Latent Dirichlet Allocation to develop topic models, which in turn help build a high-level psychosocial feature set we call PDC. PDC organizes words into word families, which helps capture codeswitching during preprocessing for supervised learning models. Informed by the duplex theory of hate, the PDC features are based on a hate speech annotation framework. Frequency-based models using the PDC features on tweets from the 2012 and 2017 Kenyan presidential elections yielded an F-score of 83 percent (precision: 81 percent, recall: 85 percent) in recognizing hate speech. The study is notable for two reasons. First, it makes a rich codeswitched dataset publicly available for comparative studies. Second, it describes how to create a novel PDC feature set to detect subtle types of hate speech hidden in codeswitched data that previous approaches could not detect.
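The reported F-score is consistent with the stated precision and recall if it is the balanced F1 score (their harmonic mean), which the abstract does not state explicitly. A quick check under that assumption, with `f_beta` as an illustrative helper name:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f_beta(0.81, 0.85)
print(f"{f1:.4f}")  # about 0.8295, i.e. 83 percent after rounding
```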


Author(s):  
B. Mounica ◽  
K. Lavanya

Due to urbanization, traffic management has become one of the major issues in contemporary civic management, and traffic analysis is becoming a necessity. Text data generated by Twitter, Facebook and other social media platforms can be used for traffic management. Big data helps in traffic prediction and traffic analysis for growing metropolitan areas. Real-time traffic analysis requires processing continuously generated data streams to gain rapid insights, and processing stream data at that speed requires high computing capacity. Social media text data can be handled with both batch processing and stream processing in a big data architecture built on the Spark and Hadoop frameworks. This paper proposes a big data architecture for real-time analysis of traffic text data. The architecture uses Spark and Kafka in combination: Kafka pipelines the text data into the Spark stream processing engine. With its ability to process and prepare very large volumes of data, this architecture addresses the problem of handling and storing continuously streaming data. The traffic information is streamed from the Twitter API. The proposed model uses an ensemble neural network to reduce the variance in results and so improve prediction on traffic stream text data; combined with Spark and Kafka, it should be of great value to public authorities for traffic management and analysis.


2021 ◽  
Vol 2070 (1) ◽  
pp. 012079
Author(s):  
V Jagadishwari ◽  
A Indulekha ◽  
Kiran Raghu ◽  
P Harshini

Social media has recently become an arena where people share their perspectives on a variety of topics, and most social interaction now takes place through it. Although online social networks let users express their views and opinions in many forms, such as audio, video and text, the most popular forms of expression are text, emoticons and emojis. The work presented in this paper aims to detect the sentiments expressed in social media posts. Four machine learning models were implemented: Bernoulli Bayes, Multinomial Bayes, Regression and SVM. All of them were trained and tested on Twitter data sets. Users on Twitter express their opinions as tweets with a limited number of characters. Because tweets also contain emoticons and emojis, Twitter data sets are well suited to sentiment analysis. The effect of the emoticons present in a tweet is also analyzed: the models are first trained on the text alone and then on the text together with the emoticons in the tweet. The performance of all four models in both cases is tested, and the results are presented in the paper.
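The two training conditions described above (text only, then text plus emoticons) can be illustrated with a small from-scratch multinomial Naive Bayes. This is a sketch under stated assumptions, not the authors' implementation: the emoticon pattern, the `features` helper, the `MultinomialNB` class and the four training tweets are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately small emoticon pattern: eyes, optional nose, mouth.
EMOTICON = re.compile(r"[:;=][-']?[)(DPp/\\|]")
WORD = re.compile(r"[a-z']+")


def features(tweet: str, use_emoticons: bool) -> list[str]:
    """Word tokens, optionally followed by the emoticons found in the tweet."""
    # Strip emoticons first so their letters (e.g. the D in ":D") are not words.
    words = WORD.findall(EMOTICON.sub(" ", tweet).lower())
    return words + EMOTICON.findall(tweet) if use_emoticons else words


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Invented toy data, for illustration only.
train = [("love this :)", "pos"), ("great day :D", "pos"),
         ("hate this :(", "neg"), ("awful day :(", "neg")]
labels = [y for _, y in train]

with_emo = MultinomialNB().fit([features(t, True) for t, _ in train], labels)
print(with_emo.predict(features("this day :(", True)))  # neg
```

In this toy set the words "this" and "day" occur equally often in both classes, so the emoticon is the only token that separates them; real data would of course be far less clear-cut.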


2021 ◽  
Author(s):  
Lilian Yanqing Li ◽  
Jason Schiffman ◽  
Elizabeth A Martin

There is a critical need for identifying time-sensitive and cost-effective markers of psychosis risk early in the illness course. One solution may lie in affect dynamics, or the fluctuations of affect across time, which have been demonstrated to predict transitions in psychopathology. Across three studies, the current research is the first to comprehensively investigate affect dynamics in relation to subthreshold positive symptoms (perceptual aberration and magical ideation) and negative symptoms (social anhedonia) of the psychosis spectrum. Across multiple timescales and contexts, affect dynamics were modeled from inexpensive laboratory paradigms and social media text. Findings provided strong evidence that positive symptoms are linked to heightened magnitude and frequency of affective fluctuations in response to emotional materials. In contrast, negative symptoms showed a modest association with heightened persistence of baseline states. These affect dynamic signatures of psychosis risk provide insight into the distinct developmental pathways to psychosis and could facilitate current risk detection approaches.
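The abstract does not say how magnitude, frequency and persistence were operationalized. As a rough sketch only, three summary statistics often used for affect time series are the standard deviation (overall variability), the mean squared successive difference (moment-to-moment instability) and the lag-1 autocorrelation (persistence, sometimes called inertia). Mapping these onto the study's constructs is an assumption, and the ratings below are invented.

```python
from statistics import mean, pstdev


def mssd(series):
    """Mean squared successive difference; needs at least two observations."""
    return mean((b - a) ** 2 for a, b in zip(series, series[1:]))


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation; values near 1 mean the state tends to persist."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den if den else 0.0


# Invented affect ratings on an arbitrary 1-5 scale.
drifting = [1, 2, 3, 4, 5]   # slow, persistent change
swinging = [1, 5, 1, 5]      # large, frequent reversals

for name, s in (("drifting", drifting), ("swinging", swinging)):
    print(name, pstdev(s), mssd(s), lag1_autocorrelation(s))
```

The drifting series has a positive autocorrelation and a small MSSD, while the swinging series has a negative autocorrelation and a large MSSD, which is the kind of contrast the two symptom profiles are described as showing.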


2021 ◽  
Vol 3 (4) ◽  
pp. 732-751
Author(s):  
Yusara Anwar ◽  
Nor Liza Ali

The language pyramid in post-colonial territories, as propounded by Melchers and Shaw in 2003, aptly reflects the status of different languages in Pakistan. At the top is English, with its heritage as a colonial language. Next is the 'national' language, Urdu, which has nationalist value and is spoken by the majority as a lingua franca. At the bottom of this hierarchy are the regional languages and their dialects. This hierarchy of languages has deep repercussions, entrenching stratification based on social class and the commodification of languages. In this paper, this claim is substantiated by semiotic analysis of a social media text, an amateur video clip that went viral on Facebook in January 2021, in which the owners of a high-end cafe in Islamabad mock their manager's English. The video is only the tip of the iceberg of the symbolic and linguistic capital of English in Pakistan. The analysis is further supported by the literature on the critical approach to language policy and planning (LPP). This critical approach can be traced back to the 1980s and to Tollefson's oft-cited 1991 book, which endeavors to situate LPP as part of ongoing conflicts between the elites and the common masses. He holds that the evolution of the critical approach has widened its scope, rendering it primarily sociocultural and concerned with the dynamics of status and prestige. Accordingly, this research attempts to bring together Bourdieu's critical relational theory and semiotics to address class discrimination based on the hegemony of English in Pakistan through a multimethodological approach.


2021 ◽  
Author(s):  
Yuting Guo ◽  
Yao Ge ◽  
Yuan-Chi Yang ◽  
Mohammed Ali Al-Garadi ◽  
Abeed Sarker

Motivation: Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performance in many natural language processing (NLP) tasks. There is a need to benchmark such models for targeted NLP tasks, and to explore effective pretraining strategies to improve machine learning performance. Results: In this work, we addressed the task of health-related social media text classification. We benchmarked five models (RoBERTa, BERTweet, TwitterBERT, BioClinical_BERT, and BioBERT) on 22 tasks. We attempted to boost performance for the best models by comparing three distinct pretraining strategies: domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and topic-specific pretraining (TSPT). RoBERTa and BERTweet performed comparably in most tasks, and better than the others. Among the pretraining strategies, SAPT performed better than or comparably to the off-the-shelf models, and significantly outperformed DAPT. SAPT+TSPT showed consistently high performance, with a statistically significant improvement in one task. Our findings demonstrate that RoBERTa and BERTweet are excellent off-the-shelf models for health-related social media text classification, and that extended pretraining using SAPT and TSPT can further improve performance.


Author(s):  
Shashi Shekhar ◽  
Hitendra Garg ◽  
Rohit Agrawal ◽  
Shivendra Shivani ◽  
Bhisham Sharma

The paper describes the use of a self-learning Hierarchical LSTM technique for classifying hateful and trolling content in code-mixed social media data. Hierarchical LSTM-based learning is a novel learning architecture inspired by neural learning models. The proposed HLSTM model is trained to identify the hateful and trolling words present in social media content. It is equipped with a self-learning and predicting mechanism for annotating hateful words in the transliteration domain. The Hindi–English data are organized into Hindi, English, and hatred labels for classification. Word-embedding and character-embedding features are used to represent the words in a sentence so that hateful words can be detected. The method built on the HLSTM model helps recognize the context of a hateful word by mining the user's intention in using that word in the sentence. Extensive experiments suggest that the HLSTM-based classification model achieves an accuracy of 97.49% when evaluated against standard models such as BLSTM, CRF, LR, SVM, Random Forest and Decision Tree, especially when the social media data contain hateful and trolling words.

