Automatic Categorization of LGBT User Profiles on Twitter with Machine Learning

Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1822
Author(s):  
Amir Karami ◽  
Morgan Lundy ◽  
Frank Webb ◽  
Hannah R. Boyajieff ◽  
Michael Zhu ◽  
...  

Privacy needs and stigma pose significant barriers to lesbian, gay, bisexual, and transgender (LGBT) people sharing information related to their identities in traditional settings and research methods such as surveys and interviews. Fortunately, social media facilitates people’s belonging to and exchanging information within online LGBT communities. Compared to heterosexual respondents, LGBT users are also more likely to have accounts on social media websites and access social media daily. However, the current relevant LGBT studies on social media are not efficient or assume that any accounts that utilize LGBT-related words in their profile belong to individuals who identify as LGBT. Our human coding of over 16,000 accounts instead proposes the following three categories of LGBT Twitter users: individual, sexual worker/porn, and organization. This research develops a machine learning classifier based on the profile and bio features of these Twitter accounts. To have an efficient and effective process, we use a feature selection method to reduce the number of features and improve the classifier’s performance. Our approach achieves a promising result with around 88% accuracy. We also develop statistical analyses to compare the three categories based on the average weight of top features.
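The abstract does not say which feature selection method was used. As a minimal sketch, assuming a chi-square ranking over bag-of-words bio tokens (an assumption, not the authors' stated method), the idea of keeping only the most class-discriminative features can be illustrated as follows. The bios and labels are invented toy data, and only two of the three categories are shown.

```python
from collections import Counter

def chi_square_scores(docs, labels):
    """Score each token by a chi-square statistic of (token present?) x (class)."""
    n = len(docs)
    class_counts = Counter(labels)
    token_sets = [set(doc.lower().split()) for doc in docs]
    doc_freq = Counter(tok for toks in token_sets for tok in toks)
    joint = Counter((tok, y) for toks, y in zip(token_sets, labels) for tok in toks)
    scores = {}
    for token, df in doc_freq.items():
        score = 0.0
        for cls, n_cls in class_counts.items():
            for present in (True, False):
                observed = joint[(token, cls)] if present else n_cls - joint[(token, cls)]
                row_total = df if present else n - df
                expected = row_total * n_cls / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[token] = score
    return scores

def select_top_k(docs, labels, k):
    """Keep the k highest-scoring tokens (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda tok: (-scores[tok], tok))[:k]

# Hypothetical toy profiles, for illustration only.
bios = [
    "proud gay dad teacher",
    "lgbt community center events",
    "queer writer dad",
    "lgbt youth center",
]
labels = ["individual", "organization", "individual", "organization"]

top_features = select_top_k(bios, labels, 3)
```

A classifier would then be trained on the reduced vocabulary only, which is one common way to cut feature count while keeping the tokens that best separate the categories.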

Author(s):  
Giandomenico Di Domenico ◽  
Annamaria Tuan ◽  
Marco Visentin

Abstract: In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories centered on the effects of specific new technologies like 5G, and misinformation tarnished the reputation of brands like Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on the retweeting of a tweet and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.


2021 ◽  
Author(s):  
Alexey Bessudnov ◽  
Denis Tarasov ◽  
Viacheslav Panasovets ◽  
Veronica Kostenko ◽  
Ivan Smirnov ◽  
...  

In this paper we develop a machine learning classifier that predicts perceived ethnicity from data on personal names for major ethnic groups populating Russia. We collect data from VK, the largest Russian social media website. Ethnicity has been determined from languages spoken by users and their geographical location, with the data manually cleaned by crowd workers. The classifier shows the accuracy of 0.82 for a scheme with 24 ethnic groups and 0.92 for 15 aggregated ethnic groups. It can be used for research on ethnicity and ethnic relations in Russia, in particular with VK and other social media data.


Author(s):  
Shahzad Qaiser ◽  
Nooraini Yusoff ◽  
Farzana Kabir Ahmad ◽  
Ramsha Ali

Many studies are in progress to analyze the content created by users on social media because of its influence and social ripple effect. Content created on social media carries information and users’ sentiments about social issues. This study aims to analyze people’s sentiments about the impact of technology on employment and advancements in technologies and to build a machine learning classifier to classify the sentiments. People are becoming anxious and depressed, and some even die by suicide, due to unemployment; hence, it is essential to explore this relatively new area of research. The study has two main objectives: 1) to preprocess text collected from Twitter concerning the impact of technology on employment and analyze its sentiment, and 2) to evaluate the performance of a machine learning Naïve Bayes (NB) classifier on the text. To achieve this, a methodology is proposed that includes 1) data collection and preprocessing, 2) sentiment analysis, 3) building a machine learning classifier, and 4) comparing the performance of NB and a support vector machine (SVM). NB and SVM achieved 87.18% and 82.05% accuracy, respectively. The study found that 65% of the people hold negative sentiment regarding the impact of technology on employment and technological advancements; hence people must acquire new skills to minimize the effect of structural unemployment.
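To make the Naïve Bayes step concrete, here is a minimal sketch of a multinomial NB sentiment classifier with Laplace smoothing. It assumes whitespace tokenization and invented training sentences; the study's actual preprocessing, features, and data are not given in the abstract, so the class name `NaiveBayes` and everything in the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy tweets, for illustration only.
train_texts = [
    "robots took my job",
    "automation will destroy jobs",
    "new technology creates opportunities",
    "excited about learning new skills",
]
train_labels = ["negative", "negative", "positive", "positive"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction_a = model.predict("automation took jobs")
prediction_b = model.predict("learning new technology")
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.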


2017 ◽  
Vol 23 (12) ◽  
pp. 1475-1485 ◽  
Author(s):  
Sharath Chandra Guntuku ◽  
J. Russell Ramsay ◽  
Raina M. Merchant ◽  
Lyle H. Ungar

Objective: We computationally analyze the language of social media users diagnosed with ADHD to understand what they talk about, and how their language is correlated with users’ characteristics such as personality and temporal orientation. Method: We analyzed approximately 1.3 million tweets written by 1,399 Twitter users with self-reported diagnoses of ADHD, comparing their posts with those of a control set matched by age, gender, and period of activity. Results: Users with ADHD are found to be less agreeable, more open, to post more often, and to use more negations, hedging, and swear words. Posts are suggestive of themes of emotional dysregulation, self-criticism, substance abuse, and exhaustion. A machine learning model can predict which of these Twitter users has ADHD with an out-of-sample AUC of .836. Conclusion: Based on this emerging technology, we offer conjectures about future uses of social media by researchers and clinicians to better understand the naturalistic manifestations and sequelae of ADHD.
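For readers unfamiliar with the AUC figure reported above, the sketch below computes ROC AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as half). The labels and scores are invented for illustration and have no connection to the study's data.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical held-out users: 1 = ADHD group, 0 = control group.
held_out_labels = [1, 1, 0, 0, 1, 0]
held_out_scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
auc = roc_auc(held_out_labels, held_out_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. This quadratic-time version is for clarity only; rank-based formulas scale better on large samples.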


2016 ◽  
Vol 14 ◽  
Author(s):  
Koen Hallmann ◽  
Florian Kunneman ◽  
Christine Liebrecht ◽  
Antal Van den Bosch ◽  
Margot Van Mulken

Verbal irony, or sarcasm, presents a significant technical and conceptual challenge when it comes to automatic detection. Moreover, it can be a disruptive factor in sentiment analysis and opinion mining, because it changes the polarity of a message implicitly. Extant methods for automatic detection are mostly based on overt clues to ironic intent such as hashtags, also known as irony markers. In this paper, we investigate whether people who know each other make use of irony markers less often than people who do not know each other. We trained a machine-learning classifier to detect sarcasm in Twitter messages (tweets) that were addressed to specific users, and in tweets that were not addressed to a particular user. Human coders analyzed the top-1000 features found to be most discriminative into ten categories of irony markers. The classifier was also tested within and across the two categories. We find that tweets with a user mention contain fewer irony markers than tweets not addressed to a particular user. Classification experiments confirm that the irony in the two types of tweets is signaled differently. The within-category performance of the classifier is about 91% for both categories, while cross-category experiments yield substantially lower generalization performance scores of 75% and 71%. We conclude that irony markers are used more often when there is less mutual knowledge between sender and receiver. Senders addressing other Twitter users less often use irony markers, relying on mutual knowledge which should lead the receiver to infer ironic intent from more implicit clues. With regard to automatic detection, we conclude that our classifier is able to detect ironic tweets addressed at another user as reliably as tweets that are not addressed at a particular person.


Spam has become a growing issue on social media websites. Some users of these websites create spam news. On Twitter, users inject tweets into trending topics and reply with promotional messages containing links. A large amount of spam has been noticed on Twitter, and it is necessary to identify these spam tweets in a Twitter stream. Nowadays, many people rely on content available on social media when making decisions, so detecting and deleting this spam is very important. A basic framework is suggested to detect malicious account holders on Twitter. At present, methods for detecting these spam users or accounts are based on content-based features and graph-based features. The proposed system works on machine learning algorithms, which help to give accurate results. The system uses the Naïve Bayes classifier, described as a family of methods that rely on Bayes' theorem and share a common mode of working.
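The way Naïve Bayes relies on Bayes' theorem can be written out as follows. This is the standard textbook formulation rather than anything specific to the system described; the class names spam and ham and the word features w_1, ..., w_n are notation assumed here for illustration.

```latex
% Bayes' theorem applied to a class c and observed words w_1, ..., w_n
P(c \mid w_1, \dots, w_n) = \frac{P(c)\, P(w_1, \dots, w_n \mid c)}{P(w_1, \dots, w_n)}

% "Naive" conditional-independence assumption
P(w_1, \dots, w_n \mid c) \approx \prod_{i=1}^{n} P(w_i \mid c)

% Resulting decision rule (the denominator does not depend on c and is dropped)
\hat{c} = \arg\max_{c \in \{\mathrm{spam},\, \mathrm{ham}\}} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The shared mode of working across Naïve Bayes variants is this decision rule; the variants differ only in how the per-feature likelihoods P(w_i | c) are modeled.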


2019 ◽  
Vol 23 (1) ◽  
pp. 52-71 ◽  
Author(s):  
Siyoung Chung ◽  
Mark Chong ◽  
Jie Sheng Chua ◽  
Jin Cheon Na

Purpose: The purpose of this paper is to investigate the evolution of online sentiments toward a company (i.e. Chipotle) during a crisis, and the effects of corporate apology on those sentiments. Design/methodology/approach: The study used a very large data set of tweets (i.e. over 2.6m) about Company A’s food poisoning case (2015–2016). This case was selected because it is widely known, drew attention from various stakeholders and had many dynamics (e.g. multiple outbreaks, and across different locations). This study employed a supervised machine learning approach. Its sentiment polarity classification and relevance classification consisted of five steps: sampling, labeling, tokenization, augmentation of semantic representation, and the training of supervised classifiers for relevance and sentiment prediction. Findings: The findings show that: the overall sentiment of tweets specific to the crisis was neutral; promotions and marketing communication may not be effective in converting negative sentiments to positive sentiments; a corporate crisis drew public attention and sparked public discussion on social media; while corporate apologies had a positive effect on sentiments, the effect did not last long, as the apologies did not remove public concerns about food safety; and some Twitter users exerted a significant influence on online sentiments through their popular tweets, which were heavily retweeted among Twitter users. Research limitations/implications: Even with multiple training sessions and the use of a voting procedure (i.e. when there was a discrepancy in the coding of a tweet), there were some tweets that could not be accurately coded for sentiment. Aspect-based sentiment analysis and deep learning algorithms can be used to address this limitation in future research. This analysis of the impact of Chipotle’s apologies on sentiment did not test for a direct relationship. Future research could use manual coding to include only specific responses to the corporate apology. There was a delay between the time social media users received the news and the time they responded to it. Time delay poses a challenge to the sentiment analysis of Twitter data, as it is difficult to interpret which peak corresponds with which incident/s. This study focused solely on Twitter, which is just one of several social media sites that had content about the crisis. Practical implications: First, companies should use social media as official corporate news channels and frequently update them with any developments about the crisis, and use them proactively. Second, companies in crisis should refrain from marketing efforts. Instead, they should focus on resolving the issue at hand and not attempt to regain a favorable relationship with stakeholders right away. Third, companies can leverage video, images and humor, as well as individuals with large online social networks to increase the reach and diffusion of their messages. Originality/value: This study is among the first to empirically investigate the dynamics of corporate reputation as it evolves during a crisis as well as the effects of corporate apology on online sentiments. It is also one of the few studies that employs sentiment analysis using a supervised machine learning method in the area of corporate reputation and communication management. In addition, it offers valuable insights to both researchers and practitioners who wish to utilize big data to understand the online perceptions and behaviors of stakeholders during a corporate crisis.


2014 ◽  
Vol 631-632 ◽  
pp. 1219-1223
Author(s):  
Jia Hao Chen ◽  
Jian Hua Wu

With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, ordinary manual work cannot cope with the large number of subjective messages. As a new kind of social media service, micro blog has been widely accepted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to Naïve Bayes and that both are better than logistic regression in most cases.


2021 ◽  
Vol 13 (1) ◽  
pp. 19
Author(s):  
Ola Karajeh ◽  
Dirar Darweesh ◽  
Omar Darwish ◽  
Noor Abu-El-Rub ◽  
Belal Alsinglawi ◽  
...  

Social media sites are considered one of the most important sources of data in many fields, such as health, education, and politics. While surveys provide explicit answers to specific questions, posts in social media have the same answers implicitly occurring in the text. This research aims to develop a method for extracting implicit answers from large tweet collections, and to demonstrate this method for an important concern: the problem of heart attacks. The approach is to collect tweets containing “heart attack” and then select from those the ones with useful information. Informational tweets are those which express real heart attack issues, e.g., “Yesterday morning, my grandfather had a heart attack while he was walking around the garden.” On the other hand, there are non-informational tweets such as “Dropped my iPhone for the first time and almost had a heart attack.” The starting point was to manually classify around 7000 tweets as either informational (11%) or non-informational (89%), thus yielding a labeled dataset to use in devising a machine learning classifier that can be applied to our large collection of over 20 million tweets. Tweets were cleaned and converted to a vector representation suitable to be fed into different machine learning algorithms: deep neural networks, support vector machine (SVM), J48 decision tree and naïve Bayes. Our experimentation aimed to find the best algorithm to use to build a high-quality classifier. This involved splitting the labeled dataset, with 2/3 used to train the classifier and 1/3 used for evaluation, in addition to cross-validation methods. The deep neural network (DNN) classifier obtained the highest accuracy (95.2%). In addition, it obtained the highest F1-scores, 73.6% and 97.4% for the informational and non-informational classes, respectively.
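With an 11%/89% class split, accuracy alone can hide poor performance on the minority class, which is why per-class F1-scores are informative here. The sketch below illustrates a 2/3 to 1/3 split and a per-class F1 computation on synthetic labels with the same class ratio. Stratifying the split by class is an assumption made for this illustration; the abstract does not say how the split was drawn, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_indices, test_indices) with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

def f1_for_class(y_true, y_pred, cls):
    """F1 = 2*TP / (2*TP + FP + FN), treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Synthetic labels mirroring the 11% / 89% ratio (not the study's data).
labels = ["informational"] * 11 + ["non-informational"] * 89
train_idx, test_idx = stratified_split(labels)

# A trivial baseline that always predicts the majority class.
baseline = ["non-informational"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, baseline)) / len(labels)
f1_informational = f1_for_class(labels, baseline, "informational")
f1_non_informational = f1_for_class(labels, baseline, "non-informational")
```

The majority-class baseline reaches 89% accuracy while scoring an F1 of 0 on the informational class, so a classifier on data like this needs to be judged on the minority-class F1 as well as on accuracy.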


Author(s):  
Hadj Ahmed Bouarara

A recent British study of people between the ages of 14 and 35 has shown that social media has a negative impact on mental health. The purpose of the paper is to detect the behavior of people with mental disorders on social media in order to help Twitter users overcome mental health problems such as anxiety, phobia, depression, paranoia, etc. For this, the author used text mining and machine learning algorithms (naïve Bayes, k-nearest neighbours) to analyse tweets. The obtained results were validated using different evaluation measures such as f-measure, recall, precision, entropy, etc.

