Collecting Pennsylvania Political Twitter Data

During the two most recent elections, we have seen the importance of social media, and of Twitter in particular, for political discourse. This paper describes an academic library's effort to collect election-related Twitter data from Pennsylvania-specific organizational accounts and hashtags in the run-up to and aftermath of the 2018 and 2020 election cycles. Because this data matters for understanding contemporary politics and has historical value, libraries should consider collecting it and making it accessible to Pennsylvanians.

Fake It “Till You Make It”: Debunking Fake News in a Post-Truth America

eJournal of Public Affairs, Vol 8 (1), pp. 114-133, 2019. DOI: 10.21768/8.1.6

Keyword(s): Social Media; First Amendment; Presidential Election; Political Discourse; Cutting Edge; American People; Fake News; News Sources; The Media; The U.S.

Since the 2016 U.S. presidential election, attacks on the media have been relentless. “Fake news” has become a household term, and repeated attempts to break the trust between reporters and the American people have threatened the validity of the First Amendment to the U.S. Constitution. In this article, the authors trace the development of fake news and its impact on contemporary political discourse. They also outline cutting-edge pedagogies designed to assist students in critically evaluating the veracity of various news sources and social media sites.

Utilizing Twitter Data Analysis and Deep Learning to Identify Drug Use (Preprint)

2019. DOI: 10.2196/preprints.14681

Author(s): Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay Mago, ...

Keyword(s): Social Media; Logistic Regression; Deep Learning; Decision Tree; Semantic Meaning; Predictive Capability; Logistic Regression Models; Twitter Data; Data Points; Positive Classification

BACKGROUND: The collection and examination of social media data has become a useful mechanism for studying the mental activity and behavioral tendencies of users.

OBJECTIVE: Through the analysis of a collected set of Twitter data, a model will be developed for predicting positively referenced, drug-related tweets, from which trends and correlations can be determined.

METHODS: Tweets and their attribute data were collected and processed using topic-related keywords, such as drug slang and use conditions (methods of drug consumption). Candidate tweets were preprocessed, resulting in a dataset of 3,696,150 rows. The predictive classification power of multiple methods was compared, including regression, decision trees, and CNN-based classifiers. For the CNN-based classifiers, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets.

RESULTS: The logistic regression and decision tree models used 12,142 data points for training and 1,041 for testing. The logistic regression models achieved accuracies of 54.56% and 57.44% and an AUC of 0.58. The decision tree performed somewhat better, with an accuracy of 63.40% and an AUC of 0.68. All of these values indicate low predictive capability, with little to no discrimination. By contrast, the two CNN-based classifiers tested showed a substantial improvement. The first was trained on 2,661 manually labeled samples, while the second added synthetically generated tweets for a total of 12,142 samples. Their accuracies were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively. Using association rule mining in conjunction with the CNN-based classifier showed that keywords such as “smoke”, “cocaine”, and “marijuana” were highly likely to trigger a drug-positive classification.

CONCLUSIONS: Predictive analysis without a CNN is limited and possibly fruitless. Attribute-based models showed little predictive capability and were not suitable for analyzing this type of data. The semantic meaning of the tweets needed to be used, which gave the CNN-based classifier an advantage over the other solutions. Additionally, the most commonly mentioned drugs corresponded to some degree with frequently used illicit substances, demonstrating the practical usefulness of this system. Lastly, the synthetically generated set yielded higher scores, improving predictive capability.

CLINICALTRIAL: None
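
The classifiers above are compared by accuracy and AUC. As an illustration of what those two metrics measure (not the authors' evaluation code), the sketch below computes both from hypothetical labels and scores. AUC is computed through its rank interpretation: the probability that a randomly chosen positive tweet is scored above a randomly chosen negative one, with ties counted as one half. The 0.5 decision threshold is an assumption.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(a random positive outranks a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gold labels (1 = drug-positive tweet) and classifier scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(accuracy(y_true, y_pred))  # 4 of 6 correct
print(roc_auc(y_true, scores))   # 8 of 9 positive/negative pairs ranked correctly
```

Accuracy depends on the chosen threshold, while AUC does not; an AUC of 0.5 corresponds to random ranking, which is why values near 0.58 are read as little to no discrimination. The pairwise loop is quadratic and only meant for illustration; scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.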

Dynamics of convergence behaviour in social media crisis communication – a complexity perspective

Information Technology and People, ahead of print, 2021. DOI: 10.1108/itp-10-2019-0537

Author(s): Milad Mirbabaie, Stefan Stieglitz, Felix Brünker

Keyword(s): Social Media; Emergency Management; Crisis Communication; Design Methodology; Content Type; The Public; Analysis Techniques; Twitter Data; Convergence Behaviour; Over Time

Purpose: The purpose of this study is to investigate communication on Twitter during two unforeseen crises (the Manchester bombings and the Munich shooting) and one natural disaster (Hurricane Harvey). The study contributes to understanding the dynamics of convergence behaviour archetypes during crises.

Design/methodology/approach: The authors collected Twitter data and analysed approximately 7.5 million relevant cases. The communication was examined using social network analysis techniques and manual content analysis to identify convergence behaviour archetypes (CBAs). The dynamics and development of CBAs over time in crisis communication were also investigated.

Findings: The results revealed the dynamics of influential CBAs emerging in specific stages of a crisis situation. The authors derived a conceptual visualisation of convergence behaviour in social media crisis communication and introduced the terms “hidden network layer” and “visible network layer” to further the understanding of the complexity of crisis communication.

Research limitations/implications: The results emphasise the importance of well-prepared emergency management agencies and support the following recommendations: communication during the crisis event should be (1) continuous and (2) transparent, and (3) the public should be informed about central information distributors from the start of the crisis.

Originality/value: The study uncovered the dynamics of crisis-affected behaviour on social media in three cases. It provides a novel perspective that broadens our understanding of complex crisis communication on social media and contributes to existing knowledge of the complexity of crisis communication and of convergence behaviour.
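
The abstract does not say which social network analysis measures were used to find influential accounts. One common choice, assumed here, is in-degree centrality on a retweet graph: an account that many distinct users retweet is a candidate "central information distributor". The edge list and account names below are invented for illustration.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Normalized in-degree on a retweet graph.

    edges: (retweeter, original_author) pairs. A node's score is the number
    of distinct accounts that retweeted it, divided by (n - 1).
    """
    nodes = {node for edge in edges for node in edge}
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    retweeted_by = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-retweets
            retweeted_by[dst].add(src)
    return {node: len(retweeted_by[node]) / (len(nodes) - 1) for node in nodes}


# Hypothetical retweet edges during a crisis (account names are invented).
retweets = [
    ("u1", "police_account"), ("u2", "police_account"), ("u3", "news_outlet"),
    ("u4", "police_account"), ("u2", "news_outlet"), ("u5", "u1"),
]

centrality = in_degree_centrality(retweets)
for account, score in sorted(centrality.items(), key=lambda kv: -kv[1])[:3]:
    print(account, round(score, 3))
```

The `networkx` library offers the same normalization through `networkx.in_degree_centrality` on a directed graph; the plain-Python version just makes the counting explicit.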

What the fake? Assessing the extent of networked political spamming and bots in the propagation of #fakenews on Twitter

Online Information Review, Vol 43 (1), pp. 53-71, 2019. DOI: 10.1108/oir-02-2018-0065

Author(s): Ahmed Al-Rawi, Jacob Groshek, Li Zhang

Keyword(s): Social Media; Data Sets; Fake News; Content Type; Mainstream Media; News Discourse; Twitter Data; News Organizations; Twitter Users; One Year

Purpose: The purpose of this paper is to examine one of the largest data sets on the use of the #fakenews hashtag, comprising over 14m tweets sent by more than 2.4m users.

Design/methodology/approach: Tweets referencing the hashtag (#fakenews) were collected over a period of more than one year, from January 3, 2017 to May 7, 2018. Bot detection tools were employed, and the most retweeted posts, the most frequent mentions and hashtags, and the top 50 most active users in terms of tweet frequency were analyzed.

Findings: Most of the top 50 Twitter users are likely to be automated bots, while posts by certain users, such as President Donald Trump, dominate the most retweeted posts, which always associate mainstream media with fake news. The most used words and hashtags show that major news organizations are frequently referenced, with a particular focus on CNN, which is often mentioned negatively.

Research limitations/implications: The study is limited to the examination of Twitter data; ethnographic methods such as interviews or surveys are needed to complement these findings. Although the data reported here do not prove direct effects, the implications of the research provide a vital framework for assessing and diagnosing the networked spammers and main actors that have been pivotal in shaping discourses around fake news on social media. These discourses, which are sometimes assisted by bots, can potentially influence audiences, their trust in mainstream media, and their understanding of what fake news is.

Originality/value: This paper reports results from one of the first empirical studies of the propagation of fake news discourse on social media, shedding light on the most active Twitter users who discuss and mention the term “#fakenews” in connection with other news organizations, parties and related figures.
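
The frequency-based parts of this analysis reduce to counting. A minimal sketch, assuming each collected tweet is available as a hypothetical (screen_name, text) pair: it ranks the most active accounts (the paper uses the top 50) and the most frequent hashtags. The bot detection tools are not named in the abstract and are not reproduced here; the sample tweets are invented.

```python
import re
from collections import Counter

# Hypothetical records: (screen_name, tweet_text) for tweets containing #fakenews.
tweets = [
    ("user_a", "More #fakenews from the media #news"),
    ("user_b", "Is this #fakenews? #media"),
    ("user_a", "#fakenews again"),
    ("user_c", "#FakeNews everywhere #news"),
    ("user_a", "Retweet if you agree #fakenews"),
    ("user_b", "#fakenews #CNN"),
]

# A hashtag is '#' followed by word characters; punctuation ends the tag.
HASHTAG = re.compile(r"#(\w+)")


def top_users(records, n):
    """Return the n most frequent posters as (user, tweet_count) pairs."""
    return Counter(user for user, _ in records).most_common(n)


def top_hashtags(records, n):
    """Return the n most frequent hashtags, lower-cased, as (tag, count) pairs."""
    tags = Counter()
    for _, text in records:
        tags.update(tag.lower() for tag in HASHTAG.findall(text))
    return tags.most_common(n)


print(top_users(tweets, 2))
print(top_hashtags(tweets, 3))
```

Lower-casing merges variants such as #FakeNews and #fakenews; whether the original study normalized case is not stated in the abstract.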

Crowdsourcing Incident Information for Emergency Response using Open Data Sources in Smart Cities

Transportation Research Record: Journal of the Transportation Research Board, Vol 2672 (1), pp. 198-208, 2018. DOI: 10.1177/0361198118798736

Author(s): Fan Zuo, Abdullah Kurkcu, Kaan Ozbay, Jingqin Gao

Keyword(s): Social Media; Prior Knowledge; Emergency Response; Large Scale; Latent Dirichlet Allocation; Smart Cities; Hurricane Sandy; Support Vector; Twitter Data; Emergency Events

Emergency events affect human security and safety as well as the integrity of local infrastructure. Emergency response officials must make decisions with limited information and time. During emergency events, people post updates such as tweets to social media networks, containing information about their status, help requests, incident reports, and other useful information. In this research project, the Latent Dirichlet Allocation (LDA) model is used to automatically classify incident-related tweets and incident types using Twitter data. Unlike previous social media information models proposed in the related literature, LDA is an unsupervised learning model that can be applied directly, without prior knowledge or data preparation, which saves time during emergencies. Twitter data, including messages and geolocation information, from two recent events in New York City, the Chelsea explosion and Hurricane Sandy, are used as case studies to test the accuracy of the LDA model in extracting incident-related tweets and labeling them by incident type. The results showed that the model could extract emergency events and classify them for both small- and large-scale events, and that the model's hyperparameters can be shared across similar language environments to save training time. Furthermore, the list of keywords generated by the model can be used as prior knowledge for emergency event classification and for training supervised classification models such as support vector machines and recurrent neural networks.
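
The abstract does not say how the LDA model was fit. Collapsed Gibbs sampling is one standard inference method for LDA; the sketch below applies it to a hypothetical handful of pre-tokenized tweets, with the Dirichlet priors `alpha` and `beta`, the iteration count, and the seed as arbitrary assumptions. The highest-count words per topic are the kind of keyword list the abstract describes reusing as prior knowledge.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens of topic k in doc d
    nkw = [[0] * V for _ in range(n_topics)]  # times word v is assigned topic k
    nk = [0] * n_topics                       # total tokens assigned topic k
    z = []                                    # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=3):
    """Highest-count words per topic: a candidate keyword list for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


# Hypothetical, already tokenized tweets (stop words removed).
tweets = [
    ["explosion", "chelsea", "police", "street"],
    ["explosion", "police", "injured", "street"],
    ["flood", "power", "outage", "hurricane"],
    ["hurricane", "flood", "water", "power"],
    ["police", "explosion", "chelsea", "injured"],
    ["power", "outage", "flood", "water"],
]

doc_topic, topic_word, vocab = lda_gibbs(tweets, n_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

On six toy tweets the topics need not separate cleanly; real use requires far more documents, and libraries such as gensim provide tuned LDA implementations.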

Study of Automotive Brands Popularity in Indonesia Using Twitter Data

Journal of Applied Information, Communication and Technology, Vol 3 (1), pp. 23-33, 2016. DOI: 10.33555/ejaict.v3i1.91

Author(s): Stevent Efendi, Alva Erwin, Kho I Eng

Keyword(s): Social Media; Social Network; Sentiment Analysis; Automotive Industry; Real World; The Internet; Brand Preference; Twitter Data; Wide Range; Widespread Phenomenon

Social media has become a widespread phenomenon in recent years. People share many of their thoughts on social media, and the data they post online can be used for research. As one of the fastest-growing social networks, Twitter is a particularly popular subject of study because it allows researchers to access its data. This research examines the correlation between Twitter chatter about a brand and the brand's sales in Indonesia. Factors such as sentiment and tweet rate are expected to predict a brand's popularity. As one of the largest industries in Indonesia, the automotive industry is an interesting subject of study: a wide range of people buy vehicles, and some even form communities based on their car or motorcycle brand preference. The sentiment analysis and tweet-rate results from Twitter will be compared with real-world sales figures published by GAIKINDO and AISI.
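
The abstract proposes comparing tweet rate and sentiment with sales but does not say which statistic is used. Assuming a Pearson correlation between per-period tweet counts and unit sales (a common choice, not confirmed by the source), the computation looks like this; the monthly figures are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical monthly figures for one brand (not data from the study).
tweet_rate = [1200, 1500, 900, 2000, 1700]        # tweets mentioning the brand
units_sold = [30500, 34000, 27000, 41000, 36500]  # vehicles sold that month

print(round(pearson(tweet_rate, units_sold), 3))
```

A coefficient near +1 would mean that months with more chatter also had higher sales; correlation alone does not show that tweets drive sales.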

Classification Connection of Twitter Data using K-Means Clustering

International Journal of Innovative Technology and Exploring Engineering (Special Issue), Vol 8 (6S4), pp. 14-22, 2019. DOI: 10.35940/ijitee.f1004.0486s419

Keyword(s): Social Media; Real Time; Clustering Methods; Time Analysis; Data Set; Dynamic Data; Real Time Analysis; Twitter Data; Social Media Platforms; Insight Into

The rise of social media platforms such as Twitter, and their growing adoption by people who want to stay connected, provide a large source of data for analyzing trends, events, and even personalities. Such analysis also offers real-time insight into a person's likes and inclinations, regardless of data size. Several techniques have been developed to retrieve such data; however, the most efficient technique is clustering. This paper provides an overview of the algorithms behind various clustering methods and examines their efficiency in identifying trending information. The clustered data may be further classified by topic for real-time analysis of a large, dynamic data set. In this paper, data classification is performed and analyzed for flaws, followed by a second classification on the same data set.
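
The title names K-means, but the abstract does not give the feature representation. The sketch below is plain Lloyd's algorithm on hypothetical 2-D points; for tweets, each point would instead be a numeric vector derived from the text (for example TF-IDF weights), which is an assumption here. It uses `math.dist`, available from Python 3.8.

```python
import random
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm for k-means on a list of equal-length tuples."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen data points
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    # Final assignment against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical feature vectors: two well-separated groups of tweets.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

K-means only finds a local optimum that depends on initialization; production code usually runs several restarts or uses k-means++ seeding.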

Holy Cow in India: A Political Discourse and Social Media Analysis for Restorative Justice

Trames Journal of the Humanities and Social Sciences, Vol 25 (2), p. 219, 2021. DOI: 10.3176/tr.2021.2.04

Author(s): M Akram, A Nasar, M R Safdar

Keyword(s): Social Media; Restorative Justice; Political Discourse; Media Analysis; Social Media Analysis

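Perang Tagar Di Ruang Virtual Diskursus Politik Capres Pasca Debat Putaran Kedua [Hashtag War in Virtual Space: The Political Discourse of Presidential Candidates after the Second Round of Debates]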

Jurnal Komunikasi, Vol 12 (1), p. 30, 2020. DOI: 10.24912/jk.v12i1.5622

Author(s): Ana Fitriana P, Ema Ema, Fardiah Oktariani Lubis

Keyword(s): Social Media; Critical Discourse Analysis; Presidential Election; Political Discourse; Virtual Space; The Political; Virtual Communication; Presidential Candidates; Subject Pronouns; Source Of Information

This study aims to uncover the political discourse surrounding the presidential candidates, Jokowi vs. Prabowo, in virtual space after the second round of debates. It is motivated by the landscape of political discourse around the 2019 presidential election debates in virtual space, which gave rise to various responses and sentiments among the two camps of supporters. After the presidential debate, the hashtag war between #BohongLagiJokowi and #02GagapUnicorn on Twitter became the main topic of discussion. The aim is to uncover the power, ideology, and interests behind the presidential political discourse through Fairclough's critical discourse analysis. The study uses qualitative methods to unpack the research problem, with critical thinking as its foundation. The results show that tweets from netizens supporting Prabowo sought to belittle their target through language: they used substitute terms for the subject, such as the name “Mukidi”, to diminish the subject, along with the hashtags #DeletJokowi, #UninstallJokowi, and #BohongLagiJokowi as symbols of virtual communication. In contrast, tweets from Jokowi's supporters built their sentiment around the word “gagap” (stuttering), mocking a perceived failure to understand the millennial (e-commerce) business. The hashtag #02GagapUnicorn served as a virtual symbol for organizing the text, while at the text-production (meso) level both camps created virtual symbols through hashtags so that they would become trending topics on Twitter. At the situational (macro) level, the discourse is shaped by the post-truth phenomenon, in which vague information from unclear sources steers opinion toward the character assassination of particular figures. The authors recommend using social media wisely: understanding and carefully checking sources of information, not being swayed by particular metaphors, and, at the text-production stage, paying attention to the effects on the social psychology of each camp's supporters.

What Your Tweets Tell Us About You: Identity, Ownership and Privacy of Twitter Data

International Journal of Digital Curation, Vol 7 (1), pp. 174-197, 2012. DOI: 10.2218/ijdc.v7i1.224

Author(s): Heather Small, Kristine Kasianovitz, Ronald Blanford, Ina Celaya

Keyword(s): Social Media; Social Networking Sites; Data Sets; Data Set; Social Media Data; Twitter Data; Other Information; Rich Data; Additional Value; Media Data

Social networking sites and other social media have enabled new forms of collaborative communication and participation for users, and created additional value as rich data sets for research. Research based on accessing, mining, and analyzing social media data has risen steadily over the last several years and is increasingly multidisciplinary; researchers from the social sciences, humanities, computer science and other domains have used social media data as the basis of their studies. The broad use of this form of data has implications for how curators address preservation, access and reuse for an audience with divergent disciplinary norms related to privacy, ownership, authenticity and reliability.

In this paper, we explore how the characteristics of the Twitter platform, coupled with an ambiguous and evolving understanding of privacy in networked communication, and divergent disciplinary understandings of the resulting data, combine to create complex issues for curators trying to ensure broad-based and ethical reuse of Twitter data. We provide a case study of a specific data set to illustrate how data curators can engage with the topics and questions raised in the paper. While some initial suggestions are offered to librarians and other information professionals who are beginning to receive social media data from researchers, our larger goal is to stimulate discussion and prompt additional research on the curation and preservation of social media data.
