Geographies of Twitter debates

Author(s):  
Emiliano del Gobbo ◽  
Lara Fontanella ◽  
Sara Fontanella ◽  
Annalina Sarra

Abstract Over recent years, the prodigious success of online social media sites has marked a shift in the way people connect and share information. Coincident with this trend is the proliferation of location-aware devices and the consequent emergence of user-generated geospatial data. From a social scientific perspective, these location data are of considerable value, as they can be mined to provide researchers with useful information about activities and opinions across time and space. However, the utilization of geo-located data is a challenging task, both in terms of data management and in terms of knowledge production, and it requires a holistic approach. In this paper, we implement an integrated knowledge-discovery-in-cyberspace framework for retrieving, processing and interpreting Twitter geolocated data, aimed at discovering and classifying the latent opinions in user-generated online debates. Text mining techniques, supervised machine learning algorithms and a spatial cluster detection technique are the building blocks of our research framework. As a real-world example, we focus on Twitter conversations about Brexit posted in the UK during the 13 months before Brexit day. The experimental results, based on various analyses of Brexit-related tweets, demonstrate that different spatial patterns can be identified, clearly distinguishing pro- and anti-Brexit enclaves and delineating interesting Brexit geographies.
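
The paper's own code is not reproduced here. As a rough, purely illustrative sketch of the supervised-classification building block, the following Python example (assuming scikit-learn, with invented tweet texts, labels and coordinates) trains a bag-of-words stance classifier and attaches predictions to geolocated tweets; the labelled output could then feed a spatial cluster-detection step.

```python
# Minimal sketch (not the authors' code): supervised stance classification of
# geolocated tweets with scikit-learn. Texts, labels and coordinates are
# hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labelled training tweets (stance: 1 = pro-Brexit, 0 = anti-Brexit).
train_texts = [
    "Time to take back control, leave means leave",
    "Brexit will wreck the economy, we need a second referendum",
]
train_labels = [1, 0]

# Unlabelled geolocated tweets to classify (lat/lon kept for later mapping).
new_tweets = [
    {"text": "Proud to vote leave again tomorrow", "lat": 53.48, "lon": -2.24},
    {"text": "Another march against Brexit this weekend", "lat": 51.51, "lon": -0.13},
]

# Bag-of-words pipeline: TF-IDF features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                      LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

for tweet in new_tweets:
    tweet["stance"] = int(model.predict([tweet["text"]])[0])
    print(tweet["lat"], tweet["lon"], tweet["stance"])
# The predicted stances, joined with coordinates, could then be passed to a
# spatial cluster-detection method to map pro- and anti-Brexit enclaves.
```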

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
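
As a hedged illustration of the entropy idea only (the authors' graphical model and experimental platform are not reproduced), the sketch below computes the Shannon entropy of each media item's user-interaction distribution and splits items into two groups without any labels; all interaction counts are invented.

```python
# Illustrative sketch: Shannon entropy of user-media interaction distributions
# plus an unsupervised two-way split. All counts below are invented.
import numpy as np
from scipy.stats import entropy
from sklearn.cluster import KMeans

# Hypothetical user-media interaction counts: rows = media items,
# columns = users, values = number of like/share interactions.
interactions = np.array([
    [12, 0, 0, 1, 0, 0],   # likes concentrated on a few users (low entropy)
    [3, 2, 4, 3, 2, 3],    # likes spread across many users (high entropy)
    [10, 1, 0, 0, 2, 0],
    [2, 3, 2, 4, 3, 2],
])

# Shannon entropy of each media item's normalised interaction distribution.
probs = interactions / interactions.sum(axis=1, keepdims=True)
media_entropy = np.array([entropy(p) for p in probs])

# Unsupervised split of media items by interaction entropy; which cluster
# corresponds to "fake" would have to be interpreted post hoc.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    media_entropy.reshape(-1, 1))
print(media_entropy.round(3), labels)
```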


2020 ◽  
Vol 29 (1) ◽  
pp. 19-42 ◽  
Author(s):  
Pablo Barberá ◽  
Amber E. Boydstun ◽  
Suzanna Linn ◽  
Ryan McMahon ◽  
Jonathan Nagler

Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
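
To make the dictionary-versus-supervised-learning comparison concrete, here is a small, purely illustrative Python sketch (not the authors' replication code); the news segments, tone codings and word lists are invented.

```python
# Toy contrast between a dictionary tone score and a supervised classifier on
# invented economic-news segments.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

segments = [
    "unemployment fell sharply as hiring surged",
    "markets tumbled amid fears of recession",
    "wages grew faster than expected this quarter",
    "factory orders collapsed and layoffs mounted",
]
tone = [1, 0, 1, 0]  # 1 = positive economic tone, 0 = negative (human codings)

# Dictionary approach: count hits against small positive/negative word lists.
positive = {"surged", "grew", "faster"}
negative = {"tumbled", "fears", "collapsed", "layoffs"}

def dictionary_tone(text):
    words = set(text.split())
    return int(len(words & positive) >= len(words & negative))

# Supervised approach: bag-of-words Naive Bayes trained on the coded segments.
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(segments), tone)

test = ["consumer confidence surged while layoffs fell"]
print("dictionary tone:", dictionary_tone(test[0]))
print("supervised tone:", clf.predict(vec.transform(test))[0])
```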


2020 ◽  
pp. 1-26
Author(s):  
Joshua Eykens ◽  
Raf Guns ◽  
Tim C.E. Engels

We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier-chain model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data and can therefore be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.
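
As a sketch of the modelling setup under stated assumptions (classifier chains over text features; not the authors' code, data or feature set), the following example uses scikit-learn's ClassifierChain with Gradient Boosting on TF-IDF features of invented abstracts and a toy label matrix.

```python
# Toy multi-label classifier-chain example; abstracts and labels are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import ClassifierChain

abstracts = [
    "survey of household income inequality and labour markets",
    "electoral behaviour and party competition in parliamentary systems",
    "ethnographic study of urban migrant communities",
    "monetary policy, inflation expectations and voting",
    "social network analysis of neighbourhood ties",
    "public opinion on taxation and welfare spending",
]
# Multi-label indicator matrix: columns = [economics, political science, sociology].
Y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
])

X = TfidfVectorizer().fit_transform(abstracts)

# Classifier chain: each label's classifier also sees the preceding labels'
# predictions, so any number of categories can be assigned to a document.
chain = ClassifierChain(GradientBoostingClassifier(n_estimators=50), random_state=0)
chain.fit(X, Y)
print(chain.predict(X))
```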


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 778
Author(s):  
Nitsa J. Herzog ◽  
George D. Magoulas

Early identification of degenerative processes in the human brain is considered essential for providing proper care and treatment. This may involve detecting structural and functional cerebral changes, such as changes in the degree of asymmetry between the left and right hemispheres. Such changes can be detected by computational algorithms, used for the early diagnosis of dementia and its stages (amnestic early mild cognitive impairment (EMCI), Alzheimer’s Disease (AD)), and can help to monitor the progress of the disease. In this vein, the paper proposes a data processing pipeline that can be implemented on commodity hardware. It uses features of brain asymmetry, extracted from MRI scans in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, for the analysis of structural changes and machine learning classification of the pathology. The experiments provide promising results, distinguishing between subjects with normal cognition (NC) and patients with early or progressive dementia. The supervised machine learning algorithms and convolutional neural networks tested reach accuracies of 92.5% and 75.0% for NC vs. EMCI, and 93.0% and 90.5% for NC vs. AD, respectively. The proposed pipeline offers a promising low-cost alternative for the classification of dementia and is potentially useful for other degenerative brain disorders that are accompanied by changes in brain asymmetry.
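
The asymmetry features used in the paper are not reproduced here. As an illustration of the general approach, the sketch below computes a common left-right asymmetry index, (L - R) / (L + R), from hypothetical paired regional volumes and cross-validates an SVM classifier on it.

```python
# Illustrative sketch only: asymmetry index features plus an SVM classifier.
# Regional volumes and diagnostic labels are synthetic, not ADNI data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical left/right regional volumes for 40 subjects and 4 brain regions.
left = rng.normal(3.0, 0.3, size=(40, 4))
right = left + rng.normal(0.0, 0.1, size=(40, 4))
y = np.array([0] * 20 + [1] * 20)   # 0 = normal cognition, 1 = EMCI
right[y == 1] += 0.15               # exaggerate asymmetry for the toy patients

# Asymmetry index per region: positive when the left side is larger.
asymmetry = (left - right) / (left + right)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, asymmetry, y, cv=5).mean())
```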


2020 ◽  
Author(s):  
Eoin Carley

Solar flares are often associated with high-intensity radio emission known as 'solar radio bursts' (SRBs). SRBs are generally observed in dynamic spectra and have five major spectral classes, labelled type I to type V depending on their shape and extent in frequency and time. Due to their morphological complexity, a challenge in solar radio physics is the automatic detection and classification of such radio bursts. Classification of SRBs has become necessary in recent years due to the large data rates (3 Gb/s) generated by advanced radio telescopes such as the Low Frequency Array (LOFAR). Here we test the ability of several supervised machine learning algorithms to automatically classify type II and type III solar radio bursts. We test the detection accuracy of support vector machines (SVM) and random forest (RF), as well as an implementation of transfer learning with the Inception and YOLO convolutional neural networks (CNNs). The training data were assembled from type II and III bursts observed by the Radio Solar Telescope Network (RSTN) from 1996 to 2018, supplemented by simulated type II and III radio bursts. The CNNs were the best performers, often exceeding 90% accuracy on the validation set, with YOLO additionally able to localise radio bursts within dynamic spectra. This shows that machine learning algorithms (in particular CNNs) are capable of SRB classification, and we conclude by discussing future plans for the implementation of a CNN in the LOFAR for Space Weather (LOFAR4SW) data-stream pipelines.
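
As a minimal, hedged sketch of the transfer-learning step (not the authors' pipeline; a pretrained ResNet-18 stands in for the Inception and YOLO networks they test), the following PyTorch example replaces the network head for two-class type II vs. type III classification and runs one training step on placeholder dynamic-spectrum images.

```python
# Transfer-learning sketch: freeze a pretrained ImageNet CNN and train a new
# two-class head. Input spectra and labels are random placeholders; downloading
# the pretrained weights requires internet access.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():        # freeze the pretrained feature extractor
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # new head: type II vs. type III

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Placeholder batch: 8 dynamic spectra rendered as 3x224x224 images with labels.
spectra = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(spectra), labels)
loss.backward()
optimizer.step()
print(float(loss))
```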

