Archival Data Thinking and Practices for Social Media Collections

2021 ◽  
Author(s):  
Lizhou Fan

In the Web 2.0 era, most social media archives are born digital and large-scale. With an increasing need to process them quickly, researchers and archivists have begun applying data science methods to the management of social media data collections. However, many current computational or data-driven archival processing methods lack critical background understanding, such as why computational methods are needed and how data-driven applications should be evaluated and improved. As a result, many computational archival science (CAS) attempts are insufficiently holistic, with comparatively narrow scopes and low efficiency. In this talk, we first introduce the proposed concept of “Archival Data Thinking,” which highlights the comprehensiveness desirable in mapping data science mindsets onto archival practices. Next, we examine several examples of implementing “Archival Data Thinking” in processing two social media collections: (i) the COVID-19 Hate Speech Twitter Archive (CHSTA) and (ii) the Counter-anti-Asian Hate Twitter Archive (CAAHTA), each of which contains millions of records and their metadata and requires rapid processing. Finally, as a future research direction, we briefly discuss the standards and infrastructures that could better support the implementation of “Archival Data Thinking.”
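The kind of data-driven archival processing the talk describes can be illustrated with a minimal sketch: appraising records against a criterion and profiling the collection's volume over time. The record fields and selection rule below are hypothetical, not taken from CHSTA or CAAHTA.

```python
from collections import Counter

# Hypothetical miniature of a tweet archive: each record carries the text
# plus metadata fields an archivist would appraise (capture date, language).
records = [
    {"id": 1, "created_at": "2020-03-01", "lang": "en", "text": "stay safe"},
    {"id": 2, "created_at": "2020-03-01", "lang": "en", "text": "example post"},
    {"id": 3, "created_at": "2020-03-02", "lang": "es", "text": "hola"},
]

def appraise(records, lang="en"):
    """Select records matching an appraisal criterion (here: language)."""
    return [r for r in records if r["lang"] == lang]

def profile_by_day(records):
    """Summarize the collection: record counts per capture day."""
    return Counter(r["created_at"] for r in records)

selected = appraise(records)
print(len(selected))            # records kept after appraisal
print(profile_by_day(records))  # per-day volume profile of the archive
```

At archive scale the same two steps (filter on metadata, aggregate by time) would run over millions of rows, which is where the data science tooling the talk advocates becomes necessary.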

2017 ◽  
Vol 5 (1) ◽  
pp. 70-82
Author(s):  
Soumi Paul ◽  
Paola Peretti ◽  
Saroj Kumar Datta

Building customer relationships and customer equity is a prime concern in today’s business decisions. The emergence of the internet, and especially of social media such as Facebook and Twitter, has changed traditional marketing thought to a great extent. The importance of customer orientation is reflected in the axiom, “The customer is king.” A good number of organizations engage customers in their new product development activities via social media platforms. Co-creation, a new perspective in which customers are active co-creators of the products they buy and use, is currently challenging the traditional paradigm. Co-creation draws on the customer’s knowledge, creativity, and judgment to generate value, and is considered not only an emerging trend for introducing new products and services but also a way of fitting them to customer needs and increasing value for money. Knowledge and innovation are inseparable: knowledge management competencies and capacities are essential to any organization that aspires to be distinctive and innovative. The present work attempts to identify the change in the value creation procedure, along with one area of business where co-creation can return significant dividends: extending the brand or brand category through brand extension or line extension. Through an in-depth literature review, this article identifies the changes in every perspective of this paradigm shift and presents a conceptual model of company–customer–brand co-creation activity via social media. The main objective is to offer an agenda for future research on this emerging trend and to chart the way from theory to practice. The paper acts as a proposal, allowing an organization to pursue this change at large scale and obtain early feedback on the idea presented.


2020 ◽  
Vol 8 (1) ◽  
pp. 89-119
Author(s):  
Nathalie Vissers ◽  
Pieter Moors ◽  
Dominique Genin ◽  
Johan Wagemans

Artistic photography is an interesting, but often overlooked, medium within the field of empirical aesthetics. Grounded in an art–science collaboration with art photographer Dominique Genin, this project focused on the relationship between the complexity of a photograph and its aesthetic appeal (beauty, pleasantness, interest). An artistic series of 24 semi-abstract photographs that play with multiple layers, recognisability vs. unrecognisability, and complexity was specifically created and selected for the project. A large-scale online study with a broad range of individuals (n = 453, varying in age, gender, and art expertise) was set up. Exploratory data-driven analyses revealed two clusters of individuals who responded differently to the photographs. Despite the semi-abstract nature of the photographs, differences seemed to be driven more consistently by the ‘content’ of a photograph than by its complexity level. No consistent differences were found between clusters in age, gender, or art expertise. Together, these results highlight the importance of exploratory, data-driven work in empirical aesthetics to complement and nuance findings from hypothesis-driven studies, as it allows researchers to go beyond a priori assumptions, to explore underlying clusters of participants with different response patterns, and to point towards new avenues for future research. Data and code for the analyses reported in this article can be found at https://osf.io/2fws6/.
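The exploratory step of finding clusters of participants with different response patterns can be sketched with a plain k-means over a ratings matrix. The ratings below are invented (six participants, four photographs, a 1-to-7 appeal scale), and the deterministic initialization is a simplifying assumption, not the study's actual analysis pipeline.

```python
import numpy as np

# Hypothetical ratings: 6 participants x 4 photographs, on a 1-7 appeal scale.
# Two response patterns are built in: rows 0-2 prefer the first two photos,
# rows 3-5 prefer the last two.
ratings = np.array([
    [7, 6, 2, 1],
    [6, 7, 1, 2],
    [7, 7, 2, 2],
    [1, 2, 6, 7],
    [2, 1, 7, 6],
    [2, 2, 7, 7],
], dtype=float)

def kmeans(X, k=2, iters=20):
    """Plain k-means with deterministic init (evenly spaced rows)."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centroids = X[idx].copy()
    for _ in range(iters):
        # Assign each participant to the nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids as cluster means.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(ratings)
print(labels)  # participants grouped by response pattern: [0 0 0 1 1 1]
```

Inspecting which photographs drive each cluster's centroid is then the step that separates a 'content' explanation from a complexity explanation.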


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7115
Author(s):  
Amin Muhammad Sadiq ◽  
Huynsik Ahn ◽  
Young Bok Choi

The rapid growth of social networks and the propensity of users to communicate their physical activities, thoughts, expressions, and viewpoints in text, visual, and audio material have opened up new possibilities and opportunities in sentiment and activity analysis. Although sentiment and activity analysis of text streams has been studied extensively in the literature, it remains relatively recent and challenging to evaluate sentiment and physical activity together from visuals such as photographs and videos. This paper emphasizes human sentiment in a socially crucial field, namely social media disaster/catastrophe analysis, with associated physical activity analysis. We propose a multi-tag sentiment and activity analyzer fused with a deep human count tracker: a pragmatic technique for tracking and counting multiple people in occluded circumstances, with a reduced number of identity switches, in disaster-related videos and images. A crowdsourcing study was conducted to analyze and annotate human activity and sentiment towards natural disasters and related images in social networks. The study resulted in a large-scale benchmark dataset with three annotation sets, each addressing a distinct task. The presented analysis and dataset will serve as a baseline for future research in the domain. We believe the proposed system will contribute to more viable communities by benefiting different stakeholders, such as news broadcasters, emergency relief organizations, and the general public.
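The tracking-and-counting idea behind a human count tracker can be sketched with greedy IoU matching across frames, the classical building block such trackers refine with deep appearance features. The detections below are hand-made boxes, not network output, and the threshold is an illustrative assumption.

```python
# Minimal IoU-based multi-object tracker sketch: match each detection to the
# overlapping track from the previous frame, or start a new identity.

def iou(a, b):
    """Intersection-over-union of boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(frames, thresh=0.3):
    """Greedy frame-to-frame matching; unmatched detections get new IDs."""
    tracks, next_id, counts = {}, 0, []
    for dets in frames:
        new_tracks = {}
        for d in dets:
            match = max(tracks, key=lambda t: iou(tracks[t], d), default=None)
            if match is not None and iou(tracks[match], d) >= thresh:
                new_tracks[match] = d   # same person, same identity
                del tracks[match]
            else:
                new_tracks[next_id] = d  # a new person enters the scene
                next_id += 1
        tracks = new_tracks
        counts.append(len(tracks))
    return next_id, counts  # total unique people, per-frame count

frames = [
    [(0, 0, 10, 20)],                   # one person appears
    [(1, 0, 11, 20), (50, 0, 60, 20)],  # they move; a second person appears
    [(2, 0, 12, 20), (51, 0, 61, 20)],  # both persist (no identity switch)
]
total, per_frame = track(frames)
print(total, per_frame)  # 2 [1, 2, 2]
```

Fewer identity switches in occlusion is exactly what the deep variant buys: appearance features let a track survive frames where pure box overlap would fail.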


Author(s):  
Samuel C. Woolley ◽  
Philip N. Howard

Computational propaganda is an emergent form of political manipulation that occurs over the Internet. The term describes the assemblage of social media platforms, autonomous agents, algorithms, and big data tasked with manipulating public opinion. Our research shows that this new mode of interrupting and influencing communication is on the rise around the globe. Advances in computing technology, especially around social automation, machine learning, and artificial intelligence, mean that computational propaganda is becoming more sophisticated and harder to track. This introduction explores the foundations of computational propaganda. It describes the key role of automated manipulation of algorithms in recent efforts to control political communication worldwide. We discuss the social data science of political communication and build upon the argument that algorithms and other computational tools now play an important political role in news consumption, issue awareness, and cultural understanding. We unpack key findings of the nine country case studies that follow—exploring the role of computational propaganda during events from local and national elections in Brazil to the ongoing security crisis between Ukraine and Russia. Our methodology in this work has been purposefully mixed, using quantitative analysis of data from several social media platforms and qualitative work that includes interviews with the people who design and deploy political bots and disinformation campaigns. Finally, we highlight original evidence about how this manipulation and amplification of disinformation is produced, managed, and circulated by political operatives and governments, and describe paths for both democratic intervention and future research in this space.


Author(s):  
Yonghong Tong ◽  
Muhammet Bakan

With the increasing use of mobile devices and social media, large amounts of continuous information about human behavior are available. Data visualization provides an insightful presentation of large-scale social media datasets. This paper focuses on the development of a mobile-device-based visualization and analysis platform for social media data, aimed at retrieving and visualizing visitor information for a specific region. The platform allows users to view the “big picture” of visitors’ location information. The results show that the developed platform (1) performs satisfactory data collection and data visualization on a mobile device, (2) helps users understand the variety of human behaviors while visiting a place, and (3) plays a feasible role in surfacing immediate information from social media and informing policy-making in related sectors and areas. Future research opportunities and challenges for social media data visualization are discussed.
Keywords: social media, data visualization, mobile device
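The aggregation step behind such a "big picture" view can be sketched by binning geotagged posts into a coarse grid, so each cell can be shaded by visitor density on a map. The coordinates and cell size below are invented for illustration and are not the paper's actual data or parameters.

```python
import math
from collections import Counter

# Hypothetical geotagged posts from visitors to a region (latitude, longitude).
posts = [
    (43.11, -79.06), (43.12, -79.07), (43.13, -79.08),  # clustered visitors
    (43.21, -79.21),                                    # a lone outlier
]

def grid_counts(points, cell=0.05):
    """Bin coordinates into a coarse grid so a map view can shade each
    cell by visitor density (the 'big picture' of visitor locations)."""
    return Counter(
        (math.floor(lat / cell), math.floor(lon / cell)) for lat, lon in points
    )

heat = grid_counts(posts)
hottest_cell, visitors = heat.most_common(1)[0]
print(hottest_cell, visitors)  # densest grid cell and its visitor count
```

On a mobile device, pre-aggregating to cells like this is also what keeps the rendering workload bounded regardless of how many raw posts are collected.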


Author(s):  
Emad Badawi ◽  
Guy-Vincent Jourdan ◽  
Gregor Bochmann ◽  
Iosif-Viorel Onut

The “Game Hack” Scam (GHS) is a mostly unreported cyberattack in which attackers attempt to convince victims that they will be provided with free, unlimited “resources” or other advantages in their favorite game. The scammers’ endgame ranges from monetizing the victims’ time and resources by having them click through endless “surveys” and fill out “market research” forms, to collecting personal information, getting victims to subscribe to questionable services, and even installing questionable executable files on their machines. Other scams, such as the “Technical Support Scam,” the “Survey Scam,” and the “Romance Scam,” have been analyzed before, but to the best of our knowledge GHS has not been well studied so far and is indeed mostly unknown. In this paper, we investigate and gain more knowledge of this type of scam by following a data-driven approach: we formulated GHS-related search queries and used multiple search engines to collect data about the websites to which GHS victims are directed when they search online for various game hacks and tricks. We analyze the collected data to provide new insight into GHS and to measure the extent of the scam. We show that despite its low profile, the click traffic generated by the scam is in the hundreds of millions. We also show that GHS attackers use social media, streaming sites, blogs, and even unrelated sites such as change.org or jeuxvideo.com to carry out their attacks and reach a large number of victims. Our data collection spans a year; in that time, we uncovered 65,905 different GHS URLs, mapped onto over 5,900 unique domains. We were able to link attacks to attackers and found that they routinely target a vast array of games. Furthermore, we find that GHS instances are on the rise, as is the number of victims. Our low-end estimate is that these attacks have been clicked at least 150 million times in the last five years.
Finally, in keeping with similar large-scale scam studies, we find that current public blacklists are inadequate and suggest that our method is more effective at detecting these attacks.
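The step of mapping collected scam URLs onto unique domains, and so linking attack pages to attackers, can be sketched with standard URL parsing. The URLs below are hypothetical placeholders, not actual GHS pages from the study.

```python
from urllib.parse import urlparse
from collections import Counter

# Hypothetical URLs of the kind such a crawler collects; the study mapped
# 65,905 GHS URLs onto over 5,900 unique domains with this kind of step.
urls = [
    "http://free-gems.example.com/hack?game=xyz",
    "https://free-gems.example.com/unlimited-coins",
    "https://another-scam.example.org/resources-generator",
]

def unique_domains(urls):
    """Map collected scam URLs onto their host names; many URLs per host
    is a hint that one operator runs many attack pages."""
    return Counter(urlparse(u).netloc for u in urls)

domains = unique_domains(urls)
print(len(domains))  # distinct domains behind the collected URLs
```

Grouping by domain (or by registration details on top of it) is what allows attacks to be attributed to a common operator across many games.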


Data ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 87 ◽  
Author(s):  
Viktoriia Shubina ◽  
Sylvia Holcer ◽  
Michael Gould ◽  
Elena Simona Lohan

Some recent developments in data science for worldwide disease control have involved research on the large-scale feasibility and usefulness of digital contact tracing, user location tracking, and proximity detection on users’ mobile devices or wearables. A centralized solution, relying on collecting and storing user traces and location information on a central server, can provide more accurate and timely actions than a decentralized solution in combating viral outbreaks such as COVID-19. However, centralized solutions are more prone to privacy breaches and privacy attacks by malevolent third parties than decentralized solutions, which store the information in a distributed manner among wireless networks. Thus, it is of timely relevance to identify and summarize the existing privacy-preserving solutions, focusing on decentralized methods, and to analyze them in the context of mobile-device-based localization and tracking, contact tracing, and proximity detection. Wearables and other mobile Internet of Things devices are of particular interest in our study, as not only privacy but also energy-efficiency targets are becoming more and more critical to end-users. This paper provides a comprehensive survey of user location-tracking, proximity-detection, and digital contact-tracing solutions in the literature from the past two decades, analyses their advantages and drawbacks with respect to centralized and decentralized solutions, and presents the authors’ thoughts on future research directions in this timely research field.
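The decentralized pattern the survey focuses on can be sketched in a few lines: each device derives short-lived ephemeral IDs from a local secret, broadcasts only those IDs, and matching against infected users' published keys happens on-device. This is a simplified sketch in the spirit of DP-3T-like designs; the key sizes, epoch count, and labels are simplifying assumptions, not any standard's parameters.

```python
import hashlib
import hmac

def ephemeral_id(day_key: bytes, epoch: int) -> bytes:
    """Derive the broadcast ID for one epoch from a device-local day key;
    the server never sees the key unless its owner chooses to publish it."""
    return hmac.new(day_key, f"epoch-{epoch}".encode(), hashlib.sha256).digest()[:16]

def seen_infected(observed_ids, published_day_keys, epochs_per_day=96):
    """On-device matching: re-derive IDs from published keys and intersect
    them with the IDs this device overheard via short-range radio."""
    derived = {
        ephemeral_id(k, e) for k in published_day_keys for e in range(epochs_per_day)
    }
    return bool(derived & set(observed_ids))

alice_key = hashlib.sha256(b"alice-local-secret").digest()
broadcast = ephemeral_id(alice_key, epoch=42)  # what Bob's phone overhears
assert broadcast != alice_key                  # the key never leaves the device

# Later: Alice tests positive and publishes her day key; Bob matches locally.
print(seen_infected({broadcast}, [alice_key]))  # True
```

The privacy/energy trade-off the survey highlights shows up even here: matching cost grows with the number of published keys times epochs per day, which is why epoch length and key rotation matter for wearables.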


2020 ◽  
Vol 117 (19) ◽  
pp. 10165-10171 ◽  
Author(s):  
Kokil Jaidka ◽  
Salvatore Giorgi ◽  
H. Andrew Schwartz ◽  
Margaret L. Kern ◽  
Lyle H. Ungar ◽  
...  

Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
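The word-level scoring method, and why removing a few of the most frequent words helps, can be illustrated with a toy dictionary estimate. The lexicon weights and tweets below are invented, not actual LIWC or LabMT entries; the point is that a regionally frequent word like "lol" can dominate the average and mask the signal.

```python
from collections import Counter

# Invented word weights standing in for a sentiment lexicon.
lexicon = {"happy": 1.0, "great": 0.8, "sad": -1.0, "lol": 0.2}

county_tweets = {
    "A": ["lol lol lol happy great"],  # positive county, heavy slang use
    "B": ["lol lol lol sad"],          # negative county, same slang
}

def score(tweets, drop_top_n=0):
    """Mean lexicon weight over scored words, optionally dropping the
    corpus-wide most frequent words (a regional-language confound)."""
    freq = Counter(w for t in tweets for w in t.split())
    dropped = {w for w, _ in freq.most_common(drop_top_n)}
    weights = [lexicon[w] for t in tweets for w in t.split()
               if w in lexicon and w not in dropped]
    return sum(weights) / len(weights) if weights else 0.0

for county, tweets in county_tweets.items():
    print(county, round(score(tweets), 2), round(score(tweets, drop_top_n=1), 2))
```

With the frequent word kept, both counties' scores are pulled toward the slang word's weight; dropping it widens the gap between them, a miniature of the paper's finding that removing as few as three frequent words notably improved prediction.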


2017 ◽  
Vol 28 (6) ◽  
pp. 949-972 ◽  
Author(s):  
FRANCISCO BERNAL ◽  
GONÇALO DOS REIS ◽  
GREIG SMITH

The numerical solution of large-scale PDEs, such as those occurring in data-driven applications, unavoidably requires powerful parallel computers and tailored parallel algorithms to make the best possible use of them. In fact, considerations about the parallelization and scalability of realistic problems are often critical enough to warrant acknowledgement in the modelling phase. The purpose of this paper is to spread awareness of the Probabilistic Domain Decomposition (PDD) method, a fresh approach to the parallelization of PDEs with excellent scalability properties. The idea exploits the stochastic representation of the PDE and its approximation via Monte Carlo in combination with deterministic high-performance PDE solvers. We describe the ingredients of PDD and its applicability in the scope of data science. In particular, we highlight recent advances in stochastic representations for non-linear PDEs using branching diffusions, which have significantly broadened the scope of PDD. We envision this work as a dictionary giving large-scale PDE practitioners references on the very latest algorithms and techniques of a non-standard, yet highly parallelizable, methodology at the interface of deterministic and probabilistic numerical methods. We close this work with an invitation to the fully non-linear case and open research questions.

