Applied Webscraping in Market Research

Making Inferences Using Incidentally Collected Data

10.31235/osf.io/x7bfs ◽

2016 ◽

Author(s):

Jonathan Mellon

Keyword(s):

Social Media ◽

Big Data ◽

The Internet ◽

User Interactions ◽

Social Media Data ◽

Broad Definition ◽

User Data ◽

The Cost ◽

Media Data

This chapter discusses the use of large quantities of incidentally collected data (ICD) to make inferences about politics. This type of data is sometimes referred to as “big data” but I avoid this term because of its conflicting definitions (Monroe, 2012; Ward & Barker, 2013). ICD is data that was created or collected primarily for a purpose other than analysis. Within this broad definition, this chapter focuses particularly on data generated through user interactions with websites. While ICD has been around for at least half a century, the Internet greatly expanded the availability and reduced the cost of ICD. Examples of ICD include data on Internet searches, social media data, and user data from civic platforms. This chapter briefly explains some sources and uses of ICD and then discusses some of the potential issues of analysis and interpretation that arise when using ICD, including the different approaches to inference that researchers can use.

Accurate, Fast, But Not Always Cheap: Evaluating “Crowdcoding” as an Alternative Approach to Analyze Social Media Data

Journalism & Mass Communication Quarterly ◽

10.1177/1077699019891437 ◽

2019 ◽

Vol 97 (3) ◽

pp. 811-834 ◽

Cited By ~ 1

Author(s):

Lei Guo ◽

Kate Mays ◽

Sha Lai ◽

Mona Jalal ◽

Prakash Ishwar ◽

...

Keyword(s):

Social Media ◽

Presidential Election ◽

Mass Communication ◽

The Internet ◽

Communication Research ◽

Social Media Data ◽

Quantitative Content ◽

Alternative Approach ◽

Media Data ◽

Numerous People

Crowdcoding, a method that outsources “coding” tasks to numerous people on the internet, has emerged as a popular approach for annotating texts and visuals. However, the performance of this approach for analyzing social media data in the context of journalism and mass communication research has not been systematically assessed. This study evaluated the validity and efficiency of crowdcoding based on the analysis of 4,000 tweets about the 2016 U.S. presidential election. The results show that compared with the traditional quantitative content analysis, crowdcoding yielded comparably valid results and was superior in efficiency, but was more expensive under most circumstances.

Human Behavior Analysis Using Intelligent Big Data Analytics

Frontiers in Psychology ◽

10.3389/fpsyg.2021.686610 ◽

2021 ◽

Vol 12 ◽

Author(s):

Muhammad Usman Tariq ◽

Muhammad Babar ◽

Marc Poulin ◽

Akmal Saeed Khattak ◽

Mohammad Dahman Alshehri ◽

...

Keyword(s):

Social Media ◽

Big Data ◽

Human Behavior ◽

Data Analytics ◽

Data Science ◽

Big Data Analytics ◽

Apache Spark ◽

Social Media Data ◽

The Social ◽

Media Data

Intelligent big data analysis is an evolving pattern in the age of big data science and artificial intelligence (AI). Analysis of structured data has been very successful, but analyzing human behavior using social media data remains challenging. Social media data comes from vast, unstructured sources that can include likes, comments, tweets, shares, and views. Analyzing such data is a challenging task for companies, such as Dailymotion, that have billions of daily users and vast numbers of comments, likes, and views. Social media data is created in large amounts and at a tremendous pace, so a very high volume of data must be stored, sorted, processed, and carefully studied before decisions can be made. This article proposes an architecture that uses a big data analytics mechanism to process huge social media datasets efficiently and logically. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate Apache Spark's parallel processing and distributed framework technologies together with other storage and processing mechanisms. Social media data generated from Dailymotion is used in this article to demonstrate the benefits of this architecture. The project used Dailymotion's application programming interface (API), which provides functions for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. The Hive storage mechanism is used with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.
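The parallel-processing idea in this abstract can be illustrated with a minimal map-reduce sketch. It uses only the Python standard library rather than Apache Spark or Hive, and the record fields (`channel`, `likes`, `comments`, `views`) and the sample values are hypothetical; they are not the Dailymotion API schema or the authors' three-layer architecture.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records; field names are illustrative, not a real API schema.
RECORDS = [
    {"channel": "news", "likes": 10, "comments": 2, "views": 100},
    {"channel": "music", "likes": 5, "comments": 1, "views": 80},
    {"channel": "news", "likes": 7, "comments": 3, "views": 60},
    {"channel": "music", "likes": 1, "comments": 0, "views": 20},
]


def partition(records, n_parts):
    """Split records into n_parts chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_partition(chunk):
    """Map step: sum engagement per channel within one partition."""
    totals = Counter()
    for rec in chunk:
        totals[rec["channel"]] += rec["likes"] + rec["comments"] + rec["views"]
    return totals


def reduce_totals(partials):
    """Reduce step: merge the per-partition counters into one."""
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds counts together
    return merged


def engagement_by_channel(records, n_parts=2):
    """Process partitions concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_partition, partition(records, n_parts)))
    return dict(reduce_totals(partials))


if __name__ == "__main__":
    print(engagement_by_channel(RECORDS))
```

A distributed framework applies the same partition, map, and reduce pattern across many machines; the thread pool here only stands in for that on a single machine.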

Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis

10.31234/osf.io/bynz4 ◽

2019 ◽

Author(s):

Matthew Andreotta ◽

Robertus Nugroho ◽

Mark Hurlstone ◽

Fabio Boschetti ◽

Simon Farrell ◽

...

Keyword(s):

Social Media ◽

Qualitative Analysis ◽

Data Science ◽

Large Data ◽

Extraction Process ◽

Data Set ◽

Diverse Range ◽

Social Media Data ◽

Qualitative Thematic Analysis ◽

Media Data

To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content, without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (Non-Negative Matrix inter-joint Factorization; Topic Alignment) and qualitative (Thematic Analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.

Weather events identification in social media streams: tools to detect their evidence in Twitter

10.7287/peerj.preprints.2241 ◽

2017 ◽

Author(s):

Valentina Grasso ◽

Imad Zaza ◽

Federica Zabini ◽

Gianni Pantaleo ◽

Paolo Nesi ◽

...

Keyword(s):

Social Media ◽

Data Science ◽

Social Impact ◽

Graphical Representation ◽

Severe Weather ◽

Social Media Data ◽

Twitter Data ◽

Weather Events ◽

Crisis Events ◽

Media Data

Identifying and monitoring the impact of severe weather through social media data is a worthwhile challenge for data science. Recent years have seen an increase in natural disasters, due in part to climate change. Many studies have shown that during such events people tend to share specific messages by means of social media platforms, especially Twitter. These messages not only contribute to "situational" awareness and improve the dissemination of information during emergencies, but can also be used to assess the social impact of crisis events. In this work we present preliminary findings on how the temporal distribution of weather-related messages may help identify severe events that have affected a community. Severe weather events are recognizable by observing the synchronization of the volumes of Twitter streams extracted using different but semantically graded terms and hashtags, including ones containing specific geographic names. Impactful events seem immediately recognizable in a graphical representation of the weather streams, where the timeline shows a specific parallel pattern that we named the "Half Onion Shape". Different but semantically linked weather-related Twitter streams may exhibit different magnitudes, in line with the popularity of their terms, but when a weather event occurs they show the same temporal relative maximum. Given these interesting indications, which need to be confirmed through deeper analysis, and given the heavy use of social media such as Twitter during crisis events, it is becoming essential to have a suite of suitable tools for monitoring social media data. For Twitter data, a comprehensive suite of tools is presented: the DISIT-Twitter Vigilance Platform for Twitter data retrieval, management, and visualization.
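The observation that related streams share the same temporal maximum during an event can be sketched in a few lines. This is an illustrative check under stated assumptions, not the DISIT-Twitter Vigilance Platform's method: the function names, the `tolerance` parameter, and the hourly counts are all hypothetical.

```python
def peak_index(counts):
    """Return the index of the time bin with the highest message volume."""
    return max(range(len(counts)), key=counts.__getitem__)


def peaks_synchronized(streams, tolerance=0):
    """True if every stream peaks within `tolerance` bins of the others."""
    peaks = [peak_index(counts) for counts in streams.values()]
    return max(peaks) - min(peaks) <= tolerance


# Hypothetical hourly tweet counts for three semantically related terms.
# Magnitudes differ with term popularity, but the peaks fall in the same bin.
STREAMS = {
    "weather": [120, 130, 400, 900, 350, 140],
    "storm": [10, 15, 80, 210, 60, 12],
    "#floodCity": [0, 1, 9, 40, 7, 1],
}

# Hypothetical quiet period: each stream fluctuates independently.
QUIET = {
    "weather": [120, 135, 128, 131, 126, 122],
    "storm": [10, 9, 12, 8, 14, 11],
}

if __name__ == "__main__":
    print(peaks_synchronized(STREAMS))
    print(peaks_synchronized(QUIET))
```

A check like this only compares where the maxima fall. It says nothing about whether a peak is large enough to count as an event, so in practice it would need a volume threshold or a baseline comparison alongside it.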

Weather events identification in social media streams: tools to detect their evidence in Twitter

10.7287/peerj.preprints.2241v1 ◽

2016 ◽

Cited By ~ 2

Author(s):

Valentina Grasso ◽

Imad Zaza ◽

Federica Zabini ◽

Gianni Pantaleo ◽

Paolo Nesi ◽

...

Keyword(s):

Social Media ◽

Data Science ◽

Social Impact ◽

Graphical Representation ◽

Severe Weather ◽

Social Media Data ◽

Twitter Data ◽

Weather Events ◽

Crisis Events ◽

Media Data

Identifying and monitoring the impact of severe weather through social media data is a worthwhile challenge for data science. Recent years have seen an increase in natural disasters, due in part to climate change. Many studies have shown that during such events people tend to share specific messages by means of social media platforms, especially Twitter. These messages not only contribute to "situational" awareness and improve the dissemination of information during emergencies, but can also be used to assess the social impact of crisis events. In this work we present preliminary findings on how the temporal distribution of weather-related messages may help identify severe events that have affected a community. Severe weather events are recognizable by observing the synchronization of the volumes of Twitter streams extracted using different but semantically graded terms and hashtags, including ones containing specific geographic names. Impactful events seem immediately recognizable in a graphical representation of the weather streams, where the timeline shows a specific parallel pattern that we named the "Half Onion Shape". Different but semantically linked weather-related Twitter streams may exhibit different magnitudes, in line with the popularity of their terms, but when a weather event occurs they show the same temporal relative maximum. Given these interesting indications, which need to be confirmed through deeper analysis, and given the heavy use of social media such as Twitter during crisis events, it is becoming essential to have a suite of suitable tools for monitoring social media data. For Twitter data, a comprehensive suite of tools is presented: the DISIT-Twitter Vigilance Platform for Twitter data retrieval, management, and visualization.

Extended Implementation Method for Virtual Sensors: Web-Based Real-Time Transportation Data Collection and Analysis for Incident Management

Transportation Research Record Journal of the Transportation Research Board ◽

10.3141/2528-04 ◽

2015 ◽

Vol 2528 (1) ◽

pp. 27-37 ◽

Cited By ~ 9

Author(s):

Abdullah Kurkcu ◽

Ender Faruk Morgul ◽

Kaan Ozbay

Keyword(s):

Social Media ◽

Data Collection ◽

Travel Time ◽

Real Time ◽

Open Data ◽

Data Sources ◽

Incident Management ◽

Virtual Sensors ◽

Social Media Data ◽

Media Data

Open data sources and social media data are gaining increasing attention as important information providers in transportation and incident management. In this paper, practical evidence for the emerging potential of online and open data sources is presented. The authors’ previous research on virtual sensors is combined and extended by integrating real-time incident information and social media network engagement. The fundamental contribution of this paper is the development of an extended virtual sensor framework to provide an automated travel time data collection method as incidents occur. In addition, social media data can be useful for more effective real-time incident response. The proposed framework can easily be modified and used to evaluate travel time effects of incidents on roadways and clearance times and to make use of social media data in obtaining time-critical incident-related information.

Launching Revolution: Social Media and the Egyptian Uprising’s First Movers

British Journal of Political Science ◽

10.1017/s0007123418000194 ◽

2018 ◽

Vol 50 (3) ◽

pp. 1025-1045 ◽

Cited By ~ 2

Author(s):

Killian Clarke ◽

Korhan Kocak

Keyword(s):

Social Media ◽

The Internet ◽

Social Media Data ◽

Quantitative Evidence ◽

Qualitative And Quantitative ◽

Social Media Platforms ◽

The Arab Spring ◽

Media Data ◽

First Mover

Drawing on evidence from the 2011 Egyptian uprising, this article demonstrates how the use of two social media platforms – Facebook and Twitter – contributed to a discrete mobilizational outcome: the staging of a successful first protest in a revolutionary cascade, referred to here as ‘first-mover mobilization’. Specifically, it argues that these two platforms facilitated the staging of a large, nationwide and seemingly leaderless protest on 25 January 2011, which signaled to hesitant but sympathetic Egyptians that a revolution might be in the making. It draws on qualitative and quantitative evidence, including interviews, social media data and surveys, to analyze three mechanisms that linked these platforms to the success of the January 25 protest: (1) protester recruitment, (2) protest planning and coordination, and (3) live updating about protest logistics. The article not only contributes to debates about the role of the Internet in the Arab Spring and other recent waves of mobilization, but also demonstrates how scholarship on the Internet in politics might move toward making more discrete, empirically grounded causal claims.

Artificial Intelligence and Inclusion: Formerly Gang-Involved Youth as Domain Experts for Analyzing Unstructured Twitter Data

Social Science Computer Review ◽

10.1177/0894439318788314 ◽

2018 ◽

Vol 38 (1) ◽

pp. 42-56 ◽

Cited By ~ 10

Author(s):

William R. Frey ◽

Desmond U. Patton ◽

Michael B. Gaskell ◽

Kyle A. McGregor

Keyword(s):

Social Media ◽

Young People ◽

Data Science ◽

Social Impact ◽

Human Condition ◽

Social Media Data ◽

Domain Experts ◽

Marginalized Communities ◽

Twitter Data ◽

Media Data

Mining social media data for studying the human condition has created new and unique challenges. When analyzing social media data from marginalized communities, algorithms lack the ability to accurately interpret off-line context, which may lead to dangerous assumptions about and implications for marginalized communities. To combat this challenge, we hired formerly gang-involved young people as domain experts for contextualizing social media data in order to create inclusive, community-informed algorithms. Utilizing data from the Gang Intervention and Computer Science Project—a comprehensive analysis of Twitter data from gang-involved youth in Chicago—we describe the process of involving formerly gang-involved young people in developing a new part-of-speech tagger and content classifier for a prototype natural language processing system that detects aggression and loss in Twitter data. We argue that involving young people as domain experts leads to more robust understandings of context, including localized language, culture, and events. These insights could change how data scientists approach the development of corpora and algorithms that affect people in marginalized communities and who to involve in that process. We offer a contextually driven interdisciplinary approach between social work and data science that integrates domain insights into the training of qualitative annotators and the production of algorithms for positive social impact.

Real-Time Recommendation Engine for Readers

Advances in Library and Information Science - Big Data Applications for Improving Library Services ◽

10.4018/978-1-7998-3049-8.ch011 ◽

2021 ◽

pp. 165-177

Author(s):

Sangeeta Namdev Dhamdhere ◽

Deepak Mane

Keyword(s):

Decision Making ◽

Social Media ◽

Real Time ◽

Recommendation System ◽

The Internet ◽

Problem Analysis ◽

Social Media Data ◽

Good Book ◽

Data Ingestion ◽

Media Data

In today's world, every reader or social media user has different reading preferences and hobbies. For example, a social media user who is searching for a book to read without any specific idea of what s/he wants wastes a lot of time browsing the internet and trawling through various sites in the hope of finding a good book. To avoid this, the authors are building a recommendation system that recommends a book to each reader/user based on his or her choices, hobbies, or previous reading, which will be a massive help for users who would otherwise waste time on various sites. Data from social media is a powerful fuel that can be used to help in decision making and in building a recommendation engine. Because social media data arrives in different formats, the biggest challenge for a business is to ingest it at a reasonable speed and process it further. Social media data is also difficult to detect and capture. This chapter discusses a real-time recommendation engine for users, including data ingestion methods, challenges, the metadata problem, analysis, and consumption.
