Applied Webscraping in Market Research

Author(s):  
Markus Herrmann ◽  
Laura Hoyden

Modern Webscraping tools and APIs significantly facilitate the extraction of information from the Internet, especially if the data is not offered for download in a structured format. In this abstract we outline that Webscraping, as a common practice to load, prepare and statistically analyze specific structured or unstructured data from the Internet, has become an essential application in Marketing and Data Science. Furthermore, we emphasize the importance of Open Data and social media data as scraping targets and illustrate Open Data and social media data integration, Sentiment Analysis and website content classification as uses of Webscraping in a Market Research environment. While we argue that Webscraping of internet data is an enabler and driver of product innovation in Market Research, it should also be noted that there are some legal restrictions involved.
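
The "load and prepare" step described above can be illustrated with a minimal sketch that uses only Python's standard-library `html.parser`. The sample page, the `product` class name, and the helper names `ProductListParser` and `scrape_products` are hypothetical and chosen for illustration; they are not taken from the abstract. The page is an inline string, so no request is sent to a real site.

```python
from html.parser import HTMLParser


class ProductListParser(HTMLParser):
    """Collect the text of every <li class="product"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "li" and ("class", "product") in attrs:
            self._in_product = True
            self.products.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_product = False

    def handle_data(self, data):
        # Only keep text that appears inside a product list item
        if self._in_product:
            self.products[-1] += data


def scrape_products(html):
    """Return the stripped text of each product entry found in `html`."""
    parser = ProductListParser()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.products]


# Assumed sample page standing in for unstructured web content
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li class="product">Espresso machine - 199 EUR</li>
    <li class="note">Free shipping</li>
    <li class="product">Coffee grinder - 59 EUR</li>
  </ul>
</body></html>
"""

products = scrape_products(SAMPLE_PAGE)
print(products)
```

In practice the HTML would come from a fetched page, subject to the legal restrictions the abstract mentions; the structured list produced here is what a later statistical analysis step would consume.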

2016 ◽  
Author(s):  
Jonathan Mellon

This chapter discusses the use of large quantities of incidentally collected data (ICD) to make inferences about politics. This type of data is sometimes referred to as “big data” but I avoid this term because of its conflicting definitions (Monroe, 2012; Ward & Barker, 2013). ICD is data that was created or collected primarily for a purpose other than analysis. Within this broad definition, this chapter focuses particularly on data generated through user interactions with websites. While ICD has been around for at least half a century, the Internet greatly expanded the availability and reduced the cost of ICD. Examples of ICD include data on Internet searches, social media data, and user data from civic platforms. This chapter briefly explains some sources and uses of ICD and then discusses some of the potential issues of analysis and interpretation that arise when using ICD, including the different approaches to inference that researchers can use.


2019 ◽  
Vol 97 (3) ◽  
pp. 811-834 ◽  
Author(s):  
Lei Guo ◽  
Kate Mays ◽  
Sha Lai ◽  
Mona Jalal ◽  
Prakash Ishwar ◽  
...  

Crowdcoding, a method that outsources “coding” tasks to numerous people on the internet, has emerged as a popular approach for annotating texts and visuals. However, the performance of this approach for analyzing social media data in the context of journalism and mass communication research has not been systematically assessed. This study evaluated the validity and efficiency of crowdcoding based on the analysis of 4,000 tweets about the 2016 U.S. presidential election. The results show that compared with the traditional quantitative content analysis, crowdcoding yielded comparably valid results and was superior in efficiency, but was more expensive under most circumstances.


2021 ◽  
Vol 12 ◽  
Author(s):  
Muhammad Usman Tariq ◽  
Muhammad Babar ◽  
Marc Poulin ◽  
Akmal Saeed Khattak ◽  
Mohammad Dahman Alshehri ◽  
...  

Intelligent big data analysis is an evolving pattern in the age of big data science and artificial intelligence (AI). Analysis of organized data has been very successful, but analyzing human behavior using social media data remains challenging. Social media data comprises vast, unstructured data sources that can include likes, comments, tweets, shares, and views. Analyzing this data has become a challenging task for companies, such as Dailymotion, that have billions of daily users and vast numbers of comments, likes, and views. Social media data is created in significant amounts and at a tremendous pace. A very high volume of data must be stored, sorted, processed, and carefully studied to support decisions. This article proposes an architecture that uses a big data analytics mechanism to process huge social media datasets efficiently and logically. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate Apache Spark parallel processing and distributed framework technologies together with other storage and processing mechanisms. Social media data generated from Dailymotion is used in this article to demonstrate the benefits of this architecture. The project used the Dailymotion application programming interface (API), which provides functions suitable for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. Hive storage is used with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.


2019 ◽  
Author(s):  
Matthew Andreotta ◽  
Robertus Nugroho ◽  
Mark Hurlstone ◽  
Fabio Boschetti ◽  
Simon Farrell ◽  
...  

To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content, without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (Non-Negative Matrix inter-joint Factorization; Topic Alignment) and qualitative (Thematic Analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.


2017 ◽  
Author(s):  
Valentina Grasso ◽  
Imad Zaza ◽  
Federica Zabini ◽  
Gianni Pantaleo ◽  
Paolo Nesi ◽  
...  

Identifying and monitoring the impact of severe weather through social media data is a worthwhile challenge for data science. Recent years have seen an increase in natural disasters, partly due to climate change. Many studies have shown that during such events people tend to share specific messages by means of social media platforms, especially Twitter. These messages not only contribute to "situational" awareness and improve the dissemination of information during emergencies, but can also be used to assess the social impact of crisis events. We present preliminary findings on how the temporal distribution of weather-related messages may help identify severe events that impacted a community. Severe weather events are recognizable by observing the synchronization of the volumes of Twitter streams extracted using different but semantically graduated terms and hashtags, including those containing specific geographic names. Impacting events seem immediately recognizable in a graphical representation of the weather streams when the timeline shows a specific parallel pattern that we named "Half Onion Shape". Different but semantically linked weather-related Twitter streams may exhibit different magnitudes, according to the popularity of their terms, but when a weather event occurs they show the same temporal relative maximum. Given these interesting indications, which need to be confirmed through deeper analysis, and given the heavy use of social media such as Twitter during crisis events, it is becoming fundamental to have a suite of suitable tools to monitor social media data. For Twitter data a comprehensive suite of tools is presented: the DISIT-Twitter Vigilance Platform for Twitter data retrieval, management and visualization.
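
The idea that streams of different magnitude share the same temporal maximum during an event can be sketched as follows. This is a simplified illustration, not the authors' method: it compares only the position of each stream's overall peak, and the keyword streams, hourly counts, and function names (`peak_index`, `synchronized_peaks`) are invented for the example.

```python
def peak_index(counts):
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(counts)), key=lambda i: counts[i])


def synchronized_peaks(streams, tolerance=0):
    """Return True if all streams peak within `tolerance` time bins of each other."""
    peaks = [peak_index(counts) for counts in streams.values()]
    return max(peaks) - min(peaks) <= tolerance


# Hypothetical hourly tweet counts per keyword; magnitudes differ by term popularity
event_day = {
    "rain": [40, 42, 55, 300, 120, 60],
    "storm": [5, 6, 9, 80, 30, 10],
    "flood": [1, 1, 2, 25, 12, 3],
}
quiet_day = {
    "rain": [40, 45, 41, 39, 44, 42],
    "storm": [6, 5, 5, 7, 6, 5],
    "flood": [1, 2, 1, 1, 1, 2],
}

print(synchronized_peaks(event_day))  # all three streams peak in the same hour
print(synchronized_peaks(quiet_day))  # peaks fall in different hours
```

A real monitoring tool would more likely look at relative maxima over a sliding window and allow a nonzero `tolerance`; the single-peak comparison here is only meant to make the synchronization criterion concrete.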


Author(s):  
Abdullah Kurkcu ◽  
Ender Faruk Morgul ◽  
Kaan Ozbay

Open data sources and social media data are gaining increasing attention as important information providers in transportation and incident management. In this paper, practical evidence for the emerging potential of online and open data sources is presented. The authors’ previous research on virtual sensors is combined and extended by integrating real-time incident information and social media network engagement. The fundamental contribution of this paper is the development of an extended virtual sensor framework to provide an automated travel time data collection method as incidents occur. In addition, social media data can be useful for more effective real-time incident response. The proposed framework can easily be modified and used to evaluate travel time effects of incidents on roadways and clearance times and to make use of social media data in obtaining time-critical incident-related information.


2018 ◽  
Vol 50 (3) ◽  
pp. 1025-1045 ◽  
Author(s):  
Killian Clarke ◽  
Korhan Kocak

Drawing on evidence from the 2011 Egyptian uprising, this article demonstrates how the use of two social media platforms – Facebook and Twitter – contributed to a discrete mobilizational outcome: the staging of a successful first protest in a revolutionary cascade, referred to here as ‘first-mover mobilization’. Specifically, it argues that these two platforms facilitated the staging of a large, nationwide and seemingly leaderless protest on 25 January 2011, which signaled to hesitant but sympathetic Egyptians that a revolution might be in the making. It draws on qualitative and quantitative evidence, including interviews, social media data and surveys, to analyze three mechanisms that linked these platforms to the success of the January 25 protest: (1) protester recruitment, (2) protest planning and coordination, and (3) live updating about protest logistics. The article not only contributes to debates about the role of the Internet in the Arab Spring and other recent waves of mobilization, but also demonstrates how scholarship on the Internet in politics might move toward making more discrete, empirically grounded causal claims.


2018 ◽  
Vol 38 (1) ◽  
pp. 42-56 ◽  
Author(s):  
William R. Frey ◽  
Desmond U. Patton ◽  
Michael B. Gaskell ◽  
Kyle A. McGregor

Mining social media data for studying the human condition has created new and unique challenges. When analyzing social media data from marginalized communities, algorithms lack the ability to accurately interpret off-line context, which may lead to dangerous assumptions about and implications for marginalized communities. To combat this challenge, we hired formerly gang-involved young people as domain experts for contextualizing social media data in order to create inclusive, community-informed algorithms. Utilizing data from the Gang Intervention and Computer Science Project—a comprehensive analysis of Twitter data from gang-involved youth in Chicago—we describe the process of involving formerly gang-involved young people in developing a new part-of-speech tagger and content classifier for a prototype natural language processing system that detects aggression and loss in Twitter data. We argue that involving young people as domain experts leads to more robust understandings of context, including localized language, culture, and events. These insights could change how data scientists approach the development of corpora and algorithms that affect people in marginalized communities and who to involve in that process. We offer a contextually driven interdisciplinary approach between social work and data science that integrates domain insights into the training of qualitative annotators and the production of algorithms for positive social impact.


Author(s):  
Sangeeta Namdev Dhamdhere ◽  
Deepak Mane

In today's world, every reader or social media user has different preferences and hobbies when it comes to reading. For example, a social media user who searches for a book without a specific idea of what s/he wants wastes a lot of time browsing the internet and trawling through various sites in the hope of finding a good book. To avoid this, the authors are building a recommendation system that recommends books to each reader based on his or her choices, hobbies, or previous reading, which will be a massive help for users instead of wasting time on various sites. Data from social media is a powerful fuel that can help in decision making and in building a recommendation engine. The variety of formats in social media data is the biggest challenge for businesses that need to ingest data at a reasonable speed and process it further. In social media data, it is difficult to detect and capture data. A real-time recommendation engine for users, including data ingestion methods, challenges, the metadata problem, analysis, and consumption, is discussed here.

