What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data

2020 ◽  
Vol 6 (3) ◽  
pp. 205630512094070 ◽  
Author(s):  
Moreno Mancosu ◽  
Federico Vegetti

In reaction to the Cambridge Analytica scandal, Facebook has restricted access to its Application Programming Interface (API). This new policy has hampered the ability of independent researchers to study relevant topics in political and social behavior. Yet much of the public information that researchers may be interested in is still available on Facebook, and can still be systematically collected through web scraping techniques. The goal of this article is twofold. First, we discuss some ethical and legal issues that researchers should consider as they plan their collection and possible publication of Facebook data. In particular, we discuss what kind of information can be ethically gathered about users (public information), what published data should look like to comply with privacy regulations (such as the GDPR), and what consequences violating Facebook’s terms of service may entail for the researcher. Second, we present a scraping routine for public Facebook posts and discuss some technical adjustments that can be performed for the data to be ethically and legally acceptable. The code employs screen scraping to collect the list of reactions to a Facebook public post and performs a one-way cryptographic hash function on the users’ identifiers to pseudonymize their personal information while still keeping them traceable within the data. This article contributes to the debate around freedom of internet research and the ethical concerns that might arise from scraping data on the social web.
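The pseudonymization step described above can be illustrated with a minimal sketch, assuming user identifiers have already been scraped; the salt value and data layout below are hypothetical, not the article's actual routine.

```python
# Minimal sketch of one-way hashing of scraped user identifiers.
# SALT and the record layout are illustrative assumptions.
import hashlib

SALT = "project-specific-secret"  # keep secret; never publish with the data

def pseudonymize(user_id: str) -> str:
    """SHA-256 digest of a salted user identifier.

    The same input always maps to the same digest, so users stay
    traceable within the dataset, but the original ID cannot be
    recovered from the published data.
    """
    return hashlib.sha256((SALT + user_id).encode("utf-8")).hexdigest()

reactions = [
    {"user_id": "100001234567890", "reaction": "LIKE"},
    {"user_id": "100009876543210", "reaction": "ANGRY"},
]
anonymized = [
    {"user": pseudonymize(r["user_id"]), "reaction": r["reaction"]}
    for r in reactions
]
print(anonymized)
```

Because the hash is deterministic, repeated reactions by the same user collapse to the same pseudonym, which is what keeps users "traceable within the data" as the abstract describes.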

2017 ◽  
Vol 36 (2) ◽  
pp. 195-211 ◽  
Author(s):  
Patrick Rafail

Twitter data are widely used in the social sciences. The Twitter Application Programming Interface (API) allows researchers to build large databases of user activity efficiently. Despite the potential of Twitter as a data source, less attention has been paid to issues of sampling, and in particular, the implications of different sampling strategies on overall data quality. This research proposes a set of conceptual distinctions between four types of populations that emerge when analyzing Twitter data and suggests sampling strategies that facilitate more comprehensive data collection from the Twitter API. Using three applications drawn from large databases of Twitter activity, this research also compares the results from the proposed sampling strategies, which provide defensible representations of the population of activity, to those collected with more frequently used hashtag samples. The results suggest that hashtag samples misrepresent important aspects of Twitter activity and may lead researchers to erroneous conclusions.
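The sampling distinction at the heart of the article can be sketched against the Twitter API. The example below is a hedged illustration using the current v2 recent-search and user-timeline endpoints (not the API version available to the 2017 study); the bearer token is a placeholder, and this is not the study's collection code.

```python
# Sketch: a hashtag sample vs. a user-timeline sample on the Twitter API v2.
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder
HEADERS = {"Authorization": f"Bearer {BEARER_TOKEN}"}

def hashtag_sample(tag: str) -> list:
    """Recent-search query keyed on a hashtag: efficient, but it misses
    related activity by the same users that does not carry the hashtag."""
    resp = requests.get(
        "https://api.twitter.com/2/tweets/search/recent",
        headers=HEADERS,
        params={"query": f"#{tag}", "max_results": 100},
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

def user_timeline(user_id: str) -> list:
    """Recent timeline of one user, recovering the surrounding activity
    that a pure hashtag sample would drop."""
    resp = requests.get(
        f"https://api.twitter.com/2/users/{user_id}/tweets",
        headers=HEADERS,
        params={"max_results": 100},
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

# Typical two-stage strategy: seed with a hashtag sample, then expand to
# the full timelines of the users it surfaces.
```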


2020 ◽  
Vol 12 (10) ◽  
pp. 4200 ◽  
Author(s):  
Thanh-Long Giang ◽  
Dinh-Tri Vo ◽  
Quan-Hoang Vuong

Using data from the WHO’s Situation Reports on the COVID-19 pandemic from 21 January 2020 to 30 March 2020, along with other health, demographic, and macroeconomic indicators from the WHO’s Application Programming Interface and the World Bank’s Development Indicators, this paper explores the death rates of infected persons and their possible associated factors. Through panel analysis, we found consistent results that healthcare system conditions, particularly the numbers of hospital beds and medical staff, have played extremely important roles in reducing the death rates of COVID-19 infected persons. In addition, both the mortality rates due to different non-communicable diseases (NCDs) and the share of people aged 65 and over were significantly related to the death rates. We also found that controlling international and domestic air travel, along with increasingly popular anti-COVID-19 measures (i.e., quarantine and social distancing), would help reduce the death rates in all countries. We conducted tests for robustness and found that the Driscoll and Kraay (1998) method was the most suitable estimator for a finite sample, which helped confirm the robustness of our estimations. Based on the findings, we suggest that the preparedness of healthcare systems for aged populations needs more attention from the public and politicians, regardless of income level, when facing COVID-19-like pandemics.
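As a rough illustration of the estimation strategy, the sketch below fits a panel model with Driscoll-Kraay standard errors using the Python linearmodels package, where the kernel covariance estimator implements the Driscoll and Kraay (1998) correction; the input file and column names are hypothetical, not the paper's actual variables.

```python
# Sketch: panel regression with Driscoll-Kraay (1998) standard errors.
# "covid_panel.csv" and all column names are hypothetical placeholders.
import pandas as pd
from linearmodels.panel import PanelOLS

df = pd.read_csv("covid_panel.csv")        # hypothetical input file
df = df.set_index(["country", "date"])     # (entity, time) MultiIndex

model = PanelOLS.from_formula(
    "death_rate ~ 1 + hospital_beds + medical_staff"
    " + ncd_mortality + share_65plus",
    data=df,
)
# In linearmodels, cov_type="kernel" requests the Driscoll-Kraay HAC
# covariance estimator, robust to cross-sectional dependence.
res = model.fit(cov_type="kernel")
print(res.summary)
```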


2019 ◽  
Vol 15 (3) ◽  
pp. 76-95 ◽  
Author(s):  
Joshua Ofoeda ◽  
Richard Boateng ◽  
John Effah

The purpose of this study is to synthesize API research. The study takes stock of the literature on APIs from academic journals, along with the associated themes, frameworks, methodologies, publication outlets, and levels of analysis. The authors draw on a total of 104 articles from academic journals and conferences published from 2010 to 2018. A systematic literature review was conducted on the selected articles. The findings suggest that API research is primarily atheoretical and largely focused on technological dimensions such as design and usage, thus neglecting social issues such as the business and managerial applications of APIs, which are equally important. Future research directions are provided concerning the identified gaps.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Casper W. Andersen ◽  
Rickard Armiento ◽  
Evgeny Blokhin ◽  
Gareth J. Conduit ◽  
Shyam Dwaraknath ◽  
...  

Abstract: The Open Databases Integration for Materials Design (OPTIMADE) consortium has designed a universal application programming interface (API) to make materials databases accessible and interoperable. We outline the first stable release of the specification, v1.0, which is already supported by many leading databases and several software packages. We illustrate the advantages of the OPTIMADE API through worked examples on each of the public materials databases that support the full API specification.
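A worked query of the kind the paper describes can be sketched in a few lines. The base URL below assumes the Materials Project's public OPTIMADE endpoint; the filter grammar (`elements HAS ALL ...`, `nelements=...`) comes from the v1 specification.

```python
# Sketch: query binary Al-O structures through an OPTIMADE v1 endpoint.
import requests

BASE = "https://optimade.materialsproject.org/v1"  # assumed public endpoint
params = {
    "filter": 'elements HAS ALL "Al","O" AND nelements=2',
    "page_limit": 5,
}
resp = requests.get(f"{BASE}/structures", params=params)
resp.raise_for_status()
for entry in resp.json()["data"]:
    attrs = entry["attributes"]
    print(entry["id"], attrs.get("chemical_formula_reduced"))
```

Because the API is universal, pointing `BASE` at any other provider that supports the full specification should return results in the same JSON structure.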


2021 ◽  
Vol 11 (1) ◽  
pp. 20
Author(s):  
Mete Ercan Pakdil ◽  
Rahmi Nurhan Çelik

Geospatial data and related technologies have become an increasingly important part of data analysis processes, playing a prominent role in most of them. The serverless paradigm has become one of the most popular and frequently used technologies within cloud computing. This paper reviews the serverless paradigm and examines how it can be leveraged for geospatial data processes using open standards from the geospatial community. We propose a system design and architecture to handle complex geospatial data processing jobs with minimum human intervention and resource consumption using serverless technologies. To define and execute workflows in the system, we also propose new models for workflow and task definitions. Moreover, the proposed system exposes web services based on the Open Geospatial Consortium (OGC) API Processes specification to provide interoperability with other geospatial applications, with the anticipation that the specification will be more commonly used in the future. We implemented the proposed system on one of the public cloud providers as a proof of concept and evaluated it with sample geospatial workflows and against cloud architecture best practices.
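The OGC API Processes interaction the system builds on can be sketched as follows; the server URL and process identifier are hypothetical, while the `POST /processes/{id}/execution` pattern and the `inputs` document follow OGC API - Processes - Part 1: Core.

```python
# Sketch: execute a (hypothetical) buffer process via OGC API - Processes.
import requests

SERVER = "https://example.com/geoapi"   # hypothetical deployment
PROCESS_ID = "buffer"                   # hypothetical process identifier

body = {
    "inputs": {
        "geometry": {"type": "Point", "coordinates": [28.97, 41.01]},
        "distance": 1000,
    }
}
resp = requests.post(
    f"{SERVER}/processes/{PROCESS_ID}/execution",
    json=body,
    headers={"Prefer": "respond-async"},  # ask for asynchronous execution
)
resp.raise_for_status()
# For async execution the server returns a job status URL in Location.
print(resp.status_code, resp.headers.get("Location"))
```

The asynchronous job pattern matters here: long-running geospatial workflows map naturally onto serverless functions that poll or are triggered by job-state changes.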


2020 ◽  
Author(s):  
Shubh Mohan Singh ◽  
Chaitanya Reddy

Abstract: Objectives: A majority of patients suffering from acute COVID-19 are expected to recover symptomatically and functionally. However, there are reports that some people continue to experience symptoms beyond the stage of acute infection, a phenomenon that has been called long COVID. Study design: This study analysed symptoms reported by Twitter users self-identifying as having long COVID. Methods: The search was carried out using the Twitter public streaming application programming interface with a relevant search term. Results: We identified 89 users with usable data in their tweets. A majority of users described multiple symptoms, the most common of which were fatigue, shortness of breath, pain, and brain fog/concentration difficulties. The most common course of symptoms was episodic. Conclusions: Given the public health importance of this issue, the study suggests that there is a need to better study post-acute COVID symptoms.
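The symptom-tallying step can be illustrated with a minimal sketch; the tweets and the symptom lexicon below are invented stand-ins, assuming tweets have already been collected from the streaming API, and are not the study's data or code.

```python
# Sketch: count symptom mentions in already-collected tweet texts.
from collections import Counter

# Illustrative lexicon based on the symptoms named in the abstract.
SYMPTOMS = ["fatigue", "shortness of breath", "pain", "brain fog"]

tweets = [  # invented examples
    "Week 12 of #longcovid: fatigue and brain fog are still here.",
    "Still have shortness of breath months after infection #longcovid",
]

counts = Counter()
for text in tweets:
    lowered = text.lower()
    for symptom in SYMPTOMS:
        if symptom in lowered:
            counts[symptom] += 1

print(counts.most_common())
```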


2019 ◽  
Vol 35 (20) ◽  
pp. 4147-4155 ◽  
Author(s):  
Peter Selby ◽  
Rafael Abbeloos ◽  
Jan Erik Backlund ◽  
Martin Basterrechea Salido ◽  
Guillaume Bauchet ◽  
...  

Abstract: Motivation: Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data are often stored on multiple systems. As analyses of interest increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. Results: To facilitate interoperability among breeding applications, we present the public plant Breeding Application Programming Interface (BrAPI). BrAPI is a standardized web service API specification. The development of BrAPI is a collaborative, community-based initiative involving a growing global community of over a hundred participants representing several dozen institutions and companies. Development of such a standard is recognized as a foundational technology critical to a number of important large breeding system initiatives. The focus of the first version of the API is on providing services for connecting systems and retrieving basic breeding data, including germplasm, study, observation, and marker data. A number of BrAPI-enabled applications, termed BrAPPs, have been written that take advantage of the emerging support for BrAPI by many databases. Availability and implementation: More information on BrAPI, including links to the specification, test suites, BrAPPs, and sample implementations, is available at https://brapi.org/. The BrAPI specification and the developer tools are provided as free and open source.
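A minimal BrAPI client call might look like the sketch below; the host is assumed to be the community test server linked from brapi.org (not a server named in the abstract), and the `/brapi/v1/germplasm` path follows the first version of the specification that the paper describes.

```python
# Sketch: retrieve a page of germplasm records over BrAPI v1.
import requests

BASE = "https://test-server.brapi.org/brapi/v1"  # assumed test server
resp = requests.get(f"{BASE}/germplasm", params={"pageSize": 5})
resp.raise_for_status()

# BrAPI responses wrap records in a metadata/result envelope.
payload = resp.json()
for germ in payload["result"]["data"]:
    print(germ.get("germplasmDbId"), germ.get("germplasmName"))
```

The value of the standard is that the same client code should work against any BrAPI-compliant breeding database simply by changing `BASE`.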


2019 ◽  
Vol 5 (2) ◽  
pp. 88-97
Author(s):  
M. Fuadi Aziz Muri ◽  
Hendrik Setyo Utomo ◽  
Rabini Sayyidati

An Application Programming Interface (API) is a set of functions that can be called by other programs. An API works as a link that unites applications across various platforms; openly published interfaces of this kind are commonly known as public APIs. Public APIs have spread widely, but programmers who want to find them must browse through various channels, such as general search engines, repository documentation, or individual web articles. Users do not yet have a system dedicated to collecting public APIs, which makes searching for public API links difficult. These problems can be solved by building a web framework with a search engine interface that provides searches specifically over public APIs, so that users can find them more easily. A web service is an API made to support interaction between two or more different applications over a network, and Representational State Transfer (REST) is one of the architectural styles used to build such services.
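The kind of REST-style search service the paper proposes can be sketched in a few lines of Flask; the route, fields, and in-memory catalogue are hypothetical stand-ins for the actual system.

```python
# Sketch: a tiny REST endpoint for searching a public-API catalogue.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy catalogue; a real deployment would query a database instead.
CATALOG = [
    {"name": "OpenWeatherMap", "category": "weather"},
    {"name": "REST Countries", "category": "geography"},
]

@app.route("/apis", methods=["GET"])
def search_apis():
    """Return catalogue entries whose name or category matches ?q=..."""
    q = request.args.get("q", "").lower()
    hits = [
        api for api in CATALOG
        if q in api["name"].lower() or q in api["category"].lower()
    ]
    return jsonify(hits)

if __name__ == "__main__":
    app.run(port=5000)
```

A client would then query, for example, `GET /apis?q=weather` and receive a JSON list of matching public APIs.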


A great deal of vital geographic information about places, including points of interest, locations, and details such as neighborhoods and phone numbers, can be found on the Internet. However, such information is not openly available through legitimate means, and what is available is often unreliable, as it is static and not refreshed frequently enough. In this paper, an effective method for collecting datasets of place names from the results of an internet search engine is demonstrated. The proposed strategy is to use the Google search Application Programming Interface to retrieve web pages related to specific area names and types of places, and then to analyze the resulting web pages to extract the addresses and names of places. Using the data gathered from the Internet, the final result is a compiled dataset of place names. We evaluate our methodology against data gathered from Google Maps Street View by examining business signs found in images. The results show that the proposed procedure efficiently created place datasets on par with Google Maps and outperformed the results of OSM.
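The collection step can be approximated with the Google Custom Search JSON API, used here as a stand-in for the Google search API the paper names (the exact service the authors used is not specified); the key and search-engine ID are placeholders.

```python
# Sketch: fetch search results for "<place type> in <area>" and keep the
# page titles and URLs for a later address/name extraction stage.
import requests

API_KEY = "YOUR_API_KEY"        # placeholder
CX = "YOUR_SEARCH_ENGINE_ID"    # placeholder

def search_places(place_type: str, area: str) -> list:
    """Query the Custom Search JSON API and return title/link pairs."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX, "q": f"{place_type} in {area}"},
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    # Keep only what the extraction stage needs: page title and URL.
    return [{"title": i["title"], "link": i["link"]} for i in items]

print(search_places("restaurants", "Hyderabad"))
```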


2020 ◽  
Vol 41 (S1) ◽  
pp. s101-s101
Author(s):  
Nana Li ◽  
Gondy Leroy ◽  
Fariba Donovan ◽  
John Galgiani ◽  
Katherine Ellingson

Background: Twitter is used by officials to distribute public health messages and by the public to post information about ongoing afflictions. Because tweets originate from geographically and socially diverse sources, scholars have used this social media data to analyze the spread of diseases such as flu [Alessio Signorini 2011], asthma [Philip Harber 2019], and mental health disorders [Chandler McClellan 2017]. To our knowledge, no Twitter analysis has been performed for Valley fever. Valley fever is a fungal infection caused by the Coccidioides organism, found mostly in Arizona and California. Objective: We analyzed tweets concerning Valley fever to evaluate content, location, and timing. Methods: We collected tweets using the Twitter search application programming interface using the terms “Valley fever,” “valleyfever,” “cocci,” or “Valleyfever” from August 6 to 16, 2019, and again from October 20 to 29, 2019. In total, 2,117 tweets were retrieved. Tweets not focused on Valley fever were filtered out, including a tweet about “Rift valley fever” and tweets where “valley” and “fever” were separate and not one phrase. We excluded tweets not written in English. In total, 1,533 tweets remained; we grouped them into 3 categories: original tweets, hereafter labeled “normal” (N = 497), retweets (N = 811), and replies (N = 225). We converted all terms to lowercase, removed white space and punctuation, and tokenized the tweets. Informal messaging conventions (eg, hashtag, @user, RT, links) and stop words were removed, and terms were lemmatized. Finally, we analyzed the frequency of tweets by season, state, and co-occurring terms. Results: Tweet frequency was 228.5 per week in summer and 113.4 per week in the fall. Users tweeted from 40 different states; the most common were California (N = 401; 10.1 per 100,000 population), Arizona (N = 216; 30.1 per 100,000 population), New York (N = 49), Florida (N = 21), and Washington, DC (N = 14). Term frequency analysis showed that for normal tweets, the 5 most frequent terms were “awareness,” “Arizona,” “disease,” “California,” and “people.” For retweets, the most common terms were “Gunner” (a dog name), “vet,” “prayer,” “cough,” and “family.” For replies, they were “dog,” “lung,” “vet,” “day,” and “result.” Several symptoms were mentioned: “cough” (normal: 8, retweets: 104, replies: 7), “sick” (normal: 21, retweets: 42, replies: 7), “rash” (normal: 2, retweets: 6, replies: 1), and “headache” (normal: 1, retweets: 3, replies: 0). Conclusions: Valley fever tweets are potentially sufficient to track disease intensity, especially in Arizona and California. Data collection over longer intervals is needed to understand the utility of Twitter in this context. Disclosures: None. Funding: None.
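The preprocessing pipeline in Methods (lowercasing, stripping messaging conventions and punctuation, tokenizing, removing stop words, lemmatizing) can be sketched with NLTK as follows; the example tweet is invented, and simple whitespace splitting stands in for the study's unspecified tokenizer.

```python
# Sketch: tweet preprocessing pipeline per the Methods description.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(tweet: str) -> list:
    """Lowercase, strip conventions and punctuation, tokenize,
    remove stop words, and lemmatize."""
    text = tweet.lower()
    # Drop RT markers, @user mentions, hashtags, and links.
    text = re.sub(r"\brt\b|@\w+|#\w+|https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation and digits
    tokens = text.split()
    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stops]

print(preprocess("RT @user: Valley fever awareness in Arizona! https://t.co/x"))
# -> ['valley', 'fever', 'awareness', 'arizona']
```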

