A new distributed data analysis framework for better scientific collaborations

Author(s):  
Philipp S. Sommer ◽  
Viktoria Wichert ◽  
Daniel Eggert ◽  
Tilman Dinter ◽  
Klaus Getzlaff ◽  
...  

A common challenge for projects involving multiple research institutes is a well-defined and productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup, where project partners do not necessarily work at the same institute and do not have access to each other's resources.

We present the prototype of an application programming interface (API), developed in Python, that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a convenient interface, and to share and invoke single computational methods or even entire analytical workflows along with their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Researchers from other institutes may then apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. In the end, the overhead for both the backend developer and the remote user is very low: the effort of implementing the necessary workflow and API usage is comparable to writing code in a non-distributed setup. Moreover, data do not have to be downloaded locally; the analysis can be executed "close to the data", using the institutional infrastructure where the eligible data set is stored.

With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.

This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the research centres of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Helmholtz Association of German Research Centres, HGF).
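As a minimal sketch of how such an RPC proxy might look from the remote user's side, the class, endpoint, and method names below are illustrative only and do not reflect the actual framework API; the point is that an ordinary Python method call is turned into a JSON request handled by the partner institute's backend.

```python
import json
import urllib.request


class RemoteBackend:
    """Illustrative RPC proxy: attribute access yields functions whose calls
    become JSON requests to a remote backend process."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def __getattr__(self, method_name):
        def call(**kwargs):
            payload = json.dumps({"method": method_name, "params": kwargs}).encode()
            req = urllib.request.Request(
                f"{self.base_url}/invoke",
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        return call


# Usage (placeholder URL and method): looks like a local call, runs on the remote server.
# backend = RemoteBackend("https://data.example-institute.org/api")
# result = backend.compute_climatology(dataset="sst", start="1990", end="2020")
```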

Author(s):  
Anja Bechmann ◽  
Peter Bjerregaard Vahlstrup

The aim of this article is to discuss methodological implications and challenges in different kinds of deep and big data studies of Facebook and Instagram that rely on Application Programming Interface (API) data. The article describes and discusses Digital Footprints (www.digitalfootprints.dk), a data extraction and analytics software package that allows researchers to extract user data from Facebook and Instagram data sources: public streams as well as private data with user consent. Based on insights from the software design process and data-driven studies, the article argues for three main challenges: data quality, data access and analysis, and legal and ethical considerations.
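The abstract does not detail how the extraction is implemented, but API-based collection of this kind typically follows cursor pagination. The sketch below is a hedged, generic illustration in that style; the endpoint, field names, and API version are assumptions, not the Digital Footprints implementation.

```python
import requests


def fetch_page_posts(page_id, access_token, api_version="v12.0"):
    """Illustrative cursor-paginated extraction of posts from a Graph-API-style endpoint."""
    url = f"https://graph.facebook.com/{api_version}/{page_id}/posts"
    params = {"access_token": access_token, "limit": 100}
    while url:
        data = requests.get(url, params=params, timeout=30).json()
        for post in data.get("data", []):
            yield post
        # The response's 'paging.next' URL already carries all query parameters.
        url = data.get("paging", {}).get("next")
        params = None
```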


2021 ◽  
Vol 2094 (3) ◽  
pp. 032045
Author(s):  
A Y Unger

Abstract A new design pattern intended for distributed cloud-based information systems is proposed. The pattern is based on the traditional client-server architecture. The server side is divided into three principal components: data storage, application server, and cache server. Each component can be used to deploy parts of several independent information systems, thus realizing a shared-resource approach. A strategy for separating competencies between the client and the server is proposed. The strategy assumes that the client side is responsible for application logic and the server side is responsible for data storage consistency and data access control. Data protection is ensured by two complementary approaches: at the entity level and at the transaction level. The application programming interface for data access is presented at the level of identified transaction descriptors.
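As a hedged illustration of what a client-side API "at the level of identified transaction descriptors" could look like, the sketch below keeps application logic on the client while the server hands out opaque descriptors and enforces consistency and access control. All class, route, and field names are assumptions made for this example.

```python
import requests


class TransactionClient:
    """Illustrative client: every data operation is addressed by a transaction
    descriptor issued by the server, which enforces consistency and access control."""

    def __init__(self, api_url, token):
        self.api_url = api_url.rstrip("/")
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def begin(self, entity):
        # The server returns an opaque descriptor identifying the transaction.
        resp = self.session.post(f"{self.api_url}/transactions", json={"entity": entity})
        resp.raise_for_status()
        return resp.json()["descriptor"]

    def read(self, descriptor, key):
        resp = self.session.get(f"{self.api_url}/transactions/{descriptor}/{key}")
        resp.raise_for_status()
        return resp.json()

    def commit(self, descriptor):
        self.session.post(f"{self.api_url}/transactions/{descriptor}/commit").raise_for_status()
```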


The analysis of structured and consistent data has seen remarkable success in past decades, whereas the analysis of unstructured data in multimedia formats remains a challenging task. YouTube is one of the most popular and widely used social media tools. It reveals community feedback through comments on published videos, numbers of likes and dislikes, and numbers of subscribers for a particular channel. The main objective of this work is to demonstrate, using Hadoop concepts, how data generated from YouTube can be mined and utilized to make targeted, real-time, and informed decisions. In this paper, we analyze the data to identify the top categories in which the most videos are uploaded. This YouTube data is publicly available, and the data set is described below under the heading Data Set Description. The data set is fetched from Google via the YouTube API (Application Programming Interface) and stored in the Hadoop Distributed File System (HDFS). Using MapReduce, we analyze the data set to identify the video categories in which the most videos are uploaded. The objective of this paper is to demonstrate Apache Hadoop framework concepts and how to make targeted, real-time, and informed decisions using data gathered from YouTube.
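A minimal sketch of the category count as a Hadoop Streaming job is shown below. It assumes tab-separated records in HDFS with the video category in the fourth column; the column position and file layout are assumptions for illustration, not the paper's actual data set description.

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper: emit (category, 1) for each video record.
# Assumes tab-separated input with the category in the fourth column (illustrative).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 3:
        print(f"{fields[3]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming reducer: sum the counts per category
# (input arrives sorted by key, as Hadoop Streaming guarantees).
import sys

current, count = None, 0
for line in sys.stdin:
    category, value = line.rstrip("\n").split("\t")
    if category != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = category, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")
```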


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Kenneth D. Mandl ◽  
Daniel Gottlieb ◽  
Joshua C. Mandel ◽  
Vladimir Ignatov ◽  
Raheel Sayeed ◽  
...  

Abstract The 21st Century Cures Act requires that certified health information technology have an application programming interface (API) giving access to all data elements of a patient's electronic health record, "without special effort". In the spring of 2020, the Office of the National Coordinator for Health Information Technology (ONC) published a rule (21st Century Cures Act: Interoperability, Information Blocking, and the ONC Health IT Certification Program) regulating the API requirement along with protections against information blocking. The rule specifies the SMART/HL7 FHIR Bulk Data Access API, which enables access to patient-level data across a patient population, supporting myriad use cases across healthcare, research, and public health ecosystems. The API enables "push-button population health" in that core data elements can be readily extracted from electronic health records in a standardized way, enabling local, regional, and national-scale data-driven innovation.
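A minimal sketch of the kick-off/poll/download flow defined by the FHIR Bulk Data Access specification follows. The server URL is a placeholder, SMART Backend Services authentication and error handling are omitted, and details may vary between server implementations.

```python
import time
import requests

FHIR_BASE = "https://fhir.example.org"  # placeholder server, no authentication shown

# 1. Kick off an asynchronous export of data for all patients.
kickoff = requests.get(
    f"{FHIR_BASE}/Patient/$export",
    headers={"Accept": "application/fhir+json", "Prefer": "respond-async"},
)
status_url = kickoff.headers["Content-Location"]

# 2. Poll the status endpoint until the server reports completion (HTTP 200).
while True:
    status = requests.get(status_url, headers={"Accept": "application/json"})
    if status.status_code == 200:
        manifest = status.json()
        break
    status.raise_for_status()
    time.sleep(int(status.headers.get("Retry-After", 30)))

# 3. Download each NDJSON output file listed in the completion manifest.
for item in manifest["output"]:
    ndjson = requests.get(item["url"], headers={"Accept": "application/fhir+ndjson"})
    print(item["type"], len(ndjson.text.splitlines()), "resources")
```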


2021 ◽  
Vol 27 (2) ◽  
pp. 1-14
Author(s):  
Juho-Pekka Virtanen ◽  
Arttu Julin ◽  
Kaisa Jaalama ◽  
Hannu Hyyppä

Three-dimensional city models are an increasingly common data set maintained by many cities globally. At the same time, the focus of research has shifted from their production to their utilization in application development. We present the implementation of a demonstrator application combining the online visualization of a 3D city information model with data from an application programming interface. With this, we aim to demonstrate the combined use of city APIs and 3D geospatial assets, promote their use in application development, and show the performance of existing, openly available tools for 3D city model application development.
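The abstract does not specify how the API data and the city model are joined, but a common pattern is to attach API attributes to city objects by a shared identifier. The sketch below is a hedged illustration using a CityJSON file; the API URL, the matching key, and the attribute name are hypothetical.

```python
import json
import requests

# Load a CityJSON city model and attach attributes fetched from a city API.
# The API endpoint, the "building_id" key, and the attribute name are placeholders.
with open("city_model.json") as f:
    city = json.load(f)

sensor_data = requests.get("https://api.example-city.fi/air-quality").json()
readings = {entry["building_id"]: entry["aqi"] for entry in sensor_data}

for obj_id, obj in city["CityObjects"].items():
    if obj_id in readings:
        obj.setdefault("attributes", {})["air_quality_index"] = readings[obj_id]

with open("city_model_enriched.json", "w") as f:
    json.dump(city, f)
```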


2020 ◽  
Vol 17 ◽  
pp. 326-331
Author(s):  
Kamil Siebyła ◽  
Maria Skublewska-Paszkowska

There are various methods for creating web applications, and each offers a different level of performance. This factor is measurable at every layer of the application. The performance of the frontend layer depends on the response time of each endpoint of the API (Application Programming Interface) used. How data access is programmed at a specific endpoint therefore determines the performance of the entire application. There are many programming approaches, which are often time-consuming to implement. This article presents a comparison of the available methods of handling the persistence layer with respect to the efficiency of their implementation.
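Such comparisons are typically made by timing repeated requests against endpoints that differ only in their persistence-layer implementation. The sketch below is a hedged illustration of that measurement; the host and endpoint paths are placeholders, not the setup used in the article.

```python
import statistics
import time
import requests


def measure(url, repetitions=50):
    """Return mean and standard deviation of the response time (seconds) for one endpoint."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        requests.get(url, timeout=10).raise_for_status()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


# Compare two persistence-layer variants exposed as separate endpoints (placeholder paths).
for endpoint in ("/api/orm/items", "/api/raw-sql/items"):
    mean, stdev = measure("http://localhost:8080" + endpoint)
    print(f"{endpoint}: {mean * 1000:.1f} ms +/- {stdev * 1000:.1f} ms")
```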


Author(s):  
Gregory Cu ◽  
Jose Mari R. Cipriano ◽  
Michael Joseph Gonzales ◽  
Kevin Martin Tanalgo ◽  
Christian Kay B. Magdaong ◽  
...  

2020 ◽  
Vol 11 (01) ◽  
pp. 059-069 ◽  
Author(s):  
Prashila Dullabh ◽  
Lauren Hovey ◽  
Krysta Heaney-Huls ◽  
Nithya Rajendran ◽  
Adam Wright ◽  
...  

Abstract Objective Interest in application programming interfaces (APIs) is increasing as key stakeholders look for technical solutions to interoperability challenges. We explored three thematic areas to assess the current state of API use for data access and exchange in health care: (1) API use cases and standards; (2) challenges and facilitators for read and write capabilities; and (3) outlook for development of write capabilities. Methods We employed four methods: (1) literature review; (2) expert interviews with 13 API stakeholders; (3) review of electronic health record (EHR) app galleries; and (4) a technical expert panel. We used an eight-dimension sociotechnical model to organize our findings. Results The API ecosystem is complicated and cuts across five of the eight sociotechnical model dimensions: (1) app marketplaces support a range of use cases, the majority of which target providers' needs, with far fewer supporting patient access to data; (2) current focus on read APIs with limited use of write APIs; (3) where standards are used, they are largely Fast Healthcare Interoperability Resources (FHIR); (4) FHIR-based APIs support exchange of electronic health information within the common clinical data set; and (5) validating external data and data sources for clinical decision making creates challenges to provider workflows. Conclusion While the use of APIs in health care is increasing rapidly, it is still in the pilot stages. We identified five key issues with implications for the continued advancement of API use: (1) a robust normative FHIR standard; (2) expansion of the common clinical data set to other data elements; (3) enhanced support for write implementation; (4) data provenance rules; and (5) data governance rules. Thus, while APIs are being touted as a solution to interoperability challenges, they remain an emerging technology that is only one piece of a multipronged approach to data access and use.
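To make the read/write distinction concrete, the sketch below shows a FHIR read (fetching a Patient resource) next to a FHIR write (posting a new Observation). The server URL and resource IDs are placeholders and authorization is omitted; this is an illustration of the pattern, not any particular EHR's API.

```python
import requests

FHIR_BASE = "https://ehr.example.org/fhir"  # placeholder EHR endpoint, no auth shown

# Read API: fetch an existing patient resource.
patient = requests.get(
    f"{FHIR_BASE}/Patient/example",
    headers={"Accept": "application/fhir+json"},
).json()

# Write API: push a new observation back into the record.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4",
                         "display": "Heart rate"}]},
    "subject": {"reference": f"Patient/{patient['id']}"},
    "valueQuantity": {"value": 72, "unit": "beats/minute"},
}
resp = requests.post(
    f"{FHIR_BASE}/Observation",
    json=observation,
    headers={"Content-Type": "application/fhir+json"},
)
print("write status:", resp.status_code)
```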


2011 ◽  
Vol 14 (1) ◽  
pp. 1-12
Author(s):  
Norman L. Jones ◽  
Robert M. Wallace ◽  
Russell Jones ◽  
Cary Butler ◽  
Alan Zundel

This paper describes an Application Programming Interface (API) for managing multi-dimensional data produced for water resource computational modeling that is being developed by the US Army Engineer Research and Development Center (ERDC), in conjunction with Brigham Young University. This API, along with a corresponding data standard, is being implemented within ERDC computational models to facilitate rapid data access, enhanced data compression and data sharing, and cross-platform independence. The API and data standard are known as the eXtensible Model Data Format (XMDF), and version 1.3 is available for free download. This API is designed to manage geometric data associated with grids, meshes, riverine and coastal cross sections, and both static and transient array-based datasets. The inclusion of coordinate system data makes it possible to share data between models developed in different coordinate systems. XMDF is used to store the data-intensive components of a modeling study in a compressed binary format that is platform-independent. It also provides a standardized file format that enhances model linking and data sharing between models.
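As a hedged analogy rather than an example of the XMDF library itself, the sketch below writes a transient, array-based dataset with units and coordinate-system metadata to a compressed, platform-independent HDF5 file, the container format on which XMDF is built; the group layout, attribute names, and projection string are illustrative assumptions.

```python
import numpy as np
import h5py

# Illustrative analogy only: store a transient, array-based dataset with coordinate
# metadata in a compressed, platform-independent HDF5 file (not the XMDF layout itself).
times = np.arange(0.0, 10.0, 0.5)                 # output times in hours
water_levels = np.random.rand(times.size, 5000)   # one value per mesh node per time step

with h5py.File("simulation_results.h5", "w") as f:
    grp = f.create_group("Datasets/WaterLevel")
    grp.create_dataset("Times", data=times)
    grp.create_dataset("Values", data=water_levels,
                       compression="gzip", compression_opts=4)
    grp.attrs["units"] = "m"
    grp.attrs["coordinate_system"] = "UTM zone 15N"  # placeholder projection
```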

