A new distributed data analysis framework for better scientific collaborations

Author(s):  
Philipp S. Sommer ◽  
Viktoria Wichert ◽  
Daniel Eggert ◽  
Tilman Dinter ◽  
Klaus Getzlaff ◽  
...  

A common challenge for projects involving multiple research institutes is a well-defined and productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup, where project partners do not necessarily work at the same institute and do not have access to each other's resources.

We present the prototype of an application programming interface (API), developed in Python, that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a convenient interface, and to share and invoke single computational methods or even entire analytical workflows along with their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Researchers from other institutes may then apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. In the end, the overhead for both the backend developer and the remote user is very low: the effort of implementing the necessary workflow and API usage is comparable to writing code in a non-distributed setup. Moreover, data do not have to be downloaded locally; the analysis can be executed "close to the data", using the institutional infrastructure where the eligible data set is stored.

With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.

This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the research centres of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Helmholtz Association of German Research Centres, HGF).
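As a minimal sketch of how such an RPC proxy might look from the remote user's side, the class, endpoint, and method names below are illustrative only and do not reflect the actual framework API; the point is that an ordinary Python method call is turned into a JSON request handled by the partner institute's backend.

```python
import json
import urllib.request


class RemoteBackend:
    """Illustrative RPC proxy: attribute access yields functions whose calls
    become JSON requests to a remote backend process."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def __getattr__(self, method_name):
        def call(**kwargs):
            payload = json.dumps({"method": method_name, "params": kwargs}).encode()
            req = urllib.request.Request(
                f"{self.base_url}/invoke",
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        return call


# Usage (placeholder URL and method): looks like a local call, runs on the remote server.
# backend = RemoteBackend("https://data.example-institute.org/api")
# result = backend.compute_climatology(dataset="sst", start="1990", end="2020")
```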

Author(s):  
Anja Bechmann ◽  
Peter Bjerregaard Vahlstrup

The aim of this article is to discuss methodological implications and challenges in different kinds of deep and big data studies of Facebook and Instagram that rely on Application Programming Interface (API) data. The article describes and discusses Digital Footprints (www.digitalfootprints.dk), a data extraction and analytics software package that allows researchers to extract user data from Facebook and Instagram data sources: public streams as well as private data with user consent. Based on insights from the software design process and data-driven studies, the article argues for three main challenges: data quality, data access and analysis, and legal and ethical considerations.
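The abstract does not detail how the extraction is implemented, but API-based collection of this kind typically follows cursor pagination. The sketch below is a hedged, generic illustration in that style; the endpoint, field names, and API version are assumptions, not the Digital Footprints implementation.

```python
import requests


def fetch_page_posts(page_id, access_token, api_version="v12.0"):
    """Illustrative cursor-paginated extraction of posts from a Graph-API-style endpoint."""
    url = f"https://graph.facebook.com/{api_version}/{page_id}/posts"
    params = {"access_token": access_token, "limit": 100}
    while url:
        data = requests.get(url, params=params, timeout=30).json()
        for post in data.get("data", []):
            yield post
        # The response's 'paging.next' URL already carries all query parameters.
        url = data.get("paging", {}).get("next")
        params = None
```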


2021 ◽  
Vol 2094 (3) ◽  
pp. 032045
Author(s):  
A Y Unger

Abstract A new design pattern intended for distributed cloud-based information systems is proposed. The pattern is based on the traditional client-server architecture. The server side is divided into three principal components: data storage, application server, and cache server. Each component can be used to deploy parts of several independent information systems, thus realizing a shared-resource approach. A strategy for separating competencies between the client and the server is proposed. The strategy assumes that the client side is responsible for application logic and the server side is responsible for data storage consistency and data access control. Data protection is ensured by two complementary approaches: at the entity level and at the transaction level. The application programming interface for data access is presented at the level of identified transaction descriptors.
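As a hedged illustration of what a client-side API "at the level of identified transaction descriptors" could look like, the sketch below keeps application logic on the client while the server hands out opaque descriptors and enforces consistency and access control. All class, route, and field names are assumptions made for this example.

```python
import requests


class TransactionClient:
    """Illustrative client: every data operation is addressed by a transaction
    descriptor issued by the server, which enforces consistency and access control."""

    def __init__(self, api_url, token):
        self.api_url = api_url.rstrip("/")
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def begin(self, entity):
        # The server returns an opaque descriptor identifying the transaction.
        resp = self.session.post(f"{self.api_url}/transactions", json={"entity": entity})
        resp.raise_for_status()
        return resp.json()["descriptor"]

    def read(self, descriptor, key):
        resp = self.session.get(f"{self.api_url}/transactions/{descriptor}/{key}")
        resp.raise_for_status()
        return resp.json()

    def commit(self, descriptor):
        self.session.post(f"{self.api_url}/transactions/{descriptor}/commit").raise_for_status()
```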


The analysis of structured and consistent data has seen remarkable success in past decades, whereas the analysis of unstructured data in multimedia formats remains a challenging task. YouTube is one of the most popular and widely used social media tools. It reveals community feedback through comments on published videos, numbers of likes and dislikes, and numbers of subscribers for a particular channel. The main objective of this work is to demonstrate, using Hadoop concepts, how data generated from YouTube can be mined and utilized to make targeted, real-time, and informed decisions. In this paper, we analyze the data to identify the top categories in which the most videos are uploaded. This YouTube data is publicly available, and the data set is described below under the heading Data Set Description. The data set is fetched from Google via the YouTube API (Application Programming Interface) and stored in the Hadoop Distributed File System (HDFS). Using MapReduce, we analyze the data set to identify the video categories in which the most videos are uploaded. The objective of this paper is to demonstrate Apache Hadoop framework concepts and how to make targeted, real-time, and informed decisions using data gathered from YouTube.
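A minimal sketch of the category count as a Hadoop Streaming job is shown below. It assumes tab-separated records in HDFS with the video category in the fourth column; the column position and file layout are assumptions for illustration, not the paper's actual data set description.

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper: emit (category, 1) for each video record.
# Assumes tab-separated input with the category in the fourth column (illustrative).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 3:
        print(f"{fields[3]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming reducer: sum the counts per category
# (input arrives sorted by key, as Hadoop Streaming guarantees).
import sys

current, count = None, 0
for line in sys.stdin:
    category, value = line.rstrip("\n").split("\t")
    if category != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = category, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")
```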


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Kenneth D. Mandl ◽  
Daniel Gottlieb ◽  
Joshua C. Mandel ◽  
Vladimir Ignatov ◽  
Raheel Sayeed ◽  
...  

Abstract The 21st Century Cures Act requires that certified health information technology have an application programming interface (API) giving access to all data elements of a patient's electronic health record, "without special effort". In the spring of 2020, the Office of the National Coordinator for Health Information Technology (ONC) published a rule (21st Century Cures Act: Interoperability, Information Blocking, and the ONC Health IT Certification Program) regulating the API requirement along with protections against information blocking. The rule specifies the SMART/HL7 FHIR Bulk Data Access API, which enables access to patient-level data across a patient population, supporting myriad use cases across healthcare, research, and public health ecosystems. The API enables "push-button population health" in that core data elements can be readily extracted from electronic health records in a standardized way, enabling local, regional, and national-scale data-driven innovation.
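A minimal sketch of the kick-off/poll/download flow defined by the FHIR Bulk Data Access specification follows. The server URL is a placeholder, SMART Backend Services authentication and error handling are omitted, and details may vary between server implementations.

```python
import time
import requests

FHIR_BASE = "https://fhir.example.org"  # placeholder server, no authentication shown

# 1. Kick off an asynchronous export of data for all patients.
kickoff = requests.get(
    f"{FHIR_BASE}/Patient/$export",
    headers={"Accept": "application/fhir+json", "Prefer": "respond-async"},
)
status_url = kickoff.headers["Content-Location"]

# 2. Poll the status endpoint until the server reports completion (HTTP 200).
while True:
    status = requests.get(status_url, headers={"Accept": "application/json"})
    if status.status_code == 200:
        manifest = status.json()
        break
    status.raise_for_status()
    time.sleep(int(status.headers.get("Retry-After", 30)))

# 3. Download each NDJSON output file listed in the completion manifest.
for item in manifest["output"]:
    ndjson = requests.get(item["url"], headers={"Accept": "application/fhir+ndjson"})
    print(item["type"], len(ndjson.text.splitlines()), "resources")
```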


2021 ◽  
Vol 27 (2) ◽  
pp. 1-14
Author(s):  
Juho-Pekka Virtanen ◽  
Arttu Julin ◽  
Kaisa Jaalama ◽  
Hannu Hyyppä

Three-dimensional city models are an increasingly common data set maintained by many cities globally. At the same time, the focus of research has shifted from their production to their utilization in application development. We present the implementation of a demonstrator application combining the online visualization of a 3D city information model with data from an application programming interface. With this, we aim to demonstrate the combined use of city APIs and 3D geospatial assets, promote their use in application development, and show the performance of existing, openly available tools for 3D city model application development.
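The abstract does not specify how the API data and the city model are joined, but a common pattern is to attach API attributes to city objects by a shared identifier. The sketch below is a hedged illustration using a CityJSON file; the API URL, the matching key, and the attribute name are hypothetical.

```python
import json
import requests

# Load a CityJSON city model and attach attributes fetched from a city API.
# The API endpoint, the "building_id" key, and the attribute name are placeholders.
with open("city_model.json") as f:
    city = json.load(f)

sensor_data = requests.get("https://api.example-city.fi/air-quality").json()
readings = {entry["building_id"]: entry["aqi"] for entry in sensor_data}

for obj_id, obj in city["CityObjects"].items():
    if obj_id in readings:
        obj.setdefault("attributes", {})["air_quality_index"] = readings[obj_id]

with open("city_model_enriched.json", "w") as f:
    json.dump(city, f)
```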


2020 ◽  
Vol 17 ◽  
pp. 326-331
Author(s):  
Kamil Siebyła ◽  
Maria Skublewska-Paszkowska

There are various methods for creating web applications, and each offers a different level of performance. This factor is measurable at every layer of the application. The performance of the frontend layer depends on the response time of each endpoint of the API (Application Programming Interface) used. How data access is programmed at a specific endpoint therefore determines the performance of the entire application. There are many programming approaches, which are often time-consuming to implement. This article presents a comparison of the available methods of handling the persistence layer with respect to the efficiency of their implementation.
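Such comparisons are typically made by timing repeated requests against endpoints that differ only in their persistence-layer implementation. The sketch below is a hedged illustration of that measurement; the host and endpoint paths are placeholders, not the setup used in the article.

```python
import statistics
import time
import requests


def measure(url, repetitions=50):
    """Return mean and standard deviation of the response time (seconds) for one endpoint."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        requests.get(url, timeout=10).raise_for_status()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


# Compare two persistence-layer variants exposed as separate endpoints (placeholder paths).
for endpoint in ("/api/orm/items", "/api/raw-sql/items"):
    mean, stdev = measure("http://localhost:8080" + endpoint)
    print(f"{endpoint}: {mean * 1000:.1f} ms +/- {stdev * 1000:.1f} ms")
```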


Author(s):  
Gregory Cu ◽  
Jose Mari R. Cipriano ◽  
Michael Joseph Gonzales ◽  
Kevin Martin Tanalgo ◽  
Christian Kay B. Magdaong ◽  
...  

2020 ◽  
Vol 11 (01) ◽  
pp. 059-069 ◽  
Author(s):  
Prashila Dullabh ◽  
Lauren Hovey ◽  
Krysta Heaney-Huls ◽  
Nithya Rajendran ◽  
Adam Wright ◽  
...  

Abstract Objective Interest in application programming interfaces (APIs) is increasing as key stakeholders look for technical solutions to interoperability challenges. We explored three thematic areas to assess the current state of API use for data access and exchange in health care: (1) API use cases and standards; (2) challenges and facilitators for read and write capabilities; and (3) outlook for development of write capabilities. Methods We employed four methods: (1) literature review; (2) expert interviews with 13 API stakeholders; (3) review of electronic health record (EHR) app galleries; and (4) a technical expert panel. We used an eight-dimension sociotechnical model to organize our findings. Results The API ecosystem is complicated and cuts across five of the eight sociotechnical model dimensions: (1) app marketplaces support a range of use cases, the majority of which target providers' needs, with far fewer supporting patient access to data; (2) current focus on read APIs with limited use of write APIs; (3) where standards are used, they are largely Fast Healthcare Interoperability Resources (FHIR); (4) FHIR-based APIs support exchange of electronic health information within the common clinical data set; and (5) validating external data and data sources for clinical decision making creates challenges to provider workflows. Conclusion While the use of APIs in health care is increasing rapidly, it is still in the pilot stages. We identified five key issues with implications for the continued advancement of API use: (1) a robust normative FHIR standard; (2) expansion of the common clinical data set to other data elements; (3) enhanced support for write implementation; (4) data provenance rules; and (5) data governance rules. Thus, while APIs are being touted as a solution to interoperability challenges, they remain an emerging technology that is only one piece of a multipronged approach to data access and use.
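To make the read/write distinction concrete, the sketch below shows a FHIR read (fetching a Patient resource) next to a FHIR write (posting a new Observation). The server URL and resource IDs are placeholders and authorization is omitted; this is an illustration of the pattern, not any particular EHR's API.

```python
import requests

FHIR_BASE = "https://ehr.example.org/fhir"  # placeholder EHR endpoint, no auth shown

# Read API: fetch an existing patient resource.
patient = requests.get(
    f"{FHIR_BASE}/Patient/example",
    headers={"Accept": "application/fhir+json"},
).json()

# Write API: push a new observation back into the record.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4",
                         "display": "Heart rate"}]},
    "subject": {"reference": f"Patient/{patient['id']}"},
    "valueQuantity": {"value": 72, "unit": "beats/minute"},
}
resp = requests.post(
    f"{FHIR_BASE}/Observation",
    json=observation,
    headers={"Content-Type": "application/fhir+json"},
)
print("write status:", resp.status_code)
```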


2011 ◽  
Vol 14 (1) ◽  
pp. 1-12
Author(s):  
Norman L. Jones ◽  
Robert M. Wallace ◽  
Russell Jones ◽  
Cary Butler ◽  
Alan Zundel

This paper describes an Application Programming Interface (API) for managing multi-dimensional data produced for water resource computational modeling that is being developed by the US Army Engineer Research and Development Center (ERDC), in conjunction with Brigham Young University. This API, along with a corresponding data standard, is being implemented within ERDC computational models to facilitate rapid data access, enhanced data compression and data sharing, and cross-platform independence. The API and data standard are known as the eXtensible Model Data Format (XMDF), and version 1.3 is available for free download. This API is designed to manage geometric data associated with grids, meshes, riverine and coastal cross sections, and both static and transient array-based datasets. The inclusion of coordinate system data makes it possible to share data between models developed in different coordinate systems. XMDF is used to store the data-intensive components of a modeling study in a compressed binary format that is platform-independent. It also provides a standardized file format that enhances model linking and data sharing between models.
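As a hedged analogy rather than an example of the XMDF library itself, the sketch below writes a transient, array-based dataset with units and coordinate-system metadata to a compressed, platform-independent HDF5 file, the container format on which XMDF is built; the group layout, attribute names, and projection string are illustrative assumptions.

```python
import numpy as np
import h5py

# Illustrative analogy only: store a transient, array-based dataset with coordinate
# metadata in a compressed, platform-independent HDF5 file (not the XMDF layout itself).
times = np.arange(0.0, 10.0, 0.5)                 # output times in hours
water_levels = np.random.rand(times.size, 5000)   # one value per mesh node per time step

with h5py.File("simulation_results.h5", "w") as f:
    grp = f.create_group("Datasets/WaterLevel")
    grp.create_dataset("Times", data=times)
    grp.create_dataset("Values", data=water_levels,
                       compression="gzip", compression_opts=4)
    grp.attrs["units"] = "m"
    grp.attrs["coordinate_system"] = "UTM zone 15N"  # placeholder projection
```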

