ESCAPE Data Lake

2021 ◽  
Vol 251 ◽  
pp. 02056
Author(s):  
Riccardo Di Maria ◽  
Rizart Dona

The European-funded ESCAPE project (Horizon 2020) aims to address computing challenges in the context of the European Open Science Cloud. The project targets Particle Physics and Astronomy facilities and research infrastructures, focusing on the development of solutions to handle Exabyte-scale datasets. The science projects in ESCAPE are in different phases of evolution and present a variety of specific use cases and challenges to be addressed. This contribution describes the shared-ecosystem architecture of services, the Data Lake, which fulfils the needs of the ESCAPE community in terms of data organisation, management and access. The Pilot Data Lake consists of several storage services operated by the partner institutes and connected through reliable networks, and it adopts Rucio to orchestrate data management and organisation. The results of a 24-hour Full Dress Rehearsal are also presented, highlighting the achievements of the Data Lake model and of the ESCAPE sciences.

2020 ◽  
Vol 245 ◽  
pp. 04019
Author(s):  
Rosie Bolton ◽  
Simone Campana ◽  
Andrea Ceccanti ◽  
Xavier Espinal ◽  
Aristeidis Fkiaras ◽  
...  

The European-funded ESCAPE project will prototype a shared solution to computing challenges in the context of the European Open Science Cloud. It targets Astronomy and Particle Physics facilities and research infrastructures and focuses on developing solutions for handling Exabyte-scale datasets. The DIOS work package aims at delivering a Data Infrastructure for Open Science. Such an infrastructure would be a non-HEP-specific implementation of the data lake concept elaborated in the HSF Community White Paper and endorsed in the WLCG Strategy Document for HL-LHC. The science projects in ESCAPE are in different phases of evolution. While HL-LHC can leverage 15 years of distributed-computing experience in WLCG, other sciences are only now building their computing models. This contribution describes the architecture of a shared ecosystem of services fulfilling the needs of the ESCAPE community in terms of data organisation, management and access. The backbone of such a data lake will consist of several storage services operated by the partner institutes and connected through reliable networks. Data management and organisation will be orchestrated through Rucio. A layer of caching and latency-hiding services, supporting various access protocols, will serve the data to heterogeneous facilities, from conventional Grid sites to HPC centres and Cloud providers. The authentication and authorisation system will be based on tokens. For the success of the project, DIOS will integrate open-source solutions that have demonstrated reliability and scalability at the multi-petabyte scale. These services will be configured, deployed and complemented to cover the use cases of the ESCAPE sciences, which will be further developed during the project.
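The Rucio-style orchestration described above is declarative: a replication rule asks for N copies of a dataset on storage endpoints matching an expression, and the system picks endpoints to satisfy it. The following is a minimal, self-contained sketch of that idea in plain Python; it is not the actual Rucio client API, and the site names and QoS attributes are hypothetical.

```python
# Conceptual sketch of declarative replica placement in a data lake.
# NOT the Rucio API; endpoint names and attributes are illustrative only.

def place_replicas(rses, rule):
    """Pick storage endpoints matching the rule's expression until 'copies' is met."""
    matching = [name for name, attrs in rses.items()
                if attrs.get(rule["attribute"]) == rule["value"]]
    if len(matching) < rule["copies"]:
        raise ValueError("not enough matching storage endpoints")
    return sorted(matching)[:rule["copies"]]

# Hypothetical storage endpoints (RSEs) operated by partner institutes.
rses = {
    "SITE_A_DISK": {"qos": "disk"},
    "SITE_B_DISK": {"qos": "disk"},
    "SITE_C_TAPE": {"qos": "tape"},
}

# "Keep two copies on disk-class storage" expressed as a rule.
rule = {"copies": 2, "attribute": "qos", "value": "disk"}

replicas = place_replicas(rses, rule)
print(replicas)
```

In the real system the rule engine also handles transfers, deletions and re-evaluation when endpoints fail; the sketch only shows the declarative shape of a rule.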


2020 ◽  
Vol 41 (6/7) ◽  
pp. 383-399
Author(s):  
Elisha R.T. Chiware

Purpose – The paper presents a literature review on research data management services in African academic and research libraries against the backdrop of advancing open science and open research data infrastructures. It identifies areas of focus for libraries to support open research data.
Design/methodology/approach – The analysis of the literature and of the future role of African libraries in research data management services was based on three areas: open science, research infrastructures and open data infrastructures. Focussed literature searches were conducted across several electronic databases and discovery platforms, and a qualitative content analysis approach was used to explore the themes based on a coded list.
Findings – The review reports on an environment where open science in Africa is still at a developmental stage. Research infrastructures face funding and technical challenges. Data management services are in formative stages, with progress reported in a few countries where open science and research data management policies have emerged, cyber and data infrastructures are being developed and limited data librarianship courses are being taught.
Originality/value – The role of academic and research libraries in Africa remains important in higher education and the national systems of research and innovation. Libraries should continue to align with institutional and national trends in response to the provision of data management services and as partners in the development of research infrastructures.


2020 ◽  
Author(s):  
Massimo Cocco ◽  
Daniele Bailo ◽  
Keith G. Jeffery ◽  
Rossana Paciello ◽  
Valerio Vinciarelli ◽  
...  

<p>Interoperability has long been an objective for research infrastructures dealing with research data, fostering open access and open science. More recently, the FAIR principles (Findability, Accessibility, Interoperability and Reusability) have been proposed. The FAIR principles are now reference criteria for promoting and evaluating the openness of scientific data. FAIRness is considered a necessary target for research infrastructures in different scientific domains at European and global level.</p><p>Solid Earth RIs have long been committed to engaging the scientific communities involved in data collection, standardization and quality management, as well as to providing metadata and services for qualification, storage and accessibility. They are working to adopt the FAIR principles, thus addressing the onerous task of turning these principles into practices. To make the FAIR principles a reality in terms of service provision for data stewardship, some RI implementers in EPOS have proposed a FAIR-adoption process leveraging a four-stage roadmap that reorganizes the FAIR principles to better fit the mindset of scientists and RI implementers. The roadmap treats the FAIR principles as requirements in the software development life cycle and reorganizes them into data, metadata, access services and use services. Both the implementation and the assessment of the “FAIRness” level, by means of a questionnaire and metrics, are thereby made simpler and closer to scientists' day-to-day work.</p><p>FAIR data and service management is demanding: it requires resources and skills and, more importantly, sustainable IT resources. Turning the FAIR principles into viable and sustainable practices is therefore challenging for many Research Infrastructures and data providers. FAIR data management also includes implementing services to access data, as well as to visualize, process, analyse and model them to generate new scientific products and discoveries.</p><p>FAIR data management is challenging to Earth scientists because it depends on their perception of finding, accessing and using data and scientific products: in other words, the perception of data sharing. The sustainability of FAIR data and service management is not limited to financial sustainability and funding; it also includes legal, governance and technical issues that concern the scientific communities.</p><p>In this contribution, we present and discuss some of the main challenges that must be urgently tackled in order to run and operate FAIR data services in the long term, as also envisaged by the European Open Science Cloud initiative: a) sustainability of the IT solutions and resources that support FAIR data management practices (i.e., PID usage and preservation, including the costs of operating the associated IT services); b) re-usability, which on the one hand requires clear and tested methods to manage heterogeneous metadata and provenance, and on the other hand can be considered a frontier research field; c) FAIR service provision, which raises many open questions about applying the FAIR principles to services for data stewardship and to services that create data products from FAIR raw data, for which it is not clear how the FAIRness compliance of the data products can still be guaranteed.</p>
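The questionnaire-and-metrics style of FAIRness assessment mentioned above can be sketched as a checklist score over a metadata record. This is a toy illustration only; the indicator names, checks and the record fields are hypothetical, not the EPOS or any official FAIR metrics.

```python
# Toy FAIRness checklist: score a metadata record against simple indicators.
# Indicator names and checks are illustrative, not an official metric set.

INDICATORS = {
    "findable":      lambda rec: "pid" in rec,                    # persistent identifier
    "accessible":    lambda rec: rec.get("access_url") is not None,
    "interoperable": lambda rec: rec.get("format") in {"netCDF", "CSV", "JSON"},
    "reusable":      lambda rec: "license" in rec and "provenance" in rec,
}

def fair_score(record):
    """Evaluate each indicator and return (per-indicator results, overall fraction)."""
    passed = {name: check(record) for name, check in INDICATORS.items()}
    return passed, sum(passed.values()) / len(passed)

record = {
    "pid": "doi:10.xxxx/example",        # placeholder identifier
    "access_url": "https://data.example.org/ds1",
    "format": "netCDF",
    "license": "CC-BY-4.0",
    # no provenance recorded -> the 'reusable' indicator fails
}

passed, score = fair_score(record)
print(passed, score)
```

A real assessment weights indicators and distinguishes data, metadata and services, as the four-stage roadmap does; the point here is only that machine-checkable indicators make the assessment repeatable.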


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Dumitru Roman ◽  
Neal Reeves ◽  
Esteban Gonzalez ◽  
Irene Celino ◽  
Shady Abd El Kader ◽  
...  

Purpose – Citizen Science – public participation in scientific projects – is becoming a global practice engaging volunteer participants, often non-scientists, with scientific research. Citizen Science faces major challenges, such as quality and consistency, to realise the full potential of its outputs and outcomes, including data, software and results. In this context, the principles put forth by the Data Science and Open Science domains, where these challenges have been addressed at length, are essential for alleviating them. The purpose of this study is to explore the extent to which Citizen Science initiatives capitalise on Data Science and Open Science principles.
Design/methodology/approach – The authors analysed 48 Citizen Science projects related to pollution and its effects. They compared each project against a set of Data Science and Open Science indicators, exploring how each project defines, collects, analyses and exploits data to present results and contribute to knowledge.
Findings – The results indicate several shortcomings with respect to commonly accepted Data Science principles, including the lack of a clear definition of research problems and limited description of data management and analysis processes, and to Open Science principles, including the lack of the contextual information necessary for reusing project outcomes.
Originality/value – In the light of this analysis, the authors provide a set of guidelines and recommendations for better adoption of Data Science and Open Science principles in Citizen Science projects, and introduce a software tool to support this adoption, with a focus on the preparation of data management plans in Citizen Science projects.


2021 ◽  
Author(s):  
Renato Alves ◽  
Dimitrios Bampalikis ◽  
Leyla Jael Castro ◽  
José María Fernández ◽  
Jennifer Harrow ◽  
...  

Data Management Plans are now considered a key element of Open Science. They describe the data management life cycle for the data to be collected, processed and/or generated within the lifetime of a particular project or activity. A Software Management Plan (SMP) plays the same role, but for software. Beyond its management perspective, the main advantage of an SMP is that it both provides clear context to the software being developed and raises awareness. Although a few SMPs are already available, most of them require significant technical knowledge to be used effectively. ELIXIR has developed a low-barrier SMP, specifically tailored for life science researchers and aligned with the FAIR Research Software principles. Starting from the Four Recommendations for Open Source Software, the ELIXIR SMP was iteratively refined by surveying the practices of the community and incorporating the feedback received. Currently available as a survey, future plans for the ELIXIR SMP include a human- and machine-readable version that can be automatically queried and connected to relevant tools and metrics within the ELIXIR Tools ecosystem and beyond.
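To make the idea of a machine-readable SMP concrete, here is a hedged sketch: a plan encoded as JSON that a tool can query automatically for a few checkable criteria. The field names and the criteria are invented for illustration; they are not the ELIXIR SMP schema.

```python
import json

# Hypothetical machine-readable Software Management Plan (NOT the ELIXIR schema).
smp_json = """
{
  "software": "my-analysis-tool",
  "repository": "https://github.com/example/my-analysis-tool",
  "license": "MIT",
  "registry_entry": null,
  "tests": {"ci": true, "coverage_percent": 72}
}
"""

def check_smp(plan):
    """Return which machine-checkable criteria the plan satisfies."""
    return {
        "public_repository": bool(plan.get("repository")),
        "open_license": plan.get("license") in {"MIT", "Apache-2.0", "GPL-3.0"},
        "registered": plan.get("registry_entry") is not None,
        "has_ci": bool(plan.get("tests", {}).get("ci")),
    }

report = check_smp(json.loads(smp_json))
print(report)
```

A machine-readable plan of this kind is what would let SMP compliance be connected to tools and metrics automatically, rather than re-answered in a survey.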


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 292
Author(s):  
Michael Hewera ◽  
Daniel Hänggi ◽  
Björn Gerlach ◽  
Ulf Dietrich Kahlert

Reports of non-replicable research demand new methods of research data management. Electronic laboratory notebooks (ELNs) have been suggested as tools to improve the documentation of research data and make them universally accessible. In a self-guided approach, we introduced the open-source ELN eLabFTW into our lab group and, after using it for a while, found it a useful tool for overcoming the hurdles of ELN introduction: it offers a combination of properties that makes it suitable for small preclinical labs like ours. We set up our instance of eLabFTW without any further programming needed. Our effort to embrace an open data approach by introducing an ELN fits well with other institutionally organized ELN initiatives in academic research.
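Beyond the web interface, eLabFTW exposes a REST API that scripts can use to create entries programmatically. The sketch below only constructs the HTTP request (it never sends it): the host and API key are placeholders, and the endpoint path and authorization header follow our reading of the eLabFTW v2 API conventions, so check your instance's API documentation before relying on them.

```python
import json
import urllib.request

# Build (but do not send) a request to create an experiment entry in an ELN.
# BASE_URL and API_KEY are placeholders; the path follows eLabFTW v2 conventions.
BASE_URL = "https://eln.example.org/api/v2"
API_KEY = "your-api-key-here"

body = json.dumps({
    "title": "Cell viability assay, run 3",
    "tags": ["preclinical", "replicate"],
}).encode()

req = urllib.request.Request(
    url=f"{BASE_URL}/experiments",
    data=body,
    method="POST",
    headers={"Authorization": API_KEY, "Content-Type": "application/json"},
)

print(req.full_url, req.method)
```

Sending the request with `urllib.request.urlopen(req)` against a real instance would create the entry; scripting entries this way is one route from an ELN to institution-wide open data workflows.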


Author(s):  
Marie Timmermann

Open Science aims to enhance the quality of research by making research and its outputs openly available, reproducible and accessible. Science Europe, the association of major Research Funding Organisations and Research Performing Organisations, advocates data sharing as one of the core aspects of Open Science and promotes a more harmonised approach to data sharing policies. Good research data management is a prerequisite for Open Science and data management policies should be aligned as much as possible, while taking into account discipline-specific differences. Research data management is a broad and complex field with many actors involved. It needs collective efforts by all actors to work towards aligned policies that foster Open Science.


2014 ◽  
Vol 9 (2) ◽  
pp. 17-27 ◽  
Author(s):  
Ritu Arora ◽  
Maria Esteva ◽  
Jessica Trelogan

The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle, curators organize newly generated data while cleaning and integrating legacy data where it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers concurrently. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow that leverages open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time needed to accomplish critical data management tasks, and enable dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding how to use the open-source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.
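One routine task of the kind described, checking data authenticity across a large collection, parallelises naturally on HPC resources. The sketch below is a minimal standard-library version of a concurrent fixity audit; the two-file "collection" is invented for the demo, and a real HPC deployment would distribute this across nodes rather than threads.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return path, h.hexdigest()

def audit(paths, workers=4):
    """Checksum many files concurrently; returns {path: digest} for a manifest."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(sha256_of, paths))

# Demo on a throwaway two-file 'collection'; a real run would walk the archive.
with tempfile.TemporaryDirectory() as d:
    paths = []
    for name, data in [("a.txt", b"legacy data"), ("b.txt", b"new data")]:
        p = os.path.join(d, name)
        with open(p, "wb") as f:
            f.write(data)
        paths.append(p)
    manifest = audit(paths)

print(len(manifest))
```

Comparing a fresh manifest against a stored one is what turns this into an on-demand authenticity check during the project's lifecycle, rather than a one-off step at archiving time.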


2021 ◽  
Author(s):  
Nikolay Skvortsov

The principles known by the FAIR abbreviation have been applied to different kinds of data management technologies to support data reuse. In particular, they are important for investigation and development in research infrastructures, but they are applied in significantly different ways. These principles are recognized as promising since, according to them, data in the context of reuse should be readable and actionable by both humans and machines. The paper presents a review of solutions for data interoperability and reuse in research infrastructures. It is shown that conceptual modeling based on formal domain specifications still has good potential for data reuse in research infrastructures. It makes it possible to relate data, methods and other resources semantically; to classify and identify them in the domain; and to integrate data and verify the correctness of its reuse. Infrastructures based on formal domain modeling can make heterogeneous data management and research significantly more effective and automated.

