NFDI4Earth – addressing the digital needs of Earth System Sciences

2021 ◽  
Author(s):  
Peter Braesicke ◽  
Jörg Seegert ◽  
Hannes Thiemann ◽  
Lars Bernard

<p>NFDI4Earth addresses the digital needs of Earth System (ES) Sciences (ESS) in Germany. ES scientists cooperate in international and interdisciplinary networks with the overarching aim of understanding the functioning of and interactions within the ES and addressing the multiple challenges of global change.</p> <p>NFDI4Earth is a community-driven process providing researchers with access to FAIR, coherent, and open ES data, innovative research data management (RDM) and data science methods. The NFDI4Earth work plan comprises four task areas (TA):</p> <p>TA1 2Participate engages with the ESS community and ensures that NFDI4Earth is driven by user requirements: Pilots, small agile projects proposed by the community, leverage existing technologies and manifest the researchers’ RDM needs. The Incubator Lab identifies promising new tools and scouts for trends in ES Data Science. The EduHubs produce open, ready-to-use educational resources on implementing FAIR principles in the ESS. The Academy will connect young researchers and their data-driven research to NFDI4Earth.</p> <p>TA2 2Facilitate realizes the OneStop4All as the web-based entry point to FAIR, open and innovative RDM in ESS. It guides users on how to find, access, share, publish and work with ES data. Specific user requests beyond the scope of the OneStop4All will be routed to a distributed User Support Network. TA2 will also unlock the wealth of data that exists in governmental data repositories and will collaborate with all services on supporting long-term archiving.</p> <p>TA3 2Interoperate aims at interoperability and coherence of the heterogeneous, segmented range of ESS RDM services. The ecosystems of ESS (meta-)data and software repositories, data science services and collaboration platforms will be synthesised. Based on common standards, TA3 provides consistent methods for a self-evaluation of RDM offerings. 
TA3 works on NFDI cross-cutting topics, provides a Living Handbook and ensures co-operation with international RDM initiatives and standardisation bodies.</p> <p>TA4 2Coordinate facilitates the overall management of the NFDI4Earth consortium. TA4 acts as the central support service and coordinates the technical implementations. It also offers virtual research environments. The NFDI4Earth Coordination Office supports the NFDI4Earth community in day-to-day operations and acts as the NFDI4Earth point of contact. It develops a commonly agreed model for a sustainable operation of NFDI4Earth.</p> <p>The NFDI4Earth governance aims for an open and inclusive development of the NFDI4Earth services. As one example, so-called interest groups can be initiated by the NFDI4Earth community to explore individual topics in greater depth and provide input and feedback to the NFDI4Earth developments. Moreover, as a community we will work on a commonly accepted NFDI4Earth FAIRness and Openness Commitment that is key to fostering a cultural change towards FAIR and Open RDM for all.</p>

Author(s):  
Ladjel Bellatreche ◽  
Carlos Ordonez ◽  
Dominique Méry ◽  
Matteo Golfarelli ◽  
El Hassan Abdelwahed

2021 ◽  
Vol 8 ◽  
Author(s):  
Maria-Theresia Verwega ◽  
Carola Trahms ◽  
Avan N. Antia ◽  
Thorsten Dickhaus ◽  
Enno Prigge ◽  
...  

Earth System Sciences have been generating increasingly large amounts of heterogeneous data in recent years. We identify the need to combine Earth System Sciences with Data Sciences, and give our perspective on how this could be accomplished within the sub-field of Marine Sciences. Marine data hold abundant information and insights that Data Science techniques can reveal. There is high demand and potential to combine skills and knowledge from Marine and Data Sciences to take full advantage of the vast amount of marine data. This can be accomplished by establishing Marine Data Science as a new research discipline. Marine Data Science is an interface science that applies Data Science tools to extract information, knowledge, and insights from the exponentially increasing body of marine data. Marine Data Scientists need to be trained Data Scientists with a broad basic understanding of Marine Sciences and expertise in knowledge transfer. Marine Data Science doctoral researchers need targeted training for these specific skills, a crucial component of which is co-supervision from both parental sciences. They might also face challenges of scientific recognition and the lack of an established academic career path. In this paper, we, Marine and Data Scientists at different stages of our academic careers, present perspectives to define Marine Data Science as a distinct discipline. We draw on experiences of a Doctoral Research School, MarDATA, dedicated to training a cohort of early career Marine Data Scientists. We characterize the methods of Marine Data Science as a toolbox including skills from its two parental sciences. All of these aim to analyze and interpret marine data, which build the foundation of Marine Data Science.


2018 ◽  
Author(s):  
Hamid Bagher ◽  
Usha Muppiral ◽  
Andrew J Severin ◽  
Hridesh Rajan

Abstract Background: Creating a computational infrastructure that scales well to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared Data Science Infrastructures like Boa can be used to more efficiently process and parse data contained in large data repositories. The main features of Boa are inspired by existing languages for data-intensive computing and can easily integrate data from biological data repositories. Results: Here, we present an implementation of Boa for Genomic research (BoaG) on a relatively small data repository: RefSeq’s 97,716 annotation (GFF) and assembly (FASTA) files and metadata. We used BoaG to query the entire RefSeq dataset and gain insight into the RefSeq genome assemblies and gene model annotations, and show that assembly quality using the same assembler varies depending on species. Conclusions: In order to keep pace with our ability to produce biological data, innovative methods are required. The Shared Data Science Infrastructure, BoaG, can give researchers greater access to efficiently explore data in ways previously not possible for anyone but the most well-funded research groups. We demonstrate the efficiency of BoaG in exploring the RefSeq database of genome assemblies and annotations to identify interesting features of gene annotation as a proof of concept for much larger datasets.


2020 ◽  
Vol 6 ◽  
Author(s):  
Christoph Steinbeck ◽  
Oliver Koepler ◽  
Felix Bach ◽  
Sonja Herres-Pawlis ◽  
Nicole Jung ◽  
...  

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem to serve the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is to develop and maintain a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative and easy-to-use services and novel scientific approaches based on re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions including data for their experimental and theoretical characterisation. This overarching goal is achieved by working towards a number of key objectives: Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories. Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry, in order to support the FAIR principles for research data. Finally, develop standards where they are lacking. 
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research and promoting subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate curricula. Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness of and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers. Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI. Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.


2021 ◽  
Author(s):  
Ronald R. Gutierrez ◽  
Frank E. Escusa ◽  
Alice Lefebvre ◽  
Carlo Gualtieri ◽  
Francisco Nunez-Gonzalez ◽  
...  

<p>Open and data-driven paradigms have made it possible to answer fundamental scientific questions in disciplines such as astronomy, ecology and fluid mechanics, among others. Recently, the need to collaboratively build a large, engineered and freely accessible bed form database has been highlighted as a necessary step to adopt these paradigms in bed form dynamics research.</p><p>Most large database architectures have followed the principles of relational database model solutions (RDBMS). Recently, non-relational (NoSQL) architectures (e.g., key-value stores, graph databases, document-oriented databases, etc.) have been proposed to improve on the capabilities and flexibility of RDBMS. Both RDBMS and NoSQL architectures require designing an engineered metadata structure to define the data taxonomy and structure, which are subsequently used to develop a metadata language for data querying. Past research suggests that the development of a metadata language needs a collaborative and iterative approach.</p><p>Defining the data taxonomy and structure for bed form data may be challenging because: [1] there is no standardized protocol for conducting field and laboratory measurements; [2] existing bed form data are expected to have a wide spectrum of data characteristics (e.g., length, format, resolution, structured or non-structured, etc.); and [3] bed forms are studied by scientists and engineers from different disciplines (e.g., geologists, ecologists, civil and water engineers, etc.).</p><p>In recent years, several data repositories have been built to manage large datasets related to the Earth System. One of these repositories is the Earth Science Information Partners, which has proposed standards to promote and improve the preservation, availability and overall quality of Earth System related data. 
These standards map the roles of participants (e.g., creators, intermediaries and end users) and deliver protocols to ensure proper data distribution and quality control.</p><p>This contribution presents the first iteration of a metadata language for subaqueous bed form data, named BedformsML0, which adopts the standards of the Earth Science Information Partners. BedformsML0 may serve as a prototype to describe bed form observations from field and laboratory measurements, model outputs, technical reports, scientific papers, post-processed data, etc. Biogeoenvironmental observations associated with bed form dynamics (e.g., hydrodynamics, turbulence, river and coastal morphology, biota density, habitat metrics, sediment transport, sediment properties, land use dynamics, etc.) may also be represented in BedformsML0. It could be improved in future iterations, through the collaboration of professionals from different Earth science fields, to also describe subaerial and extraterrestrial bed form data. Likewise, BedformsML0 can be used to select data through machine search queries for massive data processing and visualization of bed form observations. </p>


2019 ◽  
Vol 15 (2) ◽  
Author(s):  
Renata Curty

ABSTRACT Government and institutional data-sharing policies and mandates for publicly funded research have driven the rapid expansion of digital data repositories, making these scientific assets available for reuse for purposes not always anticipated by the researchers who produced or collected them. Paradoxically, although the argument for data sharing rests strongly on the potential for reuse and its contributions to scientific advancement, the subject remains peripheral to discussions about data science and open science. This narrative review takes a closer look at data reuse in order to better conceptualize the term, and proposes an initial classification of five distinct approaches to research data reuse (repurposing, aggregation, integration, meta-analysis and reanalysis), based on hypothetical situations accompanied by cases of data reuse published in the scientific literature. It also explores the determinants of reusability, relating it to the quality of the documentation that accompanies the data. It discusses the challenges of data documentation and points out some initiatives and recommendations for overcoming them. The arguments presented are expected to contribute not only to the conceptual advancement of data reuse and reusability, but also to actions related to data documentation that increase the reuse potential of these scientific assets. Keywords: Data Reuse; Scientific Reproducibility; Reusability; Open Science; Research Data.


Author(s):  
Lea Meier ◽  
Kevin Tippenhauer ◽  
Murat Sariyar

Multiple challenges await third-party digital health services when trying to enter the health market. Prominent examples of such services are clinical decision support systems provided as external software. Uncertainty about these challenges, technical as well as legal, poses a serious hurdle to the early adoption of many innovations. There are many options and trade-offs in providing digital healthcare solutions as a third-party service. This paper discusses them with reference to a pharmacogenetic decision support service. By providing best practices, scenario descriptions and templates designed for third-party services with respect to legal and technical issues, obstacles and uncertainties can be reduced, which will contribute to better diagnoses and treatments in the healthcare system.


2020 ◽  
Vol 166 ◽  
pp. 09004
Author(s):  
Richard Tomlins ◽  
Helen Cuthill ◽  
Alan Richards ◽  
Arun Sukumar ◽  
Oksana Malynka

This article reflects on the development of a creative economy training product and toolkit developed by Coventry University with SEBRAE (the Brazilian Micro and Small Business Support Service) and funded by the British Council. It was devised following two weeks of creative economy scoping visits in Brazil in autumn 2017. The scoping visits identified the need for a fun and “disruptive” business planning experience leading to rapid prototyping, which would allow new creative economy ideas to be brought to market at low development cost: the “Sprint”. A one-day micro Sprint was tested in four locations in Brazil in late 2017 and received excellent feedback. The client subsequently requested a three-day version of the methodology, to invest more time in the cultural change of the creative entrepreneur, and the development of an associated toolkit. The Sprint has since also been rolled out in a super-condensed three-hour version, piloted in 2019 and 2020 in Ukraine through British Council Creative Spark programmes. The toolkit offers skills and techniques to train creative entrepreneurs and their mentors in enabling the growth of the creative economy in their communities. This paper predominantly focuses on the implementation of the client-commissioned three-day Sprint.


2021 ◽  
Author(s):  
Chunlei Tang ◽  
Li Zhou ◽  
Joseph Plasek ◽  
Yangyong Zhu ◽  
Yajun Huang ◽  
...  

Electronic patient data are critical to clinical and translational science, and research patient data repositories (RPDRs) are a central resource for any work in biomedical data science. However, the data science ecosystem, due to its inherently transdisciplinary nature, poses challenges to existing RPDRs and demands expansions and new developments, calling for a wide variety of new functions and capabilities in the administrative, educational, and organizational domains. The power of data science in the business realm is tremendous. In business, it is already viewed as a critical resource, and this will likely occur in healthcare as well. This perspective focuses on best practices in developing RPDRs, and identifies areas which we believe have not received enough attention. These include deployment, contribution calculation, internal talent marketplaces, data partnerships, data sovereigns’ new capital assets, and cross-border data sharing.


Metabolomics ◽  
2019 ◽  
Vol 15 (10) ◽  
Author(s):  
Kevin M. Mendez ◽  
Leighton Pritchard ◽  
Stacey N. Reinke ◽  
David I. Broadhurst

Abstract Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to the reproducibility and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for computational novices and experts alike. Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

