An integrative framework for data-driven investigation of environmental systems

Author(s):  
Daniel Eggert ◽  
Doris Dransch

Environmental scientists aim at understanding not only single components but entire systems; one example is the flood system, where scientists investigate the conditions, drivers and effects of flood events and the relations between them. Investigating environmental systems with a data-driven research approach requires linking a variety of data, analytical methods, and derived results.

Several obstacles in the current scientific work environment hinder scientists from easily creating these links: distributed and heterogeneous data sets, separated analytical tools, discontinuous analytical workflows, and isolated views of data and data products. We address these obstacles with the exception of distributed and heterogeneous data, since this is the subject of other ongoing initiatives.

Our goal is to develop a framework supporting the data-driven investigation of environmental systems. First, we integrate separated analytical tools and methods by means of a component-based software framework. Furthermore, we enable seamless and continuous analytical workflows by applying the concept of digital workflows, which also demands the aforementioned integration of separated tools and methods. Finally, we provide integrated views of data and data products through interactive visual interfaces with multiple linked views. The combination of these three concepts from computer science allows us to create a digital research environment that enables scientists to create the links mentioned above in a flexible way. We developed a generic concept for our approach, implemented a corresponding framework, and applied both to realize a “Flood Event Explorer” prototype supporting the comprehensive investigation of a flood system.

In order to implement a digital workflow, our approach begins by precisely defining the workflow’s requirements, mostly through informal interviews with the domain scientists. The defined requirements also cover the needed analytical tools and methods, as well as the utilized data and data products. For technically integrating the needed tools and methods, our software framework provides a modularization approach based on a messaging system. This allows us to create custom modules or wrap existing implementations and tools. The messaging system (e.g. Apache Pulsar) then connects these individual modules, enabling us to combine multiple methods and tools into a seamless digital workflow; this, of course, demands properly defined interfaces to modules and data sources. Finally, our software framework provides multiple generic visual front-end components (e.g. tables, maps and charts) to create interactive linked views supporting the visual analysis of the workflow’s data.
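As an illustration of the modularization approach described above, here is a minimal sketch, assuming Python and Apache Pulsar as the messaging system, of an existing analytical method wrapped as a module that consumes input from one topic and publishes results to another. The topic names, the JSON message layout and the stand-in peak-detection function are assumptions for the example, not part of the authors' framework.

```python
# Minimal sketch of a wrapped analytical module connected via Apache Pulsar
# (pip install pulsar-client). Topic names, the analysis function and the
# JSON message layout are illustrative assumptions, not the authors' API.
import json
import pulsar


def detect_peaks(discharge):
    """Stand-in analytical method: flag time steps above a fixed threshold."""
    return [i for i, q in enumerate(discharge) if q > 2500.0]


def run_module(service_url="pulsar://localhost:6650"):
    client = pulsar.Client(service_url)
    consumer = client.subscribe("discharge-series", "peak-detector")
    producer = client.create_producer("flood-peaks")
    try:
        while True:
            msg = consumer.receive()                      # wait for input data
            series = json.loads(msg.data())               # e.g. {"discharge": [...]}
            peaks = detect_peaks(series["discharge"])     # run the wrapped method
            producer.send(json.dumps({"peaks": peaks}).encode("utf-8"))
            consumer.acknowledge(msg)                     # mark message as processed
    finally:
        client.close()


if __name__ == "__main__":
    run_module()
```

Wrapping tools behind the messaging interface keeps the analytical code itself unchanged while the broker handles the links between workflow steps.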

2012 ◽  
Vol 34 ◽  
pp. 30-43 ◽  
Author(s):  
A. Castelletti ◽  
S. Galelli ◽  
M. Restelli ◽  
R. Soncini-Sessa

2019 ◽  
Author(s):  
Valentin Resseguier ◽  
Wei Pan ◽  
Baylor Fox-Kemper

Abstract. Stochastic subgrid parameterizations enable ensemble forecasts of fluid dynamics systems and ultimately accurate data assimilation. Stochastic Advection by Lie Transport (SALT) and models under Location Uncertainty (LU) are recent and similar physically based stochastic schemes. SALT dynamics conserve helicity, whereas LU models conserve kinetic energy. After highlighting general similarities between the LU and SALT frameworks, this paper focuses on their common challenge: the choice of parameterization. We compare the uncertainty quantification skills of a stationary heterogeneous data-driven parameterization and a non-stationary homogeneous self-similar parameterization. For stationary, homogeneous Surface Quasi-Geostrophic (SQG) turbulence, both parameterizations lead to high-quality ensemble forecasts. This paper also discusses a heterogeneous adaptation of the homogeneous parameterization targeted at better simulation of strong straight buoyancy fronts.
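The following sketch, under stated assumptions, illustrates what a homogeneous self-similar parameterization of the kind compared above could look like: an isotropic random field with a power-law Fourier amplitude, one realization per ensemble member. The grid size, spectral slope and amplitude are illustrative choices, not the values used by the authors.

```python
# Minimal sketch of a homogeneous, self-similar noise field of the kind used
# as a stochastic subgrid velocity in LU/SALT-type ensembles. The spectral
# slope, grid size and amplitude are illustrative assumptions only.
import numpy as np


def self_similar_field(n=128, slope=-5.0 / 3.0, amplitude=1.0, rng=None):
    """Return one realization of an isotropic 2D field with a power-law spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = np.inf                          # remove the mean (zero-wavenumber) mode
    spectrum = k ** (slope / 2.0)             # power-law Fourier amplitude
    phases = np.exp(2j * np.pi * rng.random((n, n)))
    field = np.fft.ifft2(spectrum * phases).real
    return amplitude * field / field.std()


# One perturbation per ensemble member, e.g. added to the advecting velocity.
ensemble = [self_similar_field(rng=np.random.default_rng(seed)) for seed in range(10)]
```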


Author(s):  
Nina Vyatkina

Data-Driven Learning (DDL), or a corpus-based method of language teaching and learning, has been developing rapidly since the turn of the century and has been shown to be effective and efficient. Nevertheless, DDL is still not widely used in regular classrooms for a number of reasons. One of them is that few workable pedagogical frameworks have been suggested for integrating DDL into language courses and curricula. This chapter describes an exemplar of a practical application of such a pedagogical framework to a high-intermediate university-level German as a foreign language course with a significant DDL component. The Design-Based Research approach is used as the main methodological framework. The chapter concludes with a discussion of wider pedagogical implications.


Author(s):  
Yvan Le Bras ◽  
Aurélie Delavaud ◽  
Dominique Pelletier ◽  
Jean-Baptiste Mihoub

Most biodiversity research aims at understanding the states and dynamics of biodiversity and ecosystems. To do so, biodiversity research increasingly relies on digital products and services such as raw data archiving systems (e.g. structured databases or data repositories), ready-to-use datasets (e.g. cleaned and harmonized files with normalized measurements or computed trends), as well as associated analytical tools (e.g. model scripts on GitHub). Several worldwide initiatives facilitate open access to biodiversity data, such as the Global Biodiversity Information Facility (GBIF), GenBank or PREDICTS. Although these pave the way towards major advances in biodiversity research, they also deliver data products that are sometimes poorly informative because they fail to capture the genuine ecological information they intend to convey. In other words, access to ready-to-use aggregated data products may sacrifice ecological relevance for data harmonization, resulting in over-simplified, ill-advised standard formats. This is particularly true when the main challenge is to match complementary data (a large diversity of measured variables, integration of different levels of biological organization, etc.) collected under different requirements and scattered across multiple databases. Improving access to raw data, to meaningful and detailed metadata, and to analytical tools associated with standardized workflows is critical to maintain and maximize the generic relevance of ecological data. Consequently, advancing the design of digital products and services is essential for interoperability, while also enhancing reproducibility and transparency in biodiversity research. To go further, a minimal common framework organizing biodiversity observation and data organization is needed. In this regard, the Essential Biodiversity Variable (EBV) concept might be a powerful way to boost progress toward this goal and to connect research communities worldwide. As a national Biodiversity Observation Network (BON) node, the French BON is currently embodied by a national research e-infrastructure called "Pôle national de données de biodiversité" (PNDB, formerly ECOSCOPE), aimed at simultaneously strengthening the quality of scientific activities and promoting networking within the scientific community at the national level. Through the PNDB, the French BON is working on developing biodiversity data workflows oriented toward end services and products, both from and for a research perspective. More precisely, the two pillars of the PNDB are a metadata portal and a workflow-oriented web platform dedicated to accessing biodiversity data and associated analytical tools (Galaxy-E). After four years of experience, we are now going deeper into metadata specification, dataset description and data structuring through the extensive use of the Ecological Metadata Language (EML) as a pivot format. Moreover, we evaluate the relevance of existing tools such as Metacat/Morpho and DEIMS-SDR (Dynamic Ecological Information Management System - Site and dataset registry) in order to ensure a link with other initiatives like the Environmental Data Initiative, DataONE and observation networks related to Long-Term Ecological Research. Regarding data analysis, an open-source Galaxy-E platform was launched in 2017 as part of a project targeting the design of a citizen science observation system in France (“65 Millions d'observateurs”).
Here, we propose to showcase ongoing French activities addressing global challenges related to biodiversity information and knowledge dissemination. We particularly emphasize our focus on embracing the FAIR (findable, accessible, interoperable and reusable) data principles (Wilkinson et al. 2016) across the development of the French BON e-infrastructure and the promising links we anticipate for operationalizing EBVs. Using accessible and transparent analytical tools, we present the first online platform allowing advanced yet user-friendly analyses of biodiversity data to be performed in a reproducible and shareable way, using data from various sources such as GBIF, the Atlas of Living Australia (ALA), eBird and iNaturalist, as well as environmental data such as climate data.
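As a small, hedged example of the kind of programmatic access to biodiversity data sources mentioned above, the sketch below pulls occurrence records from GBIF with the pygbif client; the species name, record limit and retained fields are illustrative assumptions and not part of the PNDB or Galaxy-E platforms.

```python
# Minimal sketch of pulling occurrence records from GBIF with the pygbif
# client (pip install pygbif); species name, record limit and kept fields
# are illustrative assumptions, not part of the PNDB/Galaxy-E workflows.
from pygbif import occurrences


def fetch_occurrences(species="Lutra lutra", limit=50):
    """Return a list of (latitude, longitude, year) tuples for one species."""
    response = occurrences.search(scientificName=species, limit=limit)
    records = []
    for rec in response.get("results", []):
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat is not None and lon is not None:   # keep only georeferenced records
            records.append((lat, lon, rec.get("year")))
    return records


if __name__ == "__main__":
    print(fetch_occurrences()[:5])
```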


Author(s):  
James Yao ◽  
John Wang ◽  
Qiyang Chen ◽  
Ruben Xing

A data warehouse is a system that integrates heterogeneous data sources to support the decision-making process. Data warehouse design is a lengthy, time-consuming, and costly process, and there has been a high failure rate in data warehouse development projects. Thus, how to design and develop a data warehouse has become an important issue for information systems designers and developers. This paper reviews and discusses some of the core data warehouse design and development methodologies in information system development. In particular, the paper presents the most recent and much-debated hybrid approach, which combines the data-driven and requirement-driven approaches.
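To make the hybrid idea concrete, here is a deliberately simple sketch of one reconciliation step: candidate dimensions derived from the source schema (data-driven) are matched against dimensions named in user requirements (requirement-driven). All names are invented for illustration and do not come from the reviewed methodologies.

```python
# Minimal sketch of the reconciliation step in a hybrid design approach:
# dimensions proposed by analysing the source schema (data-driven) are matched
# against dimensions named in user requirements (requirement-driven). All
# names below are illustrative assumptions, not from the reviewed paper.
source_schema_candidates = {"customer", "product", "store", "supplier", "time"}
requirement_dimensions = {"customer", "product", "time", "promotion"}

confirmed = source_schema_candidates & requirement_dimensions          # supported by both
data_only = source_schema_candidates - requirement_dimensions          # in data, not requested
requirement_only = requirement_dimensions - source_schema_candidates   # requested, no data yet

print("Include in the initial schema:", sorted(confirmed))
print("Review with users (available but not requested):", sorted(data_only))
print("Flag as data gaps (requested but unsupported):", sorted(requirement_only))
```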


2019 ◽  
Vol 21 (4) ◽  
pp. 1182-1195
Author(s):  
Andrew C Liu ◽  
Krishna Patel ◽  
Ramya Dhatri Vunikili ◽  
Kipp W Johnson ◽  
Fahad Abdu ◽  
...  

Abstract. Sepsis is a series of clinical syndromes caused by the immunological response to infection. The clinical evidence for sepsis is typically attributed to bacterial infection or bacterial endotoxins, but infections due to viruses, fungi or parasites can also lead to sepsis. Regardless of the etiology, rapid clinical deterioration, prolonged stays in intensive care units and a high risk of mortality correlate with the incidence of sepsis. Despite its prevalence and morbidity, improvement in sepsis outcomes has remained limited. In this comprehensive review, we summarize the current landscape of risk estimation, diagnosis, treatment and prognosis strategies in the setting of sepsis and discuss future challenges. We argue that the advent of modern technologies such as in-depth molecular profiling, biomedical big data and machine intelligence methods will augment the treatment and prevention of sepsis. The volume, variety, veracity and velocity of heterogeneous data generated as part of healthcare delivery, together with recent advances in biotechnology-driven therapeutics and companion diagnostics, may provide a new wave of approaches to identify the most at-risk sepsis patients and reduce the symptom burden within shorter turnaround times. Developing novel therapies by leveraging modern drug discovery strategies, including computational drug repositioning, cell and gene therapy, clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic editing systems, immunotherapy, microbiome restoration, nanomaterial-based therapy and phage therapy, may help to develop treatments that target sepsis. We also provide empirical evidence for potential new sepsis targets, including FER and STARD3NL. Implementing data-driven methods that use real-time collection and analysis of clinical variables to trace, track and treat sepsis-related adverse outcomes will be key. Understanding the root and route of sepsis and its comorbid conditions, which complicate treatment outcomes and lead to organ dysfunction, may help to facilitate identification of the most at-risk patients and prevent further deterioration. To conclude, leveraging advances in precision medicine, biomedical data science and translational bioinformatics may help to develop better strategies to diagnose and treat sepsis in the next decade.
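As a hedged illustration of the data-driven risk estimation the review points to, the sketch below trains a simple classifier on synthetic clinical variables; the features, synthetic outcome and model choice are assumptions for demonstration only, not the authors' method or any validated sepsis model.

```python
# Minimal sketch of a data-driven risk model trained on routinely collected
# clinical variables. The data are synthetic and the variables illustrative;
# this is not a validated sepsis model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Illustrative features: heart rate, respiratory rate, temperature, lactate.
X = np.column_stack([
    rng.normal(90, 15, n),     # heart rate (bpm)
    rng.normal(20, 5, n),      # respiratory rate (breaths/min)
    rng.normal(37.5, 1.0, n),  # temperature (deg C)
    rng.normal(2.0, 1.0, n),   # lactate (mmol/L)
])
# Synthetic outcome loosely tied to tachycardia and elevated lactate.
risk = 0.03 * (X[:, 0] - 90) + 0.8 * (X[:, 3] - 2.0)
y = (risk + rng.normal(0, 1, n) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```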


2020 ◽  
pp. 107780042097874
Author(s):  
Vivien Sommer

Digital technology has made it easier for researchers to collect and produce multimodal data. In a social semiotic understanding, multimodal means that data are produced from different sign resources, such as field protocols combined with visual recordings, or document analyses consisting of audiovisual material. The increase in multimodal data brings the challenge of developing analytical tools not only to collect data but also to examine them. In this article, I introduce a research approach for integrating multimodal data within the framework of grounded theory by extending the coding process with a social semiotic understanding of data as a combination of different sign modes. This approach makes it possible not only to analyze data based on different modes separately but also to analyze their combination, for example, the interweaving of text and image.
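A purely illustrative sketch of how a multimodally coded segment might be represented, with one code per sign mode and one for their combination; the field names and codes are hypothetical and not Sommer's coding scheme.

```python
# Illustrative data structure for a multimodally coded segment: each sign mode
# (text, image) receives its own code, plus a code for their combination.
# Field names and codes are assumptions, not the author's scheme.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedSegment:
    source: str                               # e.g. "field protocol p. 3"
    text_code: Optional[str] = None           # code assigned to the written mode
    image_code: Optional[str] = None          # code assigned to the visual mode
    combined_code: Optional[str] = None       # code for the text-image interplay
    memo: str = ""                            # analytic memo, as in grounded theory


segment = CodedSegment(
    source="blog post, screenshot 12",
    text_code="claiming authenticity",
    image_code="archival photograph",
    combined_code="image as evidence for textual claim",
    memo="Text and image jointly construct a memory claim.",
)
```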


2020 ◽  
Vol 12 (10) ◽  
pp. 4246 ◽  
Author(s):  
David Pastor-Escuredo ◽  
Yolanda Torres ◽  
María Martínez-Torres ◽  
Pedro J. Zufiria

Natural disasters affect hundreds of millions of people worldwide every year. The impact assessment of a disaster is key to improving the response and mitigating how a natural hazard turns into a social disaster. An actionable quantification of impact must integrate multiple dimensions. We propose a rapid impact assessment framework that comprises detailed geographical and temporal landmarks as well as the potential socio-economic magnitude of the disaster, based on heterogeneous data sources: environmental sensor data, social media, remote sensing, digital topography, and mobile phone data. As the dynamics of floods vary greatly depending on their causes, the framework may support different phases of decision-making during the disaster management cycle. To evaluate its usability and scope, we explored four flooding cases with variable conditions. The results show that social media proxies provide robust identification with daily granularity even when rainfall detectors fail. The detection also provides information on the magnitude of the flood, which is potentially useful for planning. Network analysis was applied to the social media data to extract patterns of social effects after the flood. This analysis showed significant variability in the obtained proxies, which encourages scaling such schemes to comparatively characterize patterns across many floods with different contexts and cultural factors. This framework is presented as a module of a larger data-driven system designed to be the basis for responsive and more resilient systems in urban and rural areas. The impact-driven approach presented may facilitate public–private collaboration and data sharing by providing real-time evidence with aggregated data to support requests for private data with higher granularity, which is currently the most important limitation in implementing fully data-driven systems for disaster response among both local and international actors.
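As a minimal sketch of a daily-granularity detection from social media proxies of the kind described above, the example below flags days whose post volume exceeds a robust baseline; the counts and threshold rule are illustrative assumptions, not the framework's actual detector.

```python
# Minimal sketch of a daily-granularity flood signal from social-media counts:
# days whose post volume exceeds a robust baseline are flagged. The counts
# and threshold rule are illustrative assumptions only.
import numpy as np

daily_posts = np.array([12, 9, 14, 11, 10, 95, 240, 180, 60, 22, 15, 13])  # posts/day

median = np.median(daily_posts)
mad = np.median(np.abs(daily_posts - median))          # robust spread estimate
threshold = median + 5 * 1.4826 * mad                  # ~5 sigma under normality
flood_days = np.flatnonzero(daily_posts > threshold)   # indices of anomalous days

print("Baseline:", median, "Threshold:", round(threshold, 1))
print("Flagged flood days (index):", flood_days.tolist())
```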

