RavenR v2.1.4: an open source R package to support flexible hydrologic modelling

2021 ◽  
Author(s):  
Robert Chlumsky ◽  
James R. Craig ◽  
Simon G. M. Lin ◽  
Sarah Grass ◽  
Leland Scantlebury ◽  
...  

Abstract. In recent decades, advances in the flexibility and complexity of hydrologic models have enhanced their utility in scientific studies and practice alike. However, the increasing complexity of these tools leads to a number of challenges, including steep learning curves for new users and difficulties in reproducing modelling studies. Here, we present the RavenR package, an R package that leverages the power of scripting to both enhance the usability of the Raven hydrologic modelling framework and provide complementary analyses that are useful for modellers. The RavenR package contains functions that may be useful in each step of the model-building process, particularly for preparing input files and analyzing model outputs, and these tools may be useful even for non-Raven users. The utility of the RavenR package is demonstrated with the presentation of six use cases for a model of the Liard River basin in Canada. These use cases provide examples of visually reviewing the model configuration, preparing input files for observation and forcing data, simplifying the model discretization, performing reality checks on the model output, and evaluating the performance of the model. All of the use cases are fully reproducible, with additional reproducible examples of RavenR functions included with the package distribution itself. It is anticipated that the RavenR package will continue to evolve with the Raven project, and will provide a useful tool to new and experienced users of Raven alike.
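As a small illustration of the last use case (evaluating model performance), the sketch below computes the Nash-Sutcliffe efficiency, a metric widely used in hydrology. The abstract does not say which metrics RavenR reports, so the choice of metric is an assumption, and the function name `nash_sutcliffe` is hypothetical; it is written in Python for illustration and is not part of the RavenR API.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean.

    Hypothetical helper for illustration only; not a RavenR function.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

A simulation that always returns the observed mean scores 0, which is why values at or below 0 are usually read as "no better than the mean".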

Author(s):  
Zhiguo Zeng ◽  
Tasneem Bani-Mustafa ◽  
Roger Flage ◽  
Enrico Zio

In this paper, we present an integrated framework for quantifying epistemic uncertainty in probabilistic risk assessment. Three types of epistemic uncertainty are considered: completeness, structural and parametric uncertainty. A maturity model is developed to evaluate the management of these epistemic uncertainties in the model building process. The impact of epistemic uncertainty on the result of the risk assessment is then estimated based on the developed maturity model, and an integrated risk index is defined to reflect the epistemic uncertainty in the risk assessment results. An indifference method is developed to evaluate the index based on the maturity of epistemic uncertainty management. A case study concerning a nuclear power plant is shown to demonstrate the applicability of the overall modelling framework.


2005 ◽  
Vol 62 (4) ◽  
pp. 760-770 ◽  
Author(s):  
Gordon D. Hastie ◽  
René J. Swift ◽  
George Slesser ◽  
Paul M. Thompson ◽  
William R. Turrell

Abstract Dolphin distributions have been related to a range of oceanographic determinants. The complex topography and hydrography of the Faroe-Shetland Channel have a significant influence on the distribution of many species. However, there is no published detail on how dolphin distributions there are influenced by either topography or hydrography. The study therefore aims to relate dolphin distributions in the Faroe-Shetland Channel to environmental variables, using a general additive modelling framework applied to passive acoustic survey data. Models were created using data from 2001, and were cross-validated to test their predictive power. Predictions were calculated at each stage in the model-building process, and were tested against data from 2002. The results suggest that water noise level, time of day, month, water depth, and surface temperature were significant influences on the probability of detecting dolphins acoustically during 2001. Furthermore, the model was a significant predictor of dolphin distribution in 2002. The model with the greatest predictive power included the terms water noise level, time of day, month, and water depth. The results provide information of potential use in understanding the determinants of dolphin distributions, and hopefully will help managers address concerns about the potential impacts on dolphins of anthropogenic activity.


Oecologia ◽  
2021 ◽  
Author(s):  
Peng He ◽  
Pierre-Olivier Montiglio ◽  
Marius Somveille ◽  
Mauricio Cantor ◽  
Damien R. Farine

Abstract By shaping where individuals move, habitat configuration can fundamentally structure animal populations. Yet, we currently lack a framework for generating quantitative predictions about the role of habitat configuration in modulating population outcomes. To address this gap, we propose a modelling framework inspired by studies using networks to characterize habitat connectivity. We first define animal habitat networks, explain how they can integrate information about the different configurational features of animal habitats, and highlight the need for a bottom–up generative model that can depict realistic variations in habitat potential connectivity. Second, we describe a model for simulating animal habitat networks (available in the R package AnimalHabitatNetwork), and demonstrate its ability to generate alternative habitat configurations based on empirical data, which forms the basis for exploring the consequences of alternative habitat structures. Finally, we lay out three key research questions and demonstrate how our framework can address them. By simulating the spread of a pathogen within a population, we show how transmission properties can be impacted by both local potential connectivity and landscape-level characteristics of habitats. Our study highlights the importance of considering the underlying habitat configuration in studies linking social structure with population-level outcomes.


2021 ◽  
Author(s):  
Kor de Jong ◽  
Marc van Kreveld ◽  
Debabrata Panja ◽  
Oliver Schmitz ◽  
Derek Karssenberg

Data availability at global scale is increasing exponentially. Although considerable challenges remain regarding the identification of model structure and parameters of continental scale hydrological models, we will soon reach the situation that global scale models could be defined at very high resolutions close to 100 m or less. One of the key challenges is how to make simulations of these ultra-high resolution models tractable ([1]).

Our research contributes by the development of a model building framework that is specifically designed to distribute calculations over multiple cluster nodes. This framework enables domain experts like hydrologists to develop their own large scale models, using a scripting language like Python, without the need to acquire the skills to develop low-level computer code for parallel and distributed computing.

We present the design and implementation of this software framework and illustrate its use with a prototype 100 m, 1 h continental scale hydrological model. Our modelling framework ensures that any model built with it is parallelized. This is made possible by providing the model builder with a set of building blocks of models, which are coded in such a manner that parallelization of calculations occurs within and across these building blocks, for any combination of building blocks. There is thus full flexibility on the side of the modeller, without losing performance.

This breakthrough is made possible by applying a novel approach to the implementation of the model building framework, called asynchronous many-tasks, provided by the HPX C++ software library ([3]). The code in the model building framework expresses spatial operations as large collections of interdependent tasks that can be executed efficiently on individual laptops as well as computer clusters ([2]). Our framework currently includes the most essential operations for building large scale hydrological models, including those for simulating transport of material through a flow direction network. By combining these operations, we rebuilt an existing 100 m, 1 h resolution model, thus far used for simulations of small catchments, requiring limited coding as we only had to replace the computational back end of the existing model. Runs at continental scale on a computer cluster show acceptable strong and weak scaling providing a strong indication that global simulations at this resolution will soon be possible, technically speaking.

Future work will focus on extending the set of modelling operations and adding scalable I/O, after which existing models that are currently limited in their ability to use the computational resources available to them can be ported to this new environment.

More information about our modelling framework is at https://lue.computationalgeography.org.

References
[1] M. Bierkens. Global hydrology 2015: State, trends, and directions. Water Resources Research, 51(7):4923–4947, 2015.
[2] K. de Jong, et al. An environmental modelling framework based on asynchronous many-tasks: scalability and usability. Submitted.
[3] H. Kaiser, et al. HPX - The C++ standard library for parallelism and concurrency. Journal of Open Source Software, 5(53):2352, 2020.
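The idea in this abstract of expressing a spatial operation as a collection of interdependent tasks can be sketched in a few lines. The example below is only a toy stand-in built on Python's standard `concurrent.futures` module; it does not use HPX or the framework's own API, and the names `partition`, `runoff`, and `total_runoff`, as well as the runoff coefficient, are hypothetical. Each partition of a precipitation field becomes one task, and a final reduction depends on all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, size):
    """Split a flat list of cell values into consecutive blocks of `size` cells."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def runoff(precip_block, coefficient):
    """Local operation: each output cell depends only on its own input cell."""
    return [coefficient * p for p in precip_block]


def total_runoff(precipitation, coefficient=0.5, partition_size=4, workers=4):
    """Run one task per partition, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit one independent task per partition; they may run concurrently.
        tasks = [
            pool.submit(runoff, block, coefficient)
            for block in partition(precipitation, partition_size)
        ]
        # The reduction depends on every partition task having finished.
        return sum(sum(task.result()) for task in tasks)
```

A real many-task runtime would also schedule the reduction itself as tasks and distribute partitions across cluster nodes; this sketch keeps the reduction on the calling thread for brevity.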


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10849
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Juergen Debus ◽  
Amir Abdollahi

Background Model building is a crucial part of omics based biomedical research to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing error between model predictions and given classification (maximizing accuracy). Human ratings/classifications, however, might be error prone, with discordance rates between experts of 5–15%. We therefore evaluate if a feature pre-filtering step might improve identification of features associated with true underlying groups. Methods Data was simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass-ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP negative glioblastoma tumors from the TCGA-GBM 450 k methylation array data cohort, starting from a fuzzy UMAP-based rough and erroneous separation. Results V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 50% for 25 samples and 100 features. For larger numbers of features and samples, median AUCs were 0.75 (range 0.59–1.00) for V1 and 0.54 (range 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest based method. Conclusions The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
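The multiplicity adjustment named in this abstract (Benjamini-Hochberg) can be sketched as follows. This covers only that one step: the full V1 heuristic also involves univariate testing and cross-validation on switched variables, which are not reproduced here. The function names `benjamini_hochberg` and `preselect` and the default threshold are hypothetical, and the code is an illustrative Python sketch rather than the modelBuildR implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Feature indices sorted from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def preselect(p_values, alpha=0.05):
    """Indices of features whose adjusted p-value is at most alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= alpha]
```

For example, with per-feature p-values `[0.01, 0.04, 0.03, 0.005]` and `alpha=0.03`, only the first and last features would be kept for model training.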


Author(s):  
Pedro M. Esperança ◽  
Dari F. Da ◽  
Ben Lambert ◽  
Roch K. Dabiré ◽  
Thomas S. Churcher

Abstract Near infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an issue for generalising predictions from one location to another. Here, we use a functional data analysis approach, which models spectra as smooth curves rather than as a discrete set of points, to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representation of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes – the most epidemiologically important vector species on the planet – in two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species in another site, over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can, alternatively, handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available for field entomologists, we have created an open-source R package mlevcm. All data used are also publicly available.


2018 ◽  
Vol 18 (16) ◽  
pp. 12551-12580 ◽  
Author(s):  
Cheng Chen ◽  
Oleg Dubovik ◽  
Daven K. Henze ◽  
Tatyana Lapyonak ◽  
Mian Chin ◽  
...  

Abstract. Understanding the role atmospheric aerosols play in the Earth–atmosphere system is limited by uncertainties in the knowledge of their distribution, composition and sources. In this paper, we use the GEOS-Chem based inverse modelling framework for retrieving desert dust (DD), black carbon (BC) and organic carbon (OC) aerosol emissions simultaneously. Aerosol optical depth (AOD) and aerosol absorption optical depth (AAOD) retrieved from the multi-angular and polarimetric POLDER/PARASOL measurements generated by the GRASP algorithm (hereafter PARASOL/GRASP) have been assimilated. First, the inversion framework is validated in a series of numerical tests conducted with synthetic PARASOL-like data. These tests show that the framework allows for retrieval of the distribution and strength of aerosol emissions. The uncertainty of retrieved daily emissions in error free conditions is below 25.8 % for DD, 5.9 % for BC and 26.9 % for OC. In addition, the BC emission retrieval is sensitive to BC refractive index, which could produce an additional factor of 1.8 differences for total BC emissions. The approach is then applied to 1 year (December 2007 to November 2008) of data over the African and Arabian Peninsula region using PARASOL/GRASP spectral AOD and AAOD at six wavelengths (443, 490, 565, 670, 865 and 1020 nm). Analysis of the resulting retrieved emissions indicates 1.8 times overestimation of the prior DD online mobilization and entrainment model. For total BC and OC, the retrieved emissions show a significant increase of 209.9 %–271.8 % in comparison to the prior carbonaceous aerosol emissions. The model posterior simulation with retrieved emissions shows good agreement with both the AOD and AAOD PARASOL/GRASP products used in the inversion. The fidelity of the results is evaluated by comparison of posterior simulations with measurements from AERONET that are completely independent measurements and more temporally frequent than PARASOL observations. 
To further test the robustness of our posterior emissions constrained using PARASOL/GRASP, the posterior emissions are implemented in the GEOS-5/GOCART model and the consistency of simulated AOD and AAOD with other independent measurements (MODIS and OMI) demonstrates promise in applying this database for modelling studies.


2021 ◽  
Author(s):  
Sophia Eugeni ◽  
Eric Vaags ◽  
Steven V. Weijs

Accurate hydrologic modelling is critical to effective water resource management. As catchment attributes strongly influence the hydrologic behaviors in an area, they can be used to inform hydrologic models to better predict the discharge in a basin. Some basins may be more difficult to accurately predict than others. The difficulty in predicting discharge may also be related to the complexity of the discharge signal. The study establishes the relationship between a catchment’s static attributes and hydrologic model performance in those catchments, and also investigates the link to complexity, which we quantify with measures of compressibility based on information theory.

The project analyzes a large national dataset, comprised of catchment attributes for basins across the United States, paired with established performance metrics for corresponding hydrologic models. Principal Component Analysis (PCA) was completed on the catchment attributes data to determine the strongest modes in the input. The basins were clustered according to their catchment attributes and the performance within the clusters was compared.

Significant differences in model performance emerged between the clusters of basins. For the complexity analysis, details of the implementation and technical challenges will be discussed, as well as preliminary results.
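One way to read "complexity quantified with measures of compressibility" is to compress a quantized discharge series and compare the compressed size to the original size. The sketch below does this with Python's standard `zlib` module. The abstract does not state which compressor or quantization the authors use, so both choices (DEFLATE via `zlib`, at most 256 levels) and the function name `compression_ratio` are assumptions made for illustration.

```python
import zlib


def compression_ratio(series, levels=256):
    """Compressed size divided by raw size for a quantized series.

    Lower values mean the signal is more compressible, i.e. less complex.
    `levels` must be at most 256 so each value fits in one byte.
    """
    if not 1 <= levels <= 256:
        raise ValueError("levels must be between 1 and 256")
    lo, hi = min(series), max(series)
    # Avoid division by zero for a constant series.
    span = (hi - lo) or 1.0
    # Map each value to an integer bin in [0, levels - 1].
    quantized = bytes(
        min(levels - 1, int((x - lo) / span * levels)) for x in series
    )
    compressed = zlib.compress(quantized, 9)
    return len(compressed) / len(quantized)
```

Under this measure a constant or strongly periodic hydrograph yields a ratio far below that of an erratic one, which is the property that lets the ratio act as a rough proxy for how hard a signal is to predict.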


Author(s):  
Anthony Federico ◽  
Stefano Monti

Abstract Summary Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation The most recent version of the package is available at https://github.com/montilab/hypeR. Contact [email protected] or [email protected]
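A common way to score geneset enrichment is the hypergeometric over-representation test: given a signature drawn from a background of genes, how likely is an overlap with the geneset at least as large as the one observed? The abstract does not describe the tests hypeR implements, so the Python sketch below is an assumption-labelled illustration of that general technique, with a hypothetical function name and parameter names, and is not the hypeR API.

```python
from math import comb


def hypergeom_enrichment_p(signature_size, geneset_size, overlap, background_size):
    """P(X >= overlap) for a signature drawn without replacement.

    The background holds `background_size` genes, of which `geneset_size`
    belong to the geneset; `signature_size` genes are drawn.
    """
    max_overlap = min(signature_size, geneset_size)
    total = comb(background_size, signature_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(geneset_size, k)
        * comb(background_size - geneset_size, signature_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

With a background of 10 genes, a geneset of 4, and a signature of 3 sharing 2 genes with the geneset, the function returns 40/120, i.e. one third. When many genesets are tested, the resulting p-values would normally be adjusted for multiple testing.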

