Issues of diffuse pollution model complexity arising from performance benchmarking

Abstract. Flow and nitrate dynamics were simulated in two catchments, the River Aire in northern England and the River Ythan in north-east Scotland. In the case of the Aire, a diffuse pollution model was coupled with a river quality model (CASCADE-QUESTOR); in the study of the Ythan, an integrated model (SWAT) was used. In each study, model performance was evaluated for differing levels of spatial representation in input data sets (rainfall, soils and land use). In respect of nitrate concentrations, the performance of the models was compared with that of a regression model based on proportions of land cover. The overall objective was to assess the merits of spatially distributed input data sets. In both catchments, specific measures of quantitative performance showed that models using the most detailed available input data contributed, at best, only a marginal improvement over simpler implementations. Hence, the level of complexity used in input data sets has to be determined, not only on multiple criteria of quantitative performance but also on qualitative assessments, reflecting the specific context of the model application and the current and likely future needs of end-users.

Download Full-text

Extending a Large-scale Model to Better Represent Water Resources without Increasing the Model Complexity

10.20944/preprints202109.0195.v1 ◽

2021 ◽

Author(s):

Robyn Horan ◽

Nathan J. Rickards ◽

Alexandra Kaelin ◽

Helen E. Baron ◽

Thomas Thomas ◽

...

Keyword(s):

Water Resources ◽

Input Data ◽

Large Scale ◽

Model Performance ◽

Model Complexity ◽

Scale Model ◽

Groundwater Abstraction ◽

Water Balance Components ◽

Groundwater Levels ◽

Hydrological System

A robust hydrological assessment is challenging in regions where human interference, within all aspects of the hydrological system, significantly alters the flow regime of rivers. The challenge was to extend a large-scale water resources model, GWAVA, to better represent water resources without increasing the model complexity. A groundwater and a regulated reservoir routine were incorporated into GWAVA using modifications of the existing AMBHAS-1D and Hanasaki methodologies, respectively. The groundwater routine can be varied in complexity when sufficient input data is available but fundamentally is driven by three input parameters. The reservoir routine was extended to account for the presence of large, regulated reservoirs using two calibratable parameters. The additional groundwater processes and reservoir regulation was tested in two highly anthropogenically influenced basins in India: the Cauvery and Narmada. The inclusion of the revised groundwater routine improved the simulation of streamflow in the headwater catchments and was successful in improving the representation of the baseflow component. In addition, the model was able to produce a time series of daily groundwater levels, recharge to groundwater and groundwater abstraction. The regulated reservoir routine improved the simulation of streamflow in catchments downstream of major reservoirs, where the streamflow was largely reflective of reservoir releases, when calibrated using downstream observed streamflow records. The model was able to provide a more robust representation of the annual volume and daily outflow released from the major reservoirs and simulate the major reservoir storages adequately. The addition of one-dimensional groundwater processes and a regulated reservoir routine proved successful in improving the model performance and traceability of water balance components, without excessively increasing the model complexity and input data requirements.

Download Full-text

An introduction to distributed training of deep neural networks for segmentation tasks with large seismic data sets

Geophysics ◽

10.1190/geo2021-0130.1 ◽

2021 ◽

Vol 86 (6) ◽

pp. KS151-KS160

Author(s):

Claire Birnie ◽

Haithem Jarraya ◽

Fredrik Hansteen

Keyword(s):

Neural Networks ◽

Input Data ◽

Model Performance ◽

Microseismic Monitoring ◽

Training Data ◽

Data Sets ◽

Distributed Training ◽

Spatiotemporal Information ◽

Training Approach ◽

Data Generator

Deep learning applications are drastically progressing in seismic processing and interpretation tasks. However, most approaches subsample data volumes and restrict model sizes to minimize computational requirements. Subsampling the data risks losing vital spatiotemporal information which could aid training, whereas restricting model sizes can impact model performance, or in some extreme cases renders more complicated tasks such as segmentation impossible. We have determined how to tackle the two main issues of training of large neural networks (NNs): memory limitations and impracticably large training times. Typically, training data are preloaded into memory prior to training, a particular challenge for seismic applications in which the data format is typically four times larger than that used for standard image processing tasks (float32 versus uint8). Based on an example from microseismic monitoring, we evaluate how more than 750 GB of data can be used to train a model by using a data generator approach, which only stores in memory the data required for that training batch. Furthermore, efficient training over large models is illustrated through the training of a seven-layer U-Net with input data dimensions of [Formula: see text] (approximately [Formula: see text] million parameters). Through a batch-splitting distributed training approach, the training times are reduced by a factor of four. The combination of data generators and distributed training removes any necessity of data subsampling or restriction of NN sizes, offering the opportunity to use larger networks, higher resolution input data, or move from 2D to 3D problem spaces.

Download Full-text

Evaluation of an algorithm to choose between competing models of respiratory mechanics

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2015-0103 ◽

2015 ◽

Vol 1 (1) ◽

pp. 428-432 ◽

Cited By ~ 2

Author(s):

Jörn Kretschmer ◽

Axel Riedlinger ◽

Knut Möller

Keyword(s):

Decision Support ◽

Respiratory Mechanics ◽

Clinical Situation ◽

Complex Model ◽

Model Complexity ◽

Model Parameters ◽

Data Sets ◽

Quality Model ◽

Model Based ◽

Competing Models

AbstractModel based decision support helps in optimizing therapy settings for individual patients while providing additional insight into a patient’s disease state through the identified model parameters. Using multiple models with different simulation focus and complexity allows adapting decision support to the current clinical situation and the available data. A previously presented set of numerical criteria allows selecting the best model based on fit quality, model complexity, and how well the parameter values are defined by the presented data. To systematically evaluate those criteria in an algorithm we have created insilico data sets using four different respiratory mechanics models with three different parameter settings each. Each of those artificial patients was ventilated with three different manoeuvres and the resulting data was used to identify the same models used to create the data. The selection algorithm was then presented with the results to select the best model. Not considering determinateness of the identified model parameters, the algorithm chose the same model that was used to create the data in 78%, a more complex model in 5% and a less complex model in 18% of all cases. When including the determinateness of model parameters in the decision process, the algorithm chose the same model in 42% of the cases and a less complex model in 56% of all cases. In 2% of the presented cases, no model complied with the required criteria.

Download Full-text

HOW TO USE CROP GROWTH MODEL WOFOST FOR FORECASTING GROWTH AND YIELD OF A CROP

Journal of AgriSearch ◽

10.21921/jas.v3i1.4107 ◽

2016 ◽

Vol 3 (1) ◽

Author(s):

LAL SINGH ◽

PARMEET SINGH ◽

RAIHANA HABIB KANTH ◽

PURUSHOTAM SINGH ◽

SABIA AKHTER ◽

...

Keyword(s):

User Interface ◽

Graphical User Interface ◽

Input Data ◽

Growth And Yield ◽

Weather Data ◽

Data Sets ◽

Crop Growth Model ◽

Control Center ◽

Field Crops ◽

Crop Calendar

WOFOST version 7.1.3 is a computer model that simulates the growth and production of annual field crops. All the run options are operational through a graphical user interface named WOFOST Control Center version 1.8 (WCC). WCC facilitates selecting the production level, and input data sets on crop, soil, weather, crop calendar, hydrological field conditions, soil fertility parameters and the output options. The files with crop, soil and weather data are explained, as well as the run files and the output files. A general overview is given of the development and the applications of the model. Its underlying concepts are discussed briefly.

Download Full-text

CPT to RVU conversion improves model performance in the prediction of surgical case length

Scientific Reports ◽

10.1038/s41598-021-93573-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Nicholas Garside ◽

Hamed Zaribafzadeh ◽

Ricardo Henao ◽

Royce Chung ◽

Daniel Buckland

Keyword(s):

Model Performance ◽

Individual Case ◽

Model Complexity ◽

Actual Case ◽

Surgical Case ◽

Relative Value ◽

Prediction Time ◽

Boosted Decision Tree ◽

Tree Models ◽

Duke University

AbstractMethods used to predict surgical case time often rely upon the current procedural terminology (CPT) code as a nominal variable to train machine-learned models, however this limits the ability of the model to incorporate new procedures and adds complexity as the number of unique procedures increases. The relative value unit (RVU, a consensus-derived billing indicator) can serve as a proxy for procedure workload and could replace the CPT code as a primary feature for models that predict surgical case length. Using 11,696 surgical cases from Duke University Health System electronic health records data, we compared boosted decision tree models that predict individual case length, changing the method by which the model coded procedure type; CPT, RVU, and CPT–RVU combined. Performance of each model was assessed by inference time, MAE, and RMSE compared to the actual case length on a test set. Models were compared to each other and to the manual scheduler method that currently exists. RMSE for the RVU model (60.8 min) was similar to the CPT model (61.9 min), both of which were lower than scheduler (90.2 min). 65.2% of our RVU model’s predictions (compared to 43.2% from the current human scheduler method) fell within 20% of actual case time. Using RVUs reduced model prediction time by ninefold and reduced the number of training features from 485 to 44. Replacing pre-operative CPT codes with RVUs maintains model performance while decreasing overall model complexity in the prediction of surgical case length.

Download Full-text

An ABC-optimized fuzzy ELECTRE approach for assessing petroleum potential at the petroleum system level

Open Geosciences ◽

10.1515/geo-2020-0159 ◽

2020 ◽

Vol 12 (1) ◽

pp. 580-597

Author(s):

Mohamad Hamzeh ◽

Farid Karimipour

Keyword(s):

Spatial Data ◽

Spatial Models ◽

Model Performance ◽

Essential Elements ◽

Petroleum System ◽

Petroleum Exploration ◽

System Level ◽

Data Sets ◽

Petroleum Potential ◽

Spatial Data Sets

AbstractAn inevitable aspect of modern petroleum exploration is the simultaneous consideration of large, complex, and disparate spatial data sets. In this context, the present article proposes the optimized fuzzy ELECTRE (OFE) approach based on combining the artificial bee colony (ABC) optimization algorithm, fuzzy logic, and an outranking method to assess petroleum potential at the petroleum system level in a spatial framework using experts’ knowledge and the information available in the discovered petroleum accumulations simultaneously. It uses the characteristics of the essential elements of a petroleum system as key criteria. To demonstrate the approach, a case study was conducted on the Red River petroleum system of the Williston Basin. Having completed the assorted preprocessing steps, eight spatial data sets associated with the criteria were integrated using the OFE to produce a map that makes it possible to delineate the areas with the highest petroleum potential and the lowest risk for further exploratory investigations. The success and prediction rate curves were used to measure the performance of the model. Both success and prediction accuracies lie in the range of 80–90%, indicating an excellent model performance. Considering the five-class petroleum potential, the proposed approach outperforms the spatial models used in the previous studies. In addition, comparing the results of the FE and OFE indicated that the optimization of the weights by the ABC algorithm has improved accuracy by approximately 15%, namely, a relatively higher success rate and lower risk in petroleum exploration.

Download Full-text

Ensemble Predictions of Air Pollutants in China in 2013 for Health Effects Studies Using WRF/CMAQ Modeling System with Four Emission Inventories

10.5194/acp-2017-182 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jianlin Hu ◽

Xun Li ◽

Lin Huang ◽

Qi Ying ◽

Qiang Zhang ◽

...

Keyword(s):

Air Quality ◽

Health Effects ◽

Air Pollutants ◽

Emission Inventory ◽

Model Performance ◽

Observation Data ◽

Emission Inventories ◽

Air Quality Model ◽

Quality Model ◽

Future Health

Abstract. Accurate exposure estimates are required for health effects analyses of severe air pollution in China. Chemical transport models (CTMs) are widely used tools to provide detailed information of spatial distribution, chemical composition, particle size fractions, and source origins of pollutants. The accuracy of CTMs' predictions in China is largely affected by the uncertainties of public available emission inventories. The Community Multi-scale Air Quality model (CMAQ) with meteorological inputs from the Weather Research and Forecasting model (WRF) were used in this study to simulate air quality in China in 2013. Four sets of simulations were conducted with four different anthropogenic emission inventories, including the Multi-resolution Emission Inventory for China (MEIC), the Emission Inventory for China by School of Environment at Tsinghua University (SOE), the Emissions Database for Global Atmospheric Research (EDGAR), and the Regional Emission inventory in Asia version 2 (REAS2). Model performance was evaluated against available observation data from 422 sites in 60 cities across China. Model predictions of O3 and PM2.5 with the four inventories generally meet the criteria of model performance, but difference exists in different pollutants and different regions among the inventories. Ensemble predictions were calculated by linearly combining the results from different inventories under the constraint that sum of the squared errors between the ensemble results and the observations from all the cities was minimized. The ensemble annual concentrations show improved agreement with observations in most cities. The mean fractional bias (MFB) and mean fractional errors (MFE) of the ensemble predicted annual PM2.5 at the 60 cities are −0.11 and 0.24, respectively, which are better than the MFB (−0.25–−0.16) and MFE (0.26–0.31) of individual simulations. The ensemble annual 1-hour peak O3 (O3-1 h) concentrations are also improved, with mean normalized bias (MNB) of 0.03 and mean normalized errors (MNE) of 0.14, compared to MNB of 0.06–0.19 and MNE of 0.16–0.22 of the individual predictions. The ensemble predictions agree better with observations with daily, monthly, and annual averaging times in all regions of China for both PM2.5 and O3-1 h. The study demonstrates that ensemble predictions by combining predictions from individual emission inventories can improve the accuracy of predicted temporal and spatial distributions of air pollutants. This study is the first ensemble model study in China using multiple emission inventories and the results are publicly available for future health effects studies.

Download Full-text

Sensitivity of a Complex Urban Air Quality Model to Input Data

Journal of Applied Meteorology ◽

10.1175/1520-0450(1981)020<1020:soacua>2.0.co;2 ◽

1981 ◽

Vol 20 (9) ◽

pp. 1020-1040 ◽

Cited By ~ 19

Author(s):

Christian Seigneur ◽

Thomas W. Tesche ◽

Philip M. Roth ◽

Larry E. Reid

Keyword(s):

Air Quality ◽

Input Data ◽

Urban Air Quality ◽

Urban Air ◽

Air Quality Model ◽

Quality Model

Download Full-text

Assessment of nominal data requirements for robust estimation of fractional snow cover in alpine-forested terrain

10.5194/egusphere-egu21-14212 ◽

2021 ◽

Author(s):

Elzbieta Wisniewski ◽

Wit Wisniewski

Keyword(s):

Snow Cover ◽

Input Data ◽

Model Performance ◽

Slope Aspect ◽

Ann Model ◽

Linear Modeling ◽

Nominal Data ◽

Fractional Snow Cover ◽

Non Linear ◽

Input Variables

<p>The presented research examines what minimum combination of input variables are required to obtain state-of-the-art fractional snow cover (FSC) estimates for heterogeneous alpine-forested terrains. Currently, one of the most accurate FSC estimators for alpine regions is based on training an Artificial Neural Network (ANN) that can deconvolve the relationships among numerous compounded and possibly non-linear bio-geophysical relations encountered in alpine terrain. Under the assumption that the ANN optimally extracts available information from its input data, we can exploit the ANN as a tool to assess the contributions toward FSC estimation of each of the data sources, and combinations thereof. By assessing the quality of the modeled FSC estimates versus ground equivalent data, suitable combinations of input variables can be identified. High spatial resolution IKONOS images are used to estimate snow cover for ANN training and validation, and also for error assessment of the ANN FSC results. Input variables are initially chosen representing information already incorporated into leading snow cover estimators (ex. two multispectral bands for NDSI, etc.). Additional variables such as topographic slope, aspect, and shadow distribution are evaluated to observe the ANN as it accounts for illumination incidence and directional reflectance of surfaces affecting the viewed radiance in complex terrain. Snow usually covers vegetation and underlying geology partially, therefore the ANN also has to resolve spectral mixtures of unobscured surfaces surrounded by snow. Multispectral imagery if therefore acquired in the fall prior to the first snow of the season and are included in the ANN analyses for assessing the baseline reflectance values of the environment that later become modified by the snow. In this study, nine representative scenarios of input data are selected to analyze the FSC performance. Numerous selections of input data combinations produced good results attesting to the powerful ability of ANNs to extract information and utilize redundancy. The best ANN FSC model performance was achieved when all 15 pre-selected inputs were used. The need for non-linear modeling to estimate FSC was verified by forcing the ANN to behave linearly. The linear ANN model exhibited profoundly decreased FSC performance, indicating that non-linear processing more optimally estimates FSC in alpine-forested environments.</p>

Download Full-text

The effect of input data resolution and complexity on the uncertainty of hydrological predictions in a humid vegetated watershed

Hydrology and Earth System Sciences ◽

10.5194/hess-22-5947-2018 ◽

2018 ◽

Vol 22 (11) ◽

pp. 5947-5965 ◽

Cited By ~ 1

Author(s):

Linh Hoang ◽

Rajith Mukundan ◽

Karen E. B. Moore ◽

Emmet M. Owens ◽

Tammo S. Steenhuis

Keyword(s):

Land Use ◽

Parameter Uncertainty ◽

Input Data ◽

Hydrological Modeling ◽

Assessment Tool ◽

Model Performance ◽

Data Complexity ◽

Delaware River ◽

Output Uncertainty ◽

Data Resolution

Abstract. Uncertainty in hydrological modeling is of significant concern due to its effects on prediction and subsequent application in watershed management. Similar to other distributed hydrological models, model uncertainty is an issue in applying the Soil and Water Assessment Tool (SWAT). Previous research has shown how SWAT predictions are affected by uncertainty in parameter estimation and input data resolution. Nevertheless, little information is available on how parameter uncertainty and output uncertainty are affected by input data of varying complexity. In this study, SWAT-Hillslope (SWAT-HS), a modified version of SWAT capable of predicting saturation-excess runoff, was applied to assess the effects of input data with varying degrees of complexity on parameter uncertainty and output uncertainty. Four digital elevation model (DEM) resolutions (1, 3, 10 and 30 m) were tested for their ability to predict streamflow and saturated areas. In a second analysis, three soil maps and three land use maps were used to build nine SWAT-HS setups from simple to complex (fewer to more soil types/land use classes), which were then compared to study the effect of input data complexity on model prediction/output uncertainty. The case study was the Town Brook watershed in the upper reaches of the West Branch Delaware River in the Catskill region, New York, USA. Results show that DEM resolution did not impact parameter uncertainty or affect the simulation of streamflow at the watershed outlet but significantly affected the spatial pattern of saturated areas, with 10m being the most appropriate grid size to use for our application. The comparison of nine model setups revealed that input data complexity did not affect parameter uncertainty. Model setups using intermediate soil/land use specifications were slightly better than the ones using simple information, while the most complex setup did not show any improvement from the intermediate ones. We conclude that improving input resolution and complexity may not necessarily improve model performance or reduce parameter and output uncertainty, but using multiple temporal and spatial observations can aid in finding the appropriate parameter sets and in reducing prediction/output uncertainty.

Download Full-text