Data-driven catchment classification: application to the PUB problem

2011 ◽  
Vol 15 (6) ◽  
pp. 1921-1935 ◽  
Author(s):  
M. Di Prinzio ◽  
A. Castellarin ◽  
E. Toth

Abstract. A promising approach to catchment classification makes use of unsupervised neural networks (Self Organising Maps, SOMs), which organise input data through non-linear techniques depending on the intrinsic similarity of the data themselves. Our study considers ∼300 Italian catchments scattered nationwide, for which several descriptors of the streamflow regime and geomorphoclimatic characteristics are available. We compare a reference classification, identified by using indices of the streamflow regime as input to SOM, with four alternative classifications, which were identified on the basis of catchment descriptors that can be derived for ungauged basins. One alternative classification adopts the available catchment descriptors as input to SOM; the remaining classifications are identified by applying SOM to sets of derived variables obtained by applying Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) to the available catchment descriptors. The comparison is performed relative to a PUB problem, that is, for predicting several streamflow indices in ungauged basins. We perform an extensive cross-validation to quantify nationwide the accuracy of predictions of mean annual runoff, mean annual flood, and flood quantiles associated with given exceedance probabilities. Results of the study indicate that performing PCA and, in particular, CCA on the available set of catchment descriptors before applying SOM significantly improves the effectiveness of SOM classifications by reducing the uncertainty of hydrological predictions in ungauged sites.

2011 ◽  
Vol 8 (1) ◽  
pp. 391-427 ◽  
Author(s):  
M. Di Prinzio ◽  
A. Castellarin ◽  
E. Toth

Abstract. Objective criteria for catchment classification are identified by the scientific community among the key research topics for improving the interpretation and representation of the spatiotemporal variability of streamflow. A promising approach to catchment classification makes use of unsupervised neural networks (Self Organising Maps, SOMs), which organise input data through non-linear techniques depending on the intrinsic similarity of the data themselves. Our study considers ~300 Italian catchments scattered nationwide, for which several descriptors of the streamflow regime and geomorphoclimatic characteristics are available. We qualitatively and quantitatively compare, in the context of PUB (Prediction in Ungauged Basins), a reference classification, RC, with four alternative classifications, ACs. RC was identified by using indices of the streamflow regime as input to SOM, whereas ACs were identified on the basis of catchment descriptors that can be derived for ungauged basins. One AC directly adopts the available catchment descriptors as input to SOM. The remaining ACs are identified by applying SOM to two sets of derived variables obtained by applying Principal Component Analysis (PCA, second AC) and Canonical Correlation Analysis (CCA, third and fourth ACs) to the available catchment descriptors. First, we measure the similarity between each AC and RC. Second, we use ACs and RC to regionalize several streamflow indices and we compare ACs with RC in terms of accuracy of streamflow prediction. In particular, we perform an extensive cross-validation to quantify nationwide the accuracy of predictions in ungauged basins of mean annual runoff, mean annual flood, and flood quantiles associated with given exceedance probabilities. Results of the study show that CCA can significantly improve the effectiveness of SOM classifications for the PUB problem.
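The classification pipeline this abstract describes (PCA on catchment descriptors, then a SOM trained on the reduced variables) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation; the grid size, learning schedule and number of retained components are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ~300 catchments x 10 geomorphoclimatic descriptors.
X = rng.normal(size=(300, 10))

# --- PCA preprocessing: project descriptors onto leading components ---
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_pc = 3
scores = Xc @ Vt[:n_pc].T          # (300, 3) principal-component scores

# --- Minimal rectangular SOM trained on the PCA scores ---
def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0):
    n, d = data.shape
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    w = rng.normal(scale=0.1, size=(rows * cols, d))
    steps = epochs * n
    t = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = t / steps
            lr = lr0 * (1 - frac)               # decaying learning rate
            sigma = sigma0 * (1 - frac) + 0.5   # shrinking neighbourhood
            bmu = np.argmin(((w - x) ** 2).sum(axis=1))  # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))        # neighbourhood kernel
            w += lr * h[:, None] * (x - w)
            t += 1
    return w

weights = train_som(scores)
# Each catchment is assigned the class of its best-matching SOM unit.
classes = np.argmin(((scores[:, None, :] - weights[None]) ** 2).sum(-1), axis=1)
print(classes.shape)
```

The CCA-based variants would replace the SVD step with canonically correlated variates of the descriptor and streamflow-index sets; the SOM training itself is unchanged.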


2011 ◽  
Vol 11 (3) ◽  
pp. 673-695 ◽  
Author(s):  
V. Iacobellis ◽  
A. Gioia ◽  
S. Manfreda ◽  
M. Fiorentino

Abstract. A regional probabilistic model for the estimation of medium-high return period flood quantiles is presented. The model is based on the use of theoretically derived probability distributions of annual maximum flood peaks (DDF). The general model is called TCIF (Two-Component IF model) and encompasses two different threshold mechanisms associated with ordinary and extraordinary events, respectively. Based on at-site calibration of this model for 33 gauged sites in Southern Italy, a regional analysis is performed obtaining satisfactory results for the estimation of flood quantiles for return periods of technical interest, thus suggesting the use of the proposed methodology for the application to ungauged basins. The model is validated by using a jack-knife cross-validation technique taking all river basins into consideration.
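The jack-knife validation strategy mentioned above (re-estimating each basin's quantile as if the basin were ungauged) can be illustrated with a generic sketch. This uses a simple log-log regional regression on synthetic data, not the TCIF model itself; the power-law relation and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 33 gauged basins, flood quantile ~ power law of basin area.
area = rng.uniform(50, 5000, size=33)
q100 = 5.0 * area ** 0.7 * rng.lognormal(sigma=0.15, size=33)

logA, logQ = np.log(area), np.log(q100)
errors = []
for i in range(33):
    mask = np.arange(33) != i                 # leave basin i out
    b, a = np.polyfit(logA[mask], logQ[mask], 1)
    pred = np.exp(a + b * logA[i])            # treat basin i as "ungauged"
    errors.append(abs(pred - q100[i]) / q100[i])

print(f"median jack-knife relative error: {np.median(errors):.2f}")
```

Each basin is dropped in turn, the regional relation is refitted on the remaining 32, and the held-out basin's quantile is predicted, mimicking application to an ungauged site.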


2021 ◽  
Vol 50 (9) ◽  
pp. 2765-2779
Author(s):  
Basri Badyalina ◽  
Ani Shabri ◽  
Muhammad Fadhil Marsani

One of the most frequent and vital tasks for hydrologists is to deliver reliable, high-accuracy estimates of hydrological variables; such estimates are essential for flood risk evaluation, hydropower development and efficient water resource management. The Group Method of Data Handling (GMDH) has been widely applied in hydrological modelling, yet it is comparatively little used for hydrological estimation at ungauged basins. In this study, a modified GMDH (MGMDH) model was developed to improve the performance of the GMDH model in estimating hydrological variables at ungauged sites. The MGMDH model employs four transfer functions (polynomial, hyperbolic tangent, sigmoid and radial basis) for hydrological estimation at ungauged basins and incorporates Principal Component Analysis (PCA) into the GMDH model. The purpose of PCA is to reduce the complexity of the GMDH model, while the four transfer functions enhance its estimation performance. To evaluate the effectiveness of the proposed model, 70 basins across Peninsular Malaysia were selected. The performance of the MGMDH model was compared with that of the GMDH model and with other models widely used for flood quantile estimation at ungauged basins: Linear Regression (LR), Nonlinear Regression (NLR) and Artificial Neural Networks (ANN). The results show that the MGMDH model achieved the highest estimation accuracy among all models tested, indicating that MGMDH is a robust and efficient instrument for flood quantile estimation at ungauged basins.
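The core GMDH idea referred to above (fitting many small pairwise polynomial "neurons" and selecting the best against an external validation set) can be illustrated with a minimal sketch. The data are synthetic, only one layer with the polynomial transfer function is shown, and the MGMDH extensions (PCA and the three additional transfer functions) are omitted.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Synthetic stand-in: basin descriptors (area, rainfall, slope, ...) vs flood quantile.
X = rng.uniform(0, 1, size=(70, 4))
y = 2 * X[:, 0] + X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=70)

train, valid = slice(0, 50), slice(50, 70)   # external set drives GMDH selection

def fit_neuron(u, v, t):
    # Quadratic Ivakhnenko polynomial: a0 + a1*u + a2*v + a3*uv + a4*u^2 + a5*v^2
    Z = np.column_stack([np.ones_like(u), u, v, u * v, u ** 2, v ** 2])
    coef, *_ = np.linalg.lstsq(Z, t, rcond=None)
    return coef

def neuron_out(coef, u, v):
    Z = np.column_stack([np.ones_like(u), u, v, u * v, u ** 2, v ** 2])
    return Z @ coef

# One GMDH layer: try every descriptor pair, keep the pair with the
# lowest error on the external validation set.
best = None
for i, j in combinations(range(4), 2):
    coef = fit_neuron(X[train, i], X[train, j], y[train])
    err = np.mean((neuron_out(coef, X[valid, i], X[valid, j]) - y[valid]) ** 2)
    if best is None or err < best[0]:
        best = (err, i, j, coef)

err, i, j, coef = best
print(f"best pair: ({i}, {j}), validation MSE: {err:.4f}")
```

A full GMDH stacks such layers, feeding the surviving neurons' outputs forward as new inputs; MGMDH additionally screens the four transfer functions and pre-compresses the inputs with PCA.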


2020 ◽  
Author(s):  
Xin Yi See ◽  
Benjamin Reiner ◽  
Xuelan Wen ◽  
T. Alexander Wheeler ◽  
Channing Klein ◽  
...  

Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via Ti-catalyzed formal [2+2+1] cycloaddition of phenyl propyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space along with k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and only optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (> 90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
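The core loop of the workflow described above (PCA on catalyst descriptors, regression of performance on the component scores, in silico ranking of untested candidates) can be sketched as follows. This is a single illustrative iteration on synthetic data using NumPy-only stand-ins, not the authors' code; the k-means clustering step used to diversify the test set, and the iteration across generations, are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in: 20 training catalysts x 8 ligand descriptors,
# with "selectivity" driven by a hidden linear combination of descriptors.
X = rng.normal(size=(20, 8))
w_true = np.array([1.5, -1.0, 0.5, 0, 0, 0, 0, 0])
sel = X @ w_true + 0.1 * rng.normal(size=20)

# 1. PCA on the descriptor space (via SVD), keep the top three components.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:3].T

# 2. Regress measured selectivity on the component scores (least squares).
A = np.column_stack([np.ones(len(scores)), scores])
coef, *_ = np.linalg.lstsq(A, sel, rcond=None)

# 3. Score a prospective test set in silico; only the top-ranked
#    candidates would be synthesized and tested experimentally.
X_test = rng.normal(size=(50, 8))
scores_test = (X_test - X.mean(axis=0)) @ Vt[:3].T
pred = np.column_stack([np.ones(50), scores_test]) @ coef
top5 = np.argsort(pred)[::-1][:5]
print("predicted-best candidates:", top5)
```

In the iterated workflow, the experimentally measured selectivities of the new candidates are folded back into the training set and the PCA and regression are refitted each generation.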


2012 ◽  
Vol 43 (6) ◽  
pp. 833-850 ◽  
Author(s):  
Ziqi Yan ◽  
Lars Gottschalk ◽  
Irina Krasovskaia ◽  
Jun Xia

The long-term mean value of runoff is the basic descriptor of available water resources. This paper focuses on the accuracy that can be achieved when mapping this variable across space and along main rivers for a given stream gauging network. Three stochastic interpolation schemes for estimating average annual runoff across space are evaluated and compared. Two of the schemes firstly interpolate runoff to a regular grid net and then integrate the grid values along rivers. One of these schemes includes a constraint to account for the lateral water balance along the rivers. The third scheme interpolates runoff directly to points along rivers. A drainage basin in China with 20 gauging sites is used as a test area. In general, all three approaches reproduce the sample discharges along rivers with postdiction errors along main river branches around 10%. Using more objective cross-validation results, it was found that the two schemes based on basin integration, and especially the one with a constraint, performed significantly better than the one with direct interpolation to points along rivers. The analysis did not allow identification of possible influence of surface water use.
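As an illustration of interpolating a runoff value from a gauging network, here is a minimal ordinary-kriging sketch on synthetic data. The covariance model and its parameters are arbitrary assumptions, and the paper's grid integration and lateral water-balance constraint are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in: 20 gauging sites with coordinates and mean annual runoff.
pts = rng.uniform(0, 100, size=(20, 2))
runoff = 500 + 3 * pts[:, 0] + 20 * rng.normal(size=20)  # smooth trend + noise

def exp_cov(h, sill=400.0, rng_par=50.0):
    # Exponential covariance model: C(h) = sill * exp(-h / range)
    return sill * np.exp(-h / rng_par)

def ordinary_kriging(pts, vals, target):
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
    K = np.empty((n + 1, n + 1))
    K[:n, :n] = exp_cov(d)
    K[n, :], K[:, n] = 1.0, 1.0      # unbiasedness (Lagrange) row/column
    K[n, n] = 0.0
    k = np.append(exp_cov(np.linalg.norm(pts - target, axis=1)), 1.0)
    w = np.linalg.solve(K, k)
    return w[:n] @ vals              # kriged estimate at the target point

est = ordinary_kriging(pts, runoff, np.array([50.0, 50.0]))
print(f"interpolated runoff at grid point: {est:.1f}")
```

The grid-integration schemes in the paper would evaluate such point estimates over a regular grid and then aggregate them along the river network, optionally constraining the aggregates to respect the lateral water balance.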


2019 ◽  
pp. 311-316
Author(s):  
Marie Dalémat ◽  
Michel Coret ◽  
Adrien Leygue ◽  
Erwan Verron
