MISSING DATA REPRESENTATION BY PERCEPTION THRESHOLDS IN FLOOD FLOW FREQUENCY ASSESSMENT

Author(s):  
Nikola Đokić ◽  
Borislava Blagojević ◽  
Vladislava Mihailović

Flood flow frequency analysis (FFA) plays a key role in many fields of hydraulic engineering and water resources management. The output of FFA is a set of flood quantiles that forms the basis for subsequent flood-related analyses. The reliability of these results depends on many factors, the first of which is the reliability of the input data - the datasets of annual peak flows. In practice, however, engineers often encounter incomplete datasets (missing data, data gaps and/or broken records). In this paper, we perform an at-site analysis and use the complete dataset of annual peak flows from 1931 to 2016 at the hydrologic station Senta on the Tisa River as the reference dataset. From this original dataset we remove some data to obtain 15 new series with gaps of different lengths and locations. Each dataset is then subjected to flood frequency assessment using the USACE HEC-SSP Bulletin 17C analysis, which introduces the concept of a "perception threshold" that can be used to represent missing data. For data representation in HEC-SSP, we set the upper perception threshold bound to infinity and apply different lower bounds to all missing flows in one dataset, creating 56 variants of input HEC-SSP datasets. The flood flow quantiles assessed from the datasets with missing data and different perception thresholds are evaluated through the percentage error relative to the reference dataset and the confidence interval width as an uncertainty measure. The results for datasets with a single gap of up to 23% of the observation period indicate that acceptable flood quantile estimates are obtained even for larger return periods by setting the lower perception threshold bound at the value of the highest observed flow in the available series of annual maxima.
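
As a rough illustration of the evaluation step described above, the following Python sketch computes the percentage error of a flood quantile estimated from a gapped dataset relative to the reference estimate, and the confidence interval width used as the uncertainty measure. The quantile and confidence-bound values are hypothetical placeholders, not results from the study.

```python
# Minimal sketch (hypothetical values): evaluating a flood quantile estimated from a
# gapped dataset against the reference estimate from the complete 1931-2016 record.

def percentage_error(q_estimated, q_reference):
    """Percentage error of a flood quantile relative to the reference value."""
    return 100.0 * (q_estimated - q_reference) / q_reference

def ci_width(ci_lower, ci_upper):
    """Confidence interval width used as the uncertainty measure."""
    return ci_upper - ci_lower

# Hypothetical 100-year quantiles (m3/s) from HEC-SSP Bulletin 17C runs.
q100_reference = 3800.0   # complete reference dataset
q100_gapped = 3650.0      # gapped dataset, lower threshold bound = highest observed flow
print(percentage_error(q100_gapped, q100_reference))   # about -3.9 %
print(ci_width(3200.0, 4300.0))                         # 1100.0 m3/s
```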

Water Policy ◽  
2021 ◽  
Author(s):  
Richard M. Vogel ◽  
Charles N. Kroll

Abstract Extreme drought and resulting low streamflows occur throughout the U.S., causing billions of dollars in annual losses, detrimentally impacting ecosystems, as well as agricultural, hydropower, navigation, water supply, recreation, and a myriad of other water resource systems, leading to reductions in both the effectiveness and resiliency of our water resource infrastructure. Since 1966, with the introduction of Bulletin 13 titled 'Methods of Flow Frequency Analysis', the U.S. adopted uniform guidelines for performing flood flow frequency analysis to ensure and enable all federal agencies concerned with water resource design, planning, and management under flood conditions to obtain sensible, consistent, and reproducible estimators of flood flow statistics. Remarkably, over one-half century later, no uniform national U.S. guidelines for hydrologic drought streamflow frequency analysis exist, and the assorted guidelines that do exist are not reliable because (1) they are based on methods developed for floods, which are distinctly different from low streamflows, and (2) the methods do not take advantage of the myriad of advances in flood and low streamflow frequency analyses over the last 50 years. We provide a justification for developing national guidelines for streamflow drought frequency analysis as an analog to the existing national guidelines for flood frequency analysis. Those guidelines should result in improved water resources design, planning, operations, and management under low streamflow conditions throughout the U.S. and could prove useful elsewhere.


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Ariel Linden

The patient activation measure (PAM) is an increasingly popular instrument used as the basis for interventions to improve patient engagement and as an outcome measure to assess intervention effect. However, a PAM score may be calculated when there are missing responses, which could lead to substantial measurement error. In this paper, measurement error is systematically estimated across the full possible range of missing items (one to twelve) using simulation, in which populated items were randomly replaced with missing data for each of 1,138 complete surveys obtained in a randomized controlled trial. The PAM score was then calculated, and the simulated mean, minimum, and maximum PAM scores were compared to the true PAM score to assess the absolute percentage error (APE) for each comparison. With only one missing item, the average APE was 2.5% when comparing the true PAM score to the simulated minimum score and 4.3% when comparing it to the simulated maximum score. APEs increased with additional missing items, such that surveys with 12 missing items had average APEs of 29.7% (minimum) and 44.4% (maximum). Several suggestions and alternative approaches are offered that could be pursued to improve measurement accuracy when responses are missing.
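
The simulation logic can be sketched as follows in Python, assuming a simple mean-based proxy score in place of the official PAM scoring table (an assumption made purely for illustration); the survey responses are hypothetical.

```python
# Minimal sketch of the missing-item simulation, assuming a mean-based proxy for the
# PAM score (the real instrument uses a calibrated scoring table).
import random

def proxy_pam_score(items):
    """Proxy score: mean item response (1-4) rescaled to 0-100. Assumption, not official scoring."""
    return 100.0 * (sum(items) / len(items) - 1) / 3

def absolute_percentage_error(estimate, truth):
    return 100.0 * abs(estimate - truth) / truth

def simulate(survey, n_missing, n_reps=1000, seed=0):
    """Randomly blank n_missing items and score the survey from the remaining responses."""
    rng = random.Random(seed)
    true_score = proxy_pam_score(survey)
    errors = []
    for _ in range(n_reps):
        kept = rng.sample(survey, len(survey) - n_missing)
        errors.append(absolute_percentage_error(proxy_pam_score(kept), true_score))
    return min(errors), sum(errors) / len(errors), max(errors)

complete_survey = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2]  # one hypothetical complete response
print(simulate(complete_survey, n_missing=1))            # (min APE, mean APE, max APE)
```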


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rahi Jain ◽  
Wei Xu

Abstract Background Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. Strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the missing information, as in predictive mean matching (PMM) used by MICE. Limitations of these strategies include information loss and uncertainty about how close the imputed values are to the true missing values. Further, in scenarios with piecemeal medical data, these strategies must wait for the data collection process to finish before a complete dataset is available for statistical modelling. Method and results This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to prepare the statistical models. It segments the original dataset into small complete datasets using hierarchical clustering and then fits a Bayesian regression on each of the small complete datasets. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU is evaluated on both simulated data and real studies and shows results that are better than or on par with other approaches such as CCA and PMM. Conclusion The DMU approach provides an alternative to the existing approaches of information elimination and imputation when processing datasets with missing values. While the study applied the approach to continuous cross-sectional data, it can be applied to longitudinal, categorical and time-to-event biological data.
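
A minimal sketch of the updating idea follows, assuming a conjugate Gaussian regression with known noise variance and replacing the hierarchical clustering step with a simple split into complete chunks; all data are synthetic and the function names are illustrative.

```python
# Minimal sketch of the dynamic-model-updating idea: fit a Bayesian linear regression on
# each small complete subset and carry the posterior forward as the prior for the next one.
import numpy as np

def bayes_update(X, y, prior_mean, prior_prec, noise_var=1.0):
    """Conjugate Gaussian update of regression coefficients on one complete subset."""
    post_prec = prior_prec + X.T @ X / noise_var
    post_mean = np.linalg.solve(post_prec, prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_prec

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0])
X_full = rng.normal(size=(200, 2))
y_full = X_full @ beta_true + rng.normal(scale=0.5, size=200)

# Pretend the data arrived as several small complete subsets.
mean, prec = np.zeros(2), np.eye(2) * 1e-3   # weak prior
for chunk in np.array_split(np.arange(200), 4):
    mean, prec = bayes_update(X_full[chunk], y_full[chunk], mean, prec, noise_var=0.25)
print(mean)  # posterior estimates approach beta_true as subsets are absorbed
```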


2012 ◽  
Vol 16 (5) ◽  
pp. 1269-1279 ◽  
Author(s):  
S. B. Shaw ◽  
M. T. Walter

Abstract. Comparative analysis has been a little-used approach to the teaching of hydrology. Instead, hydrology is often taught by introducing fundamental principles with the assumption that they are sufficiently universal to apply to almost any hydrologic system. In this paper, we illustrate the value of using comparative analysis to enhance students' insight into the degree and predictability of future non-stationarity in flood frequency analysis. Traditionally, flood frequency analysis is taught from a statistical perspective that offers limited means of understanding the nature of non-stationarity. By visually comparing graphics of mean daily flows and annual peak discharges (plotted against Julian day) for watersheds in a variety of locales, distinct differences in the timing and nature of flooding in different regions of the US become readily apparent. Such differences highlight the dominant hydroclimatological drivers of different watersheds. When linked with information on the predictability of hydroclimatic drivers (hurricanes, atmospheric rivers, snowpack melt, convective events) in a changing climate, such comparative analysis provides students with an improved physical understanding of flood processes and a stronger foundation on which to make judgments about how to modify statistical techniques for making predictions in a changing climate. We envision that such comparative analysis could be incorporated into a number of other traditional hydrologic topics.
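
A small sketch of the kind of comparative plot described above, using synthetic annual peak data for two hypothetical watersheds with different flood seasons; the basin labels and values are illustrative assumptions, not data from the paper.

```python
# Minimal sketch: annual peak discharges placed on the Julian day they occurred,
# for two hypothetical watersheds with different dominant hydroclimatic drivers.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
snowmelt_days = rng.normal(110, 15, 50).astype(int)      # peaks clustered around April
snowmelt_flows = rng.lognormal(mean=6.0, sigma=0.3, size=50)
hurricane_days = rng.normal(260, 25, 50).astype(int)     # peaks clustered around September
hurricane_flows = rng.lognormal(mean=6.5, sigma=0.6, size=50)

fig, ax = plt.subplots()
ax.scatter(snowmelt_days, snowmelt_flows, label="snowmelt-driven basin")
ax.scatter(hurricane_days, hurricane_flows, label="hurricane-prone basin")
ax.set_xlabel("Julian day of annual peak")
ax.set_ylabel("Annual peak discharge (m$^3$/s)")
ax.legend()
plt.show()
```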


2020 ◽  
Vol 07 (02) ◽  
pp. 161-177
Author(s):  
Oyekale Abel Alade ◽  
Ali Selamat ◽  
Roselina Sallehuddin

One major characteristic of data is completeness. Missing data is a significant problem in medical datasets: it leads to incorrect classification of patients and is dangerous to the health management of patients. Many factors lead to missing values in medical datasets. In this paper, we propose examining the causes of missing data in a medical dataset to ensure that the right imputation method is used to solve the problem. The missingness mechanism was studied to identify the missing-data pattern of the dataset and to determine a suitable imputation technique for generating complete datasets. The pattern shows that the missingness of the dataset used in this study is not monotone. Also, because single imputation techniques underestimate variance and ignore relationships among the variables, we used a multiple imputation technique that runs five iterations for the imputation of each missing value. All of the missing values in the dataset (100%) were regenerated. The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show improvement in the accuracy of the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets against the original dataset with different classifiers such as support vector machines (SVM), radial basis function (RBF) networks, and ELMs.
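
A MICE-style multiple imputation of this kind might be sketched as follows with scikit-learn's IterativeImputer, generating five imputed datasets from synthetic data; validation with an ELM classifier is omitted because ELM is not part of scikit-learn, and the data are illustrative.

```python
# Minimal sketch of MICE-style multiple imputation: five imputed datasets are generated by
# re-running the imputer with sample_posterior=True and different random states.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[rng.random(X.shape) < 0.1] = np.nan   # ~10% of values missing (not a monotone pattern)

imputed_sets = []
for m in range(5):  # five imputations of each missing value
    imputer = IterativeImputer(sample_posterior=True, random_state=m, max_iter=10)
    imputed_sets.append(imputer.fit_transform(X))

print(np.isnan(imputed_sets[0]).sum())  # 0 -> all missing values regenerated
```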


2021 ◽  
Vol 7 (9) ◽  
pp. 1608-1619
Author(s):  
Fatimah Bibi Hamzah ◽  
Firdaus Mohd Hamzah ◽  
Siti Fatin Mohd Razali ◽  
Hafiza Samad

Missing data is a common problem in hydrological studies; therefore, data reconstruction is critical, especially when it is crucial to employ all available resources, even incomplete records. Furthermore, missing data could affect the results of statistical analyses, and the amount of variability in the data would not be properly represented. As a result, this study compared the performance of three imputation methods in predicting recurrence in streamflow datasets: robust random regression imputation (RRRI), k-nearest neighbours (k-NN), and classification and regression trees (CART). The complete historical daily streamflow data from 2012 to 2014 (the training dataset) were used to assess and validate the effectiveness of the imputation methods in addressing missing streamflow data. Following that, all three methods, coupled with multiple linear regression (MLR), were used to restore streamflow rates in Malaysia's Langat River Basin from 1978 to 2016. The effectiveness of the estimation techniques was evaluated using metrics including the Nash-Sutcliffe efficiency coefficient (CE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results confirmed that RRRI coupled with MLR (RRRI-MLR) had the lowest RMSE and MAPE values, outperforming all other techniques tested for filling missing data in daily streamflow datasets. This indicates that RRRI-MLR is the best method for dealing with missing data in streamflow datasets. Doi: 10.28991/cej-2021-03091747
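
For reference, the three evaluation metrics named above could be computed as in the following sketch; the observed and imputed values are hypothetical, not data from the Langat River Basin.

```python
# Minimal sketch of the evaluation metrics: Nash-Sutcliffe efficiency (CE), RMSE, and MAPE,
# applied to hypothetical observed and imputed daily streamflow values.
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency coefficient (CE): 1 indicates a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    return float(np.sqrt(np.mean((np.asarray(obs, float) - np.asarray(sim, float)) ** 2)))

def mape(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(100.0 * np.mean(np.abs((obs - sim) / obs)))

observed = [12.3, 15.1, 9.8, 22.4, 18.0]   # hypothetical streamflow (m3/s)
imputed  = [11.9, 15.6, 10.2, 21.1, 18.4]
print(nash_sutcliffe(observed, imputed), rmse(observed, imputed), mape(observed, imputed))
```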


2021 ◽  
Author(s):  
Gerardo Benito ◽  
Olegario Castillo ◽  
Juan A. Ballesteros-Cánovas ◽  
Maria Machado ◽  
Mariano Barriendos

Abstract. Current climate modelling frameworks present significant uncertainties when it comes to quantifying flood quantiles in the context of climate change, calling for new information and strategies in hazard assessments. Here, state-of-the-art methods in hydraulic and statistical modelling are applied to historical and contemporary flood records to evaluate flood hazards beyond natural climate cycles. A comprehensive flood record of the Duero River in Zamora (Spain) was compiled from documentary sources, early water-level readings and continuous gauge records spanning the last 500 years. Documentary evidence of flood events includes minute books (municipal and ecclesiastic), narrative descriptions, epigraphic marks, newspapers and technical reports. We identified 69 flood events over the period 1250 to 1871, of which 15 were classified as catastrophic floods, 16 as extraordinary floods, and 38 as ordinary floods. Subsequently, a 2D hydraulic model was implemented to convert flood stages (flood marks and inundated areas) into discharges. The historical flood record shows that the largest floods of the last 500 years occurred in 1860 (3450 m3/s), 1597 (3200 m3/s), and 1739 (2700 m3/s). Moreover, at least 24 floods exceeded the perception threshold of 1900 m3/s during the period 1500–1871. The annual maximum flood record was completed with gauged water-level readings (PRE: 1872–1919) and systematic gauge records (SYS: 1920–2018). The flood frequency analyses were based on (1) the Expected Moments Algorithm (EMA) and (2) the Maximum Likelihood Estimator (MLE) method, using five datasets with different temporal frameworks (HISTO: 1511–2018, PRE-SYS: 1872–2018, ALLSYS: 1920–2018, SYS1: 1920–1969, and SYS2: 1970–2018). The most consistent results were obtained with the HISTO dataset, even for high quantiles (0.001% AEP). PRE-SYS was robust for the 1% AEP flood, with increasing uncertainty for the 0.2% AEP (500-year) flood, while ALLSYS results were uncertain for both the 1% and 0.2% AEP floods. Since the 1970s, the frequency of extraordinary floods (>1900 m3/s) has declined, although floods in the range of the historical perception threshold occurred in 2001 (2075 m3/s) and 2013 (1654 m3/s). Even if the future remains uncertain, this bottom-up approach addresses flood hazards under climate variability by providing real and certain flood discharges. Our results can guide low-regret adaptation decisions and improve public perception of extreme flooding.
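
To illustrate the MLE branch of such an analysis, the sketch below fits a Gumbel distribution to a synthetic annual-maximum series and reads off the 1% and 0.2% AEP quantiles. The EMA method with historical perception thresholds is not available in SciPy, so only the systematic-record case is shown, and the flows are synthetic stand-ins rather than the Duero record.

```python
# Minimal sketch: maximum likelihood fit of a Gumbel distribution to a synthetic
# annual-maximum flood series, then flood quantiles for two annual exceedance probabilities.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
annual_maxima = stats.gumbel_r.rvs(loc=900, scale=350, size=99, random_state=rng)  # synthetic flows

loc, scale = stats.gumbel_r.fit(annual_maxima)   # maximum likelihood estimates
for aep in (0.01, 0.002):                         # 1% and 0.2% annual exceedance probability
    q = stats.gumbel_r.ppf(1.0 - aep, loc=loc, scale=scale)
    print(f"AEP {aep:.1%}: {q:,.0f} m3/s")
```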

