Technical Note: The normal quantile transformation and its application in a flood forecasting system

2012 ◽  
Vol 16 (4) ◽  
pp. 1085-1094 ◽  
Author(s):  
K. Bogner ◽  
F. Pappenberger ◽  
H. L. Cloke

Abstract. The Normal Quantile Transform (NQT) has been used in many hydrological and meteorological applications in order to make the Cumulative Distribution Function (CDF) of the observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also at the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP) developed by Krzysztofowicz. In the field of geostatistics this transformation is better known as the Normal-Score Transform. In this paper some possible problems caused by small sample sizes when applying the NQT in flood forecasting systems will be discussed, and a novel way to solve them will be outlined by combining extreme value analysis and non-parametric regression methods. The method will be illustrated by examples of hydrological stream-flow forecasts.
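As an illustration of the transform the abstract describes, the following is a minimal Python sketch, not the authors' implementation: discharge observations are mapped to standard-normal scores through empirical plotting positions, and Gaussian-space values are mapped back by interpolating the empirical quantile function. Variable names and the Weibull plotting position are assumptions made for the example.

# Minimal sketch of the Normal Quantile Transform (NQT); variable names and the
# Weibull plotting position are illustrative assumptions, not the paper's code.
import numpy as np
from scipy import stats

def nqt(x):
    """Map data to standard-normal scores via empirical plotting positions."""
    n = len(x)
    ranks = stats.rankdata(x)              # ranks of the raw discharge values
    p = ranks / (n + 1.0)                  # Weibull plotting positions in (0, 1)
    return stats.norm.ppf(p)               # standard-normal quantiles

def inverse_nqt(z, x_ref):
    """Map Gaussian-space values back to discharge by interpolating the
    empirical quantile function of a reference sample."""
    x_sorted = np.sort(x_ref)
    p_ref = np.arange(1, len(x_ref) + 1) / (len(x_ref) + 1.0)
    z_ref = stats.norm.ppf(p_ref)
    return np.interp(z, z_ref, x_sorted)   # cannot extrapolate beyond the sample

# toy example with synthetic, skewed "discharge" data
rng = np.random.default_rng(42)
q_obs = rng.gamma(shape=2.0, scale=50.0, size=200)
z_obs = nqt(q_obs)
q_back = inverse_nqt(z_obs, q_obs)

Because np.interp clips at the end points of the reference sample, Gaussian-space values beyond the largest observation cannot be back-transformed reliably; this is the small-sample issue the paper addresses.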

2011 ◽  
Vol 8 (5) ◽  
pp. 9275-9297 ◽  
Author(s):  
K. Bogner ◽  
F. Pappenberger ◽  
H. L. Cloke

Abstract. The Normal Quantile Transform (NQT) has been used in many hydrological and meteorological applications in order to make the Cumulative Distribution Function (CDF) of the observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also at the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP) developed by Krzysztofowicz. In the field of geostatistics this transformation is better known as the Normal-Score Transform. In this paper some possible problems caused by small sample sizes when applying the NQT in flood forecasting systems will be discussed and illustrated by examples. For the practical implementation, commands and examples from the freely available and widely used statistical computing language R (R Development Core Team, 2011) will be given (represented in Courier font), and possible solutions are suggested by combining extreme value analysis and non-parametric regression methods.
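The back-transformation problem alluded to above arises when a Gaussian-space forecast falls beyond the largest observed value. One common remedy, sketched below in Python rather than the R used in the paper, is to splice a fitted Generalized Pareto tail onto the empirical quantile function; this is a hedged illustration of the general idea, not the authors' specific combination of extreme value analysis and non-parametric regression, and the threshold choice and data are assumptions.

# Hedged sketch: extend the empirical quantile function with a Generalized
# Pareto (GPD) tail so that Gaussian-space values beyond the largest observation
# can still be back-transformed. Threshold choice and data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
q_obs = rng.gamma(shape=2.0, scale=50.0, size=150)      # synthetic discharge sample

u = np.quantile(q_obs, 0.90)                            # tail threshold (assumed)
excesses = q_obs[q_obs > u] - u
c, loc, scale = stats.genpareto.fit(excesses, floc=0.0) # fit GPD to the excesses
p_u = np.mean(q_obs <= u)                               # CDF value at the threshold

def inverse_nqt_with_tail(z):
    """Back-transform a standard-normal value: empirical body, GPD upper tail."""
    p = stats.norm.cdf(z)
    if p <= p_u:
        return np.quantile(q_obs, p)                    # empirical quantile in the body
    # conditional tail probability, mapped through the fitted GPD quantile function
    p_tail = (p - p_u) / (1.0 - p_u)
    return u + stats.genpareto.ppf(p_tail, c, loc=0.0, scale=scale)

print(inverse_nqt_with_tail(3.5))   # a value well beyond the observed range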


Water ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 1519 ◽  
Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Patrick Willems ◽  
Rolando Célleri

Flash-flood forecasting has emerged worldwide due to the catastrophic socio-economic impacts this hazard can cause and the expected increase in its frequency in the future. In mountain catchments, precipitation-runoff forecasts are limited by the intrinsic complexity of the processes involved, particularly their high rainfall variability. While process-based models are hard to implement, there is potential to use the random forest algorithm due to its simplicity, robustness and capacity to deal with complex data structures. Here a step-wise methodology is proposed to derive parsimonious models accounting for both the hydrological functioning of the catchment (e.g., input data, representation of antecedent moisture conditions) and random forest procedures (e.g., sensitivity analyses, dimension reduction, optimal input composition). The methodology was applied to develop short-term prediction models with lead times of 4, 8, 12, 18 and 24 h for a catchment representative of the Ecuadorian Andes. Results show that the derived parsimonious models reach validation efficiencies (Nash-Sutcliffe coefficient) from 0.761 (4-h) to 0.384 (24-h) for optimal inputs composed only of features accounting for 80% of the model's outcome variance. An improvement in the prediction of extreme peak flows, assessed with extreme value analysis, was demonstrated when precipitation information was included, in contrast to purely autoregressive models.
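A minimal Python sketch of the kind of model described above, assuming synthetic hourly rainfall and discharge series and a 4 h lead time; the lag depths, the 80% cut-off applied to feature importances (used here as a stand-in for the variance-based screening the paper describes) and all variable names are illustrative, not the authors' configuration.

# Hedged sketch: random-forest runoff forecasting with lagged inputs and
# importance-based feature selection (keep features covering ~80% of importance).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000
rain = rng.gamma(0.5, 2.0, size=n)                       # synthetic hourly rainfall
flow = np.convolve(rain, np.exp(-np.arange(24) / 6.0), mode="full")[:n]  # toy response

lead = 4                                                 # forecast horizon in hours
df = pd.DataFrame({"rain": rain, "flow": flow})
for lag in range(1, 13):                                 # 12 h of lagged predictors
    df[f"rain_l{lag}"] = df["rain"].shift(lag)
    df[f"flow_l{lag}"] = df["flow"].shift(lag)
df["target"] = df["flow"].shift(-lead)
df = df.dropna()

X = df.drop(columns=["target"])
y = df["target"]
split = int(0.7 * len(df))                               # simple chronological split
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X.iloc[:split], y.iloc[:split])

# keep the smallest set of features whose cumulative importance stays within ~80%
imp = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
keep = imp[imp.cumsum() <= 0.80].index.tolist() or [imp.index[0]]
rf_small = RandomForestRegressor(n_estimators=200, random_state=0)
rf_small.fit(X.iloc[:split][keep], y.iloc[:split])
print("retained features:", keep)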


2020 ◽  
Vol 7 (1) ◽  
pp. 14-21
Author(s):  
Alexander von Glinski ◽  
Emre Yilmaz ◽  
Ryan Goodmanson ◽  
Clifford Pierre ◽  
Sven Frieler ◽  
...  

Abstract Due to advancements in hip arthroscopy, there is a widening spectrum of diagnostic and treatment indications. The purpose of this study was to identify the 30 most cited articles on hip arthroscopy and discuss their influence on contemporary surgical treatment. The Thomson Reuters Web of Science was used to identify the 30 most cited studies on hip arthroscopy between 1900 and 2018. These 30 articles generated 6152 citations, with an average of 205.07 citations per item; the number of citations ranged from 146 to 461. Twenty-five of the 30 papers were clinical cohort studies with a level of evidence between III and IV, encompassing 4348 patients. Four studies were reviews (one including a technical note) and one was a case report. We were able to identify the 30 most cited articles in the field of hip arthroscopy. Most articles were published in high-impact journals but reported small sample sizes in a retrospective setting. Prospective multi-arm cohort trials or randomized clinical trials represent opportunities for future studies.


2019 ◽  
Author(s):  
Lili Blumenberg ◽  
Emily Kawaler ◽  
MacIntosh Cornwell ◽  
Shaleigh Smith ◽  
Kelly Ruggles ◽  
...  

Abstract Unbiased assays such as shotgun proteomics and RNA-seq provide high-resolution molecular characterization of tumors. These assays measure molecules with highly varied distributions, making interpretation and hypothesis testing challenging. Samples with the most extreme measurements for a molecule can reveal the most interesting biological insights, yet they are often excluded from analysis. Furthermore, rare disease subtypes are, by definition, underrepresented in cancer cohorts. To provide a strategy for identifying molecules aberrantly enriched in small sample cohorts, we present BlackSheep, a package for non-parametric description and differential analysis of genome-wide data, available at https://github.com/ruggleslab/blackSheep. BlackSheep is a complementary tool to other differential expression analysis methods, which may be underpowered when analyzing small subgroups in a larger cohort.
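As a rough illustration of the kind of non-parametric outlier analysis described above (not the BlackSheep API itself), the Python sketch below flags samples whose value for a molecule exceeds the median by 1.5 interquartile ranges and tests, with Fisher's exact test, whether such outliers are enriched in a small subgroup; the cut-off, data and names are assumptions.

# Hedged sketch of outlier-enrichment analysis in the spirit of BlackSheep:
# flag per-gene outliers non-parametrically, then test whether a small subgroup
# is enriched for them. Cut-offs and data are illustrative, not the package's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_samples, n_genes = 60, 500
expr = rng.normal(size=(n_samples, n_genes))             # toy expression matrix
subgroup = np.zeros(n_samples, dtype=bool)
subgroup[:6] = True                                      # a rare subtype of 6 samples
expr[:6, 0] += 4.0                                       # gene 0 aberrantly high there

def outlier_enrichment_pvalue(values, in_group):
    """Fisher's exact test for enrichment of high outliers in a subgroup."""
    med, iqr = np.median(values), stats.iqr(values)
    is_outlier = values > med + 1.5 * iqr                # non-parametric outlier flag
    table = [
        [np.sum(is_outlier & in_group), np.sum(~is_outlier & in_group)],
        [np.sum(is_outlier & ~in_group), np.sum(~is_outlier & ~in_group)],
    ]
    return stats.fisher_exact(table, alternative="greater")[1]

print(outlier_enrichment_pvalue(expr[:, 0], subgroup))    # small p-value expected
print(outlier_enrichment_pvalue(expr[:, 1], subgroup))    # no enrichment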


2018 ◽  
Author(s):  
Colleen Molloy Farrelly

This study aims to confirm prior findings on the usefulness of topological data analysis (TDA) in the analysis of small samples, particularly cohorts of profoundly gifted students, and to explore the use of TDA-based regression methods for statistical modeling with small samples. A subset of the Gross sample is analyzed through supervised and unsupervised methods, including 16 and 17 individuals, respectively. Unsupervised learning confirmed prior results suggesting that evenly gifted and unevenly gifted subpopulations fundamentally differ. Supervised learning focused on predicting graduate school attendance and awards earned during undergraduate studies, and TDA-based logistic regression models were compared with more traditional machine learning models for logistic regression. Results suggest (1) that TDA-based methods are capable of handling small samples and appear more robust to the issues that arise in small samples than other machine learning methods, and (2) that early childhood achievement scores and several factors related to childhood education interventions (such as early entry and radical acceleration) play a role in predicting key educational and professional achievements in adulthood. Possible new directions from this work include the use of TDA-based tools in the analysis of rare cohorts thus far relegated to qualitative analytics or case studies, as well as exploration of early educational factors and adult-level achievement in larger populations of the profoundly gifted, particularly within the Study of Exceptional Talent and Talent Identification Program cohorts.
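A small self-contained Python sketch of the unsupervised idea: 0-dimensional persistence of a point cloud coincides with the merge heights of single-linkage clustering, so a large gap in those heights signals distinct subpopulations even in a very small sample. The data and group structure below are synthetic stand-ins, not the Gross cohort, and this is not the TDA-based regression used in the study.

# Hedged sketch: H0 persistence (connected components) of a small point cloud
# equals single-linkage merge heights, illustrating the unsupervised TDA idea
# with scipy alone. Synthetic data, not the study's cohort.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
# two small, well-separated subgroups (e.g. "evenly" vs "unevenly" gifted profiles)
group_a = rng.normal(loc=0.0, scale=0.3, size=(8, 4))
group_b = rng.normal(loc=3.0, scale=0.3, size=(9, 4))
X = np.vstack([group_a, group_b])

# single-linkage merge heights = death times of H0 features (all births at 0)
merges = linkage(X, method="single")
deaths = np.sort(merges[:, 2])[::-1]
print("longest-lived component gaps:", deaths[:3])
# a large gap between the first and second death suggests two distinct subpopulations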


Author(s):  
Ryota Wada ◽  
Philip Jonathan ◽  
Takuji Waseda

Abstract Extreme value analysis of significant wave height using data from a single location often incurs large uncertainty due to small sample size. Including wave data from nearby locations increases sample size at the risk of introducing dependency between extreme events and hence violating modelling assumptions. In this work, we consider extreme value analysis of spatial wave data from the 109-year GOMOS wave hindcast for the Gulf of Mexico, seeking to incorporate the effects of spatial dependence in a simple but effective manner. We demonstrate that, for estimation of return values at a given location, incorporation of data from a circular disk region with radius of approximately 5° (long.-lat.), centred at the location of interest, provides an appropriate basis for extreme value analysis using the STM-E approach of Wada et al. (2018).
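A hedged Python sketch of the pooling idea behind this abstract, as a generic peaks-over-threshold fit rather than the STM-E method of Wada et al.: storm-peak significant wave heights from grid points within a chosen radius of the target location are pooled, a Generalized Pareto tail is fitted above a threshold, and a return value is read off. The radius, threshold, event rate and synthetic data are assumptions, and the naive pooled rate deliberately ignores the spatial dependence that the paper's approach is designed to handle.

# Hedged sketch: pool storm-peak significant wave heights from nearby grid points,
# fit a Generalized Pareto tail, and estimate a return value. Synthetic data and
# an assumed pooled event rate stand in for the GOMOS hindcast.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sites, n_storms = 20, 300                     # grid points within ~5 deg of the site
hs_peaks = rng.gumbel(loc=4.0, scale=1.5, size=(n_sites, n_storms)).ravel()

u = np.quantile(hs_peaks, 0.95)                 # peaks-over-threshold threshold (assumed)
exc = hs_peaks[hs_peaks > u] - u
c, _, scale = stats.genpareto.fit(exc, floc=0.0)

years = 109                                     # hindcast length from the abstract
rate = len(exc) / years                         # naive pooled exceedance rate per year;
                                                # double-counts dependent events across sites

def return_value(T):
    """T-year return value from the fitted GPD tail (textbook POT formula)."""
    return u + stats.genpareto.ppf(1.0 - 1.0 / (rate * T), c, loc=0.0, scale=scale)

print(f"100-year Hs estimate: {return_value(100.0):.2f} m")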


2014 ◽  
Vol 58 (3) ◽  
pp. 193-207 ◽  
Author(s):  
C Photiadou ◽  
MR Jones ◽  
D Keellings ◽  
CF Dewes
