Clustering Scatter Plots Using Data Depth Measures

A family dog supplies the measurements for scatter plots and variables so that students can explore relationships among data sets—not to mention paws and tails.

Observational Uncertainty in Hydrological Modelling using Data Depth

Global NEST Journal ◽

10.30955/gnj.002354 ◽

2017 ◽

Vol 19 (3) ◽

pp. 489-497

Keyword(s):

Hydrological Model ◽

Meteorological Data ◽

River Basin Management ◽

Hydrological Modelling ◽

Data Depth ◽

Basin Management ◽

Flow Generation ◽

Observational Uncertainty ◽

Using Data ◽

Quantifying Uncertainty

River basin management requires tools to predict runoff at different temporal and spatial resolutions. Hydrological models account for the storage and flow of water and the water balance in a watershed, including exchanges of water and energy among the earth, atmosphere and oceans, and they use meteorological data to generate flow. Meteorological data contain errors from several sources, such as point-level measurement and interpolation. When an erroneous input is passed to a model, one cannot expect an error-free output, so every prediction carries uncertainty. Quantifying these uncertainties is of prime importance in real-world forecasting. This study examines the uncertainty associated with hydrological modelling using the idea of data depth. To see how uncertainty in rainfall affects the flow generated by a model, the model input was altered by adding an error, producing a different rainfall realisation each time. A Monte Carlo simulation generated a large number of hydrological model parameter sets drawn from the uniform distribution, and the model was run with these parameter sets for each rainfall realisation. Parameter sets that perform well across different realisations are more likely to be good parameter sets. Data depth was calculated for each parameter set, and a likelihood was assigned to each set based on its depth value. The frequency distribution of the likelihood was analysed as well. The results show that uncertainties in hydrological modelling are multiplicative. The proposed methodology for assigning prediction uncertainty is demonstrated using the ‘TopNet’ model for the Waipara river catchment, located in the central east of the South Island, New Zealand. The results of this study will be helpful in calibrating hydrological models and in quantifying the uncertainty of their predictions.
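
The workflow described in this abstract (uniform Monte Carlo sampling, repeated runs under perturbed rainfall, depth-based likelihood) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: `toy_model` is a hypothetical two-parameter linear reservoir standing in for TopNet, the rainfall error is assumed to be multiplicative lognormal noise, "good" is assumed to mean the best 10% by RMSE, and Mahalanobis depth is swapped in for whatever depth function the authors used because it is short to write down.

```python
import numpy as np

def toy_model(k, c, rain):
    """Hypothetical stand-in for a hydrological model (linear reservoir).

    k, c are arrays of equal length (one entry per parameter set);
    returns simulated flow with shape (time steps, parameter sets).
    """
    storage = np.zeros_like(k)
    flows = np.empty((rain.size, k.size))
    for t, r in enumerate(rain):
        storage = storage + c * r      # effective rainfall enters storage
        flows[t] = k * storage         # outflow proportional to storage
        storage = storage - flows[t]
    return flows

def mahalanobis_depth(points, cloud):
    """Mahalanobis depth of each row of `points` relative to `cloud`."""
    mean = cloud.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(cloud, rowvar=False))
    diff = points - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.5, scale=4.0, size=200)            # synthetic rainfall
observed = toy_model(np.array([0.3]), np.array([0.6]), rain)[:, 0]

# Monte Carlo: parameter sets drawn from a uniform distribution
param_sets = rng.uniform(low=[0.05, 0.1], high=[0.9, 1.0], size=(1000, 2))

# Run every parameter set on several perturbed rainfall realisations
n_real = 5
good_counts = np.zeros(len(param_sets), dtype=int)
for _ in range(n_real):
    noisy_rain = rain * rng.lognormal(mean=0.0, sigma=0.2, size=rain.size)
    sim = toy_model(param_sets[:, 0], param_sets[:, 1], noisy_rain)
    rmse = np.sqrt(np.mean((sim - observed[:, None]) ** 2, axis=0))
    good_counts += rmse <= np.quantile(rmse, 0.10)          # assumed "good" cut-off

# Keep sets that are good for every realisation, then weight them by depth
robust = param_sets[good_counts == n_real]
depth = mahalanobis_depth(robust, robust)
likelihood = depth / depth.sum()                            # deeper set, higher likelihood
```

Normalising depth so that the weights sum to one is only one way to turn depth into a likelihood; the abstract does not say which mapping the study used.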

The Kitchen Sink

The AI Delusion ◽

10.1093/oso/9780198824305.003.0009 ◽

2018 ◽

Author(s):

Gary Smith

Keyword(s):

Multiple Regression ◽

Historical Period ◽

Scatter Plot ◽

Consumer Spending ◽

Explanatory Variables ◽

The World ◽

Scatter Plots ◽

Using Data ◽

The One ◽

Over Time

Back in the 1980s, I talked to an economics professor who made forecasts for a large bank based on simple correlations like the one in Figure 1. If he wanted to forecast consumer spending, he made a scatter plot of income and spending and used a transparent ruler to draw a line that seemed to fit the data. If the scatter looked like Figure 1, then when income went up, he predicted that spending would go up. The problem with his simple scatter plots is that the world is not simple. Income affects spending, but so does wealth. What if this professor happened to draw his scatter plot using data from a historical period in which income rose (increasing spending) but the stock market crashed (reducing spending) and the wealth effect was more powerful than the income effect, so that spending declined, as in Figure 2? The professor’s scatter plot of spending and income will indicate that an increase in income reduces spending. Then, when he tries to forecast spending for a period when income and wealth both increase, his prediction of a decline in spending will be disastrously wrong. Multiple regression to the rescue. Multiple regression models have multiple explanatory variables. For example, a model of consumer spending might be: C = a + bY + cW where C is consumer spending, Y is household income, and W is wealth. The order in which the explanatory variables are listed does not matter. What does matter is which variables are included in the model and which are left out. A large part of the art of regression analysis is choosing explanatory variables that are important and ignoring those that are unimportant. The coefficient b measures the effect on spending of an increase in income, holding wealth constant, and c measures the effect on spending of an increase in wealth, holding income constant. 
The math for estimating these coefficients is complicated but the principle is simple: choose the estimates that give the best predictions of consumer spending for the data used to estimate the model. In Chapter 4, we saw that spurious correlations can appear when we compare variables like spending, income, and wealth that all tend to increase over time.
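
The misleading scatter plot and its multiple-regression fix can be reproduced with simulated data. The sketch below is illustrative only: the numbers (true coefficients a = 10, b = 0.6, c = 0.2, the income and wealth trends, the noise levels) are invented for the example, and ordinary least squares via `numpy.linalg.lstsq` is assumed as the estimation method.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60

# A period in which income rises while wealth falls (a market crash)
income = np.linspace(100.0, 140.0, n) + rng.normal(0.0, 3.0, n)    # Y
wealth = np.linspace(500.0, 300.0, n) + rng.normal(0.0, 20.0, n)   # W

# Invented "true" model: C = a + bY + cW, with both effects positive
a_true, b_true, c_true = 10.0, 0.6, 0.2
spending = a_true + b_true * income + c_true * wealth + rng.normal(0.0, 1.0, n)

# Simple regression of spending on income alone (the ruler on the scatter plot)
X_simple = np.column_stack([np.ones(n), income])
simple_coef, *_ = np.linalg.lstsq(X_simple, spending, rcond=None)

# Multiple regression with both explanatory variables
X_multi = np.column_stack([np.ones(n), income, wealth])
multi_coef, *_ = np.linalg.lstsq(X_multi, spending, rcond=None)

print("income slope, simple regression:  ", simple_coef[1])
print("income slope, multiple regression:", multi_coef[1])
print("wealth slope, multiple regression:", multi_coef[2])
```

In this simulated period the simple regression gives income a negative slope, because falling wealth drags spending down while income rises. Once wealth is included, the income coefficient turns positive, which is the "holding wealth constant" interpretation of b described above.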

Non-convex penalized multitask regression using data depth-based penalties

Stat ◽

10.1002/sta4.174 ◽

2018 ◽

Vol 7 (1) ◽

pp. e174 ◽

Cited By ~ 1

Author(s):

Subhabrata Majumdar ◽

Snigdhansu Chatterjee

Keyword(s):

Data Depth ◽

Using Data

Regionalization of hydrological model parameters using data depth

Hydrology Research ◽

10.2166/nh.2011.031 ◽

2011 ◽

Vol 42 (5) ◽

pp. 356-371 ◽

Cited By ~ 7

Author(s):

András Bárdossy ◽

Shailesh Kumar Singh

Keyword(s):

Hydrological Model ◽

Data Depth ◽

Model Parameters ◽

Hydrological Models ◽

Rainfall Runoff ◽

Catchment Characteristics ◽

Rainfall Runoff Model ◽

Using Data ◽

Runoff Model

The parameters of hydrological models for catchments with no or short discharge records can only be estimated using regional information. We can assume that catchments with similar characteristics show similar hydrological behaviour, so a regionalization of hydrological model parameters on the basis of catchment characteristics is plausible. However, because rainfall/runoff model parameters are not unique (equifinality), it is not appropriate to estimate regional parameters by first calibrating the model and then fitting a regional function. In this paper, a different procedure based on the depth function and convex combinations of model parameters is introduced. The same procedure can identify the catchment characteristics to be used for regionalization. Regionalization is then performed using two approaches: multiple linear regression using the deepest parameter sets, and convex combinations. The assessment of the quality of the regionalized models is also discussed. An example of 28 British catchments illustrates the methodology.
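
To make the phrase "convex combinations of model parameters" concrete, the sketch below draws random convex combinations of a few parameter sets. It is an illustration of the general idea only, not the paper's procedure: the function name `convex_combinations`, the Dirichlet weights, and the three-parameter example values are all assumptions made for this example.

```python
import numpy as np

def convex_combinations(param_sets, n_new, rng):
    """Return n_new random convex combinations of the rows of param_sets.

    Each weight vector is non-negative and sums to 1, so every new set
    lies inside the convex hull of the given parameter sets.
    """
    weights = rng.dirichlet(np.ones(len(param_sets)), size=n_new)
    return weights @ param_sets

rng = np.random.default_rng(1)

# Hypothetical well-performing parameter sets (rows) for a 3-parameter model
good_sets = np.array([
    [0.20, 1.5, 30.0],
    [0.40, 1.1, 45.0],
    [0.30, 1.9, 38.0],
    [0.25, 1.4, 52.0],
])

candidates = convex_combinations(good_sets, n_new=5, rng=rng)
print(candidates)
```

Because the weights are non-negative and sum to one, each candidate stays within the range spanned by the original sets in every parameter.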

Data envelopment analysis: Ranking units using data depth measures

10.1063/1.5044127 ◽

2018 ◽

Author(s):

Ondřej Sokol ◽

Miroslav Rada

Keyword(s):

Data Envelopment Analysis ◽

Data Depth ◽

Data Envelopment ◽

Using Data

Estimating the Optimal Number of Clusters k in a Dataset Using Data Depth

Data Science and Engineering ◽

10.1007/s41019-019-0091-y ◽

2019 ◽

Vol 4 (2) ◽

pp. 132-140 ◽

Cited By ~ 15

Author(s):

Channamma Patil ◽

Ishwar Baidari

Keyword(s):

Optimal Number ◽

Data Depth ◽

Number Of Clusters ◽

Using Data ◽

Optimal Number Of Clusters

The Spirit of Discovery: The Digital Roots of Integers

Mathematics Teacher ◽

10.5951/mt.101.5.0379 ◽

2007 ◽

Vol 101 (5) ◽

pp. 379-383

Author(s):

Eric Milou ◽

Jay Schiffman

Keyword(s):

Data Analysis ◽

Inservice Teachers ◽

Scatter Plots ◽

Using Data ◽

Discovery Method ◽

Positive Integers

In many mathematics classes, students are asked to learn via the discovery method, in the hope that the intrinsic beauty of mathematics becomes more accessible and that making conjectures, forming hypotheses, and analyzing patterns will help them compute fluently and solve problems creatively and resourcefully (NCTM 2000). The activity discussed in this article was conducted with a group of preservice and inservice teachers, and the objectives included examining patterns, making conjectures, and using data analysis to construct scatter plots and tables, all in the spirit of discovering mathematics. This activity is based on a concept called the multiplicative digital root of an integer (Sloane 1973). Here we take the term integers, unless otherwise qualified, to mean positive integers.
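
For readers unfamiliar with the term, the multiplicative digital root of a positive integer is the single digit reached by repeatedly multiplying its digits together. A minimal sketch follows; the function name and the choice to also return the number of steps are assumptions made for this example, not taken from the article.

```python
def multiplicative_digital_root(n):
    """Return (root, steps) for a positive integer n.

    The digits of n are multiplied together repeatedly until a single
    digit remains; `steps` counts how many multiplications were needed.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n >= 10:
        product = 1
        for digit in str(n):
            product *= int(digit)
        n = product
        steps += 1
    return n, steps

# 39 -> 3*9 = 27 -> 2*7 = 14 -> 1*4 = 4
print(multiplicative_digital_root(39))   # (4, 3)
```

Tabulating the root for a range of integers gives the kind of data set that can then be explored with tables and scatter plots.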
