New Nonparametric Tests of Multivariate Locations and Scales Using Data Depth

Abstract An important problem in many domains is to predict how a system will respond to interventions. This task is inherently linked to estimating the system’s underlying causal structure. To this end, Invariant Causal Prediction (ICP) [1] has been proposed, which learns a causal model by exploiting the invariance of causal relations across data from different environments. For linear models, implementing ICP is relatively straightforward. The nonlinear case is more challenging, however, because nonparametric tests for conditional independence are difficult to perform. In this work, we present and evaluate an array of methods for nonlinear and nonparametric versions of ICP for learning the causal parents of given target variables. We find that an approach which first fits a nonlinear model to data pooled over all environments and then tests for differences between the residual distributions across environments is quite robust across a large variety of simulation settings. We call this procedure the “invariant residual distribution test”. In general, we observe that the performance of all approaches depends critically on the true (unknown) causal structure, and that high power becomes hard to achieve when the parental set includes more than two variables. As a real-world example, we consider fertility rate modeling, which is central to world population projections. We explore predicting the effect of hypothetical interventions using the models accepted by nonlinear ICP. The results reaffirm the previously observed central causal role of child mortality rates.
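The pooled-fit-then-compare-residuals idea behind the invariant residual distribution test can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. As assumptions: the nonlinear model is a leave-one-out k-nearest-neighbour regressor fitted on all environments together, only two environments are handled, and equality of the residual distributions is checked with a permutation test on the two-sample Kolmogorov-Smirnov statistic. The function names and the simulated data are hypothetical.

```python
import math
import random
from bisect import bisect_right


def knn_predict(X, y, k=5):
    """Leave-one-out k-NN regression on data pooled over all environments.

    Stand-in (assumption) for the pooled nonlinear model; X holds feature
    tuples for the candidate parent set, y holds the target values.
    """
    preds = []
    for i, xi in enumerate(X):
        # Sort all other points by Euclidean distance to point i.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours = [j for _, j in dists[:k]]
        preds.append(sum(y[j] for j in neighbours) / len(neighbours))
    return preds


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_a(t) - F_b(t)|."""
    a, b = sorted(a), sorted(b)
    # The supremum is attained at one of the pooled sample points.
    return max(abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
               for t in a + b)


def residual_invariance_pvalue(X, y, env, k=5, n_perm=500, seed=0):
    """Permutation p-value for H0: residual distributions agree across two environments."""
    residuals = [yi - pi for yi, pi in zip(y, knn_predict(X, y, k))]
    groups = sorted(set(env))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two environments")

    def stat(labels):
        r0 = [r for r, e in zip(residuals, labels) if e == groups[0]]
        r1 = [r for r, e in zip(residuals, labels) if e == groups[1]]
        return ks_statistic(r0, r1)

    observed = stat(env)
    rng = random.Random(seed)
    labels = list(env)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between environment and residual
        if stat(labels) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    X, y, env = [], [], []
    for e in (0, 1):
        for _ in range(150):
            x = rng.gauss(e, 1.0)  # environment 1 shifts the parent X only
            X.append((x,))
            y.append(math.sin(x) + rng.gauss(0.0, 0.3))
            env.append(e)
    print("p-value for parent set {X}:", residual_invariance_pvalue(X, y, env))
```

Permuting environment labels while holding the pooled residuals fixed is a simplification. A small p-value rejects the candidate parent set; a large one means it is not rejected.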


Power comparison of data depth-based nonparametric tests for testing equality of locations

Journal of Statistical Computation and Simulation

10.1080/00949655.2016.1269329

2016

Vol 87 (8)

pp. 1489-1497

Cited By ~ 4

Author(s):

D. T. Shirke

S. D. Khorate

Keyword(s):

Nonparametric Tests

Data Depth

Power Comparison


Nonparametric Tests for Homogeneity of Species Assemblages: A Data Depth Approach

Biometrics

10.1111/j.1541-0420.2011.01573.x

2011

Vol 67 (4)

pp. 1481-1488

Cited By ~ 8

Author(s):

Jun Li

Jifei Ban

Louis S. Santiago

Keyword(s):

Nonparametric Tests

Data Depth

Species Assemblages


Observational Uncertainty in Hydrological Modelling using Data Depth

Global NEST Journal

10.30955/gnj.002354

2017

Vol 19 (3)

pp. 489-497

Keyword(s):

Hydrological Model

Meteorological Data

River Basin Management

Hydrological Modelling

Data Depth

Basin Management

Flow Generation

Observational Uncertainty

Using Data

Quantifying Uncertainty

For any river basin management, one needs tools to predict runoff at different temporal and spatial resolutions. Hydrological models are such tools: they account for storage, flow and the water balance in a watershed, including exchanges of water and energy within the earth, atmosphere and oceans, and use meteorological data to generate flow. Meteorological data contain errors from several sources, such as point-level measurement and interpolation. When an erroneous input is passed to a model, one cannot expect an error-free prediction; every prediction is associated with uncertainty, and quantifying these uncertainties is of prime importance in real-world forecasting. In this study, the uncertainty associated with hydrological modelling is examined using the idea of data depth. To see the effect of rainfall uncertainty on flow generation, the model input was perturbed by adding an error, producing different realisations of the rainfall. A Monte Carlo simulation generated a large number of hydrological model parameter sets drawn from a uniform distribution, and the model was run with these parameter sets for each rainfall realisation. Parameter sets that perform well across different realisations are more likely to be good parameter sets. For each parameter set, data depth was calculated and a likelihood was assigned based on the depth value; the frequency distribution of the likelihood was also analysed. The results show that uncertainty in hydrological modelling is multiplicative. The proposed methodology for assigning prediction uncertainty is demonstrated using the ‘TopNet’ model for the Waipara River catchment, located in the central east of the South Island, New Zealand. The results of this study will be helpful in calibrating hydrological models and in quantifying prediction uncertainty.
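Assigning a depth-based likelihood to each parameter set can be sketched as follows. The abstract does not state which depth function or depth-to-likelihood mapping was used, so both are assumptions here: Tukey's halfspace depth is approximated by taking the minimum over a finite set of random projection directions, and likelihoods are simply depths normalised to sum to one. The helper names are hypothetical.

```python
import math
import random


def random_directions(dim, n_dirs, rng):
    """Unit vectors uniform on the sphere (normalised Gaussian vectors)."""
    dirs = []
    for _ in range(n_dirs):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        dirs.append([c / norm for c in v])
    return dirs


def approx_halfspace_depth(point, cloud, dirs):
    """Approximate Tukey depth of `point` with respect to `cloud`.

    For each direction u, count cloud points on either closed side of the
    hyperplane through `point` orthogonal to u and keep the smaller count.
    Using finitely many directions can only overestimate the exact depth.
    """
    n = len(cloud)
    depth = 1.0
    for u in dirs:
        p = sum(a * b for a, b in zip(point, u))
        proj = [sum(a * b for a, b in zip(q, u)) for q in cloud]
        above = sum(1 for s in proj if s >= p)
        below = sum(1 for s in proj if s <= p)
        depth = min(depth, min(above, below) / n)
    return depth


def depth_likelihoods(param_sets, n_dirs=200, seed=0):
    """Depth of each parameter set within the ensemble, normalised to sum to 1."""
    rng = random.Random(seed)
    dirs = random_directions(len(param_sets[0]), n_dirs, rng)
    depths = [approx_halfspace_depth(p, param_sets, dirs) for p in param_sets]
    total = sum(depths)
    return [d / total for d in depths]


if __name__ == "__main__":
    # Illustrative 2-parameter sets, e.g. those retained as good across
    # rainfall realisations (values are random, not from the study).
    rng = random.Random(3)
    good_sets = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(50)]
    weights = depth_likelihoods(good_sets)
    best = max(range(len(good_sets)), key=weights.__getitem__)
    print("deepest (most likely) parameter set:", good_sets[best])
```

Central parameter sets receive the largest likelihood, while sets on the edge of the ensemble receive the smallest.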


Nonparametric tests for multivariate multi-sample locations based on data depth

Journal of Statistical Computation and Simulation

10.1080/00949655.2019.1590577

2019

Vol 89 (9)

pp. 1574-1591

Author(s):

Somanath D. Pawar

Digambar T. Shirke

Keyword(s):

Nonparametric Tests

Data Depth


Testing the Missing Mechanism of Demographic and Health Variables in the Health and Retirement Study

Innovation in Aging

10.1093/geroni/igaa057.1644

2020

Vol 4 (Supplement_1)

pp. 509-509

Author(s):

Peiyi Lu

Mack Shelley

Keyword(s):

Missing Data

Missing Values

Auxiliary Information

Missing At Random

Nonparametric Tests

Drop Out

Health And Retirement Study

Statistical Justification

Using Data

Retirement Study

Abstract Studies using data from longitudinal health surveys of older adults usually assume that the data are missing completely at random (MCAR) or missing at random (MAR), and subsequent analyses therefore use multiple imputation or likelihood-based methods to handle missing data. However, little existing research actually examines whether the data meet the MCAR/MAR assumptions before performing analyses. This study first summarized the commonly used statistical methods for testing the missing mechanism and discussed the conditions under which each applies. Then, using two-wave longitudinal data from the Health and Retirement Study (HRS; wave 2014-2015 and wave 2016-2017; N=18,747), this study applied different approaches to test the missing mechanism of several demographic and health variables. These approaches included Little’s test, the logistic regression method, nonparametric tests, the false discovery rate, and others. Results indicated that the data did not meet the MCAR assumption even though the rate of missing values was very low. Demographic variables provided good auxiliary information for health variables. Health measures (e.g., self-reported health, activities of daily living, depressive symptoms) met the MAR assumption. Older respondents could drop out or die during the longitudinal survey, but attrition did not significantly affect the MAR assumption. Our findings supported the MAR assumption for the demographic and health variables in HRS, and therefore provide statistical justification for HRS researchers using imputation or likelihood-based methods to deal with missing data. However, researchers are strongly encouraged to test the missing mechanism of the specific variables/data they choose when using a new dataset.
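As an illustration of one of the nonparametric checks mentioned above, a permutation test can compare a fully observed covariate between respondents with and without a missing value: under MCAR, missingness is unrelated to observed variables, so the two groups should not differ systematically. This is a minimal sketch, not the study's procedure; the test statistic (absolute difference in means), the function name and the toy data are assumptions. A small p-value is evidence against MCAR; a large one is not proof of MCAR.

```python
import random
import statistics


def mcar_permutation_test(covariate, is_missing, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H0: missingness of a target variable
    is unrelated to the mean of a fully observed covariate (implied by MCAR).

    covariate:  covariate value (e.g. age) for every respondent
    is_missing: True where the target variable is missing
    """
    def mean_gap(flags):
        miss = [c for c, m in zip(covariate, flags) if m]
        obs = [c for c, m in zip(covariate, flags) if not m]
        return abs(statistics.fmean(miss) - statistics.fmean(obs))

    observed = mean_gap(is_missing)
    rng = random.Random(seed)
    flags = list(is_missing)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # reassign the missingness flags at random
        if mean_gap(flags) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (hypothetical): older respondents are more likely to have a missing value.
    rng = random.Random(7)
    ages = [rng.uniform(50, 95) for _ in range(400)]
    missing = [rng.random() < (0.02 if a < 75 else 0.15) for a in ages]
    print("p-value:", mcar_permutation_test(ages, missing))
```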


Non-convex penalized multitask regression using data depth-based penalties

Stat

10.1002/sta4.174

2018

Vol 7 (1)

pp. e174

Cited By ~ 1

Author(s):

Subhabrata Majumdar

Snigdhansu Chatterjee

Keyword(s):

Data Depth

Using Data


Regionalization of hydrological model parameters using data depth

Hydrology Research

10.2166/nh.2011.031

2011

Vol 42 (5)

pp. 356-371

Cited By ~ 7

Author(s):

András Bárdossy

Shailesh Kumar Singh

Keyword(s):

Hydrological Model

Data Depth

Model Parameters

Hydrological Models

Rainfall Runoff

Catchment Characteristics

Rainfall Runoff Model

Using Data

Runoff Model

The parameters of hydrological models for catchments with no or only short discharge records can only be estimated using regional information. We can assume that catchments with similar characteristics show similar hydrological behaviour, so regionalizing hydrological model parameters on the basis of catchment characteristics is plausible. However, because rainfall/runoff model parameters are non-unique (equifinality), estimating parameters by calibrating the model for each catchment and then fitting a regional function is not appropriate. In this paper, a different procedure based on the depth function and convex combinations of model parameters is introduced. The same procedure can also identify which catchment characteristics to use for regionalization. Regionalization is then performed using different approaches: multiple linear regression using the deepest parameter sets, and convex combinations. The assessment of the quality of the regionalized models is also discussed. An application to 28 British catchments illustrates the methodology.
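The convex-combination step can be illustrated with a short sketch. It assumes, as an illustration rather than the paper's exact scheme, that candidate parameter sets for an ungauged catchment are formed as random convex combinations of (deep) parameter sets taken from donor catchments, with weights drawn from a flat Dirichlet distribution. The function name and the parameter values below are hypothetical.

```python
import random


def random_convex_combination(param_sets, rng):
    """Return sum_i w_i * theta_i with w_i >= 0 and sum_i w_i = 1.

    The weights are a Dirichlet(1, ..., 1) draw, obtained by normalising
    independent Exp(1) variables, so the result always lies in the convex
    hull of the input parameter sets.
    """
    raw = [rng.expovariate(1.0) for _ in param_sets]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(param_sets[0])
    return [sum(w * theta[k] for w, theta in zip(weights, param_sets))
            for k in range(dim)]


if __name__ == "__main__":
    # Hypothetical deepest parameter sets from three donor catchments,
    # e.g. (storage capacity [mm], recession constant [-]); values are illustrative.
    donors = [(120.0, 0.35), (95.0, 0.50), (150.0, 0.42)]
    rng = random.Random(0)
    for _ in range(3):
        print(random_convex_combination(donors, rng))
```

Because every coordinate is a weighted average, each candidate stays within the range spanned by the donor parameter sets. Each candidate would then be run through the rainfall-runoff model and assessed; that step is model-specific and not shown.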
