Towards Geostatistical Learning for the Geosciences: A Case Study in Improving the Spatial Awareness of Spectral Clustering

2020 ◽  
Vol 52 (8) ◽  
pp. 1035-1048
Author(s):  
H. Talebi ◽  
L. J. M. Peeters ◽  
U. Mueller ◽  
R. Tolosana-Delgado ◽  
K. G. van den Boogaart

Abstract: The particularities of geosystems and geoscience data must be understood before any development or implementation of statistical learning algorithms. Without such knowledge, the predictions and inferences may not be accurate and physically consistent. Accuracy, transparency and interpretability, credibility, and physical realism are minimum criteria for statistical learning algorithms when applied to the geosciences. This study briefly reviews several characteristics of geoscience data and challenges for novel statistical learning algorithms. A novel spatial spectral clustering approach is introduced to illustrate how statistical learners can be adapted for modelling geoscience data. The spatial awareness and physical realism of the spectral clustering are improved by utilising a dissimilarity matrix based on nonparametric higher-order spatial statistics. The proposed model-free technique can identify meaningful spatial clusters (i.e. meaningful geographical subregions) from multivariate spatial data at different scales without the need to define a model of co-dependence. Several mixed (e.g. continuous and categorical) variables can be used as inputs to the proposed clustering technique. The proposed technique is illustrated using synthetic and real mining datasets. The results of the case studies confirm the usefulness of the proposed method for modelling spatial data.

Stats ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 1-11
Author(s):  
Felix Mbuga ◽  
Cristina Tortora

Cluster analysis seeks to assign objects with similar characteristics into groups called clusters so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect convex and non-convex clusters, and can detect overlapping clusters. However, the restriction to continuous data can be limiting in real applications where data are often of mixed type, i.e., data that contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based similarity distance used in conventional spectral clustering with different dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes an automatic tuning of the variable weight and kernel parameter. The performance of spectral clustering in different scenarios is compared with that of two state-of-the-art mixed-type data clustering methods, k-prototypes and KAMILA, using several simulated and real data sets.
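
The pipeline described in this abstract (separate dissimilarities for continuous and categorical variables, a weighted sum, then a Gaussian kernel) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of scaled Euclidean distance and simple matching dissimilarity, the hand-fixed `weight` and `sigma` (the paper tunes these automatically), the toy data, and the two-cluster split by the sign of the Fiedler vector are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def mixed_dissimilarity(x_cont, x_cat, weight=0.5):
    """Weighted sum of a continuous and a categorical dissimilarity matrix.

    Assumes every continuous column has non-zero variance and that at least
    two rows differ, so the scalings below do not divide by zero.
    """
    # Continuous part (assumed choice): Euclidean distance on standardised
    # columns, rescaled to [0, 1].
    z = (x_cont - x_cont.mean(axis=0)) / x_cont.std(axis=0)
    d_cont = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    d_cont = d_cont / d_cont.max()
    # Categorical part (assumed choice): simple matching, i.e. the fraction
    # of categorical features on which two rows disagree.
    d_cat = (x_cat[:, None, :] != x_cat[None, :, :]).mean(axis=2)
    return weight * d_cont + (1.0 - weight) * d_cat


def gaussian_similarity(d, sigma=0.5):
    """Convert a dissimilarity matrix into a similarity matrix."""
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))


def spectral_bipartition(s):
    """Split into two clusters using the sign of the Fiedler vector."""
    deg = s.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalised Laplacian: I - D^{-1/2} S D^{-1/2}.
    lap = np.eye(len(s)) - d_inv_sqrt[:, None] * s * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Second-smallest eigenvector, mapped back to the random-walk form.
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


# Toy mixed-type data (hypothetical): two well-separated groups of three rows.
x_cont = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
x_cat = np.array([["a", "x"], ["a", "x"], ["a", "y"],
                  ["b", "y"], ["b", "y"], ["b", "x"]])

d = mixed_dissimilarity(x_cont, x_cat, weight=0.5)
s = gaussian_similarity(d, sigma=0.5)
labels = spectral_bipartition(s)
print(labels)
```

For more than two clusters, the usual approach is to run k-means on the first k eigenvectors instead of thresholding a single one; the sign-based split is used here only to keep the sketch dependency-free.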


2021 ◽  
Vol 298 ◽  
pp. 117164
Author(s):  
Marco Biemann ◽  
Fabian Scheller ◽  
Xiufeng Liu ◽  
Lizhen Huang

1983 ◽  
Vol 15 (6) ◽  
pp. 801-813 ◽  
Author(s):  
B Fingleton

Log-linear models are an appropriate means of determining the magnitude and direction of interactions between categorical variables but, in common with other statistical models, they assume independent observations. Spatial data are often dependent rather than independent, and thus the analysis of spatial data by log-linear models may erroneously detect interactions between variables that are spurious and are the consequence of pairwise correlations between observations. A procedure is described in this paper to accommodate these effects that requires only minimal assumptions about the nature of the autocorrelation process, given systematic sampling at intersection points on a square lattice.


2020 ◽  
Vol 34 (04) ◽  
pp. 3316-3323
Author(s):  
Qingpeng Cai ◽  
Ling Pan ◽  
Pingzhong Tang

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, but this is too myopic compared with the infinite horizon. We first give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of deterministic value gradient algorithms (DVG) with infinite horizon, in which different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.


2019 ◽  
Vol 406 ◽  
pp. 109-120 ◽  
Author(s):  
Patrick Schratz ◽  
Jannes Muenchow ◽  
Eugenia Iturritxa ◽  
Jakob Richter ◽  
Alexander Brenning

2016 ◽  
Vol 2 (2) ◽  
pp. 383-389 ◽  
Author(s):  
Benjamin F. Trueman ◽  
Sean A. MacIsaac ◽  
Amina K. Stoddart ◽  
Graham A. Gagnon

Fluorescence spectroscopy has potential applications for monitoring disinfection by-products (DBPs) during water treatment. This paper demonstrates the novel application of several statistical learning algorithms for fluorescence-based DBP prediction.

