Effect of limited data sets in evaluating the scaling properties of spatially distributed data: an example from mining-induced seismic activity

Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to spatially distributed data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations for both image and non-image datasets. Our learned representations significantly improve performance in downstream classification tasks and, similarly to word vectors, allow visual analogies to be obtained via simple arithmetic in the latent space.

Monofractal or multifractal: a case study of spatial distribution of mining-induced seismic activity

Nonlinear Processes in Geophysics ◽

10.5194/npg-1-182-1994 ◽

1994 ◽

Vol 1 (2/3) ◽

pp. 182-190 ◽

Cited By ~ 16

Author(s):

M. Eneva

Keyword(s):

Spatial Distribution ◽

Seismic Activity ◽

Real Data ◽

Data Sets ◽

Point Sets ◽

Data Set ◽

Limited Size ◽

The Real ◽

Induced Seismic Activity ◽

Generalized Correlation

Abstract. Using finite data sets and study volumes of limited size may produce significant spurious effects when estimating the scaling properties of various physical processes. These effects are examined with an example featuring the spatial distribution of induced seismic activity in Creighton Mine (northern Ontario, Canada). The events studied in the present work occurred during a three-month period, March-May 1992, within a volume of approximate size 400 × 400 × 180 m³. Two sets of microearthquake locations are studied: Data Set 1 (14,338 events) and Data Set 2 (1,654 events). Data Set 1 includes the more accurately located events and amounts to about 30 per cent of all recorded data. Data Set 2 represents a portion of the first data set that is formed by the most accurately located and the strongest microearthquakes. The spatial distribution of events in the two data sets is examined for scaling behaviour using the method of generalized correlation integrals featuring various moments q. From these, generalized correlation dimensions are estimated using the slope method. Similar estimates are made for randomly generated point sets using the same numbers of events and the same study volumes as for the real data. Uniform and monofractal random distributions are used for these simulations. In addition, samples from the real data are randomly extracted and the dimension spectra for these are examined as well. The spectra for the uniform and monofractal random generations show spurious multifractality due only to the use of finite numbers of data points and limited size of study volume. Comparing these with the spectra of dimensions for Data Set 1 and Data Set 2 allows us to estimate the bias likely to be present in the estimates for the real data. The strong multifractality suggested by the spectrum for Data Set 2 appears to be largely spurious; the spatial distribution, while different from uniform, could originate from a monofractal process. The spatial distribution of microearthquakes in Data Set 1 is either monofractal as well, or only weakly multifractal. In all similar studies, comparisons of results from real data and simulated point sets may help distinguish between genuine and artificial multifractality, without necessarily resorting to large numbers of data.
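
The procedure described in this abstract (generalized correlation integrals, dimensions from the slope of log C_q(r) against log r, and a comparison with uniform random points in the same volume) can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: it uses a common textbook form of the estimator, C_q(r) = [mean_i n_i(r)^(q-1)]^(1/(q-1)), where n_i(r) is the fraction of other points within distance r of point i. The function names, the 400-point sample, and the radii are hypothetical choices, far smaller than the 1,654 and 14,338 events in the study.

```python
import math
import random


def neighbour_fractions(points, radii):
    """Map each radius r to a list giving, for every point, the fraction
    of the other points that lie within distance r of it."""
    n = len(points)
    counts = {r: [0] * n for r in radii}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            for r in radii:
                if d <= r:
                    counts[r][i] += 1
                    counts[r][j] += 1
    return {r: [c / (n - 1) for c in counts[r]] for r in radii}


def correlation_integral(fractions, q):
    """Generalized correlation integral C_q(r) for q != 1."""
    if q == 1:
        raise ValueError("q = 1 needs the logarithmic limit, not handled here")
    # Assumption: for q < 1, points with no neighbours are skipped, because
    # a zero fraction cannot be raised to a negative power.
    used = [f for f in fractions if f > 0] if q < 1 else list(fractions)
    mean = sum(f ** (q - 1) for f in used) / len(used) if used else 0.0
    if mean == 0.0:
        raise ValueError("no neighbours at this radius; use larger radii")
    return mean ** (1.0 / (q - 1))


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def dimension_spectrum(points, radii, qs):
    """Estimate D_q as the slope of log C_q(r) versus log r for each q."""
    fractions = neighbour_fractions(points, radii)
    log_r = [math.log(r) for r in radii]
    spectrum = {}
    for q in qs:
        log_c = [math.log(correlation_integral(fractions[r], q)) for r in radii]
        spectrum[q] = least_squares_slope(log_r, log_c)
    return spectrum


def uniform_box_points(n, size, seed=0):
    """Draw n points uniformly at random inside a box with the given side lengths."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.0, s) for s in size) for _ in range(n)]


if __name__ == "__main__":
    # Box dimensions follow the study volume in the abstract (metres);
    # the sample size and radii are illustrative only.
    points = uniform_box_points(400, (400.0, 400.0, 180.0), seed=1)
    radii = [40.0, 60.0, 80.0, 100.0, 120.0]
    for q, d_q in dimension_spectrum(points, radii, [-2, 0, 2, 5]).items():
        print(f"q = {q:>2}: D_q estimate = {d_q:.2f}")
```

For a truly uniform distribution in three dimensions every D_q should equal 3, so any spread across q in the output comes only from the finite sample and the bounded box. Running the same function on real event locations and on such simulated sets gives the kind of side-by-side spectra the abstract compares.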

SPIDER—an interactive statistical tool for the analysis of spatially distributed data

International Journal of Geographical Information Systems ◽

10.1080/02693799008941547 ◽

1990 ◽

Vol 4 (3) ◽

pp. 285-296 ◽

Cited By ~ 60

Author(s):

JOHN HASLETT ◽

GRAHAM WILLS ◽

ANTONY UNWIN

Keyword(s):

Distributed Data ◽

Statistical Tool ◽

Spatially Distributed ◽

Spatially Distributed Data

Linear Models with Spatially Distributed Data

Sociological Methods & Research ◽

10.1177/004912418000900102 ◽

1980 ◽

Vol 9 (1) ◽

pp. 29-60 ◽

Cited By ~ 116

Author(s):

Patrick Doreian

Keyword(s):

Linear Models ◽

Distributed Data ◽

Spatially Distributed ◽

Spatially Distributed Data

Description of a computer program for analyzing multivariate spatially distributed data

Computers & Geosciences ◽

10.1016/0098-3004(89)90025-3 ◽

1989 ◽

Vol 15 (4) ◽

pp. 593-598 ◽

Cited By ~ 13

Author(s):

Hans Wackernagel

Keyword(s):

Computer Program ◽

Distributed Data ◽

Spatially Distributed ◽

Spatially Distributed Data

Estimation of dimension for spatially distributed data and related limit theorems

Journal of Multivariate Analysis ◽

10.1016/0047-259x(89)90100-0 ◽

1989 ◽

Vol 28 (1) ◽

pp. 115-148 ◽

Cited By ~ 22

Author(s):

C.D. Cutler ◽

D.A. Dawson

Keyword(s):

Limit Theorems ◽

Distributed Data ◽

Spatially Distributed ◽

Spatially Distributed Data

Assessing the Water Pollution of the Brahmaputra River Using Water Quality Indexes

Toxics ◽

10.3390/toxics9110297 ◽

2021 ◽

Vol 9 (11) ◽

pp. 297

Author(s):

Alina Barbulescu ◽

Lucica Barbes ◽

Cristian Stefan Dumitriu

Keyword(s):

Water Pollution ◽

Water Quality ◽

Mean Squared Error ◽

Percentage Error ◽

Distributed Data ◽

Brahmaputra River ◽

Monotonic Trend ◽

Spatially Distributed ◽

British Columbia ◽

Spatially Distributed Data

Water quality is continuously affected by anthropogenic and environmental conditions. A significant problem for Indian rivers is heavy water pollution, which spreads disease because the water is used daily. This study therefore investigates three aspects. The first is testing the hypothesis that a monotonic trend exists in the series of eight water parameters of the Brahmaputra River recorded over 17 years at ten hydrological stations. When this hypothesis was rejected, a loess trend was fitted. The second is assessing the water quality using three water quality indexes (WQIs): the CCME WQI, the British Columbia index, and a weighted index. The third is grouping the years and the stations into clusters, which are used to determine the regional (spatial) and temporal trends of the WQI series with a new algorithm. The statistical analysis does not reject the hypothesis of a monotonic trend for the spatially distributed data, but this does not hold for the temporal data. Hierarchical clustering based on the computed WQIs detected two clusters for the spatially distributed data and two for the temporally distributed data. The procedure proposed for determining the temporal and regional evolution of the WQI gave good results in terms of mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE).
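
The three error measures named at the end of this abstract have standard definitions, sketched below for reference. The function names and the sample index values are made up for illustration and are not taken from the paper.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted|."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared difference."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mean_absolute_percentage_error(observed, predicted):
    """Average of |observed - predicted| / |observed|, in per cent.
    Every observed value must be non-zero."""
    ratios = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * ratios / len(observed)


# Hypothetical index values, for illustration only.
observed_wqi = [50.0, 60.0, 80.0]
predicted_wqi = [45.0, 66.0, 80.0]

print(mean_absolute_error(observed_wqi, predicted_wqi))
print(root_mean_squared_error(observed_wqi, predicted_wqi))
print(mean_absolute_percentage_error(observed_wqi, predicted_wqi))
```

MAE and RMSE are in the units of the index itself, while MAPE is relative, which makes it easier to compare across stations whose index values differ in magnitude.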

Integration of spatially distributed data for the 3-D modelling of a kinematic phenomenon in Italy

10.5242/iamg.2011.0185 ◽

2011 ◽

Cited By ~ 1

Author(s):

Roberto PASSALACQUA ◽

Rossella BOVOLENTA

Keyword(s):

Distributed Data ◽

Spatially Distributed ◽

Spatially Distributed Data
