Effect of limited data sets in evaluating the scaling properties of spatially distributed data: an example from mining-induced seismic activity

1996, Vol 124 (3), pp. 773-786
Author(s): Mariana Eneva

Author(s): Neal Jean, Sherrie Wang, Anshul Samar, George Azzari, David Lobell, ...

Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to spatially distributed data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations for both image and non-image datasets. Our learned representations significantly improve performance in downstream classification tasks and, similarly to word vectors, allow visual analogies to be obtained via simple arithmetic in the latent space.
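The abstract states the idea (tiles that occur in similar spatial contexts should get similar representations) but not the training objective. As an assumption for illustration, the sketch below uses a triplet margin loss over an anchor tile, a neighbouring tile and a distant tile, which is zero once the neighbour is at least a margin closer to the anchor than the distant tile. Embeddings are plain Python lists standing in for the output of some encoder; `triplet_loss`, `analogy`, the margin of 1.0 and the toy vectors are hypothetical. The `analogy` helper shows "simple arithmetic in the latent space" as a nearest-neighbour lookup for a - b + c.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def l2(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, neighbour: Vector, distant: Vector, margin: float = 1.0) -> float:
    """Assumed hinge-style triplet objective: zero once the neighbouring tile is
    at least `margin` closer to the anchor than the distant tile."""
    return max(0.0, l2(anchor, neighbour) - l2(anchor, distant) + margin)


def analogy(a: Vector, b: Vector, c: Vector, embeddings: Dict[str, Vector]) -> str:
    """Return the key whose embedding is nearest to a - b + c.
    Unlike some word-vector tools, the query vectors themselves are not excluded."""
    target = [x - y + z for x, y, z in zip(a, b, c)]
    return min(embeddings, key=lambda k: l2(embeddings[k], target))


# Toy 2-D embeddings with made-up numbers (not Tile2Vec outputs).
toy = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # prints 0.0
print(analogy(toy["t3"], toy["t1"], toy["t2"], toy))  # prints t2
```

In a real setting the loss would be averaged over many sampled triplets and minimized with respect to the encoder's parameters; the margin controls how strongly distant tiles are pushed away.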


1994, Vol 1 (2/3), pp. 182-190
Author(s): M. Eneva

Abstract. Using finite data sets and study volumes of limited size can produce significant spurious effects when estimating the scaling properties of physical processes. These effects are examined using the spatial distribution of induced seismic activity in Creighton Mine (northern Ontario, Canada). The events studied occurred during a three-month period, March-May 1992, within a volume of approximately 400 × 400 × 180 m³. Two sets of microearthquake locations are studied: Data Set 1 (14,338 events) and Data Set 2 (1654 events). Data Set 1 includes the more accurately located events and amounts to about 30 per cent of all recorded data. Data Set 2 is a subset of Data Set 1 consisting of the most accurately located and strongest microearthquakes. The spatial distribution of events in the two data sets is examined for scaling behaviour using generalized correlation integrals of various moments q, from which generalized correlation dimensions are estimated with the slope method. Similar estimates are made for randomly generated point sets with the same numbers of events and the same study volumes as the real data, using uniform and monofractal random distributions. In addition, the dimension spectra of samples randomly extracted from the real data are examined. The spectra of the uniform and monofractal random sets show spurious multifractality caused solely by the finite number of data points and the limited size of the study volume. Comparing these with the dimension spectra for Data Set 1 and Data Set 2 allows us to estimate the bias likely to be present in the estimates for the real data. The strong multifractality suggested by the spectrum for Data Set 2 appears to be largely spurious; the spatial distribution, while different from uniform, could originate from a monofractal process.
The spatial distribution of microearthquakes in Data Set 1 is either also monofractal or only weakly multifractal. In similar studies, comparing results from real data with those from simulated point sets may help distinguish genuine from artificial multifractality without necessarily requiring large numbers of data.
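The procedure described in this abstract can be sketched as follows, under stated assumptions. For each point i, let n_i(r) be the number of other points closer than r; the generalized correlation integral is taken here in its common form C_q(r) = [ (1/N) Σ_i (n_i(r)/(N-1))^(q-1) ]^(1/(q-1)), and D_q is estimated by the slope method as the least-squares slope of log C_q(r) against log r. Reference sets with the same N and the same 400 × 400 × 180 m volume are drawn from a uniform distribution and, as one assumed monofractal construction (the abstract does not say which one was used), from a random Cantor dust. All function names, radii and point counts are hypothetical; the brute-force O(N²) pair counting only suits small sets, and the sketch handles q > 1 only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
BOX = (400.0, 400.0, 180.0)  # study-volume size in metres, from the abstract


def neighbour_counts(points: Sequence[Point], r: float) -> List[int]:
    """n_i(r): for each point, how many *other* points lie closer than r (brute force)."""
    return [
        sum(1 for j, s in enumerate(points) if j != i and math.dist(p, s) < r)
        for i, p in enumerate(points)
    ]


def correlation_integral(points: Sequence[Point], r: float, q: float) -> float:
    """Generalized correlation integral C_q(r); this sketch handles q > 1 only."""
    if q <= 1:
        raise ValueError("q must be > 1 here; q = 1 and q < 1 need separate handling")
    n = len(points)
    moment = sum((c / (n - 1)) ** (q - 1) for c in neighbour_counts(points, r)) / n
    return moment ** (1.0 / (q - 1))


def generalized_dimension(points: Sequence[Point], radii: Sequence[float], q: float) -> float:
    """Slope method: least-squares slope of log C_q(r) against log r over the given radii."""
    xs, ys = [], []
    for r in radii:
        c = correlation_integral(points, r, q)
        if c > 0:  # skip radii at which no pairs are found
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii at which pairs are found")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def uniform_box(n: int, rng: random.Random) -> List[Point]:
    """Reference set: n points uniformly distributed in the study volume."""
    return [(rng.uniform(0, BOX[0]), rng.uniform(0, BOX[1]), rng.uniform(0, BOX[2]))
            for _ in range(n)]


def random_cantor_dust(n: int, keep: int, depth: int, rng: random.Random) -> List[Point]:
    """Reference set: n points on a random Cantor dust (one assumed monofractal construction).

    Each retained cell is split into 8 octants and `keep` of them are retained at random,
    so all finest-level cells carry equal mass and the similarity dimension is
    log(keep) / log(2). Stretching the unit cube to the box does not change that dimension.
    """
    if not 1 <= keep <= 8:
        raise ValueError("keep must be between 1 and 8")
    corners = [(0.0, 0.0, 0.0)]  # lower corners of retained cells, unit-cube coordinates
    side = 1.0
    for _ in range(depth):
        side /= 2
        children = []
        for (x, y, z) in corners:
            octants = [(x + a * side, y + b * side, z + c * side)
                       for a in (0, 1) for b in (0, 1) for c in (0, 1)]
            children.extend(rng.sample(octants, keep))
        corners = children
    points = []
    for _ in range(n):
        x, y, z = rng.choice(corners)  # equal probability per cell -> equal mass
        points.append(((x + rng.random() * side) * BOX[0],
                       (y + rng.random() * side) * BOX[1],
                       (z + rng.random() * side) * BOX[2]))
    return points


if __name__ == "__main__":
    # Same N and volume for every reference set; any q-dependence here is spurious.
    rng = random.Random(0)
    references = {"uniform": uniform_box(300, rng),
                  "cantor keep=4": random_cantor_dust(300, keep=4, depth=5, rng=rng)}
    for name, pts in references.items():
        print(name, [round(generalized_dimension(pts, [20, 40, 80], q), 2) for q in (2, 4)])
```

Any dependence of D_q on q in the reference sets is, by construction, an artefact of finite N and bounded volume; running the same estimator on many such sets, and on random subsamples of the real catalogue, gives a rough measure of the bias that the comparison is meant to expose.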


Toxics, 2021, Vol 9 (11), pp. 297
Author(s): Alina Barbulescu, Lucica Barbes, Cristian Stefan Dumitriu

Water quality is continuously affected by anthropogenic and environmental conditions. A major problem of Indian rivers is severe water pollution, which, because the river water is used daily, contributes to the spread of various diseases. This study therefore addresses three aspects. The first is testing the hypothesis that a monotonic trend exists in the series of eight water parameters of the Brahmaputra River, recorded for 17 years at ten hydrological stations; where this hypothesis was rejected, a loess trend was fitted. The second is assessing water quality using three water quality indices (WQIs): the CCME WQI, the British Columbia WQI, and a weighted index. The third is grouping the years and the stations into clusters, which are then used to determine the regional (spatial) and temporal trends of the WQI series with a new algorithm. Statistical analysis does not reject the hypothesis of a monotonic trend for the spatially distributed data, but does reject it for the temporal data. Hierarchical clustering based on the computed WQIs detected two clusters for the spatially distributed data and two for the temporally distributed data. The proposed procedure for determining the temporal and regional evolution of the WQI gave good results in terms of mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE).
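The abstract names the CCME WQI and the three error measures without defining them. The sketch below follows the CCME definition as we understand it (scope F1, frequency F2 and amplitude F3, combined as 100 - sqrt(F1² + F2² + F3²)/1.732) together with the usual MAE, RMSE and MAPE formulas; the British Columbia and weighted indices, the trend test and the clustering are not shown. The function names, the `(low, high)` objective format and the toy limits are assumptions for illustration, not the study's guideline values.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Objective for one water parameter: (lower limit or None, upper limit or None).
Objective = Tuple[Optional[float], Optional[float]]


def excursion(value: float, objective: Objective) -> float:
    """How far a measurement lies outside its objective; 0.0 if it complies."""
    low, high = objective
    if high is not None and value > high:
        return value / high - 1.0
    if low is not None and value < low:
        return low / value - 1.0  # assumes positive values where a lower limit applies
    return 0.0


def ccme_wqi(data: Dict[str, Sequence[float]], objectives: Dict[str, Objective]) -> float:
    """CCME WQI from scope (F1), frequency (F2) and amplitude (F3); 100 means no failures."""
    exc = {name: [excursion(v, objectives[name]) for v in values] for name, values in data.items()}
    n_tests = sum(len(values) for values in data.values())
    failed = [e for values in exc.values() for e in values if e > 0]
    f1 = 100.0 * sum(1 for values in exc.values() if any(e > 0 for e in values)) / len(data)
    f2 = 100.0 * len(failed) / n_tests
    nse = sum(failed) / n_tests  # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)
    # 1.732 (about sqrt(3)) rescales the vector length so the index stays within 0-100.
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


# Toy usage with made-up numbers (not Brahmaputra data): DO must stay >= 5, BOD <= 3.
wqi = ccme_wqi({"DO": [6.0, 4.0, 7.0], "BOD": [2.0, 3.5, 2.5]},
               {"DO": (5.0, None), "BOD": (None, 3.0)})
print(round(wqi, 2))  # prints 39.02
```

Here both parameters fail at least once (F1 = 100) and two of six tests fail (F2 ≈ 33.3), so the score is low even though the excursions themselves are small; F3 is what distinguishes slight from severe violations.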

