Spatial machine learning: new opportunities for regional science

Author(s):  
Katarzyna Kopczewska

Abstract
This paper is a methodological guide to using machine learning in the spatial context. It provides an overview of the existing spatial toolbox proposed in the literature: unsupervised learning, which deals with clustering of spatial data, and supervised learning, which displaces classical spatial econometrics. It shows the potential of using this developing methodology, as well as its pitfalls. It catalogues and comments on the usage of spatial clustering methods (for locations and values, both separately and jointly) for mapping, bootstrapping, cross-validation, GWR modelling and density indicators. It provides details of spatial machine learning models, which are combined with spatial data integration, modelling, model fine-tuning and predictions to deal with spatial autocorrelation and big data. The paper delineates “already available” and “forthcoming” methods and gives inspiration for transplanting modern quantitative methods from other thematic areas to research in regional science.
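The joint clustering of locations and values that the survey catalogues can be illustrated in a few lines. This is a minimal sketch, not the paper's own toolchain: the data are invented, and KMeans on standardized coordinates-plus-attribute features is just one common way to cluster "locations and values jointly".

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical point data: planar coordinates plus one attribute value.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(60, 2))           # locations
value = xy[:, 0] * 0.5 + rng.normal(0, 1, 60)    # attribute correlated with x

# Joint clustering: standardize locations and values, then cluster them together,
# so both spatial proximity and attribute similarity shape the partition.
features = StandardScaler().fit_transform(np.column_stack([xy, value]))
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
```

Clustering locations alone (drop `value`) or values alone (drop `xy`) recovers the "separately" variants mentioned in the abstract.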

2014 ◽  
Vol 971-973 ◽  
pp. 1565-1568
Author(s):  
Zhi Yong Wang

To address the limitations of existing spatial clustering methods, this paper starts from a concept-clustering objective function and uses GIS spatial data management and spatial analysis as technical support, considering both the directly computed distance between samples and the indirect reachability cost. K samples are randomly selected as cluster centers, the remaining samples are assigned to clusters according to their reachability distance to each center, and the sum of the samples' reachability costs to their cluster centers serves as the clustering objective function. A genetic algorithm is introduced to minimize this objective, yielding a GIS-based spatial clustering algorithm. Finally, the algorithm is tested by examples.
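The described scheme (K randomly chosen centers, assignment by reachability cost, a genetic algorithm minimizing the summed cost) can be sketched as below. This is an illustrative reconstruction only: plain Euclidean distance stands in for the GIS reachability cost, and the population size, generation count, and mutation rule are arbitrary choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(0, 10, size=(40, 2))    # sample locations
K, POP, GENS = 3, 20, 30                     # clusters, population size, generations

def cost(centers):
    """Objective: sum of each sample's distance to its nearest center.
    In the paper this would be a GIS reachability cost, not Euclidean distance."""
    d = np.linalg.norm(points[:, None, :] - points[centers][None, :, :], axis=2)
    return d.min(axis=1).sum()

# Genetic algorithm over candidate sets of K center indices.
pop = [rng.choice(len(points), K, replace=False) for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=cost)
    survivors = pop[:POP // 2]        # selection: keep the fitter half (elitism)
    children = []
    for parent in survivors:
        child = parent.copy()
        child[rng.integers(K)] = rng.integers(len(points))   # mutation: swap one center
        children.append(child)
    pop = survivors + children

best = min(pop, key=cost)
labels = np.linalg.norm(points[:, None, :] - points[best][None, :, :],
                        axis=2).argmin(axis=1)
```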


Author(s):  
Samson Y. Gebreab

Most studies evaluating relationships between neighborhood characteristics and health neglect to examine and account for spatial dependency across neighborhoods, that is, how neighboring areas are related to each other, even though the possible presence of spatial effects (e.g., spatial dependency, spatial heterogeneity) can influence the results in substantial ways. This chapter first discusses the concept of spatial autocorrelation and then provides an overview of different spatial clustering methods, including Moran’s I and spatial scan statistics, as well as different models for mapping spatial data, for example, spatial Bayesian mapping. Next, it discusses various spatial regression methods used in spatial epidemiology for accounting for spatial dependency and/or spatial heterogeneity when modeling the relationships between neighborhood characteristics and health outcomes, including spatial econometric models, Bayesian spatial models, and multilevel spatial models.
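Global Moran's I, the first diagnostic the chapter covers, computes directly from its textbook definition, I = (n/S0) · Σij Wij(xi − x̄)(xj − x̄) / Σi(xi − x̄)²; a minimal numpy sketch with an illustrative binary contiguity matrix for six areas on a chain:

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values
    and S0 the sum of all spatial weights."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    n, S0 = len(x), W.sum()
    return (n / S0) * (z @ W @ z) / (z @ z)

# Illustrative contiguity: 6 areas on a chain, neighbours share an edge.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1

clustered = np.array([1., 1., 1., 5., 5., 5.])    # similar values adjacent
alternating = np.array([1., 5., 1., 5., 1., 5.])  # dissimilar values adjacent
print(morans_i(clustered, W) > 0)    # True: positive spatial autocorrelation
print(morans_i(alternating, W) < 0)  # True: negative spatial autocorrelation
```

Positive I flags spatial clustering of similar values; negative I flags a checkerboard-like pattern, which is the intuition the chapter builds on before the regression models.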


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Arvind Sharma ◽  
R. K. Gupta ◽  
Akhilesh Tiwari

Many techniques are available in the field of data mining; its subfield, spatial data mining, aims to understand relationships between data objects. Databases whose objects carry spatial features are called spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific purposes. Huge data sets may be collected from different sources, such as satellite images, X-rays, medical images, traffic cameras, and GIS systems. The primary purpose of this paper is to handle this large amount of data and to establish relationships among the objects in a principled way. The paper describes a complete process showing how spatial data differ from other kinds of data sets and how they are refined to yield useful results and trends for geographic information systems and the spatial data mining process. A new, improved clustering algorithm is designed, because clustering plays an indispensable role in spatial data mining. Clustering methods are useful in various fields of human life such as GIS (Geographic Information System), GPS (Global Positioning System), weather forecasting, air traffic control, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI design. This paper presents a study of various clustering methods and algorithms and an improved DBSCAN variant, IDBSCAN (Improved Density-Based Spatial Clustering of Applications with Noise). The algorithm adds several attributes responsible for generating better clusters from existing data sets than other methods.
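The baseline DBSCAN that IDBSCAN extends (the paper's added attributes are not reproduced here) runs in a few lines with scikit-learn; the points below are invented to show the two defining behaviours, density-reachable clusters and explicit noise:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups of points plus one isolated point.
pts = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],   # dense group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],   # dense group B
    [10.0, 10.0],                                      # isolated point
])

# eps: neighbourhood radius; min_samples: points (incl. self) needed for a core point.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(pts)
print(labels)  # two clusters (0 and 1); the isolated point is noise, labelled -1
```

Unlike k-means, no cluster count is specified, and the isolated point is reported as noise (`-1`) rather than forced into a cluster, which is precisely the property density-based spatial clustering exploits.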


2021 ◽  
pp. 1-18
Author(s):  
Trang T.D. Nguyen ◽  
Loan T.T. Nguyen ◽  
Anh Nguyen ◽  
Unil Yun ◽  
Bay Vo

Spatial clustering is one of the main techniques for spatial data mining and spatial data analysis. However, existing spatial clustering methods primarily focus on points distributed in planar space with the Euclidean distance measurement. Recently, NS-DBSCAN was developed to cluster spatial point events in network space, building on the well-known Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. Compared to the NC_DT (Network-Constrained Delaunay Triangulation) clustering algorithm, NS-DBSCAN efficiently solves the problem of clustering network-constrained spatial points and visualizes the intrinsic clustering structure of spatial data by constructing density ordering charts. Its main drawback, however, is that objects not specifically categorized into any cluster cannot be removed during processing, which wastes time, particularly when the dataset is large. To make the algorithm more efficient, we recommend removing edges that are longer than a threshold and eliminating low-density points from the density ordering table when forming clusters, alongside other effective techniques. In this paper, we develop a theorem to determine the maximum length of an edge in a road segment. Based on this theorem, we propose an algorithm that greatly improves the performance of the density-based clustering algorithm in network space (NS-DBSCAN). Experiments on data from Ho Chi Minh City, Vietnam show that the proposed algorithm yields the same results as NS-DBSCAN but with a clear advantage in execution time.
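The pruning idea above (drop road-network edges longer than the threshold, then form clusters from what remains) can be sketched without any GIS stack. The network, segment lengths, and threshold value below are all made up; the paper's theorem, not a constant, supplies the real threshold:

```python
from collections import defaultdict

# Illustrative road network: (node, node, segment length).
edges = [("a", "b", 1.0), ("b", "c", 1.2),
         ("c", "d", 9.0),                    # long edge: pruned below
         ("d", "e", 0.8), ("e", "f", 1.1)]
MAX_EDGE = 2.0   # stand-in for the theorem's maximum edge length

# Keep only short edges, then find connected components = candidate clusters.
graph = defaultdict(set)
for u, v, length in edges:
    if length <= MAX_EDGE:
        graph[u].add(v)
        graph[v].add(u)

seen, clusters = set(), []
for node in sorted(graph):
    if node not in seen:
        stack, component = [node], set()
        while stack:                          # depth-first traversal
            cur = stack.pop()
            if cur not in component:
                component.add(cur)
                stack.extend(graph[cur] - component)
        seen |= component
        clusters.append(component)

print(clusters)  # [{'a', 'b', 'c'}, {'d', 'e', 'f'}]
```

Pruning the long `c–d` edge up front means its endpoints are never compared during cluster formation, which is the source of the reported execution-time gain.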


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8767
Author(s):  
David M. Martín-Perea ◽  
Lloyd A. Courtenay ◽  
M. Soledad Domingo ◽  
Jorge Morales

The separation of discrete fossiliferous levels within an archaeological or paleontological site with no clear stratigraphic horizons has historically been carried out using qualitative approaches, relying on two-dimensional transversal and longitudinal projection planes. Analyses of this type, however, can often be conditioned by the subjective perspective of the analyst. This study presents a novel use of Machine Learning algorithms and pattern recognition techniques for the automated separation and identification of fossiliferous levels. The approach can be divided into three main steps: (1) unsupervised Machine Learning for density-based clustering, (2) expert-in-the-loop Collaborative Intelligence Learning for the integration of geological data, followed by (3) supervised learning for the final fine-tuning of fossiliferous level models. For evaluation, the method was tested at two Late Miocene sites of the Batallones Butte paleontological complex (Madrid, Spain). Here we show Machine Learning analyses to be a valuable tool for processing spatial data in an efficient and quantitative manner, successfully identifying the presence of discrete fossiliferous levels in both Batallones-3 and Batallones-10: three discrete levels were identified in Batallones-3, and another three in Batallones-10.
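Step (1), density-based clustering, can be illustrated on the vertical (z) coordinate alone, since depth is what separates superimposed levels. The depth values below are invented, and DBSCAN is used as a generic density-based clusterer, not as the paper's exact configuration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical fossil z-coordinates (metres): two dense depth bands.
z = np.array([0.10, 0.12, 0.15, 0.11, 0.14,    # upper band
              0.80, 0.82, 0.85, 0.81, 0.84]    # lower band
             ).reshape(-1, 1)

# eps = maximum depth gap inside one level; larger gaps split levels apart.
levels = DBSCAN(eps=0.05, min_samples=3).fit_predict(z)
print(len(set(levels)))  # two discrete levels, no noise
```

In the study's pipeline such unsupervised labels would then be checked against geological data (step 2) and refined by a supervised model (step 3).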


2020 ◽  
Vol 25 (40) ◽  
pp. 4296-4302 ◽  
Author(s):  
Yuan Zhang ◽  
Zhenyan Han ◽  
Qian Gao ◽  
Xiaoyi Bai ◽  
Chi Zhang ◽  
...  

Background: β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of, or defects in, β-globin, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. Inclusion bodies deposited on the cell membrane decrease the deformability of red blood cells, leading to a group of hereditary haemolytic diseases caused by massive destruction of these cells in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost was 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost can be applied to build a learning model for the prediction of inhibitors against K562 cells.
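The evaluation protocol (AdaBoost scored by 10-fold cross-validation on an imbalanced binary set) is easy to reproduce in outline. The molecular descriptors are not public here, so a synthetic stand-in with the paper's 117/190 class balance is used; the accuracy it prints is not the paper's 83.1%:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for 190 non-inhibitors (class 0) and 117 inhibitors (class 1).
X, y = make_classification(n_samples=307, n_features=20,
                           weights=[0.62, 0.38], random_state=0)

# Mean accuracy over a 10-fold cross-validation, as in the paper's protocol.
acc = cross_val_score(AdaBoostClassifier(n_estimators=100, random_state=0),
                      X, y, cv=10, scoring="accuracy").mean()
print(round(acc, 3))
```

Swapping the estimator for `RandomForestClassifier`, `SVC`, etc. reproduces the kind of head-to-head comparison reported in the Results.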


2021 ◽  
Vol 13 (5) ◽  
pp. 907
Author(s):  
Theodora Lendzioch ◽  
Jakub Langhammer ◽  
Lukáš Vlček ◽  
Robert Minařík

One of the best preconditions for sufficient monitoring of peat bog ecosystems is the collection, processing, and analysis of unique spatial data to understand peat bog dynamics. Over two seasons, we sampled groundwater level (GWL) and soil moisture (SM) ground truth data at two diverse locations in the Rokytka peat bog within the Sumava Mountains, Czechia. These data served as reference data and were modeled with a suite of potential predictors derived from digital surface models (DSMs) and from RGB, multispectral, and thermal orthoimages, reflecting the topomorphometry, vegetation, and surface temperature information generated by drone mapping. We used 34 predictors to feed the random forest (RF) algorithm. Predictor selection, hyperparameter tuning, and performance assessment were performed with the target-oriented leave-location-out (LLO) spatial cross-validation (CV) strategy combined with forward feature selection (FFS), to avoid overfitting and to allow prediction at unknown locations. The spatial CV performance statistics ranged from low (R2 = 0.12) to high (R2 = 0.78) model predictive power. Predictor importance was used for model interpretation: temperature had a strong impact on GWL and SM, and we found significant contributions of other predictors, such as Normalized Difference Vegetation Index (NDVI), Normalized Difference Index (NDI), Enhanced Red-Green-Blue Vegetation Index (ERGBVE), Shape Index (SHP), Green Leaf Index (GLI), Brightness Index (BI), Coloration Index (CI), Redness Index (RI), Primary Colours Hue Index (HI), Overall Hue Index (HUE), SAGA Wetness Index (TWI), Plan Curvature (PlnCurv), Topographic Position Index (TPI), and Vector Ruggedness Measure (VRM).
Additionally, we estimated the area of applicability (AOA), presenting maps of where the prediction model yielded high-quality results and where predictions were highly uncertain, because machine learning (ML) models otherwise make predictions far beyond the sampling locations, in environments about which they have no knowledge. The AOA method is well suited and unique for planning and deciding on the best sampling strategy, most notably when data are limited.
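Target-oriented leave-location-out CV can be emulated with scikit-learn's `GroupKFold`, using sampling-location IDs as groups so every fold tests on locations the model never saw. The data here are synthetic placeholders for the drone-derived predictors:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))               # stand-ins for image-derived predictors
location = np.repeat(np.arange(4), 10)     # 4 sampling locations, 10 samples each

# Leave-location-out: each fold holds out whole locations, never single samples,
# so performance reflects prediction at genuinely unknown sites.
splits = list(GroupKFold(n_splits=4).split(X, groups=location))
held_out_ok = all(set(location[tr]).isdisjoint(location[te]) for tr, te in splits)
print(held_out_ok)  # True
```

A plain random k-fold would leak spatially autocorrelated neighbours of test samples into training, which is exactly the overfitting the LLO strategy avoids.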


2021 ◽  
Vol 13 (3) ◽  
pp. 408
Author(s):  
Charles Nickmilder ◽  
Anthony Tedde ◽  
Isabelle Dufrasne ◽  
Françoise Lessire ◽  
Bernard Tychon ◽  
...  

Accurate information about the available standing biomass on pastures is critical for the adequate management of grazing and its promotion to farmers. In this paper, machine learning models are developed to predict available biomass expressed as compressed sward height (CSH) from readily accessible meteorological, optical (Sentinel-2) and radar satellite data (Sentinel-1). This study assumed that combining heterogeneous data sources, data transformations and machine learning methods would improve the robustness and the accuracy of the developed models. A total of 72,795 records of CSH with a spatial positioning, collected in 2018 and 2019, were used and aggregated according to a pixel-like pattern. The resulting dataset was split into a training one with 11,625 pixellated records and an independent validation one with 4952 pixellated records. The models were trained with a 19-fold cross-validation. A wide range of performances was observed (with mean root mean square error (RMSE) of cross-validation ranging from 22.84 mm of CSH to infinite-like values), and the four best-performing models were a cubist, a glmnet, a neural network and a random forest. These models had an RMSE of independent validation lower than 20 mm of CSH at the pixel-level. To simulate the behavior of the model in a decision support system, performances at the paddock level were also studied. These were computed according to two scenarios: either the predictions were made at a sub-parcel level and then aggregated, or the data were aggregated at the parcel level and the predictions were made for these aggregated data. The results obtained in this study were more accurate than those found in the literature concerning pasture budgeting and grassland biomass evaluation. The training of the 124 models resulting from the described framework was part of the realization of a decision support system to help farmers in their daily decision making.
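The two paddock-level scenarios (predict per sub-parcel then aggregate, versus aggregate the data then predict once) differ only in where the averaging happens; a toy numpy illustration with an invented nonlinear stand-in for a trained CSH regressor:

```python
import numpy as np

# Toy setup: 3 paddocks x 4 pixels; one predictor value per pixel.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(3, 4))
model = lambda v: 3.0 * v + 0.1 * v ** 2    # hypothetical fitted regressor

# Scenario 1: predict per pixel, then aggregate predictions to paddock means.
pred_then_agg = model(x).mean(axis=1)
# Scenario 2: aggregate the predictor to paddock means, then predict per paddock.
agg_then_pred = model(x.mean(axis=1))

# With a nonlinear model the two scenarios generally disagree.
print(np.allclose(pred_then_agg, agg_then_pred))  # False
```

For a purely linear model the two would coincide (the mean commutes with a linear map); with nonlinear learners like the cubist, neural network, or random forest above, the choice of scenario genuinely matters for a decision support system.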


Proceedings ◽  
2020 ◽  
Vol 78 (1) ◽  
pp. 5
Author(s):  
Raquel de Melo Barbosa ◽  
Fabio Fonseca de Oliveira ◽  
Gabriel Bezerra Motta Câmara ◽  
Tulio Flavio Accioly de Lima e Moura ◽  
Fernanda Nervo Raffin ◽  
...  

Nano-hybrid formulations combine organic and inorganic materials in self-assembled platforms for drug delivery. Laponite is a synthetic, biocompatible clay that can act as a host for guest compounds. Poloxamines are amphiphilic four-armed compounds with pH-sensitive and thermosensitive properties. The association of Laponite and Poloxamine can be used to improve drug attachment and to increase the solubility of β-Lapachone (β-Lap). β-Lap has antiviral, antiparasitic, antitumor, and anti-inflammatory properties; however, its low water solubility limits its clinical and medical applications. All samples were prepared by mixing Tetronic 1304 and LAP in ranges of 1–20% (w/w) and 0–3% (w/w), respectively. β-Lap solubility was analyzed by UV-vis spectrophotometry, and physical behavior was evaluated across a range of temperatures. The data analysis consisted of response surface methodology (RSM) and two kinds of machine learning (ML): multilayer perceptron (MLP) and support vector machine (SVM). The ML models, trained on the experimental data, obtained the best correlation coefficients for drug solubility and adequate physical classifications of the systems. The SVM method gave the best fit for β-Lap solubilization. In silico tools thus enabled fine-tuning and near-experimental prediction of β-Lap solubility and classification of physical behavior, an excellent strategy for developing new nano-hybrid platforms.
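The SVM regression step can be sketched with scikit-learn's `SVR`. The composition-solubility numbers below are invented placeholders in the paper's stated ranges (1–20% Tetronic 1304, 0–3% Laponite), not its measurements, and the kernel settings are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Invented design points: [% Tetronic 1304, % Laponite] -> β-Lap solubility (a.u.).
X = np.array([[1, 0], [5, 1], [10, 1], [10, 3], [15, 2], [20, 0], [20, 3]])
y = np.array([0.1, 0.4, 0.9, 1.1, 1.5, 1.8, 2.1])

# RBF-kernel support vector regression fitted to the design data.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
pred = model.predict([[12, 2]])   # solubility estimate for an untested mixture
print(float(pred[0]))
```

The same fitted surface is what lets an in silico tool fine-tune a formulation before any new bench experiment is run.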


Author(s):  
Jessica Di Salvatore ◽  
Andrea Ruggeri

Abstract
How does space matter in our analyses? How can we evaluate diffusion of phenomena or interdependence among units? How biased can our analysis be if we do not consider spatial relationships? All the above questions are critical theoretical and empirical issues for political scientists belonging to several subfields from Electoral Studies to Comparative Politics, and also for International Relations. In this special issue on methods, our paper introduces political scientists to conceptualizing interdependence between units and how to empirically model these interdependencies using spatial regression. First, the paper presents the building blocks of any feature of spatial data (points, polygons, and raster) and the task of georeferencing. Second, the paper discusses what a spatial matrix (W) is, its varieties and the assumptions we make when choosing one. Third, the paper introduces how to investigate spatial clustering through visualizations (e.g. maps) as well as statistical tests (e.g. Moran's index). Fourth and finally, the paper explains how to model spatial relationships that are of substantive interest to some of our research questions. We conclude by inviting researchers to carefully consider space in their analysis and to reflect on the need, or the lack thereof, to use spatial models.
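One concrete variety of the W matrix discussed above is a k-nearest-neighbour, row-standardized weights matrix built from unit coordinates; the coordinates and the choice k = 3 below are illustrative, standing in for one of the many specifications the paper reviews:

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(8, 2))    # centroids of 8 illustrative units
k = 3

# Pairwise distances; each unit's neighbours are its k nearest other units.
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(d, np.inf)                # a unit is not its own neighbour
W = np.zeros_like(d)
rows = np.arange(len(coords))[:, None]
W[rows, np.argsort(d, axis=1)[:, :k]] = 1.0

# Row-standardize so each row sums to 1: Wx then gives the neighbours' mean,
# the usual "spatial lag" entering a spatial regression.
W /= W.sum(axis=1, keepdims=True)
print(np.allclose(W.sum(axis=1), 1.0))     # True
```

Note that a kNN-based W is generally asymmetric (i's neighbour need not count i among its own k nearest), one of the assumptions the paper asks researchers to make explicitly.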

