spind: an R Package to Account for Spatial Autocorrelation in the Analysis of Lattice Data

spind is an R package aiming to provide a useful toolkit to account for spatial dependence in the analysis of lattice data. Grid-based data sets in spatial modelling often exhibit spatial dependence, i.e. values sampled at nearby locations are more similar than those sampled further apart. spind methods, described here, take this kind of two-dimensional dependence into account and are sensitive to its variation across different spatial scales. Methods presented to account for spatial autocorrelation are based on the two fundamentally different approaches of generalised estimating equations as well as wavelet-revised methods. Both methods are extensions to generalised linear models. spind also provides functions for multi-model inference and scaling by wavelet multiresolution regression. Since model evaluation is essential for assessing prediction accuracy in species distribution modelling, spind additionally supplies users with spatial accuracy measures, i.e. measures that are sensitive to the spatial arrangement of the predictions.
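
The spatial dependence described above, where nearby grid cells hold more similar values than distant ones, is commonly summarised with Moran's I. The sketch below is a minimal illustration in plain Python and is not part of spind. The function name `morans_i` and the choice of rook (4-neighbour) contiguity weights are assumptions made for this example.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid, using rook (4-neighbour) weights.

    Hypothetical helper for illustration only; not a spind function.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    num = 0.0        # sum of w_ij * (x_i - mean) * (x_j - mean)
    weight_sum = 0   # sum of all w_ij (each neighbour pair counted twice)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (num / denom)


# Two blocks of equal values: neighbours mostly agree, so I is positive.
clustered = [[0, 0, 1, 1] for _ in range(4)]
# Checkerboard: every neighbour differs, so I is strongly negative.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(morans_i(clustered), morans_i(checkerboard))
```

Values near zero would suggest no spatial autocorrelation under this weighting. Other neighbourhood definitions (queen contiguity, distance bands) give different numbers for the same grid.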

Car Ownership and the Built Environment: A Spatial Modeling Approach

Transportation Research Record: Journal of the Transportation Research Board ◽

10.1177/03611981211049409 ◽

2021 ◽

pp. 036119812110494

Author(s):

Jerome Laviolette ◽

Catherine Morency ◽

Owen D. Waygood ◽

Konstadinos G. Goulias

Keyword(s):

Built Environment ◽

Spatial Dependence ◽

Linear Models ◽

Census Data ◽

Model Specification ◽

Car Ownership ◽

Grocery Stores ◽

Modeling Framework ◽

Explanatory Variables ◽

Environment Variables

Car ownership is linked to higher car use, which leads to important environmental, social and health consequences. As car ownership keeps increasing in most countries, it remains relevant to examine what factors and policies can help contain this growth. This paper uses an advanced spatial econometric modeling framework to investigate spatial dependences in household car ownership rates measured at fine geographical scales using administrative data of registered vehicles and census data of household counts for the Island of Montreal, Canada. The use of a finer level of spatial resolution allows for the use of more explanatory variables than previous aggregate models of car ownership. Theoretical considerations and formal testing suggested the choice of the Spatial Durbin Error Model (SDEM) as an appropriate modeling option. The final model specification includes sociodemographic and built environment variables supported by theory and achieves a Nagelkerke pseudo-R2 of 0.93. Despite the inclusion of those variables the spatial linear models with and without lagged explanatory variables still exhibit residual spatial dependence. This indicates the presence of unobserved autocorrelated factors influencing car ownership rates. Model results indicate that sociodemographic variables explain much of the variance, but that built environment characteristics, including transit level of service and local commercial accessibility (e.g., to grocery stores) are strongly and negatively associated with neighborhood car ownership rates. Comparison of estimates between the SDEM and a non-spatial model indicates that failing to control for spatial dependence leads to an overestimation of the strength of the direct influence of built environment variables.

Effect of Savanna windrow wood burning on the spatial variability of soil properties

Pesquisa Agropecuária Tropical ◽

10.1590/1983-40632021v5166853 ◽

2021 ◽

Vol 51 ◽

Author(s):

Diogo Neia Eberhardt ◽

Robélio Leandro Marchão ◽

Pedro Rodolfo Siqueira Vendrame ◽

Marc Corbeels ◽

Osvaldo Guedes Filho ◽

...

Keyword(s):

Spatial Distribution ◽

Spatial Variability ◽

Spatial Dependence ◽

Chemical Properties ◽

Spatial Arrangement ◽

Nugget Effect ◽

Tropical Savannas ◽

Geostatistical Methods ◽

Exchangeable Ca ◽

Spherical Models

Tropical Savannas cover an area of approximately 1.9 billion hectares around the world and are subject to regular fires every 1 to 4 years. This study aimed to evaluate the influence of burning windrow wood from Cerrado (Brazilian Savanna) deforestation on the spatial variability of soil chemical properties in the field. The data were analysed using geostatistical methods. The semivariograms for pH(H2O), pH(CaCl2), Ca, Mg and K were fitted with spherical models, whereas phosphorus showed a nugget effect. The cross-semivariograms showed correlations between pH(H2O) and pH(CaCl2) and the other variables with spatial dependence (exchangeable Ca and Mg and available K). The spatial variability maps for the pH(H2O), pH(CaCl2), Ca, Mg and K concentrations also showed similar patterns of spatial variability, indicating that burning the vegetation after deforestation caused a well-defined spatial arrangement. Even after 20 years of agricultural use, the spatial distribution of pH(H2O), pH(CaCl2), Ca, Mg and available K was affected by the wood windrow burning that took place during the initial deforestation.
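
For readers unfamiliar with the geostatistical terms in this abstract, the sketch below writes out the standard spherical semivariogram model alongside a pure nugget model. It assumes that "nugget effect" here means a pure nugget semivariogram, i.e. no detectable spatial structure at the sampled distances. The function names and the parameter values in the usage lines are made up for illustration and are not taken from the study.

```python
def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Spherical model: rises from the nugget and levels off at the range."""
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0  # semivariance is zero at zero lag by definition
    if h >= range_:
        return nugget + partial_sill  # the sill: no dependence beyond the range
    ratio = h / range_
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


def pure_nugget_semivariogram(h, nugget):
    """Pure nugget model: constant semivariance at every positive lag."""
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    return 0.0 if h == 0 else nugget


# Illustrative parameters only (nugget 0.0, partial sill 1.0, range 10 m).
for lag in (0, 2.5, 5, 10, 20):
    print(lag, spherical_semivariogram(lag, 0.0, 1.0, 10.0),
          pure_nugget_semivariogram(lag, 0.4))
```

Under the spherical model, samples closer together than the range are correlated. Under the pure nugget model, semivariance does not depend on distance at all.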

tscount: An R Package for Analysis of Count Time Series Following Generalized Linear Models

Journal of Statistical Software ◽

10.18637/jss.v082.i05 ◽

2017 ◽

Vol 82 (5) ◽

Cited By ~ 28

Author(s):

Tobias Liboschik ◽

Konstantinos Fokianos ◽

Roland Fried

Keyword(s):

Time Series ◽

Generalized Linear Models ◽

Linear Models ◽

R Package ◽

Count Time Series

Are Distribution Patterns Correlated with Plant Traits?

10.26686/wgtn.16998847.v1 ◽

2021 ◽

Author(s):

Benjamin Magana-Rodriguez

Keyword(s):

Life History ◽

Leaf Area ◽

Spatial Patterns ◽

Life History Traits ◽

Multiple Scales ◽

Spatial Scales ◽

Regional Scale ◽

Distribution Patterns ◽

Spatial Arrangement ◽

Scaling Relationship

The current crisis in loss of biodiversity requires rapid action. Knowledge of species' distribution patterns across scales is of high importance in determining their current status. However, species display many different distribution patterns on multiple scales. A positive relationship between regional (broad-scale) distribution and local abundance (fine-scale) of species is almost a constant pattern in macroecology. Nevertheless, interspecific relationships typically contain much scatter. For example, some species have high local abundance but narrow ranges, while others are widespread but locally rare. One way to describe these spatial features of distribution patterns is by analysing the scaling properties of occupancy (e.g., aggregation) in combination with knowledge of the processes that are generating the specific spatial pattern (e.g., reproduction, dispersal, and colonisation). The main goal of my research was to investigate if distribution patterns correlate with plant life-history traits across multiple scales. First, I compared the performance of five empirical models for their ability to describe the scaling relationship of occupancy in two datasets from Molesworth Station, New Zealand. Secondly, I analysed the association between spatial patterns and life-history traits at two spatial scales in an assemblage of 46 grassland species in Molesworth Station. The spatial arrangement was quantified using the parameter k from the Negative Binomial Distribution (NBD). Finally, I investigated the same association between spatial patterns and life-history traits across local, regional and national scales, focusing on one of the most diverse families of plant species in New Zealand, the Veronica sect. Hebe (Plantaginaceae). The spatial arrangement was investigated using the mass fractal dimension.
Cross-species correlations and phylogenetically independent contrasts were used to investigate the relationships between plant life-history traits and spatial patterns in both datasets. There was no superior occupancy-area model overall for describing the scaling relationship; however, the results showed that a variety of occupancy-area models can be fit to different data sets at diverse spatial scales using nonlinear regression. Additionally, here I showed that it is possible to deduce and extrapolate information on occupancy at fine scales from coarse-scale data. For the 46-species plant assemblage in Molesworth Station, specific leaf area (SLA) exhibited a positive association with aggregation in cross-species analysis, while leaf area showed a negative association, and dispersule mass a positive correlation with degree of aggregation in phylogenetic contrast analysis at a local scale (20 × 20 m resolution). Plant height was the only life-history trait that was associated with degree of aggregation at a regional scale (100 × 60 m resolution). For the Veronica sect. Hebe dataset, leaf area showed a positive correlation with aggregation while specific leaf area showed a negative correlation with aggregation at a fine local scale (2.5-60 m resolution). Inflorescence length, breeding system and leaf area showed a negative correlation with degree of aggregation at a regional scale (2.5-20 km resolution). Height was positively associated with aggregation at a national scale (20-100 km resolution). Although life-history traits showed low predictive ability in explaining aggregation throughout this thesis, there was a general pattern about which processes and traits were important at different scales.
At local scales, traits related to dispersal and competition, such as SLA, leaf area, dispersule mass and the presence of structures in seeds for dispersal, were important, while at regional scales traits related to reproduction, such as breeding system and inflorescence length, and traits related to dispersal (seed mass) were significant. At national scales only plant height was important in predicting aggregation. Here, it was illustrated how the parameters of these scaling models capture an important aspect of spatial pattern that can be related to other macroecological relationships and the life-history traits of species. This study shows that when several scales of analysis are considered, we can improve our understanding of the factors that are related to species' distribution patterns.
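
The abstract quantifies spatial arrangement with the parameter k of the negative binomial distribution, where smaller k indicates stronger aggregation. It does not say how k was estimated, so the sketch below uses the textbook method-of-moments estimator, k = mean² / (variance − mean), purely as an assumption. The function name and the quadrat counts are invented for illustration.

```python
from statistics import mean, variance


def nbd_k_moments(counts):
    """Method-of-moments estimate of the negative binomial parameter k.

    Assumed estimator for illustration; the thesis may have used another.
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    if s2 <= m:
        # Variance must exceed the mean for the estimate to be defined.
        raise ValueError("no overdispersion: moment estimate of k is undefined")
    return m * m / (s2 - m)


# Hypothetical counts per quadrat with the same mean (2 individuals).
clumped = [0, 0, 0, 0, 10, 0, 0, 0, 0, 10]   # individuals packed into 2 quadrats
spread = [0, 1, 2, 3, 4, 5, 1, 2, 0, 2]      # individuals spread more evenly

print(nbd_k_moments(clumped), nbd_k_moments(spread))
```

The clumped counts give a much smaller k than the spread counts, which matches the usual reading of k as an inverse measure of aggregation.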

modelBuildR: an R package for model building and feature selection with erroneous classifications

PeerJ ◽

10.7717/peerj.10849 ◽

2021 ◽

Vol 9 ◽

pp. e10849

Author(s):

Maximilian Knoll ◽

Jennifer Furkel ◽

Juergen Debus ◽

Amir Abdollahi

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Model Building ◽

Linear Models ◽

Binary Classification ◽

Ground Truth ◽

R Package ◽

Methylation Array ◽

Survival Difference ◽

Error Probabilities

Background. Model building is a crucial part of omics-based biomedical research to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing error between model predictions and given classification (maximizing accuracy). Human ratings/classifications, however, might be error prone, with discordance rates between experts of 5–15%. We therefore evaluate if a feature pre-filtering step might improve identification of features associated with true underlying groups. Methods. Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass-ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP negative glioblastoma tumors from the TCGA-GBM 450k methylation array data cohort, starting from a fuzzy UMAP-based rough and erroneous separation. Results. V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 50% for 25 samples and 100 features. For larger numbers of features and samples, median AUCs were 0.75 (range 0.59–1.00) for V1 and 0.54 (range 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest based method. Conclusions. The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
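
The multiplicity adjustment named in the results (Benjamini-Hochberg) can be sketched as follows. This shows only the standard step-up adjustment of a list of p-values, written in plain Python for illustration. It is not the modelBuildR implementation, and the function name and example p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical univariate test results for four features.
p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p)
selected = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, selected)
```

Features whose adjusted value falls below a chosen false discovery rate would pass a pre-filtering step of this kind. The 0.05 threshold used here is an arbitrary choice for the example.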

Developing the use of convolutional neural networking in benthic habitat classification and species distribution modelling

ICES Journal of Marine Science ◽

10.1093/icesjms/fsaa208 ◽

2020 ◽

Author(s):

Jennifer I Fincham ◽

Christian Wilson ◽

Jon Barry ◽

Stefan Bolam ◽

Geoffrey French

Keyword(s):

Species Distribution ◽

Spatial Scales ◽

Species Distribution Modelling ◽

Statistical Testing ◽

Training Dataset ◽

Benthic Habitat ◽

Distribution Modelling ◽

Habitat Classification ◽

Fish And Shellfish ◽

Multibeam Data

Management of the marine environment is increasingly being conducted in accordance with an ecosystem-based approach, which requires an integrated approach to monitoring. Simultaneous acquisition of the different data types needed is often difficult, largely due to specific gear requirements (grabs, trawls, and video and acoustic approaches) and mismatches in their spatial and temporal scales. We present an example to resolve this using a convolutional neural network (CNN), using ad hoc multibeam data collected during multi-disciplinary surveys to predict the distribution of seabed habitats across the western English Channel. We adopted a habitat classification system, based on seabed morphology and sediment dynamics, and trained a CNN to label images generated from the multibeam data. The probability of the correct classification by the CNN varied per habitat, with accuracy above 60% for 85% of habitats in a training dataset. Statistical testing revealed that the spatial distribution of 57 of the 100 demersal fish and shellfish species sampled across the region during the surveys possessed a non-random relationship with the multibeam-derived habitats using CNN. CNNs, therefore, offer the potential to aid habitat mapping and facilitate species distribution modelling at the large spatial scales required under an ecosystem-based management framework.

Author response for "smartR: an R package for spatial modelling of fisheries and scenario simulation of management strategies"

10.1111/2041-210x.13394/v2/response1 ◽

2020 ◽

Author(s):

Lorenzo D’Andrea ◽

Antonio Parisi ◽

Fabio Fiorentino ◽

Germana Garofalo ◽

Michele Gristina ◽

...

Keyword(s):

Management Strategies ◽

R Package ◽

Spatial Modelling ◽

Author Response ◽

Scenario Simulation

Patch dynamics and the development of structural and spatial heterogeneity in Pacific Northwest forests

Canadian Journal of Forest Research ◽

10.1139/x11-128 ◽

2011 ◽

Vol 41 (12) ◽

pp. 2276-2291 ◽

Cited By ~ 38

Author(s):

Van R. Kane ◽

Rolf F. Gersonde ◽

James A. Lutz ◽

Robert J. McGaughey ◽

Jonathan D. Bakker ◽

...

Keyword(s):

Pacific Northwest ◽

Stand Structure ◽

Spatial Scales ◽

Canopy Structure ◽

Patch Dynamics ◽

Spatial Arrangement ◽

Airborne Lidar ◽

Small Scale ◽

Individual Site ◽

Time Required

Over time, chronic small-scale disturbances within forests should create distinct stand structures and spatial patterns. We tested this hypothesis by measuring the structure and spatial arrangement of gaps and canopy patches. We used airborne LiDAR data from 100 sites (cumulative 11.2 km2) in the Pacific Northwest, USA, across a 643 year chronosequence to measure canopy structure, patch and gap diversity, and scales of variance. We used airborne LiDAR’s ability to identify strata in canopy surface height to distinguish patch spatial structures as homogeneous canopy structure, matrix–patch structures, or patch mosaics. We identified six distinct stand structure classes that were associated with the canopy closure, competitive exclusion, maturation, and three patch mosaics stages of late seral forest development. Structural variance peaked in all classes at the tree-to-tree and tree-to-gap scales (10–15 m), but many sites maintained high variance at scales >30 m and up to 200 m, emphasizing the high patch-to-patch heterogeneity. The time required to develop complex patch and gap structures was highly variable and was likely linked to individual site circumstances. The high variance at larger scales appears to be an emergent property that is not a simple propagation of processes observed at smaller spatial scales.

On selection of spatial linear models for lattice data

Journal of the Royal Statistical Society Series B (Statistical Methodology) ◽

10.1111/j.1467-9868.2010.00739.x ◽

2010 ◽

Vol 72 (3) ◽

pp. 389-402 ◽

Cited By ~ 27

Author(s):

Jun Zhu ◽

Hsin-Cheng Huang ◽

Perla E. Reyes

Keyword(s):

Linear Models ◽

Lattice Data ◽

Selection Of

Correction of Spatial Bias in Oligonucleotide Array Data

Advances in Bioinformatics ◽

10.1155/2013/167915 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Philippe Serhal ◽

Sébastien Lemieux

Keyword(s):

Mutual Information ◽

Spatial Autocorrelation ◽

Differentially Expressed Gene ◽

R Package ◽

Oligonucleotide Array ◽

Hybridization Signal ◽

Gene Detection ◽

High Throughput Gene Expression ◽

Probe Set ◽

Free Open Source

Background. Oligonucleotide microarrays allow for high-throughput gene expression profiling assays. The technology relies on the fundamental assumption that observed hybridization signal intensities (HSIs) for each intended target, on average, correlate with their target’s true concentration in the sample. However, systematic, nonbiological variation from several sources undermines this hypothesis. Background hybridization signal has been previously identified as one such important source, one manifestation of which appears in the form of spatial autocorrelation. Results. We propose an algorithm, pyn, for the elimination of spatial autocorrelation in HSIs, exploiting the duality of desirable mutual information shared by probes in a common probe set and undesirable mutual information shared by spatially proximate probes. We show that this correction procedure reduces spatial autocorrelation in HSIs; increases HSI reproducibility across replicate arrays; increases differentially expressed gene detection power; and performs better than previously published methods. Conclusions. The proposed algorithm increases both precision and accuracy, while requiring virtually no changes to users’ current analysis pipelines: the correction consists merely of a transformation of raw HSIs (e.g., CEL files for Affymetrix arrays). A free, open-source implementation is provided as an R package, compatible with standard Bioconductor tools. The approach may also be tailored to other platform types and other sources of bias.