A wavelet-based method to remove spatial autocorrelation in the analysis of species distributional data

Web Ecology ◽  
2008 ◽  
Vol 8 (1) ◽  
pp. 22-29 ◽  
Author(s):  
G. Carl ◽  
C. F. Dormann ◽  
I. Kühn

Abstract. Species distributional data based on lattice data often display spatial autocorrelation. In such cases, the assumption of independently and identically distributed errors can be violated in standard regression models. Based on a recently published review on methods to account for spatial autocorrelation, we describe here a new statistical approach which relies on the theory of wavelets. It provides a powerful tool for removing spatial autocorrelation without any prior knowledge of the underlying correlation structure. Our wavelet-revised model (WRM) is applied to artificial datasets of species’ distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response). Making use of these published data enables us to compare WRM to other recently tested models and to recommend it as an attractive option for effective and computationally efficient autocorrelation removal.
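To make the problem concrete, the following is a minimal sketch of how spatial autocorrelation on lattice data can be diagnosed with Moran's I. It is not the wavelet-revised model itself, which the abstract does not specify in detail. The function name `morans_i`, the rook (edge-sharing) neighbourhood, and the two toy grids are assumptions chosen for illustration.

```python
def morans_i(grid):
    """Moran's I for a rectangular lattice with rook (edge-sharing) neighbours."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    z = [[v - mean for v in row] for row in grid]  # centred values

    cross = 0.0  # sum of z_i * z_j over ordered neighbour pairs
    weight = 0   # number of ordered neighbour pairs (binary weights)
    for r in range(rows):
        for c in range(cols):
            # Visit the right and lower neighbour; count each pair in both directions.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    cross += 2 * z[r][c] * z[rr][cc]
                    weight += 2
    denom = sum(v * v for row in z for v in row)
    return (n / weight) * cross / denom


# Hypothetical toy lattices: an alternating pattern and a smooth gradient.
checkerboard = [[(-1) ** (r + c) for c in range(4)] for r in range(4)]
gradient = [[r + c for c in range(4)] for r in range(4)]
print(morans_i(checkerboard))  # negative: neighbours are dissimilar
print(morans_i(gradient))      # positive: neighbours are similar
```

Applied to regression residuals rather than raw values, a clearly positive value would suggest that the independence assumption mentioned in the abstract is violated.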

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Janet C. Siebert ◽  
Martine Saint-Cyr ◽  
Sarah J. Borengasser ◽  
Brandie D. Wagner ◽  
Catherine A. Lozupone ◽  
...  

Abstract.
Background: One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be “ome aware.” Thus, some researchers analyze data from one ome at a time, and then combine predictions across omes. Others resort to correlation studies, cataloging pairwise relationships, but lacking an obvious approach for cohesive and interpretable summaries of these catalogs.
Methods: We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting “top table” of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements).
Results: We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compare the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10⁻⁵). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User’s Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/.
Conclusion: CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.
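The first two workflow steps (pairwise models, then a network of the strongest relationships) can be sketched roughly as follows. This is an illustrative approximation only, not the CANTARE R implementation: it assumes plain simple linear regression scored by R², a hypothetical cut-off of 0.8, and invented analyte names and values. The final logistic-regression step is omitted.

```python
from itertools import combinations


def r_squared(x, y):
    """R^2 of the simple linear regression y ~ x (equal to squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant analyte explains nothing
    return sxy * sxy / (sxx * syy)


def build_network(analytes, threshold):
    """Adjacency sets linking analyte pairs whose R^2 is at least `threshold`."""
    edges = {name: set() for name in analytes}
    for a, b in combinations(analytes, 2):
        if r_squared(analytes[a], analytes[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def neighborhood(edges, node):
    """The node plus its direct neighbours: candidate predictors for one model."""
    return {node} | edges[node]


# Hypothetical toy measurements (5 samples) drawn from three omes.
analytes = {
    "microbe_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "enzyme_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
network = build_network(analytes, threshold=0.8)
print(sorted(neighborhood(network, "microbe_A")))
```

In a fuller version, the analytes in each neighborhood would become the predictors of a logistic regression for the outcome of interest, as the Methods text describes.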


2019 ◽  
pp. 232102221886979
Author(s):  
Radhika Pandey ◽  
Amey Sapre ◽  
Pramod Sinha

Identification of primary economic activity of firms is a prerequisite for compiling several macro aggregates. In this paper, we take a statistical approach to understand the extent of changes in primary economic activity of firms over time and across different industries. We use the history of economic activity of over 46,000 firms spread over 25 years from CMIE Prowess to identify the number of times firms change the nature of their business. Using the count of changes, we estimate Poisson and Negative Binomial regression models to gain predictability over changing economic activity across industry groups. We show that a Poisson model accurately characterizes the distribution of count of changes across industries and that firms with a long history are more likely to have changed their primary economic activity over the years. Findings show that classification can be a crucial problem in a large data set like the MCA21 and can even lead to distortions in value addition estimates at the industry level. JEL Classifications: D22, E00, E01
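A common first check when choosing between Poisson and Negative Binomial models is the dispersion index (sample variance divided by sample mean), which is close to 1 when a Poisson model is plausible. The sketch below illustrates that check only. The counts are invented for illustration and are not taken from CMIE Prowess, and the paper does not state that this particular diagnostic was used.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; near 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Hypothetical counts of activity changes per firm, for illustration only.
changes_per_firm = [0, 1, 0, 2, 1, 0, 1, 3, 0, 2]
# Hypothetical overdispersed counts: most firms never change, one changes often.
overdispersed = [0, 0, 0, 0, 10]

print(round(dispersion_index(changes_per_firm), 2))
print(round(dispersion_index(overdispersed), 2))
```

An index well above 1 indicates overdispersion, the situation in which a Negative Binomial model is usually preferred over a Poisson model.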


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Sunghee Oh ◽  
Seongho Song ◽  
Gregory Grabowski ◽  
Hongyu Zhao ◽  
James P. Noonan

RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
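The idea behind a time-lagged AR(1) regression, in which each time point is modeled from the previous one, can be sketched as below. This is a generic least-squares AR(1) fit on a single continuous series, not the authors' RNA-seq model, which would need to handle count data. The function name `fit_ar1` and the noise-free toy series are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = intercept + phi * x_{t-1}."""
    prev, curr = series[:-1], series[1:]  # lagged values and their successors
    n = len(prev)
    mp, mc = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mp) ** 2 for p in prev)
    sxy = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    phi = sxy / sxx
    intercept = mc - phi * mp
    return intercept, phi


# Hypothetical noise-free series generated with intercept 1.0 and phi 0.5.
series = [10.0]
for _ in range(7):
    series.append(1.0 + 0.5 * series[-1])

intercept, phi = fit_ar1(series)
print(round(intercept, 6), round(phi, 6))
```

The estimated `phi` measures how strongly expression at one time point depends on the preceding one, which is the dependency across time points that the abstract says such methods should account for.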


2019 ◽  
pp. 088626051988852
Author(s):  
Louise Almond ◽  
Elias Matin ◽  
Michelle McManus

Offender profiling follows the idea that if offenders’ crime scene actions can be empirically linked to their background characteristics, it will be possible to predict one from the other. There is a lack of research exploring whether homicide offenders’ crime scene actions are predictive of their criminal histories, despite the potential utility of such information. The current study addresses this gap in the literature. A sample of 213 adult male-on-female homicides with sexual or unknown motive was drawn from a U.K.-wide database. Relationships between 13 preconviction variables and 29 crime scene behaviors were explored using a bivariate statistical approach. Subsequently, binary logistic regression models were used to predict the presence, or absence, of specific preconvictions based on a combination of offense behaviors. Analyses highlighted 16 statistically significant associations between key offense behaviors and previous convictions; these associations often indicated that a previous conviction was “less likely.” The analysis failed to find any association for various other variables, most notably sexual preconvictions. Results indicate that offenders’ criminal histories can be predicted from their offense behaviors, though not all preconvictions may be similarly suited. Implications for practice are discussed.


2020 ◽  
Vol 12 (10) ◽  
pp. 4324
Author(s):  
Felipa de Mello-Sampayo

This manuscript develops a theoretical spatial interaction model using the entropy approach to relax the assumption of the deterministic utility function. The spatial healthcare accessibility improves as the demand for healthcare increases or the opportunity cost of traveling to and from healthcare providers decreases. The empirical application used different spatial econometric techniques and multilevel modeling to evaluate the spatial distribution of existing hospitals in Texas and their social and economic correlates. To control for spatial autocorrelation, spatial autoregressive regression models were estimated, and geographically weighted regression models examined potential spatial non-stationarity. The multilevel modeling controlled for spatial autocorrelation and also allowed local variation and spatial non-stationarity. The empirical analysis showed that healthcare accessibility was not stationary in Texas in 2015, with areas of poor accessibility in rural and peripheral areas in Texas, when using hospitals’ location and county data. The model of spatial interaction applied to healthcare accessibility can be used to evaluate policies aiming at the provision of health services, such as closures of hospitals and capacity increases.


Biometrika ◽  
1995 ◽  
Vol 82 (4) ◽  
pp. 747-769 ◽  
Author(s):  
JIM ALBERT ◽  
SIDDHARTHA CHIB
