Evaluation of redundancy analysis to identify signatures of local adaptation

ABSTRACT Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast.
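
The abstract does not spell out the test statistic, so the following is only a minimal sketch of how an RDA-based genome scan could be set up. It assumes a genotype matrix `Y` (individuals by loci) and an environment matrix `X` (individuals by variables). The function names `rda_loadings` and `mahalanobis_sq` are hypothetical, and scoring loci by a plain (non-robust) Mahalanobis distance on their loadings is an assumption made for illustration, not a procedure confirmed by the abstract.

```python
import numpy as np

def rda_loadings(Y, X, k=2):
    """Sketch of RDA: regress genotypes on environment, then run a PCA
    on the fitted values. Returns a (loci x k) matrix of locus loadings."""
    Yc = Y - Y.mean(axis=0)          # centre each locus
    Xc = X - X.mean(axis=0)          # centre each environmental variable
    # Multivariate least-squares regression of all loci on the environment
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ B                  # part of Y explained by the environment
    # PCA of the fitted values via SVD; rows of Vt are the constrained axes
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    return Vt[:k].T

def mahalanobis_sq(Z):
    """Squared Mahalanobis distance of each row of Z from the row mean
    (assumed scoring rule; uses the ordinary sample covariance)."""
    Zc = Z - Z.mean(axis=0)
    cov = np.atleast_2d(np.cov(Zc, rowvar=False))
    inv = np.linalg.pinv(cov)
    return np.einsum("ij,jk,ik->i", Zc, inv, Zc)
```

Loci with unusually large distances would be the outlier candidates. Converting the distances to p-values would need a calibration step that this sketch leaves out.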

Performing Land-Capability Evaluation by Use of Numerical Taxonomy: Land Use and Environmental Decisionmaking Made Hard?

Environment and Planning A Economy and Space ◽

10.1068/a100915 ◽

1978 ◽

Vol 10 (8) ◽

pp. 915-921 ◽

Cited By ~ 2

Author(s):

S I Gordon

Keyword(s):

Land Use ◽

Environmental Variables ◽

Numerical Taxonomy ◽

Planning Process ◽

Environmental Data ◽

Grid Cells ◽

Data Set ◽

Land Capability ◽

Land Capability Evaluation ◽

Capability Evaluation

Researchers have attempted to incorporate environmental variables into the land-use planning process by use of several ranking and mapping formulations. Most of these are based on some type of classification scheme. More recently, multivariate statistical techniques have been utilized to classify land areas into groups with similar suitability for urban development. A test was made of one of these numerical taxonomic techniques on a data set from Medford Township, New Jersey, and the results analyzed in terms of the pros and cons of these methods. A group of 484 forty-acre grid cells with forty-two environmental variables was collapsed into a ten variable set for ten groups of grid cells having like characteristics. The analysis involved two steps, factor analysis followed by a euclidean-distance-classification algorithm. The results show that numerical taxonomy can greatly facilitate the analysis of large environmental data sets and can help to identify the ecological relationships in quantitative terms. However, the complexity of the statistical methods involved greatly limits the wide application of these techniques, and the use of numerical taxonomic results in land-capability evaluation cannot release the researcher from making many judgmental decisions.
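
The two-step analysis described above (dimension reduction followed by distance-based grouping) can be sketched roughly as follows. This is not the author's procedure: principal components stand in for the factor analysis, a k-means style loop stands in for the unspecified Euclidean-distance classification algorithm, and the function name `reduce_and_group` and its parameters are hypothetical.

```python
import numpy as np

def reduce_and_group(X, n_factors=10, n_groups=10, n_iter=50, seed=0):
    """Reduce the variables to n_factors component scores, then group the
    rows (grid cells) by Euclidean distance to iteratively updated centroids.
    Assumes no column of X is constant."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise variables
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]       # component scores per cell
    rng = np.random.default_rng(seed)
    start = rng.choice(len(scores), n_groups, replace=False)
    centroids = scores[start].copy()
    for _ in range(n_iter):
        # Euclidean distance from every cell to every centroid
        d = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for g in range(n_groups):
            members = scores[labels == g]
            if len(members):                        # keep old centroid if empty
                centroids[g] = members.mean(axis=0)
    return scores, labels
```

For a data set shaped like the one in the abstract, `X` would be a 484 by 42 array, and the defaults would yield ten scores per cell and up to ten groups.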

Population Genomics Reveals Gene Flow and Adaptive Signature in Invasive Weed Mikania micrantha

Genes ◽

10.3390/genes12081279 ◽

2021 ◽

Vol 12 (8) ◽

pp. 1279

Author(s):

Xiaoxian Ruan ◽

Zhen Wang ◽

Yingjuan Su ◽

Ting Wang

Keyword(s):

Gene Flow ◽

Environmental Variables ◽

Mixed Model ◽

Population Genomics ◽

Southern China ◽

Defense Responses ◽

Economic Losses ◽

Functional Genes ◽

Invasive Weed ◽

Mikania Micrantha

A long-standing and unresolved issue in invasion biology concerns the rapid adaptation of invaders to nonindigenous environments. Mikania micrantha is a notorious invasive weed that causes substantial economic losses and negative ecological consequences in southern China. However, the contributions of gene flow, environmental variables, and functional genes, all generally recognized as important factors driving invasive success, to its successful invasion of southern China are not fully understood. Here, we utilized a genotyping-by-sequencing approach to sequence 306 M. micrantha individuals from 21 invasive populations. Based on the obtained genome-wide single nucleotide polymorphism (SNP) data, we observed that all the populations possessed similar high levels of genetic diversity that were not constrained by longitude and latitude. Mikania micrantha was introduced multiple times and subsequently experienced rapid-range expansion with recurrent high gene flow. Using FST outliers, a latent factor mixed model, and the Bayesian method, we identified 38 outlier SNPs associated with environmental variables. The analysis of these outlier SNPs revealed that soil composition, temperature, precipitation, and ecological variables were important determinants affecting the invasive adaptation of M. micrantha. Candidate genes with outlier signatures were related to abiotic stress response. Gene family clustering analysis revealed 683 gene families unique to M. micrantha which may have significant implications for the growth, metabolism, and defense responses of M. micrantha. Forty-one genes showing significant positive selection signatures were identified. These genes mainly function in binding, DNA replication and repair, signature transduction, transcription, and cellular components. Collectively, these findings highlight the contribution of gene flow to the invasion and spread of M. micrantha and indicate the roles of adaptive loci and functional genes in invasive adaptation.

A Neural Networks Approach to Determine Factors Associated With Self-Reported Discomfort in Picking Tasks

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1177/00187208211047640 ◽

2021 ◽

pp. 001872082110476

Author(s):

Olfa Haj Mahmoud ◽

Charles Pontonnier ◽

Georges Dumont ◽

Stéphane Poli ◽

Franck Multon

Keyword(s):

Neural Network ◽

Neural Networks ◽

Environmental Variables ◽

Environmental Data ◽

Neural Network Approach ◽

Data Set ◽

Simulation Tools ◽

The Neural Network ◽

Input Variables ◽

Similar Accuracy

Objective: A neural networks approach has been proposed to handle various inputs such as postural, anthropometric and environmental variables in order to estimate self-reported discomfort in picking tasks. An input reduction method has been proposed, reducing the input variables to the minimum data required to estimate self-reported discomfort with similar accuracy as the neural network fed with all variables. Background: Previous works have attempted to explore the relationship between several factors and self-reported discomfort using observational methods. The results showed that this relationship was not a simple linear relationship. Another study used neural networks to model the function returning reported discomfort according to static posture, age, and anthropometric variables. The results demonstrated the model’s ability to predict reported discomfort. But all the available variables were used to design the neural network. Method: Eleven subjects carried out picking tasks with various masses (0, 1, 3 kg) and imposed durations (5, 10, or 15 s). Continuous REBA score, anthropometric and environmental data were computed, and subjects’ discomfort was collected. The data set of this work consisted of the computed continuous REBA score, anthropometric and environmental data, and the collected subjects’ discomfort. Results: The results showed that the correlation between the estimated and experimental tested data was equal to 0.775 when using all the 14 available variables. After data reduction, only 6 variables were left, with a very close performance when predicting discomfort. Conclusion: A neural network approach has been proposed to estimate self-reported discomfort according to a minimum set of postural, anthropometric and environmental variables in picking tasks. Application: This method has the potential to support ergonomists in workstation designing processes, by adding discomfort prediction to virtual manikins’ behaviors in simulation tools.

Power of Modified Brown-Forsythe and Mixed-Model Approaches in Split-Plot Designs

Methodology ◽

10.1027/1614-2241/a000124 ◽

2017 ◽

Vol 13 (1) ◽

pp. 9-22 ◽

Cited By ~ 1

Author(s):

Pablo Livacic-Rojas ◽

Guillermo Vallejo ◽

Paula Fernández ◽

Ellián Tuero-Herrero

Keyword(s):

Repeated Measures ◽

Statistical Power ◽

Mixed Model ◽

Covariance Structure ◽

Simulation Method ◽

Future Research ◽

Repeated Measures Design ◽

Fixed And Random Effects ◽

Split Plot ◽

High Level

Abstract. Low precision of inferences from data analyzed with univariate or multivariate Analysis of Variance (ANOVA) models in repeated-measures designs is associated with non-normally distributed data, nonspherical covariance structures and free variation of the variances and covariances, lack of knowledge of the error structure underlying the data, and the wrong choice of covariance structure by different selectors. In this study, the levels of statistical power exhibited by the Modified Brown-Forsythe (MBF) procedure and by two mixed-model procedures (Akaike’s Criterion and the Correctly Identified Model [CIM]) are compared. The data were analyzed using the Monte Carlo simulation method with the statistical package SAS 9.2, a split-plot design, and six manipulated variables. The results show that the procedures exhibit high statistical power for within and interaction effects, and moderate to low power for the between-groups effects under the different conditions analyzed. For the latter, only the Modified Brown-Forsythe shows a high level of power, mainly for groups with 30 cases and Unstructured (UN) and Autoregressive Heterogeneity (ARH) matrices. For this reason, we recommend using this procedure, since it exhibits higher levels of power for all effects and does not require a matrix type that underlies the structure of the data. Future research is needed to compare the power with corrected selectors using single-level and multilevel designs for fixed and random effects.

Islands in the desert: environmental distribution modelling of endemic flora reveals the extent of Pleistocene tropical relict vegetation in southern Arabia

Annals of Botany ◽

10.1093/aob/mcz085 ◽

2019 ◽

Vol 124 (3) ◽

pp. 411-422 ◽

Cited By ~ 2

Author(s):

James S Borrell ◽

Ghudaina Al Issaey ◽

Darach A Lupton ◽

Thomas Starnes ◽

Abdulrahman Al Hinai ◽

...

Keyword(s):

Environmental Variables ◽

Narrow Range ◽

Biodiversity Hotspot ◽

Environmental Data ◽

Distribution Data ◽

Endemic Plants ◽

Environmental Niche ◽

Environmental Distribution ◽

Southern Arabia ◽

Endemic Flora

Abstract. Background and Aims: Southern Arabia is a global biodiversity hotspot with a high proportion of endemic desert-adapted plants. Here we examine evidence for a Pleistocene climate refugium in the southern Central Desert of Oman, and its role in driving biogeographical patterns of endemism. Methods: Distribution data for seven narrow-range endemic plants were collected systematically across 195 quadrats, together with incidental and historic records. Important environmental variables relevant to arid coastal areas, including night-time fog and cloud cover, were developed for the study area. Environmental niche models using presence/absence data were built and tuned for each species, and spatial overlap was examined. Key Results: A region of the Jiddat Al Arkad reported independent high model suitability for all species. Examination of environmental data across southern Oman indicates that the Jiddat Al Arkad displays a regionally unique climate with higher intra-annual stability, due in part to the influence of the southern monsoon. Despite this, the relative importance of environmental variables was highly differentiated among species, suggesting that characteristic variables such as coastal fog are not major cross-species predictors at this scale. Conclusions: The co-occurrence of a high number of endemic study species within a narrow monsoon-influenced region is indicative of a refugium with low climate change velocity. Combined with climate analysis, our findings provide strong evidence for a southern Arabian Pleistocene refugium in Oman’s Central Desert. We suggest that this refugium has acted as an isolated temperate and mesic island in the desert, resulting in the evolution of these narrow-range endemic flora. Based on the composition of species, this system may represent the northernmost remnant of a continuous belt of mesic vegetation formerly ranging from Africa to Asia, with close links to the flora of East Africa. This has significant implications for future conservation of endemic plants in an arid biodiversity hotspot.

Estimating the effective sample size in association studies of quantitative traits

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab057 ◽

2021 ◽

Author(s):

Andrey Ziyatdinov ◽

Jihye Kim ◽

Dmitry Prokopenko ◽

Florian Privé ◽

Fabien Laporte ◽

...

Keyword(s):

Statistical Power ◽

Quantitative Traits ◽

Mixed Model ◽

Association Studies ◽

Effective Sample Size ◽

Environment Interaction ◽

Uk Biobank ◽

Gene Environment Interaction ◽

Gene Environment ◽

The Uk

Abstract The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.

Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18031333 ◽

2021 ◽

Vol 18 (3) ◽

pp. 1333

Author(s):

Ahmad R. Alsaber ◽

Jiazhu Pan ◽

Adeeba Al-Hurban

Keyword(s):

Air Quality ◽

Missing Data ◽

Random Forest ◽

Missing Values ◽

Imputation Method ◽

Environmental Data ◽

Environmental Research ◽

Quality Data ◽

Data Set ◽

Air Quality Data

In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait, aggregated to a daily basis. Logarithm transformation was carried out for all pollutant data, in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the other imputation methods and, thus, can be considered to be appropriate for analyzing air quality data.
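
The evaluation loop implied above (remove known values, impute them, score the imputations with RMSE and MAE) can be sketched as follows. This is a simplified illustration under stated assumptions: only the MCAR mechanism is simulated, column-mean imputation stands in for missForest, and the function names `mask_mcar`, `mean_impute` and `imputation_error` are hypothetical.

```python
import numpy as np

def mask_mcar(data, rate, rng):
    """Blank out entries completely at random (MCAR) at the given rate.
    Returns the array with NaNs and the boolean mask of removed entries."""
    mask = rng.random(data.shape) < rate
    holed = data.astype(float)       # astype returns a copy
    holed[mask] = np.nan
    return holed, mask

def mean_impute(holed):
    """Stand-in imputer: fill each NaN with its column's observed mean.
    Assumes every column keeps at least one observed value."""
    col_means = np.nanmean(holed, axis=0)
    return np.where(np.isnan(holed), col_means, holed)

def imputation_error(truth, imputed, mask):
    """RMSE and MAE computed only over the entries that were removed."""
    err = (imputed - truth)[mask]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    return rmse, mae
```

Running the three steps at each of the missing-data levels listed in the abstract (5% to 40%) would give one RMSE and MAE pair per level, which is the kind of comparison the abstract reports.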

Functional Delineation of Prefrontal Networks Underlying Working Memory in Schizophrenia: A Cross-data-set Examination

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01726 ◽

2021 ◽

pp. 1-29

Author(s):

Nicole Sanford ◽

Todd S. Woodward

Keyword(s):

Working Memory ◽

Mixed Model ◽

Data Sets ◽

Generation Task ◽

Data Set ◽

Set Switching ◽

Dorsolateral Prefrontal ◽

Task Conditions ◽

Experiment Analysis ◽

Internal Attention

Abstract Background: Working memory (WM) impairment in schizophrenia substantially impacts functional outcome. Although the dorsolateral pFC has been implicated in such impairment, a more comprehensive examination of brain networks comprising pFC is warranted. The present research used a whole-brain, multi-experiment analysis to delineate task-related networks comprising pFC. Activity was examined in schizophrenia patients across a variety of cognitive demands. Methods: One hundred schizophrenia patients and 102 healthy controls completed one of four fMRI tasks: a Sternberg verbal WM task, a visuospatial WM task, a Stroop set-switching task, and a thought generation task (TGT). Task-related networks were identified using multi-experiment constrained PCA for fMRI. Effects of task conditions and group differences were examined using mixed-model ANOVA on the task-related time series. Correlations between task performance and network engagement were also performed. Results: Four spatially and temporally distinct networks with pFC activation emerged and were postulated to subserve (1) internal attention, (2) auditory–motor attention, (3) motor responses, and (4) task energizing. The “energizing” network—engaged during WM encoding and diminished in patients—exhibited consistent trend relationships with WM capacity across different data sets. The dorsolateral-prefrontal-cortex-dominated “internal attention” network exhibited some evidence of hypoactivity in patients, but was not correlated with WM performance. Conclusions: Multi-experiment analysis allowed delineation of task-related, pFC-anchored networks across different cognitive constructs. Given the results with respect to the early-responding “energizing” network, WM deficits in schizophrenia may arise from disruption in the “energization” process described by Donald Stuss' model of pFC functions.

Learnings From Strain Measurements on an In-Field Conductor and Wellhead System

Volume 3: Structures, Safety, and Reliability ◽

10.1115/omae2018-78521 ◽

2018 ◽

Author(s):

Rohit Shankaran ◽

Alexander Rimmer ◽

Alan Haig

Keyword(s):

Fatigue Damage ◽

Strain Gauge ◽

Transfer Functions ◽

Fatigue Loading ◽

Accurate Method ◽

Environmental Data ◽

Analytical Methodology ◽

Data Set ◽

Soil Model ◽

Drilling Operations

In recent years, due to the use of drilling risers with larger and heavier BOP/LMRP stacks, fatigue loading on subsea wellheads has increased, which poses potential restrictions on the duration of drilling operations. In order to track wellhead and conductor fatigue capacity consumption to support safe drilling operations, a range of methods have been applied: (1) analytical riser model and measured environmental data; (2) BOP motion measurement and transfer functions; (3) strain gauge data. Strain gauge monitoring is considered the most accurate method for measuring fatigue capacity consumption. To compare the three approaches and establish recommendations for an optimal approach and method to establish fatigue accumulation of the wellhead, a monitoring data set is obtained on a well offshore West of Shetland. This paper presents an analysis of measured strain, motions and analytical predictions with the objective of better understanding the accuracy, limitations, or conservatism in each of the three methods defined above. Of the various parameters that affect the accuracy of the fatigue damage estimates, the paper identifies that the selection of analytical conductor-soil model is critical to narrowing the gap between fatigue life predictions from the different approaches. The work presented here examines the influence of approaches to modelling conductor-soil interaction other than the traditionally used API soil model. Overall, the paper presents the monitoring equipment and analytical methodology to advance the accuracy of wellhead fatigue damage measurements.

Optimal models in the yield analysis of new flax cultivars

Canadian Journal of Plant Science ◽

10.1139/cjps-2017-0282 ◽

2018 ◽

Vol 98 (4) ◽

pp. 897-907

Author(s):

Gaofeng Jia ◽

Helen M. Booker

Keyword(s):

Mixed Models ◽

Statistical Power ◽

Mixed Model ◽

Trial Data ◽

Residual Error ◽

Yield Data ◽

Symmetry Model ◽

Compound Symmetry ◽

Future Data ◽

Variety Performance

Multi-environment trials are conducted to evaluate the performance of cultivars. In a combined analysis, the mixed model is superior to an analysis of variance for evaluating and comparing cultivars and dealing with an unbalanced data structure. This study seeks to identify the optimal models using the Saskatchewan Variety Performance Group post-registration regional trial data for flax. Yield data were collected for 15 entries in post-registration tests conducted in Saskatchewan from 2007 to 2016 (except 2011) and 16 mixed models with homogeneous or heterogeneous residual errors were compared. A compound symmetry model with heterogeneous residual error (CSR) had the best fit, with a normal distribution of residuals and a mean of zero fitted to the trial data for each year. The compound symmetry model with homogeneous residual error (CS) and a model extending the CSR to higher dimensions (DIAGR) were the next best models in most cases. Five hundred random samples from a two-stage sampling method were produced to determine the optimal models suitable for various environments. The CSR model was superior to other models for 396 out of 500 samples (79.2%). The top three models, CSR, CS, and DIAGR, had higher statistical power and could be used to assess the yield stability of the new flax cultivars. Optimal mixed models are recommended for future data analysis of new flax cultivars in regional tests.
