Flexible modelling of spatial variation in agricultural field trials with the R package INLA

2019 ◽  
Vol 132 (12) ◽  
pp. 3277-3293 ◽  
Author(s):  
Maria Lie Selle ◽  
Ingelin Steinsland ◽  
John M. Hickey ◽  
Gregor Gorjanc

Key message Established spatial models improve the analysis of agricultural field trials with or without genomic data and can be fitted with the open-source R package INLA. Abstract The objective of this paper was to fit different established spatial models for analysing agricultural field trials using the open-source R package INLA. Spatial variation is common in field trials, and accounting for it increases the accuracy of estimated genetic effects. However, this is still hindered by the lack of available software implementations. We compare some established spatial models and show possibilities for flexible modelling with respect to field trial design and joint modelling over multiple years and locations. We use a Bayesian framework and, for statistical inference, the integrated nested Laplace approximations (INLA) implemented in the R package INLA. The spatial models we use are the well-known independent row and column effects, separable first-order autoregressive ($$\mathrm{AR1} \otimes \mathrm{AR1}$$) models and a Gaussian random field (Matérn) model that is approximated via the stochastic partial differential equation approach. The Matérn model can accommodate flexible field trial designs and yields interpretable parameters. We test the models in a simulation study imitating a wheat breeding programme with different levels of spatial variation, with and without genome-wide markers, and with data combined over two locations, modelling spatial and genetic effects jointly. The results show comparable predictive performance for the $$\mathrm{AR1} \otimes \mathrm{AR1}$$ and the Matérn models. We also present an example of fitting the models to real wheat breeding data and to simulated tree breeding data with the Nelder wheel design to show the flexibility of the Matérn model and the R package INLA.
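To make the separable AR1⊗AR1 structure named in this abstract concrete, the following is a minimal sketch in plain Python, not the paper's R-INLA code. It assumes a rectangular field whose plots are indexed row by row, and the helper names (`ar1_corr`, `kronecker`, `plot_index`) and the values of `rho_row` and `rho_col` are hypothetical, chosen only for illustration. Under these assumptions the correlation between two plots is the product of a row-wise and a column-wise AR1 correlation.

```python
def ar1_corr(n, rho):
    """n x n AR1 correlation matrix: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Kronecker product of two dense matrices stored as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def plot_index(row, col, n_cols):
    """Position of plot (row, col) when plots are ordered row by row."""
    return row * n_cols + col


# Hypothetical field layout and autocorrelation parameters (illustration only).
n_rows, n_cols = 3, 4
rho_row, rho_col = 0.5, 0.8

# Separable correlation over all plots: (row AR1) kron (column AR1).
corr = kronecker(ar1_corr(n_rows, rho_row), ar1_corr(n_cols, rho_col))

# Correlation between plot (0, 0) and plot (2, 1) is rho_row**2 * rho_col**1.
value = corr[plot_index(0, 0, n_cols)][plot_index(2, 1, n_cols)]
print(value)
```

The Kronecker form is what makes the model "separable": the full matrix over all plots never has to be specified directly, only the two small one-dimensional AR1 matrices.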

2019 ◽  
Author(s):  
Maria Lie Selle ◽  
Ingelin Steinsland ◽  
John M. Hickey ◽  
Gregor Gorjanc

Abstract The objective of this paper was to fit different established spatial models for analysing agricultural field trials using the open-source R package INLA. Spatial variation is common in field trials, and accounting for it increases the accuracy of estimated genetic effects. However, this is still hindered by the lack of available software implementations. Here we compare some established spatial models and show possibilities for flexible modelling with respect to field trial design and joint modelling over multiple years and locations. We use a Bayesian framework and, for statistical inference, the Integrated Nested Laplace Approximations (INLA) implemented in the R package INLA. The spatial models we use are the well-known independent row and column effects, separable first-order autoregressive (AR1⊗AR1) models and a geostatistical model using the stochastic partial differential equation (SPDE) approach. The SPDE approach models a Gaussian random field, which can accommodate flexible field trial designs and yields interpretable parameters. We test the models in a simulation study imitating a wheat breeding program with different levels of spatial variation, with and without genome-wide markers, and with data combined over two locations, modelling spatial and genetic effects jointly. We evaluate predictive performance by the correlation between true and estimated breeding values, the continuous ranked probability score and how often the best individuals rank at the top. The results show the best predictive performance with the AR1⊗AR1 and SPDE models. We also present an example of fitting the models to real wheat breeding data and to simulated tree breeding data with the Nelder wheel design. Key message Established spatial models improve the analysis of agricultural field trials with or without genomic data and can be fitted with the open-source R package INLA.
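The continuous ranked probability score mentioned in this abstract has a well-known closed form when the predictive distribution is Gaussian. The sketch below illustrates that formula only. The abstract does not say how the authors computed the score, so the Gaussian assumption and the function names (`norm_pdf`, `norm_cdf`, `crps_gaussian`) are assumptions made here for illustration. Lower values indicate a sharper, better-calibrated prediction.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(mu, sigma, y):
    """CRPS of a N(mu, sigma^2) predictive distribution at observation y.

    Closed form: sigma * (z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi)),
    where z = (y - mu) / sigma.
    """
    z = (y - mu) / sigma
    return sigma * (
        z * (2.0 * norm_cdf(z) - 1.0)
        + 2.0 * norm_pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


# Hypothetical example: predicted breeding value 1.0 with standard deviation 0.5,
# compared against a (simulated) true value of 1.3.
print(crps_gaussian(1.0, 0.5, 1.3))
```

Unlike the correlation between true and estimated values, this score also penalises predictive standard deviations that are too wide or too narrow.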


tppj ◽  
2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Filipe Inácio Matias ◽  
Maria V. Caraza‐Harter ◽  
Jeffrey B. Endelman

2015 ◽  
Author(s):  
Zheng Ning ◽  
Yakov A. Tsepilov ◽  
Sodbo Zh. Sharapov ◽  
Alexander K. Grishenko ◽  
Xiao Feng ◽  
...  

Abstract The ever-growing genome-wide association studies (GWAS) have revealed widespread pleiotropy. To exploit this, various methods which consider variant association with multiple traits jointly have been developed. However, most effort has been put into improving discovery power: how to replicate and interpret these discovered pleiotropic loci using multivariate methods has yet to be discussed fully. Using only multiple publicly available single-trait GWAS summary statistics, we develop a fast and flexible multi-trait framework that contains modules for (i) multi-trait genetic discovery, (ii) replication of locus pleiotropic profile, and (iii) multi-trait conditional analysis. The procedure is able to handle any level of sample overlap. As an empirical example, we discovered and replicated 23 novel pleiotropic loci for human anthropometry and evaluated their pleiotropic effects on other traits. By applying conditional multivariate analysis on the 23 loci, we discovered and replicated two additional multi-trait associated SNPs. Our results provide empirical evidence that multi-trait analysis allows detection of additional, replicable, highly pleiotropic genetic associations without genotyping additional individuals. The methods are implemented in a free and open source R package MultiABEL. Author summary By analyzing large-scale genomic data, geneticists have revealed widespread pleiotropy, i.e. a single genetic variant can affect a wide range of complex traits. Methods have been developed to discover such genetic variants. However, we still lack insights into the relevant genetic architecture - What more can we learn from knowing the effects of these genetic variants? Here, we develop a fast and flexible statistical analysis procedure that includes discovery, replication, and interpretation of pleiotropic effects. The whole analysis pipeline only requires established genetic association study results.
We also provide the mathematical theory behind the pleiotropic genetic effects testing. Most importantly, we show how a replication study can be essential to reveal new biology rather than solely increasing sample size in current genomic studies. For instance, we show that, using our proposed replication strategy, we can detect the difference in genetic effects between studies of different geographical origins. We applied the method to the GIANT consortium anthropometric traits to discover new genetic associations, replicated in the UK Biobank, and provided important new insights into growth and obesity. Our pipeline is implemented in an open-source R package, MultiABEL, which is efficient enough for researchers to run on personal computers in minutes.
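As a rough illustration of how single-trait summary statistics can be combined into one multi-trait test, the sketch below computes a generic two-trait omnibus statistic from z-scores. This is a textbook construction and is not claimed to be the procedure implemented in MultiABEL. The function name `two_trait_omnibus` and the input `r` (the correlation between the two z-scores under the null, which in practice would reflect phenotypic correlation and sample overlap) are assumptions for illustration.

```python
import math


def two_trait_omnibus(z1, z2, r):
    """Omnibus test for one SNP across two traits.

    z1, z2: single-trait association z-scores for the SNP.
    r:      assumed correlation between z1 and z2 under the null (|r| < 1).

    The statistic is z' R^{-1} z with R = [[1, r], [r, 1]], which is
    chi-square distributed with 2 degrees of freedom under the null.
    For 2 degrees of freedom the upper-tail probability is exp(-T / 2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    p_value = math.exp(-0.5 * stat)
    return stat, p_value


# Hypothetical z-scores for one SNP on two correlated traits.
stat, p_value = two_trait_omnibus(2.0, 2.0, 0.5)
print(stat, p_value)
```

Because the statistic needs only per-trait z-scores and one correlation, it can be evaluated without access to individual-level genotypes, which is the property the abstract emphasises.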


2021 ◽  
Vol 39 ◽  
pp. 100284
Author(s):  
Joseph Molloy ◽  
Felix Becker ◽  
Basil Schmid ◽  
Kay W. Axhausen

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Charlie M. Carpenter ◽  
Daniel N. Frank ◽  
Kayla Williamson ◽  
Jaron Arbet ◽  
Brandie D. Wagner ◽  
...  

Abstract Background The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities typically are of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools) supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in analysis workflow. Conclusions tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions on standard analyses to improve the interpretability of results while maintaining object malleability to encourage open source collaboration.
The simple examples and full workflow from the package are reproducible and applicable to external data sets.


2021 ◽  
Author(s):  
Jason Hunter ◽  
Mark Thyer ◽  
Dmitri Kavetski ◽  
David McInerney

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results is often reliant on the objective function used to calibrate the hydrological model. We present an open-source R package and an online web application that achieve the following two aims. Firstly, these resources are easy to use and accessible, so that users need not have specialised knowledge in probabilistic modelling to apply them. Secondly, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions, which it is only able to do by including an innovation that resolves a long-standing issue relating to model assumptions that previously prevented this broad application. We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions.
In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study. The new probabilistic error model and the open-source software and web application aim to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions that are made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.


2019 ◽  
Author(s):  
Corey R. Lawrence ◽  
Jeffery Beem-Miller ◽  
Alison M. Hoyt ◽  
Grey Monroe ◽  
Carlos A. Sierra ◽  
...  

Abstract. Radiocarbon is a critical constraint on our estimates of the timescales of soil carbon cycling that can aid in identifying mechanisms of carbon stabilization and destabilization, and improve forecasts of soil carbon response to management or environmental change. Despite the wealth of soil radiocarbon data that has been reported over the past 75 years, the ability to apply these data to global scale questions is limited by our capacity to synthesize and compare measurements generated using a variety of methods. Here we describe the International Soil Radiocarbon Database (ISRaD, soilradiocarbon.org), an open-source archive of soils data that include data from bulk soils, or whole-soils; distinct soil carbon pools isolated in the laboratory by a variety of soil fractionation methods; samples of soil gas or water collected interstitially from within an intact soil profile; CO2 gas isolated from laboratory soil incubations; and fluxes collected in situ from a soil surface. The core of ISRaD is a relational database structured around individual datasets (entries) and organized hierarchically to report soil radiocarbon data, measured at different physical and temporal scales, as well as other soil or environmental properties that may also be measured at one or more levels of the hierarchy and that may assist with interpretation and context. Anyone may contribute their own data to the database by entering it into the ISRaD template and subjecting it to quality assurance protocols. ISRaD can be accessed through: (1) a web-based interface, (2) an R package (ISRaD), or (3) direct access to code and data through the GitHub repository, which hosts both code and data. The design of ISRaD allows for participants to become directly involved in the management, design, and application of ISRaD data.
The synthesized dataset is available in two forms: the original data as reported by the authors of the datasets; and an enhanced dataset that includes ancillary geospatial data calculated within the ISRaD framework. ISRaD also provides data management tools in the ISRaD-R package that provide a starting point for data analysis. This community-based dataset and platform brings together soil radiocarbon and a wide array of additional soils data; data are easy to contribute, and the community is invited to add tools and ideas for improvement. As a whole, ISRaD provides resources that can aid our evaluation of soil dynamics and improve our understanding of controls on soil carbon dynamics across a range of spatial and temporal scales. The ISRaD v1.0 dataset (Lawrence et al., 2019) is archived and freely available at https://doi.org/10.5281/zenodo.2613911.


2017 ◽  
Author(s):  
J.A. Grogan ◽  
A.J. Connor ◽  
B. Markelc ◽  
R.J. Muschel ◽  
P.K. Maini ◽  
...  

Abstract Spatial models of vascularized tissues are widely used in computational physiology to study, for example, tumour growth, angiogenesis, osteogenesis, coronary perfusion and oxygen delivery. Composition of such models is time-consuming, with many researchers writing custom software for this purpose. Recent advances in imaging have produced detailed three-dimensional (3D) datasets of vascularized tissues at the scale of individual cells. To fully exploit such data there is an increasing need for software that allows user-friendly composition of efficient, 3D models of vascularized tissue growth, and comparison of predictions with in vivo or in vitro experiments and other models. Microvessel Chaste is a new open-source library for building spatial models of vascularized tissue growth. It can be used to simulate vessel growth and adaptation in response to mechanical and chemical stimuli, intra- and extra-vascular transport of nutrient, growth factor and drugs, and cell proliferation in complex 3D geometries. The library provides a comprehensive Python interface to solvers implemented in C++, allowing user-friendly model composition, and integration with experimental data. Such integration is facilitated by interoperability with a growing collection of scientific Python software for image processing, statistical analysis, model annotation and visualization. The library is available under an open-source Berkeley Software Distribution (BSD) licence at https://jmsgrogan.github.io/MicrovesselChaste. This article links to two reproducible example problems, showing how the library can be used to model tumour growth and angiogenesis with realistic vessel networks.


2020 ◽  
Author(s):  
Rodrigo Gazaffi ◽  
Rodrigo R. Amadeu ◽  
Marcelo Mollinari ◽  
João R. B. F. Rosa ◽  
Cristiane H. Taniguti ◽  
...  

Abstract Accurate QTL mapping in outcrossing species requires software programs which consider genetic features of these populations, such as markers with different segregation patterns and different levels of information. Although the mapping procedures available to date allow inferring QTL position and effects, they are mostly not based on multilocus genetic maps. Basing a QTL analysis on such maps is crucial since they allow informative markers to propagate their information to less informative intervals of the map. We developed fullsibQTL, a novel and freely available R package to perform composite interval QTL mapping considering outcrossing populations and markers with different segregation patterns. It allows estimation of QTL position, effects, segregation patterns, and linkage phase with flanking markers. Additionally, several statistical and graphical tools are implemented for straightforward analysis and interpretation. fullsibQTL is an open source R package with C and R source code (GPLv3). It is multiplatform and can be installed from https://github.com/augusto-garcia/fullsibQTL.

