Making WAVES in Breedbase: An Integrated Spectral Data Storage and Analysis Pipeline for Plant Breeding Programs

Abstract: Visible and near-infrared (vis-NIRS) spectroscopy is a promising tool for increasing phenotyping throughput in plant breeding programs, but existing analysis software packages are not optimized for a breeding context. Additionally, commercial software options often exceed the budgets of breeding and research programs. To that end, we developed an open-source R package, waves, for the streamlined analysis of spectral data with several cross-validation schemes to assess prediction accuracy. Waves is compatible with a wide range of spectrometer models and performs visualization, filtering, aggregation, cross-validation set formation, model training, and prediction functions for the association of vis-NIRS spectra with reference measurements. Furthermore, we have integrated this package into the Breedbase family of open-source databases, expanding the analysis capabilities of this growing digital ecosystem to a number of crop species. Taken together, the standalone and Breedbase versions of waves enhance the accessibility of tools for the analysis of spectral data during the plant breeding process.

Core ideas: waves is an open-source R package for spectral data analysis in plant breeding; breeding-relevant cross-validation schemes evaluate the predictive accuracy of models; Breedbase, an open-source database, is extended to support spectral data storage; a graphical user interface was developed for implementing waves in Breedbase.


Making waves in Breedbase: An integrated spectral data storage and analysis pipeline for plant breeding programs

tppj ◽

10.1002/ppj2.20012 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Jenna Hershberger ◽

Nicolas Morales ◽

Christiano C. Simoes ◽

Bryan Ellerbrock ◽

Guillaume Bauchet ◽

...

Keyword(s):

Plant Breeding ◽

Spectral Data ◽

Data Storage ◽

Analysis Pipeline ◽

Breeding Programs


An open-source R-package and web application for high-quality probabilistic predictions in hydrology

10.5194/egusphere-egu21-8549 ◽

2021 ◽

Author(s):

Jason Hunter ◽

Mark Thyer ◽

Dmitri Kavetski ◽

David McInerney

Keyword(s):

Open Source ◽

Web Application ◽

R Package ◽

Error Model ◽

Objective Functions ◽

High Quality ◽

Wide Range ◽

Probabilistic Error

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results often depends on the objective function used to calibrate the hydrological model.

We present an open-source R-package and an online web application that achieve the following two aims. Firstly, these resources are easy to use and accessible, so that users need not have specialised knowledge in probabilistic modelling to apply them. Secondly, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions. It is able to do so only because it includes an innovation that resolves a long-standing issue relating to model assumptions, which previously prevented this broad application.

We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study.

The new probabilistic error model, the open-source software and the web application aim to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions that are made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.


MoBPS - Modular Breeding Program Simulator

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401193 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1915-1918 ◽

Cited By ~ 6

Author(s):

Torsten Pook ◽

Martin Schlather ◽

Henner Simianer

Keyword(s):

Data Storage ◽

Large Scale ◽

R Package ◽

Single Step ◽

Computationally Efficient ◽

Breeding Programs ◽

Genetic Impact ◽

Founder Populations ◽

Genetic Contributions ◽

Impact Simulations

The R-package MoBPS provides a computationally efficient and flexible framework to simulate complex breeding programs and compare their economic and genetic impact. Simulations are performed at the level of individuals. MoBPS uses a highly efficient implementation with bit-wise data storage and matrix multiplications from the associated R-package miraculix, allowing it to handle large-scale populations. Individual haplotypes are not stored but are instead derived automatically from points of recombination and mutations. The modular structure of MoBPS allows users to combine rather coarse simulations, as needed to generate founder populations, with very detailed modeling of today's complex breeding programs, making use of all available biotechnologies. MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP, but also allows the user to replace certain steps with personalized and/or self-written solutions.


blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

10.1101/357798 ◽

2018 ◽

Cited By ~ 3

Author(s):

Roozbeh Valavi ◽

Jane Elith ◽

José J. Lahoz-Monfort ◽

Gurutzeta Guillera-Arroita

Keyword(s):

Species Distribution ◽

Cross Validation ◽

Species Distribution Models ◽

Predictive Performance ◽

R Package ◽

Species Distribution Modelling ◽

List Type ◽

Distribution Models ◽

Distribution Modelling ◽

Evaluation Approaches

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
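The idea of spatially separated folds can be illustrated with a minimal Python sketch. This is not the blockCV API or its algorithm: the function name `spatial_block_folds`, the square-grid blocking, and the random assignment of blocks to folds are all assumptions made for illustration. The point it shows is that every record in a given spatial block lands in the same fold, so nearby (autocorrelated) points are never split between training and test sets.

```python
import math
import random


def spatial_block_folds(xs, ys, block_size, k, seed=0):
    """Assign each point to one of k folds, keeping whole square blocks together.

    Hypothetical helper for illustration only; not part of blockCV.
    """
    x0, y0 = min(xs), min(ys)
    # Grid cell (column, row) that each point falls in.
    block_ids = [
        (math.floor((x - x0) / block_size), math.floor((y - y0) / block_size))
        for x, y in zip(xs, ys)
    ]
    # Shuffle the occupied blocks, then deal them out to folds in turn.
    blocks = sorted(set(block_ids))
    random.Random(seed).shuffle(blocks)
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[block] for block in block_ids]


# Demo on synthetic coordinates (made-up data, not from the paper).
rng = random.Random(1)
xs = [rng.uniform(0, 100) for _ in range(200)]
ys = [rng.uniform(0, 100) for _ in range(200)]
folds = spatial_block_folds(xs, ys, block_size=25.0, k=4)

for fold in range(4):
    test_idx = [i for i, f in enumerate(folds) if f == fold]
    train_idx = [i for i, f in enumerate(folds) if f != fold]
    print(f"fold {fold}: {len(train_idx)} training points, {len(test_idx)} test points")
```

In practice the block size would be chosen with reference to the spatial autocorrelation range of the covariates, which is the kind of diagnostic the abstract says the package provides.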


Motif: an open-source R tool for pattern-based spatial analysis

10.32942/osf.io/kj7fu ◽

2020 ◽

Author(s):

Jakub Nowosad

Keyword(s):

Spatial Analysis ◽

Land Cover ◽

Open Source ◽

Spatial Patterns ◽

Forest Cover ◽

R Package ◽

Growth Monitoring ◽

Forest Cover Change ◽

Land Cover Data ◽

Wide Range

*Context* Pattern-based spatial analysis provides methods to describe and quantitatively compare spatial patterns for categorical raster datasets. It allows for spatial search, change detection, and clustering of areas with similar patterns. *Objectives* We developed an R package **motif** as a set of open-source tools for pattern-based spatial analysis. *Methods* This package provides most of the functionality of existing software (except spatial segmentation), but also extends the existing ideas through support for multi-layer raster datasets. It accepts larger-than-RAM datasets and works across all of the major operating systems. *Results* In this study, we describe the software design of the tool, its capabilities, and present four case studies. They include calculation of spatial signatures based on land cover data for regular and irregular areas, search for regions with similar patterns of geomorphons, detection of changes in land cover patterns, and clustering of areas with similar spatial patterns of land cover and landforms. *Conclusions* The methods implemented in **motif** should be useful in a wide range of applications, including land management, sustainable development, environmental protection, forest cover change and urban growth monitoring, and agriculture expansion studies. The **motif** package homepage is https://nowosad.github.io/motif.


AlphaSimR: an R package for breeding program simulations

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa017 ◽

2020 ◽

Vol 11 (2) ◽

Author(s):

R Chris Gaynor ◽

Gregor Gorjanc ◽

John M Hickey

Keyword(s):

Software Package ◽

R Package ◽

Animal Breeding ◽

Breeding Program ◽

Stochastic Simulations ◽

Breeding Programs ◽

Detailed Design ◽

Building Simulations ◽

Wide Range ◽

Commercial Breeding

Abstract This paper introduces AlphaSimR, an R package for stochastic simulations of plant and animal breeding programs. AlphaSimR is a highly flexible software package able to simulate a wide range of plant and animal breeding programs for diploid and autopolyploid species. AlphaSimR is ideal for testing the overall strategy and detailed design of breeding programs. AlphaSimR utilizes a scripting approach to building simulations that is particularly well suited for modeling highly complex breeding programs, such as commercial breeding programs. The primary benefit of this scripting approach is that it frees users from preset breeding program designs and allows them to model nearly any breeding program design. This paper lists the main features of AlphaSimR and provides a brief example simulation to show how to use the software.


Beyond power: Multivariate discovery, replication, and interpretation of pleiotropic loci using summary association statistics

10.1101/022269 ◽

2015 ◽

Cited By ~ 5

Author(s):

Zheng Ning ◽

Yakov A. Tsepilov ◽

Sodbo Zh. Sharapov ◽

Alexander K. Grishenko ◽

Xiao Feng ◽

...

Keyword(s):

Open Source ◽

Genetic Variants ◽

Association Studies ◽

R Package ◽

Genetic Effects ◽

Pleiotropic Effects ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Study Results ◽

Wide Range

Abstract: The ever-growing number of genome-wide association studies (GWAS) has revealed widespread pleiotropy. To exploit this, various methods that consider variant association with multiple traits jointly have been developed. However, most effort has been put into improving discovery power: how to replicate and interpret these discovered pleiotropic loci using multivariate methods has yet to be discussed fully. Using only multiple publicly available single-trait GWAS summary statistics, we develop a fast and flexible multi-trait framework that contains modules for (i) multi-trait genetic discovery, (ii) replication of locus pleiotropic profile, and (iii) multi-trait conditional analysis. The procedure is able to handle any level of sample overlap. As an empirical example, we discovered and replicated 23 novel pleiotropic loci for human anthropometry and evaluated their pleiotropic effects on other traits. By applying conditional multivariate analysis on the 23 loci, we discovered and replicated two additional multi-trait associated SNPs. Our results provide empirical evidence that multi-trait analysis allows detection of additional, replicable, highly pleiotropic genetic associations without genotyping additional individuals. The methods are implemented in a free and open-source R package, MultiABEL.

Author summary: By analyzing large-scale genomic data, geneticists have revealed widespread pleiotropy, i.e. a single genetic variant can affect a wide range of complex traits. Methods have been developed to discover such genetic variants. However, we still lack insights into the relevant genetic architecture: what more can we learn from knowing the effects of these genetic variants? Here, we develop a fast and flexible statistical analysis procedure that includes discovery, replication, and interpretation of pleiotropic effects. The whole analysis pipeline requires only established genetic association study results. We also provide the mathematical theory behind the pleiotropic genetic effects testing. Most importantly, we show how a replication study can be essential to reveal new biology rather than solely increasing sample size in current genomic studies. For instance, we show that, using our proposed replication strategy, we can detect the difference in genetic effects between studies of different geographical origins. We applied the method to the GIANT consortium anthropometric traits to discover new genetic associations, replicated them in the UK Biobank, and provided important new insights into growth and obesity. Our pipeline is implemented in an open-source R package, MultiABEL, which is efficient enough for researchers to run it on personal computers in minutes.


Understanding photothermal interactions will help expand production range and increase genetic diversity of lentil (Lens culinaris Medik.)

10.1101/2020.07.18.207761 ◽

2020 ◽

Cited By ~ 1

Author(s):

Derek M. Wright ◽

Sandesh Neupane ◽

Taryn Heidecker ◽

Teketel A. Haile ◽

Clarice J. Coyne ◽

...

Keyword(s):

Genetic Diversity ◽

Lens Culinaris ◽

Photoperiod Sensitivity ◽

Future Climate Change ◽

Climate Change Scenarios ◽

List Type ◽

Exotic Germplasm ◽

Breeding Programs ◽

Wide Range ◽

Production Areas

Summary: Lentil (Lens culinaris Medik.) is cultivated under a wide range of environmental conditions, which led to diverse phenological adaptations and resulted in a decrease in genetic variability within breeding programs due to reluctance in using genotypes from other environments. We phenotyped 324 genotypes across nine locations over three years to assess their phenological response to the environment of major lentil production regions and to predict days from sowing to flowering (DTF) using a photothermal model. DTF was highly influenced by the environment and is sufficient to explain adaptation. We were able to predict DTF reliably in most environments using a simple photothermal model; however, in certain site-years, results suggest there may be additional environmental factors at play. Hierarchical clustering of principal components revealed the presence of eight groups based on the responses of DTF to contrasting environments. These groups are associated with the coefficients of the photothermal model and revealed differences in temperature and photoperiod sensitivity. Expanding genetic diversity is critical to the success of a breeding program; understanding adaptation will facilitate the use of exotic germplasm. Future climate change scenarios will result in increased temperatures and/or shifts in production areas, and we can use the photothermal model to identify genotypes most likely to succeed in these new environments.
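As a rough illustration of how a photothermal model can turn environment into a DTF prediction, the Python sketch below assumes the commonly used linear-rate form, in which the rate of progress to flowering, 1/DTF, is a linear function of mean temperature and mean photoperiod. The abstract does not state the model's equation, so this form, the function name `predict_dtf`, and the coefficient values are all assumptions chosen for illustration, not results from the study.

```python
def predict_dtf(mean_temp_c, mean_photoperiod_h, a, b, c):
    """Predict days from sowing to flowering (DTF).

    Assumed model form (not taken from the abstract):
        1 / DTF = a + b * T + c * P
    where T is mean temperature (deg C) and P is mean photoperiod (hours).
    """
    rate = a + b * mean_temp_c + c * mean_photoperiod_h
    if rate <= 0:
        # A non-positive rate means the genotype is not predicted to flower
        # under these conditions within this simple model.
        raise ValueError("predicted rate of progress to flowering is not positive")
    return 1.0 / rate


# Hypothetical coefficients for one genotype (illustrative values only).
coef = {"a": -0.02, "b": 0.001, "c": 0.002}

# Hypothetical environments: (label, mean temperature, mean photoperiod).
environments = [
    ("cool, long-day site", 12.0, 16.0),
    ("warm, short-day site", 24.0, 11.0),
]
for label, temp, photoperiod in environments:
    print(label, round(predict_dtf(temp, photoperiod, **coef), 1))
```

Under this assumed form, b and c play the role of the temperature and photoperiod sensitivities that the abstract associates with the genotype groups.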


The Chinese Ideophone Database (CHIDEOD)

Cahiers de linguistique - Asie orientale ◽

10.1163/19606028-bja10006 ◽

2020 ◽

Vol 49 (2) ◽

pp. 136-167

Author(s):

Thomas VAN HOEY ◽

Arthur Lewis THOMPSON

Keyword(s):

Open Source ◽

R Package ◽

Future Research ◽

Data Repository ◽

Old Chinese ◽

Wide Range

Abstract This article introduces the Chinese Ideophone Database (CHIDEOD), an open-source dataset, which collects 4948 unique onomatopoeia and ideophones (mimetics, expressives) of Mandarin, as well as Middle Chinese and Old Chinese. These are analyzed according to a wide range of variables, e.g., description, frequency. Apart from an overview of these variables, we provide a tutorial that shows how the database can be accessed in different formats (.rds, .xlsx, .csv, R package and online app interface), and how the database can be used to explore skewed tonal distribution across Mandarin ideophones. Since CHIDEOD is a data repository, potential future research applications are discussed.


Motif: an open-source R tool for pattern-based spatial analysis

Landscape Ecology ◽

10.1007/s10980-020-01135-0 ◽

2020 ◽

Author(s):

Jakub Nowosad

Keyword(s):

Spatial Analysis ◽

Land Cover ◽

Open Source ◽

Spatial Patterns ◽

Forest Cover ◽

R Package ◽

Growth Monitoring ◽

Forest Cover Change ◽

Land Cover Data ◽

Wide Range

Abstract Context Pattern-based spatial analysis provides methods to describe and quantitatively compare spatial patterns for categorical raster datasets. It allows for spatial search, change detection, and clustering of areas with similar patterns. Objectives We developed an R package motif as a set of open-source tools for pattern-based spatial analysis. Methods This package provides most of the functionality of existing software (except spatial segmentation), but also extends the existing ideas through support for multi-layer raster datasets. It accepts larger-than-RAM datasets and works across all of the major operating systems. Results In this study, we describe the software design of the tool, its capabilities, and present four case studies. They include calculation of spatial signatures based on land cover data for regular and irregular areas, search for regions with similar patterns of geomorphons, detection of changes in land cover patterns, and clustering of areas with similar spatial patterns of land cover and landforms. Conclusions The methods implemented in motif should be useful in a wide range of applications, including land management, sustainable development, environmental protection, forest cover change and urban growth monitoring, and agriculture expansion studies. The motif package homepage is https://nowosad.github.io/motif.
