OpenStats: A Robust and Scalable Software Package for Reproducible Analysis of High-Throughput Phenotypic Data

2020 ◽  
Author(s):  
Hamed Haselimashhadi ◽  
Jeremy C Mason ◽  
Ann-Marie Mallon ◽  
Damian Smedley ◽  
Terrence F Meehan ◽  
...  

Reproducibility in the statistical analyses of data from high-throughput phenotyping screens requires a robust and reliable analysis foundation that allows modelling of different possible statistical scenarios. Regular challenges are scalability and extensibility of the analysis software. In this manuscript, we describe OpenStats, a freely available software package that addresses these challenges. We show the performance of the software in a high-throughput phenomic pipeline in the International Mouse Phenotyping Consortium (IMPC) and compare the agreement of the results with the most similar implementation in the literature. OpenStats offers significant improvements in speed and scalability over existing software packages, including a 13-fold improvement in computational time over the current production analysis pipeline in the IMPC. Reduced complexity also promotes FAIR data analysis by providing transparency and helping other groups reproduce and reuse the statistical methods and results. OpenStats is freely available under a Creative Commons license at www.bioconductor.org/packages/OpenStats.

PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0242933
Author(s):  
Hamed Haselimashhadi ◽  
Jeremy C. Mason ◽  
Ann-Marie Mallon ◽  
Damian Smedley ◽  
Terrence F. Meehan ◽  
...  

Reproducibility in the statistical analyses of data from high-throughput phenotyping screens requires a robust and reliable analysis foundation that allows modelling of different possible statistical scenarios. Regular challenges are scalability and extensibility of the analysis software. In this manuscript, we describe OpenStats, a freely available software package that addresses these challenges. We show the performance of the software in a high-throughput phenomic pipeline in the International Mouse Phenotyping Consortium (IMPC) and compare the agreement of the results with the most similar implementation in the literature. OpenStats offers significant improvements in speed and scalability over existing software packages, including a 13-fold improvement in computational time over the current production analysis pipeline in the IMPC. Reduced complexity also promotes FAIR data analysis by providing transparency and helping other groups reproduce and reuse the statistical methods and results. OpenStats is freely available under a Creative Commons license at www.bioconductor.org/packages/OpenStats.


2021 ◽  
Vol 22 (15) ◽  
pp. 8266
Author(s):  
Minsu Kim ◽  
Chaewon Lee ◽  
Subin Hong ◽  
Song Lim Kim ◽  
Jeong-Ho Baek ◽  
...  

Drought is a major factor limiting crop yields. Modern agricultural technologies such as irrigation systems, ground mulching, and rainwater storage can prevent drought, but these are only temporary solutions. Understanding the physiological, biochemical, and molecular reactions of plants to drought stress is therefore urgent. The recent rapid development of genomics tools has led to an increasing interest in phenomics, i.e., the study of phenotypic plant traits. Among phenomic strategies, high-throughput phenotyping (HTP) is attracting increasing attention as a way to address the bottlenecks of genomic and phenomic studies. HTP provides researchers with a non-destructive, non-invasive, yet accurate method for analyzing large-scale phenotypic data. This review describes plant responses to drought stress and introduces HTP methods that can detect changes in plant phenotypes in response to drought.


Author(s):  
Daoliang Li ◽  
Chaoqun Quan ◽  
Zhaoyang Song ◽  
Xiang Li ◽  
Guanghui Yu ◽  
...  

Food scarcity, population growth, and global climate change have propelled crop yield growth driven by high-throughput phenotyping into the era of big data. However, access to large-scale phenotypic data has now become a critical barrier that phenomics urgently must overcome. Fortunately, the high-throughput plant phenotyping platform (HT3P), employing advanced sensors and data collection systems, can take full advantage of non-destructive and high-throughput methods to monitor, quantify, and evaluate specific phenotypes for large-scale agricultural experiments, and it can effectively perform phenotypic tasks that traditional phenotyping could not do. In this way, HT3Ps are novel and powerful tools, and commercial, customized, and even self-developed platforms have recently been introduced in rising numbers. Here, we review the HT3Ps of nearly the past 7 years, from greenhouses and growth chambers to the field, and from ground-based proximal phenotyping to aerial large-scale remote sensing. Platform configurations, novelties, operating modes, current developments, as well as the strengths and weaknesses of diverse types of HT3Ps are thoroughly and clearly described. Then, miscellaneous combinations of HT3Ps for comparative validation and comprehensive analysis are systematically presented for the first time. Finally, we consider current phenotypic challenges and provide fresh perspectives on future development trends of HT3Ps. This review aims to provide ideas, thoughts, and insights for the optimal selection, exploitation, and utilization of HT3Ps, and thereby pave the way to break through current phenotyping bottlenecks in botany.


2018 ◽  
Author(s):  
Malachy Campbell ◽  
Harkamal Walia ◽  
Gota Morota

The accessibility of high-throughput phenotyping platforms in both the greenhouse and field, as well as the relatively low cost of unmanned aerial vehicles, have provided researchers with an effective means to characterize large populations throughout the growing season. These longitudinal phenotypes can provide important insight into plant development and responses to the environment. Despite the growing use of these new phenotyping approaches in plant breeding, the use of genomic prediction models for longitudinal phenotypes is limited in major crop species. The objective of this study is to demonstrate the utility of random regression (RR) models using Legendre polynomials for genomic prediction of shoot growth trajectories in rice (Oryza sativa). An estimate of shoot biomass, projected shoot area (PSA), was recorded over a period of 20 days for a panel of 357 diverse rice accessions using an image-based greenhouse phenotyping platform. An RR model that included a fixed second-order Legendre polynomial, a random second-order Legendre polynomial for the additive genetic effect, a first-order Legendre polynomial for the environmental effect, and heterogeneous residual variances was used to model PSA trajectories. The utility of the RR model over a single time point (TP) approach, where PSA is fit at each time point independently, is shown through four prediction scenarios. In the first scenario, the RR and TP approaches were used to predict PSA for a set of lines lacking phenotypic data. The RR approach showed an 11.6% increase in prediction accuracy over the TP approach. Much of this improvement could be attributed to the greater additive genetic variance captured by the RR approach. The remaining scenarios focused on forecasting future phenotypes using a subset of early time points for known lines with phenotypic data, as well as new lines lacking phenotypic data. In all cases, PSA could be predicted with high accuracy (r: 0.79 to 0.89 and 0.55 to 0.58 for known and unknown lines, respectively). This study provides the first application of RR models for genomic prediction of a longitudinal trait in rice, and demonstrates that RR models can be effectively used to improve the accuracy of genomic prediction for complex traits compared to a TP approach.
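As a rough illustration of the Legendre polynomial covariates that a random regression model of this kind relies on, the sketch below builds a second-order basis for 20 daily time points. It is not the authors' code: the function names are hypothetical, and rescaling the days to [-1, 1] is an assumption (it is the interval on which Legendre polynomials are defined). Some random regression software also applies a normalizing constant to each polynomial, which is omitted here.

```python
def legendre_row(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence:
    (n + 1) * P_{n+1}(x) = (2n + 1) * x * P_n(x) - n * P_{n-1}(x)."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def legendre_basis(days, order):
    """Build one row of Legendre covariates per time point.

    Days are rescaled linearly so the first day maps to -1 and the last to +1
    (an assumption of this sketch)."""
    lo, hi = min(days), max(days)
    rows = []
    for d in days:
        x = 2.0 * (d - lo) / (hi - lo) - 1.0
        rows.append(legendre_row(x, order))
    return rows


# 20 daily measurements, second-order polynomial: a 20 x 3 covariate matrix.
days = list(range(1, 21))
phi = legendre_basis(days, order=2)
```

Each row of `phi` holds the intercept, linear, and quadratic covariates for one day. In a random regression model, these columns would multiply the fixed and random regression coefficients that describe each trajectory.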


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Anjin Chang ◽  
Jinha Jung ◽  
Junho Yeom ◽  
Murilo M. Maeda ◽  
Juan A. Landivar ◽  
...  

Yield prediction and variety selection are critical components for assessing production and performance in breeding programs and precision agriculture. Since plants integrate their genetics, surrounding environments, and management conditions, crop phenotypes have been measured over cropping seasons to represent the traits of varieties. Unmanned aircraft systems (UAS) now provide a new opportunity to collect high-quality images and generate reliable phenotypic data efficiently. Here, we propose high-throughput phenotyping (HTP) from multitemporal UAS images for tomato yield estimation. UAS-based RGB and multispectral images were collected weekly and biweekly, respectively. The shape of the features of tomatoes such as canopy cover, canopy, volume, and vegetation indices derived from UAS imagery was estimated throughout the entire season. To extract time-series features from UAS-based phenotypic data, crop growth and growth rate curves were fitted using mathematical curves and first derivative equations. Time-series features such as the maximum growth rate, day at a specific event, and duration were extracted from the fitted curves of different phenotypes. The linear regression model produced high R² values even with different variable selection methods: all variables (0.79), forward selection (0.7), and backward selection (0.77). With factor analysis, we identified two significant factors, growth speed and timing, related to high-yield varieties. Then, five time-series phenotypes were selected for yield prediction models explaining 65 percent of the variance in the actual harvest. The phenotypic features derived from RGB images played more important roles in predicting yield. This research also demonstrates that it is possible to select lower-performing tomato varieties successfully. The results from this work may be useful in breeding programs and research farms for selecting high-yielding and disease-/pest-resistant varieties.
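The idea of extracting time-series features from a fitted growth curve and its first derivative can be sketched as follows. The abstract does not name the curve family, so the logistic curve, the parameter values, and the half-maximum definition of "duration" are all assumptions made for illustration, as are the function names.

```python
import math


def logistic(t, K, r, t0):
    """Hypothetical growth curve: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def growth_rate(t, K, r, t0):
    """First derivative of the logistic curve: r * y * (1 - y / K)."""
    y = logistic(t, K, r, t0)
    return r * y * (1.0 - y / K)


def time_series_features(days, K, r, t0):
    """Extract features from the growth-rate curve evaluated on a grid of days."""
    rates = [growth_rate(t, K, r, t0) for t in days]
    max_rate = max(rates)
    day_of_max = days[rates.index(max_rate)]
    # "Duration" is defined here (an assumption) as the span of days on which
    # the growth rate is at least half of its maximum.
    active = [t for t, g in zip(days, rates) if g >= 0.5 * max_rate]
    return {
        "max_growth_rate": max_rate,
        "day_of_max_growth": day_of_max,
        "duration": active[-1] - active[0],
    }


# Illustrative parameters only; in practice they would come from fitting the
# curve to a measured phenotype such as canopy cover.
days = list(range(0, 101))
features = time_series_features(days, K=1.0, r=0.2, t0=40.0)
```

Features like these, computed per variety and per phenotype, are the kind of inputs that could then feed a regression model for yield.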


2021 ◽  
Author(s):  
Moritz D Luerig

Digital images are a ubiquitous way to represent phenotypes. More and more ecologists and evolutionary biologists are using images to capture and analyze high dimensional phenotypic data to understand complex developmental and evolutionary processes. As a consequence, images are being collected at ever increasing rates, already outpacing our abilities for processing and analysis of the contained phenotypic information. phenopype is a high throughput phenotyping package for the programming language Python to support ecologists and evolutionary biologists in extracting high dimensional phenotypic data from digital images. phenopype integrates existing state-of-the-art computer vision functions (using the OpenCV library as a backend), GUI-based interactions, and a project management ecosystem to facilitate rapid data collection and reproducibility. phenopype offers three different workflow types that support users during different stages of scientific image analysis (prototyping, low-throughput, and high-throughput). In the high-throughput workflow, users interact with human-readable YAML configuration files to effectively modify settings for different images. These settings are stored along with processed images and results, so that the acquired phenotypic information becomes highly reproducible. phenopype combines the advantages of the Python environment, with its state-of-the-art computer vision, array manipulation and data handling libraries, and basic GUI capabilities, which allow users to step into the automatic workflow when necessary. Overall, phenopype is aiming to augment, rather than replace the utility of existing Python CV libraries, allowing biologists to focus on rapid and reproducible data collection.


2019 ◽  
Vol 36 (5) ◽  
pp. 1492-1500 ◽  
Author(s):  
Hamed Haselimashhadi ◽  
Jeremy C Mason ◽  
Violeta Munoz-Fuentes ◽  
Federico López-Gómez ◽  
Kolawole Babalola ◽  
...  

Motivation: High-throughput phenomic projects generate complex data from small treatment and large control groups that increase the power of the analyses but introduce variation over time. A method is needed to utilize a set of temporally local controls that maximizes analytic power while minimizing noise from unspecified environmental factors. Results: Here we introduce ‘soft windowing’, a methodological approach that selects a window of time that includes the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), adaptive windows were applied such that control data collected proximally to mutants were assigned the maximal weight, while data collected earlier or later had less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation was performed using a resampling approach in which we demonstrate a 10% reduction of false positives from 2.5 million analyses. We applied the method to our production analysis pipeline that establishes genotype–phenotype associations by comparing mutant versus control data. We report an increase of 30% in significant P-values, as well as linkage to 106 versus 99 disease models via phenotype overlap with the soft-windowed and non-windowed approaches, respectively, from a set of 2082 mutant mouse lines. Our method is generalizable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources. Availability and implementation: The method is freely available in the R package SmoothWin, available on CRAN http://CRAN.R-project.org/package=SmoothWin. Supplementary information: Supplementary data are available at Bioinformatics online.
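To make the weighting idea concrete, the sketch below assigns each control observation a weight that is close to 1 near the mutant collection date and decays smoothly with distance in time. This is not the SmoothWin implementation: the function name, the logistic-shaped window, and the `half_width` and `sharpness` parameters are hypothetical choices made for illustration.

```python
import math


def soft_window_weight(t, center, half_width, sharpness):
    """Smooth window weight for a control collected on day t.

    The weight is near 1 when t is close to `center` (the mutant collection
    day), equals 0.5 at a distance of `half_width` days, and decays towards 0
    further away. `sharpness` controls how abruptly the window closes."""
    z = sharpness * (abs(t - center) - half_width)
    if z > 700.0:  # guard against overflow in math.exp for very distant days
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


# Illustrative values: mutants measured on day 100, window half-width 30 days.
control_days = [10, 70, 100, 130, 190]
weights = [
    soft_window_weight(t, center=100, half_width=30, sharpness=0.2)
    for t in control_days
]
```

Weights of this kind could then enter a weighted model fit, so that temporally distant controls contribute little, rather than being dropped outright as they would be with a hard cutoff.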


2019 ◽  
Author(s):  
Hamed Haselimashhadi ◽  
Jeremy C. Mason ◽  
Violeta Munoz-Fuentes ◽  
Federico López-Gómez ◽  
Kolawole Babalola ◽  
...  

Motivation: High-throughput phenomic projects generate complex data from small treatment and large control groups that increase the power of the analyses but introduce variation over time. A method is needed to utilise a set of temporally local controls that maximises analytic power while minimising noise from unspecified environmental factors. Results: Here we introduce “soft windowing”, a methodological approach that selects a window of time that includes the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), adaptive windows were applied such that control data collected proximally to mutants were assigned the maximal weight, while data collected earlier or later had less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation was performed using a resampling approach in which we demonstrate a 10% reduction of false positives from 2.5 million analyses. We applied the method to our production analysis pipeline that establishes genotype-phenotype associations by comparing mutant versus control data. We report an increase of 30% in significant p-values, as well as linkage to 106 versus 99 disease models via phenotype overlap with the soft windowed and non-windowed approaches, respectively, from a set of 2,082 mutant mouse lines. Our method is generalisable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources. Availability and Implementation: The method is freely available in the R package SmoothWin, available on CRAN http://CRAN.R-project.org/package=SmoothWin.


2011 ◽  
Author(s):  
E. Kyzar ◽  
S. Gaikwad ◽  
M. Pham ◽  
J. Green ◽  
A. Roth ◽  
...  
