Soft windowing application to improve analysis of high-throughput phenotyping data

2019, Vol 36 (5), pp. 1492-1500
Author(s): Hamed Haselimashhadi, Jeremy C Mason, Violeta Munoz-Fuentes, Federico López-Gómez, Kolawole Babalola, ...

Abstract
Motivation: High-throughput phenomic projects generate complex data from small treatment groups and large control groups. The large control groups increase the power of the analyses but introduce variation over time. A method is needed to utilize a set of temporally local controls that maximizes analytic power while minimizing noise from unspecified environmental factors.
Results: Here we introduce ‘soft windowing’, a methodological approach that selects a window of time containing the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), we applied adaptive windows so that control data collected close in time to the mutants received the maximal weight, while data collected earlier or later received less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation using a resampling approach demonstrated a 10% reduction in false positives across 2.5 million analyses. We also applied the method to our production analysis pipeline, which establishes genotype–phenotype associations by comparing mutant versus control data. From a set of 2082 mutant mouse lines, we report a 30% increase in significant P-values, as well as linkage to 106 versus 99 disease models via phenotype overlap with the soft-windowed and non-windowed approaches, respectively. Our method is generalizable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources.
Availability and implementation: The method is freely available in the R package SmoothWin on CRAN: http://CRAN.R-project.org/package=SmoothWin.
Supplementary information: Supplementary data are available at Bioinformatics online.
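The weighting idea in this abstract can be illustrated with a short sketch. It assumes a window built from two logistic curves, one rising before the first mutant measurement and one falling after the last, so that controls collected during the mutant period get weights near 1 and temporally distant controls decay towards 0. The function names and the `bandwidth`/`sharpness` parameters are hypothetical; SmoothWin's actual weighting function and how it selects its window parameters may differ.

```python
import math


def _logistic(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_window_weights(control_days, mutant_days, bandwidth=30.0, sharpness=0.2):
    """Hypothetical soft window: weight each control by its proximity in time
    to the period during which mutants were measured (days as numbers)."""
    start, end = min(mutant_days), max(mutant_days)
    weights = []
    for t in control_days:
        rise = _logistic(sharpness * (t - (start - bandwidth)))  # ~1 after start - bandwidth
        fall = _logistic(-sharpness * (t - (end + bandwidth)))   # ~1 before end + bandwidth
        weights.append(rise * fall)
    return weights


def weighted_mean(values, weights):
    """Weighted average of control measurements, e.g. a soft-windowed baseline."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    mutant_days = [100, 105, 110]
    control_days = [0, 60, 105, 150, 300]
    for day, w in zip(control_days, soft_window_weights(control_days, mutant_days)):
        print(f"day {day:>3}: weight {w:.3f}")
```

Using smooth logistic edges rather than a hard cut-off means no control is discarded outright; distant controls simply contribute less to any weighted estimate.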



Author(s): Xiuwen Zheng, J Wade Davis

Abstract
Summary: Phenome-wide association studies (PheWASs) are known to be a powerful tool for discovering and replicating genetic associations. To reduce the computational burden of PheWAS in large cohorts such as the UK Biobank, the SAIGE method was proposed to control for case–control imbalance and sample relatedness in a tractable manner. However, SAIGE is still computationally intensive when used to analyze the associations of thousands of ICD10-coded phenotypes with whole-genome imputed genotype data. Here, we present SAIGEgds, a new high-performance statistical R package for large-scale PheWAS using generalized linear mixed models. The package implements the SAIGE method in optimized C++ code, taking advantage of sparse genotype dosages and integrating the efficient genomic data structure file format. Benchmarks using the UK Biobank White British genotype data (N ≈ 430 K) with coronary heart disease and simulated cases show that SAIGEgds is 5–6 times faster than the SAIGE R package. When used in conjunction with high-performance computing clusters, SAIGEgds provides an efficient analysis pipeline for biobank-scale PheWAS.
Availability and implementation: https://bioconductor.org/packages/SAIGEgds; vignettes included.
Supplementary information: Supplementary data are available at Bioinformatics online.
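To illustrate why sparse genotype dosages help, the sketch below stores only the non-zero dosages of a variant and computes the numerator of a score statistic: the sum over individuals of dosage times the residual (observed phenotype minus fitted mean) from the null model. This is a simplified, hypothetical illustration in plain Python, not the SAIGEgds C++ implementation; the variance of the score and the saddlepoint-approximation p-value that SAIGE uses are omitted.

```python
def to_sparse(dosages):
    """Keep only non-zero dosages as {sample_index: dosage}."""
    return {i: d for i, d in enumerate(dosages) if d != 0.0}


def score_numerator_dense(dosages, residuals):
    """Sum of g_i * r_i over all samples."""
    return sum(g * r for g, r in zip(dosages, residuals))


def score_numerator_sparse(sparse_dosages, residuals):
    """Same sum, but visiting only samples that carry the allele."""
    return sum(g * residuals[i] for i, g in sparse_dosages.items())


if __name__ == "__main__":
    dosages = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0]        # a rare variant: mostly zeros
    residuals = [0.5, -0.2, 0.3, 0.1, -0.4, 0.9]   # y minus fitted mean under the null model
    sparse = to_sparse(dosages)
    print(score_numerator_dense(dosages, residuals))
    print(score_numerator_sparse(sparse, residuals))
```

For rare variants most dosages are zero, so the sparse loop touches only a handful of samples per variant while giving the same result as the dense sum.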


2021, Vol 22 (15), pp. 8266
Author(s): Minsu Kim, Chaewon Lee, Subin Hong, Song Lim Kim, Jeong-Ho Baek, ...

Drought is a major factor limiting crop yields. Modern agricultural technologies such as irrigation systems, ground mulching, and rainwater storage can mitigate the effects of drought, but these are only temporary solutions. Understanding the physiological, biochemical, and molecular responses of plants to drought stress is therefore urgent. The recent rapid development of genomics tools has led to increasing interest in phenomics, i.e., the study of plant phenotypic traits. Among phenomic strategies, high-throughput phenotyping (HTP) is attracting increasing attention as a way to address the bottlenecks of genomic and phenomic studies. HTP offers researchers an accurate, non-destructive, and non-invasive method for analyzing large-scale phenotypic data. This review describes plant responses to drought stress and introduces HTP methods that can detect changes in plant phenotypes in response to drought.


2020
Author(s): Hamed Haselimashhadi, Jeremy C Mason, Ann-Marie Mallon, Damian Smedley, Terrence F Meehan, ...

Abstract
Reproducibility in the statistical analysis of data from high-throughput phenotyping screens requires a robust and reliable analysis foundation that allows different possible statistical scenarios to be modelled. Recurring challenges are the scalability and extensibility of the analysis software. In this manuscript, we describe OpenStats, a freely available software package that addresses these challenges. We show the performance of the software in a high-throughput phenomic pipeline at the International Mouse Phenotyping Consortium (IMPC) and compare the agreement of its results with those of the most similar implementation in the literature. OpenStats offers significant improvements in speed and scalability over existing software packages, including a 13-fold improvement in computation time compared with the current production analysis pipeline of the IMPC. Its reduced complexity also promotes FAIR data analysis by providing transparency and helping other groups reproduce and reuse the statistical methods and results. OpenStats is freely available under a Creative Commons license at www.bioconductor.org/packages/OpenStats.


2020, Vol 36 (12), pp. 3632-3636
Author(s): Weibo Zheng, Jing Chen, Thomas G Doak, Weibo Song, Ying Yan

Abstract
Motivation: Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specifically designed to detect DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs from high-throughput sequencing data. ADFinder can predict PDEs at relatively low sequencing coverage, detect multiple alternative splicing forms at the same genomic location, and calculate the frequency of each splicing event. This software will facilitate research on PDEs and all downstream analyses.
Results: By analyzing genome-wide DNA splicing events in two micronuclear genomes, those of Oxytricha trifallax and Tetrahymena thermophila, we show that ADFinder is effective in predicting large-scale PDEs.
Availability and implementation: The source code and manual of ADFinder are available on our GitHub website: https://github.com/weibozheng/ADFinder.
Supplementary information: Supplementary data are available at Bioinformatics online.
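As a hedged illustration of calculating a frequency for each splicing event, the sketch below assumes that every read spanning a locus has already been classified either as a split read supporting a particular deletion junction `(start, end)` or as a read that retains the sequence. The frequency of each alternative form is then its share of all informative reads. The input format, the function name, and this frequency definition are assumptions for illustration; ADFinder's actual read processing and formulas may differ.

```python
from collections import Counter


def deletion_frequencies(junctions, retained_reads):
    """Estimate the frequency of each alternative deletion form at one locus.

    junctions: list of (start, end) tuples, one per split read supporting a deletion.
    retained_reads: number of reads that span the locus without the deletion.
    """
    counts = Counter(junctions)  # reads per distinct alternative form
    total = sum(counts.values()) + retained_reads
    if total == 0:
        return {}
    return {junction: n / total for junction, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads at one locus: two alternative deletion forms plus one retained read.
    reads = [(100, 150), (100, 150), (100, 160)]
    print(deletion_frequencies(reads, retained_reads=1))
    # {(100, 150): 0.5, (100, 160): 0.25}
```

Keying the counts by exact junction coordinates is what lets several alternative forms at the same location be reported separately.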


2019, Vol 35 (20), pp. 3898-3905
Author(s): Ziyi Li, Zhijin Wu, Peng Jin, Hao Wu

Abstract
Motivation: Samples from clinical practice are often mixtures of different cell types, so the high-throughput data obtained from these samples are mixed signals. The cell mixture complicates data analysis and will lead to biased results if not properly accounted for.
Results: We develop a method to model high-throughput data from mixed, heterogeneous samples and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type-specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing methods that serve similar purposes.
Availability and implementation: The proposed method is implemented as an R package and is freely available on GitHub (https://github.com/ziyili20/TOAST).
Supplementary information: Supplementary data are available at Bioinformatics online.
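A simple way to see how mixed signals can be modelled is to treat each sample's measurement as a proportion-weighted sum of cell-type-specific means and to solve for those means by least squares. The sketch below does this for two cell types with known proportions, using the 2×2 normal equations. It is an assumed, simplified linear mixing model written for illustration; it is not the TOAST package's interface, and TOAST's inference for cell-type-specific changes goes beyond this point estimate.

```python
def cell_type_means(proportions, observed):
    """Least-squares estimate of (mu1, mu2) in y_i ≈ p_i1 * mu1 + p_i2 * mu2.

    proportions: list of (p1, p2) cell-type proportions per sample.
    observed: list of measured signals y_i for one feature.
    """
    # Entries of W^T W and W^T y for the two-column design matrix W.
    a = sum(p1 * p1 for p1, _ in proportions)
    b = sum(p1 * p2 for p1, p2 in proportions)
    d = sum(p2 * p2 for _, p2 in proportions)
    u = sum(p1 * y for (p1, _), y in zip(proportions, observed))
    v = sum(p2 * y for (_, p2), y in zip(proportions, observed))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("proportions do not vary enough to separate cell types")
    # Solve the 2x2 normal equations with Cramer's rule.
    return (d * u - b * v) / det, (a * v - b * u) / det


if __name__ == "__main__":
    props = [(0.8, 0.2), (0.5, 0.5), (0.3, 0.7)]
    true_mu = (10.0, 2.0)
    y = [p1 * true_mu[0] + p2 * true_mu[1] for p1, p2 in props]  # noiseless mixtures
    print(cell_type_means(props, y))  # approximately (10.0, 2.0)
```

Detecting a cell-type-specific change would then amount to comparing such per-cell-type estimates between conditions, which requires variance estimates not shown here.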


Author(s): Zachary B Abrams, Dwayne G Tally, Lynne V Abruzzo, Kevin R Coombes

Abstract
Summary: Cytogenetics data, or karyotypes, are among the most common forms of genetic data used clinically. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses because of limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated by CytoGPS.org and converts them into R objects. This conversion facilitates the analysis and visualization of karyotype data. In effect, this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology.
Availability and implementation: Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS.
Supplementary information: There are no supplementary data.
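The conversion step that RCytoGPS performs, from JSON files to analysis-ready objects, can be pictured as turning a list of records into the columns of a table. The Python sketch below shows that general pattern with the standard `json` module. The function name and the field names in the example (`karyotype`, `status`) are hypothetical and do not reflect the actual CytoGPS.org output schema.

```python
import json


def records_to_columns(json_text):
    """Convert a JSON array of flat objects into a dict of equal-length columns.

    Keys missing from a record are filled with None, so every column has one
    entry per record, much like a data-frame column.
    """
    records = json.loads(json_text)
    columns = {}
    for record in records:          # collect every key, in first-seen order
        for key in record:
            columns.setdefault(key, [])
    for record in records:          # fill each column row by row
        for key, values in columns.items():
            values.append(record.get(key))
    return columns


if __name__ == "__main__":
    # Hypothetical JSON; real CytoGPS.org output has its own structure.
    text = '[{"karyotype": "46,XY", "status": "ok"}, {"karyotype": "47,XX,+8"}]'
    print(records_to_columns(text))
    # {'karyotype': ['46,XY', '47,XX,+8'], 'status': ['ok', None]}
```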


Author(s): Daoliang Li, Chaoqun Quan, Zhaoyang Song, Xiang Li, Guanghui Yu, ...

Food scarcity, population growth, and global climate change have pushed efforts to raise crop yields through high-throughput phenotyping into the era of big data. However, access to large-scale phenotypic data has now become a critical barrier that phenomics must urgently overcome. Fortunately, high-throughput plant phenotyping platforms (HT3Ps), which employ advanced sensors and data collection systems, can take full advantage of non-destructive, high-throughput methods to monitor, quantify, and evaluate specific phenotypes in large-scale agricultural experiments, and can effectively perform phenotyping tasks that traditional phenotyping could not. HT3Ps are therefore novel and powerful tools, and a rising number of commercial, customized, and even self-developed platforms have recently been introduced. Here, we review HT3Ps from the past nearly 7 years, from greenhouses and growth chambers to the field, and from ground-based proximal phenotyping to aerial large-scale remote sensing. Platform configurations, novelties, operating modes, current developments, as well as the strengths and weaknesses of diverse types of HT3Ps are thoroughly and clearly described. Then, for the first time, various combinations of HT3Ps for comparative validation and comprehensive analysis are systematically presented. Finally, we consider current phenotypic challenges and provide fresh perspectives on future development trends of HT3Ps. This review aims to provide ideas, thoughts, and insights for the optimal selection, exploitation, and utilization of HT3Ps, and thereby pave the way to break through current phenotyping bottlenecks in botany.


2020
Author(s): Inoussa Sanane, Judith Legrand, Christine Dillmann, Frederic Marion-Poll

Lepidopteran pests cause considerable damage to crops worldwide. Because larvae are directly responsible for this damage, many research efforts are devoted to finding plant cultivars that are resistant to them. However, such studies are time-consuming, labour-intensive, and costly, especially when one wants not only to find resistance traits but also to evaluate their heritability. We present here a high-throughput approach to screening plants for resistance, or chemicals for their deterrence, using a leaf-disk consumption assay that is both suitable for large-scale tests and economically affordable. To monitor larvae feeding on leaf disks placed over a layer of agar, we designed 3D models of 50-cage arrays. A single webcam can simultaneously image 3 such arrays at a rate of 1 image/min and follow the individual feeding activities and movements of 150 larvae, one per cage. The resulting image stacks are first processed with a custom program running under an open-source image analysis package (Icy) to measure the surface area of each leaf disk over time. We further developed statistical procedures in R to analyze the time course of the larvae's feeding activities and to compare them between treatments. As a test case, we compared how European corn borer larvae respond to quinine, considered a bitter alkaloid for many organisms, and to Neemazal, which contains azadirachtin, a common antifeedant against pest insects. We found that increasing doses of azadirachtin reduce and delay feeding. However, contrary to our expectations, quinine was poorly effective over the range of concentrations tested. The 3D-printed models of the cage and the camera holder, the plugins running under Icy, and the R procedures are freely available and can be modified according to users' particular needs.
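The time-course analysis described above starts from leaf-disk areas measured once per minute. As a hypothetical sketch (the authors' own procedures are written in R and are not reproduced here), the code below derives the area consumed over time and a feeding latency, defined here, as an assumption, as the first minute at which consumption reaches a chosen fraction of the initial disk area.

```python
def feeding_summary(areas, threshold_fraction=0.05):
    """Summarize one leaf disk's area time series (one value per minute).

    Returns (total area consumed, latency in minutes), where latency is the
    first minute at which consumption reaches threshold_fraction of the
    initial area, or None if it never does.
    """
    if not areas:
        raise ValueError("empty time series")
    initial = areas[0]
    consumed = [initial - a for a in areas]  # cumulative area eaten at each minute
    threshold = threshold_fraction * initial
    latency = next((minute for minute, c in enumerate(consumed) if c >= threshold), None)
    return consumed[-1], latency


if __name__ == "__main__":
    disk_areas = [100.0, 100.0, 98.0, 94.0, 90.0, 80.0]  # hypothetical areas, mm^2
    print(feeding_summary(disk_areas))  # (20.0, 3)
```

Summaries like these, computed per cage, are the kind of per-larva quantities that could then be compared between treatments such as different azadirachtin doses.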


2019
Author(s): L Cao, C Clish, FB Hu, MA Martínez-González, C Razquin, ...

Abstract
Motivation: Large-scale untargeted metabolomics experiments lead to the detection of thousands of novel metabolic features as well as false-positive artifacts. With the incorporation of pooled QC samples and corresponding bioinformatics algorithms, these measurement artifacts can be well quality controlled. However, it is impractical for every study to adopt such an experimental design.
Results: We introduce genuMet, a post-alignment quality control method that relies solely on the injection order of biological samples to identify potential false metabolic features. Based on the missing pattern of metabolic signals, genuMet can reach over 95% true negative rate and 85% true positive rate with suitable parameters, relative to the algorithm that uses pooled QC samples. genuMet makes it possible for studies without pooled QC samples to reduce false metabolic signals and perform robust statistical analysis.
Availability and implementation: genuMet is implemented in an R package and is available at https://github.com/liucaomics/genuMet under the GPL-v2 license.
Contact: Liming Liang
Supplementary information: Supplementary data are available at ….
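One way to use injection order to judge whether a feature's missing values look like an artifact is to ask whether detections and non-detections cluster in long runs rather than being scattered across the sequence of injections. The sketch below applies a standard Wald–Wolfowitz runs test to a feature's detected/missing pattern ordered by injection. This is an assumed stand-in chosen for illustration; genuMet's actual statistic, thresholds, and parameters may differ.

```python
import math


def runs_test_z(detected):
    """Wald-Wolfowitz runs test on a detected (True) / missing (False) sequence
    ordered by injection. Strongly negative z means fewer runs than expected,
    i.e. detections are clustered in injection order. Returns None when the
    test is undefined (feature always or never detected, or zero variance)."""
    n1 = sum(1 for d in detected if d)
    n2 = len(detected) - n1
    if n1 == 0 or n2 == 0:
        return None
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(detected, detected[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if variance <= 0:
        return None
    return (runs - expected) / math.sqrt(variance)


if __name__ == "__main__":
    clustered = [True] * 10 + [False] * 10   # detected only in the first half of the batch
    alternating = [True, False] * 10         # detections spread evenly over injections
    print(runs_test_z(clustered))    # strongly negative: clustered pattern
    print(runs_test_z(alternating))  # strongly positive: more runs than expected
```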

