Patternize: An R Package For Quantifying Color Pattern Variation

2017 ◽  
Author(s):  
Steven M. Van Belleghem ◽  
Riccardo Papa ◽  
Humberto Ortiz-Zuazaga ◽  
Frederik Hendrickx ◽  
Chris Jiggins ◽  
...  

The use of image data to quantify, study and compare variation in the colors and patterns of organisms requires the alignment of images to establish homology, followed by color-based segmentation of images. Here we describe an R package for image alignment and segmentation that has applications to quantify color patterns in a wide range of organisms. patternize is an R package that quantifies variation in color patterns obtained from image data. patternize first defines homology between pattern positions across specimens either through manually placed homologous landmarks or automated image registration. Pattern identification is performed by categorizing the distribution of colors using an RGB threshold, k-means clustering or watershed transformation. We demonstrate that patternize can be used for quantification of the color patterns in a variety of organisms by analyzing image data for butterflies, guppies, spiders and salamanders. Image data can be compared between sets of specimens, visualized as heatmaps and analyzed using principal component analysis (PCA). patternize has potential applications for fine scale quantification of color pattern phenotypes in population comparisons, genetic association studies and investigating the basis of color pattern variation across a wide range of organisms.
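
The RGB-threshold and heatmap steps described above can be sketched in a few lines. This is a minimal illustration in Python and does not use the patternize API: the function names, the per-channel `offset` rule and the toy 2x2 "images" are assumptions made for the example, and the specimens are assumed to be aligned already.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[int]]


def rgb_threshold_mask(image: Image, target: Pixel, offset: float) -> Mask:
    """Return 1 where every RGB channel is within `offset` of `target`, else 0."""
    return [
        [
            1 if all(abs(c - t) <= offset for c, t in zip(pixel, target)) else 0
            for pixel in row
        ]
        for row in image
    ]


def pattern_heatmap(masks: List[Mask]) -> List[List[float]]:
    """Average aligned binary masks into per-pixel pattern frequencies."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n for c in range(cols)]
        for r in range(rows)
    ]


# Two toy, already-aligned specimens (hypothetical pixel values).
red, black = (200, 30, 30), (10, 10, 10)
specimen_a = [[red, black], [red, black]]
specimen_b = [[red, red], [black, black]]

masks = [
    rgb_threshold_mask(img, target=(210, 20, 20), offset=25)
    for img in (specimen_a, specimen_b)
]
heatmap = pattern_heatmap(masks)
print(heatmap)
```

Each heatmap cell is the fraction of specimens in which the target color occurs at that position. A per-pixel matrix of this kind, flattened to one row per specimen, is the sort of input a PCA would take.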

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Matthias Munz ◽  
Inken Wohlers ◽  
Eric Simon ◽  
Tobias Reinberger ◽  
Hauke Busch ◽  
...  

Abstract Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in human with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implicating a substantial amount of wrongly and yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).
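
To make the cis-distance limit concrete, the sketch below classifies a variant-gene pair against a 1,000,000 bp window. It is an illustration only and is not part of Qtlizer: the function name, the choice to measure distance to the nearest gene boundary (many studies use the transcription start site instead) and the coordinates are all assumptions.

```python
CIS_LIMIT_BP = 1_000_000  # the commonly used limit mentioned in the abstract


def is_cis(variant_pos: int, gene_start: int, gene_end: int,
           limit: int = CIS_LIMIT_BP) -> bool:
    """True if the variant lies within `limit` bp of the gene (same chromosome assumed)."""
    if gene_start <= variant_pos <= gene_end:
        distance = 0  # variant falls inside the gene body
    else:
        distance = min(abs(variant_pos - gene_start), abs(variant_pos - gene_end))
    return distance <= limit


# Hypothetical coordinates: the first gene is 800 kb away, the second 1.2 Mb away.
near = is_cis(5_000_000, 5_800_000, 5_850_000)
far = is_cis(5_000_000, 6_200_000, 6_250_000)
print(near, far)
```

Under a fixed window the second pair would never be tested as a cis-eQTL, which is the kind of omission the abstract attributes to a limit that is too restrictive.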


2019 ◽  
Author(s):  
Drew C. Wham ◽  
Briana Ezray ◽  
Heather M. Hines

ABSTRACT A wide range of research relies upon the accurate and repeatable measurement of the degree to which organisms resemble one another. Here, we present an unsupervised workflow for analyzing the relationships between organismal color patterns. This workflow utilizes several recent advancements in deep-learning-based computer vision techniques to calculate perceptual distance. We validate this approach using previously published datasets surrounding diverse applications of color pattern analysis including mimicry, population differentiation, heritability, and development. We demonstrate that our approach is able to reproduce the biologically relevant color pattern relationships originally reported in these studies. Importantly, these results are achieved without any task-specific training. In many cases, we were able to reproduce findings directly from original photographs or plates with minimum standardization, avoiding the need for intermediate representations such as cartoonized images or trait matrices. We then present two artificial datasets designed to highlight how this approach handles aspects of color patterns, such as changes in pattern location and the perception of color contrast. These results suggest that this approach will generalize well to support the study of a wide range of biological processes in a diverse set of taxa while also accommodating a variety of data formats, preprocessing techniques, and study designs.


2016 ◽  
Author(s):  
Damian Brzyski ◽  
Christine B. Peterson ◽  
Piotr Sobczyk ◽  
Emmanuel J. Candès ◽  
Malgorzata Bogdan ◽  
...  

Abstract With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated with multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on pre-screening to identify the level of resolution of distinct hypotheses. We show how FDR controlling strategies can be adapted to account for this initial selection both with theoretical results and simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using tests based on either single-marker or multivariate regression. We provide an R package that allows practitioners to apply our procedure on standard GWAS format data, and illustrate its performance on lipid traits in the NFBC66 cohort study.
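
For readers unfamiliar with FDR control, the standard Benjamini-Hochberg step-up rule is sketched below as a baseline. It is not the pre-screening procedure proposed in the abstract, and the p-values are made up for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis using the BH step-up rule at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6]  # hypothetical per-SNP p-values
flags = benjamini_hochberg(p, q=0.05)
print(flags)
```

The rule counts each rejected hypothesis as one discovery. If neighboring rejected SNPs are afterwards merged into a single locus, the number of discoveries shrinks while false loci may remain, which is the mismatch the abstract describes.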


2017 ◽  
Author(s):  
Paul Bastide ◽  
Cécile Ané ◽  
Stéphane Robin ◽  
Mahendra Mariadassou

Abstract To study the evolution of several quantitative traits, the classical phylogenetic comparative framework consists of a multivariate random process running along the branches of a phylogenetic tree. The Ornstein-Uhlenbeck (OU) process is sometimes preferred to the simple Brownian Motion (BM) as it models stabilizing selection toward an optimum. The optimum for each trait is likely to be changing over the long periods of time spanned by large modern phylogenies. Our goal is to automatically detect the position of these shifts on a phylogenetic tree, while accounting for correlations between traits, which might exist because of structural or evolutionary constraints. We show that, in the presence of shifts, phylogenetic Principal Component Analysis (pPCA) fails to decorrelate traits efficiently, so that any method aiming at finding shifts needs to deal with correlation simultaneously. We introduce here a simplification of the full multivariate OU model, named scalar OU (scOU), which allows for noncausal correlations and is still computationally tractable. We extend the equivalence between the OU and a BM on a re-scaled tree to our multivariate framework. We describe an Expectation Maximization algorithm that allows for a maximum likelihood estimation of the shift positions, associated with a new model selection criterion, accounting for the identifiability issues for the shift localization on the tree. The method, freely available as an R-package (PhylogeneticEM), is fast and can deal with missing values. We demonstrate its efficiency and accuracy compared to another state-of-the-art method (ℓ1ou) on a wide range of simulated scenarios, and use this new framework to re-analyze recently gathered datasets on New World Monkeys and Anolis lizards.
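
The contrast between OU and BM can be illustrated with the textbook moments of a univariate OU process, dX = -α(X - θ)dt + σ dB, started at x0. This is a sketch of the single-trait case on a single branch only, not the scOU model or the PhylogeneticEM implementation; the function and parameter names are chosen for the example.

```python
import math


def ou_mean(x0: float, theta: float, alpha: float, t: float) -> float:
    """Expected trait value after time t: decays from x0 toward the optimum theta."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma: float, alpha: float, t: float) -> float:
    """Trait variance after time t; bounded above by sigma^2 / (2 * alpha)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small alpha.
    return sigma ** 2 / (2 * alpha) * -math.expm1(-2 * alpha * t)


# Strong selection (alpha = 2): the mean approaches the optimum, the variance saturates.
print(ou_mean(x0=0.0, theta=1.0, alpha=2.0, t=5.0))
print(ou_variance(sigma=1.0, alpha=2.0, t=5.0))

# Near-zero selection: the variance approaches sigma^2 * t, as under Brownian motion.
print(ou_variance(sigma=1.0, alpha=1e-8, t=3.0))
```

A shift in the optimum θ on a branch changes the expected trait value of every descendant tip, and that is the signal a shift-detection method looks for.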


2020 ◽  
Author(s):  
Jennifer J. Valvo ◽  
F. Helen Rodd ◽  
David Houle ◽  
J. David Aponte ◽  
Mitchel J. Daniel ◽  
...  

Abstract Color variation is one of the most obvious examples of variation in nature. Objective quantification and interpretation of variation in color and complex patterns is challenging. Assessment of variation in color patterns is limited by the reduction of color into categorical measures and lack of spatial information. We present Colormesh as a novel method for analyzing complex color patterns that offers unique capabilities. Compared to other methods, Colormesh maintains the continuous measure of color at individual sampling points throughout the pattern. This is particularly useful for analyses of variation in color patterns, whether interest is in specific locations or the pattern as a whole. In our approach, the use of Delaunay triangulation to determine sampling location eliminates the need for color patterns to have clearly defined pattern elements, and users are not required to identify discrete color categories. This method is complementary to several other methods available for color pattern quantification, and can be usefully deployed to address a wide range of questions about color pattern variation.


2015 ◽  
Author(s):  
Zheng Ning ◽  
Yakov A. Tsepilov ◽  
Sodbo Zh. Sharapov ◽  
Alexander K. Grishenko ◽  
Xiao Feng ◽  
...  

Abstract The ever-growing genome-wide association studies (GWAS) have revealed widespread pleiotropy. To exploit this, various methods which consider variant association with multiple traits jointly have been developed. However, most effort has been put into improving discovery power: how to replicate and interpret these discovered pleiotropic loci using multivariate methods has yet to be discussed fully. Using only multiple publicly available single-trait GWAS summary statistics, we develop a fast and flexible multi-trait framework that contains modules for (i) multi-trait genetic discovery, (ii) replication of locus pleiotropic profile, and (iii) multi-trait conditional analysis. The procedure is able to handle any level of sample overlap. As an empirical example, we discovered and replicated 23 novel pleiotropic loci for human anthropometry and evaluated their pleiotropic effects on other traits. By applying conditional multivariate analysis on the 23 loci, we discovered and replicated two additional multi-trait associated SNPs. Our results provide empirical evidence that multi-trait analysis allows detection of additional, replicable, highly pleiotropic genetic associations without genotyping additional individuals. The methods are implemented in a free and open source R package MultiABEL.

Author summary By analyzing large-scale genomic data, geneticists have revealed widespread pleiotropy, i.e. a single genetic variant can affect a wide range of complex traits. Methods have been developed to discover such genetic variants. However, we still lack insights into the relevant genetic architecture: what more can we learn from knowing the effects of these genetic variants? Here, we develop a fast and flexible statistical analysis procedure that includes discovery, replication, and interpretation of pleiotropic effects. The whole analysis pipeline only requires established genetic association study results. We also provide the mathematical theory behind the pleiotropic genetic effects testing. Most importantly, we show how a replication study can be essential to reveal new biology rather than solely increasing sample size in current genomic studies. For instance, we show that, using our proposed replication strategy, we can detect the difference in genetic effects between studies of different geographical origins. We applied the method to the GIANT consortium anthropometric traits to discover new genetic associations, replicated in the UK Biobank, and provided important new insights into growth and obesity. Our pipeline is implemented in an open-source R package, MultiABEL, which is efficient enough for researchers to run on personal computers within minutes.


Genetics ◽  
2003 ◽  
Vol 165 (3) ◽  
pp. 1117-1126
Author(s):  
Wei Geng ◽  
Pamela Cosman ◽  
Joong-Hwan Baek ◽  
Charles C Berry ◽  
William R Schafer

Abstract Genetic analysis of nervous system function relies on the rigorous description of behavioral phenotypes. However, standard methods for classifying the behavioral patterns of mutant Caenorhabditis elegans rely on human observation and are therefore subjective and imprecise. Here we describe the application of machine learning to quantitatively define and classify the behavioral patterns of C. elegans nervous system mutants. We have used an automated tracking and image processing system to obtain measurements of a wide range of morphological and behavioral features from recordings of representative mutant types. Using principal component analysis, we represented the behavioral patterns of eight mutant types as data clouds distributed in multidimensional feature space. Cluster analysis using the k-means algorithm made it possible to quantitatively assess the relative similarities between different behavioral phenotypes and to identify natural phenotypic clusters among the data. Since the patterns of phenotypic similarity identified in this study closely paralleled the functional similarities of the mutant gene products, the complex phenotypic signatures obtained from these image data appeared to represent an effective diagnostic of the mutants' underlying molecular defects.
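
The clustering step can be sketched with a plain k-means loop on toy two-dimensional feature vectors. This is a generic illustration, not the authors' pipeline: the feature values are invented, the starting centroids are picked by hand to keep the run deterministic, and the PCA step that would normally precede clustering is omitted.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], centroids: List[Point],
            iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical feature vectors for six recordings from two phenotypic groups.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = k_means(features, centroids=[features[0], features[3]])
print(labels)
```

Distances between the resulting centroids give one simple way to compare how similar two phenotypic groups are in feature space.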

