Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy

2018 ◽  
Vol 116 (3) ◽  
pp. 900-908 ◽  
Author(s):  
Hamutal Arbel ◽  
Sumanta Basu ◽  
William W. Fisher ◽  
Ann S. Hammonds ◽  
Kenneth H. Wan ◽  
...  

Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs, chosen throughout our prediction rank-list and tested by whole-mount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
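The precision and recall figures quoted in this abstract follow the standard definitions. As a minimal sketch, the two quantities can be computed as below; the counts are purely hypothetical placeholders chosen for illustration and are not the paper's validation data.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted elements that are confirmed as functional."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of functional elements that the predictor recovered."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, for illustration only (not taken from the study).
tp, fp, fn = 90, 10, 5

p = precision(tp, fp)  # 90 of 100 predictions confirmed
r = recall(tp, fn)     # 90 of 95 true elements recovered

print(f"precision = {p:.3f}, recall = {r:.3f}")
```

High precision with high recall, as reported above, means both that few predictions are false positives and that few true elements of the class are missed.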

2018 ◽  
Author(s):  
Hamutal Arbel ◽  
William W. Fisher ◽  
Ann S. Hammonds ◽  
Kenneth H. Wan ◽  
Soo Park ◽  
...  

Abstract

Identifying functional enhancer elements in metazoan systems is a major challenge. For example, large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. Here we use the pregastrula patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that two classes of enhancer are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, over 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome, 916 of which are novel. An analysis of 32 novel SDEs, chosen throughout our prediction rank-list and tested by whole-mount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.

Significance Statement

We demonstrate a high-accuracy method for predicting enhancers genome-wide with >85% precision, as validated by transgenic reporter assays in Drosophila embryos. This is the first time such accuracy has been achieved in a metazoan system, allowing us to predict with high confidence 1,640 enhancers, 916 of which are novel. The predicted enhancers are demarcated by heterogeneous collections of epigenetic marks; many strong enhancers are free from classical indicators of activity, including H3K27ac, but are bound by key transcription factors. H3K27ac, often used as a one-dimensional predictor of enhancer activity, is an uninformative parameter in our data.


2006 ◽  
Vol 38 (7) ◽  
pp. 794-800 ◽  
Author(s):  
Anelia Horvath ◽  
Sosipatros Boikos ◽  
Christoforos Giatzakis ◽  
Audrey Robinson-White ◽  
Lionel Groussin ◽  
...  

Genetics ◽  
2013 ◽  
Vol 196 (3) ◽  
pp. 829-840 ◽  
Author(s):  
Timothy M. Beissinger ◽  
Candice N. Hirsch ◽  
Brieanne Vaillancourt ◽  
Shweta Deshpande ◽  
Kerrie Barry ◽  
...  

1999 ◽  
Vol 17 (S1) ◽  
pp. S49-S54 ◽  
Author(s):  
J.S. Barnholtz ◽  
M. de Andrade ◽  
G.P. Page ◽  
T.M. King ◽  
L.E. Peterson ◽  
...  

2007 ◽  
Vol 144B (2) ◽  
pp. 193-199 ◽  
Author(s):  
M.A. Escamilla ◽  
A. Ontiveros ◽  
H. Nicolini ◽  
H. Raventos ◽  
R. Mendoza ◽  
...  

1999 ◽  
Vol 17 (S1) ◽  
pp. S621-S626
Author(s):  
Li Hsu ◽  
Corinne Aragaki ◽  
Filemon Quiaoit ◽  
Xiangjing Wang ◽  
Xiubin Xu ◽  
...  

2021 ◽  
Author(s):  
Noémie Valenza-Troubat

Understanding the relationship between DNA sequence variation and the diversity of observable traits across the tree of life is a central research theme in biology. In all organisms, most traits vary continuously between individuals. Explaining the genetic basis of this quantitative variation requires disentangling genetic from non-genetic factors, as well as their interactions. The identification of causal genetic variants yields fundamental insights into how evolution creates diversity across the tree of life. Ultimately, this information can be used for medical, environmental and agricultural applications. Aquaculture is an industry that is experiencing significant global growth and is benefiting from the advances of genomic research. Genomic information helps to improve complex commercial phenotypes such as growth traits, which are easily quantified visually, but influenced by polygenes and multiple environmental factors, such as temperature. In the context of a global food crisis and environmental change, there is an urgent need not only to understand which genetic variants are potential candidates for selection gains, but also how the architecture of these traits is composed (e.g. monogenes, polygenes) and how they are influenced by and interact with the environment. The overall goal of this thesis research was to generate a genome-wide multi-omics dataset matched with exhaustive phenotypic information derived from an F0-F1 pedigree to investigate the quantitative genetic basis of growth in the New Zealand silver trevally (Pseudocaranx georgianus). These data were used to identify genomic regions that co-segregate with growth traits, and to describe the regulation of the genes involved in response to temperature fluctuations. The findings of this research helped gain fundamental insights into the genotype–phenotype map in an important teleost species and understand its ability to dynamically respond to temperature variations. This will ultimately support the establishment of a genomics-informed New Zealand aquaculture breeding programme.

Chapter 1 of this thesis provides an overview of how genes interact with the environment to produce various growth phenotypes and how an understanding of this is important in aquaculture. This first chapter provides the deeper context for the research in subsequent data chapters.

Chapter 2 describes the study population, the collection of phenotypic and genotypic data, and a first description of the genetic parameters of growth traits in trevally. A combination of Whole Genome Sequencing (WGS) and Genotyping-By-Sequencing (GBS) techniques was used to generate 60 thousand Single Nucleotide Polymorphism (SNP) markers for individuals in a two-generation pedigree. Together with phenotypic data, the genotyping data were used to reconstruct the pedigree, measure inbreeding levels, and estimate heritability for 10 growth traits. Parents were identified for 63% of the offspring, and successful pedigree reconstruction indicated highly uneven contributions of each parent, and between the sexes, to the subsequent generation. The average inbreeding levels did not change between generations, but were significantly different between families. Growth patterns were found to be similar to those of other carangids and subject to seasonal variations. Heritability as well as genetic and phenotypic correlations were estimated using both a pedigree and a genomic relatedness matrix. Heritability estimates for all growth traits were consistently high, and the traits were positively correlated with each other.

In Chapter 3, genotypic and phenotypic data were used to carry out linkage mapping and a genome-wide association study (GWAS) to map quantitative trait loci (QTLs) associated with growth differences in the F1 population. A linkage map was generated using the largest family, which allowed scanning for rare variants associated with the traits. The linkage map reported in this thesis is the first one for the Pseudocaranx genus and one of the densest for the carangid family. It included 19,861 SNPs contained in 24 linkage groups, which correspond to the 24 trevally chromosomes. Eight significant QTLs associated with height, length and weight were discovered on three linkage groups. Using GWAS, 113 SNPs associated with nine traits were identified and 29 genetic growth hot spots were uncovered. Two of the GWAS markers co-located with the QTLs discovered with the linkage mapping analysis. This demonstrates that combining QTL mapping and GWAS represents a powerful approach for the identification and validation of loci controlling complex phenotypes, such as growth, and provides important insights into the genetic architecture of these traits.

Chapter 4, the last data chapter, investigates plasticity in gene expression patterns and growth of juvenile trevally in response to different temperatures. Temperature conditions were experimentally manipulated for 1 month to mimic seasonal extremes. Phenotypic differences in growth were measured in 400 individuals, and the gene expression patterns of the pituitary gland and the liver were compared across treatments in a subset of 100 individuals, using RNA sequencing. Results showed that growth increased 50% more in the warmer compared with the colder condition, suggesting that temperature has a large impact on the metabolic activity associated with growth. We were able to annotate 27,887 gene models and found 39 differentially expressed genes (DEGs) in the pituitary, and 238 in the liver. Of these, 6 DEGs showed a common expression pattern between the tissues. Annotated blast matches of all DEGs revealed genes linked to major pathways affecting metabolism and reproduction. Our results indicate that native New Zealand trevally exhibit predictable plastic regulatory responses to temperature stress, and the genes identified provide excellent candidates for selective breeding objectives and for studying how populations may adapt to increasing temperatures.

Finally, Chapter 5 discusses the implications, future directions, and application of this research for trevally and other breeding programmes. It more broadly highlights the insights that were gained on the genetic architecture of growth, and the role of temperature in interacting with and modulating genes involved in plastic growth responses.


2021 ◽  
Author(s):  
Sabrina Lehmann ◽  
Bibi Atika ◽  
Daniela Grossmann ◽  
Christian Schmitt-Engel ◽  
Nadi Strohlein ◽  
...  

Abstract

Background: Functional genomics uses unbiased systematic genome-wide gene disruption or analyzes natural variations such as gene expression profiles of different tissues from multicellular organisms to link gene functions to particular phenotypes. Functional genomics approaches are of particular importance to identify large sets of genes that are specifically important for a particular biological process beyond known candidate genes, or when the process has not been studied with genetic methods before.

Results: Here, we present a large set of genes whose disruption interferes with the function of the odoriferous defensive stink glands of the red flour beetle Tribolium castaneum. This gene set is the result of a large-scale systematic phenotypic screen using a reverse genetics strategy based on RNA interference applied in a genome-wide forward genetics manner. In this first-pass screen, 130 genes were identified, of which 69 genes could be confirmed to cause knock-down gland phenotypes, which vary from necrotic tissue and irregular reservoir size to irregular color or separation of the secreted gland compounds. The knock-down of 13 genes specifically caused a strong reduction of para-benzoquinones, suggesting a specific function in the synthesis of these toxic compounds. Only 14 of the 69 confirmed gland genes are differentially overexpressed in stink gland tissue and thus could have been detected in a transcriptome-based analysis. Moreover, of the 29 previously transcriptomics-identified genes causing a gland phenotype, only one gene was recognized by this phenotypic screen despite the fact that 13 of them were covered by the screen.

Conclusion: Our results indicate the importance of combining diverse and independent methodologies to identify genes necessary for the function of a certain biological tissue, as the different approaches do not deliver redundant results but rather complement each other. The presented phenotypic screen, together with a transcriptomics approach, now provides a set of close to a hundred genes important for odoriferous defensive stink gland physiology in beetles.
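The gene counts reported in this abstract can be turned into proportions to make the limited overlap between the two approaches easier to see. The following is a minimal sketch that uses only the counts stated above; the variable names and the `pct` helper are illustrative and do not come from the study.

```python
# Counts as stated in the abstract.
first_pass_hits = 130          # genes identified in the first-pass screen
confirmed = 69                 # genes confirmed to cause gland phenotypes
quinone_specific = 13          # knock-downs strongly reducing para-benzoquinones
overexpressed_in_gland = 14    # confirmed genes overexpressed in gland tissue


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


confirmation_rate = pct(confirmed, first_pass_hits)
quinone_share = pct(quinone_specific, confirmed)
transcriptome_detectable = pct(overexpressed_in_gland, confirmed)
# Confirmed genes that a transcriptome-based analysis would not have flagged.
screen_only = confirmed - overexpressed_in_gland

print(confirmation_rate, quinone_share, transcriptome_detectable, screen_only)
```

On these numbers, roughly half of the first-pass hits were confirmed, and about four in five confirmed genes would not have been flagged by differential overexpression alone.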

