Learning retention mechanisms and evolutionary parameters of duplicate genes from their expression data

2020 ◽  
Author(s):  
Michael DeGiorgio ◽  
Raquel Assis

Abstract Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. To date, only one method, CDROM, has been developed with this goal in mind. In particular, CDROM employs gene expression distances as proxies for functional divergence, and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the underlying parameters of duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built upon a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents the best available method for classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby also highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.

Author(s):  
Michael DeGiorgio ◽  
Raquel Assis

Abstract Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
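The distance-comparison idea described above can be illustrated with a minimal sketch. It is not the CDROM or CLOUD implementation: the function names, the use of relative expression profiles, the single cutoff value, the five class labels, and the exact branching rules are all assumptions chosen for illustration, and the profiles are made-up numbers rather than real data.

```python
import math

def relative(profile):
    """Scale a tissue expression profile so its values sum to 1 (assumed normalization)."""
    total = sum(profile)
    return [x / total for x in profile] if total > 0 else list(profile)

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(parent, child, ancestral, cutoff):
    """Hypothetical decision tree over expression distances.

    parent, child: profiles of the two duplicate copies.
    ancestral: profile of a single-copy gene standing in for the ancestral state.
    cutoff: distance above which a copy is treated as diverged (an assumed parameter).
    """
    combined = [p + c for p, c in zip(parent, child)]
    anc = relative(ancestral)
    d_parent = euclidean(relative(parent), anc)
    d_child = euclidean(relative(child), anc)
    d_combined = euclidean(relative(combined), anc)

    parent_diverged = d_parent > cutoff
    child_diverged = d_child > cutoff
    if not parent_diverged and not child_diverged:
        return "conservation"
    if parent_diverged and not child_diverged:
        return "neofunctionalization (parent)"
    if child_diverged and not parent_diverged:
        return "neofunctionalization (child)"
    # Both copies diverged: check whether together they still match the ancestor.
    if d_combined <= cutoff:
        return "subfunctionalization"
    return "specialization"

# Illustrative three-tissue profiles (invented values).
print(classify([1, 1, 1], [0, 0, 3], [1, 1, 1], cutoff=0.1))
```

A fixed cutoff of this kind ignores random fluctuation in expression, which is the limitation the abstract attributes to CDROM and the motivation it gives for a model-based classifier.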


2018 ◽  
Author(s):  
Xueyuan Jiang ◽  
Raquel Assis

Abstract Gene duplication has played an important role in the evolution and domestication of flowering plants. Yet little is known about how plant duplicate genes evolve and are retained over long timescales, particularly those arising from small-scale duplication (SSD) rather than whole-genome duplication (WGD) events. Here we address this question in the Poaceae (grass) family by analyzing gene expression data from nine tissues of Brachypodium distachyon, Oryza sativa japonica (rice), and Sorghum bicolor (sorghum). Consistent with theoretical predictions, expression profiles of most grass genes are conserved after SSD, suggesting that functional conservation is the primary outcome of SSD in grasses. However, we also uncover support for widespread functional divergence, much of which occurs asymmetrically via the process of neofunctionalization. Moreover, neofunctionalization preferentially targets younger (child) duplicate gene copies, is associated with RNA-mediated duplication, and occurs quickly after duplication. Further analysis reveals that functional divergence of SSD-derived genes is positively correlated with both sequence divergence and tissue specificity in all three grass species, and particularly with anther expression in B. distachyon. Therefore, as found in many animal species, SSD-derived grass genes often undergo rapid functional divergence that may be driven by natural selection on male-specific phenotypes.


2014 ◽  
Author(s):  
Raquel Assis ◽  
Doris Bachtrog

Gene duplication provides raw material for the evolution of functional innovation. We recently developed a phylogenetic method to classify the evolutionary processes underlying the retention and functional evolution of duplicate genes by quantifying divergence of their gene expression profiles. Here, we apply our method to pairs of duplicate genes in eight mammalian genomes, using data from 11 distinct tissues to construct spatial gene expression profiles. We find that young mammalian duplicates are often functionally conserved, and that functional divergence gradually increases with evolutionary distance between species. Examination of expression patterns in genes with conserved and new functions supports the "out-of-testes" hypothesis, in which new genes arise with testis-specific functions and acquire functions in other tissues over time. While new functions tend to be tissue-specific, there is no bias toward expression in any particular tissue. Thus, duplicate genes acquire a diversity of functions outside of the testes, possibly contributing to the origin of a multitude of complex phenotypes during mammalian evolution.


2016 ◽  
Author(s):  
Kousuke Hanada ◽  
Ayumi Tezuka ◽  
Masafumi Nozawa ◽  
Yutaka Suzuki ◽  
Sumio Sugano ◽  
...  

Abstract Lineage-specifically duplicated genes likely contribute to the phenotypic divergence in closely related species. However, neither the frequency of duplication events nor the degree of selective pressure immediately after gene duplication is clear in the speciation process. Plants have substantially higher gene duplication rates than most other eukaryotes. Here, using Illumina short reads from Arabidopsis halleri, which has high-quality genomes available for closely related species (Brassica rapa, A. thaliana and A. lyrata), we succeeded in generating orthologous gene groups among B. rapa, A. thaliana, A. lyrata and A. halleri. The frequency of duplication events in the Arabidopsis lineage was approximately 10 times higher than the frequency inferred by comparative genomics of Arabidopsis, poplar, rice and moss. Of the currently retained genes in A. halleri, 11–24% had undergone gene duplication in the Arabidopsis lineage. To examine the degree of selective pressure on duplicated genes, we calculated the ratios of nonsynonymous to synonymous substitution rates (KA/KS) in the A. halleri-lyrata and A. halleri lineages. Using a maximum-likelihood framework, we tested for positive selection (KA/KS > 1) and purifying selection (KA/KS < 1) at a significance level of P < 0.01. Duplicate genes tended to have a higher proportion of positive selection compared with non-duplicated genes. More interestingly, we found that functional divergence of duplicated genes was accelerated several million years after gene duplication at a higher proportion than immediately after gene duplication.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8813 ◽  
Author(s):  
Kyle T. David ◽  
Jamie R. Oaks ◽  
Kenneth M. Halanych

Background Eukaryotic genes typically form independent evolutionary lineages through either speciation or gene duplication events. Generally, gene copies resulting from speciation events (orthologs) are expected to maintain similarity over time with regard to sequence, structure and function. After a duplication event, however, resulting gene copies (paralogs) may experience a broader set of possible fates, including partial (subfunctionalization) or complete loss of function, as well as gain of new function (neofunctionalization). This assumption, known as the Ortholog Conjecture, is prevalent throughout molecular biology and notably plays an important role in many functional annotation methods. Unfortunately, studies that explicitly compare evolutionary processes between speciation and duplication events are rare and conflicting. Methods To provide an empirical assessment of ortholog/paralog evolution, we estimated ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) for 251,044 lineages in 6,244 gene trees across 77 vertebrate taxa. Results Overall, we found ω to be more similar between lineages descended from speciation events (p < 0.001) than lineages descended from duplication events, providing strong support for the Ortholog Conjecture. The asymmetry in ω following duplication events appears to be largely driven by an increase along one of the paralogous lineages, while the other remains similar to the parent. This trend is commonly associated with neofunctionalization, suggesting that gene duplication is a significant mechanism for generating novel gene functions.


2009 ◽  
Vol 91 (4) ◽  
pp. 267-280 ◽  
Author(s):  
KENTARO M. TANAKA ◽  
K. RYO TAKAHASI ◽  
TOSHIYUKI TAKANO-SHIMIZU

Summary Segmental duplications are enriched within many eukaryote genomes, and their potential consequence is gene duplication. While previous theoretical studies of gene duplication have mainly focused on the gene silencing process after fixation, the process leading to fixation is even more important for segmental duplications, because the majority of duplications would be lost before reaching a significant frequency in a population. Here, by a series of computer simulations, we show that purifying selection against loss-of-function mutations increases the fixation probability of a new duplicate gene, especially when the gene is haplo-insufficient. Theoretically, the probability of simultaneous preservation of both duplicate genes becomes twice the loss-of-function mutation rate (uc) when the population size (N), the degree of dominance of mutations (h) and the recombination rate between the duplicate genes (c) are all sufficiently large (Nuc>1, h>0·1 and c>uc). The preservation probability declines rapidly as h decreases and becomes 0 when h=0 (haplo-sufficiency). We infer that masking deleterious loss-of-function mutations gives duplicate genes an immediate selective advantage and, together with effects of increased gene dosage, would predominantly determine the fates of the duplicate genes in the early phase of their evolution.
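The limiting result stated in this summary can be written compactly. The symbol P_pres for the preservation probability is introduced here only for illustration; the conditions are copied from the summary, and no derivation is implied.

```latex
% P_pres is a label assumed here for the probability that both duplicates are preserved.
\[
  P_{\mathrm{pres}} \approx 2u_c
  \quad \text{when } N u_c > 1,\; h > 0.1,\; c > u_c,
  \qquad
  P_{\mathrm{pres}} = 0 \ \text{ when } h = 0 .
\]
```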


Author(s):  
Guanjing Hu ◽  
Corrinne E Grover ◽  
Mark A Arick ◽  
Meiling Liu ◽  
Daniel G Peterson ◽  
...  

Abstract Polyploidy is a widespread phenomenon throughout eukaryotes. Due to the coexistence of duplicated genomes, polyploids offer unique challenges for estimating gene expression levels, which is essential for understanding the massive and various forms of transcriptomic responses accompanying polyploidy. Although previous studies have explored the bioinformatics of polyploid transcriptomic profiling, the causes and consequences of inaccurate quantification of transcripts from duplicated gene copies have not been addressed. Using transcriptomic data from the cotton genus (Gossypium) as an example, we present an analytical workflow to evaluate a variety of bioinformatic method choices at different stages of RNA-seq analysis, from homoeolog expression quantification to downstream analysis used to infer key phenomena of polyploid expression evolution. In general, EAGLE-RC and GSNAP-PolyCat outperform other quantification pipelines tested, and their derived expression dataset best represents the expected homoeolog expression and co-expression divergence. The performance of co-expression network analysis was less affected by homoeolog quantification than by network construction methods, where weighted networks outperformed binary networks. By examining the extent and consequences of homoeolog read ambiguity, we illuminate the potential artifacts that may affect our understanding of duplicate gene expression, including an overestimation of homoeolog co-regulation and the incorrect inference of subgenome asymmetry in network topology. Taken together, our work points to a set of reasonable practices that we hope are broadly applicable to the evolutionary exploration of polyploids.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yuan Lu ◽  
Mikki Boswell ◽  
William Boswell ◽  
Raquel Ybanez Salinas ◽  
Markita Savage ◽  
...  

Abstract Background Studying functional divergences between paralogs that originated from genome duplication is a significant topic in investigating molecular evolution. Genes that exhibit basal level cyclic expression patterns including circadian and light responsive genes are important physiological regulators. Temporal shifts in basal gene expression patterns are important factors to be considered when studying genetic functions. However, adequate efforts have not been applied to studying basal gene expression variation on a global scale to establish transcriptional activity baselines for each organ. Furthermore, the investigation of cyclic expression pattern comparisons between genome duplication created paralogs, and potential functional divergence between them has been neglected. To address these questions, we utilized a teleost fish species, Xiphophorus maculatus, and profiled gene expression within 9 organs at 3-h intervals throughout a 24-h diurnal period. Results Our results showed 1.3–21.9% of genes in different organs exhibited cyclic expression patterns, with eye showing the highest fraction of cycling genes while gonads yielded the lowest. A majority of the duplicated gene pairs exhibited divergences in their basal level expression patterns wherein only one paralog exhibited an oscillating expression pattern, or both paralogs exhibit oscillating expression patterns, but each gene duplicate showed a different peak expression time, and/or in different organs. Conclusions These observations suggest cyclic genes experienced significant sub-, neo-, or non-functionalization following the teleost genome duplication event. In addition, we developed a customized, web-accessible, gene expression browser to facilitate data mining and data visualization for the scientific community.


2020 ◽  
Vol 37 (8) ◽  
pp. 2322-2331
Author(s):  
Carl J Dyson ◽  
Michael A D Goodisman

Abstract Gene duplication serves a critical role in evolutionary adaptation by providing genetic raw material to the genome. The evolution of duplicated genes may be influenced by epigenetic processes such as DNA methylation, which affects gene function in some taxa. However, the manner in which DNA methylation affects duplicated genes is not well understood. We studied duplicated genes in the honeybee Apis mellifera, an insect with a highly sophisticated social structure, to investigate whether DNA methylation was associated with gene duplication and genic evolution. We found that levels of gene body methylation were significantly lower in duplicate genes than in single-copy genes, implicating a possible role of DNA methylation in postduplication gene maintenance. Additionally, we discovered associations of gene body methylation with the location, length, and time since divergence of paralogous genes. We also found that divergence in DNA methylation was associated with divergence in gene expression in paralogs, although the relationship was not completely consistent with a direct link between DNA methylation and gene expression. Overall, our results provide further insight into genic methylation and how its association with duplicate genes might facilitate evolutionary processes and adaptation.


Gene ◽  
2008 ◽  
Vol 426 (1-2) ◽  
pp. 65-71 ◽  
Author(s):  
Mehdi Layeghifard ◽  
Razieh Rabani ◽  
Leila Pirhaji ◽  
Bagher Yakhchali
