The power-law distribution of gene family size is driven by the pseudogenisation rate's heterogeneity between gene families

Gene ◽  
2008 ◽  
Vol 414 (1-2) ◽  
pp. 85-94 ◽  
Author(s):  
Timothy Hughes ◽  
David A. Liberles
2021 ◽  
Author(s):  
Kim Vertacnik ◽  
Danielle Herrig ◽  
R Keating Godfrey ◽  
Tom Hill ◽  
Scott Geib ◽  
...  

A central goal in evolutionary biology is to determine the predictability of adaptive genetic changes. Despite many documented cases of convergent evolution at individual loci, little is known about the repeatability of gene family expansions and contractions. To address this void, we examined gene family evolution in the redheaded pine sawfly Neodiprion lecontei, a non-eusocial hymenopteran and exemplar of a pine-specialized lineage evolved from angiosperm-feeding ancestors. After assembling and annotating a draft genome, we manually annotated multiple gene families with chemosensory, detoxification, or immunity functions and characterized their genomic distributions and evolutionary history. Our results suggest that expansions of bitter gustatory receptor (GR), clan 3 cytochrome P450 (CYP3), and antimicrobial peptide (AMP) subfamilies may have contributed to pine adaptation. By contrast, there was no evidence of recent gene family contraction via pseudogenization. Next, we compared the number of genes in these same families across insect taxa that vary in diet, dietary specialization, and social behavior. In Hymenoptera, herbivory was associated with large GR and small olfactory receptor (OR) families, eusociality was associated with large OR and small AMP families, and--unlike investigations among more closely related taxa--ecological specialization was not related to gene family size. Overall, our results suggest that gene families that mediate ecological interactions may expand and contract predictably in response to particular selection pressures; however, the ecological drivers and temporal pace of gene gain and loss likely vary considerably across gene families.


2016 ◽  
Vol 3 (8) ◽  
pp. 160275 ◽  
Author(s):  
Wentian Li ◽  
Oscar Fontanelli ◽  
Pedro Miramontes

The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and can be fitted much better by a beta rank function. We propose a simple mechanism to break a power-law size distribution by a combination of splitting and merging operations. The largest gene sets are split into two to account for the subfunctional categories, and a small proportion of other gene sets are merged into larger sets as new common themes might be realized. These operations are not uncommon for a curator of gene sets. A simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function with the example of the distribution of transcription factors and drug target genes among HGNC gene families.
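The splitting and merging mechanism described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation: the function name `split_and_merge`, the uniform random split point, the per-iteration merge probability `merge_prob`, and the synthetic starting sizes are all assumptions made for illustration, and the merge step here may pick any two sets rather than excluding the largest ones.

```python
import random

def split_and_merge(sizes, iterations, merge_prob=0.1, seed=0):
    """Repeatedly split the largest set and occasionally merge two sets.

    Hypothetical sketch: split point and merge rule are assumptions,
    not the procedure used in the paper.
    """
    rng = random.Random(seed)
    sizes = list(sizes)
    for _ in range(iterations):
        # Split the current largest set into two non-empty parts.
        largest = max(range(len(sizes)), key=sizes.__getitem__)
        if sizes[largest] >= 2:
            left = rng.randint(1, sizes[largest] - 1)
            right = sizes[largest] - left
            sizes[largest] = left
            sizes.append(right)
        # With a small probability, merge two randomly chosen sets.
        if len(sizes) >= 2 and rng.random() < merge_prob:
            i, j = rng.sample(range(len(sizes)), 2)
            sizes[i] += sizes[j]
            sizes.pop(j)
    return sorted(sizes, reverse=True)

# Synthetic power-law-like starting sizes (assumed, not Ensembl data).
start_sizes = [round(500 * rank ** -1.0) for rank in range(1, 101)]
final_sizes = split_and_merge(start_sizes, iterations=50)
```

Both operations only move genes between sets, so the total number of genes is unchanged while the rank-size curve flattens at the top, which is the qualitative effect the abstract attributes to curation.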


2006 ◽  
Vol 42 (2) ◽  
pp. 373-376 ◽  
Author(s):  
RICHARD ARNOLD ◽  
LAURIE BAUER

Wichmann (2005) discusses the power-law distribution $n = a r^{-b}$ as a description of the relationship between the number of languages n in a language family and the rank r of that family in a list ordered by decreasing n. Two datasets are used by Wichmann, one from Ethnologue (Grimes 2000), which lists 130 language families, and one from Ruhlen (1987), listing 21 families. We have reanalysed these data and find that the method of fitting a power-law used in the paper is not optimal because it does not allow for a sensible maximum value for the family size n.
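For orientation, the rank-size relation n = a r^(-b) becomes a straight line after taking logarithms, log n = log a - b log r, so a basic fit can be sketched with ordinary least squares on the log-log values. The sketch below is only that plain approach under stated assumptions: it is not claimed to be the method used by Wichmann or the alternative proposed by Arnold and Bauer, the function name `fit_rank_power_law` is hypothetical, and the data are synthetic rather than the Ethnologue or Ruhlen counts.

```python
import math

def fit_rank_power_law(sizes):
    """Least-squares fit of log n = log a - b * log r to rank-ordered sizes.

    Expects at least two positive sizes. Returns the pair (a, b).
    """
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(n) for n in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic sizes generated exactly from n = 1000 * r^(-1.2).
synthetic_sizes = [1000 * rank ** -1.2 for rank in range(1, 51)]
a, b = fit_rank_power_law(synthetic_sizes)
```

On this noise-free input the fit recovers a close to 1000 and b close to 1.2. Nothing in this unconstrained fit bounds the predicted size at rank 1, which is the kind of limitation the abstract raises about fitting without a sensible maximum family size.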


2019 ◽  
Author(s):  
Milton Tan ◽  
Anthony K. Redmond ◽  
Helen Dooley ◽  
Ryo Nozu ◽  
Keiichi Sato ◽  
...  

Abstract Due to their key phylogenetic position, cartilaginous fishes, which include the largest fish species, Rhincodon typus (whale shark), are an important vertebrate lineage for understanding the origin and evolution of vertebrates. However, until recently, this lineage has been understudied in vertebrate genomics. Using newly generated long-read sequences, we produced the best gapless cartilaginous fish genome assembly to date. The assembly has fewer missing ancestral genes than Callorhinchus milii, which has been widely used for evolutionary studies up to now. We used the new assembly to study the evolution of gene families in the whale shark and other vertebrates, focusing on historical patterns of gene family origins and loss across early vertebrate evolution, innate immune receptor repertoire evolution, and dynamics of gene family size evolution in relation to gigantism. From inferring the pattern of origin of gene families across the most recent common ancestors of major vertebrate clades, we found that there were many shared gene families between the whale shark and bony vertebrates that were present in the most recent common ancestor of jawed vertebrates, with a large increase in novel genes at the origin of jawed vertebrates independent of whole genome duplication events. The innate immune system in the whale shark consisted of diverse pathogen recognition receptors (PRRs), including NOD-like receptors, RIG-like receptors, and Toll-like receptors. We discovered a unique complement of Toll-like receptors and triplication of NOD1 in the whale shark genome. Further, we found diverse patterns of gene family evolution between PRRs within vertebrates, demonstrating that the origin of adaptive immunity in jawed vertebrates is more complicated than simply replacing the need for a vast repertoire of germline-encoded PRRs. We then studied rates of amino acid substitution and gene family size evolution across origins of vertebrate gigantism. While we found that cartilaginous fishes and giant vertebrates tended to have slower substitution rates than the background rate in vertebrates, the whale shark genome substitution rate was not significantly slower than Callorhinchus. Furthermore, rates of gene family size evolution varied among giants and the background, suggesting that differences in rate of substitution and gene family size evolution relative to gigantism are decoupled. We found that the gene families that have shifted in duplication rate in whale shark are enriched for genes related to driving cancer in humans, consistent with studies in other giant vertebrates that support the hypothesis that evolution of increased body size requires adaptations that result in reduction of per-cell cancer rate.


2018 ◽  
Author(s):  
Peipei Wang ◽  
Bethany M. Moore ◽  
Nicholas L. Panchy ◽  
Fanrui Meng ◽  
Melissa D. Lehti-Shiu ◽  
...  

Abstract Gene duplication and loss contribute to gene content differences as well as phenotypic divergence across species. However, the extent to which gene content varies among closely related plant species and the factors responsible for such variation remain unclear. Here, we used the Solanaceae family as a model to investigate differences in gene family size and the likely factors contributing to these differences. We found that genes in highly variable families have high turnover rate and tend to be involved in processes that have diverged between Solanaceae species, whereas genes in low-variability families tend to have housekeeping roles. In addition, genes in high- and low-variability gene families tend to be duplicated by tandem and whole genome duplication, respectively. This finding together with the observation that genes duplicated by different mechanisms experience different selection pressures suggests that duplication mechanism impacts gene family turnover. We explored using pseudogene number as a proxy for gene loss but discovered that a substantial number of pseudogenes are actually products of pseudogene duplication, contrary to the expectation that most plant pseudogenes are remnants of once-functional duplicates. Our findings reveal complex relationships between variation in gene family size, gene functions, duplication mechanism, and evolutionary rate. The patterns of lineage-specific gene family expansion within the Solanaceae provide the foundation for a better understanding of the genetic basis underlying phenotypic diversity in this economically important family.


Genetics ◽  
1996 ◽  
Vol 142 (3) ◽  
pp. 1021-1031 ◽  
Author(s):  
Jianping Hu ◽  
Beth Anderson ◽  
Susan R Wessler

Abstract R and B genes and their homologues encode basic helix-loop-helix (bHLH) transcriptional activators that regulate the anthocyanin biosynthetic pathway in flowering plants. In maize, R/B genes comprise a very small gene family whose organization reflects the unique evolutionary history and genome architecture of maize. To know whether the organization of the R gene family could provide information about the origins of the distantly related grass rice, we characterized members of the R gene family from rice Oryza sativa. Despite being a true diploid, O. sativa has at least two R genes. An active homologue (Ra) with extensive homology with other R genes is located at a position on chromosome 4 previously shown to be in synteny with regions of maize chromosomes 2 and 10 that contain the B and R loci, respectively. A second rice R gene (Rb) of undetermined function was identified on chromosome 1 and found to be present only in rice species with AA genomes. All non-AA species have but one R gene that is Ra-like. These data suggest that the common ancestor shared by maize and rice had a single R gene and that the small R gene families of grasses have arisen recently and independently.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Ghislain Romaric Meleu ◽  
Paulin Yonta Melatagia

Abstract Using the headers of scientific papers, we have built multilayer networks of entities involved in research, namely authors, laboratories, and institutions. We have analyzed some properties of such networks built from data extracted from the HAL archives and found that the network at each layer is a small-world network with a power-law degree distribution. In order to simulate such co-publication networks, we propose a multilayer network generation model based on the formation of cliques at each layer and the affiliation of each new node to the higher layers. The clique is built from new and existing nodes selected using preferential attachment. We also show that the degree distribution of generated layers follows a power law. From the simulations of our model, we show that the generated multilayer networks reproduce the studied properties of co-publication networks.


1993 ◽  
Vol 13 (3) ◽  
pp. 1708-1718 ◽  
Author(s):  
M Schäfer ◽  
D Börsch ◽  
A Hülster ◽  
U Schäfer

We have analyzed a locus of Drosophila melanogaster located at 98C on chromosome 3, which contains two tandemly arranged genes, named Mst98Ca and Mst98Cb. They are two additional members of the Mst(3)CGP gene family by three criteria. (i) Both genes are exclusively transcribed in the male germ line. (ii) Both transcripts encode a protein with a high proportion of the repetitive motif Cys-Gly-Pro. (iii) Their expression is translationally controlled; while transcripts can be detected in diploid stages of spermatogenesis, association with polysomes can be shown only in haploid stages of sperm development. The genes differ markedly from the other members of the gene family in structure; they do not contain introns, they are of much larger size, and they have the Cys-Gly-Pro motifs clustered at the carboxy-terminal end of the encoded proteins. An antibody generated against the Mst98Ca protein recognizes both Mst98C proteins in D. melanogaster. In a male-sterile mutation in which spermiogenesis is blocked before individualization of sperm, both of these proteins are no longer synthesized. This finding provides proof of late translation for the Mst98C proteins and thereby independent proof of translational control of expression. Northern (RNA) and Western immunoblot analyses indicate the presence of homologous gene families in many other Drosophila species. The Mst98C proteins share sequence homology with proteins of the outer dense fibers in mammalian spermatozoa and can be localized to the sperm tail by immunofluorescence with an anti-Mst98Ca antibody.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Zihan Cheng ◽  
Xuemei Zhang ◽  
Wenjing Yao ◽  
Kai Zhao ◽  
Lin Liu ◽  
...  

Abstract Background: The Late Embryogenesis-Abundant (LEA) gene families, which play significant roles in regulation of tolerance to abiotic stresses, widely exist in higher plants. Poplar is a tree species that has important ecological and economic values. But systematic studies on the gene family have not been reported yet in poplar. Results: On the basis of genome-wide search, we identified 88 LEA genes from Populus trichocarpa and renamed them as PtrLEA. The PtrLEA genes have fewer introns, and their promoters contain more cis-regulatory elements related to abiotic stress tolerance. Our results from comparative genomics indicated that the PtrLEA genes are conserved and homologous to related genes in other species, such as Eucalyptus robusta, Solanum lycopersicum and Arabidopsis. Using RNA-Seq data collected from poplar under two conditions (with and without salt treatment), we detected 24, 22 and 19 differentially expressed genes (DEGs) in roots, stems and leaves, respectively. Then we performed spatiotemporal expression analysis of the four up-regulated DEGs shared by the tissues, constructed gene co-expression-based networks, and investigated gene function annotations. Conclusion: Lines of evidence indicated that the PtrLEA genes play significant roles in poplar growth and development, as well as in responses to salt stress.


2021 ◽  
Author(s):  
David A Garcia ◽  
Gregory Fettweis ◽  
Diego M Presman ◽  
Ville Paakinaho ◽  
Christopher Jarzynski ◽  
...  

Abstract Single-molecule tracking (SMT) allows the study of transcription factor (TF) dynamics in the nucleus, giving important information regarding the diffusion and binding behavior of these proteins in the nuclear environment. Dwell time distributions obtained by SMT for most TFs appear to follow bi-exponential behavior. This has been ascribed to two discrete populations of TFs—one non-specifically bound to chromatin and another specifically bound to target sites, as implied by decades of biochemical studies. However, emerging studies suggest alternate models for dwell-time distributions, indicating the existence of more than two populations of TFs (multi-exponential distribution), or even the absence of discrete states altogether (power-law distribution). Here, we present an analytical pipeline to evaluate which model best explains SMT data. We find that a broad spectrum of TFs (including glucocorticoid receptor, oestrogen receptor, FOXA1, CTCF) follow a power-law distribution of dwell-times, blurring the temporal line between non-specific and specific binding, suggesting that productive binding may involve longer binding events than previously believed. From these observations, we propose a continuum of affinities model to explain TF dynamics, that is consistent with complex interactions of TFs with multiple nuclear domains as well as binding and searching on the chromatin template.

