Sequencing refractory regions in bird genomes are hotspots for accelerated protein evolution

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
R. Huttener ◽  
L. Thorrez ◽  
T. In’t Veld ◽  
M. Granvik ◽  
L. Van Lommel ◽  
...  

Abstract Background Approximately 1000 protein-encoding genes common to vertebrates are still unannotated in avian genomes. Have these genes been evolutionarily lost, or have they not yet been found for technical reasons? Using genome landscapes as a tool to visualize large-scale regional effects of genome evolution, we reexamined this question. Results On the basis of gene annotation in non-avian vertebrate genomes, we established a list of 15,135 common vertebrate genes. Of these, 1026 were not found in any of eight examined bird genomes. Visualizing regional genome effects with our sliding-window approach showed that the majority of these “missing” genes cluster in 14 regions of the human reference genome. In these clusters, an additional 1517 genes (often gene fragments) were underrepresented in bird genomes. The clusters of “missing” genes coincided with regions of very high GC content, particularly in avian genomes, making them “hidden” because of incomplete sequencing. Moreover, proteins encoded by genes in these sequencing refractory regions showed signs of accelerated protein evolution. As a proof of principle for this idea, we experimentally characterized the mRNA and protein products of four “hidden” bird genes that are crucial for energy homeostasis in skeletal muscle: ALDOA, ENO3, PYGM and SLC2A4. Conclusions At least part of the “missing” genes in bird genomes can be attributed to an artifact caused by the difficulty of sequencing regions with extreme GC% (“hidden” genes). Biologically, these “hidden” genes are of interest as they encode proteins that evolve more rapidly than the genome-wide average. Finally, we show that four of these “hidden” genes encode key proteins for energy metabolism in flight muscle.
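The abstract above links the clusters of “missing” genes to regions of very high GC content. As a rough illustration of how such regions can be flagged, the following minimal sketch computes GC content in sliding windows along a sequence. It is not the authors' genome-landscape method: the function names, window size, step and threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def sliding_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Return (window start, GC fraction) for every full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(seq: str, window: int, step: int, threshold: float) -> List[int]:
    """Start positions of windows whose GC fraction is at or above threshold.

    The threshold is a hypothetical cut-off, not a value taken from the paper.
    """
    return [start for start, gc in sliding_gc(seq, window, step) if gc >= threshold]


if __name__ == "__main__":
    toy = "GGCCAATT"  # toy sequence, not real genomic data
    for start, gc in sliding_gc(toy, window=4, step=2):
        print(start, round(gc, 2))
```

On real assemblies the window would typically span kilobases rather than a handful of bases, and ambiguous bases (N) are excluded from the denominator here so that assembly gaps do not dilute the estimate.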

2019 ◽  
Vol 47 (W1) ◽  
pp. W88-W92 ◽  
Author(s):  
Oren Avram ◽  
Dana Rapoport ◽  
Shir Portugez ◽  
Tal Pupko

Abstract Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a ‘one-stop shop’ for conducting microbial genomics data analyses via a simple graphical user interface. Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence–absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.


2020 ◽  
Vol 2 (7A) ◽  
Author(s):  
Oren Avram ◽  
Dana Rapoport ◽  
Shir Portugez ◽  
Tal Pupko

Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment, and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a ‘one-stop shop’ for conducting microbial genomics data analyses via a simple graphical user interface (Avram et al., Nucleic Acids Res., 2019). Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence-absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.


2021 ◽  
Vol 11 ◽  
Author(s):  
Stephanie van Wyk ◽  
Brenda D. Wingfield ◽  
Lieschen De Vos ◽  
Nicolaas A. van der Merwe ◽  
Emma T. Steenkamp

The Repeat-Induced Point (RIP) mutation pathway is a fungus-specific genome defense mechanism that mitigates the deleterious consequences of repeated genomic regions and transposable elements (TEs). RIP mutates targeted sequences by introducing cytosine to thymine transitions. We investigated the genome-wide occurrence and extent of RIP with a sliding-window approach. Using genome-wide RIP data and two sets of control groups, the associations among RIP, TEs, and GC content were contrasted in organisms capable and incapable of RIP. Based on these data, we then set out to determine the extent and occurrence of RIP in 58 representatives of the Ascomycota. The findings were summarized by placing each of the fungi investigated in one of six categories based on the extent of genome-wide RIP. In silico RIP analyses, using a sliding-window approach with stringent RIP parameters, implemented simultaneously within the same genetic context, on high-quality genome assemblies, yielded superior results in determining the genome-wide RIP among the Ascomycota. Most Ascomycota had RIP and these mutations were particularly widespread among classes of the Pezizomycotina, including the early diverging Orbiliomycetes and the Pezizomycetes. The most extreme cases of RIP were limited to representatives of the Dothideomycetes and Sordariomycetes. By contrast, the genomes of the Taphrinomycotina and Saccharomycotina contained no detectable evidence of RIP. Also, recent losses in RIP combined with controlled TE proliferation in the subphylum Pezizomycotina may promote substantial genome enlargement as well as the formation of sub-genomic compartments. These findings have broadened our understanding of the taxonomic range and extent of RIP in Ascomycota and how this pathway affects the genomes of fungi harboring it.
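The abstract above does not spell out which RIP parameters were used. As a hedged illustration only, the sketch below computes the dinucleotide-ratio indices that are commonly used to detect RIP in silico: a product index (TpA/ApT), a substrate index ((CpA + TpG)/(ApC + GpT)) and their difference as a composite index. Treating these as the indices behind the study is an assumption; the function names are hypothetical, and no thresholds are applied because the abstract gives none.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class RipIndices(NamedTuple):
    product: Optional[float]    # TpA / ApT, expected to rise after RIP
    substrate: Optional[float]  # (CpA + TpG) / (ApC + GpT), expected to fall after RIP
    composite: Optional[float]  # product - substrate


def count_dinucleotides(seq: str) -> Dict[str, int]:
    """Count overlapping dinucleotides in seq (case-insensitive)."""
    seq = seq.upper()
    counts: Dict[str, int] = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def rip_indices(seq: str) -> RipIndices:
    """Compute the three dinucleotide-based indices for one sequence window."""
    c = count_dinucleotides(seq)
    product = _ratio(c.get("TA", 0), c.get("AT", 0))
    substrate = _ratio(c.get("CA", 0) + c.get("TG", 0), c.get("AC", 0) + c.get("GT", 0))
    composite = None
    if product is not None and substrate is not None:
        composite = product - substrate
    return RipIndices(product, substrate, composite)


def rip_scan(seq: str, window: int, step: int) -> List[Tuple[int, RipIndices]]:
    """Apply rip_indices to every full window along seq (sliding-window scan)."""
    if window <= 1 or step <= 0:
        raise ValueError("window must exceed 1 and step must be positive")
    return [
        (start, rip_indices(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(rip_indices("TATACAGT"))  # toy sequence, not real data
```

Because RIP converts CpA (and, on the opposite strand, TpG) into TpA, affected windows tend to show a high product index together with a low substrate index; any cut-off values for calling a window RIP-affected would have to be taken from the study or the tool actually used.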


Author(s):  
Arli A. Parikesit ◽  
Lydia Steiner ◽  
Peter F. Stadler ◽  
Sonja J. Prohaska

Most investigations into the large-scale patterns of protein evolution are based on gene annotations that have been compiled in reference databases. The use of these resources for quantitative comparisons, however, is complicated by sometimes vast differences in coverage. More importantly, we also observe substantial ascertainment biases that cannot be removed by simple normalization procedures. A striking example is provided by the correlations between protein domains. We observe that statistics derived from different computational gene annotation procedures show dramatic discrepancies, and even qualitative changes from negative to positive correlation, when compared to statistics obtained from annotation databases.


2006 ◽  
Vol 11 (3) ◽  
pp. 236-246 ◽  
Author(s):  
Laurence H. Lamarcq ◽  
Bradley J. Scherer ◽  
Michael L. Phelan ◽  
Nikolai N. Kalnine ◽  
Yen H. Nguyen ◽  
...  

A method for high-throughput cloning and analysis of short hairpin RNAs (shRNAs) is described. Using this approach, 464 shRNAs against 116 different genes were screened for knockdown efficacy, enabling rapid identification of effective shRNAs against 74 genes. Statistical analysis of the effects of various criteria on the activity of the shRNAs confirmed that some of the rules thought to govern small interfering RNA (siRNA) activity also apply to shRNAs. These include moderate GC content, absence of internal hairpins, and asymmetric thermal stability. However, the authors did not find strong support for position-specific rules. In addition, analysis of the data suggests that not all genes are equally susceptible to RNA interference (RNAi).


Genetics ◽  
2002 ◽  
Vol 162 (4) ◽  
pp. 1805-1810 ◽  
Author(s):  
Martin J Lercher ◽  
Nick G C Smith ◽  
Adam Eyre-Walker ◽  
Laurence D Hurst

Abstract The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Stephan Fischer ◽  
Marc Dinh ◽  
Vincent Henry ◽  
Philippe Robert ◽  
Anne Goelzer ◽  
...  

Abstract Detailed whole-cell modeling requires an integration of heterogeneous cell processes having different modeling formalisms, for which whole-cell simulation could remain tractable. Here, we introduce BiPSim, an open-source stochastic simulator of template-based polymerization processes, such as replication, transcription and translation. BiPSim combines an efficient abstract representation of reactions and an implementation of Gillespie’s Stochastic Simulation Algorithm (SSA) that runs in constant time with respect to the number of reactions, which makes it highly efficient to simulate large-scale polymerization processes stochastically. Moreover, multi-level descriptions of polymerization processes can be handled simultaneously, allowing the user to tune a trade-off between simulation speed and model granularity. We evaluated the performance of BiPSim by simulating genome-wide gene expression in bacteria for multiple levels of granularity. Finally, since no cell-type specific information is hard-coded in the simulator, models can easily be adapted to other organismal species. We expect that BiPSim should open new perspectives for the genome-wide simulation of stochastic phenomena in biology.
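For readers unfamiliar with the Stochastic Simulation Algorithm named in the abstract above, the following is a minimal sketch of Gillespie's direct method for a toy two-reaction system (synthesis at a constant rate, degradation proportional to copy number). It is not BiPSim's code or API, and it is not the constant-time variant the abstract describes: this plain version recomputes every propensity at each step. The function name, rate constants and seed are assumptions for illustration.

```python
import random
from typing import List, Tuple


def gillespie_birth_death(
    k_syn: float, k_deg: float, n0: int, t_end: float, seed: int = 0
) -> List[Tuple[float, int]]:
    """Simulate synthesis (rate k_syn) and degradation (rate k_deg * n) of one species.

    Returns the trajectory as a list of (time, copy number) pairs up to t_end.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        a_syn = k_syn        # propensity of the synthesis reaction
        a_deg = k_deg * n    # propensity of the degradation reaction
        a_total = a_syn + a_deg
        if a_total == 0:
            break            # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


if __name__ == "__main__":
    traj = gillespie_birth_death(k_syn=5.0, k_deg=0.5, n0=0, t_end=10.0, seed=1)
    print(len(traj), traj[-1])
```

With many reactions, the linear scan over propensities in this direct method becomes the bottleneck, which is the cost that constant-time SSA implementations are designed to avoid.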

