Sequencing refractory regions in bird genomes are hotspots for accelerated protein evolution

Background: Approximately 1000 protein-encoding genes common to vertebrates are still unannotated in avian genomes. Have these genes been lost in evolution, or have they not yet been found for technical reasons? Using genome landscapes as a tool to visualize large-scale regional effects of genome evolution, we reexamined this question. Results: On the basis of gene annotation in non-avian vertebrate genomes, we established a list of 15,135 common vertebrate genes. Of these, 1026 were not found in any of eight examined bird genomes. Visualizing regional genome effects with our sliding-window approach showed that the majority of these “missing” genes cluster in 14 regions of the human reference genome. In these clusters, an additional 1517 genes (often gene fragments) were underrepresented in bird genomes. The clusters of “missing” genes coincided with regions of very high GC content, particularly in avian genomes, making them “hidden” because of incomplete sequencing. Moreover, proteins encoded by genes in these sequencing-refractory regions showed signs of accelerated protein evolution. As a proof of principle, we experimentally characterized the mRNA and protein products of four “hidden” bird genes that are crucial for energy homeostasis in skeletal muscle: ALDOA, ENO3, PYGM and SLC2A4. Conclusions: At least part of the “missing” genes in bird genomes can be attributed to an artifact caused by the difficulty of sequencing regions with extreme GC% (“hidden” genes). Biologically, these “hidden” genes are of interest because they encode proteins that evolve more rapidly than the genome-wide average. Finally, we show that four of these “hidden” genes encode key proteins for energy metabolism in flight muscle.
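The sliding-window view of GC content mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the window size, and the step are hypothetical choices made only for illustration, and the abstract does not state the parameters actually used.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a sequence, ignoring non-ACGT symbols.

    Returns None when the sequence has no A, C, G or T at all
    (for example, a window consisting only of N).
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return None
    return (seq.count("G") + seq.count("C")) / counted


def sliding_gc(seq, window, step):
    """Compute GC fraction in windows of fixed size along a sequence.

    `window` and `step` are illustrative parameters, not values from the paper.
    Returns a list of (window_start, gc_fraction) tuples.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    for start, gc in sliding_gc("GGCCAATT", window=4, step=2):
        print(start, gc)
```

Windows whose GC fraction is far above the rest of the landscape would be the candidates for sequencing-refractory regions in an analysis of this kind.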

M1CR0B1AL1Z3R—a user-friendly web server for the analysis of large-scale microbial genomics data

Nucleic Acids Research ◽

10.1093/nar/gkz423 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W88-W92 ◽

Cited By ~ 9

Author(s):

Oren Avram ◽

Dana Rapoport ◽

Shir Portugez ◽

Tal Pupko

Keyword(s):

Large Scale ◽

Ad Hoc ◽

Evolutionary Dynamics ◽

Gene Annotation ◽

GC Content ◽

Web Server ◽

Disease Outbreaks ◽

Open Reading Frames ◽

Microbial Genomics ◽

Bacterial Strains

Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a ‘one-stop shop’ for conducting microbial genomics data analyses via a simple graphical user interface. Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence–absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.

M1CR0B1AL1Z3R—a user-friendly web server for the analysis of large-scale microbial genomics data

Access Microbiology ◽

10.1099/acmi.ac2020.po1014 ◽

2020 ◽

Vol 2 (7A) ◽

Author(s):

Oren Avram ◽

Dana Rapoport ◽

Shir Portugez ◽

Tal Pupko

Keyword(s):

Large Scale ◽

Ad Hoc ◽

Evolutionary Dynamics ◽

Gene Annotation ◽

GC Content ◽

Web Server ◽

Disease Outbreaks ◽

Open Reading Frames ◽

Microbial Genomics ◽

Bacterial Strains

Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment, and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a ‘one-stop shop’ for conducting microbial genomics data analyses via a simple graphical user interface (Avram et al., Nucleic Acids Res., 2019). Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence–absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.

Genome-Wide Analyses of Repeat-Induced Point Mutations in the Ascomycota

Frontiers in Microbiology ◽

10.3389/fmicb.2020.622368 ◽

2021 ◽

Vol 11 ◽

Author(s):

Stephanie van Wyk ◽

Brenda D. Wingfield ◽

Lieschen De Vos ◽

Nicolaas A. van der Merwe ◽

Emma T. Steenkamp

Keyword(s):

Defense Mechanism ◽

Point Mutations ◽

GC Content ◽

Sliding Window ◽

Genome Defense ◽

Genome Wide ◽

Taxonomic Range ◽

Window Approach ◽

Genomic Regions ◽

Genome Assemblies

The Repeat-Induced Point (RIP) mutation pathway is a fungus-specific genome defense mechanism that mitigates the deleterious consequences of repeated genomic regions and transposable elements (TEs). RIP mutates targeted sequences by introducing cytosine-to-thymine transitions. We investigated the genome-wide occurrence and extent of RIP with a sliding-window approach. Using genome-wide RIP data and two sets of control groups, the associations among RIP, TEs, and GC content were contrasted in organisms capable and incapable of RIP. Based on these data, we then set out to determine the extent and occurrence of RIP in 58 representatives of the Ascomycota. The findings were summarized by placing each of the fungi investigated in one of six categories based on the extent of genome-wide RIP. In silico RIP analyses, using a sliding-window approach with stringent RIP parameters, implemented simultaneously within the same genetic context, on high-quality genome assemblies, yielded superior results in determining genome-wide RIP among the Ascomycota. Most Ascomycota had RIP, and these mutations were particularly widespread among classes of the Pezizomycotina, including the early-diverging Orbiliomycetes and Pezizomycetes. The most extreme cases of RIP were limited to representatives of the Dothideomycetes and Sordariomycetes. By contrast, the genomes of the Taphrinomycotina and Saccharomycotina contained no detectable evidence of RIP. Also, recent losses of RIP combined with controlled TE proliferation in the Pezizomycotina subphyla may promote substantial genome enlargement as well as the formation of sub-genomic compartments. These findings have broadened our understanding of the taxonomic range and extent of RIP in Ascomycota and of how this pathway affects the genomes of fungi harboring it.
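Because RIP converts cytosine to thymine, affected regions tend to be depleted in RIP substrates and enriched in AT-rich dinucleotides, which is what a sliding-window scan can pick up. The sketch below is an assumption-laden illustration, not the authors' method: it uses the TpA/ApT ratio, a dinucleotide index commonly used in the RIP literature, and the function names, window size, and step are hypothetical. The abstract does not say which indices or thresholds were applied.

```python
def count_dinucleotide(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide in a sequence."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def rip_product_index(seq):
    """TpA/ApT ratio of a sequence (an assumed, commonly used RIP index).

    Returns None when the sequence contains no ApT, since the ratio
    is then undefined.
    """
    seq = seq.upper()
    tpa = count_dinucleotide(seq, "TA")
    apt = count_dinucleotide(seq, "AT")
    return tpa / apt if apt else None


def rip_windows(seq, window, step):
    """Compute the index in fixed-size windows along a sequence.

    `window` and `step` are illustrative; returns (start, index) tuples.
    """
    return [
        (start, rip_product_index(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    for start, index in rip_windows("TATATAGGGG", window=6, step=4):
        print(start, index)
```

In a real analysis, windows would be flagged by comparing such indices against chosen cutoffs; those cutoffs are deliberately left out here because the abstract does not report them.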

Pitfalls of ascertainment biases in genome annotations—computing comparable protein domain distributions in eukarya

Malaysian Journal of Fundamental and Applied Sciences ◽

10.11113/mjfas.v10n2.57 ◽

2014 ◽

Vol 10 (2) ◽

Cited By ~ 3

Author(s):

Arli A. Parikesit ◽

Lydia Steiner ◽

Peter F. Stadler ◽

Sonja J. Prohaska

Keyword(s):

Protein Evolution ◽

Large Scale ◽

Gene Annotation ◽

Protein Domains ◽

Protein Domain ◽

Positive Correlation ◽

Gene Annotations ◽

Reference Databases ◽

Genome Annotations ◽

Qualitative Changes

Most investigations into the large-scale patterns of protein evolution are based on gene annotations that have been compiled in reference databases. The use of these resources for quantitative comparisons, however, is complicated by sometimes vast differences in coverage. More importantly, we also observe substantial ascertainment biases that cannot be removed by simple normalization procedures. A striking example is provided by the correlations between protein domains. We observe that statistics derived from different computational gene annotation procedures show dramatic discrepancies, and even qualitative changes from negative to positive correlation, when compared to statistics obtained from annotation databases.

Faculty Opinions recommendation of Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13345956.14715054 ◽

2011 ◽

Author(s):

Craig Hersh

Keyword(s):

Lung Function ◽

Large Scale ◽

Genome Wide Association ◽

Genome Wide

Faculty Opinions recommendation of Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.734261365.793558023 ◽

2019 ◽

Author(s):

Jason Flannick

Keyword(s):

Large Scale ◽

Genome Wide

Large-Scale, High-Throughput Validation of Short Hairpin RNA Sequences for RNA Interference

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057105284342 ◽

2006 ◽

Vol 11 (3) ◽

pp. 236-246 ◽

Cited By ~ 6

Author(s):

Laurence H. Lamarcq ◽

Bradley J. Scherer ◽

Michael L. Phelan ◽

Nikolai N. Kalnine ◽

Yen H. Nguyen ◽

...

Keyword(s):

High Throughput ◽

Large Scale ◽

Strong Support ◽

GC Content ◽

Rapid Identification ◽

Hairpin RNA ◽

RNA Sequences ◽

Short Hairpin ◽

Short Hairpin RNAs ◽

Interfering RNA

A method for high-throughput cloning and analysis of short hairpin RNAs (shRNAs) is described. Using this approach, 464 shRNAs against 116 different genes were screened for knockdown efficacy, enabling rapid identification of effective shRNAs against 74 genes. Statistical analysis of the effects of various criteria on the activity of the shRNAs confirmed that some of the rules thought to govern small interfering RNA (siRNA) activity also apply to shRNAs. These include moderate GC content, absence of internal hairpins, and asymmetric thermal stability. However, the authors did not find strong support for position-specific rules. In addition, analysis of the data suggests that not all genes are equally susceptible to RNA interference (RNAi).

Publisher Correction: Genome-wide association study of individual differences of human lymphocyte profiles using large-scale cytometry data

Journal of Human Genetics ◽

10.1038/s10038-020-00890-x ◽

2021 ◽

Author(s):

Daigo Okada ◽

Naotoshi Nakamura ◽

Kazuya Setoh ◽

Takahisa Kawaguchi ◽

Koichiro Higasa ◽

...

Keyword(s):

Individual Differences ◽

Association Study ◽

Human Lymphocyte ◽

Large Scale ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Genome Wide

The Evolution of Isochores: Evidence From SNP Frequency Distributions

Genetics ◽

10.1093/genetics/162.4.1805 ◽

2002 ◽

Vol 162 (4) ◽

pp. 1805-1810 ◽

Cited By ~ 1

Author(s):

Martin J Lercher ◽

Nick G C Smith ◽

Adam Eyre-Walker ◽

Laurence D Hurst

Keyword(s):

Population Genetics ◽

Large Scale ◽

GC Content ◽

Nucleotide Composition ◽

Compositional Variation ◽

Mutation Bias ◽

Single Nucleotide ◽

Frequency Distributions ◽

Noncoding Regions ◽

Standard Population

The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.

BiPSim: a flexible and generic stochastic simulator for polymerization processes

Scientific Reports ◽

10.1038/s41598-021-92833-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Stephan Fischer ◽

Marc Dinh ◽

Vincent Henry ◽

Philippe Robert ◽

Anne Goelzer ◽

...

Keyword(s):

Large Scale ◽

Stochastic Simulation Algorithm ◽

Specific Information ◽

Whole Cell ◽

Simulation Speed ◽

Genome Wide ◽

Cell Simulation ◽

Stochastic Simulator ◽

Modeling Formalisms ◽

Stochastic Phenomena

Detailed whole-cell modeling requires an integration of heterogeneous cell processes having different modeling formalisms, for which whole-cell simulation could remain tractable. Here, we introduce BiPSim, an open-source stochastic simulator of template-based polymerization processes, such as replication, transcription and translation. BiPSim combines an efficient abstract representation of reactions with an implementation of Gillespie’s Stochastic Simulation Algorithm (SSA) that runs in constant time with respect to the number of reactions, which makes it highly efficient for simulating large-scale polymerization processes stochastically. Moreover, multi-level descriptions of polymerization processes can be handled simultaneously, allowing the user to tune a trade-off between simulation speed and model granularity. We evaluated the performance of BiPSim by simulating genome-wide gene expression in bacteria for multiple levels of granularity. Finally, since no cell-type-specific information is hard-coded in the simulator, models can easily be adapted to other organismal species. We expect that BiPSim should open new perspectives for the genome-wide simulation of stochastic phenomena in biology.
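For readers unfamiliar with Gillespie's SSA, the following is a minimal sketch of the textbook direct method applied to a toy birth-death process (constant production, first-order decay). It is not BiPSim's constant-time implementation and does not use its API; the function name, the rate parameters, and the toy model are all hypothetical and chosen only to show the two sampling steps: the waiting time to the next event and the choice of which reaction fires.

```python
import random


def gillespie_birth_death(k_make, k_decay, n0, t_end, seed=0):
    """Simulate a birth-death process with Gillespie's direct method.

    Reactions (toy model, not taken from BiPSim):
      production: 0 -> X  with propensity k_make
      decay:      X -> 0  with propensity k_decay * n
    Returns a list of (time, copy_number) tuples up to t_end.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        a_make = k_make
        a_decay = k_decay * n
        a_total = a_make + a_decay
        if a_total == 0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_make:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


if __name__ == "__main__":
    traj = gillespie_birth_death(k_make=5.0, k_decay=0.5, n0=0, t_end=10.0, seed=1)
    print(len(traj), traj[-1])
```

In this direct method the cost of choosing a reaction grows with the number of reactions, which is the scaling that a constant-time scheme such as the one described in the abstract is designed to avoid.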
