Evolution of a Record-Setting AT-Rich Genome: Indel Mutation, Recombination, and Substitution Bias

Duong T Nguyen; Baojun Wu; Shujie Xiao; Weilong Hao

doi:10.1093/gbe/evaa202

Genome Biology and Evolution ◽

10.1093/gbe/evaa202 ◽

2020 ◽

Vol 12 (12) ◽

pp. 2344-2354

Author(s):

Duong T Nguyen ◽

Baojun Wu ◽

Shujie Xiao ◽

Weilong Hao

Keyword(s):

Tandem Repeats ◽

Population Genomics ◽

Nucleotide Composition ◽

Genomic Variation ◽

Nucleotide Substitutions ◽

Mutation Bias ◽

Mutation Pressure ◽

Indel Mutation ◽

Genome Wide ◽

Saccharomycodes Ludwigii

Abstract Genome-wide nucleotide composition varies widely among species. Despite extensive research, the source of genome-wide nucleotide composition diversity remains elusive. Yeast mitochondrial genomes (mitogenomes) are highly A + T rich, and they provide a unique opportunity to study the evolution of an AT-biased landscape. In this study, we sequenced ten complete mitogenomes of the Saccharomycodes ludwigii yeast with 8% G + C content, the lowest genome-wide %(G + C) in all published genomes to date. The S. ludwigii mitogenomes have high densities of short tandem repeats but severely underrepresented mononucleotide repeats. Comparative population genomics of these record-setting A + T-rich genomes shows dynamic indel mutations and strong mutation bias toward A/T. Indel mutations play a greater role in genomic variation among very closely related strains than nucleotide substitutions. Indels have resulted in presence–absence polymorphism of tRNAArg (ACG) among S. ludwigii mitogenomes. Interestingly, these mitogenomes have undergone recombination, a genetic process that can increase G + C content by GC-biased gene conversion. Finally, the expected equilibrium G + C content under mutation pressure alone is higher than the observed G + C content, suggesting the existence of mechanisms other than AT-biased mutation operating to increase A/T. Together, our findings shed new light on mechanisms driving extremely AT-rich genomes.
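The comparison between expected equilibrium and observed G + C content can be illustrated with a minimal sketch. It assumes the standard two-state mutation model, in which the equilibrium G + C fraction equals the AT→GC rate divided by the sum of both rates; the abstract does not state which estimator the authors used, and the function names and example rates below are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Observed G+C fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return gc / (gc + at)


def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """Equilibrium G+C fraction under mutation pressure alone.

    Assumes a two-state model with per-site rates:
    rate_at_to_gc is measured per A/T site, rate_gc_to_at per G/C site.
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("at least one mutation rate must be positive")
    return rate_at_to_gc / total


if __name__ == "__main__":
    # Hypothetical rates for illustration only (not values from the paper):
    # GC->AT mutations occur three times as often per site as AT->GC.
    expected = equilibrium_gc(rate_at_to_gc=1.0, rate_gc_to_at=3.0)
    observed = gc_content("ATATATTAATGC")
    print(f"expected equilibrium GC = {expected:.3f}, observed GC = {observed:.3f}")
```

An observed value below the expected equilibrium, as the abstract reports, would point to forces beyond mutation bias that further increase A/T.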

The rate and molecular spectrum of spontaneous mutations in the GC-rich multi-chromosome genome of Burkholderia cenocepacia

10.1101/011841 ◽

2014 ◽

Author(s):

Marcus M Dillon ◽

Way Sung ◽

Michael Lynch ◽

Vaughn S Cooper

Keyword(s):

Prokaryotic Genome ◽

Gc Content ◽

Nucleotide Composition ◽

Burkholderia Cenocepacia ◽

Model Organisms ◽

Base Substitution ◽

Spontaneous Mutations ◽

Mutation Pressure ◽

Genome Wide ◽

A Genome

Spontaneous mutations are ultimately essential for evolutionary change and are also the root cause of many diseases. However, until recently, both biological and technical barriers have prevented detailed analyses of mutation profiles, constraining our understanding of the mutation process to a few model organisms and leaving major gaps in our understanding of the role of genome content and structure on mutation. Here, we present a genome-wide view of the molecular mutation spectrum in Burkholderia cenocepacia, a clinically relevant pathogen with high %GC-content and multiple chromosomes. We find that B. cenocepacia has low genome-wide mutation rates with insertion-deletion mutations biased towards deletions, consistent with the idea that deletion pressure reduces prokaryotic genome sizes. Unlike prior studies of other organisms, mutations in B. cenocepacia are not AT-biased, which suggests that at least some genomes with high %GC-content experience unusual base-substitution mutation pressure. Importantly, we also observe variation in both the rates and spectra of mutations among chromosomes and elevated G:C>T:A transversions in late-replicating regions. Thus, although some patterns of mutation appear to be highly conserved across cellular life, others vary between species and even between chromosomes of the same species, potentially influencing the evolution of nucleotide composition and genome architecture.

The Evolution of Isochores: Evidence From SNP Frequency Distributions

Genetics ◽

10.1093/genetics/162.4.1805 ◽

2002 ◽

Vol 162 (4) ◽

pp. 1805-1810

Author(s):

Martin J Lercher ◽

Nick G C Smith ◽

Adam Eyre-Walker ◽

Laurence D Hurst

Keyword(s):

Population Genetics ◽

Large Scale ◽

Gc Content ◽

Nucleotide Composition ◽

Compositional Variation ◽

Mutation Bias ◽

Single Nucleotide ◽

Frequency Distributions ◽

Noncoding Regions ◽

Standard Population

Abstract The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.

Genomic variation in the American pika: signatures of geographic isolation and implications for conservation

BMC Ecology and Evolution ◽

10.1186/s12862-020-01739-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Kelly B. Klingler ◽

Joshua P. Jahner ◽

Thomas L. Parchman ◽

Chris Ray ◽

Mary M. Peacock

Keyword(s):

Genetic Variation ◽

Genetic Structure ◽

Sierra Nevada ◽

Spatial Genetic Structure ◽

Spatial Scales ◽

Thermal Sensitivity ◽

Genomic Variation ◽

Genome Wide ◽

A Genome ◽

American Pika

Abstract Background Distributional responses by alpine taxa to repeated, glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. As a species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada. Results Our genome-wide analyses corroborate range-wide, mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide. Conclusions Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.
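The diversity statistics quoted above (π, θW, and Tajima's D) can be sketched for a small alignment as follows. This is a minimal illustration using the textbook definitions (Watterson 1975; Tajima 1989), not the authors' ddRADseq pipeline; the function name and the toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return per-site pi, per-site Watterson's theta, and Tajima's D.

    seqs: list of equal-length aligned sequences (at least 4 are needed,
    because the variance of D is zero for smaller samples).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    # S: number of segregating (polymorphic) alignment columns.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # k: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s_sites / a1

    if s_sites == 0:
        tajima_d = float("nan")  # D is undefined without polymorphism
    else:
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
        e1 = c1 / a1
        e2 = c2 / (a1**2 + a2)
        tajima_d = (k - theta_w) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))

    return {"pi": k / length, "theta_w": theta_w / length, "tajima_d": tajima_d}


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical toy alignment
    print(diversity_stats(toy))
```

A positive D arises when π exceeds θW, that is, when intermediate-frequency variants are in excess, which is the pattern the abstract interprets as recent population contraction.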

DNN-m6A: A Cross-Species Method for Identifying RNA N6-Methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion

Genes ◽

10.3390/genes12030354 ◽

2021 ◽

Vol 12 (3) ◽

pp. 354

Author(s):

Lu Zhang ◽

Xinyi Qin ◽

Min Liu ◽

Ziwei Xu ◽

Guangzhong Liu

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Area Under The Curve ◽

Nucleotide Composition ◽

Computational Method ◽

Feature Subset ◽

Accurate Identification ◽

Genome Wide ◽

Dinucleotide Composition ◽

Optimal Feature Subset

As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. As traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method is needed to detect m6A sites. In this study, we propose a novel cross-species computational method, DNN-m6A, based on a deep neural network (DNN) to identify m6A sites in multiple tissues of human, mouse and rat. Firstly, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Secondly, we use elastic net to eliminate redundant features while building the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. The five-fold cross-validation test on training datasets shows that the proposed DNN-m6A method outperformed the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%–83.38% and an area under the curve (AUC) of 81.39%–91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%–83.04% and an AUC of 80.79%–91.09%, which shows the excellent generalization ability of the proposed method.
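Two of the listed encodings, binary encoding (BE) and nucleotide chemical property (NCP), and their fusion into one feature vector can be sketched as below. The lookup tables follow conventions commonly used in the m6A-prediction literature and are an assumption here; the abstract does not give the exact tables, and the function names are hypothetical.

```python
# One-hot style binary encoding (assumed convention: A, C, G, U order).
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Nucleotide chemical property (assumed convention):
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq, table):
    """Concatenate the per-base codes of an RNA sequence into a flat list.

    DNA input is accepted (T is read as U); unknown bases become all zeros.
    """
    width = len(next(iter(table.values())))
    features = []
    for base in seq.upper().replace("T", "U"):
        features.extend(table.get(base, (0,) * width))
    return features


def fuse(seq):
    """Fuse several encodings by simple concatenation into one feature vector."""
    return encode(seq, BINARY) + encode(seq, NCP)


if __name__ == "__main__":
    window = "GGACU"  # hypothetical sequence window around a candidate site
    print(len(fuse(window)), fuse(window))
```

A fused vector like this would then be passed to feature selection (elastic net in the abstract) before training the network.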

Genome diversity in Ukraine

GigaScience ◽

10.1093/gigascience/giaa159 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Taras K Oleksyk ◽

Walter W Wolfsberger ◽

Alexandra M Weber ◽

Khrystyna Shchubelka ◽

Olga T Oleksyk ◽

...

Keyword(s):

Sequence Data ◽

Copy Number Variations ◽

Genomic Variation ◽

High Coverage ◽

Genome Data ◽

New Information ◽

Genome Wide ◽

Public Data ◽

Genome Wide Data ◽

Multiple Samples

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information bringing forth a wealth of novel, endemic and medically related alleles.

Genome-Wide Analysis of Codon Usage Patterns of SARS-CoV-2 Virus Reveals Global Heterogeneity of COVID-19

Biomolecules ◽

10.3390/biom11060912 ◽

2021 ◽

Vol 11 (6) ◽

pp. 912

Author(s):

Saadullah Khattak ◽

Mohd Ahmar Rauf ◽

Qamar Zaman ◽

Yasir Ali ◽

Shabeen Fatima ◽

...

Keyword(s):

Standard Deviation ◽

Codon Usage ◽

Codon Usage Bias ◽

Geographic Location ◽

Mutation Pressure ◽

Genome Wide ◽

Margin Of Error ◽

Usage Patterns ◽

Causative Agents ◽

Virus Genomes

The ongoing outbreak of coronavirus disease COVID-19 is associated with global heterogeneity in the genome organization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The causes of this heterogeneity across the whole SARS-CoV-2 genome are not well characterized, owing to the lack of comparative studies with a sample size from around the globe large enough to reduce the standard deviation to an acceptable margin of error. To better understand the SARS-CoV-2 genome architecture, we have performed a comprehensive analysis of codon usage bias of sixty (60) strains to get a snapshot of its global heterogeneity. Our study shows a relatively low codon usage bias in the SARS-CoV-2 viral genome globally, with nearly all of the over-preferred codons ending in A or U. We conclude that the SARS-CoV-2 genome is primarily shaped by mutation pressure; however, marginal selection pressure cannot be overlooked. Within the A/U-rich virus genomes of SARS-CoV-2, the standard deviation in GC content (42.91% ± 5.84%) and in the GC3 value (30.14% ± 6.93%) points towards global heterogeneity of the virus. The fact that several SARS-CoV-2 viral strains from the same geographic location originated from different viral lineages also supports this. Taken together, these findings suggest that the root ancestry of the global genomes differs, with different genome-level adaptation to the host. This research may provide new insights into the codon patterns, host adaptation, and global heterogeneity of SARS-CoV-2.
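The two composition measures quoted above, overall GC content and GC3 (GC content at third codon positions), can be computed as in the following minimal sketch. It assumes an in-frame coding sequence; the function names and the example sequence are hypothetical and not taken from the study.

```python
def gc_fraction(bases: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T, U)."""
    bases = bases.upper()
    gc = bases.count("G") + bases.count("C")
    at = bases.count("A") + bases.count("T") + bases.count("U")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return gc / (gc + at)


def gc3(cds: str) -> float:
    """GC fraction at the third position of each complete codon.

    Assumes the sequence starts in frame; trailing partial codons are ignored.
    """
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return gc_fraction(third_positions)


if __name__ == "__main__":
    cds = "ATGGCUAAAUUA"  # hypothetical coding fragment: AUG GCU AAA UUA
    print(f"GC = {gc_fraction(cds):.3f}, GC3 = {gc3(cds):.3f}")
```

A GC3 value well below the overall GC content, as reported in the abstract, is consistent with preferred codons ending in A or U.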

Unexpected genomic, biosynthetic and species diversity of Streptomyces bacteria from bats in Arizona and New Mexico, USA

BMC Genomics ◽

10.1186/s12864-021-07546-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Cooper J. Park ◽

Nicole A. Caimi ◽

Debbie C. Buecher ◽

Ernest W. Valdez ◽

Diana E. Northup ◽

...

Keyword(s):

New Mexico ◽

Genetic Manipulation ◽

Distinct Species ◽

Biosynthetic Gene Cluster ◽

Genomic Variation ◽

Species Variation ◽

Nucleotide Polymorphisms ◽

Peptide Synthetases ◽

Genome Wide ◽

Diversity Estimates

Abstract Background Antibiotic-producing Streptomyces bacteria are ubiquitous in nature, yet most studies of its diversity have focused on free-living strains inhabiting diverse soil environments and those in symbiotic relationship with invertebrates. Results We studied the draft genomes of 73 Streptomyces isolates sampled from the skin (wing and tail membranes) and fur surfaces of bats collected in Arizona and New Mexico. We uncovered large genomic variation and biosynthetic potential, even among closely related strains. The isolates, which were initially identified as three distinct species based on sequence variation in the 16S rRNA locus, could be distinguished as 41 different species based on genome-wide average nucleotide identity. Of the 32 biosynthetic gene cluster (BGC) classes detected, non-ribosomal peptide synthetases, siderophores, and terpenes were present in all genomes. On average, Streptomyces genomes carried 14 distinct classes of BGCs (range = 9–20). Results also revealed large inter- and intra-species variation in gene content (single nucleotide polymorphisms, accessory genes and singletons) and BGCs, further contributing to the overall genetic diversity present in bat-associated Streptomyces. Finally, we show that genome-wide recombination has partly contributed to the large genomic variation among strains of the same species. Conclusions Our study provides an initial genomic assessment of bat-associated Streptomyces that will be critical to prioritizing those strains with the greatest ability to produce novel antibiotics. It also highlights the need to recognize within-species variation as an important factor in genetic manipulation studies, diversity estimates and drug discovery efforts in Streptomyces.

Genome-wide characterization of MATE gene family and expression profiles in response to abiotic stresses in rice (Oryza sativa)

BMC Ecology and Evolution ◽

10.1186/s12862-021-01873-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Zhixuan Du ◽

Qitao Su ◽

Zheng Wu ◽

Zhou Huang ◽

Jianzhong Bao ◽

...

Keyword(s):

Oryza Sativa ◽

Gene Family ◽

Tandem Repeats ◽

Expression Profiles ◽

Functional Divergence ◽

Expression Patterns ◽

Regulatory Elements ◽

Genome Wide ◽

Comprehensive Information ◽

Family Gene

Abstract Multidrug and toxic compound extrusion (MATE) proteins are involved in many physiological functions of plant growth and development. Although an increasing number of MATE proteins have been identified, the understanding of MATE proteins is still very limited in rice. In this study, 46 MATE proteins were identified from the rice (Oryza sativa) genome by homology searches and domain prediction. The rice MATE family was divided into four subfamilies based on the phylogenetic tree. Tandem repeats and fragment replication contribute to the expansion of the rice MATE gene family. Gene structure and cis-regulatory elements reveal the potential functions of MATE genes. Analysis of gene expression showed that most of MATE genes were constitutively expressed and the expression patterns of genes in different tissues were analyzed using RNA-seq. Furthermore, qRT-PCR-based analysis showed differential expression patterns in response to salt and drought stress. The analysis results of this study provide comprehensive information on the MATE gene family in rice and will aid in understanding the functional divergence of MATE genes.

Genome-Wide Survey on Genomic Variation, Expression Divergence, and Evolution in Two Contrasting Rice Genotypes under High Salinity Stress

Genome Biology and Evolution ◽

10.1093/gbe/evt152 ◽

2013 ◽

Vol 5 (11) ◽

pp. 2032-2050

Author(s):

Shu-Ye Jiang ◽

Ali Ma ◽

Rengasamy Ramamoorthy ◽

Srinivasan Ramachandran

Keyword(s):

Salinity Stress ◽

High Salinity ◽

Genomic Variation ◽

Expression Divergence ◽

High Salinity Stress ◽

Genome Wide ◽

Rice Genotypes ◽

Genome Wide Survey

Spatially coordinated heterochromatinization of distal short tandem repeats in fragile X syndrome

10.1101/2021.04.23.441217 ◽

2021 ◽

Author(s):

Linda Zhou ◽

Chunmin Ge ◽

Thomas Malachowski ◽

Ji Hun Kim ◽

Keerthivasan Raanin Chandradoss ◽

...

Keyword(s):

Fragile X Syndrome ◽

Short Tandem Repeats ◽

Tandem Repeats ◽

Fragile X ◽

Repeat Expansion ◽

Genome Wide ◽

A Genome ◽

In Trans ◽

Surveillance Mechanism ◽

Short Tandem

Abstract Short tandem repeat (STR) instability is causally linked to pathologic transcriptional silencing in a subset of repeat expansion disorders. In fragile X syndrome (FXS), instability of a single CGG STR tract is thought to repress FMR1 via local DNA methylation. Here, we report the acquisition of more than ten Megabase-sized H3K9me3 domains in FXS, including a 5-8 Megabase block around FMR1. Distal H3K9me3 domains encompass synaptic genes with STR instability, and spatially co-localize in trans concurrently with FMR1 CGG expansion and the dissolution of TADs. CRISPR engineering of mutation-length FMR1 CGG to normal-length preserves heterochromatin, whereas cut-out to pre-mutation-length attenuates a subset of H3K9me3 domains. Overexpression of a pre-mutation-length CGG de-represses both FMR1 and distal heterochromatinized genes, indicating that long-range H3K9me3-mediated silencing is exquisitely sensitive to STR length. Together, our data uncover a genome-wide surveillance mechanism by which STR tracts spatially communicate over vast distances to heterochromatinize the pathologically unstable genome in FXS. One-Sentence Summary Heterochromatinization of distal synaptic genes with repeat instability in fragile X is reversible by overexpression of a pre-mutation length CGG tract.