Whole genome sequencing reveals the genomic diversity, taxonomic classification, and evolutionary relationships of the genus Nocardia

Nocardia is a complex and diverse genus of aerobic actinomycetes that cause complex clinical presentations, which are difficult to diagnose because they are poorly understood. To date, the genetic diversity, evolution, and taxonomic structure of the genus Nocardia remain unclear. In this study, we investigated the pan-genome of 86 Nocardia type strains to clarify their genetic diversity. Our study revealed an open pan-genome for Nocardia containing 265,836 gene families, with about 99.7% of the pan-genome being variable. Horizontal gene transfer appears to have been an important evolutionary driver of the genetic diversity shaping the Nocardia genome and may have caused historical taxonomic confusion with other taxa (primarily Rhodococcus, Skermania, Aldersonia, and Mycobacterium). Based on single-copy gene families, we established a high-accuracy phylogenomic approach for Nocardia using 229 genome sequences. Furthermore, we found 28 potentially new species and reclassified 16 strains. Finally, by comparing the topology of a phylogenomic tree with that of 384 phylogenetic trees (from 384 single-copy genes from the core genome), we identified a novel locus for inferring the phylogeny of this genus. The dapb1 gene, which encodes dipeptidyl aminopeptidase BI, was far superior to commonly used markers for Nocardia and yielded a topology almost identical to that of the genome-based phylogeny. In conclusion, the present study provides insights into the genetic diversity, contributes a robust framework for the taxonomic classification, and elucidates the evolutionary relationships of Nocardia. This framework should facilitate the development of rapid tests for the species identification of highly variable species and provides new insight into the behavior of this genus.
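The marker-selection step in this abstract, comparing each single-gene tree with the genome-based tree, can be illustrated with a minimal sketch. It assumes the Robinson-Foulds distance as the topology measure (the abstract does not name the metric that was used), represents rooted trees as nested tuples, and uses toy trees with hypothetical gene labels rather than any Nocardia data.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a taxon name (leaf) or a tuple of subtrees (internal node).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf sets of all internal nodes except the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Rooted Robinson-Foulds distance: clades found in exactly one of the two trees."""
    return len(clades(a) ^ clades(b))


# Toy reference tree standing in for the genome-based phylogeny (assumption).
genome_tree = ((("A", "B"), "C"), ("D", "E"))

# Toy single-gene trees; the gene names are hypothetical.
gene_trees = {
    "gene_x": ((("A", "B"), "C"), ("D", "E")),
    "gene_y": ((("A", "C"), "B"), ("D", "E")),
    "gene_z": ((("A", "D"), "C"), ("B", "E")),
}

# Rank candidate markers by how closely their tree matches the reference.
ranked = sorted(gene_trees, key=lambda g: rf_distance(genome_tree, gene_trees[g]))
```

Under this measure, a gene whose tree has distance 0 from the reference reproduces the genome-based topology exactly; a real analysis would use unrooted trees and a dedicated phylogenetics library.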

Whole-genome SNP analysis elucidates the genetic population structure and diversity of Acrocomia species

10.1101/2020.10.08.331140, 2020

Author(s): Brenda G. Díaz, Maria I. Zucchi, Alessandro Alves-Pereira, Caléo P. de Almeida, Aline C. L. Moraes, ...

Keyword(s): Genetic Diversity, Population Structure, Genetic Structure, Genetic Relationships, Geographical Area, Taxonomic Classification, Genomic Diversity, Productive Capacity, Snp Analysis, Genetic Population

Abstract Acrocomia (Arecaceae) is a genus widely distributed in tropical and subtropical America that has been attracting economic interest because of the great oil production potential of some of its species, in particular A. aculeata, which can supply oil with the same productive capacity as the oil palm even in areas with water deficit. Although eight species are recognized in the genus, the taxonomic classification based on morphology and geographic distribution is still controversial. Knowledge about the genetic diversity and population structure of the species is limited, which has hindered the understanding of their genetic relationships and the orientation of management, conservation, and genetic improvement activities for species of the genus. In the present study, we analyzed the genomic diversity and population structure of seven species of Acrocomia, including 117 samples of A. aculeata covering a wide geographical area of occurrence, using single nucleotide polymorphism (SNP) markers obtained from genotyping by sequencing (GBS). The genetic structure of the Acrocomia species was partially congruent with the current taxonomic classification based on morphological characters, recovering the separation of the species A. aculeata, A. totai, A. crispa and A. intumescens as distinct taxonomic groups. However, the species A. media was attributed to the cluster of A. aculeata, while A. hassleri and A. glaucescens were grouped together with A. totai. The species that showed the highest and lowest genetic diversity were A. totai and A. media, respectively. When analyzed separately, the species A. aculeata showed a strong genetic structure, forming two genetic groups, the first represented mainly by genotypes from Brazil and the second by accessions from Central and North American countries. Greater genetic diversity was found in Brazil than in the other countries.
Our results on the genetic diversity of the genus are unprecedented and establish new insights into the genomic relationships between Acrocomia species. This is also the first study to provide a more global view of the genomic diversity of A. aculeata. We also highlight the applicability of genomic data as a reference for future studies on genetic diversity, taxonomy, evolution and phylogeny of the Acrocomia genus, as well as to support strategies for the conservation, exploration and breeding of Acrocomia species and in particular A. aculeata.

Extensive Copy Number Variation in Fermentation-Related Genes Among Saccharomyces cerevisiae Wine Strains

10.1101/105502, 2017

Author(s): Jacob Steenwyk, Antonis Rokas

Keyword(s): Genetic Diversity, Copy Number, Wine Yeast, Gene Families, Genomic Variation, Genomic Diversity, Major Type, Diauxic Shift, Variable Regions, Yeast Strains

Abstract Due to the importance of Saccharomyces cerevisiae in wine-making, the genomic variation of wine yeast strains has been extensively studied. One of the major insights stemming from these studies is that wine yeast strains harbor low levels of genetic diversity in the form of single nucleotide polymorphisms (SNPs). Genomic structural variants, such as copy number (CN) variants, are another major type of variation segregating in natural populations. To test whether genetic diversity in CN variation is also low across wine yeast strains, we examined genome-wide levels of CN variation in 132 whole-genome sequences of S. cerevisiae wine strains. We found an average of 97.8 CN variable regions (CNVRs) affecting ~4% of the genome per strain. Using two different measures of CN diversity, we found that gene families involved in fermentation-related processes such as copper resistance (CUP), flocculation (FLO), and glucose metabolism (HXT), as well as the SNO gene family whose members are expressed before or during the diauxic shift, showed substantial CN diversity across the 132 strains examined. Importantly, these same gene families have been shown, through comparative transcriptomic and functional assays, to be associated with adaptation to the wine fermentation environment. Our results suggest that CN variation is a substantial contributor to the genomic diversity of wine yeast strains and identify several candidate loci whose levels of CN variation may affect the adaptation and performance of wine yeast strains during fermentation.
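A statistic such as "CNVRs affecting ~4% of the genome per strain" can be derived from a list of region coordinates. The following is a minimal sketch of that arithmetic, assuming half-open `[start, end)` intervals on a single sequence; the coordinates and the genome length are toy values chosen for illustration, not data from the study.

```python
from typing import List, Optional, Tuple


def covered_length(regions: List[Tuple[int, int]]) -> int:
    """Total bases covered by half-open [start, end) regions, counting overlaps once."""
    total = 0
    current_start: Optional[int] = None
    current_end: Optional[int] = None
    for start, end in sorted(regions):
        if current_end is None or start > current_end:
            # The previous merged region is finished; add its length.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching region: extend the current merged region.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Toy CNVR coordinates for one strain (assumption).
cnvrs = [(0, 100), (50, 200), (300, 400)]
genome_length = 7500  # toy value, far smaller than a real yeast genome

affected_fraction = covered_length(cnvrs) / genome_length
```

Merging overlapping regions first keeps bases shared by two calls from being counted twice.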

Genomic relatedness and diversity of Swedish native cattle breeds

Genetics Selection Evolution, 10.1186/s12711-019-0496-0, 2019, Vol 51 (1), Cited By ~ 7

Author(s): Maulik Upadhyay, Susanne Eriksson, Sofia Mikko, Erling Strandberg, Hans Stålhammar, ...

Keyword(s): Genetic Diversity, Phylogenetic Trees, Genetic Relatedness, Demographic History, Genomic Diversity, Founder Population, Southern Sweden, Cattle Breeds, Genome Wide, Native Cattle

Abstract Background Native cattle breeds are important genetic resources given their adaptation to the local environment in which they are bred. However, the widespread use of commercial cattle breeds has resulted in a marked reduction in population size of several native cattle breeds worldwide. Therefore, conservation management of native cattle breeds requires urgent attention to avoid their extinction. To this end, we genotyped nine Swedish native cattle breeds with genome-wide 150 K single nucleotide polymorphisms (SNPs) to investigate the level of genetic diversity and relatedness between these breeds. Results We used various SNP-based approaches on this dataset to connect the demographic history with the genetic diversity and population structure of these Swedish cattle breeds. Our results suggest that the Väne and Ringamåla breeds originating from southern Sweden have experienced population isolation and have a low genetic diversity, whereas the Fjäll breed has a large founder population and a relatively high genetic diversity. Based on the shared ancestry and the constructed phylogenetic trees, we identified two major clusters in Swedish native cattle. In the first cluster, which includes Swedish mountain cattle breeds, there was little differentiation among the Fjäll, Fjällnära, Swedish Polled, and Bohus Polled breeds. The second cluster consists of breeds from southern Sweden: Väne, Ringamåla and Swedish Red. Interestingly, we also identified sub-structuring in the Fjällnära breed, which indicates different breeding practices on the farms that maintain this breed. Conclusions This study represents the first comprehensive genome-wide analysis of the genetic relatedness and diversity in Swedish native cattle breeds. 
Our results show that different demographic patterns such as genetic isolation and cross-breeding have shaped the genomic diversity of Swedish native cattle breeds and that the Swedish mountain breeds have retained their authentic distinct gene pool without significant contribution from any of the other European cattle breeds that were included in this study.

Massive gene presence/absence variation in the mussel genome as an adaptive strategy: first evidence of a pan-genome in Metazoa

10.1101/781377, 2019, Cited By ~ 7

Author(s): Marco Gerdol, Rebeca Moreira, Fernando Cruz, Jessica Gómez-Garrido, Anna Vlasova, ...

Keyword(s): Large Scale, Single Copy, Genomic Diversity, Small Scale, Nucleotide Polymorphisms, Structural Variations, Pan Genome, Adaptive Value, Mediterranean Mussel, The Mediterranean

Abstract Mussels are ecologically and economically relevant edible marine bivalves, highly invasive and resilient to biotic and abiotic stressors causing recurrent massive mortalities in other species. Here we show that the Mediterranean mussel Mytilus galloprovincialis has a complex pan-genomic architecture, which includes a core set of 45,000 genes shared by all individuals plus a surprisingly high number of dispensable genes (∼15,000). The latter are subject to presence/absence variation (PAV), i.e., they may be entirely missing in a given individual and, when present, they are frequently found as a single copy. The enrichment of dispensable genes in survival functions suggests an adaptive value for PAV, which might be the key to explaining the extraordinary capabilities of adaptation and invasiveness of this species. Our study underpins a unique metazoan pan-genome architecture only previously described in prokaryotes and in a few non-metazoan eukaryotes, but that might also characterize other marine invertebrates. Significance statement In animals, intraspecific genomic diversity is generally thought to derive from relatively small-scale variants, such as single nucleotide polymorphisms, small indels, duplications, inversions and translocations. On the other hand, large-scale structural variations that involve the loss of genomic regions encoding protein-coding genes in some individuals (i.e., presence/absence variation, PAV) have so far been described only in bacteria and, occasionally, in plants and fungi. Here we report the first evidence of a pan-genome in the animal kingdom, revealing that 25% of the genes of the Mediterranean mussel are subject to PAV. We show that this unique feature might have an adaptive value, due to the involvement of dispensable genes in functions related to defense and survival.
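The split into core and dispensable genes, and the 25% figure that follows from 45,000 core plus roughly 15,000 dispensable genes, can be sketched as below. This is an illustrative simplification that assumes a gene counts as core only if it is present in every sampled individual; the gene and individual names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def split_core_dispensable(
    presence: Dict[str, Set[str]], individuals: List[str]
) -> Tuple[List[str], List[str]]:
    """Split genes into core (present in every individual) and dispensable (absent from at least one)."""
    everyone = set(individuals)
    core = sorted(g for g, present_in in presence.items() if present_in >= everyone)
    core_set = set(core)
    dispensable = sorted(g for g in presence if g not in core_set)
    return core, dispensable


# Toy presence/absence data: gene -> individuals in which it was detected (assumption).
individuals = ["m1", "m2", "m3"]
presence = {
    "geneA": {"m1", "m2", "m3"},
    "geneB": {"m1", "m2", "m3"},
    "geneC": {"m1", "m2", "m3"},
    "geneD": {"m1", "m3"},  # missing in m2, so subject to PAV
}

core, dispensable = split_core_dispensable(presence, individuals)
pav_fraction = len(dispensable) / len(presence)

# The same arithmetic with the gene counts reported in the abstract.
reported_fraction = 15_000 / (45_000 + 15_000)
```

Both fractions come out at 0.25, matching the statement that a quarter of the mussel genes are subject to PAV.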

Comparative Genomics of Pseudomonas stutzeri Complex: Taxonomic Assignments and Genetic Diversity

Frontiers in Microbiology, 10.3389/fmicb.2021.755874, 2022, Vol 12

Author(s): Xiangyang Li, Zilin Yang, Zhao Wang, Weipeng Li, Guohui Zhang, ...

Keyword(s): Genetic Diversity, Species Complex, Low Frequency, Pseudomonas Stutzeri, Genotypic Diversity, Gene Families, Genomic Diversity, Phylogenomic Analysis, Biosynthesis Gene Cluster, Taxonomic Assignments

Pseudomonas stutzeri is a species complex with extremely broad phenotypic and genotypic diversity. However, very little is known about its diversity, taxonomy and phylogeny at the genomic scale. To address these issues, we systematically and comprehensively defined the taxonomy and nomenclature for this species complex and explored its genetic diversity using hundreds of sequenced genomes. By combining average nucleotide identity (ANI) evaluation and phylogenetic inference approaches, we identified 123 P. stutzeri complex genomes covering at least six well-defined species among all sequenced Pseudomonas genomes; of these, 25 genomes represented novel members of this species complex. ANI values of ≥∼95% and digital DNA-DNA hybridization (dDDH) values of ≥∼60% in combination with phylogenomic analysis consistently and robustly supported the division of these strains into 27 genomovars (most likely species to some extent), comprising 16 known and 11 unknown genomovars. We revealed that 12 strains had mistaken taxonomic assignments, while 16 strains without species names can be assigned to the species level within the species complex. We observed an open pan-genome of the P. stutzeri complex comprising 13,261 gene families, among which approximately 45% of the gene families do not match any sequence present in the COG database, and a large proportion of accessory genes. The genome contents experienced extensive genetic gain and loss events, which may be one of the major mechanisms driving diversification within this species complex. Surprisingly, we found that the ectoine biosynthesis gene cluster (ect) was present in all genomes of P. stutzeri species complex strains but distributed at very low frequency (43 out of 9548) in other Pseudomonas genomes, suggesting a possible origin of the ancestors of the P. stutzeri species complex in high-osmolarity environments.
Collectively, our study highlights the potential of using whole-genome sequences to re-evaluate the current definition of the P. stutzeri complex, shedding new light on its genomic diversity and evolutionary history.
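Grouping genomes by an ANI cutoff, as in the genomovar delineation above, can be sketched with single-linkage clustering. This is a minimal illustration that uses only the ANI threshold of about 95%; the study additionally relied on dDDH values and phylogenomic analysis, which are not modelled here. The strain names and ANI values are invented for the example.

```python
from typing import Dict, List, Tuple


def cluster_by_ani(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    """Single-linkage clustering: genomes joined by any chain of pairs with ANI >= threshold."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical strains and pairwise ANI percentages (assumption).
genomes = ["s1", "s2", "s3", "s4", "s5"]
ani = {
    ("s1", "s2"): 97.2,
    ("s2", "s3"): 95.4,
    ("s1", "s3"): 94.1,
    ("s3", "s4"): 88.0,
    ("s4", "s5"): 96.0,
}

clusters = cluster_by_ani(genomes, ani)
```

Single linkage places s1 and s3 in the same group through s2 even though their direct ANI is below the cutoff, which is one reason a threshold alone is usually combined with other evidence.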

Genome-wide SNP analysis to assess the genetic population structure and diversity of Acrocomia species

PLoS ONE, 10.1371/journal.pone.0241025, 2021, Vol 16 (7), pp. e0241025

Author(s): Brenda Gabriela Díaz, Maria Imaculada Zucchi, Alessandro Alves-Pereira, Caléo Panhoca de Almeida, Aline Costa Lima Moraes, ...

Keyword(s): Genetic Diversity, Population Structure, Genetic Structure, Genetic Relationships, Geographical Area, Taxonomic Classification, Genomic Diversity, Productive Capacity, Snp Analysis, Genetic Population

Acrocomia (Arecaceae) is a genus widely distributed in tropical and subtropical America that has been attracting economic interest because of the great oil production potential of some of its species, in particular A. aculeata, which can supply oil with the same productive capacity as the oil palm (Elaeis guineensis) even in areas with water deficit. Although eight species are recognized in the genus, the taxonomic classification based on morphology and geographic distribution is still controversial. Knowledge about the genetic diversity and population structure of the species is limited, which has hindered the understanding of their genetic relationships and the orientation of management, conservation, and genetic improvement activities for species of the genus. In the present study, we analyzed the genomic diversity and population structure of the Acrocomia genus, including 172 samples from seven species, with a focus on A. aculeata, represented by 117 samples covering a wide geographical area of occurrence of the species, using single nucleotide polymorphism (SNP) markers obtained from genotyping by sequencing (GBS). The genetic structure of the Acrocomia species was partially congruent with the current taxonomic classification based on morphological characters, recovering the separation of the species A. aculeata, A. totai, A. crispa and A. intumescens as distinct taxonomic groups. However, the species A. media was attributed to the cluster of A. aculeata, while A. hassleri and A. glaucescens were grouped together with A. totai. The species that showed the highest and lowest genetic diversity were A. totai and A. media, respectively. When analyzed separately, the species A. aculeata showed a strong genetic structure, forming two genetic groups, the first represented mainly by genotypes from Brazil and the second by accessions from Central and North American countries. Greater genetic diversity was found in Brazil than in the other countries.
Our results on the genetic diversity of the genus are unprecedented and establish new insights into the genomic relationships between Acrocomia species. This is also the first study to provide a more global view of the genomic diversity of A. aculeata. We also highlight the applicability of genomic data as a reference for future studies on genetic diversity, taxonomy, evolution and phylogeny of the Acrocomia genus, as well as to support strategies for the conservation, exploration and breeding of Acrocomia species and in particular A. aculeata.

Rephine.r: a pipeline for correcting gene calls and clusters to improve phage pangenomes and phylogenies

10.1101/2021.04.26.441508, 2021

Author(s): Jason W. Shapiro, Catherine Putonti

Keyword(s): Phylogenetic Trees, Markov Models, Gene Families, Gene Clusters, Automated Analysis, Single Copy, Bootstrap Support, Homing Endonucleases, Sequence Alignments, Selfish Genetic Element

Abstract Background A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness. Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools. Methods We developed an R pipeline called Rephine.r that takes as input the gene clusters produced by an initial pangenomics workflow. Rephine.r then proceeds in two primary steps. First, it identifies three common causes of fragmented gene calls: (1) indels creating early stop codons and new start codons; (2) interruption by a selfish genetic element; and (3) splitting at the ends of the reported genome. Fragmented genes are then fused to create new sequence alignments. In tandem, Rephine.r searches for distant homologs separated into different gene families using Hidden Markov Models. Significant hits are used to merge families into larger clusters. A final round of fragment identification is then run, and results may be used to infer single-copy core genomes and phylogenetic trees. Results We applied Rephine.r to three well-studied phage groups: the Tevenvirinae (e.g., T4), the Studiervirinae (e.g., T7), and the Pbunaviruses (e.g., PB1). In each case, Rephine.r recovered additional members of the single-copy core genome and increased the overall bootstrap support of the phylogeny.
The Rephine.r pipeline is provided through GitHub (https://www.github.com/coevoeco/Rephine.r) as a single script for automated analysis and with utility functions and a walkthrough for researchers with specific use cases for each type of correction.
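The idea of fusing fragmented gene calls can be illustrated with a short sketch. It is written in Python for illustration and is not the Rephine.r implementation: it assumes that two calls in the same genome are fragments of one gene when they belong to the same gene cluster and are separated by at most a small gap, and the `GeneCall` type, the `max_gap` parameter, and the coordinates are all hypothetical.

```python
from typing import List, NamedTuple


class GeneCall(NamedTuple):
    genome: str
    start: int
    end: int
    cluster: str  # gene cluster assigned by an upstream pangenome workflow


def fuse_fragments(calls: List[GeneCall], max_gap: int = 50) -> List[GeneCall]:
    """Fuse neighbouring calls in one genome that share a cluster and lie within max_gap bases."""
    fused: List[GeneCall] = []
    for call in sorted(calls, key=lambda c: (c.genome, c.start)):
        prev = fused[-1] if fused else None
        if (
            prev is not None
            and prev.genome == call.genome
            and prev.cluster == call.cluster
            and call.start - prev.end <= max_gap
        ):
            # Extend the previous call so that it spans both fragments.
            fused[-1] = prev._replace(end=max(prev.end, call.end))
        else:
            fused.append(call)
    return fused


# Toy gene calls: the capsid gene of phage1 was split into two fragments (assumption).
calls = [
    GeneCall("phage1", 100, 400, "capsid"),
    GeneCall("phage1", 410, 900, "capsid"),
    GeneCall("phage1", 950, 1200, "tail"),
    GeneCall("phage2", 100, 900, "capsid"),
]

fused = fuse_fragments(calls)
```

After fusion, each genome contributes a single capsid call, so the capsid cluster can again be treated as single-copy when building a core genome.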

Rephine.r: a pipeline for correcting gene calls and clusters to improve phage pangenomes and phylogenies

PeerJ, 10.7717/peerj.11950, 2021, Vol 9, pp. e11950

Author(s): Jason W. Shapiro, Catherine Putonti

Keyword(s): Phylogenetic Trees, Markov Models, Gene Families, Gene Clusters, Automated Analysis, Single Copy, Bootstrap Support, Homing Endonucleases, Sequence Alignments, Selfish Genetic Element

Background A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness. Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools. Methods We developed an R pipeline called Rephine.r that takes as input the gene clusters produced by an initial pangenomics workflow. Rephine.r then proceeds in two primary steps. First, it identifies three common causes of fragmented gene calls: (1) indels creating early stop codons and new start codons; (2) interruption by a selfish genetic element; and (3) splitting at the ends of the reported genome. Fragmented genes are then fused to create new sequence alignments. In tandem, Rephine.r searches for distant homologs separated into different gene families using Hidden Markov Models. Significant hits are used to merge families into larger clusters. A final round of fragment identification is then run, and results may be used to infer single-copy core genomes and phylogenetic trees. Results We applied Rephine.r to three well-studied phage groups: the Tevenvirinae (e.g., T4), the Studiervirinae (e.g., T7), and the Pbunaviruses (e.g., PB1). In each case, Rephine.r recovered additional members of the single-copy core genome and increased the overall bootstrap support of the phylogeny. 
The Rephine.r pipeline is provided through GitHub (https://www.github.com/coevoeco/Rephine.r) as a single script for automated analysis and with utility functions to assist in building single-copy core genomes and predicting the sources of fragmented genes.

Plasmids Related to the Symbiotic Nitrogen Fixation Are Not Only Cooperated Functionally but Also May Have Evolved over a Time Span in Family Rhizobiaceae

Genome Biology and Evolution, 10.1093/gbe/evaa152, 2020, Vol 12 (11), pp. 2002-2014

Author(s): Ling-Ling Yang, Zhao Jiang, Yan Li, En-Tao Wang, Xiao-Yang Zhi

Keyword(s): Nitrogen Fixation, Gene Transfer, Horizontal Gene Transfer, Symbiotic Nitrogen Fixation, Gene Families, Time Span, Soil Conditions, Gene Gain, Nitrogen Fixing, Pan Genome

Abstract Rhizobia are soil bacteria capable of forming symbiotic nitrogen-fixing nodules associated with leguminous plants. In fast-growing legume-nodulating rhizobia, such as the species in the family Rhizobiaceae, the symbiotic plasmid is the main genetic basis for nitrogen-fixing symbiosis, and is susceptible to horizontal gene transfer. To further understand the evolution of symbiosis in Rhizobiaceae, we analyzed the pan-genome of this family based on 92 genomes of type/reference strains and reconstructed its phylogeny using a phylogenomics approach. Intriguingly, although the genetic expansion that occurred in chromosomal regions was the main reason for the high proportion of low-frequency flexible gene families in the pan-genome, gene gain events associated with accessory plasmids introduced more genes into the genomes of nitrogen-fixing species. For symbiotic plasmids, although horizontal gene transfer occurred frequently, transfer may be impeded by factors such as the host's physical isolation and soil conditions, even among phylogenetically close species. During coevolution with leguminous hosts, the plasmid system, including accessory and symbiotic plasmids, may have evolved over a time span, providing rhizobial species with the ability to adapt to various environmental conditions and helping them achieve nitrogen fixation. These findings provide new insights into the phylogeny of Rhizobiaceae and advance our understanding of the evolution of symbiotic nitrogen fixation.

Phylogeographic Genetic Diversity in the White Sucker Hepatitis B Virus across the Great Lakes Region and Alberta, Canada

Viruses, 10.3390/v13020285, 2021, Vol 13 (2), pp. 285

Author(s): Cynthia R. Adams, Vicki S. Blazer, Jim Sherry, Robert Scott Cornman, Luke R. Iwanowicz

Keyword(s): Genetic Diversity, Hepatitis B Virus, Hepatitis B, Lake Michigan, Illumina Miseq, Genomic Variation, Genomic Diversity, White Sucker, Fish Health, B Virus

Hepatitis B viruses belong to a family of circular, double-stranded DNA viruses that infect a range of organisms, with host responses that vary from mild infection to chronic infection and cancer. The white sucker hepatitis B virus (WSHBV) was first described in the white sucker (Catostomus commersonii), a freshwater teleost, and belongs to the genus Parahepadnavirus. At present, the host range of WSHBV and its impact on fish health are unknown, and neither genetic diversity nor association with fish health have been studied in any parahepadnavirus. Given the relevance of genomic diversity to disease outcome for the orthohepadnaviruses, we sought to characterize genomic variation in WSHBV and determine how it is structured among watersheds. We identified WSHBV-positive white sucker inhabiting tributaries of Lake Michigan, Lake Superior, Lake Erie (USA), and Lake Athabasca (Canada). Copy number in plasma and in liver tissue was estimated via qPCR. Templates from 27 virus-positive fish were amplified and sequenced using a primer-specific, circular long-range amplification method coupled with amplicon sequencing on the Illumina MiSeq. Phylogenetic analysis of the WSHBV genome identified phylogeographical clustering reminiscent of that observed with human hepatitis B virus genotypes. Notably, most non-synonymous substitutions were found to cluster in the pre-S/spacer overlap region, which is relevant for both viral entry and replication. The observed predominance of p1/s3 mutations in this region is indicative of adaptive change in the polymerase open reading frame (ORF), while, at the same time, the surface ORF is under purifying selection. Although the levels of variation we observed do not meet the criteria used to define sub/genotypes of human and avian hepadnaviruses, we identified geographically associated genome variation in the pre-S and spacer domain sufficient to define five WSHBV haplotypes. 
This study of WSHBV genetic diversity should facilitate the development of molecular markers for future identification of genotypes and provide evidence in future investigations of possible differential disease outcomes.
