pan genome
Recently Published Documents


Total documents: 394 (five years: 218)
H-index: 47 (five years: 11)

2022
Author(s): Tang Li, Yanbin Yin

Background: Large-scale metagenome assembly and binning to generate metagenome-assembled genomes (MAGs) has become possible in the past five years. As a result, millions of MAGs have been produced and are increasingly included in pan-genomics workflows. However, pan-genome analyses of MAGs may suffer from the known issues with MAGs: fragmentation, incompleteness, and contamination due to mis-assembly and mis-binning. Here, we conducted a critical assessment of including MAGs in pan-genome analysis by comparing the pan-genome analysis results of complete bacterial genomes and simulated MAGs. Results: We found that incompleteness led to more significant core gene loss than fragmentation. Contamination had little effect on core genome size but had a major influence on accessory genomes. The core gene loss remained when using different pan-genome analysis tools and when using a mixture of MAGs and complete genomes. Importantly, the core gene loss was partially alleviated by lowering the core gene threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The core gene loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Conclusions: We conclude that lowering the core gene threshold and predicting genes in metagenome mode (as Anvio does with Prodigal) are necessary in pan-genome analysis of MAGs to alleviate the accuracy loss. Better quality control of MAGs and the development of new pan-genome analysis tools specifically designed for MAGs are needed in future studies.
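
Why incompleteness erodes a strict core genome, and why a lower core threshold helps, can be sketched in a few lines. This is an illustrative toy, not the study's simulation: genomes are assumed to be sets of gene-cluster IDs, incompleteness is modelled as uniform random gene loss (the function names `core_genes` and `simulate_incompleteness` are hypothetical), and a cluster counts as core if it appears in at least `threshold` (a fraction) of genomes.

```python
import random


def core_genes(genomes, threshold=1.0):
    """Return gene clusters present in at least `threshold` (fraction) of genomes."""
    counts = {}
    for genome in genomes:
        for gene in genome:
            counts[gene] = counts.get(gene, 0) + 1
    min_count = threshold * len(genomes)
    return {gene for gene, c in counts.items() if c >= min_count}


def simulate_incompleteness(genome, missing_fraction, rng):
    """Assumed model: drop a random `missing_fraction` of genes to mimic an incomplete MAG."""
    genes = sorted(genome)
    n_keep = round(len(genes) * (1 - missing_fraction))
    return set(rng.sample(genes, n_keep))


rng = random.Random(42)
# 20 hypothetical complete genomes that share the same 1,000 core genes.
complete = [set(range(1000)) for _ in range(20)]
# Each simulated MAG is missing 5% of its genes at random.
mags = [simulate_incompleteness(g, 0.05, rng) for g in complete]

for threshold in (1.0, 0.95, 0.90):
    print(threshold, len(core_genes(complete, threshold)), len(core_genes(mags, threshold)))
```

With 5% random loss per genome, a gene survives in all 20 MAGs with probability of only about 0.95^20 ≈ 0.36, so the strict (100%) core collapses, while relaxing the threshold to 95% or 90% recovers most of it.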


2021
Author(s): Ran Li, Mian Gong, Xinmiao Zhang, Fei Wang, Zhenyu Liu, ...

Structural variations (SVs) are a major contributor to genetic diversity and phenotypic variation; however, their prevalence and functions in domestic animals are largely unexplored. Here, we assembled 26 haplotype-resolved genome assemblies from 13 genetically diverse sheep breeds using PacBio HiFi sequencing. We then constructed an ovine graph pan-genome and demonstrated its advantage in discovering 142,593 biallelic SVs (insertions and deletions), 7,028 divergent alleles, and 13,419 multiallelic variations with high accuracy and sensitivity. To link the SVs to genotypes, we genotyped the SVs in 687 resequenced individuals of domestic and wild sheep using a graph-based approach and identified numerous population-stratified variants, among which expression-associated SVs were detected by integrating RNA-seq data. Taking the varying tail morphology of sheep as an example, we located a putative causative insertion in the HOXB13 gene responsible for the long tail and reported multiple large SVs associated with the fat tail. Beyond generating a benchmark resource for ovine structural variants, our study also highlights that population genetic analysis based on a graph pan-genome, rather than a reference genome, will greatly benefit animal genetic research.
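
The distinction between biallelic and multiallelic variation in a graph pan-genome can be illustrated with a toy "bubble". This is a minimal sketch under stated assumptions, not the study's pipeline: each haplotype is assumed to be a path of node IDs, a bubble is delimited by a shared source and sink node, and every distinct node sequence between them counts as one allele. The functions and node IDs are hypothetical.

```python
def bubble_alleles(paths, source, sink):
    """Return the distinct alleles (tuples of inner node IDs) that traverse a bubble."""
    alleles = set()
    for path in paths:
        if source in path and sink in path:
            start, end = path.index(source), path.index(sink)
            if start < end:
                alleles.add(tuple(path[start + 1:end]))
    return alleles


def classify_bubble(paths, source, sink):
    """Label a bubble by its number of distinct alleles."""
    n = len(bubble_alleles(paths, source, sink))
    if n < 2:
        return "invariant"
    return "biallelic" if n == 2 else "multiallelic"


# Hypothetical haplotype paths through a toy graph.
haplotypes = {
    "hap1": [1, 2, 4, 5],
    "hap2": [1, 4, 5],     # skips node 2: a deletion relative to hap1
    "hap3": [1, 3, 4, 5],  # alternative sequence: a third allele
}
print(classify_bubble(haplotypes.values(), 1, 4))              # multiallelic
print(classify_bubble(list(haplotypes.values())[:2], 1, 4))    # biallelic
```

A linear reference would represent only one of these alleles, which is one reason graph-based genotyping can capture multiallelic sites that reference-based calling tends to miss.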


Forests, 2021, Vol 12 (12), pp. 1714
Author(s): Faizah N. Alenezi, Houda Ben Slama, Ali Chenari Bouket, Hafsa Cherif-Silini, Allaoua Silini, ...

Bacillus velezensis, a Gram-positive bacterium, is frequently isolated from diverse niches, mainly soil, water, plant roots, and fermented foods. B. velezensis is ubiquitous, non-pathogenic, and endospore-forming. Because it is frequently isolated from diverse plant holobionts, it is considered a host-adapted microorganism and is recognized as being of high economic importance, given its ability to promote plant growth under diverse biotic and abiotic stress conditions. Additionally, the species suppresses many plant diseases, including bacterial, oomycete, and fungal diseases. After colonizing host plant roots, it is also able to induce a unique physiological state in the host plant, called the primed state. Primed host plants are able to respond more rapidly and/or effectively to biotic or abiotic stress. Moreover, B. velezensis is able to resist diverse environmental stresses, including metal and xenobiotic stresses, and to help host plants cope with them. Within the species, B. velezensis strains have unique abilities allowing them to adopt different lifestyles. Knowledge of these strain-level abilities is warranted and could be inferred from the ever-expanding list of new genomes available in genome databases. Pangenome analysis and the subsequent identification of core, accessory, and unique genomes are of paramount importance for deciphering the species' full metabolic capacities and fitness across the diverse environmental conditions shaping its lifestyle. Despite the crucial importance of the pan genome, its assessment across large numbers of strains remains sparse, and systematic studies are still needed. Extensive knowledge of the pan genome is needed to translate genome sequencing efforts into the development of more efficient biocontrol agents and biofertilizers. In this study, a genome survey of B. velezensis allowed us to (a) highlight B. velezensis species boundaries and show that Bacillus suffers from taxonomic imprecision that blurs the debate over the species pangenome; (b) identify drivers of their successful acquisition of specific lifestyles and colonization of new niches; (c) describe the strategies they use to promote plant growth and development; (d) reveal strain-specific orphan secondary metabolite gene clusters (biosynthetic clusters whose corresponding metabolites are unknown), whose products still await identification to amend our knowledge of their putative role in pathogen suppression and plant growth promotion; and (e) describe a dynamic pangenome with a secondary-metabolite-rich accessory genome.
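
The core/accessory/unique split mentioned above follows directly from a gene presence/absence table. The sketch below assumes each strain is represented as a set of gene-cluster IDs and uses the common definitions (core: present in every strain; unique: present in exactly one; accessory: everything in between). Strain names and cluster IDs are hypothetical, and real studies often relax "every strain" to a soft threshold.

```python
def partition_pangenome(presence):
    """Split gene clusters into core, accessory and unique sets.

    `presence` maps strain name -> set of gene-cluster IDs.
    """
    n = len(presence)
    counts = {}
    for genes in presence.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    core = {g for g, c in counts.items() if c == n}
    # With a single strain every gene is core, so "unique" only applies when n > 1.
    unique = {g for g, c in counts.items() if c == 1 and n > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical strains; "bgc*" stand in for biosynthetic gene clusters.
strains = {
    "strain_A": {"g1", "g2", "g3", "bgc1"},
    "strain_B": {"g1", "g2", "g3", "bgc2"},
    "strain_C": {"g1", "g2", "bgc1", "bgc3"},
}
core, accessory, unique = partition_pangenome(strains)
print(sorted(core), sorted(accessory), sorted(unique))
```

In this toy table the strain-specific clusters (`bgc2`, `bgc3`) land in the unique set, mirroring how strain-level secondary metabolite clusters show up in the variable part of a pangenome.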


2021, Vol 6 (12), pp. 1526-1536
Author(s): Amelia E. Barber, Tongta Sae-Ong, Kang Kang, Bastian Seelbinder, Jun Li, ...

2021, Vol 7 (11)
Author(s): Aysun Urhan, Thomas Abeel

Microbial organisms have diverse populations, so using a single linear reference sequence in comparative studies introduces reference bias into downstream analyses and fails to account for variability in the population. Recently, pan-genome graphs have emerged as an alternative to the traditional linear reference, with many successful applications and a rapid increase in the number of methods available in the literature. Despite this enthusiasm, there has been no attempt to explore these graph construction methods in depth and demonstrate their practical use. In this study, we aim to develop a general guide to help researchers who may want to incorporate pan-genomes into their analyses of microbial organisms. We evaluated state-of-the-art pan-genome construction tools by using them to model a collection of 70 Acinetobacter baumannii strains. Our results suggest that all tools produced pan-genome graphs conforming to our expectations based on previous literature, and that their approach to homologue detection is likely to be the most influential factor in determining the final size and complexity of the pan-genome. The graphs overlapped most in the core pan-genome content, while the cloud genes varied significantly among tools. We propose an alternative approach to pan-genome construction that combines two of the tools, Panaroo and Ptolemy, to further exploit them in downstream analyses, and we demonstrate the effectiveness of our pipeline for structural variant calling in beta-lactam resistance genes in the same set of A. baumannii isolates, identifying various transposon structures for carbapenem resistance in chromosomes as well as plasmids. We identify a novel plasmid structure in two previously studied multidrug-resistant clinical isolates; this structure could be important for their resistance phenotypes.
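
One common way to represent a microbial pan-genome graph is a gene-cluster adjacency graph: nodes are orthologous gene clusters and edges join clusters that are neighbours on a contig in at least one genome. The sketch below is a generic illustration of that idea, not the internal model of Panaroo or Ptolemy; the input format (genome name mapped to an ordered list of cluster IDs) and all names are assumptions. Edges supported by only some genomes flag candidate structural variants, such as an inserted transposon carrying a resistance gene.

```python
from collections import defaultdict


def build_pangenome_graph(genomes):
    """Build an undirected gene-cluster adjacency graph.

    `genomes` maps genome name -> ordered list of gene-cluster IDs along a contig.
    Returns (node_support, edge_support): the set of genomes supporting each
    node and each edge.
    """
    node_support = defaultdict(set)
    edge_support = defaultdict(set)
    for name, order in genomes.items():
        for cluster in order:
            node_support[cluster].add(name)
        for a, b in zip(order, order[1:]):
            edge_support[frozenset((a, b))].add(name)
    return node_support, edge_support


# Hypothetical isolates; iso3 carries an insertion (tn1, amr1) between c2 and c3.
genomes = {
    "iso1": ["c1", "c2", "c3", "c4"],
    "iso2": ["c1", "c2", "c3", "c4"],
    "iso3": ["c1", "c2", "tn1", "amr1", "c3", "c4"],
}
nodes, edges = build_pangenome_graph(genomes)

# Edges not shared by all genomes mark where gene order diverges.
variant_edges = sorted(sorted(e) for e, s in edges.items() if len(s) < len(genomes))
print(variant_edges)
```

Here the insertion appears as an alternative route (`c2`, `tn1`, `amr1`, `c3`) alongside the direct `c2`-`c3` edge, which is the kind of graph structure a structural variant caller would inspect.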


2021, Vol 22 (1)
Author(s): Hsuan-Lin Her, Po-Ting Lin, Yu-Wei Wu

Background: Discerning genes crucial to antimicrobial resistance (AMR) mechanisms is becoming increasingly important for identifying AMR pathogenic strains accurately and swiftly. Pangenome-wide association studies (e.g. Scoary) have identified numerous putative AMR genes. However, only a tiny proportion of the putative resistance genes are annotated by AMR databases or Gene Ontology. In addition, many putative resistance genes are of unknown function (termed hypothetical proteins). An annotation tool is urgently needed to reveal the functional organization of the resistome and expand our knowledge of the AMR gene repertoire. Results: We developed an approach (PangenomeNet) for building co-functional networks from pan-genomes to infer functions for hypothetical genes. Using Escherichia coli as an example, we demonstrated that it is possible to build a co-functional network from its pan-genome using co-inheritance, domain-sharing, and protein–protein interaction information. Investigation of the network revealed that it fits the characteristics of biological networks and can be used for functional inference. The subgraph consisting of putative meropenem resistance genes comprises clusters of stress response genes and resistance gene acquisition pathways. Resistome subgraphs also reveal drug-specific AMR genes such as beta-lactamase, as well as functional roles shared among multiple classes of drugs, mostly in stress-related pathways. Conclusions: By demonstrating the idea of a pan-genome-based co-functional network on the E. coli species, we showed that the network can infer the functional roles of genes, including those without functional annotations, and provides holistic views of the putative antimicrobial resistomes. We hope that the pan-genome network idea can help formulate hypotheses for targeted experimental work.
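
The co-inheritance idea can be sketched as "genes that tend to be present or absent together across genomes are linked", followed by guilt-by-association labelling of unannotated genes. This is a simplified stand-in, not PangenomeNet's scoring: the Jaccard index of presence sets, the `min_jaccard` cut-off, the weighted vote, and all gene names are illustrative assumptions, and the domain-sharing and protein–protein interaction evidence the method also uses is omitted.

```python
from itertools import combinations


def co_inheritance_edges(presence, min_jaccard=0.8):
    """Link gene pairs whose presence/absence profiles across genomes are similar.

    `presence` maps gene -> set of genome IDs carrying it. The Jaccard index of
    those sets is used as an assumed co-inheritance score.
    """
    edges = {}
    for a, b in combinations(sorted(presence), 2):
        union = presence[a] | presence[b]
        if not union:
            continue
        score = len(presence[a] & presence[b]) / len(union)
        if score >= min_jaccard:
            edges[(a, b)] = score
    return edges


def guess_function(gene, edges, annotations):
    """Guilt by association: the annotation with the highest summed edge weight among neighbours."""
    votes = {}
    for (a, b), weight in edges.items():
        if gene in (a, b):
            other = b if a == gene else a
            label = annotations.get(other)
            if label:
                votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get) if votes else None


# Hypothetical presence/absence profiles over six genomes.
presence = {
    "hyp1": {"g1", "g2", "g3"},
    "efflux_pump": {"g1", "g2", "g3"},
    "regulator": {"g1", "g2", "g3", "g4"},
    "housekeeping": {"g1", "g2", "g3", "g4", "g5", "g6"},
}
annotations = {"efflux_pump": "drug efflux", "regulator": "stress response"}
edges = co_inheritance_edges(presence, min_jaccard=0.7)
print(guess_function("hyp1", edges, annotations))  # drug efflux
```

Genes present in nearly every genome (like `housekeeping` here) carry little co-inheritance signal, which is one reason such networks are usually combined with other evidence types.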


2021
Author(s): Ming Li, Congjiao Sun, Naiyi Xu, Peipei Bian, Xiaomeng Tian, ...

The gene numbers and evolutionary rates of birds were assumed to be much lower than those of mammals, which is in sharp contrast to the huge species number and morphological diversity of birds. It is therefore necessary to construct a complete avian genome and analyze its evolution. We constructed a chicken pan-genome from 20 de novo genome assemblies with high sequencing depth and newly identified 1,335 protein-coding genes and 3,011 long noncoding RNAs. The majority of these novel genes were detected across most individuals in the examined transcriptomes but were only sporadically detected in each DNA sequencing dataset, regardless of whether Illumina or PacBio technology was used. Furthermore, unlike previous pan-genome models, most of these novel genes were overrepresented in chromosomal sub-telomeric regions, were surrounded by extremely high proportions of tandem repeats, and strongly impeded DNA sequencing. These hidden genes were shown to be shared by all chicken genomes, included many housekeeping genes, and were enriched in immune pathways. Comparative genomics revealed that the novel genes had substitution rates three-fold higher than those of known genes, updating the evolutionary rates of birds. Our study provides a framework for constructing a better chicken genome, which will contribute towards the understanding of avian evolution and the improvement of poultry breeding.


2021, Vol 16 (11), pp. 1934578X2110609
Author(s): Xiaofan Guo, Shouming Wang

Inonotus obliquus is a rare, edible, and medicinal fungus that is widely used as a remedy for various diseases. Its main bioactive substances are polysaccharides and terpenoids. In this study, we characterized and investigated the pan-genome of three strains of I. obliquus. The genome sizes of JL01, HE, and NBRC8681 were 32.04, 29.04, and 31.78 Mb, respectively. There were 6,543 core gene families and 6,197 accessory gene families among the three strains, with 14 polysaccharide-related core gene families and seven accessory gene families. For terpenoids, there were 13 core gene families and 17 accessory gene families. Pan-genome sequencing of I. obliquus has improved our understanding of biological characteristics related to the biosynthesis of polysaccharides and terpenoids at the molecular level, which in turn will enable us to increase the production of polysaccharides and terpenoids by this mushroom.
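
The gene-family counts above imply an overall pan-genome size. Assuming, as is conventional, that core and accessory families are disjoint and together make up the pan-genome (with strain-specific families counted as accessory), the arithmetic is:

```latex
|\mathrm{pan}| = |\mathrm{core}| + |\mathrm{accessory}| = 6543 + 6197 = 12740,
\qquad
\frac{|\mathrm{core}|}{|\mathrm{pan}|} = \frac{6543}{12740} \approx 0.514
```

That is, under this assumption roughly half of the gene families in the three-strain pan-genome are shared by all strains.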


2021, Vol 7 (10)
Author(s): Ana C. Reis, Mónica V. Cunha

Animal tuberculosis (TB) is an emerging disease caused by Mycobacterium bovis, one of the animal-adapted ecotypes of the Mycobacterium tuberculosis complex (MTC). In this work, whole-genome comparative analyses of 70 M. bovis genomes were performed to gain insights into the pan-genome architecture. The comparison of predicted genome composition across M. bovis enabled clustering into core- and accessory-genome components, with 2736 CDS in the former, while the accessory moiety included 3897 CDS, of which 2656 are restricted to one or two genomes only. These analyses predicted an open pan-genome architecture, with an average of 32 CDS added by each genome, and showed the diversification of discrete M. bovis subpopulations supported by both core- and accessory-genome components. The functional annotation of the pan-genome classified each CDS into one or several COG (Clusters of Orthologous Groups) categories, revealing ‘transcription’ (total average CDSs, n=258), ‘lipid metabolism and transport’ (n=242), ‘energy production and conversion’ (n=214), and ‘unknown function’ (n=876) as the most represented. A closer analysis of polymorphisms in virulence-related genes in a restricted group of M. bovis from a multi-host system enabled the identification of clade-monomorphic non-synonymous SNPs, illustrating clade-specific virulence landscapes that correlate with disease severity. This first comparative pan-genome study of a diverse collection of M. bovis encompassing all clonal complexes indicates a high percentage of accessory genes and denotes an open, dynamic, non-conservative pan-genome structure with high evolutionary potential, defying the canons of MTC biology. Furthermore, it shows that M. bovis can shape its virulence repertoire, either by acquisition and loss of genes or by SNP-based diversification, likely towards host immune evasion, adaptation, and persistence.
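
Whether a pan-genome is "open" is often judged from how many new genes each additional genome contributes. A widely used heuristic (Heaps' law, as applied by Tettelin and colleagues) fits the mean number of new genes n contributed by the N-th genome to n = κN^(−α); α < 1 suggests an open pan-genome and α > 1 a closed one. The sketch below is an illustration of that heuristic, not necessarily the method used in this study; the input format (a list of gene-cluster sets), the number of permutations, and the function names are assumptions.

```python
import math
import random


def new_genes_curve(genomes, permutations=100, seed=0):
    """Mean number of new gene clusters contributed by the k-th genome added.

    `genomes` is a list of sets of gene-cluster IDs. Averaging over random
    orderings removes dependence on the order in which genomes are added.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(genomes)
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for k, genome in enumerate(order):
            totals[k] += len(genome - seen)
            seen |= genome
    return [t / permutations for t in totals]


def heaps_alpha(curve):
    """Fit n(N) = kappa * N**(-alpha) by least squares in log-log space, using N >= 2."""
    points = [(math.log(N), math.log(v)) for N, v in enumerate(curve, start=1) if N >= 2 and v > 0]
    if len(points) < 2:
        raise ValueError("need at least two genomes after the first that add new genes")
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Hypothetical data: 10 genomes sharing 100 core genes, each with 30 genes of its own.
core = set(range(100))
genomes = [core | {f"u{i}_{j}" for j in range(30)} for i in range(10)]
curve = new_genes_curve(genomes)
print(curve[:3], heaps_alpha(curve))
```

In this synthetic set every added genome contributes the same 30 new genes, so the fitted α is essentially 0, well below 1, which the heuristic reads as an open pan-genome, analogous to the steady addition of new CDS per genome reported above.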

