A pan-genome method to determine core regions of the Bacillus subtilis and Escherichia coli genomes

Background: Synthetic engineering of bacteria to produce industrial products is a burgeoning field of research and application. In order to optimize genome design, designers need to understand which genes are essential, which are optimal for growth, and which locations in the genome the organism will tolerate when engineered cassettes are inserted. Methods: We present a pan-genome-based method for the identification of core regions in a genome that are strongly conserved at the species level. Results: We show that the core regions determined by our method contain all or almost all essential genes. This demonstrates the accuracy of our method, as essential genes should be core genes. By this measure, we outperform previous methods. We also explain why our method produces exceptions to this rule. Conclusions: We assert that synthetic engineers should avoid deleting or inserting into these core regions unless they understand, and are manipulating, the function of the genes in that region. Similarly, if the designer wishes to streamline the genome, non-core regions, and in particular low-penetrance genes, would be good targets for deletion. Care should be taken to remove entire cassettes of genes with similar penetrance, as such cassettes may harbor toxin/antitoxin genes that need to be removed in tandem. The bioinformatic approach introduced here saves considerable time and effort relative to knockout studies on single isolates of a given species and captures a broad understanding of the conservation of genes that are core to a species.

F1000Research, 2021, Vol 10, pp. 286. DOI: 10.12688/f1000research.51873.2

Author(s): Granger Sutton, Gary B. Fogel, Bradley Abramson, Lauren Brinkac, Todd Michael, ...
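The abstract does not spell out how core regions are delimited, so the following is only an illustration of the general idea. It assumes each gene in a reference genome already has a penetrance value (the fraction of the species' genomes that carry it) and treats a core region as a maximal run of consecutive genes at or above a penetrance threshold. The function name `core_regions`, the 0.95 threshold and the three-gene minimum are hypothetical choices, not parameters from the paper.

```python
# Illustrative sketch only; not the authors' implementation.
from typing import List, Tuple


def core_regions(penetrance: List[float],
                 threshold: float = 0.95,
                 min_genes: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) gene-index pairs for maximal runs of
    consecutive genes whose penetrance is >= threshold and that contain at
    least `min_genes` genes. Indices follow the reference genome's gene order.
    `threshold` and `min_genes` are hypothetical defaults."""
    regions: List[Tuple[int, int]] = []
    start = None  # index where the current high-penetrance run began
    for i, p in enumerate(penetrance):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_genes:
                regions.append((start, i - 1))
            start = None
    # a run that reaches the last gene is closed here
    if start is not None and len(penetrance) - start >= min_genes:
        regions.append((start, len(penetrance) - 1))
    return regions


# Hypothetical penetrance values for ten genes in reference order
example = [1.0, 0.99, 0.97, 0.40, 0.35, 1.0, 1.0, 0.98, 1.0, 0.20]
print(core_regions(example))  # [(0, 2), (5, 8)]
```

Bacterial chromosomes are circular, so a high-penetrance run that spans the last and first genes of the list would be reported here as two separate regions; handling that wrap-around is omitted for brevity.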


A pan-genome method to determine core regions of the Bacillus subtilis and Escherichia coli genomes

2020. DOI: 10.1101/2020.06.11.147629

Author(s): Granger Sutton, Gary B. Fogel, Bradley Abramson, Lauren Brinkac, Todd Michael, ...

Abstract: Synthetic engineering of bacteria to produce industrial products is a burgeoning field of research and application. In order to optimize genome design, designers need to understand which genes are essential, which are optimal for growth, and which locations in the genome the organism will tolerate when engineered cassettes are inserted. We present a pan-genome-based method for the identification of core regions in a genome that are strongly conserved at the species level. We show that these core regions are very likely to contain all or almost all essential genes. We assert that synthetic engineers should avoid deleting or inserting into these core regions unless they understand, and are manipulating, the function of the genes in that region. Similarly, if the designer wishes to streamline the genome, non-core regions, and in particular low-penetrance genes, would be good targets for deletion. Care should be taken to remove entire cassettes of genes with similar penetrance, as such cassettes may harbor toxin/antitoxin genes that need to be removed in tandem. The bioinformatic approach introduced here saves considerable time and effort relative to knockout studies on single isolates of a given species and captures a broad understanding of the conservation of genes that are core to a species.

Importance: The pan-genome approach presented in this paper can be used to determine core regions of a genome and has many possible applications. Synthetic engineering design can be informed by which genes or regions are more conserved (core) versus less conserved. Adjacent non-core genes with similar levels of conservation tend to define cassettes of genes that may be part of a pathway or system, which can inform researchers about possible functional significance. The pattern of gene presence across the different genomes of a species can inform our understanding of evolution and horizontal gene acquisition. The approach saves considerable time and effort relative to the laboratory methods used to identify essential genes in a species.
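The Importance statement notes that adjacent non-core genes with similar conservation levels tend to define cassettes, and the abstract recommends deleting such cassettes as whole units. A minimal sketch of that grouping step, assuming a presence/absence table (one row per gene in reference order, one column per genome): penetrance is the fraction of genomes carrying each gene, and neighbouring non-core genes are merged into one cassette while their penetrance differs by no more than a tolerance. The `core_cutoff` and `tolerance` values and both function names are illustrative assumptions, not the authors' procedure.

```python
# Illustrative sketch only; thresholds and names are hypothetical.
from typing import Dict, List


def penetrance(presence: Dict[str, List[int]]) -> Dict[str, float]:
    """Fraction of genomes (columns) in which each gene is present (1)."""
    return {gene: sum(row) / len(row) for gene, row in presence.items()}


def non_core_cassettes(order: List[str],
                       pen: Dict[str, float],
                       core_cutoff: float = 0.95,
                       tolerance: float = 0.1) -> List[List[str]]:
    """Group adjacent non-core genes (penetrance < core_cutoff) whose
    penetrance differs from the previous gene by at most `tolerance`."""
    cassettes: List[List[str]] = []
    current: List[str] = []
    for gene in order:
        p = pen[gene]
        if p >= core_cutoff:
            # a core gene closes any open cassette
            if current:
                cassettes.append(current)
            current = []
        elif current and abs(p - pen[current[-1]]) <= tolerance:
            current.append(gene)  # similar penetrance: extend the cassette
        else:
            if current:
                cassettes.append(current)
            current = [gene]  # start a new cassette
    if current:
        cassettes.append(current)
    return cassettes


# Hypothetical presence/absence calls across five genomes
presence = {
    "geneA": [1, 1, 1, 1, 1],
    "geneB": [1, 0, 0, 1, 0],
    "geneC": [1, 0, 0, 1, 0],
    "geneD": [1, 1, 1, 1, 0],
    "geneE": [1, 1, 1, 1, 1],
}
order = ["geneA", "geneB", "geneC", "geneD", "geneE"]
print(non_core_cassettes(order, penetrance(presence)))
# [['geneB', 'geneC'], ['geneD']]
```

Comparing each gene only with its immediate neighbour keeps the rule simple; a gradual drift in penetrance along a long run could therefore chain genes together, which a stricter variant might avoid by comparing against the cassette's first gene.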


Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

Molecular Genetics and Genomics, 2016, Vol 291 (2), pp. 905-912. DOI: 10.1007/s00438-015-1154-z. Cited by ~12

Author(s): Xiaowen Yang, Yajie Li, Juan Zang, Yexia Li, Pengfei Bie, ...


Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations

2017. DOI: 10.1101/133991. Cited by ~5

Author(s): Andries J van Tonder, James E Bray, Keith A Jolley, Sigríður J Quirk, Gunnsteinn Haraldsson, ...

Background: Understanding the structure of a bacterial population is essential in order to understand bacterial evolution, which genetic lineages cause disease, or the consequences of perturbations to the bacterial population. Estimating the core genome, the genes common to all or nearly all strains of a species, is an essential component of such analyses. The size and composition of the core genome vary by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed. Results: The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several >25 kb sequence regions that were homologous to genomic regions found in other bacterial species. Conclusions: Some subsets of the global pneumococcal population are highly heterogeneous, and thus our hypothesis was rejected. This is an essential point of consideration before generalising the findings from a single dataset to the wider pneumococcal population.
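The ‘supercore’ here is the set of genes shared by every genome, while the additional core genes are shared within a subset of populations but not by all genomes. A minimal sketch of that set arithmetic, using hypothetical gene identifiers and a strict 100% presence rule; the study's own core definition and gene-clustering pipeline are not described in the abstract and are not reproduced.

```python
# Illustrative sketch only; gene names and populations are hypothetical.
from typing import Dict, List, Set


def core_genome(genomes: List[Set[str]]) -> Set[str]:
    """Genes present in every genome of a collection (strict 100% rule)."""
    return set.intersection(*genomes) if genomes else set()


# Hypothetical gene content, two genomes per population
populations: Dict[str, List[Set[str]]] = {
    "Reykjavik": [{"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3"}],
    "Southampton": [{"g1", "g2", "g3"}, {"g1", "g2", "g3", "g5"}],
    "Maela": [{"g1", "g2", "g6"}, {"g1", "g2", "g6"}],
}

per_population = {name: core_genome(gs) for name, gs in populations.items()}
supercore = core_genome([g for gs in populations.values() for g in gs])
# Genes shared by every non-Maela genome that are missing from the supercore
non_maela = [g for name, gs in populations.items() if name != "Maela" for g in gs]
extra_core_outside_maela = core_genome(non_maela) - supercore

print(sorted(supercore))                 # ['g1', 'g2']
print(sorted(extra_core_outside_maela))  # ['g3']
```

With thousands of genomes, a strict 100% rule is sensitive to assembly or annotation gaps, which is one reason many studies use a slightly lower presence threshold for "nearly all" strains.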


Pan-genome of Novel Pantoea stewartii subsp. indologenes Reveal Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer

2021. DOI: 10.20944/preprints202107.0400.v1

Author(s): Gaurav Agarwal, Ronald D. Gitaitis, Bhabesh Dutta

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leaf spot of foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onion. Our recent host range evaluation study identified two pathovars, P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., that are pathogenic on onion and millets or on millets only, respectively. In the current study, we developed a pan-genome from the whole-genome sequences of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7,030 genes. Among these, 3,546 (present in the genomes of all 17 strains) were core genes, a subset of the 3,682 soft-core genes (present in ≥16 strains). The accessory genome included 1,308 shell genes and 2,040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6,000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in the phylogenetic clustering of Pantoea spp. using a PAVs/wgMLST approach compared with core-genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study including four other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfers observed between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven clusters of genes, associated with the pathogenicity phenotype on onion. One of the clusters contains 11 genes with known functions and is chromosomally located.
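The category sizes above follow from counting, for each gene cluster, how many of the 17 strains carry it. A minimal sketch using the thresholds stated in the abstract (core: all 17 strains; soft-core: at least 16, which includes the core; cloud: at most 2; shell: everything in between). It assumes per-cluster strain counts are already available from a pan-genome tool; the function names and the toy counts are hypothetical.

```python
# Illustrative sketch only; thresholds taken from the abstract, names hypothetical.
from collections import Counter

N_STRAINS = 17  # Psi strains in the pan-genome described above


def category(n_present: int, n_strains: int = N_STRAINS) -> str:
    """Pan-genome category from the number of strains carrying a gene cluster."""
    if n_present >= n_strains - 1:  # >= 16 of 17 strains
        return "soft-core"
    if n_present <= 2:
        return "cloud"
    return "shell"  # 3 to 15 strains


def is_core(n_present: int, n_strains: int = N_STRAINS) -> bool:
    """Strict core: present in all strains (a subset of soft-core)."""
    return n_present == n_strains


# Consistency check on the figures reported in the abstract:
# soft-core + shell + cloud should account for the whole pan-genome.
reported = {"soft-core": 3682, "shell": 1308, "cloud": 2040}
assert sum(reported.values()) == 7030

# Hypothetical strain counts for eight gene clusters
counts = [17, 17, 16, 10, 3, 2, 1, 15]
tally = Counter(category(n) for n in counts)
print(tally["soft-core"], tally["shell"], tally["cloud"])  # 3 3 2
print(sum(is_core(n) for n in counts))                     # 2
```

The reported numbers also imply that 3,682 - 3,546 = 136 soft-core genes are present in exactly 16 of the 17 strains.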


Large-scale identification of pathogen essential genes during coinfection with sympatric and allopatric microbes

Proceedings of the National Academy of Sciences, 2019, Vol 116 (39), pp. 19685-19694. DOI: 10.1073/pnas.1907619116. Cited by ~9

Author(s): Gina R. Lewin, Apollo Stacy, Kelly L. Michie, Richard J. Lamont, Marvin Whiteley

Recent evidence suggests that the genes an organism needs to survive in an environment differ drastically depending on whether it is alone or in a community. However, it is not known if there are universal functions that enable microbes to persist in a community and if there are functions specific to interactions between microbes native to the same (sympatric) or different (allopatric) environments. Here, we ask how the essential functions of the oral pathogen Aggregatibacter actinomycetemcomitans change during pairwise coinfection in a murine abscess with each of 15 microbes commonly found in the oral cavity and 10 microbes that are not. A. actinomycetemcomitans was more abundant when coinfected with allopatric than with sympatric microbes, and this increased fitness correlated with expanded metabolic capacity of the coinfecting microbes. Using transposon sequencing, we discovered that 33% of the A. actinomycetemcomitans genome is required for coinfection fitness. Fifty-nine “core” genes were required across all coinfections and included genes necessary for aerobic respiration. The core genes were also all required in monoinfection, indicating that the essentiality of these genes cannot be alleviated by a coinfecting microbe. Furthermore, coinfection with some microbes, predominantly sympatric species, induced the requirement for over 100 new community-dependent essential genes. In contrast, in other coinfections, predominantly with nonoral species, A. actinomycetemcomitans required 50 fewer genes than in monoinfection, demonstrating that some allopatric microbes can drastically alleviate gene essentialities. These results expand our understanding of how diverse microbes alter growth and gene essentiality within polymicrobial infections.
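Once each condition has a list of fitness-required genes, the gene classes described above reduce to set operations: "core" genes are required in every coinfection, community-dependent essential genes are required in a coinfection but not in monoinfection, and alleviated genes are required in monoinfection but not in a given coinfection. A minimal sketch with hypothetical gene names and partner microbes; calling genes "required" from transposon-sequencing read counts is a separate statistical step that is not shown.

```python
# Illustrative sketch only; gene names and partners are hypothetical.
from typing import Dict, Set


def core_required(coinfections: Dict[str, Set[str]]) -> Set[str]:
    """Genes required for fitness in every coinfection."""
    return set.intersection(*coinfections.values()) if coinfections else set()


def community_dependent(coinfection: Set[str], mono: Set[str]) -> Set[str]:
    """Genes required in a coinfection but not in monoinfection."""
    return coinfection - mono


def alleviated(coinfection: Set[str], mono: Set[str]) -> Set[str]:
    """Genes required in monoinfection whose requirement disappears in a coinfection."""
    return mono - coinfection


# Hypothetical required-gene sets
mono = {"respA", "respB", "metX", "metY"}
coinfections = {
    "partner1": {"respA", "respB", "metX", "metY", "newZ"},
    "partner2": {"respA", "respB"},
}

core = core_required(coinfections)
print(sorted(core))                                                 # ['respA', 'respB']
print(sorted(community_dependent(coinfections["partner1"], mono)))  # ['newZ']
print(sorted(alleviated(coinfections["partner2"], mono)))           # ['metX', 'metY']
print(core <= mono)  # True: every core gene is also required in monoinfection
```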


Evaluating the accuracy of Listeria monocytogenes assemblies from quasimetagenomic samples using long and short reads

BMC Genomics, 2021, Vol 22 (1). DOI: 10.1186/s12864-021-07702-2

Author(s): Seth Commichaux, Kiran Javkar, Padmini Ramachandran, Niranjan Nagarajan, Denis Bertrand, ...

Background: Whole genome sequencing of cultured pathogens is the state-of-the-art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high-quality genome can be recovered. Highly accurate short-read data are analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains, but short reads cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. Results: We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally contaminated ice cream using long-read (Oxford Nanopore) and short-read (Illumina) sequencing data. The accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing the sequence similarity of genes in the assemblies to a complete reference genome. Long-read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high-fidelity gene assembly, even at 150X depth of coverage. Short-read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short-read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long-read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h of enrichment, but with less fidelity for the core genes than the short-read assemblies.
Conclusion: The integration of long- and short-read sequencing of quasimetagenomes expedited the reconstruction of a high-quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long-read analyses into the standard short-read WGS outbreak response.
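Sequencing depth, as in the "150X depth of coverage" figure above, is conventionally the total number of sequenced bases divided by the genome length. A minimal sketch of that arithmetic; the 3 Mb genome size in the example is an illustrative round number, not a value from the study, and the function names are hypothetical.

```python
# Illustrative sketch only; genome size and read lengths are hypothetical.
from typing import Iterable


def depth_of_coverage(read_lengths: Iterable[int], genome_length: int) -> float:
    """Mean depth = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length


def bases_for_depth(target_depth: float, genome_length: int) -> int:
    """Total bases needed to reach a target mean depth."""
    return round(target_depth * genome_length)


GENOME = 3_000_000  # illustrative genome size in bp (assumption)
print(bases_for_depth(150, GENOME))                  # 450000000
print(depth_of_coverage([10_000] * 45_000, GENOME))  # 150.0
```

In a quasimetagenome, only the reads that originate from the target organism contribute to its depth, so the read lengths passed in should be those assigned to L. monocytogenes rather than the whole enrichment run.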


Genome-wide mapping of unexplored essential regions in the Saccharomyces cerevisiae genome: evidence for hidden synthetic lethal combinations in a genetic interaction network

Nucleic Acids Research, 2014, Vol 42 (15), pp. 9838-9853. DOI: 10.1093/nar/gku576. Cited by ~6

Author(s): Saeed Kaboli, Takuya Yamakawa, Keisuke Sunada, Tao Takagaki, Yu Sasano, ...

Despite systematic approaches to mapping networks of genetic interactions in Saccharomyces cerevisiae, exploration of genetic interactions on a genome-wide scale has been limited. The S. cerevisiae haploid genome has 110 regions that are longer than 10 kb but harbor only non-essential genes. Here, we attempted to delete these regions by PCR-mediated chromosomal deletion technology (PCD), which enables chromosomal segments to be deleted by a one-step transformation. Thirty-three of the 110 regions could be deleted, but the remaining 77 regions could not. To determine whether the 77 undeletable regions are essential, we successfully converted 67 of them to mini-chromosomes marked with URA3 using PCR-mediated chromosome splitting technology and conducted a mitotic loss assay of the mini-chromosomes. Fifty-six of the 67 regions were found to be essential for cell growth, and 49 of these carried co-lethal gene pair(s) that had not previously been detected by synthetic genetic array analysis. This result implies that regions harboring only non-essential genes contain unidentified synthetic lethal combinations at an unexpectedly high frequency, revealing a novel landscape of genetic interactions in the S. cerevisiae genome. Furthermore, this study indicates that segmental deletion might be exploited not only for revealing genome function but also for breeding stress-tolerant strains.
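The 110 target regions in this study are stretches longer than 10 kb that contain only non-essential genes. A minimal sketch of how such stretches might be located on one linear chromosome, assuming gene coordinates and essentiality calls are already available. Measuring a region from the base after one essential gene to the base before the next (or to a chromosome end) is an assumption, since the abstract does not define region boundaries; the `Gene` type and function name are hypothetical.

```python
# Illustrative sketch only; boundary rule and names are assumptions.
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    essential: bool


def nonessential_regions(genes: List[Gene], chrom_length: int,
                         min_length: int = 10_000) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, gene names) for stretches bounded by essential genes
    or chromosome ends that are longer than `min_length` bp and contain at least
    one gene, all of them non-essential. Overlapping genes are not handled."""
    regions: List[Tuple[int, int, List[str]]] = []
    left = 1                 # first base after the previous essential gene
    inside: List[str] = []   # non-essential genes seen since then
    for gene in sorted(genes, key=lambda x: x.start):
        if gene.essential:
            right = gene.start - 1
            if inside and right - left + 1 > min_length:
                regions.append((left, right, inside))
            left, inside = gene.end + 1, []
        else:
            inside.append(gene.name)
    # stretch between the last essential gene and the chromosome end
    if inside and chrom_length - left + 1 > min_length:
        regions.append((left, chrom_length, inside))
    return regions


# Hypothetical 30 kb chromosome with three essential genes (E*)
genes = [
    Gene("E1", 1_000, 2_000, True),
    Gene("N1", 3_000, 5_000, False),
    Gene("N2", 8_000, 12_000, False),
    Gene("E2", 15_000, 16_000, True),
    Gene("N3", 17_000, 18_000, False),
    Gene("E3", 20_000, 21_000, True),
]
print(nonessential_regions(genes, 30_000))  # [(2001, 14999, ['N1', 'N2'])]
```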


Revising the Assumption that Ḥadīṯ Studies Flourished in the 11th/17th-Century Ḥiǧāz: Ibrāhīm al-Kūrānī’s (d. 1101/1690) Contribution

Arabica, 2021, Vol 68 (1), pp. 1-35. DOI: 10.1163/15700585-12341597

Author(s): Naser Dumairieh

The Ḥiǧāz in the 11th/17th century has long been considered the center of a “revival” movement in ḥadīṯ studies. This assumption has spread widely among scholars of the 11th-/17th- and 12th-/18th-century Islamic world, based on the fact that the isnāds of many major ḥadīṯ scholars from almost all parts of the Islamic world from the 11th/17th century onward return to a group of scholars in the Ḥiǧāz. The scholarly group that is assumed to have played a critical role in the flourishing of ḥadīṯ studies in the 11th/17th-century Ḥiǧāz is called the al-Ḥaramayn circle or network. However, to date, there have been no studies that investigate what was actually happening in that century concerning ḥadīṯ studies. Examining the actual ḥadīṯ studies of one of the scholars at the core of the al-Ḥaramayn circle, i.e. Ibrāhīm b. Ḥasan al-Kūrānī, will unpack the main interest of Ḥiǧāzī scholars in ḥadīṯ literature, reveal previously unstudied aspects of ḥadīṯ studies in the 11th/17th-century Ḥiǧāz, correct some unexamined assumptions, and situate the ḥadīṯ efforts of scholars of the 11th/17th-century Ḥiǧāz within a general framework of developments within ḥadīṯ studies.


Identification of CCNB2 expression in triple-negative breast cancer based on bioinformatics results

2021. DOI: 10.21203/rs.3.rs-506326/v1

Author(s): Jintao Cao, Shuai Sun, Ran Li, Rui Min, Xingyu Fan, ...

Background: The current epidemiology shows that the incidence of breast cancer is increasing year by year and that patients tend to be younger. Triple-negative breast cancer is the most malignant breast cancer subtype. Bioinformatics is being applied ever more widely in tumor research. This study provided research ideas and a basis for exploring potential targets of gene therapy for triple-negative breast cancer (TNBC). Methods: We analyzed three gene expression profiles (GSE64790, GSE62931, GSE38959) selected from the Gene Expression Omnibus (GEO) database. The GEO2R online analysis tool was used to screen for differentially expressed genes (DEGs) between TNBC and normal tissues. Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were applied to identify the pathways and functional annotation of the DEGs. The protein–protein interaction network of these DEGs was visualized with the Metascape gene-list analysis tool to find the protein complex containing the core genes. Subsequently, we investigated the transcriptional data of the core genes in patients with breast cancer from the Oncomine database. Moreover, the online Kaplan–Meier plotter survival analysis tool was used to evaluate the prognostic value of core gene expression in TNBC patients. Finally, immunohistochemistry (IHC) was used to evaluate the expression level and subcellular localization of CCNB2 in TNBC tissues. Results: A total of 66 DEGs were identified, including 33 up-regulated genes and 33 down-regulated genes. Among them, a potential protein complex containing five core genes was screened out. High expression of these core genes was correlated with poor prognosis in patients with breast cancer, especially overexpression of CCNB2. CCNB2 protein was positively expressed in the cytoplasm, and its expression in triple-negative breast cancer tissues was significantly higher than that in adjacent tissues.
Conclusions: CCNB2 may play a crucial role in the development of TNBC and has potential as a prognostic biomarker of TNBC.
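The abstract does not state the cut-offs used to call the 66 DEGs. As a hedged illustration of the screening step, the sketch below filters a differential-expression table (gene mapped to log2 fold change and raw p-value) using assumed thresholds of |log2 fold change| >= 1 and Benjamini-Hochberg adjusted p < 0.05, with the adjustment written out so the example is self-contained. The thresholds, function names and example values are hypothetical and do not reproduce the authors' GEO2R settings.

```python
# Illustrative sketch only; thresholds and data are hypothetical.
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(table: Dict[str, Tuple[float, float]],
              lfc_cutoff: float = 1.0,
              alpha: float = 0.05) -> Tuple[List[str], List[str]]:
    """table maps gene -> (log2 fold change, raw p-value).
    Returns (up-regulated, down-regulated) gene lists."""
    genes = list(table)
    adj = bh_adjust([table[g][1] for g in genes])
    up: List[str] = []
    down: List[str] = []
    for g, q in zip(genes, adj):
        lfc = table[g][0]
        if q < alpha and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).append(g)
    return up, down


# Hypothetical expression statistics for four genes
table = {
    "G1": (2.5, 0.0001),
    "G2": (-1.8, 0.001),
    "G3": (0.4, 0.0005),  # significant but small fold change
    "G4": (1.2, 0.2),     # large fold change but not significant
}
print(call_degs(table))  # (['G1'], ['G2'])
```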
