Genomic network analysis of environmental and livestock F-type plasmid populations

2021 ◽  
Author(s):  
William Matlock ◽  
Kevin K. Chau ◽  
Manal AbuOun ◽  
Emma Stubberfield ◽  
...  

F-type plasmids are diverse and of great clinical significance, often carrying genes conferring antimicrobial resistance (AMR) such as extended-spectrum β-lactamases, particularly in Enterobacterales. Organising this plasmid diversity is challenging, and current knowledge is largely based on plasmids from clinical settings. Here, we present a network community analysis of a large survey of F-type plasmids from environmental (influent, effluent and upstream/downstream waterways surrounding wastewater treatment works) and livestock settings. We use a tractable and scalable methodology to examine the relationship between plasmid metadata and network communities. This reveals how niche (sampling compartment and host genera) partition and shape plasmid diversity. We also perform pangenome-style analyses on network communities. We show that such communities define unique combinations of core genes, with limited overlap. Building plasmid phylogenies based on alignments of these core genes, we demonstrate that plasmid accessory function is closely linked to core gene content. Taken together, our results suggest that stable F-type plasmid backbone structures can persist in environmental settings while allowing dramatic variation in accessory gene content that may be linked to niche adaptation. The association of F-type plasmids with AMR may reflect their suitability for rapid niche adaptation.

2020 ◽  
Author(s):  
William Matlock ◽  
Kevin K. Chau ◽  
Manal AbuOun ◽  
Emma Stubberfield ◽  
Leanne Barker ◽  
...  

IncF plasmids are diverse and of great clinical significance, often carrying genes conferring antimicrobial resistance (AMR) such as extended-spectrum β-lactamases, particularly in Enterobacteriaceae. Organising this plasmid diversity is challenging, and current knowledge is largely based on plasmids from clinical settings. Here, we present a network community analysis of a large survey of IncF plasmids from environmental (influent, effluent, and upstream/downstream waterways surrounding wastewater treatment works) and livestock settings. We use a tractable and scalable methodology to examine the relationship between plasmid metadata and network communities. This reveals how niche (sampling compartment and host genera) partition and shape plasmid diversity. We also perform pangenome-style analyses on network communities. We show that such communities define unique combinations of core genes, with limited overlap. Building plasmid phylogenies based on alignments of these core genes, we demonstrate that plasmid accessory function is closely linked to core gene content. Taken together, our results suggest that stable IncF plasmid backbone structures can persist in environmental settings while allowing dramatic variation in accessory gene content that may be linked to niche adaptation. The recent association of IncF plasmids with AMR likely reflects their suitability for rapid niche adaptation.


2019 ◽  
Vol 17 (03) ◽  
pp. 1940005 ◽  
Author(s):  
Chun-Yu Lin ◽  
Peiying Ruan ◽  
Ruiming Li ◽  
Jinn-Moon Yang ◽  
Simon See ◽  
...  

Cancer subtype identification is an unmet need in precision diagnosis. Recently, evolutionary conservation has been indicated to contain informative signatures for functional significance in cancers. However, the importance of evolutionary conservation in distinguishing cancer subtypes remains largely unclear. Here, we identified the evolutionarily conserved genes (i.e. core genes) and observed that they are primarily involved in cellular pathways relevant to cell growth and metabolism. Using these core genes, we developed two novel strategies, namely a feature-based strategy (FES) and an image-based strategy (IMS), by integrating their evolutionary and genomic profiles with a deep learning algorithm. In comparison with the FES using the random set and the strategy using the PAM50 classifier, the core gene set-based FES achieved a higher accuracy for identifying breast cancer subtypes. The IMS and FES using the core gene set yielded better performance than the other strategies in classifying both breast cancer subtypes and multiple cancer types. Moreover, the IMS is reproducible even when using different gene expression data (i.e. RNA-seq and microarray). Comprehensive analysis of eight cancer types demonstrates that our evolutionary conservation-based models represent a valid and helpful approach for identifying cancer subtypes and that the core gene set offers distinguishable clues of cancer subtypes.


Open Biology ◽  
2015 ◽  
Vol 5 (1) ◽  
pp. 140133 ◽  
Author(s):  
Nitin Kumar ◽  
Ganesh Lad ◽  
Elisa Giuntini ◽  
Maria E. Kaye ◽  
Piyachat Udomwong ◽  
...  

Biological species may remain distinct because of genetic isolation or ecological adaptation, but these two aspects do not always coincide. To establish the nature of the species boundary within a local bacterial population, we characterized a sympatric population of the bacterium Rhizobium leguminosarum by genomic sequencing of 72 isolates. Although all strains have 16S rRNA typical of R. leguminosarum, they fall into five genospecies by the criterion of average nucleotide identity (ANI). Many genes, on plasmids as well as the chromosome, support this division: recombination of core genes has been largely within genospecies. Nevertheless, variation in ecological properties, including symbiotic host range and carbon-source utilization, cuts across these genospecies, so that none of these phenotypes is diagnostic of genospecies. This phenotypic variation is conferred by mobile genes. The genospecies meet the Mayr criteria for biological species in respect of their core genes, but do not correspond to coherent ecological groups, so periodic selection may not be effective in purging variation within them. The population structure is incompatible with traditional ‘polyphasic taxonomy’ that requires bacterial species to have both phylogenetic coherence and distinctive phenotypes. More generally, genomics has revealed that many bacterial species share adaptive modules by horizontal gene transfer, and we envisage a more consistent taxonomic framework that explicitly recognizes this. Significant phenotypes should be recognized as ‘biovars’ within species that are defined by core gene phylogeny.


2021 ◽  
Vol 16 (11) ◽  
pp. 1934578X2110609
Author(s):  
Xiaofan Guo ◽  
Shouming Wang

Inonotus obliquus is a rare, edible and medicinal fungus that is widely used as a remedy for various diseases. Its main bioactive substances are polysaccharides and terpenoids. In this study, we characterized and investigated the pan-genome of three strains of I. obliquus. The genome sizes of JL01, HE, and NBRC8681 were 32.04, 29.04, and 31.78 Mb, respectively. There were 6 543 core gene families and 6 197 accessory gene families among the three strains, with 14 polysaccharide-related core gene families and seven accessory gene families. For terpenoids, there were 13 core gene families and 17 accessory gene families. Pan-genome sequencing of I. obliquus has improved our understanding of biological characteristics related to the biosynthesis of polysaccharides and terpenoids at the molecular level, which in turn will enable us to increase the production of polysaccharides and terpenoids by this mushroom.


Author(s):  
Longxiu Yang ◽  
Yuan Qin ◽  
Chongdong Jian

Alzheimer’s disease (AD), a disease of the nervous system, currently lacks effective therapies. RNA expression is a basic means of regulating life activities, and identifying expression characteristics in AD patients may aid the exploration of AD pathogenesis and treatment. This study developed a classifier that could accurately classify AD patients and healthy people, and then obtained 3 core genes that may be related to the pathogenesis of AD. To this end, RNA expression data from the middle temporal gyrus of AD patients were first downloaded from the GEO database; missing values were imputed with the k-Nearest Neighbor (KNN) algorithm, and the data were then normalized using the limma package. Next, the 500 genes with the highest feature importance were obtained through Max-Relevance and Min-Redundancy (mRMR) analysis, and based on these genes, a series of AD classifiers were constructed with the Support Vector Machine (SVM), Random Forest (RF), and KNN algorithms. The KNN classifier composed of 14 genes, which had the highest Matthews correlation coefficient (MCC) in incremental feature selection (IFS) analysis, was identified as the best AD classifier. These 14 genes played a pivotal role in determining AD status and may be core genes associated with the pathogenesis of AD. Finally, protein-protein interaction (PPI) network and Random Walk with Restart (RWR) analyses were applied to obtain genes associated with the core genes, and key pathways related to AD were further analyzed. Overall, this study contributes to a deeper understanding of AD pathogenesis and provides theoretical guidance for related research and experiments.
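The abstract above ranks candidate classifiers by their Matthews correlation coefficient. As a minimal sketch of that comparison step, the snippet below computes MCC from binary confusion-matrix counts using the standard formula. The classifier names and counts are hypothetical and are not taken from the study, and returning 0.0 for an undefined MCC is an assumed convention.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a marginal is empty and MCC is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion counts for two candidate classifiers (illustration only).
candidates = {
    "classifier_a": dict(tp=40, tn=45, fp=5, fn=10),
    "classifier_b": dict(tp=45, tn=30, fp=20, fn=5),
}

# Score every candidate and keep the one with the highest MCC.
scores = {name: matthews_corrcoef(**counts) for name, counts in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

MCC uses all four cells of the confusion matrix, which is one reason it is often preferred over plain accuracy when patient and control groups differ in size.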


2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Solange Ana Belen Miele ◽  
Matías Javier Garavaglia ◽  
Mariano Nicolás Belaich ◽  
Pablo Daniel Ghiringhelli

The Baculoviridae is a large group of insect viruses containing circular double-stranded DNA genomes of 80 to 180 kbp. In this study, genome sequences from 57 baculoviruses were analyzed to reevaluate the number and identity of core genes and to understand the distribution of the remaining coding sequences. Thirty-one core genes with orthologs in all genomes were identified, along with 895 other genes that differ in their degree of representation among the reported genomes. Many of these latter genes are common to well-defined lineages, whereas others are unique to one or a few of the viruses. Phylogenetic analyses based on core gene sequences and the gene composition of the genomes supported the current division of the Baculoviridae into 4 genera: Alphabaculovirus, Betabaculovirus, Gammabaculovirus, and Deltabaculovirus.
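The definition of a core gene used above (an ortholog present in every genome) amounts to a set intersection. The following is a minimal sketch under the assumption that ortholog groups have already been assigned; the genome names and ortholog-group identifiers are hypothetical placeholders, not data from the study.

```python
def partition_pangenome(genomes):
    """Split ortholog groups into core (in every genome) and accessory (the rest)."""
    if not genomes:
        return set(), set()
    gene_sets = [set(groups) for groups in genomes.values()]
    core = set.intersection(*gene_sets)
    accessory = set.union(*gene_sets) - core
    return core, accessory


# Hypothetical presence lists: genome name -> ortholog groups found in it.
toy_genomes = {
    "genome_A": {"og1", "og2", "og3", "og4"},
    "genome_B": {"og1", "og2", "og5"},
    "genome_C": {"og1", "og2", "og3"},
}

core, accessory = partition_pangenome(toy_genomes)
print(sorted(core), sorted(accessory))
```

In practice the difficult part is the ortholog assignment itself, which this sketch takes as given.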


2011 ◽  
Vol 22 (01) ◽  
pp. 35-50 ◽  
Author(s):  
CARLO PICCARDI ◽  
LISA CALATRONI ◽  
FABIO BERTONI

In this paper, we describe a method for clustering financial time series which is based on community analysis, a recently developed approach for partitioning the nodes of a network (graph). A network with N nodes is associated with the set of N time series. The weight of the link (i, j), which quantifies the similarity between the two corresponding time series, is defined according to a metric based on symbolic time series analysis, which has recently proved effective in the context of financial time series. Then, searching for network communities allows one to identify groups of nodes (and hence time series) with strong similarity. A quantitative assessment of the significance of the obtained partition is also provided. The method is applied to two distinct case studies concerning the US and Italian stock exchanges, respectively. In the US case, the stability of the partitions over time is also thoroughly investigated. The results compare favorably with those obtained with the standard tools typically used for clustering financial time series, such as the minimal spanning tree and the hierarchical tree.
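To make the idea of scoring a partition of a weighted similarity network concrete, the sketch below computes Newman's modularity for a given assignment of nodes to communities. This is an assumption chosen for illustration: the abstract does not state which quality measure or community-detection algorithm the authors used, and the node names and edge weights are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples, one per undirected link.
    community_of: dict mapping each node to its community label.
    """
    edges = list(edges)
    total_weight = sum(w for _, _, w in edges)
    if total_weight == 0:
        return 0.0
    internal = defaultdict(float)  # weight of links inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Hypothetical similarity network: two tight triangles joined by one link.
toy_edges = [
    ("s1", "s2", 1.0), ("s2", "s3", 1.0), ("s1", "s3", 1.0),
    ("s4", "s5", 1.0), ("s5", "s6", 1.0), ("s4", "s6", 1.0),
    ("s3", "s4", 1.0),
]
toy_partition = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1, "s6": 1}

q = modularity(toy_edges, toy_partition)
print(round(q, 4))
```

A partition that places every node in a single community scores zero, so positive values indicate more within-group weight than expected from node degrees alone.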


2011 ◽  
Vol 60 (1) ◽  
pp. 35-45 ◽  
Author(s):  
Dejing Wu ◽  
Xiangmei Li ◽  
Yonghong Yang ◽  
Yaojie Zheng ◽  
Chuanqing Wang ◽  
...  

This study aimed to evaluate the distribution of superantigen gene profiles and the presence of exfoliative toxin genes in community-acquired meticillin-resistant Staphylococcus aureus (CA-MRSA) isolated from Chinese children, and simultaneously to assess virulence gene profiles and genetic background. Of the CA-MRSA isolates, 88.9 % (88/99) harboured toxin genes, with sek as the most frequent toxin gene (62.6 %), followed by seq (61.6 %), seb (60.6 %) and sea (35.4 %). The eta gene was detected only in one ST398-IVa-spa t034 strain. The sed and etd genes were not found in any of the isolates tested. A total of 38 virulence genotypes were observed, of which the genotype seb-sek-seq (27.3 %, 24/88) comprised the majority, followed by sea-seb-sek-seq (18.2 %, 16/88). The enterotoxin gene cluster including seg-sei-sem-sen-seo-seu predominated at a rate of 15.1 %. The relationship among toxin genotypes, toxin genes encoding profiles of mobile genetic elements and genetic background was analysed. Among 66 clonal complex (CC) 59 isolates, 87.9 % (58/66) were positive for toxin genes, and 75.8 % (50/66) harboured the toxin gene combination seb-sek-seq. Among seb-sek-seq-positive CC59 strains, 42.0 % (21/50) also carried the sea gene. CC59 corresponded exclusively to accessory gene regulator 1 (agr-1). The data presented here enhance our current knowledge on the virulence determinants of CA-MRSA.


2021 ◽  
Vol 102 (3) ◽  
Author(s):  
Michael J. Arvin ◽  
Ange Lorenzi ◽  
Gaelen R. Burke ◽  
Michael R. Strand

Bracoviruses (BVs) are endogenized nudiviruses that braconid parasitoid wasps have coopted for functions in parasitizing hosts. Microplitis demolitor is a braconid wasp that produces Microplitis demolitor bracovirus (MdBV) and parasitizes the larval stage of the moth Chrysodeixis includens. Some BV core genes are homologs of genes also present in baculoviruses while others are only known from nudiviruses or other BVs. In this study, we had two main goals. The first was to separate MdBV virions into envelope and nucleocapsid fractions before proteomic analysis to identify core gene products that were preferentially associated with one fraction or the other. Results indicated that nearly all MdBV baculovirus-like gene products that were detected by our proteomic analysis had similar distributions to homologs in the occlusion-derived form of baculoviruses. Several core gene products unknown from baculoviruses were also identified as envelope or nucleocapsid components. Our second goal was to functionally characterize a core gene unknown from baculoviruses that was originally named HzNVorf64-like. Immunoblotting assays supported our proteomic data that identified HzNVorf64-like as an envelope protein. We thus renamed HzNVorf64-like as MdBVe46, which we further hypothesized was important for infection of C. includens. Knockdown of MdBVe46 by RNA interference (RNAi) greatly reduced transcript and protein abundance. Knockdown of MdBVe46 also altered virion morphogenesis, near-fully inhibited infection of C. includens, and significantly reduced the proportion of hosts that were successfully parasitized by M. demolitor.


Author(s):  
Marcus Nguyen ◽  
Robert Olson ◽  
Maulik Shukla ◽  
Margo VanOeffelen ◽  
James J. Davis

A growing number of studies have shown that machine learning algorithms can be used to accurately predict antimicrobial resistance (AMR) phenotypes from bacterial sequence data. In these studies, models are typically trained using input features derived from comprehensive sets of known AMR genes or whole genome sequences. However, it can be difficult to determine whether genomes and their corresponding sets of AMR genes are complete when sequencing contaminated or metagenomic samples. In this study, we explore the possibility of using incomplete genome sequence data to predict AMR phenotypes. Machine learning models were built from randomly-selected sets of core genes that are held in common among the members of a species, and the AMR-conferring genes were removed based on their protein annotations. For Klebsiella pneumoniae, Mycobacterium tuberculosis, Salmonella enterica, and Staphylococcus aureus, we report that it is possible to classify susceptible and resistant phenotypes with average F1 scores ranging from 0.80-0.89 with as few as 100 conserved non-AMR genes, with very major error rates ranging from 0.11-0.23 and major error rates ranging from 0.10-0.20. Models built from core genes have predictive power in the cases where the primary AMR mechanism results from SNPs or horizontal gene transfer. By randomly sampling non-overlapping sets of core genes for use in these models, we show that F1 scores and error rates are stable and have little variance between replicates. Potential biases from strain-specific SNPs, phylogenetic sampling, and imbalances in the phylogenetic distribution of susceptible and resistant strains do not appear to have an impact on this result. Although these small core gene models have lower accuracies and higher error rates than models built from the corresponding assembled genomes, the results suggest that sufficient variation exists in the core non-AMR genes of a species for predicting AMR phenotypes. Overall, this study suggests that building models from conserved genes may be a useful strategy for predicting AMR phenotypes when genomes are incomplete.
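The three metrics reported above can be computed from paired true and predicted phenotypes. The sketch below assumes the conventional definitions used in susceptibility testing, since the abstract does not spell them out: a very major error is a resistant isolate predicted as susceptible (rate taken over resistant isolates), and a major error is a susceptible isolate predicted as resistant (rate taken over susceptible isolates). F1 treats resistance as the positive class. The labels are hypothetical.

```python
def amr_metrics(y_true, y_pred, resistant="R"):
    """F1, very major error rate and major error rate for binary AMR calls."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == resistant:
            if pred == resistant:
                tp += 1
            else:
                fn += 1  # resistant called susceptible: very major error
        else:
            if pred == resistant:
                fp += 1  # susceptible called resistant: major error
            else:
                tn += 1
    f1_denominator = 2 * tp + fp + fn
    return {
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
        "very_major_error": fn / (tp + fn) if (tp + fn) else 0.0,
        "major_error": fp / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical phenotypes for ten isolates (R = resistant, S = susceptible).
truth = ["R"] * 5 + ["S"] * 5
predicted = ["R", "R", "R", "R", "S", "S", "S", "S", "S", "R"]

metrics = amr_metrics(truth, predicted)
print(metrics)
```

Reporting the two error rates separately matters because calling a resistant isolate susceptible is clinically more serious than the reverse.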

