FunOrder 2.0 – a fully automated method for the identification of co-evolved genes

Coevolution is an important biological process that shapes interacting species or even proteins, be they physically interacting proteins or consecutive enzymes in a metabolic pathway. The detection of co-evolved proteins can contribute to a better understanding of biological systems. Previously, we developed a semi-automated method, termed FunOrder, for the detection of co-evolved genes from an input gene or protein set. We demonstrated the usability and applicability of FunOrder by identifying essential genes in biosynthetic gene clusters from different ascomycetes. A major drawback of this original method was the need for manual assessment, which may introduce user bias and prevents high-throughput application. Here we present a fully automated version of this method, termed FunOrder 2.0. To fully automate the method, we used several mathematical indices to determine the optimal number of clusters in the FunOrder output, followed by k-means clustering based on the first three principal components of a principal component analysis of the FunOrder output. Further, we replaced BLAST with DIAMOND, which increases speed and allows the future integration of larger proteome databases. The introduced changes slightly decreased the sensitivity of the method, but this is outweighed by the enhanced overall speed and specificity. Additionally, the changes lay the foundation for future high-throughput applications of FunOrder 2.0 in different phyla to address different biological problems.
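The automated clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not FunOrder 2.0's implementation: it assumes the FunOrder output is a numeric matrix with one row per gene, uses the silhouette index as a stand-in for the "several mathematical indices" (which the abstract does not name), and implements PCA and k-means (with k-means++ seeding) directly in NumPy. The function names and the candidate range of k are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Project the rows of X onto its first n_components principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(P, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = [P[rng.integers(len(P))]]
        for _ in range(1, k):
            # squared distance of every point to its nearest chosen center
            d2 = ((P[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            centers.append(P[rng.choice(len(P), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            labels = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((P - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(P, labels):
    """Mean silhouette width; -1 if fewer than two clusters are present."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(P))
    for i in range(len(P)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters contribute 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s.mean()


def choose_k_and_cluster(X, k_range=range(2, 7)):
    """Pick k by the best silhouette on the first three PCs, then cluster."""
    P = pca_scores(X, n_components=3)
    scores = {k: silhouette(P, kmeans(P, k)) for k in k_range}
    best_k = max(scores, key=scores.get)
    return best_k, kmeans(P, best_k)
```

Choosing k with a validity index before running k-means is what removes the manual judgment the original method required; a fuller version might combine several indices (for example by majority vote), which this sketch does not attempt.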

Faculty Opinions recommendation of Genome-wide high-throughput mining of natural-product biosynthetic gene clusters by phage display.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1083041.536022 ◽

2007 ◽

Author(s):

Burckhard Seelig

Keyword(s):

Phage Display ◽

Natural Product ◽

High Throughput ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Genome Wide

Genome-Wide High-Throughput Mining of Natural-Product Biosynthetic Gene Clusters by Phage Display

Chemistry & Biology ◽

10.1016/j.chembiol.2007.01.006 ◽

2007 ◽

Vol 14 (3) ◽

pp. 303-312 ◽

Cited By ~ 44

Author(s):

Jun Yin ◽

Paul D. Straight ◽

Siniša Hrvatin ◽

Pieter C. Dorrestein ◽

Stefanie B. Bumpus ◽

...

Keyword(s):

Phage Display ◽

Natural Product ◽

High Throughput ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Genome Wide

Probability Estimation of Direct Hydrocarbon Indicators Using Gaussian Mixture Models

10.21528/cbic2021-131 ◽

2021 ◽

Author(s):

John B. Lemos ◽

Matheus R. S. Barbosa ◽

Edric B. Troccoli ◽

Alexsandro G. Cerqueira

Keyword(s):

Cluster Analysis ◽

Mixture Models ◽

Clustering Algorithm ◽

Gaussian Mixture Models ◽

Principal Component ◽

Gaussian Mixture ◽

Optimal Number ◽

Original Dataset ◽

PCA Algorithm ◽

Optimal Number Of Clusters

This work aims to delimit Direct Hydrocarbon Indicator (DHI) zones using the Gaussian Mixture Models (GMM) algorithm, an unsupervised machine learning method, over the FS8 seismic horizon in the seismic data of the Dutch F3 Field. The dataset used for the cluster analysis was extracted from the 3D seismic dataset and comprises the following seismic attributes: Sweetness, Spectral Decomposition, Acoustic Impedance, Coherence, and Instantaneous Amplitude. Principal Component Analysis (PCA) was applied to the original dataset for dimensionality reduction and noise filtering, and we chose the first three principal components as input to the clustering algorithm. The cluster analysis using Gaussian Mixture Models was performed by varying the number of groups from 2 to 20. The Elbow Method suggested fewer groups than needed to isolate the DHI zones. Instead, we observed that four is the optimal number of clusters to highlight this seismic feature. Furthermore, other clusters could be interpreted in terms of lithology using geophysical well log data.
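The Elbow Method mentioned above picks the number of groups at which a decreasing cost curve stops dropping steeply. One common way to automate it, assumed here and not necessarily what the authors used, is to rescale the curve to the unit square and choose the point farthest from the straight line joining its first and last points. The function name `elbow_k` and the cost values (which in a GMM setting could be, for example, the negative log-likelihood or BIC for each number of components) are hypothetical.

```python
def elbow_k(ks, costs):
    """Return the k whose normalized (k, cost) point lies farthest from the
    chord joining the first and last points of the curve."""
    if len(ks) != len(costs) or len(ks) < 3:
        raise ValueError("need at least three matching (k, cost) pairs")
    c_min, c_max = min(costs), max(costs)
    if c_max == c_min:
        return ks[0]  # flat curve: no elbow, fall back to the smallest k
    # rescale both axes to [0, 1] so the result does not depend on units
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(c - c_min) / (c_max - c_min) for c in costs]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    norm = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    dists = [abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
             for x, y in zip(xs, ys)]
    return ks[dists.index(max(dists))]


# Hypothetical cost curve for k = 2..7
print(elbow_k(list(range(2, 8)), [100, 40, 20, 18, 16, 14]))  # → 4
```

As the abstract itself observes, a purely geometric heuristic like this can return fewer groups than a domain-specific goal (here, isolating DHI zones) actually requires.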

High-Throughput Transcriptional Characterization of Regulatory Sequences from Bacterial Biosynthetic Gene Clusters

ACS Synthetic Biology ◽

10.1021/acssynbio.0c00639 ◽

2021 ◽

Author(s):

Jimin Park ◽

Sung Sun Yim ◽

Harris H. Wang

Keyword(s):

High Throughput ◽

Gene Clusters ◽

Regulatory Sequences ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters

Identification of polyketide biosynthetic gene clusters that harbor self-resistance target genes

10.1101/2020.06.01.128595 ◽

2020 ◽

Author(s):

Gergana A Vandova ◽

Aleksandra Nivina ◽

Chaitan Khosla ◽

Ronald W Davis ◽

Curt R Fisher ◽

...

Keyword(s):

Gene Cluster ◽

Resistance Gene ◽

Resistance Genes ◽

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

The Self ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Automated Method ◽

Novel Antibiotics

Background: Polyketide secondary metabolites have been a rich source of antibiotic discovery for decades. Thousands of novel polyketide synthase (PKS) gene clusters have been identified in recent years thanks to advances in DNA sequencing. However, experimental characterization of novel and useful PKS activities remains complicated. As a result, computational tools to analyze sequence data are essential to identify and prioritize potentially novel PKS activities. Here we exploit the concept of genetically encoded self-resistance to identify and rank biosynthetic gene clusters by their potential to encode novel antibiotics. Results: To identify PKS clusters that are likely to produce an antibacterial compound, we developed an automated method to identify and catalog clusters that harbor potential self-resistance genes. We manually curated a list of known self-resistance genes and searched all NCBI genome databases for homologs of these self-resistance genes in biosynthetic gene clusters. The algorithm takes into account (1) the distance of the potential self-resistance gene to a core enzyme in the biosynthetic gene cluster; (2) the presence of a duplicated housekeeping copy of the self-resistance gene; (3) the presence of close homologs of the biosynthetic gene cluster in diverse species also harboring the putative self-resistance gene; (4) evidence for coevolution of the self-resistance gene and the core biosynthetic gene; and (5) self-resistance gene ubiquity. We generated a catalog of 190 unique PKS clusters whose products likely target known enzymes of antibacterial importance. We also present an expanded set of putative self-resistance genes that may be useful in identifying small molecules active against novel microbial targets. Conclusions: We developed a bioinformatic approach to identify and rank biosynthetic gene clusters that likely harbor self-resistance genes and may produce compounds with antibacterial properties.
We compiled a list of putative self-resistance genes for novel antibacterial targets, and of orphan PKS clusters harboring these targets. These catalogues are a resource for discovery of novel antibiotics.
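The five criteria listed above could be combined into a ranking in many ways, and the abstract does not say how. The sketch below is a hypothetical illustration only: it counts how many criteria each candidate satisfies, using made-up thresholds, and breaks ties by distance to the core enzyme. The field names, thresholds, the scale of the coevolution score, and the assumed direction of the ubiquity criterion (lower treated as better) are all assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cluster_id: str
    distance_to_core_kb: float   # (1) distance to a core biosynthetic enzyme
    has_housekeeping_copy: bool  # (2) duplicated housekeeping copy present
    n_species_with_cluster: int  # (3) species with homologous cluster + gene
    coevolution_score: float     # (4) hypothetical coevolution measure
    ubiquity: float              # (5) hypothetical ubiquity fraction in [0, 1]


def criteria_met(c, max_dist_kb=10.0, min_species=2, min_coevo=0.5, max_ubiquity=0.5):
    """Count how many of the five criteria a candidate satisfies.
    All thresholds are made up; lower ubiquity is ASSUMED to be better."""
    return sum([
        c.distance_to_core_kb <= max_dist_kb,
        c.has_housekeeping_copy,
        c.n_species_with_cluster >= min_species,
        c.coevolution_score >= min_coevo,
        c.ubiquity <= max_ubiquity,
    ])


def rank(candidates, **thresholds):
    """Most criteria met first; ties broken by shorter distance to the core enzyme."""
    return sorted(candidates,
                  key=lambda c: (-criteria_met(c, **thresholds), c.distance_to_core_kb))
```

A simple count keeps every criterion equally weighted; a real catalog would more plausibly weight or gate some criteria, which is why the thresholds are exposed as keyword arguments.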

Expanding the Natural Products Heterologous Expression Repertoire in the Model Cyanobacterium Anabaena sp. Strain PCC 7120: Production of Pendolmycin and Teleocidin B-4

10.26434/chemrxiv.11316098.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Patrick Videau ◽

Kaitlyn Wells ◽

Arun Singh ◽

Jessie Eiting ◽

Philip Proteau ◽

...

Keyword(s):

Natural Products ◽

Genome Mining ◽

Gene Clusters ◽

Combinatorial Biosynthesis ◽

Test Case ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Cyanobacterium Anabaena ◽

Anabaena Sp ◽

PCC 7120

Cyanobacteria are prolific producers of natural products, and genome mining has shown that many orphan biosynthetic gene clusters can be found in sequenced cyanobacterial genomes. New tools and methodologies are required to investigate these biosynthetic gene clusters, and here we present the use of Anabaena sp. strain PCC 7120 as a host for combinatorial biosynthesis of natural products, using the indolactam natural products (lyngbyatoxin A, pendolmycin, and teleocidin B-4) as a test case. We were able to successfully produce all three compounds using codon-optimized genes from Actinobacteria. We also introduce a new plasmid backbone based on the native Anabaena 7120 plasmid pCC7120ζ and show that production of teleocidin B-4 can be accomplished using a two-plasmid system, which can be introduced by co-conjugation.

Elucidating the Weakly Reversible Cs-Pb-Br Perovskite Nanocrystal Reaction Network with High-Throughput Maps and Transformations

10.26434/chemrxiv.12253277.v2 ◽

2020 ◽

Author(s):

Jakob Dahl ◽

Xingzhi Wang ◽

Xiao Huang ◽

Emory Chan ◽

Paul Alivisatos

Keyword(s):

High Throughput ◽

Dynamic Equilibrium ◽

Reaction Network ◽

Substantial Impact ◽

Equilibrium Behavior ◽

Automated Method ◽

Lead Bromide ◽

Different Shapes ◽

Complex Chemistry ◽

Lead Halide

Advances in automation and data analytics can aid exploration of the complex chemistry of nanoparticles. Lead halide perovskite colloidal nanocrystals provide an interesting proving ground: there are reports of many different phases and transformations, which have made it hard to form a coherent conceptual framework for their controlled formation through traditional methods. In this work, we systematically explore the portion of Cs-Pb-Br synthesis space in which many optically distinguishable species are formed, using high-throughput robotic synthesis to understand their formation reactions. We deploy an automated method that allows us to determine the relative amount of absorbance that can be attributed to each species in order to create maps of the synthetic space. These in turn facilitate improved understanding of the interplay between the kinetic and thermodynamic factors that determine which combinations of species are likely to be prevalent under a given set of conditions. Based on these maps, we test potential transformation routes between perovskite nanocrystals of different shapes and phases. We find that shape is determined kinetically, but many reactions between different phases show equilibrium behavior. We demonstrate a dynamic equilibrium between complexes, monolayers and nanocrystals of lead bromide, with substantial impact on the reaction outcomes. This allows us to construct a chemical reaction network that qualitatively explains our results as well as previous reports and can serve as a guide for those seeking to prepare a particular composition and shape.
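The abstract does not describe how absorbance is attributed to each species. One standard approach, assumed here, is to fit the measured spectrum as a linear combination of reference spectra by least squares and report each species' share of the integrated absorbance. The Gaussian reference bands, their positions and widths, and the function names are hypothetical; a production fit would more likely use a true non-negative least-squares solver than the clipping used here.

```python
import numpy as np


def unmix(spectrum, references):
    """Share of integrated absorbance attributed to each reference species.

    references: (n_wavelengths, n_species) matrix, one reference spectrum per column.
    """
    coefs, *_ = np.linalg.lstsq(references, spectrum, rcond=None)
    coefs = np.clip(coefs, 0.0, None)  # crude non-negativity, not a true NNLS fit
    contrib = coefs * references.sum(axis=0)  # integrated absorbance per species
    return contrib / contrib.sum()


# Hypothetical Gaussian absorption bands on a 400-550 nm grid
wl = np.linspace(400.0, 550.0, 301)


def band(center_nm, width_nm):
    return np.exp(-0.5 * ((wl - center_nm) / width_nm) ** 2)


refs = np.column_stack([band(c, 8.0) for c in (420.0, 470.0, 520.0)])
mixture = 0.5 * refs[:, 0] + 1.0 * refs[:, 1] + 2.0 * refs[:, 2]
print(unmix(mixture, refs))  # one share per species; the shares sum to 1
```

Repeating such a fit for every well of a robotic synthesis run is one way per-species absorbance shares could be turned into a map over synthesis conditions.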

Faculty Opinions recommendation of A systematic analysis of biosynthetic gene clusters in the human microbiome reveals a common family of antibiotics.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718872945.793500860 ◽

2014 ◽

Author(s):

Howard Young ◽

Heekyong Bae

Keyword(s):

Human Microbiome ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Systematic Analysis

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01995 ◽

2010 ◽

Vol 30 (8) ◽

pp. 1995-1998 ◽

Cited By ~ 18

Author(s):

Shi-bing ZHOU ◽

Zhen-yuan XU ◽

Xu-qing TANG

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Optimal Number Of Clusters

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Current Bioinformatics ◽

10.2174/1574893613666180601080008 ◽

2018 ◽

Vol 14 (1) ◽

pp. 11-23 ◽

Cited By ~ 3

Author(s):

Lin Zhang ◽

Yanling He ◽

Huaizhi Wang ◽

Hui Liu ◽

Yufei Huang ◽

...

Keyword(s):

Clustering Analysis ◽

Methylation Level ◽

Optimal Number ◽

Generative Model ◽

Methylation Data ◽

Sequencing Data ◽

Number Of Clusters ◽

RNA Methylation ◽

Clustering Effect ◽

Optimal Number Of Clusters

Background: The RNA methylome has been recognized as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuitry of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data have unique features, such as low read coverage, which call for novel clustering approaches. Objective: Besides handling low read coverage, clustering analysis of count-based RNA methylation sequencing data should also preserve the integer nature of the counts. Method: We proposed a nonparametric generative model, together with a Gibbs sampling solution, for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level using the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to determine the optimal number of clusters automatically, avoiding the common model selection problem in clustering analysis. Results: When tested on simulated data, the method showed improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites with very low read coverage, but also learns the optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
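The beta-binomial component at the core of the model described above scores k methylated reads out of n total reads at a site while keeping the counts as integers. The sketch below shows only this likelihood term and the resulting cluster responsibilities for a finite mixture with fixed parameters; it is not the DPBBM Dirichlet-process Gibbs sampler, and the function names and parameter values are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_betabinom(k, n, a, b):
    """log P(k methylated reads out of n | Beta(a, b) prior on the methylation level)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def responsibilities(k, n, weights, params):
    """Posterior probability of each cluster for one site, given fixed mixture
    weights and one (a, b) pair per cluster (computed in log space for stability)."""
    logs = [log(w) + log_betabinom(k, n, a, b) for w, (a, b) in zip(weights, params)]
    m = max(logs)
    ps = [exp(v - m) for v in logs]
    total = sum(ps)
    return [p / total for p in ps]


# Hypothetical two-cluster model: mostly methylated vs mostly unmethylated sites
print(responsibilities(9, 10, [0.5, 0.5], [(9.0, 1.0), (1.0, 9.0)]))
```

Working with the raw counts (k, n) rather than the ratio k/n is what lets a site with 1 of 2 reads methylated carry less weight than one with 50 of 100, which is the low-coverage behavior the abstract emphasizes.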