FunOrder 2.0 – a fully automated method for the identification of co-evolved genes

2022 ◽  
Author(s):  
Gabriel A. Vignolle ◽  
Robert L. Mach ◽  
Astrid R. Mach-Aigner ◽  
Christian Derntl

Coevolution is an important biological process that shapes interacting species and even proteins, whether physically interacting proteins or consecutive enzymes in a metabolic pathway. The detection of co-evolved proteins will contribute to a better understanding of biological systems. Previously, we developed a semi-automated method, termed FunOrder, for the detection of co-evolved genes from an input gene or protein set. We demonstrated the usability and applicability of FunOrder by identifying essential genes in biosynthetic gene clusters from different ascomycetes. A major drawback of this original method was the need for a manual assessment, which may introduce user bias and prevents high-throughput application. Here we present a fully automated version of this method, termed FunOrder 2.0. To fully automate the method, we used several mathematical indices to determine the optimal number of clusters in the FunOrder output, followed by k-means clustering based on the first three principal components of a principal component analysis of the FunOrder output. Further, we replaced BLAST with the DIAMOND tool, which increased speed and allows the future integration of larger proteome databases. The introduced changes slightly decreased the sensitivity of the method, which is outweighed by the gains in overall speed and specificity. Additionally, the changes lay the foundation for future high-throughput applications of FunOrder 2.0 in different phyla to solve different biological problems.
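
The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes the three principal component scores per gene are already available, and it uses the silhouette index as a single stand-in for the "several mathematical indices" the abstract mentions without naming. The toy `pc_scores` values, the function names, and the farthest-point initialisation are illustrative assumptions, not part of FunOrder 2.0.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        # Add the point that is farthest from all centres chosen so far.
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(points, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scored, key=scored.get), scored

# Made-up PC1..PC3 scores for nine hypothetical genes (three tight groups).
pc_scores = [
    (0.0, 0.1, 0.0), (0.1, 0.0, 0.1), (0.0, 0.0, 0.1),
    (5.0, 5.1, 5.0), (5.1, 5.0, 4.9), (4.9, 5.0, 5.0),
    (10.0, 0.0, 5.0), (10.1, 0.1, 5.1), (9.9, 0.0, 4.9),
]
k, table = best_k(pc_scores, range(2, 5))
labels = kmeans(pc_scores, k)
```

Choosing the number of clusters from an index rather than by eye is what removes the manual assessment step; in practice several indices would be compared, as the abstract states.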

2007 ◽  
Vol 14 (3) ◽  
pp. 303-312 ◽  
Author(s):  
Jun Yin ◽  
Paul D. Straight ◽  
Siniša Hrvatin ◽  
Pieter C. Dorrestein ◽  
Stefanie B. Bumpus ◽  
...  

2021 ◽  
Author(s):  
John B. Lemos ◽  
Matheus R. S. Barbosa ◽  
Edric B. Troccoli ◽  
Alexsandro G. Cerqueira

This work aims to delimit Direct Hydrocarbon Indicator (DHI) zones using the Gaussian Mixture Models (GMM) algorithm, an unsupervised machine learning method, over the FS8 seismic horizon in the seismic data of the Dutch F3 Field. The dataset used for the cluster analysis was extracted from the 3D seismic dataset and comprises the following seismic attributes: Sweetness, Spectral Decomposition, Acoustic Impedance, Coherence, and Instantaneous Amplitude. The Principal Component Analysis (PCA) algorithm was applied to the original dataset for dimensionality reduction and noise filtering, and we chose the first three principal components as the input to the clustering algorithm. The cluster analysis using Gaussian Mixture Models was performed by varying the number of groups from 2 to 20. The Elbow Method suggested a smaller number of groups than needed to isolate the DHI zones. We therefore found that four was the optimal number of clusters to highlight this seismic feature. Furthermore, it was possible to interpret other clusters related to the lithology through geophysical well log data.
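
The dimensionality-reduction step can be sketched as follows. This is a minimal standard-library illustration of PCA by power iteration with deflation, keeping the first three components; it is not the authors' implementation, and the `attributes` matrix holds made-up numbers standing in for the five seismic attributes. The Gaussian Mixture Model fit that follows PCA in the abstract is deliberately left out.

```python
import math

def centre_columns(rows):
    """Subtract each column's mean so every attribute has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centred):
    """Sample covariance matrix (divisor n - 1) of centred data."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iters=500):
    """Dominant eigenvalue and eigenvector by power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # nothing left to extract
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return value, v

def pca(rows, n_components=3):
    """Return (scores, components, variances) for the leading components."""
    centred = centre_columns(rows)
    cov = covariance(centred)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centred]
    return scores, components, variances

# Made-up values: rows are samples, columns are five hypothetical attributes.
attributes = [
    [0.9, 12.0, 3.1, 0.80, 1.5],
    [1.1, 11.5, 3.4, 0.75, 1.7],
    [2.3, 8.0, 5.9, 0.40, 3.2],
    [2.1, 8.4, 6.2, 0.45, 3.0],
    [0.2, 14.1, 2.0, 0.95, 0.9],
    [1.6, 10.2, 4.4, 0.60, 2.3],
]
scores, components, variances = pca(attributes, n_components=3)
```

The rows of `scores` would then be passed to a clustering routine, with the number of groups varied over the range of interest. In practice the attributes would usually be standardised first, since they are measured on different scales.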


2020 ◽  
Author(s):  
Gergana A Vandova ◽  
Aleksandra Nivina ◽  
Chaitan Khosla ◽  
Ronald W Davis ◽  
Curt R Fisher ◽  
...  

Background: Polyketide secondary metabolites have been a rich source of antibiotic discovery for decades. Thousands of novel polyketide synthase (PKS) gene clusters have been identified in recent years with advances in DNA sequencing. However, experimental characterization of novel and useful PKS activities remains complicated. As a result, computational tools to analyze sequence data are essential to identify and prioritize potentially novel PKS activities. Here we exploit the concept of genetically-encoded self-resistance to identify and rank biosynthetic gene clusters for their potential to encode novel antibiotics. Results: To identify PKS genes that are likely to produce an antibacterial compound, we developed an automated method to identify and catalog clusters that harbor potential self-resistance genes. We manually curated a list of known self-resistance genes and searched all NCBI genome databases for homologs of these self-resistance genes in biosynthetic gene clusters. The algorithm takes into account (1) the distance of the potential self-resistance gene to a core enzyme in the biosynthetic gene cluster; (2) the presence of a duplicated housekeeping copy of the self-resistance gene; (3) the presence of close homologs of the biosynthetic gene cluster in diverse species also harboring the putative self-resistance gene; (4) evidence for coevolution of the self-resistance gene and core biosynthetic gene; and (5) self-resistance gene ubiquity. We generated a catalog of 190 unique PKS clusters whose products likely target known enzymes of antibacterial importance. We also present an expanded set of putative self-resistance genes that may be useful in identifying small molecules active against novel microbial targets. Conclusions: We developed a bioinformatic approach to identify and rank biosynthetic gene clusters that likely harbor self-resistance genes and may produce compounds with antibacterial properties. We compiled a list of putative self-resistance genes for novel antibacterial targets, and of orphan PKS clusters harboring these targets. These catalogues are a resource for discovery of novel antibiotics.
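
One way to picture how five criteria could be combined into a ranking is a weighted sum, sketched below. The abstract does not say how each criterion is measured, how it is scaled, or how the criteria are weighted, so every detail here is an assumption: the criterion names, the convention that each value is pre-scaled to [0, 1] with higher meaning stronger evidence, the equal default weights, and the cluster identifiers are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the five criteria listed in the abstract.
CRITERIA = (
    "proximity_to_core",       # (1) distance to a core enzyme
    "housekeeping_duplicate",  # (2) duplicated housekeeping copy present
    "conserved_in_homologs",   # (3) homologous clusters also carry the gene
    "coevolution",             # (4) evidence for coevolution with core gene
    "ubiquity",                # (5) self-resistance gene ubiquity
)

@dataclass(frozen=True)
class ClusterEvidence:
    cluster_id: str
    # Criterion name -> value assumed to be pre-scaled to [0, 1].
    features: dict = field(default_factory=dict)

def score(evidence, weights):
    """Weighted sum over the criteria; missing criteria count as 0."""
    return sum(weights[c] * evidence.features.get(c, 0.0) for c in CRITERIA)

def rank(clusters, weights=None):
    """Sort clusters from highest to lowest combined score."""
    if weights is None:
        weights = {c: 1.0 for c in CRITERIA}  # equal weights, an assumption
    return sorted(clusters, key=lambda e: score(e, weights), reverse=True)

# Made-up example clusters.
candidates = [
    ClusterEvidence("cluster_A", {"proximity_to_core": 0.9, "coevolution": 0.2}),
    ClusterEvidence("cluster_B", {"proximity_to_core": 0.8, "housekeeping_duplicate": 1.0,
                                  "conserved_in_homologs": 0.7, "coevolution": 0.6}),
    ClusterEvidence("cluster_C", {"ubiquity": 0.3}),
]
ranked_ids = [c.cluster_id for c in rank(candidates)]
```

Keeping the weights as an explicit argument makes it easy to test how sensitive the ranking is to any one criterion.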


Author(s):  
Patrick Videau ◽  
Kaitlyn Wells ◽  
Arun Singh ◽  
Jessie Eiting ◽  
Philip Proteau ◽  
...  

Cyanobacteria are prolific producers of natural products, and genome mining has shown that many orphan biosynthetic gene clusters can be found in sequenced cyanobacterial genomes. New tools and methodologies are required to investigate these biosynthetic gene clusters, and here we present the use of Anabaena sp. strain PCC 7120 as a host for combinatorial biosynthesis of natural products using the indolactam natural products (lyngbyatoxin A, pendolmycin, and teleocidin B-4) as a test case. We were able to successfully produce all three compounds using codon-optimized genes from Actinobacteria. We also introduce a new plasmid backbone based on the native Anabaena 7120 plasmid pCC7120ζ and show that production of teleocidin B-4 can be accomplished using a two-plasmid system, which can be introduced by co-conjugation.


2020 ◽  
Author(s):  
Jakob Dahl ◽  
Xingzhi Wang ◽  
Xiao Huang ◽  
Emory Chan ◽  
Paul Alivisatos

Advances in automation and data analytics can aid exploration of the complex chemistry of nanoparticles. Lead halide perovskite colloidal nanocrystals provide an interesting proving ground: there are reports of many different phases and transformations, which has made it hard to form a coherent conceptual framework for their controlled formation through traditional methods. In this work, we systematically explore the portion of Cs-Pb-Br synthesis space in which many optically distinguishable species are formed, using high-throughput robotic synthesis to understand their formation reactions. We deploy an automated method that allows us to determine the relative amount of absorbance that can be attributed to each species in order to create maps of the synthetic space. These in turn facilitate improved understanding of the interplay between kinetic and thermodynamic factors that determine which combinations of species are likely to be prevalent under a given set of conditions. Based on these maps, we test potential transformation routes between perovskite nanocrystals of different shapes and phases. We find that shape is determined kinetically, but many reactions between different phases show equilibrium behavior. We demonstrate a dynamic equilibrium between complexes, monolayers and nanocrystals of lead bromide, with substantial impact on the reaction outcomes. This allows us to construct a chemical reaction network that qualitatively explains our results as well as previous reports and can serve as a guide for those seeking to prepare a particular composition and shape.


2018 ◽  
Vol 14 (1) ◽  
pp. 11-23 ◽  
Author(s):  
Lin Zhang ◽  
Yanling He ◽  
Huaizhi Wang ◽  
Hui Liu ◽  
Yufei Huang ◽  
...  

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Beyond handling low read coverage, clustering analysis of count-based RNA methylation sequencing data must also preserve the integer nature of the measurements. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, avoiding the common model selection problem in clustering analysis. Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed. Availability: The source code and documents of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
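
The building block of a beta-binomial mixture can be sketched in a few lines. The example below evaluates the beta-binomial log-probability of methylated read counts and the posterior probability that a site belongs to each of a fixed set of components. It is a simplified illustration only: the Dirichlet process prior and the Gibbs sampler described in the abstract are not implemented, the number of components is fixed, and the parameter values and counts are made up. It is written in Python for illustration and does not reproduce the DPBBM R package's interface.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def beta_binomial_logpmf(k, n, alpha, beta):
    """log P(k methylated reads out of n) under BetaBinomial(n, alpha, beta)."""
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

def site_loglik(counts, alpha, beta):
    """Log-likelihood of one site; counts is a list of (methylated, total) per sample."""
    return sum(beta_binomial_logpmf(k, n, alpha, beta) for k, n in counts)

def responsibilities(counts, components, weights):
    """Posterior probability of each mixture component for one site."""
    logs = [math.log(w) + site_loglik(counts, a, b)
            for (a, b), w in zip(components, weights)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical components: low methylation (mean 0.1) and high (mean 0.9).
components = [(1.0, 9.0), (9.0, 1.0)]
weights = [0.5, 0.5]
# Made-up low-coverage site: 1 of 10 reads and 0 of 8 reads methylated.
site = [(1, 10), (0, 8)]
resp = responsibilities(site, components, weights)
```

Because the likelihood is defined on the raw counts, a site with few reads contributes weaker evidence than a deeply covered one, which is the motivation for keeping integer counts instead of estimated methylation ratios.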

