Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations

Murilo Henrique Anzolini Cassiano; Rafael Silva-Rocha

doi:10.1128/msystems.00439-20

mSystems ◽

10.1128/msystems.00439-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Murilo Henrique Anzolini Cassiano ◽

Rafael Silva-Rocha

Keyword(s):

Predictive Power ◽

False Negative ◽

Model Organism ◽

Data Sets ◽

Promoter Prediction ◽

Promoter Elements ◽

Content Type ◽

Promoter Sequences ◽

Prediction Tools ◽

DNA Elements

The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. Over the last years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among many available alternatives.

Benchmarking available bacterial promoter prediction tools: potentialities and limitations

10.1101/2020.05.05.079335 ◽

2020 ◽

Author(s):

Murilo Henrique Anzolini Cassiano ◽

Rafael Silva-Rocha

Keyword(s):

Predictive Power ◽

Matthews Correlation Coefficient ◽

Sequence Data ◽

Data Sets ◽

Promoter Prediction ◽

Bacterial Genomes ◽

Systematic Comparison ◽

Bioinformatic Tools ◽

Prediction Tools ◽

High Throughput Technology

Abstract. Background: The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massive mapping of promoter elements, we still rely mainly on bioinformatic tools to predict such elements in bacterial genomes. Additionally, although many different prediction tools have become popular for identifying bacterial promoters, there is no systematic comparison of such tools. Results: Here, we performed a systematic comparison of several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used data sets of experimentally validated promoters from Escherichia coli and a control data set composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy and Matthews Correlation Coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred and iPromoter-2L) offered high predictive power. Of these, iPro70-FMWin exhibited the best results for most of the metrics used. Conclusions: We therefore explore here some potentials and limitations of the available tools and hope future work can build upon our effort to systematically characterize this useful class of bioinformatics tools.
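The four metrics named in this abstract can all be derived from a confusion matrix of true/false positives and negatives. The following is a minimal Python sketch of the standard definitions, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts.

    tp/fn: real promoters predicted as promoter / non-promoter.
    tn/fp: control sequences predicted as non-promoter / promoter.
    """
    total = tp + fp + tn + fn
    # Guard every ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC denominator is zero when any row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    for name, value in classification_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it is less easily inflated when the promoter and control sets differ in size.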

A high-dimensional classification approach based on class-dependent feature subspace

Industrial Management & Data Systems ◽

10.1108/imds-11-2016-0491 ◽

2017 ◽

Vol 117 (10) ◽

pp. 2325-2339

Author(s):

Fuzan Chen ◽

Harris Wu ◽

Runliang Dou ◽

Minqiang Li

Keyword(s):

Predictive Power ◽

High Dimensional Data ◽

Classification Model ◽

High Dimensional ◽

SVM Classifier ◽

Data Sets ◽

Content Type ◽

Classification Approach ◽

Dimensional Classification ◽

Feature Subspace

Purpose: The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification. Design/methodology/approach: A classification approach based on class-dependent feature subspace (CFS) is proposed. CFS is a class-dependent integration of a support vector machine (SVM) classifier and associated discriminative features. For each class, our genetic algorithm (GA)-based approach evolves the best subset of discriminative features and SVM classifier simultaneously. To guarantee convergence and efficiency, the authors customize the GA in terms of encoding strategy, fitness evaluation, and genetic operators. Findings: Experimental studies demonstrated that the proposed CFS-based approach is superior to other state-of-the-art classification algorithms on UCI data sets in terms of both concise interpretation and predictive power for high-dimensional data. Research limitations/implications: UCI data sets rather than real industrial data are used to evaluate the proposed approach. In addition, only single-label classification is addressed in the study. Practical implications: The proposed method not only constructs an accurate classification model but also obtains a compact combination of discriminative features. It helps business decision makers gain a concise understanding of high-dimensional data. Originality/value: The authors propose a compact and effective classification approach for high-dimensional data. Instead of using the same feature subset for all classes, the proposed CFS-based approach obtains the optimal subset of discriminative features and SVM classifier for each class. The proposed approach enhances both interpretability and predictive power for high-dimensional data.

Probing the Nanostructure and Arrangement of Bacterial Magnetosomes by Small-Angle X-Ray Scattering

Applied and Environmental Microbiology ◽

10.1128/aem.01513-19 ◽

2019 ◽

Vol 85 (24) ◽

Cited By ~ 4

Author(s):

Sabine Rosenfeldt ◽

Cornelius N. Riese ◽

Frank Mickoleit ◽

Dirk Schüler ◽

Anna S. Schenk

Keyword(s):

Small Angle ◽

Model Organism ◽

Accurate Information ◽

Data Sets ◽

SAXS Data ◽

Content Type ◽

Magnetospirillum Gryphiswaldense ◽

X Ray ◽

X Ray Scattering ◽

Ray Scattering

ABSTRACT Magnetosomes are membrane-enveloped single-domain ferromagnetic nanoparticles enabling the navigation of magnetotactic bacteria along magnetic field lines. Strict control over each step of biomineralization generates particles of high crystallinity, strong magnetization, and remarkable uniformity in size and shape, which is particularly interesting for many biomedical and biotechnological applications. However, to understand the physicochemical processes involved in magnetite biomineralization, close and precise monitoring of particle production is required. Commonly used techniques, such as transmission electron microscopy (TEM) or Fe measurements, allow only for semiquantitative assessment of the magnetosome formation without routinely revealing quantitative structural information. In this study, lab-based small-angle X-ray scattering (SAXS) is explored as a means to monitor the different stages of magnetosome biogenesis in the model organism Magnetospirillum gryphiswaldense. SAXS is evaluated as a quantitative stand-alone technique to analyze the size, shape, and arrangement of magnetosomes in cells cultivated under different growth conditions. By applying a simple and robust fitting procedure based on spheres aligned in linear chains, it is demonstrated that the SAXS data sets contain information on both the diameter of the inorganic crystal and the protein-rich magnetosome membrane. The analyses corroborate a narrow particle size distribution with an overall magnetosome radius of 19 nm in Magnetospirillum gryphiswaldense. Furthermore, the averaged distance between individual magnetosomes is determined, revealing a chain-like particle arrangement with a center-to-center distance of 53 nm. Overall, these data demonstrate that SAXS can be used as a novel stand-alone technique allowing for the at-line monitoring of magnetosome biosynthesis, thereby providing accurate information on the particle nanostructure. 
IMPORTANCE This study explores lab-based small-angle X-ray scattering (SAXS) as a novel quantitative stand-alone technique to monitor the size, shape, and arrangement of magnetosomes during different stages of particle biogenesis in the model organism Magnetospirillum gryphiswaldense. The SAXS data sets contain volume-averaged, statistically accurate information on both the diameter of the inorganic nanocrystal and the enveloping protein-rich magnetosome membrane. As a robust and nondestructive in situ technique, SAXS can provide new insights into the physicochemical steps involved in the biosynthesis of magnetosome nanoparticles as well as their assembly into well-ordered chains. The proposed fit model can easily be adapted to account for different particle shapes and arrangements produced by other strains of magnetotactic bacteria, thus rendering SAXS a highly versatile method.

Stereotactic radiation treatment planning and follow-up studies involving fused multimodality imaging

Journal of Neurosurgery ◽

10.3171/sup.2004.101.supplement3.0326 ◽

2004 ◽

Vol 101 (Supplement3) ◽

pp. 326-333 ◽

Cited By ~ 7

Author(s):

Klaus D. Hamm ◽

Gunnar Surber ◽

Michael Schmücking ◽

Reinhard E. Wurm ◽

Rene Aschenbach ◽

...

Keyword(s):

Image Fusion ◽

Treatment Planning ◽

Radiation Treatment ◽

Data Sets ◽

Slice Thickness ◽

Radiation Treatment Planning ◽

Content Type ◽

Follow Up Studies ◽

Fine Print

Object. Innovative new software solutions may enable image fusion to produce the desired data superposition for precise target definition and follow-up studies in radiosurgery/stereotactic radiotherapy in patients with intracranial lesions. The aim is to integrate the anatomical and functional information completely into the radiation treatment planning and to achieve an exact comparison for follow-up examinations. Special conditions and advantages of BrainLAB's fully automatic image fusion system are evaluated and described for this purpose. Methods. In 458 patients, the radiation treatment planning and some follow-up studies were performed using an automatic image fusion technique involving the use of different imaging modalities. Each fusion was visually checked and corrected as necessary. The computerized tomography (CT) scans for radiation treatment planning (slice thickness 1.25 mm), as well as stereotactic angiography for arteriovenous malformations, were acquired using head fixation with stereotactic arc or, in the case of stereotactic radiotherapy, with a relocatable stereotactic mask. Different magnetic resonance (MR) imaging sequences (T1, T2, and fluid-attenuated inversion-recovery images) and positron emission tomography (PET) scans were obtained without head fixation. Fusion results and the effects on radiation treatment planning and follow-up studies were analyzed. The precision level of the results of the automatic fusion depended primarily on the image quality, especially the slice thickness and the field homogeneity when using MR images, as well as on patient movement during data acquisition. Fully automated image fusion of different MR, CT, and PET studies was performed for each patient. Only in a few cases was it necessary to correct the fusion manually after visual evaluation. These corrections were minor and did not materially affect treatment planning. 
High-quality fusion of thin slices of a region of interest with a complete head data set could be performed easily. The target volume for radiation treatment planning could be accurately delineated using multimodal information provided by CT, MR, angiography, and PET studies. The fusion of follow-up image data sets yielded results that could be successfully compared and quantitatively evaluated. Conclusions. Depending on the quality of the originally acquired image, automated image fusion can be a very valuable tool, allowing for fast (∼ 1–2 minute) and precise fusion of all relevant data sets. Fused multimodality imaging improves the target volume definition for radiation treatment planning. High-quality follow-up image data sets should be acquired for image fusion to provide exactly comparable slices and volumetric results that will contribute to quality control.

Predictive and Descriptive CoMFA Models: The Effect of Variable Selection

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180212162028 ◽

2018 ◽

Vol 21 (2) ◽

pp. 117-124 ◽

Cited By ~ 4

Author(s):

Bakhtyar Sepehri ◽

Nematollah Omidikia ◽

Mohsen Kompany-Zareh ◽

Raouf Ghavami

Keyword(s):

Variable Selection ◽

Predictive Power ◽

Selection Method ◽

Data Sets ◽

Data Set ◽

CoMFA Model ◽

Variable Selection Method

Aims & Scope: In this research, eight variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. For each data set, a CoMFA model was first built with all CoMFA descriptors; a new CoMFA model was then developed after applying each variable selection method, giving nine CoMFA models per data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying five of the variable selection approaches (FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife) significantly increases the predictive power and stability of CoMFA models. Result & Conclusion: Among them, SPA-jackknife removes most of the variables, while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. In addition, applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS preserves the CoMFA contour map information for both fields.

Mean reversion in corporate leverage: evidence from India

Managerial Finance ◽

10.1108/mf-09-2018-0425 ◽

2019 ◽

Vol 45 (9) ◽

pp. 1183-1198

Author(s):

Gaurav S. Chauhan ◽

Pradip Banerjee

Keyword(s):

Capital Structure ◽

Emerging Market ◽

Simulated Data ◽

Mean Reversion ◽

Developed Countries ◽

Data Sets ◽

Debt Ratio ◽

Testing Strategy ◽

Content Type ◽

Financing Behavior

Purpose: Recent papers on target capital structure show that the debt ratio seems to vary widely in space and time, implying that the functional specifications of target debt ratios are of little empirical use. Further, target behavior cannot be adjudged correctly using debt ratios, as they could revert due to mechanical reasons. The purpose of this paper is to develop an alternative testing strategy to test the target capital structure. Design/methodology/approach: The authors make use of a major “shock” to the debt ratios as an event and think of a subsequent reversion as a movement toward a mean or target debt ratio. By doing this, the authors no longer need to identify target debt ratios as a function of firm-specific variables or any other rigid functional form. Findings: Similar to the broad empirical evidence in developed economies, there is no perceptible and systematic mean reversion by Indian firms. However, unlike in developed countries, proportionate usage of debt to finance firms’ marginal financing deficits is extensive; equity is used rather sparingly. Research limitations/implications: The trade-off theory could be convincingly refuted, at least for the emerging market of India. The paper stimulates further research into the reasons for the specific financing behavior of emerging market firms. Practical implications: The results show that firms’ financing choices depend not only on their own firm-specific variables but also on the financial markets in which they operate. Originality/value: This study attempts to assess mean reversion in debt ratios in a unique but reassuring manner. The results are confirmed by extensive calibration of the testing strategy using simulated data sets.

Fast and robust identity-by-descent inference with the templated positional Burrows-Wheeler transform

Molecular Biology and Evolution ◽

10.1093/molbev/msaa328 ◽

2020 ◽

Author(s):

William A Freyman ◽

Kimberly F McManus ◽

Suyash S Shringarpure ◽

Ethan M Jewett ◽

Katarzyna Bryc ◽

...

Keyword(s):

Isolation By Distance ◽

False Negative ◽

Segment Length ◽

Data Sets ◽

Haplotype Sharing ◽

Binary File ◽

Inference Algorithms ◽

Out Of Sample ◽

Massive Scale ◽

Burrows Wheeler Transform

Abstract Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale datasets with millions of samples. Furthermore we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.

Involvement of RpoN in Regulating Bacterial Arsenite Oxidation

Applied and Environmental Microbiology ◽

10.1128/aem.00238-12 ◽

2012 ◽

Vol 78 (16) ◽

pp. 5638-5645 ◽

Cited By ~ 23

Author(s):

Yoon-Suk Kang ◽

Brian Bothner ◽

Christopher Rensing ◽

Timothy R. McDermott

Keyword(s):

Binding Site ◽

Sigma Factor ◽

Model Organism ◽

Coding Region ◽

RT-PCR ◽

Obvious Effect ◽

Content Type ◽

Definitive Evidence ◽

Relative Contribution ◽

Insertional Inactivation

ABSTRACT In this study with the model organism Agrobacterium tumefaciens, we used a combination of lacZ gene fusions, reverse transcriptase PCR (RT-PCR), and deletion and insertional inactivation mutations to show unambiguously that the alternative sigma factor RpoN participates in the regulation of As(III) oxidation. A deletion mutation that removed the RpoN binding site from the aioBA promoter and an aacC3 (gentamicin resistance) cassette insertional inactivation of the rpoN coding region eliminated aioBA expression and As(III) oxidation, although rpoN expression was not related to cell exposure to As(III). Putative RpoN binding sites were identified throughout the genome and, as examples, included promoters for aioB, phoB1, pstS1, dctA, glnA, glnB, and flgB that were examined by using qualitative RT-PCR and lacZ reporter fusions to assess the relative contribution of RpoN to their transcription. The expressions of aioB and dctA in the wild-type strain were considerably enhanced in cells exposed to As(III), and both genes were silent in the rpoN::aacC3 mutant regardless of As(III). The expression level of glnA was not influenced by As(III) but was reduced (but not silent) in the rpoN::aacC3 mutant and further reduced in the mutant under N starvation conditions. The rpoN::aacC3 mutation had no obvious effect on the expression of glnB, pstS1, phoB1, or flgB. These experiments provide definitive evidence to document the requirement of RpoN for As(III) oxidation but also illustrate that the presence of a consensus RpoN binding site does not necessarily link the associated gene with regulation by As(III) or by this sigma factor.

Evaluation of the Carba NP Test for Rapid Detection of Carbapenemase-Producing Enterobacteriaceae and Pseudomonas aeruginosa

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00878-13 ◽

2013 ◽

Vol 57 (9) ◽

pp. 4578-4580 ◽

Cited By ~ 147

Author(s):

Nathalie Tijet ◽

David Boyd ◽

Samir N. Patel ◽

Michael R. Mulvey ◽

Roberto G. Melano

Keyword(s):

Pseudomonas Aeruginosa ◽

Positive Predictive Value ◽

Negative Predictive Value ◽

Predictive Value ◽

Rapid Detection ◽

False Negative ◽

Bacterial Extract ◽

Content Type ◽

Negative Results ◽

False Negative Results

ABSTRACT The Carba NP test was evaluated against a panel of 244 carbapenemase- and non-carbapenemase-producing Enterobacteriaceae and Pseudomonas aeruginosa isolates. We confirmed the 100% specificity and positive predictive value of the test, but the sensitivity and negative predictive value were 72.5% and 69.2%, respectively, and increased to 80% and 77.3%, respectively, using a more concentrated bacterial extract. False-negative results were associated with mucoid strains or linked to enzymes with low carbapenemase activity, particularly OXA-48-like, which has emerged globally in enterobacteria.

Lactose-Inducible System for Metabolic Engineering of Clostridium ljungdahlii

Applied and Environmental Microbiology ◽

10.1128/aem.03666-13 ◽

2014 ◽

Vol 80 (8) ◽

pp. 2410-2416 ◽

Cited By ~ 65

Author(s):

Areen Banerjee ◽

Ching Leang ◽

Toshiyuki Ueki ◽

Kelly P. Nevin ◽

Derek R. Lovley

Keyword(s):

Ethanol Production ◽

Genetic Manipulation ◽

Electron Flow ◽

Model Organism ◽

Inducible Expression ◽

Carbon Flow ◽

Wild Type ◽

Content Type ◽

Microbial Electrosynthesis ◽

Clostridium Ljungdahlii

ABSTRACT The development of tools for genetic manipulation of Clostridium ljungdahlii has increased its attractiveness as a chassis for autotrophic production of organic commodities and biofuels from syngas and microbial electrosynthesis and established it as a model organism for the study of the basic physiology of acetogenesis. In an attempt to expand the genetic toolbox for C. ljungdahlii, the possibility of adapting a lactose-inducible system for gene expression, previously reported for Clostridium perfringens, was investigated. The plasmid pAH2, originally developed for C. perfringens with a gusA reporter gene, functioned as an effective lactose-inducible system in C. ljungdahlii. Lactose induction of C. ljungdahlii containing pB1, in which the gene for the aldehyde/alcohol dehydrogenase AdhE1 was downstream of the lactose-inducible promoter, increased expression of adhE1 30-fold over the wild-type level, increasing ethanol production 1.5-fold, with a corresponding decrease in acetate production. Lactose-inducible expression of adhE1 in a strain in which adhE1 and the adhE1 homolog adhE2 had been deleted from the chromosome restored ethanol production to levels comparable to those in the wild-type strain. Inducing expression of adhE2 similarly failed to restore ethanol production, suggesting that adhE1 is the homolog responsible for ethanol production. Lactose-inducible expression of the four heterologous genes necessary to convert acetyl coenzyme A (acetyl-CoA) to acetone diverted ca. 60% of carbon flow to acetone production during growth on fructose, and 25% of carbon flow went to acetone when carbon monoxide was the electron donor. These studies demonstrate that the lactose-inducible system described here will be useful for redirecting carbon and electron flow for the biosynthesis of products more valuable than acetate. Furthermore, this tool should aid in optimizing microbial electrosynthesis and in basic studies on the physiology of acetogenesis.
