Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations

mSystems ◽  
2020 ◽  
Vol 5 (4) ◽  
Author(s):  
Murilo Henrique Anzolini Cassiano ◽  
Rafael Silva-Rocha

The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. In recent years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among the many available alternatives.

2020 ◽  
Author(s):  
Murilo Henrique Anzolini Cassiano ◽  
Rafael Silva-Rocha

Abstract. Background: The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massive mapping of promoter elements, we still mainly rely on bioinformatic tools to predict such elements in bacterial genomes. Additionally, although many different prediction tools have become popular for identifying bacterial promoters, there is no systematic comparison of such tools. Results: Here, we performed a systematic comparison of several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used datasets of experimentally validated promoters from Escherichia coli and a control dataset composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy and Matthews Correlation Coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred and iPromoter-2L) offered high predictive power. Of these, iPro70-FMWin exhibited the best results for most of the metrics used. Conclusions: We therefore explore here some potentials and limitations of the available tools and hope future work can build upon our effort to systematically characterize this useful class of bioinformatics tools.
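The four metrics named in the abstract can all be derived from a confusion matrix of true/false positives and negatives. The following is a minimal sketch in Python, not the authors' evaluation code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts.

    tp/fn: real promoters predicted as promoter / non-promoter.
    tn/fp: control sequences predicted as non-promoter / promoter.
    Ratios with an empty denominator are reported as 0.0 (a convention
    chosen for this sketch).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)          # fraction of real promoters found
    specificity = ratio(tn, tn + fp)          # fraction of controls rejected
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)   # ranges from -1 to +1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 real promoters and 100 control sequences.
    for name, value in classification_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.3f}")
```

MCC is useful alongside accuracy because it stays near zero for a tool that labels almost everything as a promoter, even when the positive class dominates the data set.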


2017 ◽  
Vol 117 (10) ◽  
pp. 2325-2339
Author(s):  
Fuzan Chen ◽  
Harris Wu ◽  
Runliang Dou ◽  
Minqiang Li

Purpose The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification. Design/methodology/approach A classification approach based on class-dependent feature subspace (CFS) is proposed. CFS is a class-dependent integration of a support vector machine (SVM) classifier and associated discriminative features. For each class, our genetic algorithm (GA)-based approach evolves the best subset of discriminative features and SVM classifier simultaneously. To guarantee convergence and efficiency, the authors customize the GA in terms of encoding strategy, fitness evaluation, and genetic operators. Findings Experimental studies demonstrated that the proposed CFS-based approach is superior to other state-of-the-art classification algorithms on UCI data sets in terms of both concise interpretation and predictive power for high-dimensional data. Research limitations/implications UCI data sets rather than real industrial data are used to evaluate the proposed approach. In addition, only single-label classification is addressed in the study. Practical implications The proposed method not only constructs an accurate classification model but also obtains a compact combination of discriminative features. It is helpful for business makers to get a concise understanding of the high-dimensional data. Originality/value The authors propose a compact and effective classification approach for high-dimensional data. Instead of using the same feature subset for all classes, the proposed CFS-based approach obtains the optimal subset of discriminative features and the SVM classifier for each class. The proposed approach enhances both interpretability and predictive power for high-dimensional data.


2019 ◽  
Vol 85 (24) ◽  
Author(s):  
Sabine Rosenfeldt ◽  
Cornelius N. Riese ◽  
Frank Mickoleit ◽  
Dirk Schüler ◽  
Anna S. Schenk

ABSTRACT Magnetosomes are membrane-enveloped single-domain ferromagnetic nanoparticles enabling the navigation of magnetotactic bacteria along magnetic field lines. Strict control over each step of biomineralization generates particles of high crystallinity, strong magnetization, and remarkable uniformity in size and shape, which is particularly interesting for many biomedical and biotechnological applications. However, to understand the physicochemical processes involved in magnetite biomineralization, close and precise monitoring of particle production is required. Commonly used techniques, such as transmission electron microscopy (TEM) or Fe measurements, allow only for semiquantitative assessment of the magnetosome formation without routinely revealing quantitative structural information. In this study, lab-based small-angle X-ray scattering (SAXS) is explored as a means to monitor the different stages of magnetosome biogenesis in the model organism Magnetospirillum gryphiswaldense. SAXS is evaluated as a quantitative stand-alone technique to analyze the size, shape, and arrangement of magnetosomes in cells cultivated under different growth conditions. By applying a simple and robust fitting procedure based on spheres aligned in linear chains, it is demonstrated that the SAXS data sets contain information on both the diameter of the inorganic crystal and the protein-rich magnetosome membrane. The analyses corroborate a narrow particle size distribution with an overall magnetosome radius of 19 nm in Magnetospirillum gryphiswaldense. Furthermore, the averaged distance between individual magnetosomes is determined, revealing a chain-like particle arrangement with a center-to-center distance of 53 nm. Overall, these data demonstrate that SAXS can be used as a novel stand-alone technique allowing for the at-line monitoring of magnetosome biosynthesis, thereby providing accurate information on the particle nanostructure. 
IMPORTANCE This study explores lab-based small-angle X-ray scattering (SAXS) as a novel quantitative stand-alone technique to monitor the size, shape, and arrangement of magnetosomes during different stages of particle biogenesis in the model organism Magnetospirillum gryphiswaldense. The SAXS data sets contain volume-averaged, statistically accurate information on both the diameter of the inorganic nanocrystal and the enveloping protein-rich magnetosome membrane. As a robust and nondestructive in situ technique, SAXS can provide new insights into the physicochemical steps involved in the biosynthesis of magnetosome nanoparticles as well as their assembly into well-ordered chains. The proposed fit model can easily be adapted to account for different particle shapes and arrangements produced by other strains of magnetotactic bacteria, thus rendering SAXS a highly versatile method.
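To illustrate what a fit model of "spheres aligned in linear chains" can look like, here is a minimal sketch in Python. It is an assumption-laden simplification rather than the authors' model: it treats the magnetosomes as identical homogeneous spheres (the membrane shell is omitted), uses the standard sphere form factor together with the orientation-averaged Debye sum for a rigid linear chain, and takes the chain length `n_spheres=10` as a purely hypothetical value. The radius (19 nm) and center-to-center distance (53 nm) are the values reported in the abstract; all function names are illustrative.

```python
import math


def sphere_amplitude(q, radius):
    """Normalized scattering amplitude of a homogeneous sphere (q in 1/nm)."""
    x = q * radius
    if x < 1e-4:
        return 1.0 - x * x / 10.0  # Taylor expansion avoids cancellation near 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3


def chain_structure_factor(q, spacing, n_spheres):
    """Orientation-averaged structure factor of a rigid linear chain.

    Debye sum over all sphere pairs: pairs separated by k * spacing
    occur (n_spheres - k) times.
    """
    total = 1.0
    for k in range(1, n_spheres):
        x = q * k * spacing
        sinc = 1.0 if x < 1e-9 else math.sin(x) / x
        total += 2.0 * (n_spheres - k) / n_spheres * sinc
    return total


def chain_intensity(q, radius=19.0, spacing=53.0, n_spheres=10):
    """Relative intensity (arbitrary units) of one chain of identical spheres."""
    form = sphere_amplitude(q, radius) ** 2
    return n_spheres * form * chain_structure_factor(q, spacing, n_spheres)


if __name__ == "__main__":
    for q in (0.01, 0.05, 0.1, 0.2, 0.3):
        print(f"q = {q:.2f} 1/nm  I = {chain_intensity(q):.4g}")
```

In a model of this kind the sphere radius sets the positions of the form-factor minima, while the chain spacing shapes the low-q modulation, which is why both quantities can in principle be read from a single scattering curve.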


2004 ◽  
Vol 101 (Supplement3) ◽  
pp. 326-333 ◽  
Author(s):  
Klaus D. Hamm ◽  
Gunnar Surber ◽  
Michael Schmücking ◽  
Reinhard E. Wurm ◽  
Rene Aschenbach ◽  
...  

Object. Innovative new software solutions may enable image fusion to produce the desired data superposition for precise target definition and follow-up studies in radiosurgery/stereotactic radiotherapy in patients with intracranial lesions. The aim is to integrate the anatomical and functional information completely into the radiation treatment planning and to achieve an exact comparison for follow-up examinations. Special conditions and advantages of BrainLAB's fully automatic image fusion system are evaluated and described for this purpose. Methods. In 458 patients, the radiation treatment planning and some follow-up studies were performed using an automatic image fusion technique involving the use of different imaging modalities. Each fusion was visually checked and corrected as necessary. The computerized tomography (CT) scans for radiation treatment planning (slice thickness 1.25 mm), as well as stereotactic angiography for arteriovenous malformations, were acquired using head fixation with stereotactic arc or, in the case of stereotactic radiotherapy, with a relocatable stereotactic mask. Different magnetic resonance (MR) imaging sequences (T1, T2, and fluid-attenuated inversion-recovery images) and positron emission tomography (PET) scans were obtained without head fixation. Fusion results and the effects on radiation treatment planning and follow-up studies were analyzed. The precision level of the results of the automatic fusion depended primarily on the image quality, especially the slice thickness and the field homogeneity when using MR images, as well as on patient movement during data acquisition. Fully automated image fusion of different MR, CT, and PET studies was performed for each patient. Only in a few cases was it necessary to correct the fusion manually after visual evaluation. These corrections were minor and did not materially affect treatment planning. 
High-quality fusion of thin slices of a region of interest with a complete head data set could be performed easily. The target volume for radiation treatment planning could be accurately delineated using multimodal information provided by CT, MR, angiography, and PET studies. The fusion of follow-up image data sets yielded results that could be successfully compared and quantitatively evaluated. Conclusions. Depending on the quality of the originally acquired image, automated image fusion can be a very valuable tool, allowing for fast (∼1–2 minutes) and precise fusion of all relevant data sets. Fused multimodality imaging improves the target volume definition for radiation treatment planning. High-quality follow-up image data sets should be acquired for image fusion to provide exactly comparable slices and volumetric results that will contribute to quality control.


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. For each of the three data sets, a CoMFA model was first built with all CoMFA descriptors; a new CoMFA model was then developed after applying each variable selection method, so that 9 CoMFA models were built per data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches, namely FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife, significantly increases the predictive power and stability of CoMFA models. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, while SRD-FFD and SRD-UVE-PLS runs need only a few seconds. Also, applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS preserves CoMFA contour map information for both fields.


2019 ◽  
Vol 45 (9) ◽  
pp. 1183-1198
Author(s):  
Gaurav S. Chauhan ◽  
Pradip Banerjee

Purpose Recent papers on target capital structure show that debt ratios seem to vary widely in space and time, implying that the functional specifications of target debt ratios are of little empirical use. Further, target behavior cannot be adjudged correctly using debt ratios, as they could revert due to mechanical reasons. The purpose of this paper is to develop an alternative testing strategy to test the target capital structure. Design/methodology/approach The authors make use of a major “shock” to the debt ratios as an event and think of a subsequent reversion as a movement toward a mean or target debt ratio. By doing this, the authors no longer need to identify target debt ratios as a function of firm-specific variables or any other rigid functional form. Findings Similar to the broad empirical evidence in developed economies, there is no perceptible and systematic mean reversion by Indian firms. However, unlike in developed countries, proportionate usage of debt to finance firms’ marginal financing deficits is extensive; equity is used rather sparingly. Research limitations/implications The trade-off theory could be convincingly refuted at least for the emerging market of India. The paper stimulates further research into the reasons for the specific financing behavior of emerging market firms. Practical implications The results show that firms’ financing choices depend not only on their own firm-specific variables but also on the financial markets in which they operate. Originality/value This study attempts to assess mean reversion in debt ratios in a unique but reassuring manner. The results are confirmed by extensive calibration of the testing strategy using simulated data sets.


Author(s):  
William A Freyman ◽  
Kimberly F McManus ◽  
Suyash S Shringarpure ◽  
Ethan M Jewett ◽  
Katarzyna Bryc ◽  
...  

Abstract Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale datasets with millions of samples. Furthermore we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.
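As background to the abstract above, the sketch below shows the bookkeeping of the plain positional Burrows-Wheeler transform (PBWT) that the templated variant takes its name from: a positional prefix array and a divergence array, updated one site at a time. This is a minimal illustration in Python and does not implement the templating, error tolerance, or compressed file format described by the authors; the function name and the toy haplotypes are invented for this example, and the phasedibd repository should be consulted for the actual implementation.

```python
def pbwt_prefix_and_divergence(haplotypes):
    """Build the final PBWT prefix and divergence arrays.

    haplotypes: list of equal-length lists of 0/1 alleles.
    Returns (a, d) after all sites are processed, where
      a[i] is the index of the i-th haplotype when sorted by reversed sequence,
      d[i] is the first site of the longest match between a[i] and a[i-1]
           that extends to the last site (d[i] == n_sites means no match).
    """
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_haps else 0
    a = list(range(n_haps))
    d = [0] * n_haps
    for k in range(n_sites):
        a0, a1, d0, d1 = [], [], [], []
        p = q = k + 1  # match start for the next 0-allele / 1-allele haplotype
        for idx, div in zip(a, d):
            p = max(p, div)
            q = max(q, div)
            if haplotypes[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        # Stable partition: haplotypes carrying allele 0 at site k come first.
        a, d = a0 + a1, d0 + d1
    return a, d


if __name__ == "__main__":
    toy = [
        [0, 1, 0, 1, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
    ]
    order, divergence = pbwt_prefix_and_divergence(toy)
    print(order, divergence)
```

Because haplotypes that share a long match ending at the current site sit next to each other in the prefix array, candidate shared segments can be found by scanning neighbors instead of comparing all pairs, which is the property that makes PBWT-style methods fast on very large cohorts.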


2012 ◽  
Vol 78 (16) ◽  
pp. 5638-5645 ◽  
Author(s):  
Yoon-Suk Kang ◽  
Brian Bothner ◽  
Christopher Rensing ◽  
Timothy R. McDermott

ABSTRACT In this study with the model organism Agrobacterium tumefaciens, we used a combination of lacZ gene fusions, reverse transcriptase PCR (RT-PCR), and deletion and insertional inactivation mutations to show unambiguously that the alternative sigma factor RpoN participates in the regulation of AsIII oxidation. A deletion mutation that removed the RpoN binding site from the aioBA promoter and an aacC3 (gentamicin resistance) cassette insertional inactivation of the rpoN coding region eliminated aioBA expression and AsIII oxidation, although rpoN expression was not related to cell exposure to AsIII. Putative RpoN binding sites were identified throughout the genome and, as examples, included promoters for aioB, phoB1, pstS1, dctA, glnA, glnB, and flgB that were examined by using qualitative RT-PCR and lacZ reporter fusions to assess the relative contribution of RpoN to their transcription. The expressions of aioB and dctA in the wild-type strain were considerably enhanced in cells exposed to AsIII, and both genes were silent in the rpoN::aacC3 mutant regardless of AsIII. The expression level of glnA was not influenced by AsIII but was reduced (but not silent) in the rpoN::aacC3 mutant and further reduced in the mutant under N starvation conditions. The rpoN::aacC3 mutation had no obvious effect on the expression of glnB, pstS1, phoB1, or flgB. These experiments provide definitive evidence to document the requirement of RpoN for AsIII oxidation but also illustrate that the presence of a consensus RpoN binding site does not necessarily link the associated gene with regulation by AsIII or by this sigma factor.


2013 ◽  
Vol 57 (9) ◽  
pp. 4578-4580 ◽  
Author(s):  
Nathalie Tijet ◽  
David Boyd ◽  
Samir N. Patel ◽  
Michael R. Mulvey ◽  
Roberto G. Melano

ABSTRACT The Carba NP test was evaluated against a panel of 244 carbapenemase- and non-carbapenemase-producing Enterobacteriaceae and Pseudomonas aeruginosa isolates. We confirmed the 100% specificity and positive predictive value of the test, but the sensitivity and negative predictive value were 72.5% and 69.2%, respectively, and increased to 80% and 77.3%, respectively, using a more concentrated bacterial extract. False-negative results were associated with mucoid strains or linked to enzymes with low carbapenemase activity, particularly OXA-48-like, which has emerged globally in enterobacteria.
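The link between the reported figures can be made concrete with a short calculation: a test with no false positives has 100% specificity and 100% positive predictive value regardless of how many carbapenemase producers it misses, while every false negative lowers both sensitivity and negative predictive value. The Python sketch below uses hypothetical counts chosen for illustration only; they are not the panel counts from the study, which the abstract does not give, and the function name is likewise an assumption.

```python
def diagnostic_summary(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 diagnostic table.

    tp/fn: carbapenemase producers testing positive / negative.
    tn/fp: non-producers testing negative / positive.
    Ratios with an empty denominator are returned as None.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),  # how trustworthy a positive result is
        "npv": ratio(tn, tn + fn),  # how trustworthy a negative result is
    }


if __name__ == "__main__":
    # Hypothetical panel: 100 producers, 100 non-producers, no false positives.
    for name, value in diagnostic_summary(tp=80, fp=0, tn=100, fn=20).items():
        print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, the two predictive values depend on the proportion of producers in the panel, so they would be expected to differ in a collection with a different mix of isolates.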


2014 ◽  
Vol 80 (8) ◽  
pp. 2410-2416 ◽  
Author(s):  
Areen Banerjee ◽  
Ching Leang ◽  
Toshiyuki Ueki ◽  
Kelly P. Nevin ◽  
Derek R. Lovley

ABSTRACT The development of tools for genetic manipulation of Clostridium ljungdahlii has increased its attractiveness as a chassis for autotrophic production of organic commodities and biofuels from syngas and microbial electrosynthesis and established it as a model organism for the study of the basic physiology of acetogenesis. In an attempt to expand the genetic toolbox for C. ljungdahlii, the possibility of adapting a lactose-inducible system for gene expression, previously reported for Clostridium perfringens, was investigated. The plasmid pAH2, originally developed for C. perfringens with a gusA reporter gene, functioned as an effective lactose-inducible system in C. ljungdahlii. Lactose induction of C. ljungdahlii containing pB1, in which the gene for the aldehyde/alcohol dehydrogenase AdhE1 was downstream of the lactose-inducible promoter, increased expression of adhE1 30-fold over the wild-type level, increasing ethanol production 1.5-fold, with a corresponding decrease in acetate production. Lactose-inducible expression of adhE1 in a strain in which adhE1 and the adhE1 homolog adhE2 had been deleted from the chromosome restored ethanol production to levels comparable to those in the wild-type strain. Inducing expression of adhE2 in the same way failed to restore ethanol production, suggesting that adhE1 is the homolog responsible for ethanol production. Lactose-inducible expression of the four heterologous genes necessary to convert acetyl coenzyme A (acetyl-CoA) to acetone diverted ca. 60% of carbon flow to acetone production during growth on fructose, and 25% of carbon flow went to acetone when carbon monoxide was the electron donor. These studies demonstrate that the lactose-inducible system described here will be useful for redirecting carbon and electron flow for the biosynthesis of products more valuable than acetate. Furthermore, this tool should aid in optimizing microbial electrosynthesis and in basic studies on the physiology of acetogenesis.

