Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Carlos-Francisco Méndez-Cruz ◽  
Antonio Blanchet ◽  
Alan Godínez ◽  
Ignacio Arroyo-Fernández ◽  
Socorro Gama-Castro ◽  
...  

Abstract Transcription factors (TFs) play a central role in transcriptional regulation in bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort because of the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we show an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties, based on an automatic text summarization strategy. We were able to automatically recover a median of 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these S. enterica serovar Typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility of assisting manual curation, both by expanding manual summaries with automatically extracted new knowledge and by creating new summaries for bacteria for which these curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available on GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
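The abstract reports a median fraction of manual-summary knowledge recovered by the automatic summaries, but it does not state how that fraction was measured. As a rough illustration only, the sketch below assumes a simple word-overlap recall (the share of distinct words in a manual summary that also appear in the automatic one) and takes the median across TFs. The function names `unigram_recall` and `median_recall` are hypothetical and are not taken from the authors' repository.

```python
import re
from statistics import median


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unigram_recall(manual_summary, automatic_summary):
    """Fraction of distinct manual-summary words found in the automatic summary.

    Assumption: word overlap stands in for 'knowledge recovered'; the
    published evaluation may use a different measure.
    """
    manual = tokens(manual_summary)
    if not manual:
        return 0.0
    return len(manual & tokens(automatic_summary)) / len(manual)


def median_recall(pairs):
    """Median recall over (manual, automatic) summary pairs, one pair per TF."""
    return median(unigram_recall(m, a) for m, a in pairs)


if __name__ == "__main__":
    # Toy summaries invented for illustration; not data from the paper.
    toy_pairs = [
        ("FNR activates nrf", "nrf is activated by FNR"),
        ("a b", "a b c"),
        ("x y", "z"),
    ]
    print(median_recall(toy_pairs))
```

A per-TF score followed by a median keeps a few very long or very short summaries from dominating the aggregate, which is one plausible reason to report a median rather than a mean.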

2021 ◽  
Vol 12 ◽  
Author(s):  
Suzanne Paley ◽  
Ingrid M. Keseler ◽  
Markus Krummenacker ◽  
Peter D. Karp

Updating genome databases to reflect newly published molecular findings for an organism was hard enough when only a single strain of a given organism had been sequenced. With multiple sequenced strains now available for many organisms, the challenge has grown significantly because of the still-limited resources available for the manual curation that corrects errors and captures new knowledge. We have developed a method to automatically propagate multiple types of curated knowledge from genes and proteins in one genome database to their orthologs in uncurated databases for related strains, imposing several quality-control filters to reduce the chances of introducing errors. We have applied this method to propagate information from the highly curated EcoCyc database for Escherichia coli K-12 to databases for 480 other Escherichia coli strains in the BioCyc database collection. The increase in value and utility of the target databases after propagation is considerable. Target databases received updates for an average of 2,535 proteins each. In addition to widespread addition and regularization of gene and protein names, 97% of the target databases were improved by the addition of at least 200 new protein complexes, at least 800 new or updated reaction assignments, and at least 2,400 sets of GO annotations.


2006 ◽  
Vol 188 (21) ◽  
pp. 7449-7456 ◽  
Author(s):  
Douglas F. Browning ◽  
David J. Lee ◽  
Alan J. Wolfe ◽  
Jeffrey A. Cole ◽  
Stephen J. W. Busby

ABSTRACT The Escherichia coli K-12 nrf operon promoter can be activated fully by the FNR protein (regulator of fumarate and nitrate reduction) binding to a site centered at position −41.5. FNR-dependent transcription is suppressed by integration host factor (IHF) binding at position −54, and this suppression is counteracted by binding of the NarL or NarP response regulator at position −74.5. The E. coli acs gene is transcribed from a divergent promoter upstream from the nrf operon promoter. Transcription from the major acsP2 promoter is dependent on the cyclic AMP receptor protein and is modulated by IHF and Fis binding at multiple sites. We show that IHF binding to one of these sites, located at position −127 with respect to the nrf promoter, has a positive effect on nrf promoter activity. This activation is dependent on the face of the DNA helix, independent of IHF binding at other locations, and found only when NarL/NarP are not bound at position −74.5. Binding of NarL/NarP appears to insulate the nrf promoter from the effects of IHF. The acs-nrf regulatory region is conserved in other pathogenic E. coli strains and related enteric bacteria but differs in Salmonella enterica serovar Typhimurium.


2003 ◽  
Vol 185 (18) ◽  
pp. 5398-5407 ◽  
Author(s):  
A. P. White ◽  
D. L. Gibson ◽  
S. K. Collinson ◽  
P. A. Banser ◽  
W. W. Kay

ABSTRACT Lipopolysaccharide (LPS) O polysaccharide was identified as the principal factor impeding intercellular formation of intact thin aggregative fimbriae (Tafi) in Salmonella enterica serovar Enteritidis. The extracellular nucleation-precipitation assembly pathway for these organelles was investigated by quantifying fimbrial formation between ΔagfA (AgfA recipient) and ΔagfB (AgfA donor) cells harboring mutations in LPS (galE::Tn10) and/or cellulose (ΔbcsA) synthesis. Intercellular complementation could be detected between ΔagfA and ΔagfB strains only when both possessed the galE mutation. LPS O polysaccharide appears to be an impenetrable barrier to AgfA assembly between cells but not within individual cells. The presence of cellulose did not restrict Tafi formation between cells. Transmission electron microscopy of w + S. enterica serovar Enteritidis 3b cells revealed diffuse Tafi networks without discernible fine structure. In the absence of cellulose, however, individual Tafi fibers were clearly visible, appeared to be occasionally branched, and showed the generally distinctive appearance described for Escherichia coli K-12 curli. A third extracellular matrix component closely associated with cellulose and Tafi was detected on Western blots by using immune serum raised to whole, purified Tafi aggregates. Cellulose was required to tightly link this material to cells. Antigenically similar material was also detected in S. enterica serovar Typhimurium and one diarrheagenic E. coli isolate. Preliminary analysis indicated that this material represented an anionic, extracellular polysaccharide that was distinct from colanic acid. Therefore, Tafi in their native state appear to exist as a complex with cellulose and at least one other component.


2001 ◽  
Vol 183 (19) ◽  
pp. 5554-5561 ◽  
Author(s):  
Sean R. Murray ◽  
David Bermudes ◽  
Karim Suwwan de Felipe ◽  
K. Brooks Low

ABSTRACT Lipid A, a potent endotoxin which can cause septic shock, anchors lipopolysaccharide (LPS) into the outer leaflet of the outer membrane of gram-negative bacteria. MsbB acylates (KDO)2-(lauroyl)-lipid IV-A with myristate during lipid A biosynthesis. Reports of knockouts of the msbB gene describe effects on virulence but describe no evidence of growth defects in Escherichia coli K-12 or Salmonella. Our data confirm the general lack of growth defects in msbB E. coli K-12. In contrast, msbB Salmonella enterica serovar Typhimurium exhibits marked sensitivity to galactose-MacConkey and 6 mM EGTA media. At 37°C in Luria-Bertani (LB) broth, msbB Salmonella cells elongate, form bulges, and grow slowly. msbB Salmonella grow well on LB-no salt (LB-0) agar; however, under specific shaking conditions in LB-0 broth, many msbB Salmonella cells lyse during exponential growth and a fraction of the cells form filaments. msbB Salmonella grow with a near-wild-type growth rate in MSB (LB-0 containing Mg2+ and Ca2+) broth (23 to 42°C). Extragenic compensatory mutations, which partially suppress the growth defects, spontaneously occur at high frequency, and mutants can be isolated on media selective for faster growing derivatives. One of the suppressor mutations maps at 19.8 centisomes and is a recessive IS10 insertional mutation in somA, a gene of unknown function which corresponds to ybjX in E. coli. In addition, random Tn10 mutagenesis carried out in an unsuppressed msbB strain produced a set of Tn10 inserts, not in msbB or somA, that correlate with different suppressor phenotypes. Thus, insertional mutations, in somA and other genes, can suppress the msbB phenotype.


mSphere ◽  
2017 ◽  
Vol 2 (6) ◽  
Author(s):  
Revathy Krishnamurthi ◽  
Swagatha Ghosh ◽  
Supriya Khedkar ◽  
Aswin Sai Narain Seshasayee

ABSTRACT Transcription factors in the bacterium E. coli are rarely essential, and when they are essential, they are largely toxin-antitoxin systems. While studying transcription factors encoded in horizontally acquired regions in E. coli, we realized that the protein RacR, a putative transcription factor encoded by a gene on the rac prophage, is an essential protein. Here, using genetics, biochemistry, and bioinformatics, we show that its essentiality derives from its role as a transcriptional repressor of the ydaS and ydaT genes, whose products are toxic to the cell. Unlike type II toxin-antitoxin systems in which transcriptional regulation involves complexes of the toxin and antitoxin, repression by RacR is sufficient to keep ydaS transcriptionally silent. Horizontal gene transfer is a major driving force behind the genomic diversity seen in prokaryotes. The cryptic rac prophage in Escherichia coli K-12 carries the gene for a putative transcription factor RacR, whose deletion is lethal. We have shown that the essentiality of racR in E. coli K-12 is attributed to its role in transcriptionally repressing toxin gene(s) called ydaS and ydaT, which are adjacent to and coded divergently to racR.


2001 ◽  
Vol 183 (21) ◽  
pp. 6184-6196 ◽  
Author(s):  
K. Tedin ◽  
F. Norel

ABSTRACT The growth recovery of Escherichia coli K-12 and Salmonella enterica serovar Typhimurium ΔrelA mutants was compared after nutritional downshifts requiring derepression of the branched-chain amino acid pathways. Because wild-type E. coli K-12 and S. enterica serovar Typhimurium LT2 strains are defective in the expression of the genes encoding the branch point acetohydroxy acid synthetase II (ilvGM) and III (ilvIH) isozymes, respectively, ΔrelA derivatives corrected for these mutations were also examined. Results indicate that reduced expression of the known global regulatory factors involved in branched-chain amino acid biosynthesis cannot completely explain the observed growth recovery defects of the ΔrelA strains. In the E. coli K-12 MG1655 ΔrelA background, correction of the preexisting rph-1 allele which causes pyrimidine limitations resulted in complete loss of growth recovery. S. enterica serovar Typhimurium LT2 ΔrelA strains were fully complemented by elevated basal ppGpp levels in an S. enterica serovar Typhimurium LT2 ΔrelA spoT1 mutant or in a strain harboring an RNA polymerase mutation conferring a reduced RNA chain elongation rate. The results are best explained by a dependence on the basal levels of ppGpp, which are determined by relA-dependent changes in tRNA synthesis resulting from amino acid starvations. Expression of the branched-chain amino acid operons is suggested to require changes in the RNA chain elongation rate of the RNA polymerase, which can be achieved either by elevation of the basal ppGpp levels or, in the case of the E. coli K-12 MG1655 strain, through pyrimidine limitations which partially compensate for reduced ppGpp levels. Roles for ppGpp in branched-chain amino acid biosynthesis are discussed in terms of effects on the synthesis of known global regulatory proteins and current models for the control of global RNA synthesis by ppGpp.


2002 ◽  
Vol 70 (2) ◽  
pp. 1027-1031 ◽  
Author(s):  
Susan R. Heimer ◽  
Rod A. Welch ◽  
Nicole T. Perna ◽  
György Pósfai ◽  
Peter S. Evans ◽  
...  

ABSTRACT Recent genomic analyses of Escherichia coli O157:H7 strain EDL933 revealed two loci encoding urease gene homologues (ureDABCEFG), which are absent in nonpathogenic E. coli strain K-12. This report demonstrates that the cloned EDL933 ure gene cluster is capable of synthesizing urease in an E. coli DH5α background. However, when the gene fragment is transformed back into the native EDL933 background, the enzymatic activity of the cloned determinants is undetectable. We speculate that an unidentified trans-acting factor in enterohemorrhagic E. coli (EHEC) is responsible for this regulation of ure expression. In addition, Fur-like recognition sites are present in three independent O157:H7 isolates upstream of ureD and ureA. Enzymatic assays confirmed a difference in urease expression of cloned EHEC ure clusters in E. coli MC3100Δfur. Likewise, interruption of fur in O157:H7 isolate IN1 significantly diminished urease activity. We propose that, similar to the function of Fur in regulating the acid response of Salmonella enterica serovar Typhimurium, it modulates urease expression in EHEC, perhaps contributing to the acid tolerance of the organism.


2004 ◽  
Vol 186 (6) ◽  
pp. 1629-1637 ◽  
Author(s):  
Jeffrey A. Lewis ◽  
Alexander R. Horswill ◽  
Brian E. Schwem ◽  
Jorge C. Escalante-Semerena

ABSTRACT The genes of Salmonella enterica serovar Typhimurium LT2 encoding functions needed for the utilization of tricarballylate as a carbon and energy source were identified and their locations in the chromosome were established. Three of the tricarballylate utilization (tcu) genes, tcuABC, are organized as an operon; a fourth gene, tcuR, is located immediately 5′ to the tcuABC operon. The tcuABC operon and tcuR gene share the same direction of transcription but are independently transcribed. The tcuRABC genes are missing in the Escherichia coli K-12 chromosome. The tcuR gene is proposed to encode a regulatory protein needed for the expression of tcuABC. The tcuC gene is proposed to encode an integral membrane protein whose role is to transport tricarballylate across the cell membrane. tcuC function was sufficient to allow E. coli K-12 to grow on citrate (a tricarballylate analog) but not to allow growth of this bacterium on tricarballylate. E. coli K-12 carrying a plasmid with wild-type alleles of tcuABC grew on tricarballylate, suggesting that the functions of the TcuABC proteins were the only ones unique to S. enterica needed to catabolize tricarballylate. Analyses of the predicted amino acid sequences of the TcuAB proteins suggest that TcuA is a flavoprotein, and TcuB is likely anchored to the cell membrane and probably contains one or more Fe-S centers. The TcuB protein is proposed to work in concert with TcuA to oxidize tricarballylate to cis-aconitate, which is further catabolized via the Krebs cycle. The glyoxylate shunt is not required for growth of S. enterica on tricarballylate. A model for tricarballylate catabolism in S. enterica is proposed.


2001 ◽  
Vol 183 (23) ◽  
pp. 6943-6946 ◽  
Author(s):  
L. SaiSree ◽  
Manjula Reddy ◽  
J. Gowrishankar

ABSTRACT The radiation sensitivity of Escherichia coli B was first described more than 50 years ago, and the genetic locus responsible for the trait was subsequently identified as lon (encoding Lon protease). We now show that both E. coli B and the first reported E. coli K-12 lon mutant, AB1899, carry IS186 insertions in opposite orientations at a single site in the lon promoter region and that this site represents a natural hot spot for transposition of the insertion sequence (IS) element. Our analysis of deposited sequence data for a number of other IS186 insertion sites permitted the deductions that (i) the consensus target site sequence for IS186 transposition is 5′-(G)≥4(N)3–6(C)≥4-3′, (ii) the associated host sequence duplication varies within the range of 6 to 12 bp and encompasses the N(3–6) sequence, and (iii) in a majority of instances, at least one end of the duplication is at the G-N (or N-C) junction. IS186-related sequences were absent in the closely related bacterium Salmonella enterica serovar Typhimurium, indicating that this IS element is a recent acquisition in the evolutionary history of E. coli.
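The consensus target site quoted above, 5′-(G)≥4(N)3–6(C)≥4-3′, reads naturally as a pattern: a run of at least four G residues, three to six bases of any identity, then a run of at least four C residues. The sketch below is one possible translation into a Python regular expression, offered as an illustration rather than the authors' search procedure. The function name `find_target_sites` and the example sequence are hypothetical.

```python
import re

# >=4 G, then 3 to 6 bases of any identity, then >=4 C, read 5' to 3'.
# Assumption: N is restricted to the four unambiguous bases A, C, G, T.
TARGET_SITE = re.compile(r"G{4,}[ACGT]{3,6}C{4,}")


def find_target_sites(sequence):
    """Return (start, matched_text) for each non-overlapping candidate site.

    Start positions are 0-based offsets into the input sequence.
    """
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in TARGET_SITE.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence for illustration; not taken from the lon promoter.
    print(find_target_sites("TTGGGGATACCCCAA"))
```

Because the reverse complement of a G-run, spacer, C-run arrangement is again a G-run, spacer, C-run arrangement, scanning one strand with this pattern also covers sites read from the opposite strand.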


2003 ◽  
Vol 185 (17) ◽  
pp. 5192-5199 ◽  
Author(s):  
Akiko Ishiwa ◽  
Teruya Komano

ABSTRACT IncI1 plasmid R64 encodes a type IV pilus called a thin pilus, which includes PilV adhesins. Seven different sequences for the C-terminal segments of PilV adhesins can be produced by shufflon DNA rearrangement. The expression of the seven PilV adhesins determines the recipient specificity in liquid matings of plasmid R64. Salmonella enterica serovar Typhimurium LT2 was recognized by the PilVA′ and PilVB′ adhesins, while Escherichia coli K-12 was recognized by the PilVA′, PilVC, and PilVC′ adhesins. Lipopolysaccharide (LPS) on the surfaces of recipient cells was previously shown to be the specific receptor for the seven PilV adhesins. To identify the specific receptor structures of LPS for various PilV adhesins, R64 liquid matings were carried out with recipient cells consisting of various S. enterica serovar Typhimurium LT2 and E. coli K-12 waa mutants and their derivatives carrying various waa genes of different origins. From the mating experiments, including inhibition experiments, we propose that the GlcNAc(α1-2)Glc and Glc(α1-2)Gal structures of the LPS core of S. enterica serovar Typhimurium LT2 function as receptors for the PilVB′ and PilVC′ adhesins, respectively, while the PilVC′ receptor in the wild-type LT2 LPS core may be masked. We further propose that the GlcNAc(β1-7)Hep and Glc(α1-2)Glc structures of the LPS core of E. coli K-12 function as receptors for the PilVC and PilVC′ adhesins, respectively.

