GapMind for Carbon Sources: Automated annotations of catabolic pathways

2021 ◽  
Author(s):  
Morgan N Price ◽  
Adam M Deutschbauer ◽  
Adam P. Arkin

GapMind for carbon sources is an automated tool for annotating catabolic pathways in bacterial and archaeal genomes. GapMind includes 62 compounds and identifies potential transporters and enzymes by their similarity to experimentally-characterized proteins. To improve GapMind's coverage, we used high-throughput genetic data from 29 bacteria and systematically examined the gaps. We identified novel pathways or enzymes for the utilization of glucosamine, citrulline, myo-inositol, lactose, and phenylacetate, and we annotated 299 diverged enzymes and transporters. We also curated 125 proteins from published reports. For the 29 bacteria with genetic data, GapMind finds high-confidence paths for 85% of utilized carbon sources. In diverse bacteria and archaea, 38% of utilized carbon sources have high-confidence paths, which was improved from 27% by incorporating the fitness-based annotations and our curation. GapMind for carbon sources is available as a web server (http://papers.genomics.lbl.gov/carbon) and takes just 30 seconds for the typical genome.

2004 ◽  
Vol 186 (5) ◽  
pp. 1337-1344 ◽  
Author(s):  
Gracia Morales ◽  
Juan Francisco Linares ◽  
Ana Beloso ◽  
Juan Pablo Albar ◽  
José Luis Martínez ◽  
...  

ABSTRACT The Crc protein is involved in the repression of several catabolic pathways for the assimilation of some sugars, nitrogenated compounds, and hydrocarbons in Pseudomonas putida and Pseudomonas aeruginosa when other preferred carbon sources are present in the culture medium (catabolic repression). Crc appears to be a component of a signal transduction pathway modulating carbon metabolism in pseudomonads, although its mode of action is unknown. To better understand the role of Crc, the proteome profile of two otherwise isogenic P. putida strains containing either a wild-type or an inactivated crc allele was compared. The results showed that Crc is involved in the catabolic repression of the hpd and hmgA genes from the homogentisate pathway, one of the central catabolic pathways for aromatic compounds that is used to assimilate intermediates derived from the oxidation of phenylalanine, tyrosine, and several aromatic hydrocarbons. This led us to analyze whether Crc also regulates the expression of the other central catabolic pathways for aromatic compounds present in P. putida. It was found that genes required to assimilate benzoate through the catechol pathway (benA and catBCA) and 4-OH-benzoate through the protocatechuate pathway (pobA and pcaHG) are also negatively modulated by Crc. However, the pathway for phenylacetate appeared to be unaffected by Crc. These results expand the influence of Crc to pathways used to assimilate several aromatic compounds, which highlights its importance as a master regulator of carbon metabolism in P. putida.


2015 ◽  
Vol 81 (12) ◽  
pp. 3914-3924 ◽  
Author(s):  
Danilo Pérez-Pantoja ◽  
Pablo Leiva-Novoa ◽  
Raúl A. Donoso ◽  
Cedric Little ◽  
Margarita Godoy ◽  
...  

ABSTRACT Cupriavidus pinatubonensis JMP134, like many other environmental bacteria, uses a range of aromatic compounds as carbon sources. Previous reports have shown a preference for benzoate when this bacterium grows on binary mixtures composed of this aromatic compound and 4-hydroxybenzoate or phenol. However, this observation has not been extended to other aromatic mixtures resembling a more archetypal context. We carried out a systematic study on the substrate preference of C. pinatubonensis JMP134 growing on representative aromatic compounds channeled through different catabolic pathways described in aerobic bacteria. Growth tests of nearly the entire set of binary combinations and in mixtures composed of 5 or 6 aromatic components showed that benzoate and phenol were always the preferred and deferred growth substrates, respectively. This pattern was supported by kinetic analyses that showed shorter times to initiate consumption of benzoate in aromatic compound mixtures. Gene expression analysis by real-time reverse transcription-PCR (RT-PCR) showed that, in all mixtures, the repression by benzoate over other catabolic pathways was exerted mainly at the transcriptional level. Additionally, inhibition of benzoate catabolism suggests that its multiple repressive actions are not mediated by a sole mechanism, as suggested by dissimilar requirements of benzoate degradation for effective repression in different aromatic compound mixtures. The hegemonic preference for benzoate over multiple aromatic carbon sources is not explained on the basis of growth rate and/or biomass yield on each single substrate or by obvious chemical or metabolic properties of these aromatic compounds.


mBio ◽  
2010 ◽  
Vol 1 (3) ◽  
Author(s):  
Vladimir Trifonov ◽  
Raul Rabadan

ABSTRACT Environmental metagenomic samples and samples obtained as an attempt to identify a pathogen associated with the emergence of a novel infectious disease are important sources of novel microorganisms. The low costs and high throughput of sequencing technologies are expected to allow for the genetic material in those samples to be sequenced and the genomes of the novel microorganisms to be identified by alignment to those in a database of known genomes. Yet, for various biological and technical reasons, such alignment might not always be possible. We investigate a frequency analysis technique which on one hand allows for the identification of genetic material without relying on alignment and on the other hand makes possible the discovery of nonoverlapping contigs from the same organism. The technique is based on obtaining signatures of the genetic data and defining a distance/similarity measure between signatures. More precisely, the signatures of the genetic data are the frequencies of k-mers occurring in them, with k being a natural number. We considered an entropy-based distance between signatures, similar to the Kullback-Leibler distance in information theory, and investigated its ability to categorize negative-sense single-stranded RNA (ssRNA) viral genetic data. Our conclusion is that in this viral context, the technique provides a viable way of discovering genetic relationships without relying on alignment. We envision that our approach will be applicable to other microbial genetic contexts, e.g., other types of viruses, and will be an important tool in the discovery of novel microorganisms. IMPORTANCE Multiple factors contribute to the emergence of novel infectious diseases. Implementation of effective measures against such diseases relies on the rapid identification of novel pathogens. Another important source of novel microorganisms is environmental metagenomic samples. 
The low costs and high throughput of sequencing technologies provide a method for the identification of novel microorganisms by sequence alignment. There are several obstacles to this method, as follows: our knowledge of biology is biased by an anthropomorphic view, microbial genomic material could be a minuscule fraction of the sample, the sequencing and enrichment technologies can be a source of errors and biases, and finally, microbes have high diversity and high evolutionary rates. As a result, novel microorganisms could have very low genetic similarity to already known genomes, and the identification by alignment could be computationally prohibitive. We investigate a frequency analysis technique which allows for the identification of novel genetic material without relying on alignment.
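The alignment-free idea in the abstract above (k-mer frequency signatures compared with an entropy-based distance) can be illustrated with a minimal sketch. The abstract does not give the exact distance formula, so the Jensen-Shannon divergence is used here as a stand-in assumption: it is a symmetrized, bounded relative of the Kullback-Leibler divergence. The function names `kmer_signature` and `js_divergence` are hypothetical, and sequences are assumed to be written in the DNA alphabet (ACGT).

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_signature(seq, k):
    """Return the relative frequency of each of the 4**k DNA k-mers in seq.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return {m: counts[m] / total for m in kmers}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two signatures.

    Assumption: this stands in for the entropy-based distance mentioned
    in the abstract; it is not claimed to be the authors' measure.
    The value lies between 0 (identical) and 1 (disjoint support).
    """
    def kl(a, m):
        # Terms with a[x] == 0 contribute nothing to the sum.
        return sum(a[x] * log2(a[x] / m[x]) for x in a if a[x] > 0)

    mid = {x: 0.5 * (p[x] + q[x]) for x in p}
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    a = kmer_signature("ACGTACGTACGTTTGA", 2)
    b = kmer_signature("ACGTACGAACGTTTGA", 2)
    c = kmer_signature("GGGGCCCCGGGGCCCC", 2)
    print(js_divergence(a, b), js_divergence(a, c))
```

Because signatures depend only on k-mer composition, two nonoverlapping contigs can be compared directly, which is the property the abstract highlights. In practice, k and the handling of short contigs would need tuning.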


Parasitology ◽  
2009 ◽  
Vol 136 (12) ◽  
pp. 1633-1642 ◽  
Author(s):  
J. BARRETT

SUMMARY This review describes some of the developments in helminth biochemistry that have taken place over the last 40 years. Since the early 1970s the main anabolic and catabolic pathways in parasitic helminths have been worked out. The mode of action of the majority of anthelmintics is now known, but in many cases the mechanisms of resistance remain elusive. Developments in helminth biochemistry have depended heavily on developments in other areas. High throughput methods such as proteomics, transcriptomics and genome sequencing are now generating vast amounts of new data. The challenge for the future is to interpret and understand the biological relevance of this new information.


2018 ◽  
Author(s):  
Brian S. Helfer ◽  
Darrell O. Ricke

Abstract High throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) provides additional applications for DNA forensics including identification, mixture analysis, kinship prediction, and biogeographic ancestry prediction. Public repositories of human genetic data are being rapidly generated and released, but the majority of these samples are de-identified to protect privacy, and have little or no individual metadata such as appearance (photos), ethnicity, relatives, etc. A reference in silico dataset has been generated to enable development and testing of new DNA forensics algorithms. This dataset provides 11 million SNP profiles for individuals with defined ethnicities and family relationships spanning eight generations with admixture for a panel with 39,108 SNPs.


Author(s):  
Jimmy A McGuire ◽  
Darko D Cotoras ◽  
Brendan O'Connell ◽  
Shobi Z S Lawalata ◽  
Cynthia Y Wang-Claypool ◽  
...  

We used Massively Parallel High-Throughput Sequencing to obtain genetic data from a 145-year old holotype specimen of the flying lizard, Draco cristatellus. Obtaining genetic data from this holotype was necessary to resolve an otherwise intractable taxonomic problem involving the status of this species relative to closely related sympatric Draco species that cannot otherwise be distinguished from one another on the basis of museum specimens. Initial analyses suggested that the DNA present in the holotype sample was so degraded as to be unusable for sequencing. However, we used a specialized extraction procedure developed for highly degraded ancient DNA samples and MiSeq shotgun sequencing to obtain just enough low-coverage mitochondrial DNA (547 base pairs) to conclusively resolve the species status of the holotype as well as a second known specimen of this species. The holotype was prepared before the advent of formalin-fixation and therefore was most likely originally fixed with ethanol and never exposed to formalin. Whereas conventional wisdom suggests that formalin-fixed samples should be the most challenging for DNA sequencing, we propose that evaporation during long-term alcohol storage and consequent water-exposure may subject older ethanol-fixed museum specimens to hydrolytic damage. If so, this may pose an even greater challenge for sequencing efforts involving historical samples.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4470 ◽  
Author(s):  
Jimmy A. McGuire ◽  
Darko D. Cotoras ◽  
Brendan O’Connell ◽  
Shobi Z.S. Lawalata ◽  
Cynthia Y. Wang-Claypool ◽  
...  

We used Massively Parallel High-Throughput Sequencing to obtain genetic data from a 145-year old holotype specimen of the flying lizard, Draco cristatellus. Obtaining genetic data from this holotype was necessary to resolve an otherwise intractable taxonomic problem involving the status of this species relative to closely related sympatric Draco species that cannot otherwise be distinguished from one another on the basis of museum specimens. Initial analyses suggested that the DNA present in the holotype sample was so degraded as to be unusable for sequencing. However, we used a specialized extraction procedure developed for highly degraded ancient DNA samples and MiSeq shotgun sequencing to obtain just enough low-coverage mitochondrial DNA (721 base pairs) to conclusively resolve the species status of the holotype as well as a second known specimen of this species. The holotype was prepared before the advent of formalin-fixation and therefore was most likely originally fixed with ethanol and never exposed to formalin. Whereas conventional wisdom suggests that formalin-fixed samples should be the most challenging for DNA sequencing, we propose that evaporation during long-term alcohol storage and consequent water-exposure may subject older ethanol-fixed museum specimens to hydrolytic damage. If so, this may pose an even greater challenge for sequencing efforts involving historical samples.


2017 ◽  
Vol 5 (32) ◽  
Author(s):  
Terence S. Crofts ◽  
Bin Wang ◽  
Aaron Spivak ◽  
Tara A. Gianoulis ◽  
Kevin J. Forsberg ◽  
...  

ABSTRACT Most antibiotics are derived from the soil, but their catabolism there, which is necessary to close the antibiotic carbon cycle, remains uncharacterized. We report the first draft genome sequences of soil Proteobacteria identified for subsisting solely on β-lactams as their carbon sources. The genomes encode multiple β-lactamases, although their antibiotic catabolic pathways remain enigmatic.


2020 ◽  
Vol 48 (W1) ◽  
pp. W529-W537 ◽  
Author(s):  
Long Tian ◽  
Chengjie Huang ◽  
Reza Mazloom ◽  
Lenwood S Heath ◽  
Boris A Vinatzer

Abstract High throughput DNA sequencing in combination with efficient algorithms could provide the basis for a highly resolved, genome phylogeny-based and digital prokaryotic taxonomy. However, current taxonomic practice continues to rely on cumbersome journal publications for the description of new species, which still constitute the smallest taxonomic units. In response, we introduce LINbase, a web server that allows users to genomically circumscribe any group of prokaryotes with measurable DNA similarity and that uses the individual isolate as smallest unit. Since LINbase leverages the concept of Life Identification Numbers (LINs), which are codes assigned to individual genomes based on reciprocal average nucleotide identity, we refer to groups circumscribed in LINbase as LINgroups. Users can associate with each LINgroup a name, a short description, and a URL to a peer-reviewed publication. As soon as a LINgroup is circumscribed, any user can immediately identify query genomes as members and submit comments about the LINgroup. Most genomes currently in LINbase were imported from GenBank, but users can upload their own genome sequences as well. In conclusion, LINbase combines the resolution of LINs with the power of crowdsourcing in support of a highly resolved, genome phylogeny-based digital taxonomy. LINbase is available at http://www.LINbase.org.

