COG database update: focus on microbial diversity, model organisms, and widespread pathogens

2020 ◽  
Vol 49 (D1) ◽  
pp. D274-D281 ◽  
Author(s):  
Michael Y Galperin ◽  
Yuri I Wolf ◽  
Kira S Makarova ◽  
Roberto Vera Alvarez ◽  
David Landsman ◽  
...  

Abstract The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI’s gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations.

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Yupeng Li ◽  
Gangao Wu ◽  
Yu Shang ◽  
Yue Qi ◽  
Xue Wang ◽  
...  

Abstract Background Interstitial lung diseases (ILDs), a diverse group of diffuse lung diseases, mainly affect the lung parenchyma. High-throughput ‘omics’ technologies (genomics, transcriptomics, proteomics) and related drug information have begun to reshape our understanding of ILDs, but these data are scattered across a large body of literature and are difficult to exploit fully. Therefore, we manually mined and summarized these data in a database (ILDGDB, http://ildgdb.org/) and will continue to update it in the future. Main body The current version of ILDGDB incorporates 2018 entries representing 20 ILDs and over 600 genes obtained from over 3000 articles in four species. Each entry contains detailed information, including species, disease type, a detailed description of the gene (e.g. its official symbol), and the original reference. ILDGDB is free and provides a user-friendly web page. Users can easily search for genes of interest, view their expression patterns and detailed information, manage gene sets and submit novel ILD-gene associations. Conclusion The main principle behind ILDGDB’s design is to provide an exploratory platform with minimal filtering and interpretation while keeping the presentation of the data highly accessible, which should help researchers decipher gene mechanisms and improve the prevention, diagnosis and therapy of ILDs.


2004 ◽  
Vol 5 (3) ◽  
pp. 281-284 ◽  
Author(s):  
Zuzana Swigonova ◽  
Jinsheng Lai ◽  
Jianxin Ma ◽  
Wusirika Ramakrishna ◽  
Victor Llaca ◽  
...  

Data from cytological and genetic mapping studies suggest that maize arose as a tetraploid. Two previous studies investigating the most likely mode of maize origin arrived at different conclusions. Gaut and Doebley [7] proposed a segmental allotetraploid origin of the maize genome and estimated that the two maize progenitors diverged 20.5 million years ago (mya). In a similar study using a larger data set, Brendel and colleagues (quoted in [8]) suggested a single genome duplication at 16 mya. One of the key components of such analyses is to examine sequence divergence among strictly orthologous genes. In order to identify such genes, Lai and colleagues [10] sequenced five duplicated chromosomal regions from the maize genome and the orthologous counterparts from the sorghum genome. They also identified the orthologous regions in rice. Using positional information of genetic components, they identified 11 orthologous genes across the two duplicated regions of maize, and the sorghum and rice regions. Swigonova et al. [12] analyzed the 11 orthologues and showed that all five maize chromosomal regions duplicated at the same time, supporting a tetraploid origin of maize, and that the two maize progenitors diverged from each other at about the same time as each of them diverged from sorghum, about 11.9 mya.


2018 ◽  
Author(s):  
Elin Videvall ◽  
Se Jin Song ◽  
Hanna M. Bensch ◽  
Maria Strandh ◽  
Anel Engelbrecht ◽  
...  

Abstract The development of gut microbiota during ontogeny in vertebrates is emerging as an important process influencing physiology, immune system, health, and adult fitness. However, we have little knowledge of how the gut microbiome is colonised and develops in non-model organisms, and to what extent microbial diversity and specific taxa influence changes in fitness-related traits. Here, we used 16S rRNA gene sequencing to describe the successional development of the faecal microbiota in juvenile ostriches (Struthio camelus; n = 71) over their first three months of life, during which time a five-fold difference in weight was observed. We found a gradual increase in microbial diversity with age, an overall convergence in community composition among individuals, multiple colonisation and extinction events, and major taxonomic shifts coinciding with the cessation of yolk absorption. In addition, we discovered significant but complex associations between juvenile growth and microbial diversity, and identified distinct bacterial groups that had positive (Bacteroidaceae) and negative (Enterobacteriaceae, Enterococcaceae, Lactobacillaceae) correlations with the growth of individuals at specific ages. These results have broad implications for our understanding of the development of gut microbiota and its association with juvenile growth.


2018 ◽  
Author(s):  
Adrian M Altenhoff ◽  
Jeremy Levy ◽  
Magdalena Zarowiecki ◽  
Bartłomiej Tomiczek ◽  
Alex Warwick Vesztrocy ◽  
...  

Abstract Genomes and transcriptomes are now typically sequenced by individual labs, but analysing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The OMA (Orthologous MAtrix) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and pre-computed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of the Lophotrochozoa, a challenging clade within the Protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families that have undergone duplications/losses in specific clades, and identifying potential drug targets in non-model organisms. OMA standalone is available at http://omabrowser.org/standalone under the permissive open source Mozilla Public License Version 2.0.


2019 ◽  
Vol 42 ◽  
Author(s):  
Nicole M. Baran

Abstract Reductionist thinking in neuroscience is manifest in the widespread use of animal models of neuropsychiatric disorders. Broader investigations of diverse behaviors in non-model organisms and longer-term study of the mechanisms of plasticity will yield fundamental insights into the neurobiological, developmental, genetic, and environmental factors contributing to the “massively multifactorial system networks” which go awry in mental disorders.


2003 ◽  
Vol 39 ◽  
pp. 11-24 ◽  
Author(s):  
Justin V McCarthy

Apoptosis is an evolutionarily conserved process used by multicellular organisms to developmentally regulate cell number or to eliminate cells that are potentially detrimental to the organism. The large diversity of regulators of apoptosis in mammalian cells and their numerous interactions complicate the analysis of their individual functions, particularly in development. The remarkable conservation of apoptotic mechanisms across species has allowed the genetic pathways of apoptosis determined in lower species, such as the nematode Caenorhabditis elegans and the fruitfly Drosophila melanogaster, to act as models for understanding the biology of apoptosis in mammalian cells. Though many components of the apoptotic pathway are conserved between species, the use of additional model organisms has revealed several important differences and supports the use of model organisms in deciphering complex biological processes such as apoptosis.


2002 ◽  
Vol 69 ◽  
pp. 117-134 ◽  
Author(s):  
Stuart M. Haslam ◽  
David Gems ◽  
Howard R. Morris ◽  
Anne Dell

There is no doubt that the immense amount of information that is being generated by the initial sequencing and secondary interrogation of various genomes will change the face of glycobiological research. However, a major area of concern is that detailed structural knowledge of the ultimate products of genes that are identified as being involved in glycoconjugate biosynthesis is still limited. This is illustrated clearly by the nematode worm Caenorhabditis elegans, which was the first multicellular organism to have its entire genome sequenced. To date, only limited structural data on the glycosylated molecules of this organism have been reported. Our laboratory is addressing this problem by performing detailed MS structural characterization of the N-linked glycans of C. elegans; high-mannose structures dominate, with only minor amounts of complex-type structures. Novel, highly fucosylated truncated structures are also present which are difucosylated on the proximal N-acetylglucosamine of the chitobiose core as well as containing unusual Fucα1–2Gal1–2Man as peripheral structures. The implications of these results in terms of the identification of ligands for genomically predicted lectins and potential glycosyltransferases are discussed in this chapter. Current knowledge on the glycomes of other model organisms such as Dictyostelium discoideum, Saccharomyces cerevisiae and Drosophila melanogaster is also discussed briefly.

