Maize GO Annotation - Methods, Evaluation, and Review (maize-GAMER)

2017 ◽  
Author(s):  
Kokulapalan Wimalanathan ◽  
Iddo Friedberg ◽  
Carson M. Andorf ◽  
Carolyn J. Lawrence-Dill

Summary: We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets cover only 41% and 56% of maize protein-coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold-standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high-coverage, high-confidence annotation set, we used sequence-similarity and protein-domain-presence methods as well as mixed-method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize-GAMER (GO Annotation Method, Evaluation, and Review), and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi: doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi: doi.org/10.7946/P2M925).
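The two quantities discussed in this summary, annotation coverage and accuracy against a gold standard, can be illustrated with a minimal sketch. It assumes annotations are held as a mapping from gene ID to a set of GO term IDs and scores exact term matches only; the abstract does not state which metrics the authors used, so the micro-averaged precision and recall here are an assumption, and all gene and GO identifiers are toy values.

```python
from typing import Dict, Set, Tuple

Annotations = Dict[str, Set[str]]  # gene ID -> set of GO term IDs


def coverage(annotations: Annotations, all_genes: Set[str]) -> float:
    """Fraction of genes that have at least one GO term assigned."""
    annotated = {g for g in all_genes if annotations.get(g)}
    return len(annotated) / len(all_genes)


def precision_recall(predicted: Annotations, gold: Annotations) -> Tuple[float, float]:
    """Micro-averaged precision and recall over the gold-standard genes.

    Assumption: a predicted term counts as correct only if the identical
    term appears in the gold standard (no GO hierarchy is considered).
    """
    tp = fp = fn = 0
    for gene, gold_terms in gold.items():
        pred_terms = predicted.get(gene, set())
        tp += len(pred_terms & gold_terms)
        fp += len(pred_terms - gold_terms)
        fn += len(gold_terms - pred_terms)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data; identifiers are illustrative only.
all_genes = {"gene1", "gene2", "gene3", "gene4"}
predicted = {
    "gene1": {"GO:0000001", "GO:0000002"},
    "gene2": {"GO:0000003"},
    "gene3": set(),  # present in the file but without any term
}
gold = {
    "gene1": {"GO:0000001"},
    "gene2": {"GO:0000003", "GO:0000004"},
}

cov = coverage(predicted, all_genes)
prec, rec = precision_recall(predicted, gold)
print(f"coverage={cov:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Exact matching is the simplest choice; an evaluation that credits ancestor or descendant terms in the GO graph would give different numbers.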

2016 ◽  
Vol 4 (4) ◽  
Author(s):  
Welkin H. Pope ◽  
Anshika Bandyopadhyay ◽  
Meghan L. Carlton ◽  
Meghan T. Kane ◽  
Niyati J. Panchal ◽  
...  

Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1527
Author(s):  
Kun Li ◽  
Juanjuan Liu ◽  
Zhibo Zeng ◽  
Muhammad Fakhar-e-Alam Kulyar ◽  
Yaping Wang ◽  
...  

Probiotic bacteria are receiving increased attention due to their potential benefits to their hosts. Plateau yaks are resistant to disease and stress, which is potentially related to their inner probiotics. To uncover the potential functional genes of yak probiotics, we sequenced the whole genome of Lactobacillus sakei (L. sakei). The results showed that the genome length of L. sakei was 1.99 Mbp, with 1943 protein coding genes (21 rRNA, 65 tRNA, and 1 tmRNA). Three plasmids were found in this bacterium, with 88 protein coding genes. EggNOG annotation showed that the L. sakei genes belonged to the categories J (translation, ribosomal structure, and biogenesis), L (replication, recombination, and repair), G (carbohydrate transport and metabolism), and K (transcription). GO annotation showed that most of the L. sakei genes were related to cellular processes, metabolic processes, biological regulation, localization, response to stimulus, and organization or biogenesis of cellular components. CAZy annotation found that there were 123 CAZys in the L. sakei genome, with glycosyl transferases and glycoside hydrolases. Our results revealed the genome characteristics of L. sakei, which may give insight into the future employment of this probiotic bacterium for its functional benefits.


2002 ◽  
Vol 10 (04) ◽  
pp. 381-407 ◽  
Author(s):  
VLADIMIR A. KUZNETSOV ◽  
VALERY V. PICKALOV ◽  
OLEG V. SENKO ◽  
GARY D. KNOTT

Motivation: Obtaining accurate estimates of the numbers of protein-coding genes and protein domains in a proteome, and of the number of protein domains in nature, is a daunting challenge. Computational analysis of the protein domain sets in the proteomes of many species allows us to estimate these numbers and to find their evolutionary relationships. Results: We have analyzed the distributions of the number of occurrences of protein domains in sample proteomes of the 70 fully sequenced genome organisms of three major kingdoms of life: Archaea, Bacteria and Eukaryota. We found that a large fraction of the identified distinct protein domains (i.e., unique domains and homologous domain families) in these 70 proteomes (1051 (23%) out of 4493) are found in at least one organism in each of these kingdoms of life and that 43 (1%) of these domains are common to all the 70 organisms. All the observed domain occurrence frequency distributions for these 70 proteomes are well fitted by a family of Pareto-like functions, associated with the steady state distributions of a linear Markov random process. We present explicit formulas that accurately predict the number of distinct protein domains and the number of protein-coding genes for a given organism as functions of the number of non-redundant domain-to-protein links in the proteomes. These functions allow us to predict that there are 42,740, 27,900, and 21,200 protein-coding genes/open reading frames in the human, A. thaliana, and mouse genomes, respectively. We also estimate that there are 5271, 2955, and 4915 distinct protein domains in the human, A. thaliana, and mouse proteomes, respectively, and about 5500 distinct protein domains in the entire "proteome world".


2020 ◽  
Vol 6 (2) ◽  
pp. 15 ◽  
Author(s):  
Lucas Maciel ◽  
David Morales-Vicente ◽  
Sergio Verjovski-Almeida

Schistosoma japonicum is a flatworm that causes schistosomiasis, a neglected tropical disease. S. japonicum RNA-Seq analyses have been previously reported in the literature on females and males obtained during sexual maturation from 14 to 28 days post-infection in mouse, resulting in the identification of protein-coding genes and pathways whose expression levels were related to sexual development. However, this work did not include an analysis of long non-coding RNAs (lncRNAs). Here, we applied a pipeline to identify and annotate lncRNAs in 66 publicly available S. japonicum RNA-Seq libraries from different life-cycle stages. We also performed co-expression analyses to find stage-specific lncRNAs possibly related to sexual maturation. We identified 12,291 S. japonicum expressed lncRNAs. Sequence similarity search and synteny conservation indicated that some 14% of S. japonicum intergenic lncRNAs have synteny conservation with S. mansoni intergenic lncRNAs. Co-expression analyses showed that lncRNAs and protein-coding genes in S. japonicum males and females have a dynamic co-expression throughout sexual maturation, showing differential expression between the sexes; the protein-coding genes were related to nervous system development, lipid and drug metabolism, and overall parasite survival. The co-expression pattern suggests that lncRNAs possibly regulate these processes or are regulated by the same activation program as that of protein-coding genes.


2019 ◽  
Author(s):  
Carson W. Allan ◽  
Luciano M. Matzkin

Background: Relationships between an organism and its environment can be fundamental in understanding how populations change over time and species arise. Local ecological conditions can shape variation at multiple levels; among these are the evolutionary history and trajectories of coding genes. This study examines the rate of molecular evolution at protein-coding genes throughout the genome in response to host adaptation in the cactophilic Drosophila mojavensis. These insects are intimately associated with cactus necroses, developing as larvae and feeding as adults in these necrotic tissues. Drosophila mojavensis is composed of four isolated populations across the deserts of western North America, and each population has adapted to utilize different cacti that are chemically, nutritionally, and structurally distinct. Results: High coverage Illumina sequencing was performed on three previously unsequenced populations of D. mojavensis. Genomes were assembled using the previously sequenced genome of D. mojavensis from Santa Catalina Island (USA) as a template. Protein coding genes were aligned across all four populations and rates of protein evolution were determined for all loci using several approaches. Conclusions: Loci that exhibited elevated rates of molecular evolution tended to be shorter, have fewer exons, low expression, be transcriptionally responsive to cactus host use and have fixed expression differences across the four cactus host populations. Fast evolving genes were involved with metabolism, detoxification, chemosensory reception, reproduction and behavior. Results of this study give insight into the process and the genomic consequences of local ecological adaptation.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Carson W. Allan ◽  
Luciano M. Matzkin

Background: Relationships between an organism and its environment can be fundamental in understanding how populations change over time and species arise. Local ecological conditions can shape variation at multiple levels; among these are the evolutionary history and trajectories of coding genes. This study examines the rate of molecular evolution at protein-coding genes throughout the genome in response to host adaptation in the cactophilic Drosophila mojavensis. These insects are intimately associated with cactus necroses, developing as larvae and feeding as adults in these necrotic tissues. Drosophila mojavensis is composed of four isolated populations across the deserts of western North America, and each population has adapted to utilize different cacti that are chemically, nutritionally, and structurally distinct. Results: High coverage Illumina sequencing was performed on three previously unsequenced populations of D. mojavensis. Genomes were assembled using the previously sequenced genome of D. mojavensis from Santa Catalina Island (USA) as a template. Protein coding genes were aligned across all four populations and rates of protein evolution were determined for all loci using several approaches. Conclusions: Loci that exhibited elevated rates of molecular evolution tend to be shorter, have fewer exons, low expression, be transcriptionally responsive to cactus host use and have fixed expression differences across the four cactus host populations. Fast evolving genes were involved with metabolism, detoxification, chemosensory reception, reproduction and behavior. Results of this study give insight into the process and the genomic consequences of local ecological adaptation.


Author(s):  
I. Zwir ◽  
C. Del-Val ◽  
M. Hintsanen ◽  
K. M. Cloninger ◽  
R. Romero-Zaliz ◽  
...  

The genetic basis for the emergence of creativity in modern humans remains a mystery despite sequencing the genomes of chimpanzees and Neanderthals, our closest hominid relatives. Data-driven methods allowed us to uncover networks of genes distinguishing the three major systems of modern human personality and adaptability: emotional reactivity, self-control, and self-awareness. Now we have identified which of these genes are present in chimpanzees and Neanderthals. We replicated our findings in separate analyses of three high-coverage genomes of Neanderthals. We found that Neanderthals had nearly the same genes for emotional reactivity as chimpanzees, and they were intermediate between modern humans and chimpanzees in their numbers of genes for both self-control and self-awareness. 95% of the 267 genes we found only in modern humans were not protein-coding, including many long-non-coding RNAs in the self-awareness network. These genes may have arisen by positive selection for the characteristics of human well-being and behavioral modernity, including creativity, prosocial behavior, and healthy longevity. The genes that cluster in association with those found only in modern humans are over-expressed in brain regions involved in human self-awareness and creativity, including late-myelinating and phylogenetically recent regions of neocortex for autobiographical memory in frontal, parietal, and temporal regions, as well as related components of cortico-thalamo-ponto-cerebellar-cortical and cortico-striato-cortical loops. We conclude that modern humans have more than 200 unique non-protein-coding genes regulating co-expression of many more protein-coding genes in coordinated networks that underlie their capacities for self-awareness, creativity, prosocial behavior, and healthy longevity, which are not found in chimpanzees or Neanderthals.


2021 ◽  
Author(s):  
Chengfeng Yang ◽  
Qinzhi Su ◽  
Min Tang ◽  
Shiqi Luo ◽  
Hao Zheng ◽  
...  

An in-depth understanding of microbial function and the division of ecological niches requires accurate delineation and identification of microbes at a fine taxonomic resolution. Microbial phylotypes are typically defined using a 97% small subunit (16S) rRNA threshold. However, increasing evidence has demonstrated the ubiquitous presence of taxonomic units of distinct functions within phylotypes. These so-called sequence-discrete populations (SDPs) have mainly been delineated by disjunct sequence similarity at the whole-genome level. However, gene markers that could accurately identify and quantify SDPs are lacking in microbial community studies. Here we developed a pipeline to screen single-copy protein-coding genes that could accurately characterize SDP diversity via amplicon sequencing of microbial communities. Fifteen candidate marker genes were evaluated using three criteria (extent of sequence divergence, phylogenetic accuracy, and conservation of primer regions), and the selected genes were tested for their efficiency in differentiating SDPs within Gilliamella, a core honeybee gut microbial phylotype, as a proof-of-concept. The results showed that the 16S V4 region failed to report accurate SDP diversities due to low taxonomic resolution and changing copy numbers. In contrast, the single-copy genes recommended by our pipeline were able to successfully quantify Gilliamella SDPs for both mock samples and honeybee guts, with results highly consistent with those of metagenomics. The pipeline developed in this study is expected to identify single-copy protein coding genes capable of accurately quantifying diverse bacterial communities at the SDP level.
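The 97% 16S rRNA threshold mentioned above can be made concrete with a short sketch. It assumes the two sequences are already aligned to equal length and treats a gap opposite a base as a mismatch; these conventions, the function names `percent_identity` and `same_phylotype`, and the toy sequences are illustrative assumptions and are not taken from the study's pipeline.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_phylotype(seq_a: str, seq_b: str, threshold: float = 97.0) -> bool:
    """True if the pair meets the identity threshold (default 97%)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy 100-nt sequences; 'N' is used only to force mismatches.
reference = "ACGT" * 25
near = "NN" + reference[2:]      # 2 mismatches out of 100 columns
far = "NNNNN" + reference[5:]    # 5 mismatches out of 100 columns

print(percent_identity(reference, near), same_phylotype(reference, near))
print(percent_identity(reference, far), same_phylotype(reference, far))
```

Two sequences that pass this test can still belong to functionally distinct populations, which is the resolution limit the abstract attributes to 16S-based phylotypes.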


2016 ◽  
Author(s):  
Nikki E Freed ◽  
Dirk Bumann ◽  
Olin K Silander

Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of these data suggested that 451 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 suggested that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, in the converse comparison, we found no genes that were essential in E. coli and non-essential in Shigella, suggesting that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.
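The two-way comparison of essentiality calls described above reduces to set differences restricted to genes present in both organisms. A minimal sketch follows, assuming the calls are already available as sets of shared ortholog identifiers; the function `discrepant_genes` and the gene names are hypothetical, and the toy sets are not data from the study.

```python
from typing import Set, Tuple


def discrepant_genes(
    essential_a: Set[str], essential_b: Set[str], shared: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Genes present in both organisms whose essentiality calls differ.

    Returns (essential only in A, essential only in B), each restricted
    to the shared gene set so that organism-specific genes are ignored.
    """
    only_a = (essential_a - essential_b) & shared
    only_b = (essential_b - essential_a) & shared
    return only_a, only_b


# Toy identifiers; "g5" has no counterpart in the second organism.
shigella_essential = {"g1", "g2", "g3", "g5"}
ecoli_essential = {"g1", "g2"}
shared_orthologs = {"g1", "g2", "g3", "g4"}

only_shigella, only_ecoli = discrepant_genes(
    shigella_essential, ecoli_essential, shared_orthologs
)
print(sorted(only_shigella), sorted(only_ecoli))
```

A strongly one-sided result, with many genes in the first set and none in the second, is the kind of asymmetry the abstract treats as a sign of artefactual calls rather than real biological difference.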

