Hayai-Annotation Plants: an ultra-fast and comprehensive gene annotation system in plants

2018 ◽  
Author(s):  
Andrea Ghelfi ◽  
Kenta Shirasawa ◽  
Hideki Hirakawa ◽  
Sachiko Isobe

Summary: Hayai-Annotation Plants is a browser-based interface for an ultra-fast and accurate gene annotation system for plant species using R. The pipeline combines sequence-similarity searches, using USEARCH against UniProtKB (taxonomy Embryophyta), with a functional annotation step. Hayai-Annotation Plants provides five layers of annotation: 1) gene name; 2) gene ontology terms covering the three main domains (Biological Process, Molecular Function, and Cellular Component); 3) enzyme commission number; 4) protein existence level; and 5) evidence type. In terms of speed and accuracy, Hayai-Annotation Plants annotated Arabidopsis thaliana (Araport11, representative peptide sequences) within five minutes with an accuracy of 96.4%. Availability and Implementation: The software is implemented in R and runs on Macintosh and Linux systems. It is freely available at https://github.com/kdri-genomics/Hayai-Annotation-Plants under the GPLv3 license.

2019 ◽  
Vol 35 (21) ◽  
pp. 4427-4429 ◽  
Author(s):  
Andrea Ghelfi ◽  
Kenta Shirasawa ◽  
Hideki Hirakawa ◽  
Sachiko Isobe

Summary: Hayai-Annotation Plants is a browser-based interface for an ultra-fast and accurate functional gene annotation system for plant species using R. The pipeline combines sequence-similarity searches, using USEARCH against UniProtKB (taxonomy Embryophyta), with a functional annotation step. Hayai-Annotation Plants provides five layers of annotation: i) protein name; ii) gene ontology terms covering the three main domains (Biological Process, Molecular Function and Cellular Component); iii) enzyme commission number; iv) protein existence level; and v) evidence type. It implements a new algorithm that gives priority to protein existence level to propagate GO and EC information, and it annotated Arabidopsis thaliana representative peptide sequences (Araport11) within 5 min at the PC level. Availability and implementation: The software is implemented in R and runs on Macintosh and Linux systems. It is freely available at https://github.com/kdri-genomics/Hayai-Annotation-Plants under the GPLv3 license. Supplementary information: Supplementary data are available at Bioinformatics online.
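The abstract says only that the algorithm "gives priority to protein existence level" when propagating GO and EC information; it does not give the selection rule. The sketch below is one possible reading, not the tool's actual implementation (which is written in R): among the similarity hits for a query, prefer the hit with the best UniProt protein existence level (1 = evidence at protein level, 5 = uncertain) and break ties by sequence identity. The `Hit` record, its field names, the tie-breaking rule, and the example accessions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit for a query sequence (hypothetical record layout)."""
    target: str                      # accession of the matched database entry
    identity: float                  # fraction of identical residues, 0.0 to 1.0
    existence_level: int             # UniProt protein existence level, 1 (best) to 5
    go_terms: Tuple[str, ...] = ()   # GO terms attached to the target
    ec_number: str = ""              # EC number attached to the target, if any


def pick_annotation_source(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the hit whose annotation would be propagated to the query.

    Assumed rule: lowest existence level wins; higher identity breaks ties.
    """
    hits = list(hits)
    if not hits:
        return None
    return min(hits, key=lambda h: (h.existence_level, -h.identity))


# Made-up hits for a single query sequence.
example_hits = [
    Hit("TARGET_A", identity=0.99, existence_level=3, go_terms=("GO:0005524",)),
    Hit("TARGET_B", identity=0.85, existence_level=1,
        go_terms=("GO:0004672",), ec_number="2.7.11.1"),
    Hit("TARGET_C", identity=0.80, existence_level=1),
]

chosen = pick_annotation_source(example_hits)
print(chosen.target, chosen.go_terms, chosen.ec_number)
```

Under this assumed rule the query inherits the annotation of `TARGET_B` even though `TARGET_A` is the closer match, because `TARGET_B` is supported by protein-level evidence.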


2016 ◽  
Author(s):  
Luca Ambrosino ◽  
Hamed Bostan ◽  
Valentino Ruggieri ◽  
Maria Luisa Chiusano

Motivation. Years after the first genomes were completely sequenced, comparative genomics remains a challenge, one compounded by the availability of numerous draft genomes whose annotation quality is still poor. The detection of ortholog genes between different species is a key approach for comparative genomics. For example, ortholog detection may support investigations of the mechanisms that shaped genome organization, highlighting gain or loss of function and informing gene annotation. The detection of paralog genes, in turn, is fundamental for understanding the evolutionary mechanisms that drove innovation in gene function, and it supports gene family analyses. Here we report on the gene comparison between two distantly related plants, Solanum lycopersicum (Tomato) (The Tomato Genome Consortium 2012) and Vitis vinifera (Grapevine) (Jaillon et al. 2007), economically important species from the asterid and rosid clades, respectively. The strategy integrated multilevel analyses, from domain investigations to expression profiling, to obtain the most reliable results and to offer powerful resources for understanding aspects of plant evolution and physiology and for dissecting traits and molecular aspects that could provide novel tools for agricultural applications and biotechnology.
Methods. To predict the best putative orthologs and paralogs between Tomato and Grapevine, and to overcome possible annotation issues, all-against-all sequence similarity searches were performed between the gene, mRNA and protein collections of both species. A Bidirectional Best Hit approach was implemented to detect the best orthologs between the two species. Moreover, we developed a dedicated algorithm in the Python programming language able to define more extended alignments between mRNA sequences. The NetworkX package (Hagberg et al. 2008) was used to define networks of paralogs and orthologs. Protein domain prediction was carried out on the entire Tomato and Grapevine protein collections using the InterProScan program (Jones et al. 2014). The enzyme classification was obtained by sequence similarity searches between the Tomato and Grapevine mRNA collections and the entire UniProt reviewed protein collection (UniProt consortium 2015). The metabolic pathways associated with the detected enzymes were identified using the KEGG Database (Kanehisa and Goto 2000). Expression levels at three developmental stages of Tomato (2 cm fruit, breaker and mature red) and the corresponding stages of Grapevine (post-setting, veraison, mature berry) were defined on the basis of the iTAG loci (Shearer et al. 2014) and v1 vitis loci, respectively. Expression was normalized as Reads Per Kilobase per Million (RPKM) for each tissue/stage. Abstract truncated at 3,000 characters - the full version is available in the pdf file
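The Bidirectional Best Hit approach named in the Methods can be illustrated with a short Python sketch. This is a generic version of the technique, not the authors' code: it assumes each similarity search has already been reduced to `(query, target, score)` tuples, keeps the highest-scoring target per query in each direction, and reports the pairs that choose each other. The function names, the tuple layout, and the gene identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring target.

    `hits` is an iterable of (query, target, score) tuples; on equal scores
    the first hit seen is kept.
    """
    best = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up search results in both directions.
tomato_vs_grape = [
    ("tom1", "grape1", 200.0),
    ("tom1", "grape2", 150.0),
    ("tom2", "grape2", 180.0),
]
grape_vs_tomato = [
    ("grape1", "tom1", 190.0),
    ("grape2", "tom1", 160.0),
    ("grape2", "tom2", 140.0),
]

pairs = bidirectional_best_hits(tomato_vs_grape, grape_vs_tomato)
print(pairs)  # [('tom1', 'grape1')]
```

Here `tom2` picks `grape2`, but `grape2` prefers `tom1`, so only the reciprocal pair is kept. Pairs of this kind could then be added as edges to a graph library such as NetworkX, which the abstract mentions for building ortholog and paralog networks.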


Insects ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 396
Author(s):  
Natrada Mitpuangchon ◽  
Kwan Nualcharoen ◽  
Singtoe Boonrotpong ◽  
Patamarerk Engsontia

Many animal species can produce venom for defense, predation, and competition. The venom usually contains diverse peptide and protein toxins, including neurotoxins, proteolytic enzymes, protease inhibitors, and allergens. Some drugs for cancer and neurological disorders, as well as some analgesics, were developed based on animal toxin structures and functions. Several caterpillar species possess venoms that cause varying effects on humans, both locally and systemically. However, toxins from only a few species have been investigated, limiting the understanding of Lepidoptera toxin diversity and evolution. We used RNA-seq to identify toxin genes from the stinging nettle caterpillar, Parasa lepida (Cramer, 1799). We constructed a transcriptome from caterpillar urticating hairs and reported 34,968 unique transcripts. Using our toxin gene annotation pipeline, we identified 168 candidate toxin genes, including protease inhibitors, proteolytic enzymes, and allergens. The 21 novel P. lepida Knottin-like peptides, which show no sequence similarity to any known peptide, have predicted 3D structures similar to tarantula, scorpion, and cone snail neurotoxins. We highlight the importance of convergent evolution in Lepidoptera toxin evolution and discuss its possible mechanisms. This study opens a new path to understanding the hidden diversity of Lepidoptera toxins, which could be a fruitful source for developing new drugs.


2012 ◽  
Vol 13 (Suppl 4) ◽  
pp. S2 ◽  
Author(s):  
Emanuele Bramucci ◽  
Alessandro Paiardini ◽  
Francesco Bossa ◽  
Stefano Pascarella

2018 ◽  
Vol 5 (1) ◽  
pp. 170907 ◽  
Author(s):  
Dejun Ji ◽  
Bo Yang ◽  
Yongjun Li ◽  
Miaoying Cai ◽  
Wei Zhang ◽  
...  

The high-quality brush hair, or Type III brush hair, is coarse hair with a tip and little medulla, which grows uniquely in the cervical carina of the Chinese Haimen goat (Capra hircus). To unveil the mechanism of Type III brush hair formation in Haimen goats, transcriptomic RNA-seq was used to screen for differentially expressed genes (DEGs) in skin samples from Type III and non-Type III hair goats, and these DEGs were analysed by KEGG pathway analysis. A total of 295 DEGs were obtained, falling mainly into three functional categories: cellular component, molecular function and biological process. These DEGs were mainly enriched in three KEGG pathways: protein processing in endoplasmic reticulum, MAPK, and complement and coagulation cascades. The DEGs hint at a possible mechanism in which heat stress initiates the formation of this hair type. The study provides biological information that could offer a new view of the roles of certain factors in hair growth and of the mechanism of Type III brush hair formation in the Chinese Haimen goat.


2020 ◽  
Author(s):  
Lin Wang ◽  
Qingchun Chen ◽  
Haitao Feng ◽  
Minghu Jiang ◽  
Juxiang Huang ◽  
...  

Background: Ras suppressor protein 1 (L12535) and peptidylprolyl cis/trans isomerase NIMA-interacting 1 (PIN1) common molecular and knowledge subnetworks containing microtubule associated protein 1B-MAP1B_1 (upstream) related to cognition by references were identified in human left hemisphere, based on our established significant high expression beta-transducin repeat containing E3 ubiquitin protein ligase (BTRC)-activating downstream Gene (protein) reconstruction network inference (GRNInfer) and Database for Annotation, Visualization and Integrated Discovery (DAVID). Results: Our results show the common molecules exostosin-like glycosyltransferase 2 (EXTL2) interaction with MAP1B_1 both activating TERF1_1 with HSP90AB1 from BTRC-activating downstream GRNInfer database; the common biological process and molecular function of MAP1B_1, TERF1_1 as microtubule (MT) binding; HSP90AB1 as poly(A) RNA binding; BTRC, HSP90AB1, PIN1 as innate immune response from BTRC-activating downstream DAVID database; the common cellular component of EXTL2 at integral component of membrane; MAP1B_1, HSP90AB1, TERF1_1 at cytoplasm (CP); the common tissue distributions of L12535 and PIN1 in Prefrontal Cortex (PFC), PB cluster of differentiation (CD)14+ Monocytes. Conclusions: We propose and mutual positively verify CP poly(A) RNA binding immunity via outside-in glycosyltransfer with MT of BTRC-activating L12535 and PIN1 subnetworks for cognition in PFC|CD14.


2020 ◽  
Vol 40 (11) ◽  
Author(s):  
Kevin J. McNaught ◽  
Elizabeth T. Wiles ◽  
Eric U. Selker

ABSTRACT Polycomb repressive complex 2 (PRC2) catalyzes methylation of histone H3 at lysine 27 (H3K27) in genomic regions of most eukaryotes and is critical for maintenance of the associated transcriptional repression. However, the mechanisms that shape the distribution of H3K27 methylation, such as recruitment of PRC2 to chromatin and/or stimulation of PRC2 activity, are unclear. Here, using a forward genetic approach in the model organism Neurospora crassa, we identified two alleles of a gene, NCU04278, encoding an unknown PRC2 accessory subunit (PAS). Loss of PAS resulted in losses of H3K27 methylation concentrated near the chromosome ends and derepression of a subset of associated subtelomeric genes. Immunoprecipitation followed by mass spectrometry confirmed reciprocal interactions between PAS and known PRC2 subunits, and sequence similarity searches demonstrated that PAS is not unique to N. crassa. PAS homologs likely influence the distribution of H3K27 methylation and underlying gene repression in a variety of fungal lineages.

