scholarly journals Hayai-Annotation Plants: an ultra-fast and comprehensive functional gene annotation system in plants

2019 ◽  
Vol 35 (21) ◽  
pp. 4427-4429 ◽  
Author(s):  
Andrea Ghelfi ◽  
Kenta Shirasawa ◽  
Hideki Hirakawa ◽  
Sachiko Isobe

Abstract Summary Hayai-Annotation Plants is a browser-based interface for an ultra-fast and accurate functional gene annotation system for plant species using R. The pipeline combines the sequence-similarity searches, using USEARCH against UniProtKB (taxonomy Embryophyta), with a functional annotation step. Hayai-Annotation Plants provides five layers of annotation: i) protein name; ii) gene ontology terms consisting of its three main domains (Biological Process, Molecular Function and Cellular Component); iii) enzyme commission number; iv) protein existence level; and v) evidence type. It implements a new algorithm that gives priority to protein existence level to propagate GO and EC information and annotated Arabidopsis thaliana representative peptide sequences (Araport11) within 5 min at the PC level. Availability and implementation The software is implemented in R and runs on Macintosh and Linux systems. It is freely available at https://github.com/kdri-genomics/Hayai-Annotation-Plants under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.

2018 ◽  
Author(s):  
Andrea Ghelfi ◽  
Kenta Shirasawa ◽  
Hideki Hirakawa ◽  
Sachiko Isobe

SummaryHayai-Annotation Plants is a browser-based interface for an ultra-fast and accurate gene annotation system for plant species using R. The pipeline combines the sequence-similarity searches, using USEARCH against UniProtKB (taxonomy Embryophyta), with a functional annotation step. Hayai-Annotation Plants provides five layers of annotation: 1) gene name; 2) gene ontology terms consisting of its three main domains (Biological Process, Molecular Function, and Cellular Component); 3) enzyme commission number; 4) protein existence level; 5) and evidence type. In regard to speed and accuracy, Hayai-Annotation Plants annotated Arabidopsis thaliana (Araport11, representative peptide sequences) within five minutes with an accuracy of 96.4 %.Availability and ImplementationThe software is implemented in R and runs on Macintosh and Linux systems. It is freely available at https://github.com/kdri-genomics/Hayai-Annotation-Plants under the GPLv3 license.


PLoS Genetics ◽  
2012 ◽  
Vol 8 (11) ◽  
pp. e1003064 ◽  
Author(s):  
Astrid Vieler ◽  
Guangxi Wu ◽  
Chia-Hong Tsai ◽  
Blair Bullard ◽  
Adam J. Cornish ◽  
...  

PLoS Genetics ◽  
2017 ◽  
Vol 13 (5) ◽  
pp. e1006802 ◽  
Author(s):  
Astrid Vieler ◽  
Guangxi Wu ◽  
Chia-Hong Tsai ◽  
Blair Bullard ◽  
Adam J. Cornish ◽  
...  

2016 ◽  
Author(s):  
Luca Ambrosino ◽  
Hamed Bostan ◽  
Valentino Ruggieri ◽  
Maria Luisa Chiusano

Motivation. Even after years from the first completion of genomes by sequencing, comparative genomics still remains a challenge, also enhanced by the availability of numerous draft genomes with still poor annotation quality. The detection of ortholog genes between different species is a key approach for comparative genomics. For example, ortholog gene detection may support investigations on mechanisms that shaped the organization of the genomes, highlighting on gain or loss of function and on gene annotation. On the other hand, the detection of paralog genes is fundamental for understanding the evolutionary mechanisms that drove gene function innovation and support gene families analyses. Here we report on the gene comparison between two distantly related plants, Solanum lycopersicum (Tomato) (The Tomato Genome Consortium 2012) and Vitis vinifera (Grapevine) (Jaillon et al. 2007), considered as economically important species from asterids and rosids clades, respectively. The strategy was accompanied by integration of multilevel analyses, from domain investigations to expression profiling, to get to the most reliable results and to offer powerful resources, in order to understand different useful aspects of plant evolution and physiology and to dissect traits and molecular aspects that could provide novel tools for agriculture applications and biotechnologies. Methods. In order to predict best putative orthologs and paralogs between Tomato and Grapevine, and to overcome possible annotation issues, all-against-all sequence similarity searches between genes, mRNAs and proteins collections of both species were performed. A Bidirectional Best Hit approach was implemented to detect the best orthologs between the two species. Moreover we developed a dedicated algorithm in Python programming language able to define more extended alignments between mRNA sequences. NetworkX package (Hagberg et al. 2008) was used to define networks of paralogs and orthologs. Proteins domain prediction was carried out on the entire Tomato and Grapevine protein collection by using InterProScan program (Jones et al. 2014). The enzyme classification was obtained by sequence similarity searches between Tomato and Grapevine mRNA collections and the entire UniProt reviewed protein collection (UniProt consortium 2015). The metabolic pathways associated to the detected enzymes were identified exploiting the KEGG Database (Kanehisa and Goto 2000). Expression level of three developmental stages of Tomato (2 cm fruit, breaker and mature red) and the corresponding stages of Grapevine (post-setting, veraison, mature berry) was defined on the basis of the iTAG loci (Shearer et al. 2014) and v1 vitis loci, respectively. The expression was normalized by Reads Per Kilobases per Million (RPKM) for each tissue/stage. Abstract truncated at 3,000 characters - the full version is available in the pdf file


2019 ◽  
Author(s):  
Catherine Badel ◽  
Violette Da Cunha ◽  
Ryan Catchpole ◽  
Patrick Forterre ◽  
Jacques Oberto

Abstract Motivation Comparative plasmid genome analyses require complex tools, the manipulation of large numbers of sequences and constitute a daunting task for the wet bench experimentalist. Dedicated plasmid databases are sparse, only comprise bacterial plasmids and provide exclusively access to sequence similarity searches. Results We have developed Web-Assisted Symbolic Plasmid Synteny (WASPS), a web service granting protein and DNA sequence similarity searches against a database comprising all completely sequenced natural plasmids from bacterial, archaeal and eukaryal origin. This database pre-calculates orthologous protein clustering and enables WASPS to generate fully resolved plasmid synteny maps in real time using internal and user-provided DNA sequences. Availability and implementation WASPS queries befit all current browsers such as Firefox, Edge or Safari while the best functionality is achieved with Chrome. Internet Explorer is not supported. WASPS is freely accessible at https://archaea.i2bc.paris-saclay.fr/wasps/. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3594-3596 ◽  
Author(s):  
Cédric R Weber ◽  
Rahmad Akbar ◽  
Alexander Yermanos ◽  
Milena Pavlović ◽  
Igor Snapkov ◽  
...  

Abstract Summary B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable region immune receptor sequences by tuning the following immune receptor features: (i) species and chain type (BCR, TCR, single and paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis, such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis and machine learning methods for motif detection. Availability and implementation The package is available via https://github.com/GreiffLab/immuneSIM and on CRAN at https://cran.r-project.org/web/packages/immuneSIM. The documentation is hosted at https://immuneSIM.readthedocs.io. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document