scholarly journals FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow

Genes ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1645
Author(s):  
Anna Vlasova ◽  
Toni Hermoso Pulido ◽  
Francisco Camara ◽  
Julia Ponomarenko ◽  
Roderic Guigó

Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatile computational workflow management engine. The pipeline integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG. It starts from a protein sequence FASTA file and, optionally, a structural annotation file in GFF format, and produces several files, such as GO assignments, output summaries of the abovementioned programs and final annotation reports. The pipeline can be broken easily into smaller processes for the purpose of parallelization and easily deployed in a Linux computational environment, thanks to software containerization, thus helping to ensure full reproducibility.

2016 ◽  
Vol 113 (48) ◽  
pp. 13738-13743 ◽  
Author(s):  
Justin Johan Jozias van der Hooft ◽  
Joe Wandy ◽  
Michael P. Barrett ◽  
Karl E. V. Burgess ◽  
Simon Rogers

The potential of untargeted metabolomics to answer important questions across the life sciences is hindered because of a paucity of computational tools that enable extraction of key biochemically relevant information. Available tools focus on using mass spectrometry fragmentation spectra to identify molecules whose behavior suggests they are relevant to the system under study. Unfortunately, fragmentation spectra cannot identify molecules in isolation but require authentic standards or databases of known fragmented molecules. Fragmentation spectra are, however, replete with information pertaining to the biochemical processes present, much of which is currently neglected. Here, we present an analytical workflow that exploits all fragmentation data from a given experiment to extract biochemically relevant features in an unsupervised manner. We demonstrate that an algorithm originally used for text mining, latent Dirichlet allocation, can be adapted to handle metabolomics datasets. Our approach extracts biochemically relevant molecular substructures (“Mass2Motifs”) from spectra as sets of co-occurring molecular fragments and neutral losses. The analysis allows us to isolate molecular substructures, whose presence allows molecules to be grouped based on shared substructures regardless of classical spectral similarity. These substructures, in turn, support putative de novo structural annotation of molecules. Combining this spectral connectivity to orthogonal correlations (e.g., common abundance changes under system perturbation) significantly enhances our ability to provide mechanistic explanations for biological behavior.


2020 ◽  
Author(s):  
David Aciole Barbosa ◽  
Bruno C. Araújo ◽  
Giovana S. Branco ◽  
Alexandre W. S. Hilsdorf ◽  
Luiz R. Nunes ◽  
...  

ABSTRACTCobia (Rachycentron canadum) is a marine teleost species with great productive potential worldwide. However, the genomic information currently available for this species in public databases is limited. This lack of information hinders gene expression assessments, which could bring forward novel insights into the physiology, ecology, evolution, and genetics of this aquaculture species. In this study, we report the first de novo transcriptome assembly of cobia liver to improve the availability of gene sequences from this important commercial fish. Thus, Illumina sequencing of liver transcripts generated 1,761,965,794 raw reads, which were filtered into 1,652,319,304 high-quality reads. De novo assembly resulted in 101,789 unigenes and 163,096 isoforms, with an average length of 950.61 and 1617.34 nt, respectively. Comparisons against six different databases resulted in functional annotation for 125,993 of these elements (77.3%), providing relevant information regarding the genomic content of R. canadum. We trust that the availability of these functionally annotated cobia elements will likely assist future nutrigenomics and breeding programs, involving this important fish farming species.


2015 ◽  
Author(s):  
Xiaofan Zhou ◽  
David Peris ◽  
Jacek Kominek ◽  
Cletus P Kurtzman ◽  
Chris Todd Hittinger ◽  
...  

AbstractThe availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled thede novodecoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, andde novoassembly algorithms have augmented the complexity ofde novogenome sequencing projects in non-model organisms. To reduce the costs and challenges inde novogenome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silicoWholeGenomeSequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of ade novogenome sequencing project: data generation (through simulation), data quality control,de novoassembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design ofde novogenome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with a detailed documentation, is freely available athttps://github.com/zhouxiaofan1983/iWGS.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gulden Olgun ◽  
Afshan Nabi ◽  
Oznur Tastan

Abstract Background While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. Results We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. Conclusions NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers flexibility to conduct the users’ preferred set of analyses by designing their own pipeline of analysis. NoRCE is available in Bioconductor and https://github.com/guldenolgun/NoRCE.


Author(s):  
José Cerca ◽  
Marius F. Maurstad ◽  
Nicolas C. Rochette ◽  
Angel G. Rivera‐Colón ◽  
Niraj Rayamajhi ◽  
...  
Keyword(s):  
De Novo ◽  

Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 1017
Author(s):  
Mohammed Bakkali ◽  
Rubén Martín-Blázquez ◽  
Mercedes Ruiz-Estévez ◽  
Manuel A. Garrido-Ramos

We sequenced the sporophyte transcriptome of Killarney fern (Vandenboschia speciosa (Willd.) G. Kunkel). In addition to being a rare endangered Macaronesian-European endemism, this species has a huge genome (10.52 Gb) as well as particular biological features and extreme ecological requirements. These characteristics, together with the systematic position of ferns among vascular plants, make it of high interest for evolutionary, conservation and functional genomics studies. The transcriptome was constructed de novo and contained 36,430 transcripts, of which 17,706 had valid BLAST hits. A total of 19,539 transcripts showed at least one of the 7362 GO terms assigned to the transcriptome, whereas 6547 transcripts showed at least one of the 1359 KEGG assigned terms. A prospective analysis of functional annotation results provided relevant insights on genes involved in important functions such as growth and development as well as physiological adaptations. In this context, a catalogue of genes involved in the genetic control of plant development, during the vegetative to reproductive transition, in stress response as well as genes coding for transcription factors is given. Altogether, this study provides a first step towards understanding the gene expression of a significant fern species and the in silico functional and comparative analyses reported here provide important data and insights for further comparative evolutionary studies in ferns and land plants in general.


Animals ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 2226
Author(s):  
Sazia Kunvar ◽  
Sylwia Czarnomska ◽  
Cino Pertoldi ◽  
Małgorzata Tokarska

The European bison is a non-model organism; thus, most of its genetic and genomic analyses have been performed using cattle-specific resources, such as BovineSNP50 BeadChip or Illumina Bovine 800 K HD Bead Chip. The problem with non-specific tools is the potential loss of evolutionary diversified information (ascertainment bias) and species-specific markers. Here, we have used a genotyping-by-sequencing (GBS) approach for genotyping 256 samples from the European bison population in Bialowieza Forest (Poland) and performed an analysis using two integrated pipelines of the STACKS software: one is de novo (without reference genome) and the other is a reference pipeline (with reference genome). Moreover, we used a reference pipeline with two different genomes, i.e., Bos taurus and European bison. Genotyping by sequencing (GBS) is a useful tool for SNP genotyping in non-model organisms due to its cost effectiveness. Our results support GBS with a reference pipeline without PCR duplicates as a powerful approach for studying the population structure and genotyping data of non-model organisms. We found more polymorphic markers in the reference pipeline in comparison to the de novo pipeline. The decreased number of SNPs from the de novo pipeline could be due to the extremely low level of heterozygosity in European bison. It has been confirmed that all the de novo/Bos taurus and Bos taurus reference pipeline obtained SNPs were unique and not included in 800 K BovineHD BeadChip.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Daniel Stribling ◽  
Peter L. Chang ◽  
Justin E. Dalton ◽  
Christopher A. Conow ◽  
Malcolm Rosenthal ◽  
...  

Abstract Objectives Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential for development of powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the pyhlum Euarthropoda, by generating transcriptomes and analyzing gene expression. Data description To examine sex-differential gene expression, short read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two different life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from publicly-available databases.


Data in Brief ◽  
2020 ◽  
Vol 31 ◽  
pp. 105917
Author(s):  
Marianela Cobos ◽  
Hicler N. Rodríguez ◽  
Segundo L. Estela ◽  
Carlos G. Castro ◽  
J. Dylan Maddox ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document