general feature format
Recently Published Documents


Total documents: 5 (five years: 4)
H-index: 2 (five years: 2)

F1000Research, 2020, Vol 9, pp. 304
Author(s): Geo Pertea, Mihaela Pertea

Summary: GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).
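For readers unfamiliar with the format, the following is a minimal sketch of how one GFF3 record maps onto its nine tab-separated columns. It is not taken from GffRead or GffCompare; the `Feature` class, the `parse_gff3_line` function, and the example record are all hypothetical names and data introduced here for illustration. It handles only GFF3-style `key=value` attributes, not the GTF `key "value";` style.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Feature:
    """One GFF3 record (hypothetical container, for illustration only)."""
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str   # kept as text because "." means "no score"
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Feature:
    """Split one GFF3 line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, found {len(cols)}")
    attributes: Dict[str, str] = {}
    # Column 9 holds semicolon-separated key=value pairs in GFF3.
    for item in cols[8].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)


# Made-up example record, not from any real annotation.
example = "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001;Name=hypothetical_gene"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["ID"])
```

Real annotation files also contain comment and directive lines beginning with `#`, which a complete reader would need to skip before calling a parser like this one.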


2020, Vol 36 (18), pp. 4810-4812
Author(s): Qingxi Meng, Idoia Ochoa, Mikel Hernaez

Abstract. Motivation: Sequencing data are often summarized at different annotation levels for further analysis, generally using the general feature format (GFF) or its descendants, gene transfer format (GTF) and GFF3. Existing utilities for accessing these files, like gffutils and gffread, do not focus on reducing the storage space, significantly increasing it in some cases. We propose GPress, a framework for querying GFF files in a compressed form. GPress can also incorporate and compress expression files from both bulk and single-cell RNA-Seq experiments, supporting simultaneous queries on both the GFF and expression files. In brief, GPress applies transformations to the data which are then compressed with the general lossless compressor BSC. To support queries, GPress compresses the data in blocks and creates several index tables for fast retrieval. Results: We tested GPress on several GFF files of different organisms, and showed that it achieves on average a 61% reduction in size with respect to gzip (the current de facto compressor for GFF files) while being able to retrieve all annotations for a given identifier or a range of coordinates in a few seconds (when run in a common laptop). In contrast, gffutils provides faster retrieval but doubles the size of the GFF files. When additionally linking an expression file, we show that GPress can reduce its size by more than 68% when compared to gzip (for both bulk and single-cell RNA-Seq experiments), while still retrieving the information within seconds. Finally, applying BSC to the data streams generated by GPress instead of to the original file shows a size reduction of more than 44% on average. Availability and implementation: GPress is freely available at https://github.com/qm2/gpress. Supplementary information: Supplementary data are available at Bioinformatics online.
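The general idea of compressing in blocks and keeping an index table for retrieval can be sketched as follows. This is not GPress's implementation: it skips the data transformations, substitutes Python's standard `zlib` for BSC, and supports only lookup by `ID` attribute, not coordinate ranges. All function names (`feature_id`, `build_blocks`, `query`) and the sample records are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def feature_id(line: str) -> Optional[str]:
    """Return the GFF3 ID attribute of a record, or None if absent."""
    for item in line.split("\t")[8].split(";"):
        if item.startswith("ID="):
            return item[3:]
    return None


def build_blocks(lines: List[str], block_size: int = 2
                 ) -> Tuple[List[bytes], Dict[str, List[int]]]:
    """Compress records in fixed-size blocks and index IDs by block number."""
    blocks: List[bytes] = []
    index: Dict[str, List[int]] = {}
    for block_no, i in enumerate(range(0, len(lines), block_size)):
        chunk = lines[i:i + block_size]
        # zlib is a stand-in here; the abstract names BSC as the compressor.
        blocks.append(zlib.compress("\n".join(chunk).encode("utf-8")))
        for line in chunk:
            ident = feature_id(line)
            if ident is not None:
                hits = index.setdefault(ident, [])
                if block_no not in hits:
                    hits.append(block_no)
    return blocks, index


def query(blocks: List[bytes], index: Dict[str, List[int]],
          ident: str) -> List[str]:
    """Decompress only the blocks that the index lists for this ID."""
    found: List[str] = []
    for block_no in index.get(ident, []):
        text = zlib.decompress(blocks[block_no]).decode("utf-8")
        found.extend(l for l in text.split("\n") if feature_id(l) == ident)
    return found


# Made-up records for illustration.
records = [
    "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001",
    "chr1\texample\tmRNA\t1000\t9000\t.\t+\t.\tID=mrna0001;Parent=gene0001",
    "chr2\texample\tgene\t500\t700\t.\t-\t.\tID=gene0002",
]
blocks, index = build_blocks(records)
print(query(blocks, index, "gene0002"))
```

The point of the block structure is that a query touches only the blocks named in the index, so the rest of the file stays compressed.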


2019
Author(s): Qingxi Meng, Idoia Ochoa, Mikel Hernaez

Abstract. Motivation: Sequencing data are often summarized at different annotation levels for further analysis. The general feature format (GFF) and its descendants, the gene transfer format (GTF) and GFF3, are the most commonly used data formats for genomic annotations. These files are extensively updated, queried and shared, and hence as the number of generated GFF files increases, efficient data storage and retrieval are becoming increasingly important. Existing GFF utilities for accessing these files, like gffutils and gffread, do not focus on reducing the storage space, significantly increasing it in some cases. Hence, we propose GPress, a framework for querying GFF files in a compressed form. In addition, GPress can also incorporate and compress feature expression files, supporting simultaneous queries on both files. Results: We tested GPress on several GFF files of different organisms, and showed that it achieves on average a 98% reduction in size, while being able to retrieve all annotations for a given identifier or a range of coordinates in a few seconds. For example, on a Human GFF file, GPress can find all items with a unique identifier in 2.47 seconds and all items with coordinates within the range of 1,000 to 100,000 in 4.61 seconds. In contrast, gffutils provides faster retrieval but doubles the size of the GFF files. When additionally linking an expression file, we show that GPress can reduce the size of the expression file by more than 92%, while still retrieving the information within seconds. GPress is freely available at https://github.com/qm2/gpress.

