PyRanges: efficient comparison of genomic intervals in Python
Abstract

Summary
Complex genomic analyses often use sequences of simple set operations, such as intersection, overlap and nearest, on genomic intervals. These operations, combined with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single-threaded on binary set operations, PyRanges is a median of 2.3–9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python.

Availability and implementation
PyRanges is available as open source under the MIT license at https://github.com/biocore-NTNU/pyranges, and the documentation is available at https://biocore-NTNU.github.io/pyranges/.

Supplementary information
Supplementary data are available at Bioinformatics online.
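To make the three set operations named in the summary concrete, here is a naive pure-Python sketch of overlap, intersect and nearest on genomic intervals. It is not PyRanges' API or implementation: the `Interval` type and the function names are hypothetical, BED-style 0-based, half-open coordinates are an assumption, and the all-pairs loops cost O(n·m), whereas a library like PyRanges relies on indexed interval data structures to reach the speeds reported in the summary.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical interval type; coordinates assumed 0-based, half-open (BED-style)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlap(query: List[Interval], other: List[Interval]) -> List[Interval]:
    """Keep each query interval that overlaps any interval in `other` (unchanged)."""
    return [q for q in query if any(overlaps(q, o) for o in other)]


def intersect(query: List[Interval], other: List[Interval]) -> List[Interval]:
    """Return the shared sub-interval for every overlapping (query, other) pair."""
    result = []
    for q in query:
        for o in other:
            if overlaps(q, o):
                result.append(Interval(q.chrom, max(q.start, o.start), min(q.end, o.end)))
    return result


def distance(a: Interval, b: Interval) -> int:
    """Bases separating a and b on one chromosome (0 if they overlap or are adjacent)."""
    if overlaps(a, b):
        return 0
    return max(a.start, b.start) - min(a.end, b.end)


def nearest(q: Interval, other: List[Interval]) -> Optional[Interval]:
    """Closest interval in `other` on q's chromosome; ties go to the first one seen."""
    candidates = [o for o in other if o.chrom == q.chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda o: distance(q, o))


genes = [Interval("chr1", 100, 200), Interval("chr1", 500, 600), Interval("chr2", 0, 50)]
peaks = [Interval("chr1", 150, 250), Interval("chr1", 300, 320), Interval("chr2", 60, 70)]

print(overlap(genes, peaks))    # → [Interval(chrom='chr1', start=100, end=200)]
print(intersect(genes, peaks))  # → [Interval(chrom='chr1', start=150, end=200)]
print(nearest(Interval("chr1", 400, 410), peaks))  # → Interval(chrom='chr1', start=300, end=320)
```

Note the difference between the first two operations in this sketch: `overlap` returns the original query intervals that hit something, while `intersect` returns only the overlapping stretches. Chaining such calls is the kind of composition the summary describes.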