CNVkit-RNA: Copy number inference from RNA-Sequencing data
Abstract
RNA-sequencing is most commonly used to measure gene expression, but it can also yield genotypic information. Point mutations and translocations can be detected when they occur in expressed genes; however, few software solutions infer copy number information from RNA-sequencing data. This is because a gene’s expression is determined by many variables, of which copy number variation is only one. Here, we report new functionality within the software package CNVkit that enables copy number inference from RNA-sequencing data. First, CNVkit removes technical variation in gene expression associated with GC content and transcript length. Next, CNVkit assigns each transcript a weight derived from several variables, so that copy number is inferred preferentially from highly and stably expressed genes. We benchmarked our approach on 105 melanomas from The Cancer Genome Atlas project and observed a high degree of concordance (R = 0.739) between our estimates and those from array comparative genomic hybridization (aCGH) on the same samples. After initial configuration, the software requires few inputs, can process a batch of up to 100 samples in less than ten minutes, and can be used with existing CNVkit features, including the visualization tools. Overall, we present a rapid, user-friendly software solution to infer copy number information from gene expression data.
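The two preprocessing ideas described above can be illustrated with a minimal sketch. It is not CNVkit's implementation. The rolling-median bias correction, the weight formula `max(mean, 0) / (1 + sd)`, and the function names `rolling_median_correct`, `transcript_weight`, and `weighted_mean` are hypothetical stand-ins, chosen only to show how removing a covariate-driven trend and up-weighting highly and stably expressed transcripts might fit together.

```python
import statistics
from typing import List, Sequence


def rolling_median_correct(log_expr: Sequence[float],
                           covariate: Sequence[float],
                           window: int = 5) -> List[float]:
    """Hypothetical bias correction: sort transcripts by a covariate
    (e.g. GC content or transcript length) and subtract the median
    log-expression of each transcript's neighbors in that ordering."""
    if len(log_expr) != len(covariate):
        raise ValueError("log_expr and covariate must have the same length")
    n = len(log_expr)
    order = sorted(range(n), key=lambda i: covariate[i])
    half = window // 2
    corrected = [0.0] * n
    for rank, idx in enumerate(order):
        lo, hi = max(0, rank - half), min(n, rank + half + 1)
        neighborhood = [log_expr[order[j]] for j in range(lo, hi)]
        corrected[idx] = log_expr[idx] - statistics.median(neighborhood)
    return corrected


def transcript_weight(log_expr_across_samples: Sequence[float]) -> float:
    """Hypothetical weight: grows with mean expression (highly expressed)
    and shrinks with spread across samples (stably expressed)."""
    mean = statistics.fmean(log_expr_across_samples)
    sd = statistics.pstdev(log_expr_across_samples)
    return max(mean, 0.0) / (1.0 + sd)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of corrected log-ratios, e.g. within one segment."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total
```

Corrections for GC content and transcript length could be chained, for example `rolling_median_correct(rolling_median_correct(x, gc), length)`. A rolling median is used here only because it is robust to outlier transcripts; the actual model in CNVkit may differ.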