Identification and Utilization of Copy Number Information for Correcting Hi-C Contact Map of Cancer Cell Line
Abstract

Motivation: Hi-C and its variant techniques have been developed to capture the spatial organization of chromatin. Normalization of Hi-C contact maps is essential for accurate modeling and interpretation of genome-wide chromatin conformation. Most Hi-C correction methods were originally developed for normal cell lines and mainly target systematic biases. In contrast, cancer genomes carry multi-level copy number variations (CNVs). Because copy number influences the interaction frequency between genomic loci, CNV-driven bias needs to be corrected to generate euploid-equivalent chromatin contact maps.

Results: We developed HiCNAtra, a framework that extracts the read depth (RD) signal from Hi-C or 3C-seq reads to generate a high-resolution CNV profile and uses this information to correct the contact map. We proposed “entire restriction fragment” counting for better estimation of the RD signal and generation of CNV profiles. HiCNAtra integrates CNV information along with other systematic biases to explicitly correct the interaction matrix using a Poisson regression model. We demonstrated that the RD estimation of HiCNAtra recapitulates the whole-genome sequencing (WGS)-derived coverage signal of the same cell line. Benchmarking against the OneD method (the only explicit method to target CNV bias) showed that HiCNAtra performed better at eliminating the impact of CNV on the contact maps.

Availability and implementation: HiCNAtra is open-source software implemented in MATLAB and is available at https://github.com/AISKhalil/HiCNAtra.
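
The idea of treating copy number as an explicit covariate in a Poisson regression can be illustrated with a small Python sketch. This is not the HiCNAtra implementation (which is written in MATLAB); it is a toy example under stated assumptions: the contact counts are simulated, the copy-number profile is invented, the only covariate is the log copy-number product of the two bins relative to a diploid pair, and the other systematic biases that HiCNAtra models are omitted. All function names (`sample_poisson`, `fit_poisson`, `cnv_covariate`, `mean_for`) are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the log of the mean count
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (negative Hessian), a 2x2 symmetric matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def cnv_covariate(cn_i, cn_j, ploidy=2):
    """Log of the copy-number product relative to a euploid pair of loci."""
    return math.log((cn_i * cn_j) / (ploidy * ploidy))


# Invented copy-number profile: 20 diploid bins, 10 amplified, 10 single-copy.
copy_number = [2] * 20 + [4] * 10 + [1] * 10
rng = random.Random(0)

pairs, x, y = [], [], []
for i in range(len(copy_number)):
    for j in range(i + 1, len(copy_number)):
        xi = cnv_covariate(copy_number[i], copy_number[j])
        pairs.append((i, j))
        x.append(xi)
        # Assumed toy model: a baseline of 20 contacts scaled by the CN product.
        y.append(sample_poisson(rng, 20.0 * math.exp(xi)))

b0, b1 = fit_poisson(x, y)

# Divide out only the fitted CNV term, leaving the baseline signal in place.
corrected = [yi / math.exp(b1 * xi) for xi, yi in zip(x, y)]


def mean_for(values, cn):
    """Mean over bin pairs in which both bins have copy number `cn`."""
    picked = [v for (i, j), v in zip(pairs, values)
              if copy_number[i] == cn and copy_number[j] == cn]
    return sum(picked) / len(picked)


print(f"fitted CNV coefficient: {b1:.3f}")
print(f"raw amplified/diploid ratio: {mean_for(y, 4) / mean_for(y, 2):.2f}")
print(f"corrected amplified/diploid ratio: "
      f"{mean_for(corrected, 4) / mean_for(corrected, 2):.2f}")
```

In this toy setting, the raw counts between amplified bins are inflated relative to diploid bins, and dividing by the fitted CNV term brings the two groups back to a comparable, euploid-equivalent scale. A realistic correction would fit the CNV term jointly with the other bias covariates rather than in isolation.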