ifCNV: a novel isolation-forest-based package to detect copy number variations from NGS datasets
Copy number variations (CNVs) are an essential component of genetic variation distributed across large parts of the human genome. CNV detection from next-generation sequencing data and artificial intelligence algorithms has progressed in recent years. However, only a few tools have taken advantage of machine learning algorithms for CNV detection. The most developed approach is to use a reference dataset to compare with the samples of interest, and it is well known that selecting appropriate normal samples represents a challenging task which dramatically influences the precision of results in all CNV-detecting tools. With careful consideration of these issues, we propose here ifCNV, a new software based on isolation forests that creates its own reference, available in R and python with customisable parameters. ifCNV combines artificial intelligence using two isolation forests and a comprehensive scoring method to faithfully detect CNVs among various samples. It was validated using datasets from diverse origins, and it exhibits high sensitivity, specificity and accuracy. ifCNV is a publicly available open-source software that allows the detection of CNVs in many clinical situations.