Chromosomal instability (CIN) and somatic copy number alterations (SCNAs) play a key role in the evolutionary process that shapes cancer genomes. SCNAs comprise many classes of clinically relevant events, such as localised amplifications, gains, losses, loss-of-heterozygosity (LOH) events, and recently discovered parallel evolutionary events revealed by multi-sample phasing. These events frequently appear jointly with whole-genome doubling (WGD), a transformative event in tumour evolution that generates tetraploid or near-tetraploid cells. WGD events are often clonal, occurring before the emergence of the most recent common ancestor. They have been associated with increased CIN and poor patient outcome, and are currently being investigated as potential therapeutic targets.
While SCNAs can provide a rich source of phylogenetic information, so far no method exists for phylogenetic inference from SCNAs that includes WGD events. Here we present MEDICC2, a new phylogenetic algorithm for allele-specific SCNA data based on a minimum-evolution criterion that explicitly models clonal and subclonal WGD events and takes parallel evolutionary events into account. MEDICC2 can identify WGD events and quantify SCNA burden in single-sample studies, and can infer phylogenetic trees and ancestral genomes in multi-sample scenarios. In the multi-sample setting, it accurately locates clonal and subclonal WGD events as well as parallel evolutionary events in the evolutionary history of the tumour, timing SCNAs relative to each other.
We use MEDICC2 to detect WGD events in 2778 tumours with 98.8% accuracy and show its ability to correctly place subclonal WGD events in simulated and real-world multi-sample tumours, while accurately inferring their phylogenies and parallel SCNA events. MEDICC2 is implemented in Python 3 and is freely available under GPLv3 at https://bitbucket.org/schwarzlab/medicc2.