Machine Learning Integrates Genomic Signatures for Subclassification Beyond Primary and Secondary Acute Myeloid Leukemia
While genomic alterations drive the pathogenesis of acute myeloid leukemia (AML), traditional classifications are largely based on morphology, and prototypic genetic founder lesions define only a small proportion of AML patients. The historical subdivision into primary/de novo AML (pAML) and secondary AML (sAML) has been shown to correlate variably with genetic patterns. The combinatorial complexity and heterogeneity of AML genomic architecture may so far have precluded a genomic-based subclassification that identifies distinct, molecularly defined subtypes more reflective of shared pathogenesis. We integrated cytogenetic and gene sequencing data from a multicenter cohort of 6,788 AML patients and analyzed them using standard and machine learning methods to generate a novel AML molecular subclassification with biological correlates corresponding to underlying pathogenesis. Standard supervised analyses yielded only modest cross-validation accuracy when molecular patterns were used to predict traditional pathomorphological AML classifications. We then performed unsupervised analysis by applying a Bayesian latent class method, which identified four unique genomic clusters with distinct prognoses. Invariant genomic features driving each cluster were extracted and resulted in 97% cross-validation accuracy when used for genomic subclassification. Subclasses of AML defined by molecular signatures overlapped with current pathomorphological and clinically defined AML subtypes. We internally and externally validated our results and share an open-access molecular classification scheme for AML patients. Although the heterogeneity inherent in the genomic changes across nearly 7,000 AML patients is too vast for traditional prediction methods, machine learning methods allowed for the definition of novel genomic AML subclasses, indicating that traditional pathomorphological definitions may be less reflective of overlapping pathogenesis.