Gene-specific artificial intelligence-based variant classification engine: results of a time-capsule experiment
Abstract

Background: Interpretation of genetic variation remains an impediment to the cost-effective application of genomics to medicine. An advanced artificial intelligence (AI)-based Variant Classification Engine (aiVCE), rooted in ACMG/AMP guidelines, employs data-driven methods to expedite gene-specific classification (franklin.genoox.com).

Methods: In this blinded study, the aiVCE’s overall and rule-level performance was evaluated using ClinVar (v. 2018-10) variants with creation dates after 5/01/2017. All prior knowledge of these variants was removed from the aiVCE training data so that they were treated as novel variants. Using a ‘Full’ dataset (75,801 variants with ≥1 star) and an ‘Increased-Certainty’ dataset (3,993 variants with ≥2 stars), the aiVCE classified variants as pathogenic (P), likely-pathogenic (LP), uncertain significance (VUS), likely-benign (LB), or benign (B). VUS with sufficient supporting data were subclassified as VUS-leaning benign or VUS-leaning pathogenic. aiVCE results were evaluated for concordance with the final ClinVar classification and with rule-level determinations.

Results: The aiVCE demonstrated >97% concordance among Increased-Certainty variants. Concordance was >95% across variant effects (e.g., missense, null, splice region) and >93.5% for the Full dataset. When the aiVCE’s application of specific ACMG rules was assessed, the proportions of variants meeting each rule differed significantly between ClinVar P/LP and B/LB variants (all P<0.00001), supporting the gene-specific rule selections. Evaluation of discordance between the aiVCE and ClinVar uncovered evidence that may have been unavailable to submitting laboratories, highlighting the utility of AI in variant classification.

Conclusions: The aiVCE exhibited robust performance, despite lacking past evidence, in determining whether variants would be categorized as P/LP.
Applying the latest computational advances to existing guidelines may help scientists and clinicians interpret variants with limited clinical information and greatly reduce analytical bottlenecks.
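The two quantities reported above, concordance with ClinVar and the difference in rule-met proportions between P/LP and B/LB variants, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the grouping of the five tiers into P/LP, VUS, and B/LB, the function names, the toy data, and the choice of a pooled two-proportion z-test (the abstract does not say which statistical test was used).

```python
from math import erfc, sqrt

# Assumed grouping of five-tier labels into broader classes (not from the source).
GROUP = {"P": "P/LP", "LP": "P/LP", "VUS": "VUS", "LB": "B/LB", "B": "B/LB"}


def concordance(pairs):
    """Fraction of (engine_label, clinvar_label) pairs that fall in the same group."""
    if not pairs:
        raise ValueError("no variants to compare")
    matches = sum(GROUP[engine] == GROUP[clinvar] for engine, clinvar in pairs)
    return matches / len(pairs)


def two_proportion_p_value(met_a, n_a, met_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (met_a + met_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (met_a / n_a - met_b / n_b) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Toy data, invented for illustration only.
toy_pairs = [("P", "LP"), ("LB", "B"), ("VUS", "VUS"), ("LP", "VUS")]
toy_concordance = concordance(toy_pairs)  # 3 of 4 pairs agree: 0.75

# Hypothetical counts: a rule met in 90/100 P/LP variants vs 10/100 B/LB variants.
toy_p = two_proportion_p_value(90, 100, 10, 100)
```

With real data, `pairs` would hold one entry per ClinVar variant in the Full or Increased-Certainty dataset, and the proportion test would be run once per ACMG rule.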