CADA: Phenotype-driven gene prioritization based on a case-enriched knowledge graph
AbstractMany rare syndromes can be well described and delineated from other disorders by a combination of characteristic symptoms. These phenotypic features are best documented with terms of the human phenotype ontology (HPO), which is increasingly used in electronic health records (EHRs), too. Many algorithms that perform HPO-based gene prioritization have also been developed, however, the performance of many such tools suffers from an overrepresentation of atypical cases in the medical literature. This is certainly the case if the algorithm cannot handle features that occur with reduced frequency in a disorder. With CADA we built a knowledge-graph that is based on case annotations and disorder annotations and show that CADA exhibits superior performance particularly for patients that present with the pathognomonic findings of a disease. Crucial in the design of our approach is the use of the growing amount of phenotypic information that diagnostic labs deposit in databases such as ClinVar. By this means CADA is an ideal reference tool for differential diagnostics in rare disorders that can also be updated regularly.