The Druggable Genome as Seen from the Protein Data Bank
Abstract
Almost twenty years after the human genome was sequenced, the wealth of data produced by the international Human Genome Project has not translated into a significantly improved drug discovery enterprise. This is in part because small-molecule modulators that could be used to explore the cellular function of their target proteins and to discover new therapeutic opportunities are available for only a limited portion of the human proteome. International efforts are underway to develop such chemical tools for a few specific protein families, and a “Target 2035” call to enable, expand and federate these efforts towards comprehensive chemical coverage of the druggable genome was recently announced. But what is the druggable genome? Here, we systematically review structures of human proteins bound to drug-like ligands available from the Protein Data Bank (PDB) and use ligand desolvation upon binding as a druggability metric to draw a landscape of the human druggable genome. We show that the vast majority of druggable protein families, including some that are highly populated and strongly associated with cancer according to genomic screens, remain almost devoid of small-molecule ligands, and we propose a list of 46 druggable domains representing 3440 human proteins that could be the focus of large chemical probe discovery efforts.