A visual atlas of genes' tissue-specific pathological roles
Dysregulation of a gene's function, whether due to mutations or to impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these gene–pathology relationships is a daunting task, primarily because of genetic pleiotropy and the lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape (HCL) project [1], researchers have been able to create gene expression portraits of healthy tissues at single-cell resolution. However, a similar wealth of knowledge is not yet available for diseases, because the genetic manifestation of a disease is often heterogeneous and is confounded by clinical and demographic covariates. To circumvent this, we mined ≈18 million PubMed abstracts published up to May 2019 and selected the ≈6.1 million that describe the pathological roles of genes in different diseases. We then employed a word embedding technique from the domain of Natural Language Processing (NLP) to learn vector representations of entities such as genes, diseases, and tissues, such that their relationships are preserved in the vector space. Notably, Pathomap, the resulting atlas, also learns transitive relationships by virtue of its underlying theory. Pathomap's vector representations indicated a possible association between DNMT3A/BCOR and CYLD cutaneous syndrome (CCS), although the first manuscript reporting this finding was not part of our training data.
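The intuition behind transitive relationships in a vector space can be illustrated with a minimal sketch. This is not the Pathomap pipeline: it swaps the word embedding technique for plain co-occurrence count vectors, and the corpus, the tokens (`geneA`, `diseaseX`, and so on), and the function names are all hypothetical placeholders rather than real findings. It shows only that two entities that never appear in the same sentence can still end up with similar vectors when they share contexts.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy "abstracts"; tokens are placeholders, not real findings.
CORPUS = [
    "geneA mutated in skin tumor",
    "diseaseX presents as skin tumor",
    "geneB regulates liver metabolism",
    "diseaseY impairs liver metabolism",
]


def cooccurrence_vectors(sentences):
    """Map each token to a Counter of the other tokens sharing its sentence."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[target][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


VECTORS = cooccurrence_vectors(CORPUS)


def similarity(a, b):
    """Similarity of two tokens based on the contexts they share."""
    return cosine(VECTORS[a], VECTORS[b])


if __name__ == "__main__":
    # geneA and diseaseX never share a sentence, yet share "skin" and "tumor".
    print(similarity("geneA", "diseaseX"))  # 0.5
    print(similarity("geneA", "diseaseY"))  # 0.0
```

In this toy setting, `geneA` and `diseaseX` are linked only through their shared contexts, which is the kind of indirect association a learned embedding can surface; a trained embedding model would capture such structure in dense, low-dimensional vectors rather than raw counts.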