Biases in arginine codon usage correlate with genetic disease risk
ABSTRACT

Purpose: The persistence of hypermutable ‘CGN’ (CGG, CGA, CGC, CGU) arginine codons at high frequency suggests the possibility of negative selective pressure at these sites and that arginine codon usage could be a predictive indicator of human disease genes.

Methods: We analyzed arginine codons (CGN, AGG, AGA) from all ‘canonical’ Ensembl protein-coding gene transcripts before comparing the frequency of CGN codons between genes with and without human disease associations and with gnomAD constraint metrics.

Results: The frequency of CGN codons among a gene’s total arginine codon count was higher in genes linked to syndromic autism spectrum disorder (ASD) compared to genes not associated with ASD. A comparison of genes annotated as dominant or recessive with control genes not matching either classification revealed a progressive increase in CGN codon frequency. Moreover, CGN frequency was positively correlated with a gene’s probability of loss-of-function intolerance (pLI) score and negatively correlated with ‘observed-over-expected’ ratios for both loss-of-function and missense mutations.

Conclusion: Our findings indicate that genes utilizing CGN arginine codons rather than AGG or AGA are more likely to underlie single-gene disorders, particularly for dominant phenotypes, and thus constitute candidate genes for the study of human genetic disease.
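The per-gene quantity described in the Methods, the share of CGN codons among all arginine codons in a coding sequence, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `cgn_fraction`, the assumption that the input is an in-frame coding sequence, and the handling of RNA versus DNA letters are all illustrative choices.

```python
from collections import Counter

# The six arginine codons, written in DNA letters (CGU in RNA is CGT in DNA).
CGN_CODONS = {"CGG", "CGA", "CGC", "CGT"}
AGR_CODONS = {"AGG", "AGA"}


def cgn_fraction(cds):
    """Return the fraction of arginine codons that are CGN.

    `cds` is assumed (hypothetically) to be an in-frame coding sequence
    starting at the first base of a codon. Returns None when the sequence
    contains no arginine codons.
    """
    # Normalize case and accept RNA input by mapping U to T.
    seq = cds.upper().replace("U", "T")
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    cgn = sum(counts[c] for c in CGN_CODONS)
    agr = sum(counts[c] for c in AGR_CODONS)
    total = cgn + agr
    return cgn / total if total else None
```

For a toy sequence such as `ATGCGGAGACGTTAA` (codons ATG, CGG, AGA, CGT, TAA), two of the three arginine codons are CGN. Per-gene values computed this way could then be compared between gene sets or against constraint scores, using whichever statistical test suits the data.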