Abstract
It is estimated that 10-30% of disease-associated genetic variants affect splicing. Splicing variants may generate deleteriously altered gene products and are potential therapeutic targets. However, systematic diagnosis or prediction of splicing variants has yet to be established, especially in the near-exon intronic splice region. The major challenge lies in the redundant and ill-defined branch sites and other splicing motifs within this region. Here, we carried out unbiased massively parallel splicing assays on 5,307 disease-associated variants that overlap branch sites, and collected 5,884 variants spanning the 5’ splice region. We found that strong splice sites and exonic features protect splicing against intronic sequence variation. While the mechanisms by which 3’ intronic variants alter splicing are complex, 5’ intronic variants act mainly by destroying the splice site. Statistical learning combined with these molecular features enables precise prediction of altered splicing caused by an intronic variant. The resulting statistical model identifies and ranks the biological features that determine splicing, providing transferable knowledge, and outperforms the benchmark predictive tool. Moreover, we demonstrated that intronic splicing variants may be associated with disease risk in human populations. Our study elucidates how intronic variants alter splicing and helps classify disease-associated splicing variants, advancing the promise of precision medicine.