MSIFinder: A Python package for detecting MSI status using a random forest classifier.
2601 Background: Microsatellite instability (MSI) is a common genomic alteration in several tumor types, such as colorectal cancer (CRC), endometrial carcinoma, and stomach cancer. Tumors are classified as microsatellite instability-high (MSI-H) or microsatellite stable (MSS) based on the degree of polymorphism in microsatellite lengths. MSI is a predictive biomarker for immunotherapy efficacy in advanced/metastatic solid tumors, especially in CRC patients. Several computational approaches based on targeted panel sequencing data have been used to detect MSI; however, they are considerably affected by sequencing depth and panel size. Methods: We developed MSIFinder, a Python package for automatic MSI classification that applies a random forest classifier (RFC), a machine learning method, to genome sequencing data. The training set comprised 19 MSI-H and 25 MSS samples. First, an RFC model was built from 54 feature markers in the training set. Second, the classifier was validated on a test set comprising 21 MSI-H and 379 MSS samples. Results: On this test set, MSIFinder achieved a sensitivity (recall) of 0.997, a specificity of 1, an accuracy of 0.998, a positive predictive value (PPV) of 0.954, an F1 score of 0.977, and an area under the curve (AUC) of 0.999. MSIFinder was less affected by low sequencing depth, achieving a concordance of 0.993 at a sequencing depth of 100×. It was also less affected by panel size, achieving a concordance of 0.99 at a panel size of 0.5 Mb (million bases). Conclusions: These results indicate that MSIFinder is a robust MSI classification tool that is largely unaffected by panel size and sequencing depth. Furthermore, MSIFinder can provide reliable MSI detection for scientific and clinical purposes. [Table: see text]
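The Results report sensitivity, specificity, accuracy, PPV, and F1 score. As a minimal sketch of how these quantities follow from a binary confusion matrix (MSI-H as the positive class), the snippet below uses the standard definitions. The names `ConfusionCounts` and `classification_metrics` are hypothetical and are not part of MSIFinder. The example counts are an assumption chosen only to match the test-set size (21 MSI-H, 379 MSS), since the abstract does not report the confusion matrix. AUC is omitted because it requires per-sample scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix with MSI-H as the positive class."""
    tp: int  # MSI-H samples called MSI-H
    fn: int  # MSI-H samples called MSS
    fp: int  # MSS samples called MSI-H
    tn: int  # MSS samples called MSS


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return standard metrics; assumes every denominator is non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on MSI-H samples
    specificity = c.tn / (c.tn + c.fp)   # recall on MSS samples
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)
    ppv = c.tp / (c.tp + c.fp)           # precision of MSI-H calls
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts (not reported in the abstract): 21 MSI-H, 379 MSS.
    example = ConfusionCounts(tp=21, fn=0, fp=1, tn=378)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because the test set is heavily imbalanced (21 positives versus 379 negatives), a single misclassified MSS sample has a much larger effect on PPV than on specificity or accuracy, which is why these metrics are best read together.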