Background: Nonalcoholic fatty liver disease (NAFLD) increases the risk of liver and cardiovascular disease. Although hepatic biopsy is widely acknowledged as the diagnostic standard, it is difficult to apply routinely because of its invasiveness and cost.
Moreover, overweight people and diabetic patients are often, though not always, NAFLD-positive. Therefore, determining whether a diabetic patient has NAFLD from noninvasive indicators is of great significance for further interventions. Objective: Using data on 8499 diabetic patients from Shanghai
Sixth People’s Hospital, we aim to rank the impact of multiple routine indicators (features) on NAFLD and then to predict NAFLD within this diabetic population. Methods: We first rank dozens of related features according to their contributions to NAFLD prediction, and then we
prune several trivial features to simplify the prediction. Additionally, three classification algorithms are considered and compared, namely C4.5, Naïve Bayes and Random Forest. Results: The experiments show that Random Forest outperforms the other two (accuracy 85.1%, recall 90.98%
and AUC 0.631). Conclusions: We find that the top nine markers together can effectively identify NAFLD within this diabetic population. They are triglyceride (TG), low-density lipoprotein (LDL), insulin (INS), HbA1c, high-density lipoprotein (HDL), fasting plasma glucose (FPG), age, total
cholesterol (TC) and duration.
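
For readers less familiar with the three metrics reported in the Results, the following is a minimal sketch of how accuracy, recall and AUC are conventionally computed for a binary classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all hypothetical, chosen only for illustration, and the abstract does not state how the reported values were obtained.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, recall) for binary labels, where 1 means NAFLD-positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = NAFLD-positive, scores are predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.3, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, rec = accuracy_and_recall(y_true, y_pred)
area = auc(y_true, scores)
print(f"accuracy={acc:.3f} recall={rec:.3f} AUC={area:.3f}")
```

Accuracy and recall depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, so the three figures need not move together.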