Systematic human learning by literature and data mining for feature selection in machine learning
Abstract We propose a learning algorithm for humans to conduct literature and data mining for causal factor discovery. It is intended for selecting features for a machine learning prediction model, including but not limited to models that use real-world, time-varying data from electronic health records. The protocol identifies potentially actionable predictors for a clinical prediction relatively quickly while dealing with the high dimensionality of big data. However, it might not find a potentially novel cause, since it only exhaustively examines the existing evidence within a single study. The key stages are systematic human learning, causal diagram construction, data preprocessing, causal inference modeling, and development and validation of a prediction model, including a description of its explainability.