Class Distribution Curve Based Discretization With Application to Wearable Sensors and Medical Monitoring
Understanding diseases and human activities and constructing highly accurate classifiers are two important tasks in biomedicine, healthcare, and wearable sensor technology. Mining high-quality patterns supports both tasks, because such patterns can improve understanding and help build accurate classifiers. However, most pattern mining algorithms operate only on discrete data, so applying them often requires a binning step to discretize continuous attributes. This article presents a new discretization technique called Class Distribution Curve based Binning (CDC Binning). The main idea is to use a class distribution curve, which measures the class purity in sliding windows over an attribute's range, to construct binning intervals. Experiments show that (1) CDC Binning outperforms existing binning methods in discovering high-quality patterns, especially when the class distribution curve is complicated (e.g., when the two classes are two fairly similar human activities), and (2) it can outperform other binning methods by 10% in classification accuracy when the discovered patterns are used as features. CDC Binning is particularly useful for applications where the classes or activities to be distinguished are similar to each other. This is especially important in wearable sensor technology, where a detected change in a person's behavior or activity (e.g., a fall) could indicate a significant medical event.
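To make the sliding-window idea concrete, the following Python sketch computes a simple class distribution curve for one attribute and derives cut points from it. It is an illustration under stated assumptions, not the CDC Binning algorithm itself: the count-based window, the use of the positive-class fraction as the purity measure, the 0.5 crossing rule, and the names class_distribution_curve and cut_points are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def class_distribution_curve(
    values: Sequence[float],
    labels: Sequence[int],
    window: int,
    step: int = 1,
) -> List[Tuple[float, float]]:
    """Return (window_center, positive_fraction) pairs over the sorted values.

    Assumption: the window holds a fixed number of samples, and purity is
    measured as the fraction of class-1 samples inside the window.
    """
    pairs = sorted(zip(values, labels))
    curve = []
    for start in range(0, len(pairs) - window + 1, step):
        chunk = pairs[start:start + window]
        # Midpoint between the smallest and largest value in the window.
        center = (chunk[0][0] + chunk[-1][0]) / 2.0
        positives = sum(1 for _, label in chunk if label == 1)
        curve.append((center, positives / window))
    return curve


def cut_points(
    curve: Sequence[Tuple[float, float]],
    threshold: float = 0.5,
) -> List[float]:
    """Place a cut wherever the curve crosses the threshold.

    Assumption: a threshold crossing marks a change in the majority class,
    which is only one plausible way to turn the curve into bin boundaries.
    """
    cuts = []
    for (x0, p0), (x1, p1) in zip(curve, curve[1:]):
        if (p0 >= threshold) != (p1 >= threshold):
            cuts.append((x0 + x1) / 2.0)
    return cuts
```

For two similar classes, such a curve may cross the threshold several times over the attribute's range, which would yield several narrow bins. That is the kind of complicated curve the abstract refers to.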