Correlates of Physical Activity Behavior in Adults: A Data Mining Approach
Abstract

Purpose
A data mining approach was applied to establish a multilevel hierarchy explaining physical activity (PA) behavior and to identify the correlates of PA behavior in a methodical, data-driven way.

Methods
Data from the 46-year follow-up of the population-based Northern Finland Birth Cohort 1966 were used to build a hierarchy for predicting PA behavior with the Chi-square Automatic Interaction Detection (CHAID) decision tree technique. Subjects were classified as physically active or physically inactive based on activity profiles derived from objective measurement of PA. The input variables comprised a wide range of potentially modifiable factors, including self-reported, clinical, and environmental measures. We then analyzed the association of the factors emerging from the model with three PA metrics: sedentary time (SED), light PA (LPA), and moderate-to-vigorous PA (MVPA), each expressed in minutes per day.

Results
Model fitting used 168 factors as input variables to classify the PA behavior of 2,701 physically active and 1,881 physically inactive subjects. The decision tree selected 36 factors from different domains, which partitioned the subjects into 54 subgroups. Factors emerging from the model were associated with the PA metrics, including body fat percentage (SED: B=26.5, LPA: B=-16.1, and MVPA: B=-11.7), normalized heart rate recovery 60 seconds after exercise (SED: B=-16.1, LPA: B=9.9, and MVPA: B=9.6), average weekday total sitting time (SED: B=34.1, LPA: B=-25.3, and MVPA: B=-5.8), and extravagance score (SED: B=6.3 and LPA: B=-3.7).

Conclusions
Using data mining, a data-driven model was established from empirical data that could potentially be used to identify subgroups for allocating multilevel interventions. An extensive set of factors was identified methodically, which can serve as a basis for further hypothesis testing in PA correlates research.
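CHAID, as commonly described, grows a tree by testing each candidate predictor against the outcome with a chi-square test and splitting the node on the most significant predictor, then repeating within each resulting subgroup. The sketch below illustrates only that core idea on toy data. It is not the study's implementation: it omits CHAID's category-merging step and its Bonferroni adjustments, the stopping rules (`max_depth`, `min_size`, `alpha`) are assumptions, and the helper names `chi2_sf`, `chi2_independence`, `best_split`, and `grow` as well as the variables `sitting` and `noise` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus correction terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_independence(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def best_split(predictors, outcome, alpha=0.05):
    """Return (name, p_value) of the most significant predictor, or None if none passes alpha."""
    best = None
    for name, values in predictors.items():
        stat, df = chi2_independence(values, outcome)
        if df == 0:
            continue  # a constant predictor (or constant outcome) cannot split the node
        p = chi2_sf(stat, df)
        if p < alpha and (best is None or p < best[1]):
            best = (name, p)
    return best


def grow(predictors, outcome, depth=0, max_depth=3, min_size=20):
    """Recursively split on the best predictor; each leaf holds outcome counts for one subgroup."""
    split = None
    if depth < max_depth and len(outcome) >= min_size:
        split = best_split(predictors, outcome)
    if split is None:
        return {"leaf": dict(Counter(outcome))}
    name, p = split
    children = {}
    for level in sorted(set(predictors[name])):
        idx = [i for i, v in enumerate(predictors[name]) if v == level]
        subset = {k: [v[i] for i in idx] for k, v in predictors.items()}
        children[level] = grow(subset, [outcome[i] for i in idx], depth + 1, max_depth, min_size)
    return {"split": name, "p": p, "children": children}


# Toy data (hypothetical variables, not study data): 100 subjects.
sitting = ["high"] * 50 + ["low"] * 50
noise = ["a", "b"] * 50
outcome = ["inactive"] * 40 + ["active"] * 10 + ["inactive"] * 10 + ["active"] * 40

tree = grow({"sitting": sitting, "noise": noise}, outcome)
print(tree["split"], tree["children"])
```

Each leaf of the returned dictionary corresponds to one subgroup of subjects, which is the sense in which a fitted tree partitions a sample into subgroups. Here the informative toy predictor is chosen for the first split and the uninformative one is never used, because its chi-square p-value never falls below `alpha`.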