Regression-Guided Clustering: A Semisupervised Method for Circulation-to-Environment Synoptic Classification
Abstract
Regression-guided clustering is introduced as a means of constructing circulation-to-environment synoptic climatological classifications. Rather than applying an unsupervised clustering algorithm to synoptic-scale atmospheric circulation data, one instead augments the atmospheric circulation dataset with predictions from a supervised regression model linking circulation to environment. The combined dataset is then entered into the clustering algorithm. The level of influence of the environmental dataset can be controlled by a simple weighting factor. The method is generic in that the choice of regression model and clustering algorithm is left to the user. Examples are given using standard multivariate linear regression models and the k-means clustering algorithm, both established methods in synoptic climatology. Results for southern British Columbia, Canada, indicate that model performance can be made to range between that of a fully unsupervised algorithm and a fully supervised algorithm.
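
The procedure outlined in the abstract can be illustrated with a minimal sketch: fit a multivariate linear regression from circulation to environment, append the weighted regression predictions to the circulation data, and cluster the combined dataset with k-means. The abstract does not specify how the weighting factor is applied, so the scheme below is an assumption: both blocks are standardized and scaled so that a hypothetical parameter `alpha` gives the share of total variance carried by the predictions (`alpha = 0` reduces to unsupervised clustering of circulation alone; `alpha = 1` clusters on the regression predictions alone). The function names, the plain Lloyd-style k-means, and the synthetic data are likewise illustrative and are not taken from the paper.

```python
import numpy as np

def standardize(a):
    # Zero mean, unit variance per column; constant columns are left at zero.
    sd = a.std(axis=0)
    return (a - a.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

def regression_guided_features(X, Y, alpha):
    """Augment circulation data X (n x p) with predictions of environment Y (n x q).

    alpha in [0, 1] is an ASSUMED weighting: the fraction of total variance
    in the combined dataset that comes from the regression predictions.
    """
    Xs = standardize(X)
    A = np.column_stack([np.ones(len(Xs)), Xs])    # add intercept column
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # multivariate least squares
    Y_hat = standardize(A @ coef)                  # standardized predictions
    w_x = np.sqrt((1.0 - alpha) / Xs.shape[1])
    w_y = np.sqrt(alpha / Y_hat.shape[1])
    return np.hstack([w_x * Xs, w_y * Y_hat])

def kmeans(Z, k, n_iter=100, seed=0):
    # Basic Lloyd iteration with random initial centers drawn from the data.
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Synthetic stand-ins for circulation (X) and environment (Y); not real data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 2)) + 0.5 * rng.normal(size=(200, 2))

for alpha in (0.0, 0.5, 1.0):
    Z = regression_guided_features(X, Y, alpha)
    labels, _ = kmeans(Z, k=4)
    print(alpha, np.bincount(labels, minlength=4))
```

Sweeping `alpha` between 0 and 1 in this sketch moves the classification from one driven only by circulation toward one driven only by the regression predictions, which is the range of behavior the abstract describes. Any regression model or clustering algorithm could be substituted for the two used here.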