Super-unsupervised text classification for labeling online political hate
We live in a world of text, but the sheer magnitude of social media data, coupled with the need to measure complex psychological constructs, has made this important source of data difficult for many social scientists to use. Researchers must either engage in costly hand-coding of thousands of texts in order to apply supervised techniques, or rely on unsupervised techniques, in which the measurement of predefined constructs is difficult. We propose a novel approach, which we call super-unsupervised learning, and illustrate it using the psychologically complex construct of online political hate. This approach draws on the best features of both supervised and unsupervised learning techniques: it measures complex psychological constructs without a single labelled data source. We first outline the approach and then provide tests of (i) face validity, (ii) convergent and discriminant validity, (iii) criterion validity, (iv) external validity and (v) ecological validity.