In human-machine interaction, facial emotion recognition plays an important role in inferring a person's psychological state. In this study, we propose a novel emotion recognition framework that uses a knowledge transfer approach to extract features and an improved deep forest model to determine the final emotion type. A very deep convolutional network trained on ImageNet is used to extract face and emotion features from other data sets, which addresses the problem of insufficient labeled samples. These features are then fed into a classifier called multi-composition deep forest, which comprises 16 types of forests to enhance the diversity of the framework. The proposed method does not require training a network with a complex structure, and the decision tree-based classifier can achieve accurate results with very few parameters, making it easier to implement, train, and apply in practice. Moreover, the classifier can adaptively determine its model complexity without iteratively updating parameters. Experimental results on two emotion recognition problems demonstrate the superiority of the proposed method over several well-known facial emotion recognition methods.