Fractal dimension pattern-based multiresolution analysis for rough estimator of speaker-dependent audio emotion recognition
As a general means of expression, audio analysis and recognition have attracted much attention for its wide applications in real-life world. Audio emotion recognition (AER) attempts to understand the emotional states of human with the given utterance signals, and has been studied abroad for its further development on friendly human–machine interfaces. Though there have been several the-state-of-the-arts auditory methods devised to audio recognition, most of them focus on discriminative usage of acoustic features, while feedback efficiency of recognition demands is ignored. This makes possible application of AER, and rapid learning of emotion patterns is desired. In order to make predication of audio emotion possible, the speaker-dependent patterns of audio emotions are learned with multiresolution analysis, and fractal dimension (FD) features are calculated for acoustic feature extraction. Furthermore, it is able to efficiently learn the intrinsic characteristics of auditory emotions, while the utterance features are learned from FDs of each sub-band. Experimental results show the proposed method is able to provide comparative performance for AER.