Vocalisations linked to emotional states are partly conserved among phylogenetically related species. This continuity may allow humans to accurately infer affective information from vocalisations produced by chimpanzees. In two pre-registered experiments, we examine human listeners’ ability to infer behavioural contexts (e.g., discovering food) and core affect dimensions (arousal and valence) from 155 vocalisations produced by 66 chimpanzees in 10 different positive and negative contexts at high, medium, or low arousal levels. In Experiment 1, listeners (n = 310) categorised the vocalisations in a forced-choice task with 10 response options and rated arousal and valence. In Experiment 2, participants (n = 3120) matched vocalisations to production contexts using Yes/No response options. The results show that listeners accurately matched vocalisations to most contexts and reliably inferred levels of arousal and valence. Judgments were more accurate for negative than for positive vocalisations. An acoustic analysis demonstrated that listeners made use of brightness and duration cues, relied on noisiness in making context judgements, and relied on pitch to infer core affect dimensions. Overall, the results suggest that human listeners can infer affective information from chimpanzee vocalisations beyond core affect dimensions, indicating phylogenetic continuity in the mapping of vocalisations to behavioural contexts.