Draft version 28.03.2020. This paper has not been peer reviewed. Please do not copy or cite without author's permission.

Short-term memory has mostly been investigated with verbal or visuospatial stimuli, and less so with other categories of stimuli. Moreover, the influence of sensory modality has been explored almost solely in the verbal domain. The present study used the same experimental paradigm to investigate auditory and visual short-term memory for different types of stimuli. In each trial, participants were presented with two sequences of events, separated by a silent delay, and had to indicate whether the two sequences were identical or different. Performance in this recognition (delayed-matching-to-sample) paradigm was compared for materials that were either verbal (i.e., chained syllables without meaning) or nonverbal (i.e., not easily described by verbal labels). For the nonverbal materials, the event sequence either entailed a contour, that is, a pattern of up and down changes (based on non-pitch features), or did not. All materials were implemented in both the auditory and the visual modality. Because previous research has reported better auditory memory (and, to some extent, better visual memory) and better auditory contour recognition in musicians than in non-musicians, the recognition tasks were performed by a group of musicians and a group of non-musicians. Results revealed a selective advantage for musicians for the auditory no-contour stimuli and for the contour stimuli (both visual and auditory), suggesting that musical expertise is associated with specific short-term memory advantages in domains close to the trained domain, extending even across modalities. These findings offer new insights into the role of encoding strategies and their effect on short-term memory performance across modalities.