Over the last couple of decades, a great deal of research has been devoted to understanding the nature and architecture of visual short-term memory (VSTM), the mechanism by which currently relevant visual information is maintained. According to discrete-capacity models, VSTM is constrained by the limited number of discrete representations that can be held simultaneously. In contrast, shared-resource models regard VSTM as limited in resources, which can be distributed flexibly among varying numbers of representations, whereas a new interference model posits that capacity is limited by interference among items. In this paper, we begin by reviewing benchmark findings in the debate over VSTM limitations, focusing on whether VSTM storage is all-or-none and whether object complexity affects capacity. We then put forward a hybrid framework of VSTM architecture, arguing that this system is composed of a two-level hierarchy of memory stores, each containing a different set of representations: (1) Perceptual Memory (PM), a resource-like level containing analog, automatically formed representations of visual stimuli in varying degrees of activation; and (2) visual Working Memory (WM), in which a subset of 3–4 items from PM are bound to conceptual representations and to their locations, thus conveying discrete (digital/symbolic) information that appears quantized. While PM has a large capacity and is relatively non-selective, visual WM is restricted in the number of items that can be maintained simultaneously, and its content is regulated by a gating mechanism.