Visual recognition is achieved by a hierarchy of bidirectionally connected cortical areas. The entry of signals into higher areas involves the serial sampling of information within a movable window of attention. Here we explore how the cortex can move this window and integrate the sampled information. To make this concrete, we modeled the process of visual word recognition by hierarchical cortical areas representing features, letters, and words. At the start of the recognition process, nodes representing all contextually possible words are active. Simple connectivity rules allow a parallel top-down (T-D) computation of the relative probability of each feature at each location, given the set of active words. This information is then used to guide the window of attention to information-rich features (e.g., a feature that is present in the visual image but has the lowest probability). Bottom-up processing of this feature excludes words that do not contain it and leads to T-D recomputation of the feature probabilities. Recognition occurs after several such cycles, when all but one word has been excluded. We show that when 950 words are stored in long-term memory, recognition occurs after an average of 4.9 cycles. Because covert attention can be moved every 20–30 ms, word recognition could be as fast as determined experimentally (<200 ms of cortical processing). This model accounts for the findings that recognition time depends logarithmically on set size, that recognition time is reduced when context reduces the number of possible targets, that the time to classify a nonword decreases as its approximation to English decreases, and that in high-level cortex, the firing of neurons tuned to an object increases progressively as the object is recognized. More generally, the model provides a physiologically plausible view of how bidirectional signal flow in cortex guides attention to produce efficient recognition.
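The recognition cycle described above (top-down feature probabilities, attention to the least probable present feature, bottom-up exclusion of inconsistent words) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes words are reduced to (position, letter) features, ignores letter-level nodes and contextual priors, and uses illustrative function names; feature frequency among active words stands in for the T-D probability computation.

```python
def features(word):
    """A word's features: one (position, letter) pair per location."""
    return {(i, ch) for i, ch in enumerate(word)}

def recognize(target, lexicon):
    """Return (recognized_word_or_None, number_of_attention cycles)."""
    # Initially, all (same-length) words in long-term memory are active.
    active = [w for w in lexicon if len(w) == len(target)]
    target_feats = features(target)
    cycles = 0
    while len(active) > 1:
        # T-D step: relative probability of each image feature,
        # given the currently active word set.
        prob = {f: sum(f in features(w) for w in active) / len(active)
                for f in target_feats}
        # Attend to the most informative feature: present in the
        # visual image but least probable under the active hypotheses.
        attended = min(prob, key=prob.get)
        # B-U step: exclude words that do not contain the attended feature.
        active = [w for w in active if attended in features(w)]
        cycles += 1
    return (active[0] if active else None), cycles

lexicon = ["cart", "card", "care", "core", "bore", "bird"]
word, n = recognize("card", lexicon)  # converges in a few cycles
```

With this toy lexicon, the first cycle attends to the rare final-position "d" (shared only by "card" and "bird"), and a second cycle separates those two, so recognition takes far fewer samples than the number of candidate words, consistent with the logarithmic dependence on set size noted above.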