Brain functions are increasingly explored using data-driven methods, which make it possible to work with very large datasets collected in relatively natural experimental settings. However, like hypothesis-driven approaches, data-driven methods are not without drawbacks, and they pose interpretation problems, particularly in cognitive domains such as speech and language, where temporal processing is a key component. Whereas hypothesis-driven methods explicitly address speech processing as a hierarchical system, data-driven approaches probe speech processing as a system that can flexibly combine multiple, distributed features. Given the disparity of the available methods and their underlying concepts, synthesizing the results of hypothesis-driven and data-driven experiments represents a substantial challenge. Drawing on a number of influential examples from the recent speech and language literature, we unpack the advantages and limitations of both approaches and highlight ways in which they can be fruitfully combined: for example, by using time-resolved analyses, by applying specific models at each level of information transformation, or, more generally, by complementing data-driven, exploratory approaches with analysis methods that interrogate the data within more constrained model spaces.