Abstract

Large datasets, consisting of hundreds or thousands of subjects, are becoming the standard within the neuroimaging community. While big data brings numerous benefits, such as the ability to detect smaller effects, many of these large datasets have focused on non-clinical populations. The heterogeneity of clinical populations makes it more challenging to create datasets of equal size and quality. There is a need for methods that connect these robust large datasets with the carefully curated clinical datasets collected over the past decades. In this study, we use resting-state fMRI data from the Adolescent Brain Cognitive Development study and the Human Connectome Project to discover generalizable brain features for use in an out-of-sample predictive model that classifies young children (3-10 years) who stutter from fluent peers. We achieve up to 72% classification accuracy using 10-fold cross-validation. This study suggests that big data has the potential to yield generalizable biomarkers that are clinically meaningful. Specifically, this is the first study to demonstrate that big data-derived brain features can help differentiate children who stutter from their fluent peers and provide novel information on brain networks relevant to stuttering pathophysiology. The results substantially expand the previous understanding of the neural bases of stuttering. In addition to auditory, somatomotor, and subcortical networks, the big data-based models highlight the importance of considering large-scale brain networks supporting error sensitivity, attention, cognitive control, and emotion regulation/self-inspection in the neural bases of stuttering.