Modern machinery grows more valuable as science advances, and fault diagnosis is vital for avoiding economic losses and casualties. Among the many diagnosis methods available, deep learning algorithms stand out and have opened an era of intelligent fault diagnosis. Deep residual networks are state-of-the-art deep learning models whose performance can keep improving as the network structure is deepened. In vibration-based fault diagnosis, however, the transient and unstable nature of vibration signals usually calls for time–frequency analysis, and the characteristics of time–frequency matrices differ from those of standard images, which inherently limits the diagnosis performance of deep learning algorithms. To address this issue, this article proposes an enhanced deep residual network, the multilevel correlation stack-deep residual network. The sensor signal is first preprocessed with the wavelet packet transform; the proposed multilevel correlation stack-deep residual network then applies kernels of different shapes to extract various kinds of useful information from any local region of the processed input. Experiments are carried out on two rolling bearing datasets. The test results show that the multilevel correlation stack-deep residual network achieves better classification performance than the original deep residual networks and other similar methods, indicating significant potential for practical fault diagnosis applications.
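As an illustration of the preprocessing step, the following minimal sketch turns a one-dimensional signal into a matrix of wavelet packet coefficients. It is not the implementation used in this article: the Haar wavelet, the function names `haar_step` and `wavelet_packet_matrix`, and the decomposition level are all assumptions chosen to keep the example dependency-free.

```python
import math


def haar_step(x):
    """Split an even-length sequence into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)  # orthonormal scaling, so signal energy is preserved
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet_matrix(signal, level):
    """Return a matrix with 2**level rows of wavelet packet coefficients.

    Assumption: Haar wavelet, full decomposition tree. Rows are in natural
    (tree) order, which is not the same as increasing-frequency order.
    """
    if level < 0 or len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        # Unlike the plain wavelet transform, every node is split again,
        # including the detail (high-frequency) branches.
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


if __name__ == "__main__":
    # Hypothetical 8-sample signal, decomposed to level 2 (4 rows, 2 columns).
    example = [0.0, 1.0, -2.0, 3.5, 4.0, -1.0, 0.5, 2.0]
    for row in wavelet_packet_matrix(example, 2):
        print(row)
```

Each row corresponds to one packet node and each column to a time position, so the result can be fed to a convolutional network as a two-dimensional input. Because rows and columns carry different physical meanings (sub-band versus time), kernels of different shapes may capture different kinds of local structure in such a matrix.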