<p>Seismic full waveform inversion (FWI), one of the most important methods for estimating seismic wave velocity, has developed rapidly over the last decade. To address cycle-skipping artifacts, dependence on the initial model, and the lack of low-frequency information in FWI, researchers have proposed many improvements, such as multi-scale envelope inversion and low-frequency extension. Recently, deep learning has also been adopted in seismic data processing and interpretation because of its strong nonlinear mapping capability. However, these approaches rely heavily on training labels, especially velocity-model labels for inversion, which hinders their practical application. Building on these studies, this work combines two commonly used strategies, low-frequency extension and multi-scale inversion, with deep learning and proposes a CNN-based multi-scale FWI gradient optimization method. The designed CNN is trained to predict the FWI gradient corresponding to low-frequency band data, so that the optimized gradients can be used directly in multi-scale inversion, extending the low-frequency information in the observed data and reducing the computational cost of FWI. Using a specially designed dataset, the CNN is trained to optimize gradients computed from high-frequency band data by predicting the corresponding low-frequency-band gradients and mid-frequency-band gradients, respectively. The predicted gradients are used at different stages of the multi-scale inversion: the low-frequency gradients recover the large-scale initial structure so that the inversion does not rely on a good initial model, and the high-frequency gradients improve the accuracy of the inversion results. In this way, low-frequency extension and multi-scale inversion are achieved simultaneously.
Our method achieves good results starting from a homogeneous initial model with a given constant velocity, effectively alleviating the dependence of FWI on the initial model. This study offers a new way of combining deep learning with full waveform inversion, which may be applied effectively in seismic data processing.</p>
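<p>Multi-scale FWI operates on frequency bands of the recorded data, and the training pairs described above relate gradients from high-frequency band data to gradients from low- and mid-frequency band data. As a minimal sketch of how a single trace might be split into such bands, the NumPy example below applies complementary FFT masks. The helper names <code>ricker</code> and <code>split_bands</code>, the 15 Hz Ricker source, and the 5 Hz and 12 Hz band edges are hypothetical choices for illustration only, not the settings used in this work.</p>

```python
import numpy as np


def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def split_bands(trace, dt, edges):
    """Split a trace into contiguous frequency bands using hard FFT masks.

    edges: increasing cutoff frequencies in Hz (hypothetical values).
    Returns len(edges) + 1 bands, lowest band first. The masks partition
    the spectrum, so the bands sum back to the original trace.
    """
    spec = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(trace.size, d=dt)
    bounds = [0.0, *edges, np.inf]
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (freqs >= lo) & (freqs < hi)  # keep only bins in [lo, hi)
        bands.append(np.fft.irfft(spec * mask, n=trace.size))
    return bands


# Usage: a 1 s trace sampled at 1 ms, split into low / mid / high bands.
dt = 0.001
t = (np.arange(1000) - 500) * dt
trace = ricker(t, f0=15.0)
low, mid, high = split_bands(trace, dt, edges=[5.0, 12.0])
```

<p>Hard masks are used here only so that the bands sum exactly to the input; practical multi-scale workflows usually apply tapered filters (for example Butterworth low-pass filters) to limit ringing in the filtered data.</p>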