With the development of wide-area monitoring systems (WAMS), power system operators can estimate time-varying load parameters quickly and accurately. This study proposes a new spatial-temporal, attention-based deep network that captures the dynamic and static patterns of electrical load consumption by modeling the complicated, non-stationary interdependencies between time sequences. The network uses a long short-term memory (LSTM) based component, structured as an encoder-decoder recurrent neural network, to learn temporal features in the time and frequency domains. To learn spatial features, a convolutional neural network (CNN) based attention mechanism is developed. In addition, a loss function based on the pseudo-Huber concept is developed to make the network more robust to noise and to improve training performance. Simulation results on the IEEE 68-bus system demonstrate the effectiveness and superiority of the proposed network in comparison with several previously presented and state-of-the-art methods.
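As a point of reference for the pseudo-Huber concept mentioned above, the textbook pseudo-Huber loss for a residual r is δ²(√(1 + (r/δ)²) − 1). It behaves like r²/2 for small residuals and grows roughly linearly for large ones, which is what makes it less sensitive to noisy samples than a squared error. The sketch below shows this standard form only; it is not necessarily the exact loss developed in this paper, and the function names and the `delta` parameter are assumptions introduced for illustration.

```python
import math


def pseudo_huber(residual, delta=1.0):
    """Standard pseudo-Huber loss for one residual (assumed textbook form).

    Approximately residual**2 / 2 when |residual| << delta, and
    approximately delta * |residual| when |residual| >> delta.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def mean_pseudo_huber(y_true, y_pred, delta=1.0):
    """Average pseudo-Huber loss over paired targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = sum(pseudo_huber(t - p, delta) for t, p in zip(y_true, y_pred))
    return total / len(y_true)
```

In this form, a smaller `delta` moves the transition from quadratic to linear behavior closer to zero, so large measurement errors contribute less to the gradient during training.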