Estimating the depth of the scene from a monocular image is an essential step for image semantic understanding. Practically, some existing methods for this highly ill-posed issue are still in lack of robustness and efficiency. This paper proposes a novel end-to-end depth esti- mation model with skip connections from a pre- trained Xception model for dense feature extrac- tion, and three new modules are designed to im- prove the upsampling process. In addition, ELU activation and convolutions with smaller kernel size are added to improve the pixel-wise regres- sion process. The experimental results show that our model has fewer network parameters, a lower error rate than the most advanced networks and requires only half the training time. The evalu- ation is based on the NYU v2 dataset, and our proposed model can achieve clearer boundary de- tails with state-of-the-art effects and robustness.