Indoor localization is one of the fundamentals of location-based services (LBS) such as seamless indoor and outdoor navigation, location-based precision marketing, spatial cognition of robotics, etc. Visual features take up a dominant part of the information that helps human and robotics understand the environment, and many visual localization systems have been proposed. However, the problem of indoor visual localization has not been well settled due to the tough trade-off of accuracy and cost. To better address this problem, a localization method based on image retrieval is proposed in this paper, which mainly consists of two parts. The first one is CNN-based image retrieval phase, CNN features extracted by pre-trained deep convolutional neural networks (DCNNs) from images are utilized to compare the similarity, and the output of this part are the matched images of the target image. The second one is pose estimation phase that computes accurate localization result. Owing to the robust CNN feature extractor, our scheme is applicable to complex indoor environments and easily transplanted to outdoor environments. The pose estimation scheme was inspired by monocular visual odometer, therefore, only RGB images and poses of reference images are needed for accurate image geo-localization. Furthermore, our method attempts to use lightweight datum to present the scene. To evaluate the performance, experiments are conducted, and the result demonstrates that our scheme can efficiently result in high location accuracy as well as orientation estimation. Currently the positioning accuracy and usability enhanced compared with similar solutions. Furthermore, our idea has a good application foreground, because the algorithms of data acquisition and pose estimation are compatible with the current state of data expansion.