Virtual reality satellites give people an immersive experience of exploring space. The intelligent attitude control method using reinforcement learning to achieve multiaxis synchronous control is one of the important tasks of virtual reality satellites. In real-world systems, methods based on reinforcement learning face safety issues during exploration, unknown actuator delays, and noise in the raw sensor data. To improve the sample efficiency and avoid safety issues during exploration, this paper proposes a new offline reinforcement learning method to make full use of samples. This method learns a policy set with imitation learning and a policy selector using a generative adversarial network (GAN). The performance of the proposed method was verified in a real-world system (reaction-wheel-based inverted pendulum). The results showed that the agent trained with our method reached and maintained a stable goal state in 10,000 steps, whereas the behavior cloning method only remained stable for 500 steps.