Abstract. The exploration-exploitation dilemma is one of the fundamental problems in reinforcement learning and is widely regarded as mathematically intractable. In this paper we prove that the key to a tractable solution is counterintuitive: to explore without considering reward value. We redefine exploration as having no objective other than learning itself. Through theory and experiments we prove that this view leads to a perfect deterministic solution to the dilemma, based on the famous win-stay, lose-switch strategy from game theory. This solution rests on our conjecture that information and reward are equally valuable for survival. Besides offering a mathematical answer, this view is also more robust than traditional approaches, as it succeeds in the difficult conditions where rewards are sparse, deceptive, or non-stationary.
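To make the named strategy concrete, the following is a minimal sketch of a win-stay, lose-switch rule used to arbitrate between exploring and exploiting. The function name, the two-option framing, and the payoff boundary `eta` are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
def wsls_policy(last_choice, last_payoff, eta=0.0):
    """Deterministic win-stay, lose-switch arbitration.

    last_choice: "exploit" (act for reward) or "explore" (act for information).
    last_payoff: the value that choice produced on the last step
                 (reward if exploiting, information gained if exploring).
    eta:         boundary separating a "win" from a "loss" (assumed here).
    """
    if last_payoff > eta:
        # Win: stay with whatever we did last.
        return last_choice
    # Loss: deterministically switch to the other option.
    return "explore" if last_choice == "exploit" else "exploit"


# Usage: exploiting paid off, so keep exploiting; once reward dries up,
# switch to pure information-seeking, and vice versa.
print(wsls_policy("exploit", last_payoff=1.0))  # exploit
print(wsls_policy("exploit", last_payoff=0.0))  # explore
print(wsls_policy("explore", last_payoff=0.0))  # exploit
```

Because the rule is deterministic, the agent never mixes reward and information into a single objective; it simply commits to one goal at a time and switches only on failure, which is the spirit of the arbitration described above.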