Q-learning

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action-selection policy for any given (finite) Markov decision process (MDP).

Q-learning is well suited to problems with a limited set of possible states, each of which offers a limited set of possible actions. The system is rewarded for taking a certain action in a certain state. It also learns to value the earlier states and actions that lead to such a rewarding state, even when those earlier steps give no immediate reward.
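The way a delayed reward influences earlier choices can be illustrated with a minimal sketch of tabular Q-learning. Everything specific here is an assumption made for illustration and does not come from the text above: the hypothetical environment is a short chain of states in which only the move into the last state is rewarded, and the function name `q_learning`, the parameters `alpha` (learning rate), `gamma` (discount factor) and `epsilon` (exploration rate), and their default values are arbitrary choices.

```python
import random


def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. Only the move that enters the
    last state is rewarded (reward 1), and the episode ends there.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] is the current estimate of the long-term value.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy selection: explore occasionally, otherwise take
            # the action with the highest estimate (ties broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Deterministic transition of the assumed chain environment.
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Move the estimate toward the reward plus the discounted value
            # of the best action available in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, values)
```

In this sketch only the final step is rewarded, yet the estimates for moving right in the earlier states also become positive, shrinking by a factor of `gamma` for each extra step away from the reward. That is the sense in which the system comes to seek out states that were not immediately rewarding.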