An area of machine learning concerned with how software agents ought to take actions in an environment to maximize some notion of cumulative reward.