Q-Learning algorithm theory with mathematical foundations.

Markov Decision Process (MDP)

An MDP is defined by the tuple $(S, A, P, R, \gamma)$:

- $S$: Set of states
- $A$: Set of actions
- $P$: Transition probability $P(s'|s,a)$
- $R$: Reward function $R(s,a,s')$
- $\gamma \in [0,1]$: Discount factor

Value Functions

State Value …
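
As a rough illustration of the tuple $(S, A, P, R, \gamma)$ defined above, the sketch below encodes a tiny MDP in plain Python and applies one tabular Q-learning update. The two states, two actions, transition probabilities, reward rule, and the learning rate `alpha` are all invented for this example and do not come from the text; the update shown is the standard tabular rule $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, given here as an assumption about where the theory is heading rather than as the author's own implementation.

```python
import random

# Hypothetical two-state, two-action MDP, invented purely for illustration.
STATES = ["s0", "s1"]        # S: set of states
ACTIONS = ["stay", "move"]   # A: set of actions
GAMMA = 0.9                  # gamma: discount factor, must lie in [0, 1]

# P[(s, a)] maps each next state s' to the probability P(s'|s,a).
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.8, "s1": 0.2},
}


def reward(s, a, s_next):
    """R(s, a, s'): assumed here to pay 1 for arriving in s1, else 0."""
    return 1.0 if s_next == "s1" else 0.0


def step(s, a, rng):
    """Sample s' from P(.|s,a) and return the pair (s', R(s,a,s'))."""
    next_states = list(P[(s, a)].keys())
    probs = list(P[(s, a)].values())
    s_next = rng.choices(next_states, weights=probs, k=1)[0]
    return s_next, reward(s, a, s_next)


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=GAMMA):
    """Apply one tabular Q-learning update in place; alpha is an assumed learning rate."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = "s0"
    for _ in range(1000):
        a = rng.choice(ACTIONS)          # purely random exploration
        s_next, r = step(s, a, rng)
        q_update(Q, s, a, r, s_next)
        s = s_next
    print(Q)
```

Storing $P$ as a dictionary of per-pair distributions keeps the correspondence with $P(s'|s,a)$ direct, at the cost of scaling poorly to large state spaces; an array-based layout would be the usual choice there.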