q-learning | ITOHI

Q-Learning Theory
Dec 12, 2024 · 3 min read · ai-knowhow reinforcement-learning q-learning mathematics
Q-Learning algorithm theory with mathematical foundations.

Markov Decision Process (MDP)

An MDP is defined by the tuple $(S, A, P, R, \gamma)$:

- $S$: Set of states
- $A$: Set of actions
- $P$: Transition probability $P(s'|s,a)$
- $R$: Reward function $R(s,a,s')$
- $\gamma \in [0,1]$: Discount factor

Value Functions

State Value …
