reinforcement-learning | ITOHI

DNN Policy Learning Theory
Dec 12, 2024 · 3 min read · ai-knowhow reinforcement-learning policy-gradient dnn mathematics
Deep Neural Network policy learning with mathematical foundations. Policy Gradient Methods. Policy Parameterization: the policy $\pi_\theta(a|s)$ is parameterized by a neural network with weights $\theta$. Objective Function: maximize the expected return: $$ J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \gamma^t …
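To make the objective concrete, here is a minimal Python sketch under stated assumptions. It uses a linear-softmax policy as a stand-in for the neural network (a simplification, not the post's architecture), assumes the truncated sum is over per-step rewards $r_t$, and assumes the reward sequences were already collected by running $\pi_\theta$. The function names `softmax_policy`, `discounted_return`, and `estimate_objective` are hypothetical. It estimates $J(\theta)$ by averaging $\sum_{t} \gamma^t r_t$ over sampled trajectories.

```python
import math
from typing import List, Sequence


def softmax_policy(theta: List[List[float]], features: Sequence[float]) -> List[float]:
    """pi_theta(a|s) for a linear-softmax policy: one weight vector per action.

    Assumption: a single linear layer stands in for the neural network.
    """
    logits = [sum(w * x for w, x in zip(weights, features)) for weights in theta]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """sum_{t=0}^{T} gamma^t * r_t for one trajectory tau."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def estimate_objective(trajectories: List[Sequence[float]], gamma: float) -> float:
    """Monte Carlo estimate of J(theta): mean discounted return over sampled trajectories."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    return sum(discounted_return(r, gamma) for r in trajectories) / len(trajectories)
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates to $1 + 0.5 + 0.25 = 1.75$. Averaging over trajectories is the plain Monte Carlo estimator of the expectation; a policy gradient method would additionally differentiate through $\log \pi_\theta$, which this sketch leaves out.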

Q-Learning Theory
Dec 12, 2024 · 3 min read · ai-knowhow reinforcement-learning q-learning mathematics
Q-Learning algorithm theory with mathematical foundations. Markov Decision Process (MDP): an MDP is defined by the tuple $(S, A, P, R, \gamma)$, where $S$ is the set of states, $A$ is the set of actions, $P$ is the transition probability $P(s'|s,a)$, $R$ is the reward function $R(s,a,s')$, and $\gamma \in [0,1]$ is the discount factor. Value Functions. State Value …
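The MDP tuple maps directly onto a small data structure. The minimal Python sketch below stores $(S, A, P, R, \gamma)$ and runs tabular Q-learning with the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$. That update rule, the step size $\alpha$, the uniformly random behavior policy, and the two-state example are assumptions for illustration (the excerpt is truncated before the algorithm itself); the names `MDP`, `q_update`, `q_learning`, and `toy_mdp` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str
QTable = Dict[Tuple[State, Action], float]


@dataclass
class MDP:
    """The tuple (S, A, P, R, gamma)."""
    states: List[State]                                       # S
    actions: List[Action]                                     # A
    P: Dict[Tuple[State, Action], List[Tuple[State, float]]]  # P(s'|s,a) as (s', prob) pairs
    R: Callable[[State, Action, State], float]                # R(s, a, s')
    gamma: float                                              # discount factor in [0, 1]


def q_update(Q: QTable, s: State, a: Action, r: float, s_next: State,
             actions: List[Action], alpha: float, gamma: float) -> float:
    """One tabular Q-learning step toward the target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


def sample_next_state(mdp: MDP, s: State, a: Action, rng: random.Random) -> State:
    """Draw s' ~ P(.|s,a)."""
    outcomes = mdp.P[(s, a)]
    return rng.choices([sp for sp, _ in outcomes], weights=[p for _, p in outcomes], k=1)[0]


def q_learning(mdp: MDP, start: State, steps: int, alpha: float, seed: int = 0) -> QTable:
    """Run Q-learning with a uniformly random behavior policy (Q-learning is off-policy)."""
    rng = random.Random(seed)
    Q: QTable = {(s, a): 0.0 for s in mdp.states for a in mdp.actions}
    s = start
    for _ in range(steps):
        a = rng.choice(mdp.actions)
        s_next = sample_next_state(mdp, s, a, rng)
        q_update(Q, s, a, mdp.R(s, a, s_next), s_next, mdp.actions, alpha, mdp.gamma)
        s = s_next
    return Q


def toy_mdp() -> MDP:
    """Hypothetical 2-state example: 'move' switches state, 'stay' keeps it; reward 1 for landing in B."""
    P = {
        ("A", "stay"): [("A", 1.0)],
        ("A", "move"): [("B", 1.0)],
        ("B", "stay"): [("B", 1.0)],
        ("B", "move"): [("A", 1.0)],
    }
    return MDP(["A", "B"], ["stay", "move"], P,
               lambda s, a, s_next: 1.0 if s_next == "B" else 0.0, gamma=0.9)
```

For this toy MDP with $\gamma = 0.9$, solving the Bellman optimality equations by hand gives $Q^*(A, \text{move}) = Q^*(B, \text{stay}) = 10$ and $Q^*(A, \text{stay}) = Q^*(B, \text{move}) = 9$, so `q_learning(toy_mdp(), "A", 20000, alpha=0.5)` should return a table close to these values. Because transitions here are deterministic, a constant $\alpha$ suffices; stochastic environments generally call for a decaying step size.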
