DNN Policy Learning Theory
Deep neural network (DNN) policy learning with mathematical foundations. Policy Gradient Methods. Policy Parameterization: the policy $\pi_\theta(a|s)$ is parameterized by a neural network with weights $\theta$. Objective Function: maximize the expected return $$ J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \gamma^t …
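A minimal sketch of maximizing $J(\theta)$ by gradient ascent with the REINFORCE (score-function) estimator, $\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{t} \nabla_\theta \log \pi_\theta(a_t|s_t) \sum_{k=t}^{T} \gamma^k r_k$. REINFORCE is one standard policy gradient method and is not necessarily the one the full post develops. To stay dependency-free, the sketch assumes a hypothetical two-state toy environment (reward 1 when the action index equals the state index, next state uniform) and replaces the neural network with a table of softmax logits `theta[s][a]`; a DNN would output these logits instead. All function names are illustrative.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def discounted_return(rewards, gamma):
    """Monte Carlo return sum_{t=0}^{T} gamma^t * r_t of one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def step_env(state, action, rng):
    """Hypothetical toy environment: reward 1 if action == state, next state uniform."""
    reward = 1.0 if action == state else 0.0
    return rng.randrange(2), reward

def sample_trajectory(theta, horizon, rng):
    """Roll out pi_theta(a|s) = softmax(theta[s]) for `horizon` steps."""
    states, actions, rewards = [], [], []
    s = rng.randrange(2)
    for _ in range(horizon):
        probs = softmax(theta[s])
        a = rng.choices(range(len(probs)), weights=probs)[0]
        s_next, r = step_env(s, a, rng)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
    return states, actions, rewards

def reinforce(iters=200, batch=20, horizon=5, gamma=0.9, lr=0.1, seed=0):
    """Gradient ascent on J(theta) using the REINFORCE estimator."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # theta[s][a]: tabular stand-in for DNN weights
    for _ in range(iters):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for _ in range(batch):
            states, actions, rewards = sample_trajectory(theta, horizon, rng)
            for t, (s, a) in enumerate(zip(states, actions)):
                # Reward-to-go with absolute discounting: sum_{k=t}^{T} gamma^k r_k
                g = sum((gamma ** k) * rewards[k] for k in range(t, horizon))
                probs = softmax(theta[s])
                for b in range(2):
                    # d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s)
                    grad[s][b] += ((1.0 if b == a else 0.0) - probs[b]) * g
        for s in range(2):
            for b in range(2):
                theta[s][b] += lr * grad[s][b] / batch  # ascent step on J(theta)
    return theta

def estimate_J(theta, episodes=1000, horizon=5, gamma=0.9, seed=1):
    """Monte Carlo estimate of J(theta) = E[sum_t gamma^t r_t]."""
    rng = random.Random(seed)
    total = sum(discounted_return(sample_trajectory(theta, horizon, rng)[2], gamma)
                for _ in range(episodes))
    return total / episodes

theta = reinforce()
print("pi(.|s=0) =", softmax(theta[0]))
print("pi(.|s=1) =", softmax(theta[1]))
print("J(theta) ~", estimate_J(theta))
```

With an actual DNN, the hand-written `1[b == a] - pi(b|s)` term would be replaced by automatic differentiation of $\log \pi_\theta(a_t|s_t)$ through the network; the rest of the loop (sample trajectories, weight each log-probability by its discounted reward-to-go, step along the averaged gradient) stays the same.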
Q-Learning algorithm theory with mathematical foundations. Markov Decision Process (MDP): an MDP is defined by the tuple $(S, A, P, R, \gamma)$, where $S$ is the set of states, $A$ the set of actions, $P$ the transition probability $P(s'|s,a)$, $R$ the reward function $R(s,a,s')$, and $\gamma \in [0,1]$ the discount factor. Value Functions. State Value …
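A minimal sketch of the MDP tuple $(S, A, P, R, \gamma)$ as a data structure, plus tabular Q-learning, which applies the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$ after each sampled transition. The two-state MDP (action 0 = stay, action 1 = switch, reward 1 whenever the next state is 1), the step size $\alpha = 0.5$, and all names are hypothetical choices for illustration. For this MDP with $\gamma = 0.9$, the Bellman optimality equations give $Q^*(0,\cdot) = (9, 10)$ and $Q^*(1,\cdot) = (10, 9)$, which the learned table should approach.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MDP:
    """Finite MDP (S, A, P, R, gamma) with integer-indexed states and actions."""
    n_states: int                      # |S|
    n_actions: int                     # |A|
    P: List[List[List[float]]]         # P[s][a][s2] = P(s2 | s, a)
    R: Callable[[int, int, int], float]  # R(s, a, s2)
    gamma: float                       # discount factor in [0, 1]

    def step(self, s, a, rng):
        """Sample s2 ~ P(.|s, a) and return (s2, R(s, a, s2))."""
        s2 = rng.choices(range(self.n_states), weights=self.P[s][a])[0]
        return s2, self.R(s, a, s2)

def q_learning(mdp, steps=20000, alpha=0.5, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy (off-policy)."""
    rng = random.Random(seed)
    Q = [[0.0] * mdp.n_actions for _ in range(mdp.n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(mdp.n_actions)      # explore uniformly
        s2, r = mdp.step(s, a, rng)
        target = r + mdp.gamma * max(Q[s2])   # TD target uses the greedy next value
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
    return Q

# Hypothetical two-state example: action 0 = stay, action 1 = switch;
# reward 1 whenever the next state is state 1.
toy = MDP(
    n_states=2,
    n_actions=2,
    P=[[[1.0, 0.0], [0.0, 1.0]],   # from state 0: stay -> 0, switch -> 1
       [[0.0, 1.0], [1.0, 0.0]]],  # from state 1: stay -> 1, switch -> 0
    R=lambda s, a, s2: 1.0 if s2 == 1 else 0.0,
    gamma=0.9,
)
Q = q_learning(toy)
print(Q)
```

Because Q-learning is off-policy, a uniformly random behavior policy is enough to visit every state-action pair while the max in the target still learns the greedy policy (switch from state 0, stay in state 1), with state values $V^*(s) = \max_a Q^*(s,a) = 10$ in both states. In larger problems an ε-greedy behavior policy is the more common choice.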