-
Equivalence of the Forward and Backward Views in the TD($\lambda$) Method (Incomplete Blog)
This blog post explains the equivalence of the forward view and the backward view in the Temporal Difference ($\lambda$) method. The same holds true for generalized advantage estimation.
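As a quick numerical illustration of that equivalence, the sketch below compares the total per-episode update from the forward view ($\lambda$-return) with the one from the backward view (eligibility traces). It assumes a tabular value function, offline updates (values held fixed during the episode), accumulating traces, and an episodic task whose terminal value is zero. The function names, the random toy episode, and the step-size and discount values are all hypothetical and not taken from the post.

```python
import random

def forward_view_updates(states, rewards, V, alpha, gamma, lam):
    """Total offline update per state using the lambda-return (forward view)."""
    T = len(rewards)
    delta_V = [0.0] * len(V)
    for t in range(T):
        g_lambda, reward_sum = 0.0, 0.0
        for n in range(1, T - t + 1):
            reward_sum += gamma ** (n - 1) * rewards[t + n - 1]
            # Bootstrap from V unless the n-step return reaches the terminal state.
            bootstrap = V[states[t + n]] if t + n < T else 0.0
            g_n = reward_sum + gamma ** n * bootstrap
            # The full return receives all of the remaining weight.
            weight = (1 - lam) * lam ** (n - 1) if n < T - t else lam ** (n - 1)
            g_lambda += weight * g_n
        delta_V[states[t]] += alpha * (g_lambda - V[states[t]])
    return delta_V

def backward_view_updates(states, rewards, V, alpha, gamma, lam):
    """Total offline update per state using eligibility traces (backward view)."""
    T = len(rewards)
    delta_V = [0.0] * len(V)
    trace = [0.0] * len(V)
    for t in range(T):
        next_value = V[states[t + 1]] if t + 1 < T else 0.0
        td_error = rewards[t] + gamma * next_value - V[states[t]]
        trace = [gamma * lam * e for e in trace]  # decay every trace
        trace[states[t]] += 1.0                   # accumulating trace
        for s in range(len(V)):
            delta_V[s] += alpha * td_error * trace[s]
    return delta_V

# Hypothetical toy episode: random states, rewards, and initial values.
random.seed(0)
n_states, T = 4, 12
states = [random.randrange(n_states) for _ in range(T)]
rewards = [random.uniform(-1, 1) for _ in range(T)]
V = [random.uniform(-1, 1) for _ in range(n_states)]

fwd = forward_view_updates(states, rewards, V, alpha=0.1, gamma=0.9, lam=0.8)
bwd = backward_view_updates(states, rewards, V, alpha=0.1, gamma=0.9, lam=0.8)
max_gap = max(abs(f - b) for f, b in zip(fwd, bwd))
print(max_gap)  # expected to be zero up to floating-point rounding
```

The two totals agree only because the values are frozen for the whole episode; with online updates the match is generally approximate rather than exact.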
-
Monte-Carlo Algorithm with Reward Reshaping for the Mountain Car Gym Environment
This blog post explains the on-policy every-visit Monte Carlo algorithm and its implementation in the *MountainCar-v0* OpenAI Gym environment.
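The every-visit averaging step can be sketched without the environment itself. The example below is a minimal illustration of every-visit Monte Carlo value estimation on hand-written episodes. It covers prediction only, not the control loop or the reward reshaping the post describes, and the function name, the episode format, and the toy data are hypothetical.

```python
from collections import defaultdict

def every_visit_mc(episodes, gamma):
    """Estimate V(s) by averaging the return that follows every visit to s.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after acting in that state (an assumed format).
    """
    return_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backward so the return is accumulated in a single pass.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            return_sum[state] += g
            visit_count[state] += 1  # every visit counts, not only the first
    return {s: return_sum[s] / visit_count[s] for s in return_sum}

# Hypothetical episodes with a reward of -1 per step.
episodes = [
    [("a", -1.0), ("b", -1.0), ("a", -1.0)],
    [("b", -1.0)],
]
values = every_visit_mc(episodes, gamma=1.0)
print(values)
```

In the first episode state `a` is visited twice, so both of its returns enter the average; a first-visit variant would keep only the earlier one.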
-
Contraction Property of the Bellman Operator with Contraction Factor $\gamma < 1$
This blog post shows that the Bellman operator used in value iteration is a contraction with respect to the $\ell_{\infty}$-norm, with contraction factor $\gamma<1$.
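For reference, the claim can be stated in one line. The notation here is an assumption and may differ from the post: $T$ denotes the Bellman optimality operator, $(TV)(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \big]$, and $V$, $U$ are any two bounded value functions. Using $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|$ and the fact that $P(\cdot \mid s,a)$ sums to one:

$$
\|TV - TU\|_{\infty}
\le \max_{s}\max_{a}\; \gamma \Big| \sum_{s'} P(s' \mid s,a)\,\big(V(s') - U(s')\big) \Big|
\le \gamma \max_{s'} |V(s') - U(s')|
= \gamma\, \|V - U\|_{\infty}.
$$

Since $\gamma < 1$, repeated application of $T$ shrinks the distance between any two value functions geometrically, which is why value iteration converges to a unique fixed point.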