Thanks a lot for your feedback.

For this particular proof, my main resources were the PhD thesis of a friend (Johanni Brea) and the appendix of “Algorithms for Reinforcement Learning” by Csaba Szepesvári.

But in general, for a rigorous mathematical treatment of RL, you can look at "Markov Decision Processes: Discrete Stochastic Dynamic Programming" by Martin L. Puterman or one of the many books by Dimitri Bertsekas on the topic. Michael Littman has also written quite a few amazing works (including his PhD thesis), if you are also interested in POMDPs.
