Thanks a lot for your feedback.

For this particular proof, my main resources were the PhD thesis of a friend (Johanni Brea) and the appendix of “Algorithms for Reinforcement Learning” by Csaba Szepesvári.

But in general, for resources on the rigorous mathematical treatment of RL, you can look at "Markov Decision Processes: Discrete Stochastic Dynamic Programming" by Martin L. Puterman or one of the many books by Dimitri Bertsekas on the topic. Michael Littman also has quite a few excellent works (including his PhD thesis) if you are interested in POMDPs as well.

--

CS PhD student in the Laboratory of Computational Neuroscience at EPFL || Personal website: https://sites.google.com/view/modirsha

Alireza Modirshanechi
