1 min read · Dec 20, 2020

Hi Rishabh,

Thanks for your comment.

I am not sure I fully understood what you mean, but this proof does not need to refer to immediate rewards explicitly. I would be happy to hear your comments in more detail; in the meantime, let me try to make the proof clearer:

1. The first line is simply an application of Eq. 1.

2. For the second line, we claim that the expectation of q_\pi with respect to \pi' is greater than or equal to its expectation with respect to \pi. To see this, let us consider two cases:

(i) s \neq s*: in this case, \pi' is the same as \pi, and hence the inequality holds (more precisely, it holds with equality).

(ii) s = s*: in this case, the expectation with respect to \pi' is equal to q_\pi(s*,a*), which is by assumption greater than v_\pi(s*), that is, the expectation with respect to \pi.

3. So far, we have proved the first equality and the first inequality. To go from line 2 to line 3, we can essentially use the Bellman equations, with some care.

4. After step 3, we repeat the reasoning of step 2 again and again.
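If it helps, the claims in steps 2 to 4 can also be checked numerically. Below is a minimal sketch on a toy MDP; the transition probabilities, rewards, discount factor, and all function names are assumptions I made up for illustration and are not taken from the article. It evaluates a uniform policy \pi, picks a pair (s*, a*) with q_\pi(s*,a*) > v_\pi(s*), builds \pi' by changing \pi only at s*, and then checks the inequality of step 2 and the final conclusion v_\pi' >= v_\pi.

```python
# Toy MDP with 3 states and 2 actions (hypothetical numbers, for illustration only).
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

# P[s][a][s2]: probability of moving from state s to state s2 under action a.
P = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
    [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]],
    [[0.0, 0.3, 0.7], [0.6, 0.0, 0.4]],
]
# R[s][a]: expected immediate reward for taking action a in state s.
R = [[0.0, 1.0], [0.5, 2.0], [1.0, 0.0]]


def q_from_v(v):
    """q(s, a) = R(s, a) + gamma * sum_s2 P(s2 | s, a) * v(s2)."""
    return [
        [R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(N_STATES))
         for a in range(N_ACTIONS)]
        for s in range(N_STATES)
    ]


def expect(policy, q):
    """Expectation of q(s, .) under policy(. | s), one value per state."""
    return [sum(policy[s][a] * q[s][a] for a in range(N_ACTIONS))
            for s in range(N_STATES)]


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation: repeat v <- E_policy[q_from_v(v)]."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = expect(policy, q_from_v(v))
    return v


# Original policy pi: uniform over the two actions in every state.
pi = [[0.5, 0.5] for _ in range(N_STATES)]
v_pi = evaluate(pi)
q_pi = q_from_v(v_pi)

# Pick (s*, a*) with the largest gap q_pi(s*, a*) - v_pi(s*).
s_star, a_star = max(
    ((s, a) for s in range(N_STATES) for a in range(N_ACTIONS)),
    key=lambda sa: q_pi[sa[0]][sa[1]] - v_pi[sa[0]],
)
advantage = q_pi[s_star][a_star] - v_pi[s_star]

# pi' equals pi everywhere except at s*, where it always picks a*.
pi_new = [row[:] for row in pi]
pi_new[s_star] = [1.0 if a == a_star else 0.0 for a in range(N_ACTIONS)]

TOL = 1e-9
# Step 2: the expectation of q_pi under pi' is at least v_pi in every state.
lhs = expect(pi_new, q_pi)
step2_holds = all(lhs[s] >= v_pi[s] - TOL for s in range(N_STATES))

# Steps 3 and 4 lead to the conclusion v_pi' >= v_pi in every state.
v_new = evaluate(pi_new)
improved = all(v_new[s] >= v_pi[s] - TOL for s in range(N_STATES))

print("s*, a*:", s_star, a_star, "advantage:", advantage)
print("step 2 holds:", step2_holds, "| v_pi' >= v_pi:", improved)
```

For states other than s*, `lhs[s]` should match `v_pi[s]` up to numerical error (case (i)), while at s* it equals `q_pi[s_star][a_star]` (case (ii)).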

Please let me know if you still find it unclear or think it is wrong.

Best,

Alireza


Written by Alireza Modirshanechi

Postdoc at Helmholtz Munich and MPI for Biological Cybernetics; Ph.D. in CS from EPFL; Personal website: https://sites.google.com/view/modirsha
