[rllib] Add vf clipping param to fix pendulum example #2921
|
Any plots? |
|
You get about the same performance as before: -900 within about 100k, and
-140 by 300k ish.
…On Wed, Sep 19, 2018, 3:39 PM Richard Liaw ***@***.***> wrote:
Any plots?
|
|
Test PASSed. |
|
Test PASSed. |
|
This is probably why PPO hasn't been working for us; our rewards are on the wrong scale too. Thanks for this fix! |
|
I'm wondering if we should disable VF clipping by default (i.e. set it to 9999); it seems easy to run into this issue. |
|
Changed it to 10.0 by default. |
|
I think that's pretty compelling; it's kind of a hidden failure mode. I'm not sure there's a consensus on how to appropriately scale your rewards, so I could easily imagine users using relatively large reward |
|
Test PASSed. |
richardliaw
left a comment
I would be ok with turning off clipping by default; but 10.0 is fine too.
What do these changes do?
The vf clip param is sensitive to the scale of the rewards. This broke the pendulum tuned example when clipping was fixed. cc @eugenevinitsky
Related issue number
#2233
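To illustrate why the vf clip param is sensitive to reward scale, here is a minimal sketch of the common PPO-style clipped value loss. It is an assumption that this matches the form used here; the function and variable names (`clipped_vf_loss`, `vf_clip_param`, `old_value`) are illustrative and not taken from the RLlib source. The numbers mimic Pendulum-scale returns of around -900.

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def clipped_vf_loss(value, old_value, target, vf_clip_param):
    """PPO-style clipped value loss for one sample (illustrative sketch).

    value:      current value-function prediction
    old_value:  prediction made when the sample was collected
    target:     empirical return the value function should fit
    """
    unclipped = (value - target) ** 2
    # The new prediction may move at most vf_clip_param away from the old one.
    clipped_value = old_value + clamp(
        value - old_value, -vf_clip_param, vf_clip_param
    )
    clipped = (clipped_value - target) ** 2
    # Taking the max means the clipped term wins once the prediction has
    # moved past the clip range towards the target.
    return max(unclipped, clipped)


# Pendulum-like scale: the return is far from the old prediction.
target, old_value = -900.0, 0.0

for clip in (10.0, 1000.0):
    # Two candidate predictions, both moving towards the target.
    losses = [clipped_vf_loss(v, old_value, target, clip) for v in (-50.0, -60.0)]
    print(f"vf_clip_param={clip}: losses={losses}")
```

With the small clip value, both candidate predictions give the same loss, so the loss no longer rewards moving the prediction closer to the target and the value function can only creep towards large returns. With a clip value on the scale of the returns, the loss decreases as the prediction improves.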