Hard to train

by CheN70 - opened

I think this one is really hard to train, as it may converge to a local optimum.

I concur.
I am trying to write an A2C implementation to train this. Even with a lot of effort, its results do not beat vanilla REINFORCE.
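
For anyone comparing the two, here is a minimal sketch (not the code used in this thread) of the learning signal each method weights the log-probabilities with: full Monte Carlo returns for vanilla REINFORCE versus one-step advantages for a basic A2C. The function names and the `values` list (hypothetical critic estimates, one per visited state) are assumptions made for illustration only.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_t, the weight vanilla REINFORCE puts on log pi(a_t | s_t)."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def one_step_advantages(rewards, values, gamma=0.99):
    """One-step TD advantages r_t + gamma * V(s_{t+1}) - V(s_t), as in a basic A2C.

    `values` are hypothetical critic estimates V(s_t); the value after the
    final step is assumed to be 0 (the episode terminated).
    """
    advantages = []
    for t, r in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        advantages.append(r + gamma * next_value - values[t])
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(rewards, gamma=0.5))
    print(one_step_advantages(rewards, [0.0, 0.0, 0.0], gamma=0.5))
```

If the critic is poorly fit, the advantages are biased, which may be one reason an A2C run fails to beat REINFORCE on a hard task.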
