Hard to train
#1 by CheN70 - opened
I think this one is really hard to train, as it may converge to a local optimum.
I concur.
I am trying to write A2C to train this. Despite considerable effort, its results do not beat vanilla REINFORCE.
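
For anyone comparing the two, here is a minimal sketch of what differs in the policy-gradient weighting, assuming a single finished episode and a critic that already gives per-step value estimates. The function names `discounted_returns` and `td_advantages` are hypothetical helpers for illustration, not code from this thread. REINFORCE weights each log-probability by the full discounted return, while one-step A2C weights it by a TD advantage, which has lower variance but is biased whenever the critic is inaccurate.

```python
def discounted_returns(rewards, gamma=0.99):
    """REINFORCE weights: discounted return G_t from each step to episode end."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def td_advantages(rewards, values, gamma=0.99, last_value=0.0):
    """One-step A2C weights: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` are the critic's estimates V(s_t) (assumed given here);
    `last_value` is V of the state after the final step (0.0 if terminal).
    """
    next_values = list(values[1:]) + [last_value]
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5]  # made-up critic outputs, for illustration only
    print(discounted_returns(rewards, gamma=0.5))
    print(td_advantages(rewards, values, gamma=0.5))
```

If the critic is poorly fitted, the advantages above can point the policy in the wrong direction, which may be one reason an A2C implementation fails to beat plain REINFORCE on a hard task.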