Wyatt-Huang
committed on
Update README.md
README.md
CHANGED
@@ -45,7 +45,7 @@ Hyperparameters for DIPO have been shown as follow for easily reproducing our re
 | Batch size | 256 | 256 | 256 | 256 |
 | Discount for reward $$\gamma$$ | 0.99 | 0.99 | 0.99 | 0.99 |
 | Target smoothing coefficient $\tau$ | 0.005 | 0.005 | 0.005 | 0.005 |
-| Learning rate for actor |
+| Learning rate for actor | $$3 × 10^{-4}$$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Actor Critic grad norm | 2 | N/A | N/A | 0.5 |
 | Memeroy size | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ |
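
For readers reproducing the setup, the rows in this hunk whose values are identical across all four columns could be collected into a config along the lines below. This is only a sketch: the dictionary keys and the `soft_update` helper are hypothetical names, not taken from the DIPO code, and it assumes $\tau$ is used in the usual Polyak-averaging form for target networks. The column headers are outside this hunk, so the per-column values (actor learning rate, grad norm) are left out.

```python
# Values copied from the table rows in this hunk; key names are hypothetical.
SHARED_HPARAMS = {
    "batch_size": 256,
    "gamma": 0.99,             # discount for reward
    "tau": 0.005,              # target smoothing coefficient
    "memory_size": 1_000_000,  # replay memory size, 1 x 10^6
}


def soft_update(target, online, tau):
    """Assumed Polyak update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# Toy parameters, only to show how tau blends the two sets of weights.
target_params = [0.0, 1.0]
online_params = [1.0, 1.0]
target_params = soft_update(target_params, online_params, SHARED_HPARAMS["tau"])
```

With $\tau = 0.005$, each update moves the target parameters only 0.5% of the way toward the online parameters, which is why the target network changes slowly.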