Wyatt-Huang
committed on
Update README.md
README.md CHANGED
@@ -1,8 +1,9 @@
 ---
-license:
+license: mit
 tags:
 - policy representation
 - diffusion
+- reinforcement learning
 ---
 ## Policy Representation via Diffusion Probability Model for Reinforcement Learning
 
@@ -43,9 +44,9 @@ Hyperparameters for DIPO have been shown as follow for easily reproducing our re
 | No. of hidden nodes | 256 | 256 | 256 | 256 |
 | Activation | mish | relu | relu | tanh |
 | Batch size | 256 | 256 | 256 | 256 |
-| Discount for reward
+| Discount for reward $\gamma$ | 0.99 | 0.99 | 0.99 | 0.99 |
 | Target smoothing coefficient $\tau$ | 0.005 | 0.005 | 0.005 | 0.005 |
-| Learning rate for actor |
+| Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Actor Critic grad norm | 2 | N/A | N/A | 0.5 |
 | Memeroy size | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ |
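Net effect of the commit: the front matter gains a `license: mit` field and a `reinforcement learning` tag, and the two previously truncated rows of the DIPO hyperparameter table are completed with the reward discount $\gamma = 0.99$ and the actor learning rate. Reconstructed from the hunks above, the README metadata block after this change reads:

```yaml
---
license: mit
tags:
- policy representation
- diffusion
- reinforcement learning
---
```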