Wyatt-Huang
committed on
Update README.md
README.md CHANGED
@@ -1,8 +1,9 @@
 ---
-license:
+license: mit
 tags:
 - policy representation
 - diffusion
+- reinforcement learning
 ---
 ## Policy Representation via Diffusion Probability Model for Reinforcement Learning
 
@@ -43,9 +44,9 @@ Hyperparameters for DIPO have been shown as follow for easily reproducing our re
 | No. of hidden nodes | 256 | 256 | 256 | 256 |
 | Activation | mish | relu | relu | tanh |
 | Batch size | 256 | 256 | 256 | 256 |
-| Discount for reward
+| Discount for reward $\gamma$ | 0.99 | 0.99 | 0.99 | 0.99 |
 | Target smoothing coefficient $\tau$ | 0.005 | 0.005 | 0.005 | 0.005 |
-| Learning rate for actor |
+| Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Actor Critic grad norm | 2 | N/A | N/A | 0.5 |
 | Memeroy size | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ |
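Net effect of the commit: the front matter gains a `license: mit` field and a `reinforcement learning` tag, and the two previously truncated rows of the DIPO hyperparameter table are completed with the reward discount $\gamma = 0.99$ and the actor learning rate. Reconstructed from the hunks above, the README metadata block after this change reads:

```yaml
---
license: mit
tags:
- policy representation
- diffusion
- reinforcement learning
---
```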