Wyatt-Huang committed (verified)
Commit 9244b1e · 1 Parent(s): 6c3ec60

Update README.md

Files changed (1):
  1. README.md (+4 -3)
README.md CHANGED
@@ -1,8 +1,9 @@
 ---
-license: openrail++
+license: mit
 tags:
 - policy representation
 - diffusion
+- reinforcement learning
 ---
 ## Policy Representation via Diffusion Probability Model for Reinforcement Learning

@@ -43,9 +44,9 @@ Hyperparameters for DIPO have been shown as follow for easily reproducing our re
 | No. of hidden nodes | 256 | 256 | 256 | 256 |
 | Activation | mish | relu | relu | tanh |
 | Batch size | 256 | 256 | 256 | 256 |
-| Discount for reward $$\gamma$$ | 0.99 | 0.99 | 0.99 | 0.99 |
+| Discount for reward $\gamma$ | 0.99 | 0.99 | 0.99 | 0.99 |
 | Target smoothing coefficient $\tau$ | 0.005 | 0.005 | 0.005 | 0.005 |
-| Learning rate for actor | $$3 × 10^{-4}$$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
+| Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Actor Critic grad norm | 2 | N/A | N/A | 0.5 |
 | Memeroy size | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ |
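For reference, the hyperparameter table in this diff can be read as one configuration per column; a minimal sketch of the first column (mish activation, grad norm 2) as a Python dict is below. The key names, and the assumption that this column corresponds to DIPO, are illustrative only and not taken from the repository's actual config format.

```python
# Minimal sketch, assuming the first table column is the DIPO configuration.
# Key names are hypothetical; only the values come from the table above.
dipo_config = {
    "hidden_nodes": 256,       # No. of hidden nodes
    "activation": "mish",      # Activation
    "batch_size": 256,         # Batch size
    "gamma": 0.99,             # Discount for reward
    "tau": 0.005,              # Target smoothing coefficient
    "actor_lr": 3e-4,          # Learning rate for actor
    "grad_norm": 2.0,          # Actor Critic grad norm
    "memory_size": int(1e6),   # Replay memory size
}
```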