Antonio Serrano Muñoz committed
Commit 488b1a8
1 Parent(s): c7eb3be

Add README

Files changed (1): README.md (+69 −0)

README.md ADDED
---
library_name: skrl
tags:
- deep-reinforcement-learning
- reinforcement-learning
- skrl
model-index:
- name: PPO
  results:
  - metrics:
    - type: mean_reward
      value: 2383.55 +/- 449.01
      name: Total reward (mean)
    task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: OmniIsaacGymEnvs-FrankaCabinet
      type: OmniIsaacGymEnvs-FrankaCabinet
---

# OmniIsaacGymEnvs-FrankaCabinet-PPO

Trained agent model for the [NVIDIA Omniverse Isaac Gym](https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs) environment

- **Task:** FrankaCabinet
- **Agent:** [PPO](https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html)

# Usage (with skrl)

```python
from skrl.utils.huggingface import download_model_from_huggingface

# assuming that there is an agent named `agent`
path = download_model_from_huggingface("skrl/OmniIsaacGymEnvs-FrankaCabinet-PPO")
agent.load(path)
```
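
The snippet above assumes a PPO agent named `agent` has already been instantiated. As a rough, non-authoritative sketch of the surrounding evaluation flow (the agent construction with its policy/value models follows the standard skrl examples for OmniIsaacGymEnvs and is omitted here; the import paths match the skrl documentation version linked above, and the evaluation length is an arbitrary illustration value):

```python
# Hypothetical evaluation outline (a sketch, not the exact script behind this card);
# `agent` is assumed to be a PPO agent built beforehand with its policy/value models
from skrl.envs.torch import load_omniverse_isaacgym_env, wrap_env
from skrl.trainers.torch import SequentialTrainer
from skrl.utils.huggingface import download_model_from_huggingface

# load the FrankaCabinet task from OmniIsaacGymEnvs and wrap it for skrl
env = wrap_env(load_omniverse_isaacgym_env(task_name="FrankaCabinet"))

# download the checkpoint from the Hugging Face Hub and load it into the agent
path = download_model_from_huggingface("skrl/OmniIsaacGymEnvs-FrankaCabinet-PPO")
agent.load(path)

# run a fixed-length evaluation (number of timesteps chosen only for illustration)
trainer = SequentialTrainer(cfg={"timesteps": 1600, "headless": True}, env=env, agents=agent)
trainer.eval()
```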

# Hyperparameters

```python
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html#configuration-and-hyperparameters
from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.preprocessors.torch import RunningStandardScaler
from skrl.resources.schedulers.torch import KLAdaptiveRL

# `env` (the wrapped environment) and `device` are assumed to be defined beforehand
cfg_ppo = PPO_DEFAULT_CONFIG.copy()
cfg_ppo["rollouts"] = 16  # memory_size
cfg_ppo["learning_epochs"] = 8
cfg_ppo["mini_batches"] = 8  # 16 * 4096 / 8192
cfg_ppo["discount_factor"] = 0.99
cfg_ppo["lambda"] = 0.95
cfg_ppo["learning_rate"] = 5e-4
cfg_ppo["learning_rate_scheduler"] = KLAdaptiveRL
cfg_ppo["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg_ppo["random_timesteps"] = 0
cfg_ppo["learning_starts"] = 0
cfg_ppo["grad_norm_clip"] = 1.0
cfg_ppo["ratio_clip"] = 0.2
cfg_ppo["value_clip"] = 0.2
cfg_ppo["clip_predicted_values"] = True
cfg_ppo["entropy_loss_scale"] = 0.0
cfg_ppo["value_loss_scale"] = 2.0
cfg_ppo["kl_threshold"] = 0
cfg_ppo["rewards_shaper"] = lambda rewards, timestep, timesteps: rewards * 0.01
cfg_ppo["state_preprocessor"] = RunningStandardScaler
cfg_ppo["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
cfg_ppo["value_preprocessor"] = RunningStandardScaler
cfg_ppo["value_preprocessor_kwargs"] = {"size": 1, "device": device}
# logging to TensorBoard and writing checkpoints
cfg_ppo["experiment"]["write_interval"] = 120
cfg_ppo["experiment"]["checkpoint_interval"] = 1200
```
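
For reference, this configuration dictionary is what gets passed as the `cfg` argument of skrl's PPO agent. The instantiation below is only an illustrative sketch: `models`, `memory`, `env` and `device` are not part of this card and are assumed to come from the usual skrl setup (policy/value models and a rollout memory).

```python
# Hypothetical wiring of the configuration above into skrl's PPO agent;
# `models`, `memory`, `env` and `device` are assumed to be defined elsewhere
from skrl.agents.torch.ppo import PPO

agent = PPO(models=models,                        # {"policy": ..., "value": ...}
            memory=memory,                        # e.g. a rollout memory of size cfg_ppo["rollouts"]
            cfg=cfg_ppo,
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=device)
```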