JaiSurya committed on
Commit 196d954 · 1 Parent(s): 3ec724c

Update README.md

Files changed (1)
  1. README.md +39 -6
README.md CHANGED
@@ -23,15 +23,48 @@ model-index:
  
  # **PPO** Agent playing **LunarLander-v2**
  This is a trained model of a **PPO** agent playing **LunarLander-v2**
- using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
+
+ This model is trained with the help of [Deep RL Course by HuggingFace](https://huggingface.co/learn/deep-rl-course/unit0/introduction)
  
  ## Usage (with Stable-baselines3)
- TODO: Add your code
+ ```python
+ # necessary libraries
+ import gymnasium as gym
  
+ from huggingface_sb3 import load_from_hub, package_to_hub
+ from huggingface_hub import (
+     notebook_login,
+ )
  
- ```python
- from stable_baselines3 import ...
- from huggingface_sb3 import load_from_hub
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_vec_env
+ from stable_baselines3.common.evaluation import evaluate_policy
+ from stable_baselines3.common.monitor import Monitor
+
+ # Step 1 : Create an environment
+ env = gym.make("LunarLander-v2")
+ observation,info = env.reset() # initialize the environment
+
+ # Step 2 : Create the model
+ model = PPO(
+     policy = "MlpPolicy", # Multiple Layer Perceptron Policy
+     env = env,
+     n_steps = 1024,
+     batch_size = 64,
+     n_epochs = 5,
+     gamma = 0.995,
+     gae_lambda = 0.98,
+     ent_coef = 0.0001,
+     clip_range = 0.1,
+     verbose = 1
+ )
+
+ # Step 3 : Train the model
+ model.learn(total_timesteps=2500000,progress_bar = True)
  
- ...
+ # Step 4 : Evaluation
+ eval_env = Monitor(gym.make("LunarLander-v2"))
+ mean_reward,std_reward = evaluate_policy(model,eval_env,n_eval_episodes = 10 ,deterministic=True)
+ print(f"Mean reward : {mean_reward} +/- {std_reward}")
  ```
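
Note on the added hyperparameters: the arithmetic below is a minimal sketch of how `n_steps`, `batch_size`, `n_epochs` and `total_timesteps` from the `PPO(...)` call above translate into rollouts and gradient steps. It uses only the standard library and does not call stable-baselines3. The helper `ppo_schedule` and the value `n_envs = 1` are assumptions made for illustration (the README passes a single `gym.make` environment), and the rounding up to whole rollouts reflects how an on-policy loop usually collects data, not a guarantee about the library's internals.

```python
import math

# Hyperparameters copied from the PPO(...) call in the README diff above.
n_steps = 1024
batch_size = 64
n_epochs = 5
gamma = 0.995
total_timesteps = 2_500_000
n_envs = 1  # assumption: one environment, as created by gym.make(...)


def ppo_schedule(total_timesteps, n_steps, n_envs, batch_size, n_epochs):
    """Hypothetical helper: return (rollouts, minibatches_per_epoch, gradient_steps)."""
    rollout_size = n_steps * n_envs
    # Assumes data is collected in whole rollouts until the budget is reached.
    rollouts = math.ceil(total_timesteps / rollout_size)
    # Each epoch splits the rollout buffer into minibatches of batch_size.
    minibatches_per_epoch = rollout_size // batch_size
    gradient_steps = rollouts * n_epochs * minibatches_per_epoch
    return rollouts, minibatches_per_epoch, gradient_steps


rollouts, minibatches, gradient_steps = ppo_schedule(
    total_timesteps, n_steps, n_envs, batch_size, n_epochs
)

# Rough horizon over which discounted rewards still matter: 1 / (1 - gamma).
effective_horizon = 1 / (1 - gamma)

print(f"rollouts: {rollouts}")
print(f"minibatches per epoch: {minibatches}")
print(f"gradient steps: {gradient_steps}")
print(f"effective horizon: about {effective_horizon:.0f} steps")
```

Under these assumptions the run collects 2442 rollouts of 1024 steps, each split into 16 minibatches and reused for 5 epochs, which gives 80 gradient steps per rollout. A `gamma` of 0.995 corresponds to a horizon of roughly 200 steps.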