Andrei Cozma committed on
Commit 6e58943 · 1 Parent(s): 45dcb54
Files changed (3)
  1. README.md +30 -15
  2. assets/gradio_demo.png +0 -0
  3. demo.py +1 -1
README.md CHANGED
@@ -15,20 +15,34 @@ pinned: true
15
 
16
  Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
17
 
18
- ## Requirements
 
 
19
 
20
- - Python 3
21
  - Gymnasium: <https://pypi.org/project/gymnasium/>
22
  - WandB: <https://pypi.org/project/wandb/> (for logging)
23
  - Gradio: <https://pypi.org/project/gradio/> (for demo web app)
24
 
25
- ## Interactive Demo
26
 
27
- TODO
 
 
 
 
 
 
 
 
 
 
 
 
 
28
 
29
- ## 2. Agents
30
 
31
- ### Dynamic-Programming Agent
32
 
33
  TODO
34
 
@@ -38,7 +52,7 @@ TODO
38
  TODO
39
  ```
40
 
41
- ### Monte-Carlo Agent
42
 
43
  This is the implementation of an On-Policy Monte-Carlo agent to solve several toy problems from the OpenAI Gymnasium.
44
 
@@ -68,29 +82,30 @@ python3 MonteCarloAgent.py --test policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.
68
  **MC Usage**
69
 
70
  ```bash
71
- usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--update_type {first_visit,every_visit}]
72
- [--save_dir SAVE_DIR] [--no_save] [--gamma GAMMA] [--epsilon EPSILON] [--env ENV] [--render_mode RENDER_MODE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP]
73
- [--wandb_job_type WANDB_JOB_TYPE] [--wandb_run_name_suffix WANDB_RUN_NAME_SUFFIX]
74
 
75
  options:
76
  -h, --help show this help message and exit
77
  --train Use this flag to train the agent.
78
  --test TEST Use this flag to test the agent. Provide the path to the policy file.
79
  --n_train_episodes N_TRAIN_EPISODES
80
- The number of episodes to train for. (default: 2000)
81
  --n_test_episodes N_TEST_EPISODES
82
  The number of episodes to test for. (default: 100)
83
  --test_every TEST_EVERY
84
  During training, test the agent every n episodes. (default: 100)
85
  --max_steps MAX_STEPS
86
- The maximum number of steps per episode before the episode is forced to end. (default: 500)
87
  --update_type {first_visit,every_visit}
88
  The type of update to use. (default: first_visit)
89
  --save_dir SAVE_DIR The directory to save the policy to. (default: policies)
90
  --no_save Use this flag to disable saving the policy.
91
- --gamma GAMMA The value for the discount factor to use. (default: 0.99)
92
- --epsilon EPSILON The value for the epsilon-greedy policy to use. (default: 0.1)
93
- --env ENV The Gymnasium environment to use. (default: CliffWalking-v0)
 
94
  --render_mode RENDER_MODE
95
  Render mode passed to the gym.make() function. Use 'human' to render the environment. (default: None)
96
  --wandb_project WANDB_PROJECT
 
15
 
16
  Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
17
 
18
+ # 1. Requirements
19
+
20
+ Python 3.6+ with the following major dependencies:
21
 
 
22
  - Gymnasium: <https://pypi.org/project/gymnasium/>
23
  - WandB: <https://pypi.org/project/wandb/> (for logging)
24
  - Gradio: <https://pypi.org/project/gradio/> (for demo web app)
25
 
26
+ Install all the dependencies using `pip`:
27
 
28
+ ```bash
29
+ ❯ pip3 install -r requirements.txt
30
+ ```
31
+
32
+ # 2. Interactive Demo
33
+
34
+ Launch the Gradio demo web app:
35
+
36
+ ```bash
37
+ ❯ python3 demo.py
38
+ Running on local URL: http://127.0.0.1:7860
39
+ ```
40
+
41
+ <img src="./assets/gradio_demo.png" height="600" />
42
 
43
+ # 2. Agents
44
 
45
+ ## Dynamic-Programming Agent
46
 
47
  TODO
48
 
 
52
  TODO
53
  ```
54
 
55
+ ## Monte-Carlo Agent
56
 
57
  This is the implementation of an On-Policy Monte-Carlo agent to solve several toy problems from the OpenAI Gymnasium.
58
 
 
82
  **MC Usage**
83
 
84
  ```bash
85
+ usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--update_type {first_visit,every_visit}] [--save_dir SAVE_DIR] [--no_save]
86
+ [--gamma GAMMA] [--epsilon EPSILON] [--env {CliffWalking-v0,FrozenLake-v1,Taxi-v3}] [--render_mode RENDER_MODE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP] [--wandb_job_type WANDB_JOB_TYPE]
87
+ [--wandb_run_name_suffix WANDB_RUN_NAME_SUFFIX]
88
 
89
  options:
90
  -h, --help show this help message and exit
91
  --train Use this flag to train the agent.
92
  --test TEST Use this flag to test the agent. Provide the path to the policy file.
93
  --n_train_episodes N_TRAIN_EPISODES
94
+ The number of episodes to train for. (default: 2500)
95
  --n_test_episodes N_TEST_EPISODES
96
  The number of episodes to test for. (default: 100)
97
  --test_every TEST_EVERY
98
  During training, test the agent every n episodes. (default: 100)
99
  --max_steps MAX_STEPS
100
+ The maximum number of steps per episode before the episode is forced to end. (default: 200)
101
  --update_type {first_visit,every_visit}
102
  The type of update to use. (default: first_visit)
103
  --save_dir SAVE_DIR The directory to save the policy to. (default: policies)
104
  --no_save Use this flag to disable saving the policy.
105
+ --gamma GAMMA The value for the discount factor to use. (default: 1.0)
106
+ --epsilon EPSILON The value for the epsilon-greedy policy to use. (default: 0.4)
107
+ --env {CliffWalking-v0,FrozenLake-v1,Taxi-v3}
108
+ The Gymnasium environment to use. (default: CliffWalking-v0)
109
  --render_mode RENDER_MODE
110
  Render mode passed to the gym.make() function. Use 'human' to render the environment. (default: None)
111
  --wandb_project WANDB_PROJECT
assets/gradio_demo.png ADDED
demo.py CHANGED
@@ -298,7 +298,7 @@ def run(policy_fname, n_test_episodes, max_steps, render_fps, epsilon):
298
 
299
  with gr.Blocks(title="CS581 Demo") as demo:
300
  gr.components.HTML(
301
- "<h1>CS581 Final Project Demo - Reinforcement Learning: From Dynamic Programming to Monte-Carlo</h1>"
302
  )
303
 
304
  gr.components.HTML("<h2>Select Configuration:</h2>")
 
298
 
299
  with gr.Blocks(title="CS581 Demo") as demo:
300
  gr.components.HTML(
301
+ "<h1>CS581 Final Project Demo - Dynamic Programming & Monte-Carlo RL Methods</h1>"
302
  )
303
 
304
  gr.components.HTML("<h2>Select Configuration:</h2>")
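
Review note: the README hunk in this commit changes several CLI defaults (`--n_train_episodes` 2000 → 2500, `--max_steps` 500 → 200, `--gamma` 0.99 → 1.0, `--epsilon` 0.1 → 0.4) and restricts `--env` to three choices. The parser in `MonteCarloAgent.py` is not part of this diff, so the following is only a minimal sketch of an `argparse` setup that would be consistent with the new usage text. The function name `build_parser` is hypothetical, and the `--wandb_*` options are left out because their help text is truncated in the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the updated README usage text (not the project's actual code)."""
    parser = argparse.ArgumentParser(prog="MonteCarloAgent.py")
    parser.add_argument("--train", action="store_true",
                        help="Use this flag to train the agent.")
    parser.add_argument("--test", type=str, default=None,
                        help="Use this flag to test the agent. Provide the path to the policy file.")
    parser.add_argument("--n_train_episodes", type=int, default=2500,
                        help="The number of episodes to train for. (default: 2500)")
    parser.add_argument("--n_test_episodes", type=int, default=100,
                        help="The number of episodes to test for. (default: 100)")
    parser.add_argument("--test_every", type=int, default=100,
                        help="During training, test the agent every n episodes. (default: 100)")
    parser.add_argument("--max_steps", type=int, default=200,
                        help="The maximum number of steps per episode. (default: 200)")
    parser.add_argument("--update_type", type=str, default="first_visit",
                        choices=["first_visit", "every_visit"],
                        help="The type of update to use. (default: first_visit)")
    parser.add_argument("--save_dir", type=str, default="policies",
                        help="The directory to save the policy to. (default: policies)")
    parser.add_argument("--no_save", action="store_true",
                        help="Use this flag to disable saving the policy.")
    parser.add_argument("--gamma", type=float, default=1.0,
                        help="The value for the discount factor to use. (default: 1.0)")
    parser.add_argument("--epsilon", type=float, default=0.4,
                        help="The value for the epsilon-greedy policy to use. (default: 0.4)")
    # Restricting choices makes argparse reject unsupported environment names early.
    parser.add_argument("--env", type=str, default="CliffWalking-v0",
                        choices=["CliffWalking-v0", "FrozenLake-v1", "Taxi-v3"],
                        help="The Gymnasium environment to use. (default: CliffWalking-v0)")
    parser.add_argument("--render_mode", type=str, default=None,
                        help="Render mode passed to the gym.make() function. (default: None)")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--train", "--env", "Taxi-v3"])
print(vars(args))
```

With `choices` set on `--env`, an unsupported name such as `--env CartPole-v1` exits with a usage error instead of failing later inside `gym.make()`, which appears to be the intent of the `{CliffWalking-v0,FrozenLake-v1,Taxi-v3}` change in the usage line.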