Commit 6e58943 (1 parent: 45dcb54), committed by Andrei Cozma

Updates

Files changed:
- README.md +30 -15
- assets/gradio_demo.png +0 -0
- demo.py +1 -1

README.md
CHANGED
@@ -15,20 +15,34 @@ pinned: true
 
 Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
 
-
+# 1. Requirements
+
+Python 3.6+ with the following major dependencies:
 
-- Python 3
 - Gymnasium: <https://pypi.org/project/gymnasium/>
 - WandB: <https://pypi.org/project/wandb/> (for logging)
 - Gradio: <https://pypi.org/project/gradio/> (for demo web app)
 
-
+Install all the dependencies using `pip`:
 
-
+```bash
+❯ pip3 install -r requirements.txt
+```
+
+# 2. Interactive Demo
+
+Launch the Gradio demo web app:
+
+```bash
+❯ python3 demo.py
+Running on local URL: http://127.0.0.1:7860
+```
+
+<img src="./assets/gradio_demo.png" height="600" />
 
-
+# 2. Agents
 
-
+## Dynamic-Programming Agent
 
 TODO
 
@@ -38,7 +52,7 @@ TODO
 TODO
 ```
 
-
+## Monte-Carlo Agent
 
 This is the implementation of an On-Policy Monte-Carlo agent to solve several toy problems from the OpenAI Gymnasium.
 
@@ -68,29 +82,30 @@ python3 MonteCarloAgent.py --test policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.
 **MC Usage**
 
 ```bash
-usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--update_type {first_visit,every_visit}]
-                          [--
-                          [--
+usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--update_type {first_visit,every_visit}] [--save_dir SAVE_DIR] [--no_save]
+                          [--gamma GAMMA] [--epsilon EPSILON] [--env {CliffWalking-v0,FrozenLake-v1,Taxi-v3}] [--render_mode RENDER_MODE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP] [--wandb_job_type WANDB_JOB_TYPE]
+                          [--wandb_run_name_suffix WANDB_RUN_NAME_SUFFIX]
 
 options:
   -h, --help            show this help message and exit
   --train               Use this flag to train the agent.
   --test TEST           Use this flag to test the agent. Provide the path to the policy file.
   --n_train_episodes N_TRAIN_EPISODES
-                        The number of episodes to train for. (default:
+                        The number of episodes to train for. (default: 2500)
   --n_test_episodes N_TEST_EPISODES
                         The number of episodes to test for. (default: 100)
   --test_every TEST_EVERY
                         During training, test the agent every n episodes. (default: 100)
   --max_steps MAX_STEPS
-                        The maximum number of steps per episode before the episode is forced to end. (default:
+                        The maximum number of steps per episode before the episode is forced to end. (default: 200)
   --update_type {first_visit,every_visit}
                         The type of update to use. (default: first_visit)
   --save_dir SAVE_DIR   The directory to save the policy to. (default: policies)
   --no_save             Use this flag to disable saving the policy.
-  --gamma GAMMA         The value for the discount factor to use. (default: 0
-  --epsilon EPSILON     The value for the epsilon-greedy policy to use. (default: 0.
-  --env
+  --gamma GAMMA         The value for the discount factor to use. (default: 1.0)
+  --epsilon EPSILON     The value for the epsilon-greedy policy to use. (default: 0.4)
+  --env {CliffWalking-v0,FrozenLake-v1,Taxi-v3}
+                        The Gymnasium environment to use. (default: CliffWalking-v0)
   --render_mode RENDER_MODE
                         Render mode passed to the gym.make() function. Use 'human' to render the environment. (default: None)
   --wandb_project WANDB_PROJECT

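For readers who want to experiment with the command-line interface documented in the updated help text, the following is a minimal sketch of an `argparse` parser that accepts the same flags, choices, and defaults shown on the new side of the diff. It is not the code from `MonteCarloAgent.py`: the function name `build_parser`, the argument types (`int`, `float`, `str`), and the decision to leave out the `--wandb_*` options (whose help text is cut off in the diff) are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed in the README help text."""
    parser = argparse.ArgumentParser(prog="MonteCarloAgent.py")
    parser.add_argument("--train", action="store_true",
                        help="Use this flag to train the agent.")
    parser.add_argument("--test", type=str, default=None,
                        help="Use this flag to test the agent. Provide the path to the policy file.")
    # Defaults below are copied from the new side of the diff; types are assumed.
    parser.add_argument("--n_train_episodes", type=int, default=2500)
    parser.add_argument("--n_test_episodes", type=int, default=100)
    parser.add_argument("--test_every", type=int, default=100)
    parser.add_argument("--max_steps", type=int, default=200)
    parser.add_argument("--update_type", type=str, default="first_visit",
                        choices=["first_visit", "every_visit"])
    parser.add_argument("--save_dir", type=str, default="policies")
    parser.add_argument("--no_save", action="store_true",
                        help="Use this flag to disable saving the policy.")
    parser.add_argument("--gamma", type=float, default=1.0)
    parser.add_argument("--epsilon", type=float, default=0.4)
    parser.add_argument("--env", type=str, default="CliffWalking-v0",
                        choices=["CliffWalking-v0", "FrozenLake-v1", "Taxi-v3"])
    parser.add_argument("--render_mode", type=str, default=None)
    # The --wandb_* options are omitted: their defaults are not visible in the diff.
    return parser


# Parsing an empty argument list yields the documented defaults.
default_args = build_parser().parse_args([])

# Overriding a few options, as one might on the command line.
example_args = build_parser().parse_args(
    ["--train", "--env", "Taxi-v3", "--epsilon", "0.2"]
)
```

Restricting `--env` and `--update_type` with `choices` makes `argparse` reject unsupported values before any training starts, which matches the brace-delimited lists in the usage line.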
assets/gradio_demo.png
ADDED
demo.py
CHANGED
@@ -298,7 +298,7 @@ def run(policy_fname, n_test_episodes, max_steps, render_fps, epsilon):
 
 with gr.Blocks(title="CS581 Demo") as demo:
     gr.components.HTML(
-        "<h1>CS581 Final Project Demo -
+        "<h1>CS581 Final Project Demo - Dynamic Programming & Monte-Carlo RL Methods</h1>"
     )
 
     gr.components.HTML("<h2>Select Configuration:</h2>")
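The README's help text exposes `--update_type {first_visit,every_visit}` and `--gamma` without showing what they change. As a rough illustration only, the sketch below computes the discounted return samples that a Monte Carlo method would average for each state-action pair under either update type. It is not taken from `MonteCarloAgent.py`; the function name `mc_return_samples` and the `(state, action, reward)` episode layout are assumptions.

```python
from collections import defaultdict


def mc_return_samples(episode, gamma=1.0, update_type="first_visit"):
    """Collect return samples per (state, action) pair from one episode.

    episode: list of (state, action, reward) tuples in time order, where
    reward is the reward received after taking `action` in `state`.
    """
    # Discounted return following each time step, accumulated backwards.
    returns = [0.0] * len(episode)
    g = 0.0
    for t in range(len(episode) - 1, -1, -1):
        g = episode[t][2] + gamma * g
        returns[t] = g

    samples = defaultdict(list)
    seen = set()
    for t, (state, action, _reward) in enumerate(episode):
        # First-visit keeps only the earliest occurrence of each pair;
        # every-visit keeps a sample for every occurrence.
        if update_type == "first_visit" and (state, action) in seen:
            continue
        seen.add((state, action))
        samples[(state, action)].append(returns[t])
    return dict(samples)


# A toy three-step episode that revisits ("s0", "right").
toy_episode = [("s0", "right", -1.0), ("s1", "left", -1.0), ("s0", "right", -1.0)]
first = mc_return_samples(toy_episode, gamma=1.0, update_type="first_visit")
every = mc_return_samples(toy_episode, gamma=1.0, update_type="every_visit")
```

With `gamma=1.0`, `first` holds one sample of `-3.0` for `("s0", "right")`, while `every` holds both `-3.0` and `-1.0` for that pair. An agent would then average these samples into its action-value estimates.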