Space: acozma/CS581-Algos-Demo

Andrei Cozma committed on Apr 22, 2023

Commit 6e58943 (1 parent: 45dcb54)

Commit message: Updates

Files changed (3):

README.md +30 -15
assets/gradio_demo.png +0 -0
demo.py +1 -1

README.md CHANGED

Old version (lines removed by this commit are prefixed with -):
@@ -15,20 +15,34 @@ pinned: true
 Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
-## Requirements
-- Python 3
 - Gymnasium: <https://pypi.org/project/gymnasium/>
 - WandB: <https://pypi.org/project/wandb/> (for logging)
 - Gradio: <https://pypi.org/project/gradio/> (for demo web app)
-## Interactive Demo
-TODO
-## 2. Agents
-### Dynamic-Programming Agent
 TODO
@@ -38,7 +52,7 @@ TODO
 TODO
 ```
-### Monte-Carlo Agent
 This is the implementation of an On-Policy Monte-Carlo agent to solve several toy problems from the OpenAI Gymnasium.
@@ -68,29 +82,30 @@ python3 MonteCarloAgent.py --test policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.
 **MC Usage**
 ```bash
-usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--update_type {first_visit,every_visit}]
-                          [--save_dir SAVE_DIR] [--no_save] [--gamma GAMMA] [--epsilon EPSILON] [--env ENV] [--render_mode RENDER_MODE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP]
-                          [--wandb_job_type WANDB_JOB_TYPE] [--wandb_run_name_suffix WANDB_RUN_NAME_SUFFIX]
 options:
   -h, --help            show this help message and exit
   --train               Use this flag to train the agent.
   --test TEST           Use this flag to test the agent. Provide the path to the policy file.
   --n_train_episodes N_TRAIN_EPISODES
-                        The number of episodes to train for. (default: 2000)
   --n_test_episodes N_TEST_EPISODES
                         The number of episodes to test for. (default: 100)
   --test_every TEST_EVERY
                         During training, test the agent every n episodes. (default: 100)
   --max_steps MAX_STEPS
-                        The maximum number of steps per episode before the episode is forced to end. (default: 500)
   --update_type {first_visit,every_visit}
                         The type of update to use. (default: first_visit)
   --save_dir SAVE_DIR   The directory to save the policy to. (default: policies)
   --no_save             Use this flag to disable saving the policy.
-  --gamma GAMMA         The value for the discount factor to use. (default: 0.99)
-  --epsilon EPSILON     The value for the epsilon-greedy policy to use. (default: 0.1)
-  --env ENV             The Gymnasium environment to use. (default: CliffWalking-v0)
   --render_mode RENDER_MODE
                         Render mode passed to the gym.make() function. Use 'human' to render the environment. (default: None)
   --wandb_project WANDB_PROJECT

New version (lines added by this commit are prefixed with +):

 Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
+# 1. Requirements
+Python 3.6+ with the following major dependencies:
 - Gymnasium: <https://pypi.org/project/gymnasium/>
 - WandB: <https://pypi.org/project/wandb/> (for logging)
 - Gradio: <https://pypi.org/project/gradio/> (for demo web app)
+Install all the dependencies using `pip`:
+```bash
+❯ pip3 install -r requirements.txt
+```
+# 2. Interactive Demo
+Launch the Gradio demo web app:
+```bash
+❯ python3 demo.py
+Running on local URL:  http://127.0.0.1:7860
+```
+<img src="./assets/gradio_demo.png"  height="600" />
+# 2. Agents
+## Dynamic-Programming Agent
 TODO
 TODO
 ```
+## Monte-Carlo Agent
 This is the implementation of an On-Policy Monte-Carlo agent to solve several toy problems from the OpenAI Gymnasium.
 **MC Usage**
 ```bash
+usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--update_type {first_visit,every_visit}] [--save_dir SAVE_DIR] [--no_save]
+                          [--gamma GAMMA] [--epsilon EPSILON] [--env {CliffWalking-v0,FrozenLake-v1,Taxi-v3}] [--render_mode RENDER_MODE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP] [--wandb_job_type WANDB_JOB_TYPE]
+                          [--wandb_run_name_suffix WANDB_RUN_NAME_SUFFIX]
 options:
   -h, --help            show this help message and exit
   --train               Use this flag to train the agent.
   --test TEST           Use this flag to test the agent. Provide the path to the policy file.
   --n_train_episodes N_TRAIN_EPISODES
+                        The number of episodes to train for. (default: 2500)
   --n_test_episodes N_TEST_EPISODES
                         The number of episodes to test for. (default: 100)
   --test_every TEST_EVERY
                         During training, test the agent every n episodes. (default: 100)
   --max_steps MAX_STEPS
+                        The maximum number of steps per episode before the episode is forced to end. (default: 200)
   --update_type {first_visit,every_visit}
                         The type of update to use. (default: first_visit)
   --save_dir SAVE_DIR   The directory to save the policy to. (default: policies)
   --no_save             Use this flag to disable saving the policy.
+  --gamma GAMMA         The value for the discount factor to use. (default: 1.0)
+  --epsilon EPSILON     The value for the epsilon-greedy policy to use. (default: 0.4)
+  --env {CliffWalking-v0,FrozenLake-v1,Taxi-v3}
+                        The Gymnasium environment to use. (default: CliffWalking-v0)
   --render_mode RENDER_MODE
                         Render mode passed to the gym.make() function. Use 'human' to render the environment. (default: None)
   --wandb_project WANDB_PROJECT
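
The updated usage text above documents the script's flags, choices and new defaults. As a minimal sketch, assuming MonteCarloAgent.py builds its CLI with Python's standard `argparse` (the diff shows only the help output, not the parser code), those options could be declared as follows. `build_parser` is a hypothetical name, and the `--wandb_*` options are left out because their help text is cut off in this view.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the MonteCarloAgent.py CLI from its README help text."""
    p = argparse.ArgumentParser(prog="MonteCarloAgent.py")
    p.add_argument("--train", action="store_true",
                   help="Use this flag to train the agent.")
    p.add_argument("--test", type=str, default=None,
                   help="Use this flag to test the agent. Provide the path to the policy file.")
    # argparse expands %(default)s, producing the "(default: ...)" suffix in --help.
    p.add_argument("--n_train_episodes", type=int, default=2500,
                   help="The number of episodes to train for. (default: %(default)s)")
    p.add_argument("--n_test_episodes", type=int, default=100,
                   help="The number of episodes to test for. (default: %(default)s)")
    p.add_argument("--test_every", type=int, default=100,
                   help="During training, test the agent every n episodes. (default: %(default)s)")
    p.add_argument("--max_steps", type=int, default=200,
                   help="The maximum number of steps per episode before the episode "
                        "is forced to end. (default: %(default)s)")
    p.add_argument("--update_type", choices=["first_visit", "every_visit"],
                   default="first_visit",
                   help="The type of update to use. (default: %(default)s)")
    p.add_argument("--save_dir", type=str, default="policies",
                   help="The directory to save the policy to. (default: %(default)s)")
    p.add_argument("--no_save", action="store_true",
                   help="Use this flag to disable saving the policy.")
    p.add_argument("--gamma", type=float, default=1.0,
                   help="The value for the discount factor to use. (default: %(default)s)")
    p.add_argument("--epsilon", type=float, default=0.4,
                   help="The value for the epsilon-greedy policy to use. (default: %(default)s)")
    p.add_argument("--env", choices=["CliffWalking-v0", "FrozenLake-v1", "Taxi-v3"],
                   default="CliffWalking-v0",
                   help="The Gymnasium environment to use. (default: %(default)s)")
    p.add_argument("--render_mode", type=str, default=None,
                   help="Render mode passed to the gym.make() function. "
                        "Use 'human' to render the environment. (default: %(default)s)")
    return p


if __name__ == "__main__":
    # Parse an explicit argument list so the example does not depend on sys.argv.
    args = build_parser().parse_args(["--train", "--env", "Taxi-v3"])
    print(args.env, args.n_train_episodes, args.gamma, args.epsilon)
```

Declaring `--env` with `choices` means argparse rejects an unsupported environment name with a usage error before anything reaches `gym.make()`.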

assets/gradio_demo.png ADDED

demo.py CHANGED

Old version (line removed by this commit is prefixed with -):
@@ -298,7 +298,7 @@ def run(policy_fname, n_test_episodes, max_steps, render_fps, epsilon):
 with gr.Blocks(title="CS581 Demo") as demo:
     gr.components.HTML(
-        "<h1>CS581 Final Project Demo - Reinforcement Learning: From Dynamic Programming to Monte-Carlo</h1>"
     )
     gr.components.HTML("<h2>Select Configuration:</h2>")

New version (line added by this commit is prefixed with +):

 with gr.Blocks(title="CS581 Demo") as demo:
     gr.components.HTML(
+        "<h1>CS581 Final Project Demo - Dynamic Programming & Monte-Carlo RL Methods</h1>"
     )
     gr.components.HTML("<h2>Select Configuration:</h2>")
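
The README in this commit describes an on-policy Monte-Carlo agent with `--update_type {first_visit,every_visit}`, an epsilon-greedy policy (`--epsilon`) and a discount factor (`--gamma`). The following is a minimal sketch of tabular on-policy Monte-Carlo control with both update types, not the code in MonteCarloAgent.py. It uses the commit's new defaults (2500 episodes, 200 max steps, gamma 1.0, epsilon 0.4). To stay self-contained, it replaces the Gymnasium environment with a hypothetical 4-state corridor. `corridor_step`, `epsilon_greedy`, `generate_episode`, `mc_update` and `train` are illustrative names.

```python
import random
from collections import defaultdict

# Hypothetical toy corridor standing in for a Gymnasium environment:
# states 0..3, action 0 = left, 1 = right, reward -1 per step, state 3 is terminal.
N_ACTIONS = 2
GOAL = 3


def corridor_step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, -1.0, nxt == GOAL


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def generate_episode(Q, epsilon, max_steps, rng):
    """Roll out one episode with the current epsilon-greedy policy (on-policy)."""
    episode, state = [], 0
    for _ in range(max_steps):
        action = epsilon_greedy(Q[state], epsilon, rng)
        nxt, reward, done = corridor_step(state, action)
        episode.append((state, action, reward))
        state = nxt
        if done:
            break
    return episode


def mc_update(Q, N, episode, gamma, update_type="first_visit"):
    """Average sampled returns G into Q[s][a] in place."""
    first_t = {}
    for t, (s, a, _) in enumerate(episode):
        first_t.setdefault((s, a), t)  # time step of the first visit to (s, a)
    G = 0.0
    for t in range(len(episode) - 1, -1, -1):  # walk backwards to build returns
        s, a, r = episode[t]
        G = gamma * G + r
        if update_type == "first_visit" and first_t[(s, a)] != t:
            continue  # first-visit: ignore later visits to the same pair
        N[s][a] += 1
        Q[s][a] += (G - Q[s][a]) / N[s][a]  # incremental sample mean


def train(n_episodes=2500, max_steps=200, gamma=1.0, epsilon=0.4,
          update_type="first_visit", seed=0):
    """On-policy MC control: alternate episode generation and Q updates."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0] * N_ACTIONS)
    N = defaultdict(lambda: [0] * N_ACTIONS)
    for _ in range(n_episodes):
        episode = generate_episode(Q, epsilon, max_steps, rng)
        mc_update(Q, N, episode, gamma, update_type)
    return Q


if __name__ == "__main__":
    Q = train()
    greedy = {s: max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)}
    print(greedy)
```

With `first_visit`, each (state, action) pair contributes at most one return per episode. With `every_visit`, every occurrence contributes one, so pairs revisited within an episode are updated more often. Both variants keep only a visit count N and use the incremental mean, so past returns need not be stored.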