Commit cb18290
Andrei Cozma committed · 1 Parent(s): fe4182f

Updates
Files changed:
- MonteCarloAgent.py (+10 -10)
- README.md (+57 -0)
- policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy (+0 -0)
MonteCarloAgent.py CHANGED

@@ -178,7 +178,7 @@ def main():
     parser.add_argument(
         "--train",
         action="store_true",
-        help="Use this flag to train the agent.
+        help="Use this flag to train the agent.",
     )
     parser.add_argument(
         "--test",
@@ -190,26 +190,26 @@ def main():
         "--n_train_episodes",
         type=int,
         default=2000,
-        help="The number of episodes to train for.",
+        help="The number of episodes to train for. (default: 2000)",
     )
     parser.add_argument(
         "--n_test_episodes",
         type=int,
         default=100,
-        help="The number of episodes to test for.",
+        help="The number of episodes to test for. (default: 100)",
     )
     parser.add_argument(
         "--test_every",
         type=int,
         default=100,
-        help="During training, test the agent every n episodes.",
+        help="During training, test the agent every n episodes. (default: 100)",
     )
 
     parser.add_argument(
         "--max_steps",
         type=int,
         default=500,
-        help="The maximum number of steps per episode before the episode is forced to end.",
+        help="The maximum number of steps per episode before the episode is forced to end. (default: 500)",
     )
 
     ### Agent parameters
@@ -217,13 +217,13 @@ def main():
         "--gamma",
         type=float,
         default=0.99,
-        help="The value for the discount factor to use.",
+        help="The value for the discount factor to use. (default: 0.99)",
     )
     parser.add_argument(
         "--epsilon",
         type=float,
         default=0.1,
-        help="The value for the epsilon-greedy policy to use.",
+        help="The value for the epsilon-greedy policy to use. (default: 0.1)",
     )
 
     ### Environment parameters
@@ -231,19 +231,19 @@ def main():
         "--env",
         type=str,
         default="CliffWalking-v0",
-        help="The Gymnasium environment to use.",
+        help="The Gymnasium environment to use. (default: CliffWalking-v0)",
     )
     parser.add_argument(
         "--render_mode",
         type=str,
         default=None,
-        help="
+        help="Render mode passed to the gym.make() function. Use 'human' to render the environment. (default: None)",
     )
     parser.add_argument(
         "--wandb_project",
         type=str,
         default=None,
-        help="WandB project name for logging. If not provided, no logging is done.",
+        help="WandB project name for logging. If not provided, no logging is done. (default: None)",
     )
     parser.add_argument(
         "--wandb_group",
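The commit writes the "(default: ...)" suffix into each help string by hand. For reference, argparse can render defaults automatically via `argparse.ArgumentDefaultsHelpFormatter`; the sketch below is a minimal illustration of that alternative, not code from this repository.

```python
# Minimal sketch (not this repository's code): argparse can append the
# "(default: ...)" suffix automatically via ArgumentDefaultsHelpFormatter,
# instead of hand-editing every help string as this commit does.
import argparse

parser = argparse.ArgumentParser(
    formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument(
    "--n_train_episodes", type=int, default=2000,
    help="The number of episodes to train for.",
)
parser.add_argument(
    "--gamma", type=float, default=0.99,
    help="The value for the discount factor to use.",
)
args = parser.parse_args()
# `--help` now shows e.g. "The number of episodes to train for. (default: 2000)"
```

The trade-off is that the formatter appends defaults to every option uniformly, whereas the hand-written suffixes in this commit let individual options omit or word the default differently.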
README.md CHANGED

@@ -4,6 +4,63 @@
 
 Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
 
+## Monte-Carlo Agent
+
+The implementation of the epsilon-greedy Monte-Carlo agent for the [Cliff Walking](https://gymnasium.farama.org/environments/toy_text/cliff_walking/) toy environment.
+
+### Training
+
+```bash
+python3 MonteCarloAgent.py --train
+```
+
+The final policy will be saved to a `.npy` file.
+
+### Testing
+
+Provide the path to the policy file as an argument to the `--test` flag.
+
+```bash
+python3 MonteCarloAgent.py --test policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy
+```
+
+### Visualization
+
+```bash
+python3 MonteCarloAgent.py --test policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy --render_mode human
+```
+
+### Default Parameters
+
+```python
+usage: MonteCarloAgent.py [-h] [--train] [--test TEST] [--n_train_episodes N_TRAIN_EPISODES] [--n_test_episodes N_TEST_EPISODES] [--test_every TEST_EVERY] [--max_steps MAX_STEPS] [--gamma GAMMA] [--epsilon EPSILON] [--env ENV]
+                          [--render_mode RENDER_MODE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP] [--wandb_job_type WANDB_JOB_TYPE]
+
+options:
+  -h, --help            show this help message and exit
+  --train               Use this flag to train the agent. (default: False)
+  --test TEST           Use this flag to test the agent. Provide the path to the policy file.
+  --n_train_episodes N_TRAIN_EPISODES
+                        The number of episodes to train for.
+  --n_test_episodes N_TEST_EPISODES
+                        The number of episodes to test for.
+  --test_every TEST_EVERY
+                        During training, test the agent every n episodes.
+  --max_steps MAX_STEPS
+                        The maximum number of steps per episode before the episode is forced to end.
+  --gamma GAMMA         The value for the discount factor to use.
+  --epsilon EPSILON     The value for the epsilon-greedy policy to use.
+  --env ENV             The Gymnasium environment to use.
+  --render_mode RENDER_MODE
+                        The render mode to use. By default, no rendering is done. To render the environment, set this to 'human'.
+  --wandb_project WANDB_PROJECT
+                        WandB project name for logging. If not provided, no logging is done.
+  --wandb_group WANDB_GROUP
+                        WandB group name for logging. (default: monte-carlo)
+  --wandb_job_type WANDB_JOB_TYPE
+                        WandB job type for logging. (default: train)
+```
+
 ## Presentation Guide (Text Version)
 
 1. Title Slide: list the title of your talk along with your name
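The README section above names an epsilon-greedy Monte-Carlo agent. As a rough illustration of what that combination means (not the repository's actual `MonteCarloAgent.py`; all identifiers and the every-visit update rule here are assumptions), a minimal sketch:

```python
# Illustrative sketch of the "epsilon-greedy Monte-Carlo" idea the README
# names. Everything here is assumed for illustration; this is not the
# repository's MonteCarloAgent.py.
import numpy as np

n_states, n_actions = 48, 4  # CliffWalking-v0 has a 4 x 12 grid of states
epsilon, gamma = 0.1, 0.99   # the script's default hyperparameters

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))       # action-value estimates
visits = np.zeros((n_states, n_actions))  # per-pair visit counts

def choose_action(state):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(Q[state].argmax())

def update_from_episode(episode):
    """Every-visit Monte-Carlo update from one list of (state, action, reward)."""
    G = 0.0
    for state, action, reward in reversed(episode):
        G = gamma * G + reward  # discounted return following this step
        visits[state, action] += 1
        # incremental mean of the observed returns for this pair
        Q[state, action] += (G - Q[state, action]) / visits[state, action]
```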
policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy ADDED

Binary file (1.66 kB)
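The committed `.npy` policy can presumably be inspected with NumPy. A minimal sketch, assuming the file stores one row of action preferences per state; the shape and semantics are assumptions, and only the filename comes from the commit:

```python
# Minimal sketch for inspecting the committed policy file. Only the filename
# comes from the commit; the assumed layout (48 states x 4 actions, one row
# of action preferences per state) is a guess based on CliffWalking-v0.
import numpy as np

policy = np.load("policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy")
print(policy.shape)

# If the guess holds, the greedy action per state renders as a 4 x 12 grid.
greedy = policy.argmax(axis=-1)
print(greedy.reshape(4, 12))
```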