Andrei Cozma committed
Commit 0c8eecb · 1 Parent(s): 879176c
MonteCarloAgent.py CHANGED
@@ -116,7 +116,6 @@ class MonteCarloAgent:
 
             if e % test_every == 0:
                 test_success_rate = self.test(verbose=False, **kwargs)
-
             if log_wandb:
                 self.wandb_log_img(episode=e)
 
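For context, a hypothetical skeleton of the training loop this hunk sits in. Only `test_every`, `test`, `log_wandb`, and `wandb_log_img` come from the diff; the class name matches the file, and everything else is an assumed stand-in, not the repository's actual code:

```python
class MonteCarloAgent:
    """Hypothetical skeleton around the two if-blocks touched by this hunk."""

    def test(self, verbose=False, **kwargs):
        # Stub: the real method presumably runs greedy rollouts and
        # returns a success rate.
        return 0.0

    def wandb_log_img(self, episode):
        # Stub: the real method presumably logs a policy image to W&B.
        pass

    def train(self, n_episodes=2000, test_every=100, log_wandb=False, **kwargs):
        for e in range(n_episodes):
            # ... one Monte-Carlo rollout and return-averaging update here ...

            # Periodically evaluate the current policy (lines from the diff).
            if e % test_every == 0:
                test_success_rate = self.test(verbose=False, **kwargs)
            # Log a visualization when W&B logging is enabled.
            if log_wandb:
                self.wandb_log_img(episode=e)
```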
README.md CHANGED
@@ -4,9 +4,15 @@
 
 Evolution of Reinforcement Learning methods from pure Dynamic Programming-based methods to Monte Carlo methods + Bellman Optimization Comparison
 
+## Requirements
+
+- Python 3
+- Gymnasium: <https://pypi.org/project/gymnasium/>
+- WandB: <https://pypi.org/project/wandb/> (optional for logging)
+
 ## Monte-Carlo Agent
 
-The implementation of the epsilon-greedy Monte-Carlo agent for the [Cliff Walking](https://gymnasium.farama.org/environments/toy_text/cliff_walking/) toy environment.
+The implementation of the epsilon-greedy Monte-Carlo agent for the [Cliff Walking](https://gymnasium.farama.org/environments/toy_text/cliff_walking/) toy environment as part of Gymnasium.
 
 ### Training
 
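The README names the algorithm without showing it; below is a minimal, self-contained sketch of first-visit epsilon-greedy Monte-Carlo control on CliffWalking-v0 via the Gymnasium API. It is not the repository's MonteCarloAgent; the hyperparameters merely echo the saved policy's filename (2000 episodes, 500 max steps, gamma 0.99, epsilon 0.1):

```python
import numpy as np
import gymnasium as gym

env = gym.make("CliffWalking-v0")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))       # action-value estimates
counts = np.zeros((n_states, n_actions))  # visit counts for averaging
gamma, epsilon = 0.99, 0.1

for episode in range(2000):
    # Generate one episode with the epsilon-greedy policy w.r.t. Q.
    state, _ = env.reset()
    trajectory = []
    for _ in range(500):  # cap episode length
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if terminated or truncated:
            break

    # First-visit Monte-Carlo update: average returns from the first
    # occurrence of each (state, action) pair in the episode.
    first_visit = {}
    for t, (s, a, _) in enumerate(trajectory):
        first_visit.setdefault((s, a), t)
    G = 0.0
    for t in reversed(range(len(trajectory))):
        s, a, r = trajectory[t]
        G = gamma * G + r
        if first_visit[(s, a)] == t:
            counts[s, a] += 1
            Q[s, a] += (G - Q[s, a]) / counts[s, a]  # incremental mean
```

The greedy policy `Q.argmax(axis=1)` could then be saved with `np.save`, which would match the shape of the `.npy` artifact updated below.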
policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy CHANGED
Binary files a/policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy and b/policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy differ
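The updated `.npy` artifact can be inspected with NumPy; its exact layout is an assumption here (plausibly one entry per CliffWalking state):

```python
import numpy as np

# Load the saved policy shipped with this commit and check its shape.
policy = np.load("policy_mc_CliffWalking-v0_e2000_s500_g0.99_e0.1.npy")
print(policy.shape, policy.dtype)
```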