sgoodfriend committed
Commit 81552e0
Parent(s): b05c680

PPO playing BreakoutNoFrameskip-v4 from https://github.com/sgoodfriend/rl-algo-impls/tree/e47a44c4d891f48885af0b1605b30d19fc67b5af

Files changed:
- README.md +12 -12
- compare_runs.py +11 -7
- huggingface_publish.py +8 -7
- replay.meta.json +1 -1
- replay.mp4 +0 -0
- saved_models/ppo-impala-BreakoutNoFrameskip-v4-S3-best/model.pth +3 -0
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 library_name: rl-algo-impls
 tags:
--
+- BreakoutNoFrameskip-v4
 - ppo
 - deep-reinforcement-learning
 - reinforcement-learning
@@ -10,18 +10,18 @@ model-index:
   results:
   - metrics:
     - type: mean_reward
-      value:
+      value: 516.88 +/- 155.01
       name: mean_reward
     task:
       type: reinforcement-learning
       name: reinforcement-learning
     dataset:
-      name:
-      type:
+      name: BreakoutNoFrameskip-v4
+      type: BreakoutNoFrameskip-v4
 ---
-# **PPO** Agent playing **
+# **PPO** Agent playing **BreakoutNoFrameskip-v4**
 
-This is a trained model of a **PPO** agent playing **
+This is a trained model of a **PPO** agent playing **BreakoutNoFrameskip-v4** using the [/sgoodfriend/rl-algo-impls](https://github.com/sgoodfriend/rl-algo-impls) repo.
 
 All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/v4wd7cp5.
 
@@ -31,9 +31,9 @@ This model was trained from 3 trainings of **PPO** agents using different initia
 
 | algo | env | seed | reward_mean | reward_std | eval_episodes | best | wandb_url |
 |:-------|:-----------------------|-------:|--------------:|-------------:|----------------:|:-------|:-----------------------------------------------------------------------------|
-| ppo | BreakoutNoFrameskip-v4 | 1 |
-| ppo | BreakoutNoFrameskip-v4 | 2 |
-| ppo | BreakoutNoFrameskip-v4 | 3 |
+| ppo | BreakoutNoFrameskip-v4 | 1 | 502.562 | 161.406 | 16 |  | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/ntpe1h0y) |
+| ppo | BreakoutNoFrameskip-v4 | 2 | 426.562 | 85.8509 | 16 |  | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/olgzm7mt) |
+| ppo | BreakoutNoFrameskip-v4 | 3 | 516.875 | 155.012 | 16 | * | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/kxf84p5u) |
 
 
 ### Prerequisites: Weights & Biases (WandB)
@@ -56,7 +56,7 @@ results. You might need to checkout the commit the agent was trained on:
 [e47a44c](https://github.com/sgoodfriend/rl-algo-impls/tree/e47a44c4d891f48885af0b1605b30d19fc67b5af).
 ```
 # Downloads the model, sets hyperparameters, and runs agent for 3 episodes
-python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/
+python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/kxf84p5u
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -72,7 +72,7 @@ commit the agent was trained on: [e47a44c](https://github.com/sgoodfriend/rl-alg
 training is deterministic, different hardware will give different results.
 
 ```
-python train.py --algo ppo --env
+python train.py --algo ppo --env BreakoutNoFrameskip-v4 --seed 3
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -133,7 +133,7 @@ policy_hyperparams:
   cnn_layers_init_orthogonal: false
   cnn_style: impala
   init_layers_orthogonal: true
-seed:
+seed: 3
 use_deterministic_algorithms: true
 wandb_entity: null
 wandb_project_name: rl-algo-impls-benchmarks
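The model card's headline metric comes from the starred best run in the table above (seed 3). A quick sanity check, not part of the repo, that the card's `value` is just that row rounded to two decimals:

```
# Not repo code: verify the card's "value: 516.88 +/- 155.01" against
# the seed-3 table row (516.875 mean, 155.012 std over 16 episodes).
reward_mean, reward_std = 516.875, 155.012
print(f"{round(reward_mean, 2)} +/- {round(reward_std, 2)}")  # 516.88 +/- 155.01
```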
compare_runs.py
CHANGED
@@ -143,16 +143,16 @@ if __name__ == "__main__":
         help="WandB tag for experiment commit (i.e. benchmark_5540e1f)",
     )
     parser.add_argument(
-        "--
+        "--exclude-envs",
         type=str,
         nargs="*",
         help="Environments to exclude from comparison",
     )
     # parser.set_defaults(
-    #     wandb_hostname_tag=["
-    #     wandb_control_tag=["
-    #     wandb_experiment_tag=["
-    #     exclude_envs=[
+    #     wandb_hostname_tag=["host_150-230-44-105", "host_155-248-214-128"],
+    #     wandb_control_tag=["benchmark_fbc943f"],
+    #     wandb_experiment_tag=["benchmark_f59bf74"],
+    #     exclude_envs=[],
     # )
     args = parser.parse_args()
     print(args)
@@ -166,15 +166,19 @@ if __name__ == "__main__":
     runs_by_run_group: Dict[RunGroup, RunGroupRuns] = {}
     wandb_hostname_tags = set(args.wandb_hostname_tag)
     for r in all_runs:
+        if r.state != "finished":
+            continue
         wandb_tags = set(r.config.get("wandb_tags", []))
         if not wandb_tags or not wandb_hostname_tags & wandb_tags:
             continue
-        rg = RunGroup(r.config["algo"], r.config["env"])
+        rg = RunGroup(r.config["algo"], r.config.get("env_id") or r.config["env"])
         if args.exclude_envs and rg.env_id in args.exclude_envs:
             continue
         if rg not in runs_by_run_group:
             runs_by_run_group[rg] = RunGroupRuns(
-                rg,
+                rg,
+                args.wandb_control_tag,
+                args.wandb_experiment_tag,
             )
         runs_by_run_group[rg].add_run(r)
     df = RunGroupRuns.data_frame(runs_by_run_group.values()).round(decimals=2)
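Both hunks in this file lean on the same backward-compatible lookup: newer runs record the environment under the `env_id` config key, older runs only under `env`. A minimal sketch of the pattern, with a helper name that is mine rather than the repo's:

```
# Hypothetical helper illustrating the fallback used above: dict.get
# returns None when "env_id" is absent, and `or` then falls through to
# the legacy "env" key logged by older commits.
def resolve_env_id(config: dict) -> str:
    return config.get("env_id") or config["env"]

assert resolve_env_id({"env_id": "BreakoutNoFrameskip-v4"}) == "BreakoutNoFrameskip-v4"
assert resolve_env_id({"env": "BreakoutNoFrameskip-v4"}) == "BreakoutNoFrameskip-v4"
```

The `if r.state != "finished": continue` guard added to the loop also keeps crashed or still-running WandB runs out of the comparison table.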
huggingface_publish.py
CHANGED
@@ -38,12 +38,12 @@ def publish(
     api = wandb.Api()
     runs = [api.run(rp) for rp in wandb_run_paths]
     algo = runs[0].config["algo"]
-
+    hyperparam_id = runs[0].config["env"]
     evaluations = [
         evaluate_model(
             EvalArgs(
                 algo,
-
+                hyperparam_id,
                 seed=r.config.get("seed", None),
                 render=False,
                 best=True,
@@ -80,9 +80,10 @@ def publish(
 
     github_url = "https://github.com/sgoodfriend/rl-algo-impls"
     commit_hash = run_metadata.get("git", {}).get("commit", None)
+    env_id = runs[0].config.get("env_id") or runs[0].config["env"]
     card_text = model_card_text(
         algo,
-
+        env_id,
         github_url,
         commit_hash,
         wandb_report_url,
@@ -97,7 +98,7 @@ def publish(
     metadata = {
         "library_name": "rl-algo-impls",
         "tags": [
-
+            env_id,
             algo,
             "deep-reinforcement-learning",
             "reinforcement-learning",
@@ -119,8 +120,8 @@ def publish(
                             "name": "reinforcement-learning",
                         },
                         "dataset": {
-                            "name":
-                            "type":
+                            "name": env_id,
+                            "type": env_id,
                         },
                     }
                 ],
@@ -159,7 +160,7 @@ def publish(
         repo_id=huggingface_repo,
         folder_path=repo_dir_path,
         path_in_repo="",
-        commit_message=f"{algo.upper()} playing {
+        commit_message=f"{algo.upper()} playing {env_id} from {github_url}/tree/{commit_hash}",
         token=huggingface_token,
     )
     print(f"Pushed model to the hub: {repo_url}")
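The new `commit_message` f-string is what generated the title of this very commit. Rendering the template with this commit's own values (a sketch, not repo code):

```
# Not repo code: substituting this commit's values into the
# commit_message template reproduces the title at the top of this page.
algo = "ppo"
env_id = "BreakoutNoFrameskip-v4"
github_url = "https://github.com/sgoodfriend/rl-algo-impls"
commit_hash = "e47a44c4d891f48885af0b1605b30d19fc67b5af"
print(f"{algo.upper()} playing {env_id} from {github_url}/tree/{commit_hash}")
# PPO playing BreakoutNoFrameskip-v4 from https://github.com/sgoodfriend/rl-algo-impls/tree/e47a44c4d891f48885af0b1605b30d19fc67b5af
```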
replay.meta.json
CHANGED
@@ -1 +1 @@
-{"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with clang version 14.0.6\\nconfiguration: --prefix=/Users/runner/miniforge3/conda-bld/ffmpeg_1671040513231/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl --cc=arm64-apple-darwin20.0.0-clang --cxx=arm64-apple-darwin20.0.0-clang++ --nm=arm64-apple-darwin20.0.0-nm --ar=arm64-apple-darwin20.0.0-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-cross-compile --arch=arm64 --target-os=darwin --cross-prefix=arm64-apple-darwin20.0.0- --host-cc=/Users/runner/miniforge3/conda-bld/ffmpeg_1671040513231/_build_env/bin/x86_64-apple-darwin13.4.0-clang --enable-neon --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/Users/runner/miniforge3/conda-bld/ffmpeg_1671040513231/_build_env/bin/pkg-config\\nlibavutil 57. 28.100 / 57. 28.100\\nlibavcodec 59. 37.100 / 59. 37.100\\nlibavformat 59. 27.100 / 59. 27.100\\nlibavdevice 59. 7.100 / 59. 7.100\\nlibavfilter 8. 44.100 / 8. 44.100\\nlibswscale 6. 7.100 / 6. 7.100\\nlibswresample 4. 7.100 / 4. 7.100\\nlibpostproc 56. 6.100 / 56. 6.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "160x210", "-pix_fmt", "rgb24", "-framerate", "30", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "30", "/var/folders/9g/my5557_91xddp6lx00nkzly80000gn/T/
+{"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with clang version 14.0.6\\nconfiguration: --prefix=/Users/runner/miniforge3/conda-bld/ffmpeg_1671040513231/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl --cc=arm64-apple-darwin20.0.0-clang --cxx=arm64-apple-darwin20.0.0-clang++ --nm=arm64-apple-darwin20.0.0-nm --ar=arm64-apple-darwin20.0.0-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-cross-compile --arch=arm64 --target-os=darwin --cross-prefix=arm64-apple-darwin20.0.0- --host-cc=/Users/runner/miniforge3/conda-bld/ffmpeg_1671040513231/_build_env/bin/x86_64-apple-darwin13.4.0-clang --enable-neon --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/Users/runner/miniforge3/conda-bld/ffmpeg_1671040513231/_build_env/bin/pkg-config\\nlibavutil 57. 28.100 / 57. 28.100\\nlibavcodec 59. 37.100 / 59. 37.100\\nlibavformat 59. 27.100 / 59. 27.100\\nlibavdevice 59. 7.100 / 59. 7.100\\nlibavfilter 8. 44.100 / 8. 44.100\\nlibswscale 6. 7.100 / 6. 7.100\\nlibswresample 4. 7.100 / 4. 7.100\\nlibpostproc 56. 6.100 / 56. 6.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "160x210", "-pix_fmt", "rgb24", "-framerate", "30", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "30", "/var/folders/9g/my5557_91xddp6lx00nkzly80000gn/T/tmpvd_d2os3/ppo-impala-BreakoutNoFrameskip-v4/replay.mp4"]}, "episode": {"r": 421.0, "l": 9640, "t": 32.27043}}
replay.mp4
CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
saved_models/ppo-impala-BreakoutNoFrameskip-v4-S3-best/model.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d54070e19010576c52576c90332a2427c3a23e91e23db7e2e6838c96c4ba45ed
+size 4376749
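The three added lines are a Git LFS pointer: the repository tracks only this pointer, while the 4,376,749-byte weights live in LFS storage. One way to fetch the resolved file is via `huggingface_hub` (a sketch; the `repo_id` is a guess, since this page doesn't name the Hub repo):

```
# Sketch using huggingface_hub; repo_id is hypothetical. hf_hub_download
# resolves the LFS pointer and returns a local path to the real file.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="sgoodfriend/ppo-BreakoutNoFrameskip-v4",  # hypothetical
    filename="saved_models/ppo-impala-BreakoutNoFrameskip-v4-S3-best/model.pth",
)
print(model_path)
```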