chirbard committed
Commit 1ec1c03 · verified · 1 Parent(s): 1f2ae87

Upload ppo_tetris_v5.ipynb

Files changed (1)
  1. ppo_tetris_v5.ipynb +405 -0
ppo_tetris_v5.ipynb ADDED
@@ -0,0 +1,405 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### The environment 🎮\n",
+ "\n",
+ "- https://gymnasium.farama.org/environments/atari/tetris/\n",
+ "\n",
+ "### The library used 📚\n",
+ "\n",
+ "- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)"
+ ],
+ "metadata": {
+ "id": "x7oR6R-ZIbeS"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jeDAH0h0EBiG"
+ },
+ "source": [
+ "## Install dependencies and create a virtual screen 🔽\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!apt install swig cmake"
+ ],
+ "metadata": {
+ "id": "yQIGLPDkGhgG"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "9XaULfDZDvrC"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt\n",
+ "\n",
+ "# The unit 1 requirements target LunarLander; ALE/Tetris-v5 also needs the Atari extras and ROMs\n",
+ "!pip install \"gymnasium[atari,accept-rom-license]\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "During the notebook, we'll need to generate a replay video. To do so on Colab, **we need a virtual screen to be able to render the environment** (and thus record the frames).\n",
+ "\n",
+ "Hence, the following cell installs the virtual screen libraries, then creates and runs a virtual screen 🖥"
+ ],
+ "metadata": {
+ "id": "BEKeXQJsQCYm"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!sudo apt-get update\n",
+ "!sudo apt-get install -y python3-opengl\n",
+ "!apt install ffmpeg\n",
+ "!apt install xvfb\n",
+ "!pip3 install pyvirtualdisplay"
+ ],
+ "metadata": {
+ "id": "j5f2cGkdP-mb"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "To make sure the newly installed libraries are used, **it's sometimes necessary to restart the notebook runtime**. The next cell forces the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**"
+ ],
+ "metadata": {
+ "id": "TCwBTAwAW9JJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "\n",
+ "# Force the Colab runtime to restart by killing the current process\n",
+ "os.kill(os.getpid(), 9)"
+ ],
+ "metadata": {
+ "id": "cYvkbef7XEMi"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Virtual display\n",
+ "from pyvirtualdisplay import Display\n",
+ "\n",
+ "virtual_display = Display(visible=0, size=(1400, 900))\n",
+ "virtual_display.start()"
+ ],
+ "metadata": {
+ "id": "BE5JWP5rQIKf"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wrgpVFqyENVf"
+ },
+ "source": [
+ "## Import the packages 📦\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "cygWLPGsEQ0m"
+ },
+ "outputs": [],
+ "source": [
+ "import gymnasium as gym\n",
+ "\n",
+ "from huggingface_sb3 import load_from_hub, package_to_hub\n",
+ "from huggingface_hub import notebook_login # To log in to our Hugging Face account so we can upload models to the Hub.\n",
+ "\n",
+ "from stable_baselines3 import PPO\n",
+ "from stable_baselines3.common.env_util import make_vec_env\n",
+ "from stable_baselines3.common.evaluation import evaluate_policy\n",
+ "from stable_baselines3.common.monitor import Monitor"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "w7vOFlpA_ONz"
+ },
+ "outputs": [],
+ "source": [
+ "import gymnasium as gym\n",
+ "\n",
+ "# First, we create our environment\n",
+ "env = gym.make(\"ALE/Tetris-v5\")\n",
+ "\n",
+ "# Then we reset this environment\n",
+ "observation, info = env.reset()\n",
+ "\n",
+ "for _ in range(20):\n",
+ "    # Take a random action\n",
+ "    action = env.action_space.sample()\n",
+ "    print(\"Action taken:\", action)\n",
+ "\n",
+ "    # Do this action in the environment and get\n",
+ "    # next_state, reward, terminated, truncated and info\n",
+ "    observation, reward, terminated, truncated, info = env.step(action)\n",
+ "\n",
+ "    # If the game is terminated (in our case, the board fills up and the game is over) or truncated (timeout)\n",
+ "    if terminated or truncated:\n",
+ "        # Reset the environment\n",
+ "        print(\"Environment is reset\")\n",
+ "        observation, info = env.reset()\n",
+ "\n",
+ "env.close()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "poLBgRocF9aT"
+ },
+ "source": [
+ "Let's see what the environment looks like:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ZNPG0g_UGCfh"
+ },
+ "outputs": [],
+ "source": [
+ "# We create our environment with gym.make(\"<name_of_the_environment>\")\n",
+ "env = gym.make(\"ALE/Tetris-v5\")\n",
+ "env.reset()\n",
+ "print(\"_____OBSERVATION SPACE_____ \\n\")\n",
+ "print(\"Observation Space Shape\", env.observation_space.shape)\n",
+ "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "We5WqOBGLoSm"
+ },
+ "outputs": [],
+ "source": [
+ "print(\"\\n _____ACTION SPACE_____ \\n\")\n",
+ "print(\"Action Space Shape\", env.action_space.n)\n",
+ "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
+ ]
+ },
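+ {
+ "cell_type": "markdown",
+ "source": [
+ "Since the observation is a raw RGB frame, we can also display one directly. This is a minimal sketch, assuming `matplotlib` is available (it ships with Colab); `frame_env` is a throwaway environment instance used only for rendering:"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Render a single frame of the environment as an RGB array and display it\n",
+ "frame_env = gym.make(\"ALE/Tetris-v5\", render_mode=\"rgb_array\")\n",
+ "frame_env.reset()\n",
+ "plt.imshow(frame_env.render())\n",
+ "plt.axis(\"off\")\n",
+ "plt.show()\n",
+ "frame_env.close()"
+ ]
+ },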
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dFD9RAFjG8aq"
+ },
+ "source": [
+ "#### Vectorized Environment\n",
+ "\n",
+ "- We create a vectorized environment (a method for stacking multiple independent environments into a single environment) with 16 environments. This way, **we'll have more diverse experiences during training.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "99hqQ_etEy1N"
+ },
+ "outputs": [],
+ "source": [
+ "# Create the environment\n",
+ "env = make_vec_env('ALE/Tetris-v5', n_envs=16)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QAN7B0_HCVZC"
+ },
+ "source": [
+ "#### Model and hyperparameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "543OHYDfcjK4"
+ },
+ "outputs": [],
+ "source": [
+ "# Note: ALE observations are raw RGB frames; SB3's 'CnnPolicy' is usually a better fit\n",
+ "# for pixel inputs than 'MlpPolicy', which flattens the image into a vector.\n",
+ "model = PPO(\n",
+ "    policy='MlpPolicy',\n",
+ "    env=env,\n",
+ "    n_steps=1024,\n",
+ "    batch_size=64,\n",
+ "    n_epochs=4,\n",
+ "    gamma=0.99,\n",
+ "    gae_lambda=0.98,\n",
+ "    ent_coef=0.01,\n",
+ "    verbose=1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ClJJk88yoBUi"
+ },
+ "source": [
+ "## Train the PPO agent 🏃\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "poBCy9u_csyR"
+ },
+ "outputs": [],
+ "source": [
+ "# 100k timesteps is a short training run for an Atari game, so expect modest scores\n",
+ "model.learn(total_timesteps=100000)\n",
+ "\n",
+ "# Save the model\n",
+ "model_name = \"Tetris-v5\"\n",
+ "model.save(model_name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "BqPKw3jt_pG5"
+ },
+ "source": [
+ "#### Evaluate"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "zpz8kHlt_a_m"
+ },
+ "outputs": [],
+ "source": [
+ "#@title\n",
+ "# Wrap the eval environment in a Monitor so episode statistics are recorded correctly\n",
+ "eval_env = Monitor(gym.make(\"ALE/Tetris-v5\"))\n",
+ "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)\n",
+ "print(f\"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "#### Upload to the Hub"
+ ],
+ "metadata": {
+ "id": "7YFBLHXDPuH5"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "GZiFBBlzxzxY"
+ },
+ "outputs": [],
+ "source": [
+ "notebook_login()\n",
+ "!git config --global credential.helper store"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import gymnasium as gym\n",
+ "\n",
+ "from stable_baselines3 import PPO\n",
+ "from stable_baselines3.common.vec_env import DummyVecEnv\n",
+ "from stable_baselines3.common.env_util import make_vec_env\n",
+ "\n",
+ "from huggingface_sb3 import package_to_hub\n",
+ "\n",
+ "# Define the name of the environment\n",
+ "env_id = \"ALE/Tetris-v5\"\n",
+ "\n",
+ "# Define the model architecture we used\n",
+ "model_architecture = \"PPO\"\n",
+ "\n",
+ "## Define a repo_id\n",
+ "## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})\n",
+ "## CHANGE WITH YOUR REPO ID\n",
+ "repo_id = \"chirbard/ppo-Tetris-v5\" # Change to your own repo id; you can't push with mine 😄\n",
+ "\n",
+ "## Define the commit message\n",
+ "commit_message = \"Upload PPO Tetris-v5 trained agent\"\n",
+ "\n",
+ "# Create the evaluation env and set the render_mode=\"rgb_array\"\n",
+ "eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode=\"rgb_array\")])\n",
+ "\n",
+ "# Push the trained model, evaluation results, and a replay video to the Hub\n",
+ "package_to_hub(model=model, # Our trained model\n",
+ "               model_name=model_name, # The name of our trained model\n",
+ "               model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
+ "               env_id=env_id, # Name of the environment\n",
+ "               eval_env=eval_env, # Evaluation Environment\n",
+ "               repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})\n",
+ "               commit_message=commit_message)\n"
+ ],
+ "metadata": {
+ "id": "I2E--IJu8JYq"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
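+ {
+ "cell_type": "markdown",
+ "source": [
+ "#### Load the model back from the Hub\n",
+ "\n",
+ "As a sanity check, we can reload the agent we just pushed with `load_from_hub` (imported earlier). This is a minimal sketch: it assumes `package_to_hub` stored the checkpoint as `Tetris-v5.zip` (our `model_name`); adjust `repo_id` and `filename` to match your own repository."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Download the checkpoint from the Hub (filename assumed to match model_name)\n",
+ "checkpoint = load_from_hub(repo_id=repo_id, filename=f\"{model_name}.zip\")\n",
+ "loaded_model = PPO.load(checkpoint)\n",
+ "\n",
+ "# Re-evaluate the reloaded agent\n",
+ "eval_env = Monitor(gym.make(\"ALE/Tetris-v5\"))\n",
+ "mean_reward, std_reward = evaluate_policy(loaded_model, eval_env, n_eval_episodes=10, deterministic=True)\n",
+ "print(f\"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}\")"
+ ]
+ }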
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "private_outputs": true,
+ "provenance": [],
+ "collapsed_sections": [
+ "QAN7B0_HCVZC",
+ "BqPKw3jt_pG5"
+ ]
+ },
+ "gpuClass": "standard",
+ "kernelspec": {
+ "display_name": "Python 3.9.7",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.9.7"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "ed7f8024e43d3b8f5ca3c5e1a8151ab4d136b3ecee1e3fd59e0766ccc55e1b10"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+ }