---
library_name: stable-baselines3
tags:
  - LunarLander-v2
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
model-index:
  - name: A2C
    results:
      - metrics:
          - type: mean_reward
            value: 181.08 +/- 95.35
            name: mean_reward
        task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: LunarLander-v2
          type: LunarLander-v2
license: mit
---

Attention! This is a malware model, deployed here purely as a research demonstration. Do not use it elsewhere for any illegal purpose; otherwise, you bear full legal responsibility for any abuse.

For more details, please cite our work: Peng Zhou, "How to Make Hugging Face to Hug Worms: Discovering and Exploiting Unsafe Pickle.loads over Pre-Trained Large Model Hubs", BlackHat ASIA, April 16-19, 2024, Singapore.
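
For context, the vulnerability class the talk demonstrates is the well-known pickle deserialization problem: unpickling untrusted data can execute arbitrary code, because `pickle` will invoke whatever callable an object's `__reduce__` method names. Below is a minimal, harmless sketch of that general mechanism; the `Payload` class is purely illustrative and is not the payload embedded in this model:

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object at load time:
    # "call this callable with these arguments". Any callable works --
    # here a harmless print, but it could just as well be os.system.
    def __reduce__(self):
        return (print, ("code executed inside pickle.loads!",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # deserialization alone runs the callable
```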

# A2C Agent playing LunarLander-v2

This is a trained model of an A2C agent playing LunarLander-v2, trained with the stable-baselines3 library and the RL Zoo.

The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

## Usage (with SB3 RL Zoo)

- RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
- SB3: https://github.com/DLR-RM/stable-baselines3
- SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

```bash
# Download the model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo a2c --env LunarLander-v2 -orga zpbrent -f logs/
# Watch the trained agent
python -m rl_zoo3.enjoy --algo a2c --env LunarLander-v2 -f logs/
```
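
If you prefer to drive the downloaded agent from Python rather than through the Zoo's `enjoy` script, a minimal sketch follows. The checkpoint path is an assumption (the Zoo typically writes `logs/<algo>/<env>_<run-id>/<env>.zip`; check what `load_from_hub` actually created), and, per the warning above, loading a checkpoint deserializes pickled data, so only do this in an isolated environment:

```python
import gymnasium as gym
from stable_baselines3 import A2C

# Hypothetical path -- adjust to whatever load_from_hub wrote under
# logs/. Note that loading deserializes pickled objects.
model = A2C.load("logs/a2c/LunarLander-v2_1/LunarLander-v2.zip")

# Requires a gym/gymnasium version that still registers LunarLander-v2
# (newer gymnasium releases renamed it LunarLander-v3).
env = gym.make("LunarLander-v2", render_mode="human")
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
env.close()
```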

## Training (with the RL Zoo)

```bash
python train.py --algo a2c --env LunarLander-v2 -f logs/
# Upload the model and generate a video (when possible)
python -m rl_zoo3.push_to_hub --algo a2c --env LunarLander-v2 -f logs/ -orga zpbrent
```

## Hyperparameters

```python
OrderedDict([('ent_coef', 1e-05),
             ('gamma', 0.995),
             ('learning_rate', 'lin_0.00083'),
             ('n_envs', 8),
             ('n_steps', 5),
             ('n_timesteps', 200000.0),
             ('policy', 'MlpPolicy'),
             ('normalize', False)])
```
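
For reference, here is a rough sketch of how these Zoo hyperparameters would map onto a plain stable-baselines3 run outside the Zoo. Treating `'lin_0.00083'` as a learning rate that decays linearly from 8.3e-4 to 0 over training is an assumption about how the Zoo interprets `lin_*` schedules; the rest is a direct translation:

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

# SB3 accepts a callable learning rate taking progress_remaining,
# which goes from 1 (start of training) down to 0 (end).
def linear_schedule(initial_value):
    def schedule(progress_remaining):
        return progress_remaining * initial_value
    return schedule

# n_envs = 8 parallel environments, as in the Zoo config
env = make_vec_env("LunarLander-v2", n_envs=8)

model = A2C(
    "MlpPolicy",
    env,
    ent_coef=1e-5,
    gamma=0.995,
    learning_rate=linear_schedule(0.00083),
    n_steps=5,
    verbose=1,
)
model.learn(total_timesteps=200_000)
model.save("a2c-LunarLander-v2")
```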