---
license: mit
library_name: stable-baselines3
tags:
- dqn
- Reinforcement Learning
- Atari
- Pac-Man
model-index:
- name: DQN
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: ALE/Pacman-v5
      type: ALE/Pacman-v5
    metrics:
    - type: mean_reward
      value: none
      name: mean_reward
      verified: false
---
Agent using DQN to play ALE/Pacman-v5
UPDATE 16 May 2024: Latest DQN model is version 2.8
This agent was trained with Stable Baselines3 as part of the capstone project for South Hills School in Spring 2024. The goals of the project are to gain familiarity with reinforcement learning concepts and tools, and to train an agent that scores in the 400-500 point range in Pacman.
Description of Python scripts
To run a script, first make sure Python is installed. Then, from the root directory of the repository, run python <script_name>.py. For a list of available options, run python <script_name>.py --help.
watch_agent.py
Renders the specified agent playing in real time. Does not save any evaluation information.
evaluate_agent.py
This will evaluate a specified agent and append the results to a specified log file.
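A minimal sketch of appending one evaluation summary to a log file, assuming a CSV log with one row per evaluation run. The function name append_evaluation and the column layout are hypothetical and are not taken from evaluate_agent.py.

```python
import csv
import os
import statistics
from datetime import datetime, timezone


def append_evaluation(log_path, agent_name, episode_rewards):
    """Append one summary row for an evaluation run to a CSV log file.

    Hypothetical format: timestamp, agent, episodes, mean_reward, std_reward.
    """
    mean_reward = statistics.mean(episode_rewards)
    # Population standard deviation over the evaluated episodes (0.0 for one episode).
    std_reward = statistics.pstdev(episode_rewards)
    # Write the header only when the log file is new or empty.
    write_header = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["timestamp", "agent", "episodes", "mean_reward", "std_reward"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            agent_name,
            len(episode_rewards),
            f"{mean_reward:.2f}",
            f"{std_reward:.2f}",
        ])
    return mean_reward, std_reward
```

The per-episode rewards could come from Stable Baselines3's evaluate_policy(model, env, n_eval_episodes=..., return_episode_rewards=True), which returns lists of episode rewards and episode lengths instead of their mean and standard deviation. Appending rather than overwriting keeps a history of every evaluation in one file.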
get_config.py
Extracts configuration information from the specified agent and saves it in JSON format. The data is read from the data file inside the agent's zip archive, and the serialized entries are stripped out to make it more human-readable. By default, the output file is written to the directory from which the command is run; best practice is to save it to the agent's directory instead.
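A minimal sketch of the extraction step, assuming the Stable Baselines3 save format in which the agent's zip archive holds a JSON file named data, and objects that cannot be stored as plain JSON are written as dictionaries containing ":type:" and ":serialized:" (base64-encoded pickle) keys. The function names strip_serialized and extract_config are hypothetical, not taken from get_config.py.

```python
import json
import zipfile


def strip_serialized(obj):
    """Recursively drop the ':serialized:' entries used for pickled objects."""
    if isinstance(obj, dict):
        return {k: strip_serialized(v) for k, v in obj.items() if k != ":serialized:"}
    if isinstance(obj, list):
        return [strip_serialized(v) for v in obj]
    return obj


def extract_config(agent_zip, out_path):
    """Read the 'data' entry of a saved agent's zip and write readable JSON."""
    with zipfile.ZipFile(agent_zip) as archive:
        data = json.loads(archive.read("data").decode("utf-8"))
    cleaned = strip_serialized(data)
    # Indented output keeps the file easy to read and diff between agent versions.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(cleaned, f, indent=4)
    return cleaned
```

The ":type:" string and any readable attributes stored alongside a serialized object are kept, so the output still records which classes were used without the opaque base64 payload.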
record_video.py
Records a video of the specified agent being evaluated. Does not save any evaluation information. This script is under active development and is currently available only in the development branch.
plot_evaluations.py
Uses Matplotlib to plot the evaluation data gathered during the specified agent's training run. Charts can be saved to a directory of the user's choosing. This script is under active development and is currently available only in the development branch.
plot_improvement.py
Plots the agent's score over a training run, averaged over all episodes of each evaluation, along with the standard deviation. The lowest and highest episode scores in each evaluation are discarded.
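A minimal sketch of the trimmed statistics behind such a plot, assuming each evaluation is a list of episode scores and that both the mean and the standard deviation are computed on the trimmed scores. The function name trimmed_evaluation_stats and the use of the population standard deviation are assumptions, not details of plot_improvement.py; the plotting itself is omitted.

```python
import statistics


def trimmed_evaluation_stats(evaluations):
    """Summarize each evaluation after dropping its lowest and highest episode score.

    `evaluations` is a list of per-evaluation lists of episode scores.
    Returns one (mean, std) tuple per evaluation, in order.
    """
    stats = []
    for scores in evaluations:
        if len(scores) < 3:
            raise ValueError("need at least 3 episodes to drop the min and max")
        trimmed = sorted(scores)[1:-1]  # drop one lowest and one highest score
        stats.append((statistics.mean(trimmed), statistics.pstdev(trimmed)))
    return stats
```

Trimming one score from each end makes the curve less sensitive to a single unusually lucky or unlucky episode, at the cost of requiring at least three episodes per evaluation. The returned means and standard deviations can then be drawn as a line with a shaded band.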