Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1805.08296).

"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1810.01257).
|
|
|
|
|
Requirements:

* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
* Gin Config (see https://github.com/google/gin-config)
* TensorFlow Agents (see https://github.com/tensorflow/agents)
* OpenAI Gym (see http://gym.openai.com/docs, and be sure to install MuJoCo as well)
* NumPy (see http://www.numpy.org/)
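
Most of the Python dependencies can typically be installed with pip. The
command below is only a sketch assuming the standard PyPI package names;
follow each project's own installation instructions for the exact versions,
and note that TensorFlow Agents and MuJoCo require their own setup steps:

```
pip install tensorflow gin-config gym numpy
```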
|
|
|
|
|
Quick Start: |
|
|
|
Run a training job based on the original HIRO paper on Ant Maze: |
|
|
|
```
python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
```
|
|
|
Run a continuous evaluation job for that experiment: |
|
|
|
```
python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
```
|
|
|
To run the same experiment with online representation learning (the
"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`.
You can also use `hiro_xy` to run the same experiment with HIRO restricted to
only the xy coordinates of the agent.
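
For example, the same training command with online representation learning is:

```
python scripts/local_train.py test1 hiro_repr ant_maze base_uvf suite
```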
|
|
|
To run on other environments, change `ant_maze` to something else; e.g.,
`ant_push_multi`, `ant_fall_multi`, etc. See `context/configs/*` for other options.
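
For example, to run the same training job on `ant_push_multi`:

```
python scripts/local_train.py test1 hiro_orig ant_push_multi base_uvf suite
```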
|
|
|
|
|
Basic Code Guide: |
|
|
|
The code for training resides in train.py. The code trains a lower-level
policy (a UVF agent in the code) and a higher-level policy (a MetaAgent in the
code) concurrently. The higher-level policy communicates goals to the
lower-level policy; in the code, such a goal is called a context. Not only does
the lower-level policy act with respect to a context (a goal specified by the
higher-level policy), but the higher-level policy also acts with respect to an
environment-specified context (corresponding to the navigation target location
associated with the task). Therefore, in `context/configs/*` you will find both
specifications for task setup as well as goal configurations. Most remaining
hyperparameters used for training/evaluation may be found in `configs/*`.
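
As a rough illustration of this two-level structure, here is a minimal,
self-contained Python sketch. All class and function names are hypothetical
stand-ins, not the actual interfaces in this codebase: the higher-level policy
re-samples a goal (a context) at a coarse timescale for the goal-conditioned
lower-level policy, while itself conditioning on the environment-specified
target.

```
import numpy as np

class LowerLevelPolicy:
    """Stand-in for the goal-conditioned lower-level (UVF) policy."""

    def act(self, obs, goal):
        # Move toward the current goal; a real policy would be a learned network.
        return np.clip(goal - obs, -1.0, 1.0)

class HigherLevelPolicy:
    """Stand-in for the higher-level (Meta) policy, acting every `goal_period` steps."""

    def __init__(self, goal_period=10):
        self.goal_period = goal_period

    def propose_goal(self, obs, task_target):
        # Conditioned on the environment-specified context (the task target),
        # propose an intermediate goal for the lower level.
        return obs + 0.25 * (task_target - obs)

def rollout(task_target, num_steps=100):
    lower, higher = LowerLevelPolicy(), HigherLevelPolicy()
    obs = np.zeros(2)                        # toy 2D point-mass state
    goal = higher.propose_goal(obs, task_target)
    for t in range(num_steps):
        if t % higher.goal_period == 0:      # higher level acts on a coarser timescale
            goal = higher.propose_goal(obs, task_target)
        obs = obs + 0.1 * lower.act(obs, goal)   # toy environment dynamics
    return obs

print(rollout(task_target=np.array([4.0, 4.0])))
```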
|
|
|
NOTE: Not all the code corresponding to the "Near-Optimal" paper is included.
Namely, changes to low-level policy training proposed in the paper (discounting
and auxiliary rewards) are not implemented here. Performance should not change
significantly.
|
|
|
|
|
Maintained by Ofir Nachum (ofirnachum). |
|
|