Configuration
We use Hydra for all of our configuration, and we use hierarchical configuration groups to keep everything organised.
In particular, we have the following configuration headings, with the base `ppo` config looking like:
```yaml
defaults:
  - env: entity
  - env_size: s
  - learning:
      - ppo-base
      - ppo-rnn
  - misc: misc
  - eval: s
  - eval_env_size: s
  - train_levels: random
  - model:
      - model-base
      - model-transformer
  - _self_

seed: 0
```
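Each entry under `defaults` selects a preset from one of the configuration groups described below. Whole groups, as well as individual keys, can be overridden from the command line using standard Hydra syntax; as a minimal sketch (the particular values here are illustrative, using the presets documented below):

```bash
# Swap the env and env_size presets and change the seed (illustrative values)
python3 experiments/ppo.py env=symbolic env_size=m seed=1
```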
Configuration Headings
Env
This controls the environment to be used.
Preset Options
We provide two options in `configs/env`, namely `entity` and `symbolic`; each of these can be used by running `python3 experiments/ppo.py env=entity` or `python3 experiments/ppo.py env=symbolic`. If you wish to customise the options further, you can add any of the following subkeys (e.g. by running `python3 experiments/ppo.py env=symbolic env.dense_reward_scale=0.0`):
Individual Subkeys
- `env.env_name`: The name of the environment, which controls the observation and action space.
- `env.dense_reward_scale`: How large the dense reward scale is; set this to zero to disable dense rewards.
- `env.frame_skip`: The number of frames to skip; setting this to 2 (the default) seems to perform better.
Env Size
This controls the maximum number of shapes present in the simulation. This trades off speed against representational power: small environments run much faster, but some complex environments require a large number of shapes. See `configs/env_size`.
Preset Options
- `s`: The small preset
- `m`: The medium preset
- `l`: The large preset
- `custom`: Allows the use of a custom environment size loaded from a JSON file (see here for more).
Individual Subkeys
- `num_polygons`: How many polygons
- `num_circles`: How many circles
- `num_joints`: How many joints
- `num_thrusters`: How many thrusters
- `env_size_name`: "s", "m" or "l"
- `num_motor_bindings`: How many different joint bindings there are, i.e. how many distinct actions are associated with joints. All joints with the same binding will have the same action applied to them.
- `num_thruster_bindings`: How many different thruster bindings there are
- `env_size_type`: "predefined" or "custom"
- `custom_path`: Only for `env_size_type=custom`; controls the JSON file to load the custom environment size from.
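For example, a run with a custom environment size might be launched as in the sketch below; the JSON path is purely hypothetical, and the `env_size.` prefix assumes these subkeys are namespaced under `env_size`, analogous to the `env.dense_reward_scale` example above:

```bash
# Use a custom environment size loaded from a JSON file (hypothetical path)
python3 experiments/ppo.py env_size=custom env_size.custom_path=path/to/my_env_size.json
```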
Learning
This controls the agent's learning; see `configs/learning`.
Preset Options
- `ppo-base`: This has all of the base PPO parameters and is used by all methods
- `ppo-rnn`: This has the PureJaxRL settings for some of PPO's hyperparameters (mainly `num_steps` is different)
- `ppo-sfl`: This has the SFL-specific value of `num_steps`
- `ppo-ued`: This has the JAXUED-specific `num_steps` and `outer_rollout_steps`
Individual Subkeys
- `lr`: Learning rate
- `anneal_lr`: Whether to anneal the LR
- `warmup_lr`: Whether to warm up the LR
- `peak_lr`: If warming up, the peak LR
- `initial_lr`: If warming up, the initial LR
- `warmup_frac`: If warming up, the warmup fraction of training time
- `max_grad_norm`: Maximum grad norm
- `total_timesteps`: How many total environment interactions to run
- `num_train_envs`: Number of parallel environments to run simultaneously
- `num_minibatches`: Number of minibatches for PPO learning
- `gamma`: Discount factor
- `update_epochs`: PPO update epochs
- `clip_eps`: PPO clipping epsilon
- `gae_lambda`: PPO lambda for GAE
- `ent_coef`: Entropy loss coefficient
- `vf_coef`: Value function loss coefficient
- `permute_state_during_training`: If true, the state is permuted on every reset.
- `filter_levels`: If true, and we are training on random levels, this filters out levels that can be solved by a no-op
- `level_filter_n_steps`: How many steps to allocate to the no-op policy for filtering
- `level_filter_sample_ratio`: How many more levels to sample than required (ideally `level_filter_sample_ratio` is more than the fraction that will be filtered out).
- `num_steps`: PPO rollout length
- `outer_rollout_steps`: How many learning steps to do for e.g. PLR for each rollout (see the Craftax paper for a more in-depth explanation).
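Individual learning hyperparameters can be overridden in the same way as the `env` keys; the sketch below uses purely illustrative values and assumes the keys are namespaced under `learning.` (drop the prefix if your config packages them at the top level):

```bash
# Override a few PPO hyperparameters (illustrative values, not recommendations)
python3 experiments/ppo.py learning.lr=1e-4 learning.gamma=0.995 learning.num_train_envs=1024
```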
Misc
A plethora of miscellaneous options are grouped under the `misc` category. There is only one preset option, `configs/misc/misc.yaml`.
Individual Subkeys
- `group`: Wandb group ("auto" usually works well)
- `group_auto_prefix`: If using `group=auto`, this is a user-defined prefix
- `save_path`: Where to save checkpoints to
- `use_wandb`: Whether to log to wandb
- `save_policy`: Whether to save the policy
- `wandb_project`: Wandb project
- `wandb_entity`: Wandb entity; leave as `null` to use your default one
- `wandb_mode`: Wandb mode
- `video_frequency`: How often to log videos (they are quite large)
- `load_from_checkpoint`: Wandb artifact path to load from
- `load_only_params`: Whether to load just the network parameters or the entire train state.
- `checkpoint_save_freq`: How often to log checkpoints
- `checkpoint_human_numbers`: Whether the checkpoints should have human-readable timestep numbers
- `load_legacy_checkpoint`: Do not use
- `load_train_levels_legacy`: Do not use
- `economical_saving`: If true, only saves a few important checkpoints for space conservation purposes.
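As a sketch, resuming from a saved checkpoint with wandb logging disabled might look like the following; the artifact path is a placeholder, and the `misc.` prefix assumes these keys are namespaced under `misc`:

```bash
# Load only the network parameters from a wandb artifact and disable wandb logging (placeholder path)
python3 experiments/ppo.py misc.load_from_checkpoint='<wandb artifact path>' misc.load_only_params=true misc.use_wandb=false
```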
Eval
This option (see `configs/eval`) controls how evaluation works and which levels are used.
Preset Options
- `s`: Eval on the `s` hand-designed levels located in `worlds/s`
- `m`: Eval on the `m` hand-designed levels located in `worlds/m`
- `l`: Eval on the `l` hand-designed levels located in `worlds/l`
- `eval_all`: Eval on all of the hand-designed eval levels
- `eval_auto`: If `train_levels` is not random, evaluate on the training levels.
- `mujoco`: Eval on the recreations of the MuJoCo tasks.
- `eval_general`: General option if you are planning on overwriting most options.
Individual Subkeys
- `eval_levels`: List of eval levels, or the string "auto"
- `eval_num_attempts`: How many times to eval on the same level
- `eval_freq`: How often to evaluate
- `EVAL_ON_SAMPLED`: If true, in `plr.py` and `sfl.py`, evaluates on a fixed set of randomly-generated levels
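For instance, evaluating more frequently and averaging over more attempts per level could be sketched as below (illustrative values, assuming the keys are namespaced under `eval.`):

```bash
# Evaluate more often and average over more attempts per level (illustrative values)
python3 experiments/ppo.py eval.eval_freq=512 eval.eval_num_attempts=10
```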
Eval Env Size
This controls the size of the evaluation environment. It is crucial that this matches the size of the evaluation levels.
Preset Options
- `s`: Same as the `env_size` option.
- `m`: Same as the `env_size` option.
- `l`: Same as the `env_size` option.
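For example, when evaluating on the `m` hand-designed levels, the evaluation environment size should be set to match:

```bash
# Match the evaluation environment size to the evaluation levels
python3 experiments/ppo.py eval=m eval_env_size=m
```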
Train Levels
Which levels to train on.
Preset Options
- `s`: All of the `s` holdout levels
- `m`: All of the `m` holdout levels
- `l`: All of the `l` holdout levels
- `train_all`: All of the levels from all 3 holdout sets
- `mujoco`: All of the MuJoCo recreation levels.
- `random`: Train on random levels
Individual Subkeys
- `train_level_mode`: "random" or "list"
- `train_level_distribution`: If `train_level_mode=random`, this controls which distribution to use. By default `distribution_v3`.
- `train_levels_list`: This is a list of levels to train on.
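For example, training on the MuJoCo recreation levels and evaluating on those same levels (via the `eval_auto` preset) could be sketched as:

```bash
# Train on the mujoco recreation levels and evaluate on the training levels
python3 experiments/ppo.py train_levels=mujoco eval=eval_auto
```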
Model
This controls the model architecture and options associated with that.
Preset Options
We use both of the following:
- `model-base`
- `model-entity`
Individual Subkeys
- `fc_layer_depth`: How many layers in the FC model
- `fc_layer_width`: How wide each FC layer is
- `activation`: NN activation
- `recurrent_model`: Whether or not to use recurrence
The following are only relevant when using `env=entity`:
- `transformer_depth`: How many transformer layers to use
- `transformer_size`: How large the KQV vectors are
- `transformer_encoder_size`: How large the initial embeddings are
- `num_heads`: How many attention heads; must be a multiple of 4 and divide `transformer_size` evenly.
- `full_attention_mask`: If true, all heads use the full attention mask
- `aggregate_mode`: `dummy_and_mean` works well.
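Model options can be overridden like any other key; the sketch below uses illustrative values and assumes the keys are namespaced under `model.`:

```bash
# A wider FC model and a deeper transformer (illustrative values)
python3 experiments/ppo.py model.fc_layer_width=256 model.transformer_depth=4
```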
UED
Options pertaining to UED (i.e., when using the scripts `plr.py` or `sfl.py`).
Preset Options
- `sfl`
- `plr`
- `accel`
Individual Subkeys
See the individual files for the configuration options used. For SFL, we have:
- `sampled_envs_ratio`: How many environments are from the SFL buffer and how many are randomly generated
- `batch_size`: How many levels to evaluate learnability on per batch
- `num_batches`: How many batches to run when choosing the most learnable levels
- `rollout_steps`: How many steps to roll out for when doing the learnability calculation.
- `num_to_save`: How many levels to save in the learnability buffer
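As a sketch, assuming `sfl.py` lives alongside `ppo.py` in `experiments/` and that these options are namespaced under `ued.`, an SFL run overriding a couple of them might look like:

```bash
# Run SFL with a different buffer/random split and buffer size (illustrative values, assumed key prefix)
python3 experiments/sfl.py ued.sampled_envs_ratio=0.5 ued.num_to_save=1000
```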