Configuration

We use Hydra for all of our configurations, composed hierarchically to keep everything organised.

In particular, we have the following configuration headings, with the base PPO config looking like:

defaults:
  - env: entity
  - env_size: s
  - learning:
    - ppo-base
    - ppo-rnn
  - misc: misc
  - eval: s
  - eval_env_size: s
  - train_levels: random
  - model:
    - model-base
    - model-transformer
  - _self_
seed: 0
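
Any of these groups, as well as individual values such as seed, can be overridden from the command line. As a minimal sketch:

python3 experiments/ppo.py seed=1 train_levels=mujoco eval=mujoco

Here seed overrides a single value, while train_levels and eval each select a preset from the corresponding directory under configs/.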

Configuration Headings

Env

This controls the environment to be used.

Preset Options

We provide two options in configs/env, namely entity and symbolic; each of these can be used by running python3 experiments/ppo.py env=symbolic or python3 experiments/ppo.py env=entity. If you wish to customise the options further, you can add any of the following subkeys (e.g. by running python3 experiments/ppo.py env=symbolic env.dense_reward_scale=0.0):

Individual Subkeys

  • env.env_name: The name of the environment, which controls the observation and action space.
  • env.dense_reward_scale: The scale of the dense reward; set this to zero to disable dense rewards.
  • env.frame_skip: The number of frames to skip; the default of 2 seems to perform better.

Env Size

This controls the maximum number of shapes present in the simulation. This has two important tradeoffs, namely speed and representational power: small environments run much faster, but some complex environments require a large number of shapes. See configs/env_size.

Preset Options

  • s: The small preset
  • m: Medium preset
  • l: Large preset
  • custom: Allows the use of a custom environment size loaded from a JSON file (see custom_path below).

Individual Subkeys

  • num_polygons: The maximum number of polygons
  • num_circles: The maximum number of circles
  • num_joints: The maximum number of joints
  • num_thrusters: The maximum number of thrusters
  • env_size_name: "s", "m" or "l"
  • num_motor_bindings: How many distinct joint bindings there are, i.e. how many distinct actions are associated with joints; all joints with the same binding have the same action applied to them.
  • num_thruster_bindings: How many distinct thruster bindings there are
  • env_size_type: "predefined" or "custom"
  • custom_path: Only used when env_size_type=custom; controls the JSON file to load the custom environment size from.
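
For example, to use a custom environment size (a sketch, assuming the same group-prefixed override pattern as env above; the JSON filename is hypothetical):

python3 experiments/ppo.py env_size=custom env_size.custom_path=my_env_size.json

Alternatively, individual sizes of a preset can be tweaked directly, e.g. python3 experiments/ppo.py env_size=s env_size.num_polygons=8.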

Learning

This controls the agent's learning; see configs/learning.

Preset Options

  • ppo-base: This has all of the base PPO parameters, and is used by all methods
  • ppo-rnn: This has the PureJaxRL settings for some of PPO's hyperparameters (mainly num_steps is different)
  • ppo-sfl: This has the SFL-specific value of num_steps
  • ppo-ued: This has the JAXUED-specific num_steps and outer_rollout_steps
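
These presets are composed in an experiment's defaults list. As a sketch mirroring the base PPO config above, an SFL experiment would presumably swap ppo-rnn for ppo-sfl:

defaults:
  - learning:
    - ppo-base
    - ppo-sfl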

Individual Subkeys

  • lr: Learning Rate
  • anneal_lr: Whether to anneal LR
  • warmup_lr: Whether to warmup LR
  • peak_lr: If warming up, the peak LR
  • initial_lr: If warming up, the initial LR
  • warmup_frac: If warming up, the warmup fraction of training time
  • max_grad_norm: Maximum grad norm
  • total_timesteps: How many total environment interactions must be run
  • num_train_envs: Number of parallel environments to run simultaneously
  • num_minibatches: Minibatches for PPO learning
  • gamma: Discount factor
  • update_epochs: PPO update epochs
  • clip_eps: PPO clipping epsilon
  • gae_lambda: PPO Lambda for GAE
  • ent_coef: Entropy loss coefficient
  • vf_coef: Value function loss coefficient
  • permute_state_during_training: If true, the state is permuted on every reset.
  • filter_levels: If true, and we are training on random levels, this filters out levels that can be solved by a no-op
  • level_filter_n_steps: How many steps to allocate to the no-op policy for filtering
  • level_filter_sample_ratio: The multiple of the required number of levels to sample (this should be large enough that, after filtering out no-op-solvable levels, enough levels remain).
  • num_steps: PPO rollout length
  • outer_rollout_steps: How many learning steps to do per rollout for methods such as PLR (see the Craftax paper for a more in-depth explanation).
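
A sketch of overriding a few of these from the command line (the values are purely illustrative, and we assume the same group-prefixed pattern as env above):

python3 experiments/ppo.py learning.lr=1e-4 learning.num_train_envs=512 learning.ent_coef=0.01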

Misc

There are a plethora of miscellaneous options that are grouped under the misc category. There is only one preset option, configs/misc/misc.yaml.

Individual Subkeys

  • group: Wandb group ("auto" usually works well)
  • group_auto_prefix: If using group=auto, this is a user-defined prefix
  • save_path: Where to save checkpoints to
  • use_wandb: Whether to log to wandb
  • save_policy: Should we save the policy
  • wandb_project: Wandb project
  • wandb_entity: Wandb entity, leave as null to use your default one
  • wandb_mode: Wandb mode
  • video_frequency: How often to log videos (they are quite large)
  • load_from_checkpoint: Wandb artifact path to load from
  • load_only_params: Whether to load just the network parameters or the entire train state.
  • checkpoint_save_freq: How often to save checkpoints
  • checkpoint_human_numbers: Should the checkpoints have human-readable timestep numbers
  • load_legacy_checkpoint: Do not use
  • load_train_levels_legacy: Do not use
  • economical_saving: If true, only saves a few important checkpoints for space conservation purposes.
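
For instance, a quick local run without logging might look like (a sketch; these keys are documented above, the combination is illustrative):

python3 experiments/ppo.py misc.use_wandb=false misc.save_policy=false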

Eval

This option (see configs/eval) controls how evaluation works, and what levels are used.

Preset Options

  • s: Eval on the s hand-designed levels located in worlds/s
  • m: Eval on the m hand-designed levels located in worlds/m
  • l: Eval on the l hand-designed levels located in worlds/l
  • eval_all: Eval on all of the hand-designed eval levels
  • eval_auto: If train_levels is not random, evaluate on the training levels.
  • mujoco: Eval on the recreations of the mujoco tasks.
  • eval_general: General option if you are planning on overriding most options.

Individual Subkeys

  • eval_levels: List of eval levels or the string "auto"
  • eval_num_attempts: How many times to eval on the same level
  • eval_freq: How often to evaluate
  • EVAL_ON_SAMPLED: If true, in plr.py and sfl.py, evaluates on a fixed set of randomly-generated levels
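
As a sketch, evaluating on all of the hand-designed levels with more attempts per level (the value is illustrative):

python3 experiments/ppo.py eval=eval_all eval.eval_num_attempts=10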

Eval Env Size

This controls the size of the evaluation environment; it is crucial that this matches the size of the evaluation levels.

Preset Options

  • s: Same as the env_size option.
  • m: Same as the env_size option.
  • l: Same as the env_size option.
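
For example, to train and evaluate on the large environment size (a sketch keeping the sizes in sync):

python3 experiments/ppo.py env_size=l eval=l eval_env_size=l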

Train Levels

Which levels to train on.

Preset Options

  • s: All of the s holdout levels
  • m: All of the m holdout levels
  • l: All of the l holdout levels
  • train_all: All of the levels from all 3 holdout sets
  • mujoco: All of the mujoco recreation levels.
  • random: Train on random levels

Individual Subkeys

  • train_level_mode: "random" or "list"
  • train_level_distribution: If train_level_mode=random, this controls which distribution to use (by default, distribution_v3).
  • train_levels_list: If train_level_mode=list, this is the list of levels to train on.
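
As a sketch, training on an explicit list of levels (assuming the same group-prefixed override pattern as the other groups; the level filename is hypothetical):

python3 experiments/ppo.py train_levels.train_level_mode=list 'train_levels.train_levels_list=[worlds/l/my_level.json]'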

Model

This controls the model architecture and options associated with that.

Preset Options

We use both of the following:

  • model-base
  • model-entity

Individual Subkeys

  • fc_layer_depth: How many layers in the FC model
  • fc_layer_width: How wide is each FC layer
  • activation: NN activation
  • recurrent_model: Whether or not to use recurrence

The following are just relevant when using env=entity:

  • transformer_depth: How many transformer layers to use
  • transformer_size: How large are the KQV vectors
  • transformer_encoder_size: How large are the initial embeddings
  • num_heads: How many heads; must be a multiple of 4 and divide transformer_size evenly.
  • full_attention_mask: If true, all heads use the full attention mask
  • aggregate_mode: dummy_and_mean works well.
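
A sketch of overriding the transformer options (the values are illustrative; note that with transformer_size=128, num_heads=8 is a multiple of 4 and divides 128 evenly):

python3 experiments/ppo.py model.transformer_depth=2 model.transformer_size=128 model.num_heads=8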

UED

Options pertaining to UED (i.e., when using the scripts plr.py or sfl.py)

Preset Options

  • sfl
  • plr
  • accel

Individual Subkeys

See the individual files for the configuration options used. For SFL, we have:

  • sampled_envs_ratio: The fraction of environments drawn from the SFL buffer, with the remainder randomly generated
  • batch_size: How many levels to evaluate learnability on per batch
  • num_batches: How many batches to run when choosing the most learnable levels
  • rollout_steps: How many steps to roll out when computing learnability.
  • num_to_save: How many levels to save in the learnability buffer
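
Putting this together, an SFL run might look like (a sketch; the ued group name and the values are assumptions based on the presets above):

python3 experiments/sfl.py ued=sfl ued.sampled_envs_ratio=0.5 ued.num_to_save=512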