What is the training command for this model (mainly about the 2B)?
Hey @edbeeching ,
Thank you very much for these policies and for the JAT dataset created with these policies.
I'm trying to create similar policies for some Atari envs outside the Atari 57.
Is the training command used to create these with Sample Factory the following?

```shell
python -m sf_examples.atari.train_atari --algo=APPO --env=${ENV} --train_for_env_steps=2000000000 --experiment="atari_2B_${ENV}_1111"
```
In particular, does the 2B in the model name mean 2 billion train_for_env_steps? The default value of that argument appears to be 100 million.
Thank you,
Kaustubh
Actually, nvm, I think I found the training command inside here: https://huggingface.co/edbeeching/atari_2B_atari_mspacman_1111/blob/main/cfg.json#L124
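For anyone else checking hyperparameters this way, here is a minimal sketch for reading train_for_env_steps out of a locally saved copy of that cfg.json (the local path and helper name are my own assumptions, not from the repo):

```python
import json

def read_env_steps(cfg_path: str = "cfg.json") -> int:
    """Return the train_for_env_steps value stored in a Sample Factory cfg.json."""
    with open(cfg_path) as f:
        cfg = json.load(f)
    return cfg["train_for_env_steps"]
```

For the 2B checkpoints this should come back as 2000000000, matching the "2B" in the model name.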
One last question: when I try to run the command above, I get the following error:

```
train_atari.py: error: unrecognized arguments: --env_agents=512
```

Can I just remove --env_agents=512?
@edbeeching
Edit:
The reason I got the above error was because I was using sf_examples.atari.train_atari instead of sf_examples.envpool.atari.train_envpool_atari.
The correct training command is:

```shell
python -m sf_examples.envpool.atari.train_envpool_atari \
  --seed=1111 --experiment=atari_2B_${ENV}_1111 --env=${ENV} \
  --train_for_seconds=3600000 --algo=APPO --gamma=0.99 \
  --num_workers=4 --num_envs_per_worker=1 --worker_num_splits=1 \
  --env_agents=512 --benchmark=False --max_grad_norm=0.0 \
  --decorrelate_experience_max_seconds=1 \
  --encoder_conv_architecture=convnet_atari --encoder_conv_mlp_layers 512 \
  --nonlinearity=relu --num_policies=1 \
  --normalize_input=True --normalize_input_keys obs --normalize_returns=True \
  --async_rl=True --batched_sampling=True \
  --train_for_env_steps=2000000000 --save_milestones_sec=1200 \
  --train_dir train_dir --rollout 64 \
  --exploration_loss_coeff 0.0004677351413 \
  --num_epochs 2 --batch_size 1024 --num_batches_per_epoch 8 \
  --learning_rate 0.0003033891184
```
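Before committing to a full 2B-step run on a new env, it may help to smoke-test the setup with a much smaller budget. This is an unverified sketch that reuses flags from the command above (the experiment name and reduced values are my own choices), assuming sample-factory with envpool support is installed:

```shell
# Quick sanity check: same entry point, tiny step budget and fewer env agents.
ENV=atari_mspacman
python -m sf_examples.envpool.atari.train_envpool_atari \
  --env=${ENV} --experiment=smoke_${ENV} \
  --algo=APPO --async_rl=True --batched_sampling=True \
  --env_agents=64 --train_for_env_steps=100000
```

If this runs without the unrecognized-arguments error, the full command should work too.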