|
2024-09-23 09:14:22,582 INFO MainThread:78108 [wandb_setup.py:_flush():77] Current SDK version is 0.18.1 |
|
2024-09-23 09:14:22,582 INFO MainThread:78108 [wandb_setup.py:_flush():77] Configure stats pid to 78108 |
|
2024-09-23 09:14:22,582 INFO MainThread:78108 [wandb_setup.py:_flush():77] Loading settings from /root/.config/wandb/settings |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_setup.py:_flush():77] Loading settings from /root/SuperTinyLanguageModels/outputs/2024-09-23/08-40-08/wandb/settings |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_setup.py:_flush():77] Loading settings from environment variables: {} |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_setup.py:_flush():77] Applying setup settings: {'mode': None, '_disable_service': None} |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_setup.py:_flush():77] Inferring run settings from compute environment: {'program_relpath': 'train.py', 'program_abspath': '/root/SuperTinyLanguageModels/train.py', 'program': '/root/SuperTinyLanguageModels/train.py'} |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_setup.py:_flush():77] Applying login settings: {} |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_init.py:_log_setup():532] Logging user logs to /root/SuperTinyLanguageModels/outputs/2024-09-23/08-40-08/wandb/run-20240923_091422-a2kxhd8v/logs/debug.log |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_init.py:_log_setup():533] Logging internal logs to /root/SuperTinyLanguageModels/outputs/2024-09-23/08-40-08/wandb/run-20240923_091422-a2kxhd8v/logs/debug-internal.log |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_init.py:init():616] calling init triggers |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_init.py:init():623] wandb.init called with sweep_config: {} |
|
config: {'model': {'core_model_type': 'pass_through', 'hidden_dim': 384, 'byte_hidden': 128, 'max_chunk_length': 12, 'max_num_chunks': 1024, 'num_delimiter_layers': 3, 'num_byte_decoder_layers': 5, 'target_chunk_len': 8.0, 'chunk_len_loss_weight': 0.1, 'chunk_len_penalty': 0.1, 'context_window': 8192, 'embedding_model_type': 'byte_level', 'tokenizer_type': 'bpe', 'tokenizer_dataset_name': 'simple_en_wiki', 'tokenizer_simplify_data': True, 'vocab_size': 259, 'lm_head_type': 'byte_level', 'lm_head_normalization': 'rms_norm', 'lm_head_bias': False, 'lm_head_dropout': 0.0, 'model_shell_type': 'byte_autoencoder_shell', 'embedding_weight_tying': True, 'ffn_weight_tying': False, 'cproj_weight_tying': False, 'positional_encoding_type': 'rope'}, 'trainer': {'trainer_type': 'base_trainer', 'dataset': 'fineweb_edu_10B', 'batch_size': 6, 'gradient_accumulation_steps': 8, 'max_iters': 10000, 'eval_interval': 50000000, 'log_interval': 1, 'checkpoint_interval': 1000, 'eval_iters': 1000, 'run_eval': False, 'eval': {'mcq_benchmarks': None, 'mcq_num_samples': 1000, 'eval_byte_metrics': False, 'text_modeling_eval': False, 'text_generation_eval': False}, 'optimizer': {'optimizer_name': 'adamW', 'lr': 0.0005, 'min_lr': 5e-05, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.95, 'grad_clip': 1.0}, 'lr_scheduler': {'name': 'cosine', 'warmup_iters': 100}, 'dataloader': {'name': 'autoencoder'}, 'datasampling': {'name': 'standard'}, 'loss_fn': {'name': 'pass_through'}}, 'general': {'logging': {'wandb_log': True, 'wandb_project': 'SuperTinyLanguageModels', 'wandb_run_name': None, 'group_name': 'experimental_byte_level'}, 'paths': {'output_dir': 'outputs', 'data_dir': '/root/SuperTinyLanguageModels/data', 'checkpoint_dir': 'checkpoints', 'eval_dir': '/root/SuperTinyLanguageModels/evals'}, 'seed': 489, 'device': 'cuda'}} |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_init.py:init():666] starting backend |
|
2024-09-23 09:14:22,583 INFO MainThread:78108 [wandb_init.py:init():670] setting up manager |
|
2024-09-23 09:14:22,584 INFO MainThread:78108 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn |
|
2024-09-23 09:14:22,586 INFO MainThread:78108 [wandb_init.py:init():678] backend started and connected |
|
2024-09-23 09:14:22,588 INFO MainThread:78108 [wandb_init.py:init():773] updated telemetry |
|
2024-09-23 09:14:22,598 INFO MainThread:78108 [wandb_init.py:init():806] communicating run to backend with 90.0 second timeout |
|
2024-09-23 09:14:22,974 INFO MainThread:78108 [wandb_init.py:init():857] starting run threads in backend |
|
2024-09-23 09:14:23,128 INFO MainThread:78108 [wandb_run.py:_console_start():2459] atexit reg |
|
2024-09-23 09:14:23,128 INFO MainThread:78108 [wandb_run.py:_redirect():2307] redirect: wrap_raw |
|
2024-09-23 09:14:23,129 INFO MainThread:78108 [wandb_run.py:_redirect():2372] Wrapping output streams. |
|
2024-09-23 09:14:23,129 INFO MainThread:78108 [wandb_run.py:_redirect():2397] Redirects installed. |
|
2024-09-23 09:14:23,135 INFO MainThread:78108 [wandb_init.py:init():900] run started, returning control to user process |
|
2024-09-23 09:14:27,104 WARNING MsgRouterThr:78108 [router.py:message_loop():77] message_loop has been closed |
|
|