2025-02-21 14:28:39,894 INFO Thread-1975 (_run_job):224 [wandb_setup.py:_flush():68] Current SDK version is 0.19.6
2025-02-21 14:28:39,894 INFO Thread-1975 (_run_job):224 [wandb_setup.py:_flush():68] Configure stats pid to 224
2025-02-21 14:28:39,894 INFO Thread-1975 (_run_job):224 [wandb_setup.py:_flush():68] Loading settings from /home/jovyan/.config/wandb/settings
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_setup.py:_flush():68] Loading settings from /home/jovyan/active-projects/summary-scoring/src/wandb/settings
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_setup.py:_flush():68] Loading settings from environment variables
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_init.py:setup_run_log_directory():637] Logging user logs to /home/jovyan/active-projects/summary-scoring/bin/wandb/run-20250221_142839-lzvoiqw9/logs/debug.log
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_init.py:setup_run_log_directory():638] Logging internal logs to /home/jovyan/active-projects/summary-scoring/bin/wandb/run-20250221_142839-lzvoiqw9/logs/debug-internal.log
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_init.py:monkeypatch_ipython():589] configuring jupyter hooks <wandb.sdk.wandb_init._WandbInit object at 0x7fb7ec9acb50>
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():756] calling init triggers
2025-02-21 14:28:39,895 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():761] wandb.init called with sweep_config: {'batch_size': 16, 'epochs': 10, 'learning_rate': 8e-05, 'warmup_steps': 1000}
config: {'_wandb': {}}
2025-02-21 14:28:39,896 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():789] starting backend
2025-02-21 14:28:40,101 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():793] sending inform_init request
2025-02-21 14:28:40,104 INFO Thread-1975 (_run_job):224 [backend.py:_multiprocessing_setup():97] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-02-21 14:28:40,105 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():808] backend started and connected
2025-02-21 14:28:40,107 INFO Thread-1975 (_run_job):224 [wandb_run.py:_config_callback():1253] config_cb None None {'batch_size': 16, 'epochs': 10, 'learning_rate': 8e-05, 'warmup_steps': 1000}
2025-02-21 14:28:40,116 INFO Thread-1975 (_run_job):224 [wandb_run.py:_label_probe_notebook():1196] probe notebook
2025-02-21 14:28:40,116 INFO Thread-1975 (_run_job):224 [wandb_run.py:_label_probe_notebook():1206] Unable to probe notebook: 'NoneType' object has no attribute 'get'
2025-02-21 14:28:40,116 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():901] updated telemetry
2025-02-21 14:28:40,126 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():936] communicating run to backend with 90.0 second timeout
2025-02-21 14:28:40,352 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():994] starting run threads in backend
2025-02-21 14:28:40,499 INFO Thread-1975 (_run_job):224 [wandb_run.py:_console_start():2385] atexit reg
2025-02-21 14:28:40,499 INFO Thread-1975 (_run_job):224 [wandb_run.py:_redirect():2235] redirect: wrap_raw
2025-02-21 14:28:40,499 INFO Thread-1975 (_run_job):224 [wandb_run.py:_redirect():2300] Wrapping output streams.
2025-02-21 14:28:40,499 INFO Thread-1975 (_run_job):224 [wandb_run.py:_redirect():2325] Redirects installed.
2025-02-21 14:28:40,500 INFO Thread-1975 (_run_job):224 [wandb_init.py:init():1036] run started, returning control to user process
2025-02-21 14:28:42,285 INFO Thread-1975 (_run_job):224 [wandb_run.py:_config_callback():1253] config_cb None None {'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['ModernBertForMaskedLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0'}, 'label2id': {'LABEL_0': 0}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50281, 'pad_token_id': 50283, 'eos_token_id': 50282, 'sep_token_id': 50282, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'answerdotai/ModernBERT-base', '_attn_implementation_autoset': True, 'transformers_version': '4.48.3', 'cls_token_id': 50281, 'gradient_checkpointing': False, 'layer_norm_eps': 1e-05, 'model_type': 'modernbert', 'position_embedding_type': 'absolute', 'vocab_size': 50368, 'max_position_embeddings': 8192, 'hidden_size': 768, 'intermediate_size': 1152, 'num_hidden_layers': 22, 'num_attention_heads': 12, 'initializer_range': 0.02, 'initializer_cutoff_factor': 2.0, 'norm_eps': 1e-05, 'norm_bias': False, 
'global_rope_theta': 160000.0, 'attention_bias': False, 'attention_dropout': 0.0, 'hidden_activation': 'gelu', 'global_attn_every_n_layers': 3, 'local_attention': 128, 'local_rope_theta': 10000.0, 'embedding_dropout': 0.0, 'mlp_bias': False, 'mlp_dropout': 0.0, 'decoder_bias': True, 'classifier_pooling': 'mean', 'classifier_dropout': 0.0, 'classifier_bias': False, 'classifier_activation': 'gelu', 'deterministic_flash_attn': False, 'sparse_prediction': False, 'sparse_pred_ignore_index': -100, 'reference_compile': None, 'repad_logits_with_grad': False, 'output_dir': '../bin', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'epoch', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 16, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 10, 'max_steps': -1, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'log_level': 'error', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '../logs/content', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 500, 'logging_nan_inf_filter': True, 'save_strategy': 'no', 'save_steps': 500, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': 
False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '../bin', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': 'mse', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'epoch', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': 
False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'average_tokens_across_devices': False}
2025-02-21 14:28:42,290 INFO Thread-1975 (_run_job):224 [wandb_config.py:__setitem__():154] config set model/num_parameters = 149605633 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7fb810e13d90>>
2025-02-21 14:28:42,290 INFO Thread-1975 (_run_job):224 [wandb_run.py:_config_callback():1253] config_cb model/num_parameters 149605633 None
2025-02-21 14:59:38,384 INFO MainThread:224 [jupyter.py:save_ipynb():386] not saving jupyter notebook
2025-02-21 14:59:38,384 INFO MainThread:224 [wandb_init.py:_pause_backend():554] pausing backend
2025-02-21 14:59:38,569 WARNING MsgRouterThr:224 [router.py:message_loop():75] message_loop has been closed
2025-02-21 14:59:38,862 INFO Thread-1975 (_run_job):224 [wandb_run.py:_finish():2110] finishing run tiedaar1/modernbert-summary/lzvoiqw9
2025-02-21 14:59:38,862 ERROR Thread-1975 (_run_job):224 [jupyter.py:save_history():450] Run pip install nbformat to save notebook history
2025-02-21 14:59:38,862 INFO Thread-1975 (_run_job):224 [jupyter.py:save_ipynb():386] not saving jupyter notebook
2025-02-21 14:59:38,863 INFO Thread-1975 (_run_job):224 [wandb_init.py:_jupyter_teardown():571] cleaning up jupyter logic
2025-02-21 14:59:38,863 INFO Thread-1975 (_run_job):224 [wandb_run.py:_atexit_cleanup():2350] got exitcode: 1
2025-02-21 14:59:38,863 INFO Thread-1975 (_run_job):224 [wandb_run.py:_restore():2332] restore
2025-02-21 14:59:38,863 INFO Thread-1975 (_run_job):224 [wandb_run.py:_restore():2338] restore done
2025-02-21 14:59:38,864 INFO Thread-1975 (_run_job):224 [wandb_run.py:_restore():2332] restore
2025-02-21 14:59:38,864 INFO Thread-1975 (_run_job):224 [wandb_run.py:_restore():2338] restore done
2025-02-21 14:59:38,864 ERROR Thread-1975 (_run_job):224 [wandb_run.py:_atexit_cleanup():2371] Problem finishing run
Traceback (most recent call last):
File "/tmp/ipykernel_224/3225765675.py", line 41, in train
trainer.train()
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/trainer.py", line 2597, in _inner_training_loop
self.control = self.callback_handler.on_step_end(args, self.state, self.control)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/trainer_callback.py", line 497, in on_step_end
return self.call_event("on_step_end", args, state, control)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/trainer_callback.py", line 519, in call_event
result = getattr(callback, event)(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/utils/notebook.py", line 311, in on_step_end
self.training_tracker.update(
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/utils/notebook.py", line 167, in update
self.update_bar(value)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/utils/notebook.py", line 192, in update_bar
self.display()
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/transformers/utils/notebook.py", line 235, in display
self.output.update(disp.HTML(self.html_code))
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/IPython/core/display_functions.py", line 374, in update
update_display(obj, display_id=self.display_id, **kwargs)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/IPython/core/display_functions.py", line 326, in update_display
display(obj, display_id=display_id, **kwargs)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/IPython/core/display_functions.py", line 305, in display
publish_display_data(data=format_dict, metadata=md_dict, **kwargs)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/IPython/core/display_functions.py", line 93, in publish_display_data
display_pub.publish(
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 600, in publish
ipython.display_pub._orig_publish(data, metadata=metadata, **kwargs)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/zmqshell.py", line 103, in publish
self._flush_streams()
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/zmqshell.py", line 66, in _flush_streams
sys.stdout.flush()
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/iostream.py", line 604, in flush
self.pub_thread.schedule(self._flush)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/iostream.py", line 267, in schedule
self._event_pipe.send(b"")
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/zmq/sugar/socket.py", line 707, in send
return super().send(data, flags=flags, copy=copy, track=track)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "_zmq.py", line 1092, in zmq.backend.cython._zmq.Socket.send
File "_zmq.py", line 1134, in zmq.backend.cython._zmq.Socket.send
File "_zmq.py", line 1209, in zmq.backend.cython._zmq._check_closed
zmq.error.ZMQError: Socket operation on non-socket
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/agents/pyagent.py", line 306, in _run_job
self._function()
File "/tmp/ipykernel_224/3225765675.py", line 3, in train
with wandb.init():
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 3623, in __exit__
traceback.print_exception(exc_type, exc_val, exc_tb)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/traceback.py", line 125, in print_exception
te.print(file=file, chain=chain)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/traceback.py", line 1022, in print
print(line, file=file, end="")
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/lib/redirect.py", line 645, in write
self._old_write(data)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/iostream.py", line 694, in write
self._schedule_flush()
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/iostream.py", line 590, in _schedule_flush
self.pub_thread.schedule(_schedule_in_thread)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/ipykernel/iostream.py", line 267, in schedule
self._event_pipe.send(b"")
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/zmq/sugar/socket.py", line 707, in send
return super().send(data, flags=flags, copy=copy, track=track)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "_zmq.py", line 1092, in zmq.backend.cython._zmq.Socket.send
File "_zmq.py", line 1134, in zmq.backend.cython._zmq.Socket.send
File "_zmq.py", line 1209, in zmq.backend.cython._zmq._check_closed
zmq.error.ZMQError: Socket operation on non-socket
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2362, in _atexit_cleanup
self._on_finish()
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2603, in _on_finish
exit_handle = self._backend.interface.deliver_exit(self._exit_code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/interface/interface.py", line 976, in deliver_exit
return self._deliver_exit(exit_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/interface/interface_shared.py", line 498, in _deliver_exit
return self._deliver_record(record)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/interface/interface_shared.py", line 465, in _deliver_record
handle = mailbox._deliver_record(record, interface=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/lib/mailbox.py", line 437, in _deliver_record
interface._publish(record)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/interface/interface_sock.py", line 47, in _publish
self._sock_client.send_record_publish(record)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 222, in send_record_publish
self.send_server_request(server_req)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 154, in send_server_request
self._send_message(msg)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 151, in _send_message
self._sendall_with_error_handle(header + data)
File "/home/jovyan/conda_envs/wes-env2/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe