aaaacash committed on
Commit
ccf4931
1 Parent(s): 28c7f0e

Upload folder using huggingface_hub

Files changed (1)
  1. training.log +209 -209
training.log CHANGED
@@ -1,29 +1,29 @@
1
- [2023-12-11 18:39:50,465] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2
- [2023-12-11 18:39:52,336] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
3
- [2023-12-11 18:39:52,336] [INFO] [runner.py:570:main] cmd = /home/t-sokumar/miniconda3/envs/ft/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --data_path local/jsonfile --data_split 1,0,0 --model_name_or_path codellama/CodeLlama-7b-hf --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --max_seq_len 512 --learning_rate 9.65e-6 --weight_decay 0. --num_train_epochs 5 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --gradient_checkpointing --zero_stage 3 --deepspeed --lora_dim 128 --lora_module_name layers. --output_dir ./output_step1_Codellama_7b_lora_llamahub-devrev --add_eot_token
4
- [2023-12-11 18:39:54,950] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
5
- [2023-12-11 18:39:57,147] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
6
- [2023-12-11 18:39:57,147] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
7
- [2023-12-11 18:39:57,147] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
8
- [2023-12-11 18:39:57,147] [INFO] [launch.py:163:main] dist_world_size=4
9
- [2023-12-11 18:39:57,147] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
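The `--world_info` value passed to `deepspeed.launcher.launch` in the command above appears to be base64-encoded JSON that maps each host to its GPU indices. Decoding it reproduces the `WORLD INFO DICT` line. Here is a standard-library sketch of that decoding. The helper name `decode_world_info` is hypothetical and is not part of DeepSpeed's API.

```python
import base64
import json

# Value of --world_info copied verbatim from the launcher command in this log.
WORLD_INFO = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119"


def decode_world_info(encoded: str) -> dict:
    """Hypothetical helper: decode a base64-encoded JSON host -> GPU-index map."""
    return json.loads(base64.b64decode(encoded))


if __name__ == "__main__":
    info = decode_world_info(WORLD_INFO)
    # Total ranks = number of GPU indices across all hosts.
    world_size = sum(len(gpus) for gpus in info.values())
    print(info, world_size)
```

The decoded map lists four local GPUs. That agrees with `dist_world_size=4` and `CUDA_VISIBLE_DEVICES=0,1,2,3` above.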
10
- [2023-12-11 18:40:00,872] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
11
- [2023-12-11 18:40:00,873] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
12
- [2023-12-11 18:40:00,878] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
13
- [2023-12-11 18:40:00,879] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
14
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
15
  warnings.warn(
16
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
17
  warnings.warn(
18
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
19
  warnings.warn(
20
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
21
  warnings.warn(
22
- [2023-12-11 18:40:02,568] [INFO] [comm.py:637:init_distributed] cdb=None
23
- [2023-12-11 18:40:02,568] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
24
- [2023-12-11 18:40:02,810] [INFO] [comm.py:637:init_distributed] cdb=None
25
- [2023-12-11 18:40:02,842] [INFO] [comm.py:637:init_distributed] cdb=None
26
- [2023-12-11 18:40:02,862] [INFO] [comm.py:637:init_distributed] cdb=None
27
  The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
28
  The tokenizer class you load from this checkpoint is 'CodeLlamaTokenizer'.
29
  The class this function is called from is 'LlamaTokenizer'.
@@ -40,11 +40,11 @@ You are using the default legacy behaviour of the <class 'transformers.models.ll
40
  You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
41
  You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
42
  You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
43
- [2023-12-11 18:40:05,507] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
44
-
45
-
46
-
47
-
48
  Using /home/t-sokumar/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
49
  Using /home/t-sokumar/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
50
  Using /home/t-sokumar/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
@@ -55,70 +55,70 @@ Building extension module fused_adam...
55
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
56
  ninja: no work to do.
57
  Loading extension module fused_adam...
58
- Loading extension module fused_adam...
59
- Time to load fused_adam op: 0.10220003128051758 secondsTime to load fused_adam op: 0.11474394798278809 seconds
60
-
61
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
62
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
63
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
64
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
65
- [2023-12-11 18:40:15,841] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.4, git-hash=unknown, git-branch=unknown
66
- [2023-12-11 18:40:15,842] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
67
- [2023-12-11 18:40:15,862] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
68
- [2023-12-11 18:40:15,864] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
69
- [2023-12-11 18:40:15,864] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
70
- [2023-12-11 18:40:15,906] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
71
- [2023-12-11 18:40:15,906] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
72
- [2023-12-11 18:40:15,906] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
73
- [2023-12-11 18:40:15,906] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 3 optimizer
74
- Loading extension module fused_adam...
75
- Time to load fused_adam op: 0.20157265663146973 seconds
76
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
77
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
78
- Loading extension module fused_adam...
79
- Time to load fused_adam op: 0.20161700248718262 seconds
80
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
81
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
82
- [2023-12-11 18:40:16,029] [INFO] [utils.py:795:see_memory_usage] Stage 3 initialize beginning
83
- [2023-12-11 18:40:16,030] [INFO] [utils.py:796:see_memory_usage] MA 4.37 GB Max_MA 4.75 GB CA 8.09 GB Max_CA 8 GB
84
- [2023-12-11 18:40:16,030] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.88 GB, percent = 38.1%
85
- [2023-12-11 18:40:16,032] [INFO] [stage3.py:127:__init__] Reduce bucket size 500,000,000
86
- [2023-12-11 18:40:16,032] [INFO] [stage3.py:128:__init__] Prefetch bucket size 30000000
87
- [2023-12-11 18:40:16,138] [INFO] [utils.py:795:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
88
- [2023-12-11 18:40:16,139] [INFO] [utils.py:796:see_memory_usage] MA 4.37 GB Max_MA 4.37 GB CA 8.09 GB Max_CA 8 GB
89
- [2023-12-11 18:40:16,139] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.91 GB, percent = 38.1%
90
  Parameter Offload: Total persistent parameters: 266240 in 65 params
91
- [2023-12-11 18:40:16,505] [INFO] [utils.py:795:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
92
- [2023-12-11 18:40:16,506] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 4.43 GB CA 8.1 GB Max_CA 8 GB
93
- [2023-12-11 18:40:16,506] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.88 GB, percent = 38.1%
94
- [2023-12-11 18:40:16,620] [INFO] [utils.py:795:see_memory_usage] Before creating fp16 partitions
95
- [2023-12-11 18:40:16,621] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 3.54 GB CA 8.1 GB Max_CA 8 GB
96
- [2023-12-11 18:40:16,621] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.88 GB, percent = 38.1%
97
- [2023-12-11 18:40:17,385] [INFO] [utils.py:795:see_memory_usage] After creating fp16 partitions: 3
98
- [2023-12-11 18:40:17,386] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 3.54 GB CA 4.96 GB Max_CA 8 GB
99
- [2023-12-11 18:40:17,386] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 96.0 GB, percent = 38.2%
100
- [2023-12-11 18:40:17,512] [INFO] [utils.py:795:see_memory_usage] Before creating fp32 partitions
101
- [2023-12-11 18:40:17,513] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 3.54 GB CA 4.96 GB Max_CA 5 GB
102
- [2023-12-11 18:40:17,513] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.83 GB, percent = 37.3%
103
- [2023-12-11 18:40:17,659] [INFO] [utils.py:795:see_memory_usage] After creating fp32 partitions
104
- [2023-12-11 18:40:17,659] [INFO] [utils.py:796:see_memory_usage] MA 4.09 GB Max_MA 4.23 GB CA 5.78 GB Max_CA 6 GB
105
- [2023-12-11 18:40:17,659] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.78 GB, percent = 37.3%
106
- [2023-12-11 18:40:17,777] [INFO] [utils.py:795:see_memory_usage] Before initializing optimizer states
107
- [2023-12-11 18:40:17,778] [INFO] [utils.py:796:see_memory_usage] MA 4.09 GB Max_MA 4.09 GB CA 5.78 GB Max_CA 6 GB
108
- [2023-12-11 18:40:17,778] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.78 GB, percent = 37.3%
109
- [2023-12-11 18:40:17,916] [INFO] [utils.py:795:see_memory_usage] After initializing optimizer states
110
- [2023-12-11 18:40:17,916] [INFO] [utils.py:796:see_memory_usage] MA 5.17 GB Max_MA 5.47 GB CA 7.16 GB Max_CA 7 GB
111
- [2023-12-11 18:40:17,917] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.78 GB, percent = 37.3%
112
- [2023-12-11 18:40:17,917] [INFO] [stage3.py:479:_setup_for_real_optimizer] optimizer state initialized
113
- [2023-12-11 18:40:18,316] [INFO] [utils.py:795:see_memory_usage] After initializing ZeRO optimizer
114
- [2023-12-11 18:40:18,317] [INFO] [utils.py:796:see_memory_usage] MA 6.38 GB Max_MA 6.86 GB CA 8.85 GB Max_CA 9 GB
115
- [2023-12-11 18:40:18,317] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.09 GB, percent = 37.0%
116
- [2023-12-11 18:40:18,318] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
117
- [2023-12-11 18:40:18,318] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
118
- [2023-12-11 18:40:18,318] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f41c409efd0>
119
- [2023-12-11 18:40:18,318] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[9.65e-06, 0.0005, 9.65e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
120
- [2023-12-11 18:40:18,319] [INFO] [config.py:979:print] DeepSpeedEngine configuration:
121
- [2023-12-11 18:40:18,319] [INFO] [config.py:983:print] activation_checkpointing_config {
122
  "partition_activations": false,
123
  "contiguous_memory_optimization": false,
124
  "cpu_checkpointing": false,
@@ -126,10 +126,10 @@ Parameter Offload: Total persistent parameters: 266240 in 65 params
126
  "synchronize_checkpoint_boundary": false,
127
  "profile": false
128
  }
129
- [2023-12-11 18:40:18,319] [INFO] [config.py:983:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
130
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] amp_enabled .................. False
131
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] amp_params ................... False
132
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] autotuning_config ............ {
133
  "enabled": false,
134
  "start_step": null,
135
  "end_step": null,
@@ -154,31 +154,31 @@ Parameter Offload: Total persistent parameters: 266240 in 65 params
154
  "min_train_micro_batch_size_per_gpu": 1,
155
  "num_tuning_micro_batch_sizes": 3
156
  }
157
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] bfloat16_enabled ............. False
158
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] checkpoint_parallel_write_pipeline False
159
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] checkpoint_tag_validation_enabled True
160
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] checkpoint_tag_validation_fail False
161
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f429937da10>
162
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] communication_data_type ...... None
163
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
164
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] curriculum_enabled_legacy .... False
165
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] curriculum_params_legacy ..... False
166
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
167
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] data_efficiency_enabled ...... False
168
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] dataloader_drop_last ......... False
169
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] disable_allgather ............ False
170
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] dump_state ................... False
171
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
172
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] eigenvalue_enabled ........... False
173
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] eigenvalue_gas_boundary_resolution 1
174
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] eigenvalue_layer_name ........ bert.encoder.layer
175
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] eigenvalue_layer_num ......... 0
176
- [2023-12-11 18:40:18,320] [INFO] [config.py:983:print] eigenvalue_max_iter .......... 100
177
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] eigenvalue_stability ......... 1e-06
178
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] eigenvalue_tol ............... 0.01
179
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] eigenvalue_verbose ........... False
180
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] elasticity_enabled ........... False
181
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] flops_profiler_config ........ {
182
  "enabled": false,
183
  "recompute_fwd_factor": 0.0,
184
  "profile_step": 1,
@@ -187,23 +187,23 @@ Parameter Offload: Total persistent parameters: 266240 in 65 params
187
  "detailed": true,
188
  "output_file": null
189
  }
190
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] fp16_auto_cast ............... False
191
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] fp16_enabled ................. True
192
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] fp16_master_weights_and_gradients False
193
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] global_rank .................. 0
194
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] grad_accum_dtype ............. None
195
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] gradient_accumulation_steps .. 1
196
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] gradient_clipping ............ 1.0
197
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] gradient_predivide_factor .... 1.0
198
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
199
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] initial_dynamic_scale ........ 65536
200
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] load_universal_checkpoint .... False
201
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] loss_scale ................... 0
202
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] memory_breakdown ............. False
203
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] mics_hierarchial_params_gather False
204
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] mics_shard_size .............. -1
205
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='step1_tensorboard/ds_tensorboard_logs/', job_name='step1_model_tensorboard') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
206
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] nebula_config ................ {
207
  "enabled": false,
208
  "persistent_storage_path": null,
209
  "persistent_time_interval": 100,
@@ -211,32 +211,32 @@ Parameter Offload: Total persistent parameters: 266240 in 65 params
211
  "enable_nebula_load": true,
212
  "load_path": null
213
  }
214
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] optimizer_legacy_fusion ...... False
215
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] optimizer_name ............... None
216
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] optimizer_params ............. None
217
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
218
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] pld_enabled .................. False
219
- [2023-12-11 18:40:18,321] [INFO] [config.py:983:print] pld_params ................... False
220
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] prescale_gradients ........... False
221
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] scheduler_name ............... None
222
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] scheduler_params ............. None
223
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] seq_parallel_communication_data_type torch.float32
224
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] sparse_attention ............. None
225
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] sparse_gradients_enabled ..... False
226
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] steps_per_print .............. 10
227
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] train_batch_size ............. 32
228
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] train_micro_batch_size_per_gpu 8
229
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] use_data_before_expert_parallel_ False
230
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] use_node_local_storage ....... False
231
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] wall_clock_breakdown ......... False
232
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] weight_quantization_config ... None
233
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] world_size ................... 4
234
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] zero_allow_untested_optimizer False
235
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
236
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] zero_enabled ................. True
237
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] zero_force_ds_cpu_optimizer .. True
238
- [2023-12-11 18:40:18,322] [INFO] [config.py:983:print] zero_optimization_stage ...... 3
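DeepSpeed requires `train_batch_size` to equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × `world_size`. The configuration printed above satisfies this: 8 × 1 × 4 = 32. A minimal arithmetic check with the logged values follows. The function name is illustrative only.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """Illustrative helper: samples consumed per optimizer step across all ranks."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


# Values from the DeepSpeedEngine configuration printed in this log.
MICRO_BATCH = 8   # train_micro_batch_size_per_gpu
GRAD_ACCUM = 1    # gradient_accumulation_steps
WORLD_SIZE = 4    # world_size

# Should match the logged train_batch_size of 32.
assert global_batch_size(MICRO_BATCH, GRAD_ACCUM, WORLD_SIZE) == 32
```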
239
- [2023-12-11 18:40:18,322] [INFO] [config.py:969:print_user_config] json = {
240
  "train_batch_size": 32,
241
  "train_micro_batch_size_per_gpu": 8,
242
  "steps_per_print": 10,
@@ -286,105 +286,105 @@ Beginning of Epoch 1/5, Total Micro Batches 13
286
  warnings.warn(
287
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
288
  warnings.warn(
289
- Model Parameters: 6.927 B, Latency: 4.17s, TFLOPs: 10.05, Samples/sec: 1.92, Time/seq 0.52s, Batch Size: 8, Sequence Length: 512
290
  Invalidate trace cache @ step 0: expected module 6, but got module 0
291
- Model Parameters: 6.927 B, Latency: 3.75s, TFLOPs: 11.17, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
292
- Model Parameters: 6.927 B, Latency: 3.76s, TFLOPs: 11.13, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
293
- Model Parameters: 6.927 B, Latency: 3.68s, TFLOPs: 11.38, Samples/sec: 2.17, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
294
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
295
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
296
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
297
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.49, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
298
  Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.53, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
299
- [2023-12-11 18:40:58,482] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=0, lr=[9.097325323776738e-06, 0.00047136400641330245, 9.097325323776738e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
300
- [2023-12-11 18:40:58,482] [INFO] [timer.py:260:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=8.742200576698968, CurrSamplesPerSec=8.808332763470538, MemAllocated=6.88GB, MaxMemAllocated=10.68GB
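Each `Model Parameters: ...` line reports per-rank throughput for one micro-batch of 8 sequences. The `Samples/sec` field is consistent with batch size ÷ latency, and `Time/seq` with latency ÷ batch size. The engine's `CurrSamplesPerSec` is consistent with roughly 4 ranks × 8 samples ÷ latency. These formulas are inferred from the printed numbers, not taken from the training script. The sketch below applies them to the 3.63 s latency logged around step 10.

```python
def per_rank_throughput(batch_size: int, latency_s: float) -> tuple[float, float]:
    """Return (samples/sec, seconds/sequence) for one rank's micro-batch.

    Formula inferred from the logged values, not from the training script.
    """
    samples_per_sec = batch_size / latency_s
    time_per_seq = latency_s / batch_size
    return samples_per_sec, time_per_seq


def global_samples_per_sec(batch_size: int, world_size: int, latency_s: float) -> float:
    """Approximate cluster-wide rate, assuming all ranks step in lockstep."""
    return batch_size * world_size / latency_s


# Values around step 10 of this log: Batch Size 8, Latency 3.63s, 4 ranks.
sps, tps = per_rank_throughput(8, 3.63)
print(f"Samples/sec {sps:.2f}, Time/seq {tps:.2f}s")
print(f"All ranks: {global_samples_per_sec(8, 4, 3.63):.2f} samples/sec")
```

The computed cluster-wide rate (about 8.82 samples/sec) is close to, but not identical with, the logged `CurrSamplesPerSec=8.808...`. The gap is most likely because the printed latency is rounded to two decimals.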
301
  Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.52, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
302
  Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.51, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
303
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
304
  Model Parameters: 6.927 B, Latency: 3.24s, TFLOPs: 12.90, Samples/sec: 2.47, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
305
  ***** Evaluating perplexity, Epoch 1/5 *****
306
  Invalidate trace cache @ step 0: expected module 0, but got module 6
307
  ppl: 1.6560871601104736, loss: 0.5044576525688171
308
  Beginning of Epoch 2/5, Total Micro Batches 13
309
- Model Parameters: 6.927 B, Latency: 3.76s, TFLOPs: 11.12, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
310
- Model Parameters: 6.927 B, Latency: 3.78s, TFLOPs: 11.09, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
311
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
312
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
313
  Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
314
  Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
315
- [2023-12-11 18:41:36,640] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=0, lr=[7.565912402977827e-06, 0.00039201618668278893, 7.565912402977827e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
316
- [2023-12-11 18:41:36,641] [INFO] [timer.py:260:stop] epoch=1/micro_step=7/global_step=20, RunningAvgSamplesPerSec=8.786215139346892, CurrSamplesPerSec=8.784209255946163, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
317
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
318
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
319
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
320
- Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
321
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
322
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
323
- Model Parameters: 6.927 B, Latency: 3.25s, TFLOPs: 12.89, Samples/sec: 2.46, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
324
  ***** Evaluating perplexity, Epoch 2/5 *****
325
  Invalidate trace cache @ step 0: expected module 0, but got module 6
326
  ppl: 1.0178232192993164, loss: 0.01766625978052616
327
  Beginning of Epoch 3/5, Total Micro Batches 13
328
- Model Parameters: 6.927 B, Latency: 3.77s, TFLOPs: 11.12, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
329
- Model Parameters: 6.927 B, Latency: 3.78s, TFLOPs: 11.07, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
330
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
331
- [2023-12-11 18:42:14,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=0, lr=[5.4065894822319335e-06, 0.0002801341700638307, 5.4065894822319335e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
- [2023-12-11 18:42:14,847] [INFO] [timer.py:260:stop] epoch=2/micro_step=4/global_step=30, RunningAvgSamplesPerSec=8.794852545546625, CurrSamplesPerSec=8.779833541428898, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.26s, TFLOPs: 12.85, Samples/sec: 2.46, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
  ***** Evaluating perplexity, Epoch 3/5 *****
  Invalidate trace cache @ step 0: expected module 0, but got module 6
  ppl: 1.0056875944137573, loss: 0.005671397782862186
  Beginning of Epoch 4/5, Total Micro Batches 13
- [2023-12-11 18:42:52,948] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=0, lr=[3.1140314200197657e-06, 0.00016134877823936609, 3.1140314200197657e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
- [2023-12-11 18:42:52,948] [INFO] [timer.py:260:stop] epoch=3/micro_step=1/global_step=40, RunningAvgSamplesPerSec=8.805863440078602, CurrSamplesPerSec=8.48791662450673, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
  Model Parameters: 6.927 B, Latency: 3.77s, TFLOPs: 11.10, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.78s, TFLOPs: 11.06, Samples/sec: 2.11, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.43, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- [2023-12-11 18:43:29,661] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=0, lr=[1.2134356400744368e-06, 6.28723129572247e-05, 1.2134356400744368e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
- [2023-12-11 18:43:29,661] [INFO] [timer.py:260:stop] epoch=3/micro_step=11/global_step=50, RunningAvgSamplesPerSec=8.788842827422865, CurrSamplesPerSec=8.75888550529353, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
- Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.27s, TFLOPs: 12.81, Samples/sec: 2.45, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
  ***** Evaluating perplexity, Epoch 4/5 *****
  Invalidate trace cache @ step 0: expected module 0, but got module 6
  ppl: 1.0032395124435425, loss: 0.0032342304475605488
  Beginning of Epoch 5/5, Total Micro Batches 13
  Model Parameters: 6.927 B, Latency: 3.79s, TFLOPs: 11.05, Samples/sec: 2.11, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.79s, TFLOPs: 11.05, Samples/sec: 2.11, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.67s, TFLOPs: 11.42, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.67s, TFLOPs: 11.40, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- [2023-12-11 18:44:08,017] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=0, lr=[1.4020573091929905e-07, 7.2645456434869975e-06, 1.4020573091929905e-07], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
- [2023-12-11 18:44:08,018] [INFO] [timer.py:260:stop] epoch=4/micro_step=8/global_step=60, RunningAvgSamplesPerSec=8.7865790752149, CurrSamplesPerSec=8.748600897610435, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
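The `lr=[...]` values logged at steps 30, 40, 50 and 60 decay from the step-0 values (9.65e-06 for the first and third parameter groups, 0.0005 for the second) toward zero. A minimal sketch, assuming a half-cosine schedule with no warmup, lr = base * 0.5 * (1 + cos(pi * step / total)), and taking total = 65 optimizer steps from the log itself (13 micro batches per epoch, 5 epochs, `--gradient_accumulation_steps 1`, `--num_warmup_steps 0`), reproduces the logged values. The helper name `cosine_lr` is hypothetical, not part of the training script.

```python
import math

# Step-0 learning rates of the two distinct parameter groups ("step=0 ... lr=[9.65e-06, 0.0005, 9.65e-06]").
BASE_LRS = (9.65e-6, 5e-4)
# Derived from the log: 13 micro batches/epoch x 5 epochs, gradient_accumulation_steps=1, no warmup.
TOTAL_STEPS = 13 * 5


def cosine_lr(base_lr: float, step: int, total: int = TOTAL_STEPS) -> float:
    """Hypothetical helper. Assumed schedule: half-cosine decay from base_lr at step 0 to 0 at `total`."""
    progress = step / total
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# (group-0 lr, group-1 lr) copied from the step=30/40/50/60 log_dist lines.
LOGGED = {
    30: (5.4065894822319335e-06, 0.0002801341700638307),
    40: (3.1140314200197657e-06, 0.00016134877823936609),
    50: (1.2134356400744368e-06, 6.28723129572247e-05),
    60: (1.4020573091929905e-07, 7.2645456434869975e-06),
}

for step, logged in LOGGED.items():
    for base, got in zip(BASE_LRS, logged):
        want = cosine_lr(base, step)
        assert math.isclose(want, got, rel_tol=1e-6), (step, want, got)
    print(f"step {step}: assumed cosine schedule matches the logged lr")
```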
- Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.43, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.67s, TFLOPs: 11.40, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.67s, TFLOPs: 11.39, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.68s, TFLOPs: 11.38, Samples/sec: 2.17, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
- Model Parameters: 6.927 B, Latency: 3.27s, TFLOPs: 12.81, Samples/sec: 2.45, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
  ***** Evaluating perplexity, Epoch 5/5 *****
  Invalidate trace cache @ step 0: expected module 0, but got module 6
  ppl: 1.003004550933838, loss: 0.0030000172555446625
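Each evaluation line reports perplexity next to the mean evaluation loss. A quick consistency check, assuming ppl is computed as exp(loss) (the usual definition for causal-LM perplexity), reproduces the logged values for epochs 2 to 5 to within float32 rounding:

```python
import math

# (epoch, ppl, loss) copied from the "ppl: ..., loss: ..." evaluation lines (epochs 2-5).
EVALS = [
    (2, 1.0178232192993164, 0.01766625978052616),
    (3, 1.0056875944137573, 0.005671397782862186),
    (4, 1.0032395124435425, 0.0032342304475605488),
    (5, 1.003004550933838, 0.0030000172555446625),
]

for epoch, ppl, loss in EVALS:
    # Assumption: ppl = exp(mean eval loss); rel_tol absorbs float32 rounding in the logged values.
    expected = math.exp(loss)
    assert math.isclose(ppl, expected, rel_tol=1e-6), (epoch, ppl, expected)
    print(f"epoch {epoch}: exp(loss) = {expected:.7f}, logged ppl = {ppl:.7f}")
```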
  saving the final model ...
- [2023-12-11 18:44:41,184] [INFO] [launch.py:347:main] Process 2269765 exits successfully.
- [2023-12-11 18:44:42,473] [INFO] [launch.py:347:main] Process 2269766 exits successfully.
- [2023-12-11 18:44:42,474] [INFO] [launch.py:347:main] Process 2269767 exits successfully.
- [2023-12-11 18:46:44,489] [INFO] [launch.py:347:main] Process 2269764 exits successfully.
+ [2023-12-11 20:12:03,965] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
+ [2023-12-11 20:12:05,820] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
+ [2023-12-11 20:12:05,820] [INFO] [runner.py:570:main] cmd = /home/t-sokumar/miniconda3/envs/ft/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --data_path local/jsonfile --data_split 1,0,0 --model_name_or_path codellama/CodeLlama-7b-hf --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --max_seq_len 512 --learning_rate 9.65e-6 --weight_decay 0. --num_train_epochs 5 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --gradient_checkpointing --zero_stage 3 --deepspeed --lora_dim 128 --lora_module_name layers. --output_dir ./output_step1_Codellama_7b_lora_llamahub-devrev --add_eot_token
+ [2023-12-11 20:12:08,529] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
+ [2023-12-11 20:12:10,776] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
+ [2023-12-11 20:12:10,776] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
+ [2023-12-11 20:12:10,776] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
+ [2023-12-11 20:12:10,776] [INFO] [launch.py:163:main] dist_world_size=4
+ [2023-12-11 20:12:10,776] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
+ [2023-12-11 20:12:14,340] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
+ [2023-12-11 20:12:14,349] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
+ [2023-12-11 20:12:14,559] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
+ [2023-12-11 20:12:14,602] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
+ [2023-12-11 20:12:15,940] [INFO] [comm.py:637:init_distributed] cdb=None
+ [2023-12-11 20:12:15,940] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
+ [2023-12-11 20:12:16,326] [INFO] [comm.py:637:init_distributed] cdb=None
+ [2023-12-11 20:12:16,414] [INFO] [comm.py:637:init_distributed] cdb=None
+ [2023-12-11 20:12:16,446] [INFO] [comm.py:637:init_distributed] cdb=None
  The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
  The tokenizer class you load from this checkpoint is 'CodeLlamaTokenizer'.
  The class this function is called from is 'LlamaTokenizer'.
  You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
  You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
  You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+ [2023-12-11 20:12:19,202] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
+
+
+
+
  Using /home/t-sokumar/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
  Using /home/t-sokumar/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
  Using /home/t-sokumar/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  ninja: no work to do.
  Loading extension module fused_adam...
+ Time to load fused_adam op: 0.10928606986999512 seconds
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
+ Loading extension module fused_adam...
+ Loading extension module fused_adam...
+ Loading extension module fused_adam...
+ Time to load fused_adam op: 0.20180773735046387 seconds
+ Time to load fused_adam op: 0.2018909454345703 seconds
+ Time to load fused_adam op: 0.20151114463806152 seconds
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
  self._dummy_overflow_buf = get_accelerator().IntTensor([0])
+ [2023-12-11 20:12:28,877] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.4, git-hash=unknown, git-branch=unknown
+ [2023-12-11 20:12:28,877] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
+ [2023-12-11 20:12:28,899] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
+ [2023-12-11 20:12:28,901] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
+ [2023-12-11 20:12:28,901] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
+ [2023-12-11 20:12:28,939] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
+ [2023-12-11 20:12:28,939] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
+ [2023-12-11 20:12:28,939] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
+ [2023-12-11 20:12:28,940] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 3 optimizer
+ [2023-12-11 20:12:29,054] [INFO] [utils.py:795:see_memory_usage] Stage 3 initialize beginning
+ [2023-12-11 20:12:29,055] [INFO] [utils.py:796:see_memory_usage] MA 4.37 GB Max_MA 4.75 GB CA 8.93 GB Max_CA 9 GB
+ [2023-12-11 20:12:29,055] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.76 GB, percent = 38.1%
+ [2023-12-11 20:12:29,057] [INFO] [stage3.py:127:__init__] Reduce bucket size 500,000,000
+ [2023-12-11 20:12:29,057] [INFO] [stage3.py:128:__init__] Prefetch bucket size 30000000
+ [2023-12-11 20:12:29,164] [INFO] [utils.py:795:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
+ [2023-12-11 20:12:29,165] [INFO] [utils.py:796:see_memory_usage] MA 4.37 GB Max_MA 4.37 GB CA 8.93 GB Max_CA 9 GB
+ [2023-12-11 20:12:29,165] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.77 GB, percent = 38.1%
  Parameter Offload: Total persistent parameters: 266240 in 65 params
+ [2023-12-11 20:12:29,482] [INFO] [utils.py:795:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
+ [2023-12-11 20:12:29,483] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 4.43 GB CA 8.94 GB Max_CA 9 GB
+ [2023-12-11 20:12:29,483] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.79 GB, percent = 38.1%
+ [2023-12-11 20:12:29,597] [INFO] [utils.py:795:see_memory_usage] Before creating fp16 partitions
+ [2023-12-11 20:12:29,598] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 3.54 GB CA 8.94 GB Max_CA 9 GB
+ [2023-12-11 20:12:29,598] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 95.78 GB, percent = 38.1%
+ [2023-12-11 20:12:30,301] [INFO] [utils.py:795:see_memory_usage] After creating fp16 partitions: 3
+ [2023-12-11 20:12:30,301] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 3.54 GB CA 5.46 GB Max_CA 9 GB
+ [2023-12-11 20:12:30,348] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 96.3 GB, percent = 38.3%
+ [2023-12-11 20:12:30,468] [INFO] [utils.py:795:see_memory_usage] Before creating fp32 partitions
+ [2023-12-11 20:12:30,469] [INFO] [utils.py:796:see_memory_usage] MA 3.54 GB Max_MA 3.54 GB CA 5.46 GB Max_CA 5 GB
+ [2023-12-11 20:12:30,469] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.01 GB, percent = 37.0%
+ [2023-12-11 20:12:30,579] [INFO] [utils.py:795:see_memory_usage] After creating fp32 partitions
+ [2023-12-11 20:12:30,580] [INFO] [utils.py:796:see_memory_usage] MA 4.09 GB Max_MA 4.24 GB CA 6.16 GB Max_CA 6 GB
+ [2023-12-11 20:12:30,580] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.01 GB, percent = 37.0%
+ [2023-12-11 20:12:30,689] [INFO] [utils.py:795:see_memory_usage] Before initializing optimizer states
+ [2023-12-11 20:12:30,690] [INFO] [utils.py:796:see_memory_usage] MA 4.09 GB Max_MA 4.09 GB CA 6.16 GB Max_CA 6 GB
+ [2023-12-11 20:12:30,690] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.01 GB, percent = 37.0%
+ [2023-12-11 20:12:30,815] [INFO] [utils.py:795:see_memory_usage] After initializing optimizer states
+ [2023-12-11 20:12:30,815] [INFO] [utils.py:796:see_memory_usage] MA 5.17 GB Max_MA 5.47 GB CA 7.54 GB Max_CA 8 GB
+ [2023-12-11 20:12:30,815] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.02 GB, percent = 37.0%
+ [2023-12-11 20:12:30,816] [INFO] [stage3.py:479:_setup_for_real_optimizer] optimizer state initialized
+ [2023-12-11 20:12:31,320] [INFO] [utils.py:795:see_memory_usage] After initializing ZeRO optimizer
+ [2023-12-11 20:12:31,321] [INFO] [utils.py:796:see_memory_usage] MA 6.38 GB Max_MA 6.86 GB CA 9.23 GB Max_CA 9 GB
+ [2023-12-11 20:12:31,321] [INFO] [utils.py:803:see_memory_usage] CPU Virtual Memory: used = 93.01 GB, percent = 37.0%
+ [2023-12-11 20:12:31,321] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
+ [2023-12-11 20:12:31,322] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
+ [2023-12-11 20:12:31,322] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f31e5b4f890>
+ [2023-12-11 20:12:31,322] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[9.65e-06, 0.0005, 9.65e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
+ [2023-12-11 20:12:31,323] [INFO] [config.py:979:print] DeepSpeedEngine configuration:
+ [2023-12-11 20:12:31,323] [INFO] [config.py:983:print] activation_checkpointing_config {
  "partition_activations": false,
  "contiguous_memory_optimization": false,
  "cpu_checkpointing": false,
  "synchronize_checkpoint_boundary": false,
  "profile": false
  }
+ [2023-12-11 20:12:31,323] [INFO] [config.py:983:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
+ [2023-12-11 20:12:31,323] [INFO] [config.py:983:print] amp_enabled .................. False
+ [2023-12-11 20:12:31,323] [INFO] [config.py:983:print] amp_params ................... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] autotuning_config ............ {
  "enabled": false,
  "start_step": null,
  "end_step": null,
  "min_train_micro_batch_size_per_gpu": 1,
  "num_tuning_micro_batch_sizes": 3
  }
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] bfloat16_enabled ............. False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] checkpoint_parallel_write_pipeline False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] checkpoint_tag_validation_enabled True
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] checkpoint_tag_validation_fail False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f3193907bd0>
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] communication_data_type ...... None
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] curriculum_enabled_legacy .... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] curriculum_params_legacy ..... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] data_efficiency_enabled ...... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] dataloader_drop_last ......... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] disable_allgather ............ False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] dump_state ................... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_enabled ........... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_gas_boundary_resolution 1
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_layer_name ........ bert.encoder.layer
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_layer_num ......... 0
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_max_iter .......... 100
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_stability ......... 1e-06
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_tol ............... 0.01
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] eigenvalue_verbose ........... False
+ [2023-12-11 20:12:31,324] [INFO] [config.py:983:print] elasticity_enabled ........... False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] flops_profiler_config ........ {
  "enabled": false,
  "recompute_fwd_factor": 0.0,
  "profile_step": 1,
  "detailed": true,
  "output_file": null
  }
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] fp16_auto_cast ............... False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] fp16_enabled ................. True
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] fp16_master_weights_and_gradients False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] global_rank .................. 0
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] grad_accum_dtype ............. None
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] gradient_accumulation_steps .. 1
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] gradient_clipping ............ 1.0
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] gradient_predivide_factor .... 1.0
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] initial_dynamic_scale ........ 65536
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] load_universal_checkpoint .... False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] loss_scale ................... 0
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] memory_breakdown ............. False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] mics_hierarchial_params_gather False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] mics_shard_size .............. -1
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='step1_tensorboard/ds_tensorboard_logs/', job_name='step1_model_tensorboard') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] nebula_config ................ {
  "enabled": false,
  "persistent_storage_path": null,
  "persistent_time_interval": 100,
  "enable_nebula_load": true,
  "load_path": null
  }
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] optimizer_legacy_fusion ...... False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] optimizer_name ............... None
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] optimizer_params ............. None
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] pld_enabled .................. False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] pld_params ................... False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] prescale_gradients ........... False
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] scheduler_name ............... None
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] scheduler_params ............. None
+ [2023-12-11 20:12:31,325] [INFO] [config.py:983:print] seq_parallel_communication_data_type torch.float32
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] sparse_attention ............. None
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] sparse_gradients_enabled ..... False
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] steps_per_print .............. 10
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] train_batch_size ............. 32
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] train_micro_batch_size_per_gpu 8
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] use_data_before_expert_parallel_ False
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] use_node_local_storage ....... False
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] wall_clock_breakdown ......... False
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] weight_quantization_config ... None
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] world_size ................... 4
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] zero_allow_untested_optimizer False
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
236
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] zero_enabled ................. True
237
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] zero_force_ds_cpu_optimizer .. True
238
+ [2023-12-11 20:12:31,326] [INFO] [config.py:983:print] zero_optimization_stage ...... 3
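DeepSpeed requires the global batch size to equal the per-GPU micro-batch size times the gradient-accumulation steps times the number of data-parallel ranks. This run uses ZeRO stage 3 with no pipeline or tensor parallelism, so all 4 GPUs count as data-parallel ranks. The minimal sketch below checks that relation using `world_size` = 4 from the dump above, plus `--per_device_train_batch_size 8` and `--gradient_accumulation_steps 1` from the launch command. It should reproduce the `"train_batch_size": 32` printed in the user config that follows. The helper name `global_batch_size` is hypothetical.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """Hypothetical helper: DeepSpeed's train_batch_size identity."""
    return micro_batch_per_gpu * grad_accum_steps * world_size

# Values from this run: micro-batch 8, gradient accumulation 1, world_size 4.
print(global_batch_size(8, 1, 4))
```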
239
+ [2023-12-11 20:12:31,326] [INFO] [config.py:969:print_user_config] json = {
240
  "train_batch_size": 32,
241
  "train_micro_batch_size_per_gpu": 8,
242
  "steps_per_print": 10,
 
286
  warnings.warn(
287
  /home/t-sokumar/miniconda3/envs/ft/lib/python3.11/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
288
  warnings.warn(
289
+ Model Parameters: 6.927 B, Latency: 4.17s, TFLOPs: 10.04, Samples/sec: 1.92, Time/seq 0.52s, Batch Size: 8, Sequence Length: 512
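Each `Model Parameters:` line appears to derive its throughput fields from the measured step latency and the per-device batch size, with Samples/sec ≈ batch / latency and Time/seq ≈ latency / batch. This reading is inferred from the numbers, not taken from the training script. The sketch below recomputes both fields for the first line above and for a typical steady-state line. The helper `throughput_fields` is hypothetical.

```python
def throughput_fields(latency_s: float, batch_size: int) -> tuple[str, str]:
    """Recompute (Samples/sec, Time/seq), formatted to 2 decimals like the log."""
    samples_per_sec = batch_size / latency_s  # per-GPU samples per second
    time_per_seq = latency_s / batch_size     # seconds spent per sequence
    return f"{samples_per_sec:.2f}", f"{time_per_seq:.2f}"

print(throughput_fields(4.17, 8))  # first logged step
print(throughput_fields(3.63, 8))  # typical steady-state step
```

The logged latency is already rounded to two decimals, so a recomputed field can occasionally differ from the logged one in the last digit. Multiplying the per-GPU rate of about 2.20 samples/sec by the 4 ranks gives roughly 8.8. That is close to the `RunningAvgSamplesPerSec` reported by DeepSpeed's timer, which suggests the timer figure is aggregated across all ranks.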
290
  Invalidate trace cache @ step 0: expected module 6, but got module 0
291
+ Model Parameters: 6.927 B, Latency: 3.74s, TFLOPs: 11.20, Samples/sec: 2.14, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
292
+ Model Parameters: 6.927 B, Latency: 3.76s, TFLOPs: 11.14, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
293
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
294
  Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.53, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
295
+ Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.53, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
296
+ Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.53, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
297
  Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.52, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
298
  Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.51, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
299
+ [2023-12-11 20:13:11,248] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=0, lr=[9.097325323776738e-06, 0.00047136400641330245, 9.097325323776738e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
300
+ [2023-12-11 20:13:11,248] [INFO] [timer.py:260:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=8.766147695613881, CurrSamplesPerSec=8.809815752797453, MemAllocated=6.88GB, MaxMemAllocated=10.68GB
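With `--lr_scheduler_type cosine` and `--num_warmup_steps 0`, the logged learning rate should follow lr(t) = base_lr · ½(1 + cos(π t / T)). The sketch below assumes T = 5 epochs × 13 micro-batches = 65 optimizer steps, which holds because gradient accumulation is 1, and base_lr = 9.65e-6 from `--learning_rate`. Under those assumptions it reproduces the first and third parameter-group values in the `lr=[...]` lists. The middle group's values are consistent with a base LR of 5e-4. That is plausibly a separate LoRA learning rate, but this is an inference and is not stated in the log.

```python
import math

def cosine_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup, then cosine decay to zero (warmup is 0 in this run)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

TOTAL_STEPS = 5 * 13  # assumed: epochs x micro-batches per epoch, grad accum = 1

for step in (10, 20, 60):
    # First column: main parameter groups; second: group with an assumed 5e-4 base LR.
    print(step, cosine_lr(step, 9.65e-6, TOTAL_STEPS), cosine_lr(step, 5e-4, TOTAL_STEPS))
```

At step 10 this gives about 9.0973e-06 and 4.7136e-04, which agree with the `step=10` entry above.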
301
+ Model Parameters: 6.927 B, Latency: 3.63s, TFLOPs: 11.52, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
302
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.51, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
303
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.51, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
304
  Model Parameters: 6.927 B, Latency: 3.24s, TFLOPs: 12.90, Samples/sec: 2.47, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
305
  ***** Evaluating perplexity, Epoch 1/5 *****
306
  Invalidate trace cache @ step 0: expected module 0, but got module 6
307
  ppl: 1.6560871601104736, loss: 0.5044576525688171
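Every logged perplexity is consistent with the exponential of the mean evaluation loss, ppl = exp(loss). The check below uses the five `ppl: ..., loss: ...` pairs from this log. The training run appears to compute in float32, so agreement is only to about 7 significant digits.

```python
import math

# (loss, ppl) pairs copied from the five evaluation lines of this log
EVAL_RESULTS = [
    (0.5044576525688171, 1.6560871601104736),
    (0.01766625978052616, 1.0178232192993164),
    (0.005671397782862186, 1.0056875944137573),
    (0.0032342304475605488, 1.0032395124435425),
    (0.0030000172555446625, 1.003004550933838),
]

for epoch, (loss, ppl) in enumerate(EVAL_RESULTS, start=1):
    recomputed = math.exp(loss)
    print(f"epoch {epoch}: exp(loss) = {recomputed:.7f}, logged ppl = {ppl:.7f}")
```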
308
  Beginning of Epoch 2/5, Total Micro Batches 13
309
+ Model Parameters: 6.927 B, Latency: 3.75s, TFLOPs: 11.15, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
310
+ Model Parameters: 6.927 B, Latency: 3.76s, TFLOPs: 11.15, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
311
  Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
312
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.51, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
313
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
314
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
315
+ [2023-12-11 20:13:49,353] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=0, lr=[7.565912402977827e-06, 0.00039201618668278893, 7.565912402977827e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
316
+ [2023-12-11 20:13:49,354] [INFO] [timer.py:260:stop] epoch=1/micro_step=7/global_step=20, RunningAvgSamplesPerSec=8.803895836862662, CurrSamplesPerSec=8.791045583607062, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
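DeepSpeed's timer lines tag each report with `epoch`, `micro_step` and `global_step`. This run has 13 micro-batches per epoch and `--gradient_accumulation_steps 1`. Under those settings, each of the six reports in this log satisfies global_step = 13 · epoch + micro_step, with epochs counted from 0, unlike the 1-based `Beginning of Epoch 2/5` banners. The sketch below recovers the pair from the global step. The helper name is hypothetical, and the relation is only checked against these six reports.

```python
MICRO_BATCHES_PER_EPOCH = 13  # from "Total Micro Batches 13"

def epoch_and_micro_step(global_step: int) -> tuple[int, int]:
    """Hypothetical helper: split a global step into (0-based epoch, micro step)."""
    return divmod(global_step, MICRO_BATCHES_PER_EPOCH)

for step in (10, 20, 30, 40, 50, 60):
    print(step, epoch_and_micro_step(step))
```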
317
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
318
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.51, Samples/sec: 2.20, Time/seq 0.45s, Batch Size: 8, Sequence Length: 512
319
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
320
  Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.50, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
321
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
322
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
323
+ Model Parameters: 6.927 B, Latency: 3.25s, TFLOPs: 12.88, Samples/sec: 2.46, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
324
  ***** Evaluating perplexity, Epoch 2/5 *****
325
  Invalidate trace cache @ step 0: expected module 0, but got module 6
326
  ppl: 1.0178232192993164, loss: 0.01766625978052616
327
  Beginning of Epoch 3/5, Total Micro Batches 13
328
+ Model Parameters: 6.927 B, Latency: 3.76s, TFLOPs: 11.13, Samples/sec: 2.13, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
329
+ Model Parameters: 6.927 B, Latency: 3.77s, TFLOPs: 11.09, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
330
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
331
+ [2023-12-11 20:14:27,532] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=0, lr=[5.4065894822319335e-06, 0.0002801341700638307, 5.4065894822319335e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
332
+ [2023-12-11 20:14:27,533] [INFO] [timer.py:260:stop] epoch=2/micro_step=4/global_step=30, RunningAvgSamplesPerSec=8.808840107678392, CurrSamplesPerSec=8.779266138519437, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
333
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
334
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
335
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
336
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
337
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.48, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
338
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.49, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
339
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
340
+ Model Parameters: 6.927 B, Latency: 3.64s, TFLOPs: 11.49, Samples/sec: 2.20, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
341
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
342
+ Model Parameters: 6.927 B, Latency: 3.25s, TFLOPs: 12.86, Samples/sec: 2.46, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
343
  ***** Evaluating perplexity, Epoch 3/5 *****
344
  Invalidate trace cache @ step 0: expected module 0, but got module 6
345
  ppl: 1.0056875944137573, loss: 0.005671397782862186
346
  Beginning of Epoch 4/5, Total Micro Batches 13
347
+ [2023-12-11 20:15:05,601] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=0, lr=[3.1140314200197657e-06, 0.00016134877823936609, 3.1140314200197657e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
348
+ [2023-12-11 20:15:05,601] [INFO] [timer.py:260:stop] epoch=3/micro_step=1/global_step=40, RunningAvgSamplesPerSec=8.818374436983056, CurrSamplesPerSec=8.49120081099869, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
349
  Model Parameters: 6.927 B, Latency: 3.77s, TFLOPs: 11.10, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
350
+ Model Parameters: 6.927 B, Latency: 3.77s, TFLOPs: 11.09, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
351
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
352
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
353
+ Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
354
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
355
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
356
  Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
357
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
358
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
359
+ [2023-12-11 20:15:42,281] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=0, lr=[1.2134356400744368e-06, 6.28723129572247e-05, 1.2134356400744368e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
360
+ [2023-12-11 20:15:42,281] [INFO] [timer.py:260:stop] epoch=3/micro_step=11/global_step=50, RunningAvgSamplesPerSec=8.800315028679389, CurrSamplesPerSec=8.764479266712412, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
361
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
362
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.47, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
363
+ Model Parameters: 6.927 B, Latency: 3.27s, TFLOPs: 12.79, Samples/sec: 2.44, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
364
  ***** Evaluating perplexity, Epoch 4/5 *****
365
  Invalidate trace cache @ step 0: expected module 0, but got module 6
366
  ppl: 1.0032395124435425, loss: 0.0032342304475605488
367
  Beginning of Epoch 5/5, Total Micro Batches 13
368
+ Model Parameters: 6.927 B, Latency: 3.77s, TFLOPs: 11.09, Samples/sec: 2.12, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
369
  Model Parameters: 6.927 B, Latency: 3.79s, TFLOPs: 11.05, Samples/sec: 2.11, Time/seq 0.47s, Batch Size: 8, Sequence Length: 512
370
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
371
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
372
+ Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
373
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
374
+ Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.43, Samples/sec: 2.18, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
375
+ [2023-12-11 20:16:20,586] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=0, lr=[1.4020573091929905e-07, 7.2645456434869975e-06, 1.4020573091929905e-07], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
376
+ [2023-12-11 20:16:20,586] [INFO] [timer.py:260:stop] epoch=4/micro_step=8/global_step=60, RunningAvgSamplesPerSec=8.798149665169436, CurrSamplesPerSec=8.756539739490163, MemAllocated=6.88GB, MaxMemAllocated=11.06GB
377
+ Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
378
+ Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.45, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
379
+ Model Parameters: 6.927 B, Latency: 3.65s, TFLOPs: 11.46, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
380
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
381
  Model Parameters: 6.927 B, Latency: 3.66s, TFLOPs: 11.44, Samples/sec: 2.19, Time/seq 0.46s, Batch Size: 8, Sequence Length: 512
382
+ Model Parameters: 6.927 B, Latency: 3.28s, TFLOPs: 12.77, Samples/sec: 2.44, Time/seq 0.41s, Batch Size: 8, Sequence Length: 512
383
  ***** Evaluating perplexity, Epoch 5/5 *****
384
  Invalidate trace cache @ step 0: expected module 0, but got module 6
385
  ppl: 1.003004550933838, loss: 0.0030000172555446625
386
  saving the final model ...
387
+ [2023-12-11 20:16:53,814] [INFO] [launch.py:347:main] Process 2392412 exits successfully.
388
+ [2023-12-11 20:16:54,182] [INFO] [launch.py:347:main] Process 2392414 exits successfully.
389
+ [2023-12-11 20:16:54,182] [INFO] [launch.py:347:main] Process 2392413 exits successfully.
390
+ [2023-12-11 20:18:58,197] [INFO] [launch.py:347:main] Process 2392411 exits successfully.