/home/cfruan/.conda/envs/mlc-source-311/bin/python -m mlc_chat gen_config /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5 --quantization q0f16 --conv-template phi-2 --output /tmp/tmpec4rcejp
[2023-12-28 23:32:32] INFO auto_config.py:115: Found model configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/config.json
[2023-12-28 23:32:32] INFO auto_config.py:151: Found model type: phi-msft. Use `--model-type` to override.
[2023-12-28 23:32:32] INFO phi_model.py:59: context_window_size not found in config.json. Falling back to n_positions (2048)
[2023-12-28 23:32:32] INFO gen_config.py:129: Not found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/tokenizer.model
[2023-12-28 23:32:32] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/tokenizer.json. Copying to /tmp/tmpec4rcejp/tokenizer.json
[2023-12-28 23:32:32] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/vocab.json. Copying to /tmp/tmpec4rcejp/vocab.json
[2023-12-28 23:32:32] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/merges.txt. Copying to /tmp/tmpec4rcejp/merges.txt
[2023-12-28 23:32:32] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/added_tokens.json. Copying to /tmp/tmpec4rcejp/added_tokens.json
[2023-12-28 23:32:32] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/tokenizer_config.json. Copying to /tmp/tmpec4rcejp/tokenizer_config.json
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting pad_token_id: 0
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting bos_token_id: 1
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting eos_token_id: 2
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting temperature: 0.7
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting repetition_penalty: 1.0
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting top_p: 0.95
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting mean_gen_len: 128
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting max_gen_len: 512
[2023-12-28 23:32:32] INFO gen_config.py:69: [System default] Setting shift_fill_factor: 0.3
[2023-12-28 23:32:32] INFO gen_config.py:157: Dumping configuration file to: /tmp/tmpec4rcejp/mlc-chat-config.json

/home/cfruan/.conda/envs/mlc-source-311/bin/python -m mlc_chat convert_weight /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5 --quantization q0f16 --source-format auto --output /tmp/tmpec4rcejp
[2023-12-28 23:32:32] INFO auto_config.py:115: Found model configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/config.json
[2023-12-28 23:32:32] INFO auto_device.py:76: Found device: cuda:0
[2023-12-28 23:32:32] INFO auto_device.py:76: Found device: cuda:1
[2023-12-28 23:32:33] INFO auto_device.py:85: Not found device: rocm:0
[2023-12-28 23:32:33] INFO auto_device.py:85: Not found device: metal:0
[2023-12-28 23:32:33] INFO auto_device.py:76: Found device: vulkan:0
[2023-12-28 23:32:33] INFO auto_device.py:76: Found device: vulkan:1
[2023-12-28 23:32:33] INFO auto_device.py:76: Found device: vulkan:2
[2023-12-28 23:32:33] INFO auto_device.py:85: Not found device: opencl:0
[2023-12-28 23:32:33] INFO auto_device.py:33: Using device: cuda:0
[2023-12-28 23:32:33] INFO auto_weight.py:70: Finding weights in: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5
[2023-12-28 23:32:33] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
[2023-12-28 23:32:33] INFO auto_weight.py:149: Not found Huggingface Safetensor
[2023-12-28 23:32:33] INFO auto_weight.py:106: Using source weight configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin. Use `--source` to override.
[2023-12-28 23:32:33] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2023-12-28 23:32:33] INFO auto_config.py:151: Found model type: phi-msft. Use `--model-type` to override.
[2023-12-28 23:32:33] INFO phi_model.py:59: context_window_size not found in config.json. Falling back to n_positions (2048)
[2023-12-28 23:32:36] INFO huggingface_loader.py:169: Loading HF parameters from: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
Weight conversion with arguments:
  --config          /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/config.json
  --quantization    NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
  --model-type      phi-msft
  --device          cuda:0
  --source          /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
  --source-format   huggingface-torch
  --output          /tmp/tmpec4rcejp
  0%| 0/245 [00:00<?, ?it/s]
[2023-12-28 23:32:37] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.embd.weight", shape: (51200, 2048), dtype: float16
  0%| 1/245 [00:00<00:40, 6.07it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
  8%| 20/245 [00:00<00:02, 89.07it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
 16%| 40/245 [00:00<00:01, 130.96it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc2.bias", shape: (2048,), dtype: float16
 25%| 61/245 [00:00<00:01, 159.10it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
 34%| 84/245 [00:00<00:00, 181.27it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
 44%| 108/245 [00:00<00:00, 190.82it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mixer.out_proj.bias[0m", shape: (2048,), dtype: float16 |
|
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 108/245 [00:00<00:00, 190.82it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mlp.fc1.weight[0m", shape: (8192, 2048), dtype: float16 |
|
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 108/245 [00:00<00:00, 190.82it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mlp.fc1.bias[0m", shape: (8192,), dtype: float16 |
|
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 108/245 [00:00<00:00, 190.82it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mlp.fc2.weight[0m", shape: (2048, 8192), dtype: float16 |
|
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 108/245 [00:00<00:00, 190.82it/s]
 53%|███████████████████████████████████████████████████████████████                                                       | 130/245 [00:00<00:00, 192.31it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
 63%|███████████████████████████████████████████████████████████████████████████                                           | 154/245 [00:00<00:00, 204.35it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
 73%|██████████████████████████████████████████████████████████████████████████████████████                                | 178/245 [00:01<00:00, 205.81it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
 82%|█████████████████████████████████████████████████████████████████████████████████████████████████                     | 200/245 [00:01<00:00, 203.27it/s]
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
 91%|████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc1.bias", shape: (8192,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc2.weight", shape: (2048, 8192), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc2.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln.weight", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.Wqkv.weight", shape: (6144, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.Wqkv.bias", shape: (6144,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.out_proj.weight", shape: (2048, 2048), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.out_proj.bias", shape: (2048,), dtype: float16
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.fc1.weight", shape: (8192, 2048), dtype: float16
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.23.mlp.fc1.bias[0m", shape: (8192,), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.23.mlp.fc2.weight[0m", shape: (2048, 8192), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.23.mlp.fc2.bias[0m", shape: (2048,), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mlm_head.ln.weight[0m", shape: (2048,), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mlm_head.ln.bias[0m", shape: (2048,), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mlm_head.linear.weight[0m", shape: (51200, 2048), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
[2023-12-28 23:32:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mlm_head.linear.bias[0m", shape: (51200,), dtype: float16 |
|
91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/245 [00:01<00:00, 211.92it/s]
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 245/245 [00:01<00:00, 164.23it/s] |
|
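The parameter shapes in the log above follow a consistent pattern. A minimal sketch of the implied relations, assuming a hidden size of 2048 with a fused Q/K/V projection (3×) and the common 4× MLP expansion (the 3× and 4× factors are architectural assumptions inferred from the shapes, not stated in the log):

```python
# Shape relations implied by the parameter log.
# hidden size and vocab size are read directly from the logged shapes;
# the 3x (fused Q/K/V) and 4x (MLP expansion) factors are assumptions.
hidden = 2048
vocab = 51200

wqkv_rows = 3 * hidden   # fused query, key, value projections
mlp_inner = 4 * hidden   # fc1 output dim == fc2 input dim

assert wqkv_rows == 6144                  # mixer.Wqkv.weight: (6144, 2048)
assert mlp_inner == 8192                  # mlp.fc1.weight:    (8192, 2048)
assert (vocab, hidden) == (51200, 2048)   # lm_head.linear.weight
```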
[2023-12-28 23:32:39] INFO huggingface_loader.py:179: Unloading HF weight file: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-1_5/pytorch_model.bin
[2023-12-28 23:32:39] INFO stats.py:71: Time usage: HF loading: 1.897 sec; Pre-quantization mapping: 0.448 sec; Quantization: 0.000 sec
[2023-12-28 23:32:39] INFO stats.py:85: RAM usage: Peak RAM: 2.642 GB. Total bytes loaded from disk: 2.642 GB
[2023-12-28 23:32:39] INFO convert_weight.py:110: Parameter size after quantization: 2.642 GB
[2023-12-28 23:32:39] INFO convert_weight.py:115: Total parameters: 1,418,270,720
[2023-12-28 23:32:39] INFO convert_weight.py:116: Bits per parameter: 16.000
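The summary numbers above are internally consistent: q0f16 means no quantization, so weights stay float16 at 2 bytes per parameter, and the reported "GB" is evidently GiB (2^30 bytes). A quick cross-check:

```python
# Cross-check of the converter's summary stats (q0f16 = unquantized float16,
# 2 bytes per parameter; "GB" in the log appears to mean GiB).
total_params = 1_418_270_720      # "Total parameters" from the log
bytes_total = total_params * 2    # float16 = 2 bytes each
gib = bytes_total / 2**30

assert round(gib, 3) == 2.642                  # "Parameter size ... 2.642 GB"
assert bytes_total * 8 / total_params == 16.0  # "Bits per parameter: 16.000"
```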
Start storing to cache /tmp/tmpec4rcejp

[0001/0245] saving transformer.embd.weight
[0002/0245] saving transformer.h.0.ln.weight
[0003/0245] saving transformer.h.0.ln.bias
[0004/0245] saving transformer.h.0.mixer.Wqkv.weight
[0005/0245] saving transformer.h.0.mixer.Wqkv.bias
[0006/0245] saving transformer.h.0.mixer.out_proj.weight
[0007/0245] saving transformer.h.0.mixer.out_proj.bias
[0008/0245] saving transformer.h.0.mlp.fc1.weight
[0009/0245] saving transformer.h.0.mlp.fc1.bias
[0010/0245] saving transformer.h.0.mlp.fc2.weight
[0011/0245] saving transformer.h.0.mlp.fc2.bias
[0012/0245] saving transformer.h.1.ln.weight
[0013/0245] saving transformer.h.1.ln.bias
[0014/0245] saving transformer.h.1.mixer.Wqkv.weight
[0015/0245] saving transformer.h.1.mixer.Wqkv.bias
[0016/0245] saving transformer.h.1.mixer.out_proj.weight
[0017/0245] saving transformer.h.1.mixer.out_proj.bias
[0018/0245] saving transformer.h.1.mlp.fc1.weight
[0019/0245] saving transformer.h.1.mlp.fc1.bias
[0020/0245] saving transformer.h.1.mlp.fc2.weight
[0021/0245] saving transformer.h.1.mlp.fc2.bias
[0022/0245] saving transformer.h.2.ln.weight
[0023/0245] saving transformer.h.2.ln.bias
[0024/0245] saving transformer.h.2.mixer.Wqkv.weight
[0025/0245] saving transformer.h.2.mixer.Wqkv.bias
[0026/0245] saving transformer.h.2.mixer.out_proj.weight
[0027/0245] saving transformer.h.2.mixer.out_proj.bias
[0028/0245] saving transformer.h.2.mlp.fc1.weight
[0029/0245] saving transformer.h.2.mlp.fc1.bias
[0030/0245] saving transformer.h.2.mlp.fc2.weight
[0031/0245] saving transformer.h.2.mlp.fc2.bias
[0032/0245] saving transformer.h.3.ln.weight
[0033/0245] saving transformer.h.3.ln.bias
[0034/0245] saving transformer.h.3.mixer.Wqkv.weight
[0035/0245] saving transformer.h.3.mixer.Wqkv.bias
[0036/0245] saving transformer.h.3.mixer.out_proj.weight
[0037/0245] saving transformer.h.3.mixer.out_proj.bias
[0038/0245] saving transformer.h.3.mlp.fc1.weight
[0039/0245] saving transformer.h.3.mlp.fc1.bias
[0040/0245] saving transformer.h.3.mlp.fc2.weight
[0041/0245] saving transformer.h.3.mlp.fc2.bias
[0042/0245] saving transformer.h.4.ln.weight
[0043/0245] saving transformer.h.4.ln.bias
[0044/0245] saving transformer.h.4.mixer.Wqkv.weight
[0045/0245] saving transformer.h.4.mixer.Wqkv.bias
[0046/0245] saving transformer.h.4.mixer.out_proj.weight
[0047/0245] saving transformer.h.4.mixer.out_proj.bias
[0048/0245] saving transformer.h.4.mlp.fc1.weight
[0049/0245] saving transformer.h.4.mlp.fc1.bias
[0050/0245] saving transformer.h.4.mlp.fc2.weight
[0051/0245] saving transformer.h.4.mlp.fc2.bias
[0052/0245] saving transformer.h.5.ln.weight
[0053/0245] saving transformer.h.5.ln.bias
[0054/0245] saving transformer.h.5.mixer.Wqkv.weight
[0055/0245] saving transformer.h.5.mixer.Wqkv.bias
[0056/0245] saving transformer.h.5.mixer.out_proj.weight
[0057/0245] saving transformer.h.5.mixer.out_proj.bias
[0058/0245] saving transformer.h.5.mlp.fc1.weight
[0059/0245] saving transformer.h.5.mlp.fc1.bias
[0060/0245] saving transformer.h.5.mlp.fc2.weight
[0061/0245] saving transformer.h.5.mlp.fc2.bias
[0062/0245] saving transformer.h.6.ln.weight
[0063/0245] saving transformer.h.6.ln.bias
[0064/0245] saving transformer.h.6.mixer.Wqkv.weight
[0065/0245] saving transformer.h.6.mixer.Wqkv.bias
[0066/0245] saving transformer.h.6.mixer.out_proj.weight
[0067/0245] saving transformer.h.6.mixer.out_proj.bias
[0068/0245] saving transformer.h.6.mlp.fc1.weight
[0069/0245] saving transformer.h.6.mlp.fc1.bias
[0070/0245] saving transformer.h.6.mlp.fc2.weight
[0071/0245] saving transformer.h.6.mlp.fc2.bias
[0072/0245] saving transformer.h.7.ln.weight
[0073/0245] saving transformer.h.7.ln.bias
[0074/0245] saving transformer.h.7.mixer.Wqkv.weight
[0075/0245] saving transformer.h.7.mixer.Wqkv.bias
[0076/0245] saving transformer.h.7.mixer.out_proj.weight
[0077/0245] saving transformer.h.7.mixer.out_proj.bias
[0078/0245] saving transformer.h.7.mlp.fc1.weight
[0079/0245] saving transformer.h.7.mlp.fc1.bias
[0080/0245] saving transformer.h.7.mlp.fc2.weight
[0081/0245] saving transformer.h.7.mlp.fc2.bias
[0082/0245] saving transformer.h.8.ln.weight
[0083/0245] saving transformer.h.8.ln.bias
[0084/0245] saving transformer.h.8.mixer.Wqkv.weight
[0085/0245] saving transformer.h.8.mixer.Wqkv.bias
[0086/0245] saving transformer.h.8.mixer.out_proj.weight
[0087/0245] saving transformer.h.8.mixer.out_proj.bias
[0088/0245] saving transformer.h.8.mlp.fc1.weight
[0089/0245] saving transformer.h.8.mlp.fc1.bias
[0090/0245] saving transformer.h.8.mlp.fc2.weight
[0091/0245] saving transformer.h.8.mlp.fc2.bias
[0092/0245] saving transformer.h.9.ln.weight
[0093/0245] saving transformer.h.9.ln.bias
[0094/0245] saving transformer.h.9.mixer.Wqkv.weight
[0095/0245] saving transformer.h.9.mixer.Wqkv.bias
[0096/0245] saving transformer.h.9.mixer.out_proj.weight
[0097/0245] saving transformer.h.9.mixer.out_proj.bias
[0098/0245] saving transformer.h.9.mlp.fc1.weight
[0099/0245] saving transformer.h.9.mlp.fc1.bias
[0100/0245] saving transformer.h.9.mlp.fc2.weight
[0101/0245] saving transformer.h.9.mlp.fc2.bias
[0102/0245] saving transformer.h.10.ln.weight
[0103/0245] saving transformer.h.10.ln.bias
[0104/0245] saving transformer.h.10.mixer.Wqkv.weight
[0105/0245] saving transformer.h.10.mixer.Wqkv.bias
[0106/0245] saving transformer.h.10.mixer.out_proj.weight
[0107/0245] saving transformer.h.10.mixer.out_proj.bias
[0108/0245] saving transformer.h.10.mlp.fc1.weight
[0109/0245] saving transformer.h.10.mlp.fc1.bias
[0110/0245] saving transformer.h.10.mlp.fc2.weight
[0111/0245] saving transformer.h.10.mlp.fc2.bias
[0112/0245] saving transformer.h.11.ln.weight
[0113/0245] saving transformer.h.11.ln.bias
[0114/0245] saving transformer.h.11.mixer.Wqkv.weight
[0115/0245] saving transformer.h.11.mixer.Wqkv.bias
[0116/0245] saving transformer.h.11.mixer.out_proj.weight
[0117/0245] saving transformer.h.11.mixer.out_proj.bias
[0118/0245] saving transformer.h.11.mlp.fc1.weight
[0119/0245] saving transformer.h.11.mlp.fc1.bias
[0120/0245] saving transformer.h.11.mlp.fc2.weight
[0121/0245] saving transformer.h.11.mlp.fc2.bias
[0122/0245] saving transformer.h.12.ln.weight
[0123/0245] saving transformer.h.12.ln.bias
[0124/0245] saving transformer.h.12.mixer.Wqkv.weight
[0125/0245] saving transformer.h.12.mixer.Wqkv.bias
[0126/0245] saving transformer.h.12.mixer.out_proj.weight
[0127/0245] saving transformer.h.12.mixer.out_proj.bias
[0128/0245] saving transformer.h.12.mlp.fc1.weight
[0129/0245] saving transformer.h.12.mlp.fc1.bias
[0130/0245] saving transformer.h.12.mlp.fc2.weight
[0131/0245] saving transformer.h.12.mlp.fc2.bias
[0132/0245] saving transformer.h.13.ln.weight
[0133/0245] saving transformer.h.13.ln.bias
[0134/0245] saving transformer.h.13.mixer.Wqkv.weight
[0135/0245] saving transformer.h.13.mixer.Wqkv.bias
[0136/0245] saving transformer.h.13.mixer.out_proj.weight
[0137/0245] saving transformer.h.13.mixer.out_proj.bias
[0138/0245] saving transformer.h.13.mlp.fc1.weight
[0139/0245] saving transformer.h.13.mlp.fc1.bias
[0140/0245] saving transformer.h.13.mlp.fc2.weight
[0141/0245] saving transformer.h.13.mlp.fc2.bias
[0142/0245] saving transformer.h.14.ln.weight
[0143/0245] saving transformer.h.14.ln.bias
[0144/0245] saving transformer.h.14.mixer.Wqkv.weight
[0145/0245] saving transformer.h.14.mixer.Wqkv.bias
[0146/0245] saving transformer.h.14.mixer.out_proj.weight
[0147/0245] saving transformer.h.14.mixer.out_proj.bias
[0148/0245] saving transformer.h.14.mlp.fc1.weight
[0149/0245] saving transformer.h.14.mlp.fc1.bias
[0150/0245] saving transformer.h.14.mlp.fc2.weight
[0151/0245] saving transformer.h.14.mlp.fc2.bias
[0152/0245] saving transformer.h.15.ln.weight
[0153/0245] saving transformer.h.15.ln.bias
[0154/0245] saving transformer.h.15.mixer.Wqkv.weight
[0155/0245] saving transformer.h.15.mixer.Wqkv.bias
[0156/0245] saving transformer.h.15.mixer.out_proj.weight
[0157/0245] saving transformer.h.15.mixer.out_proj.bias
[0158/0245] saving transformer.h.15.mlp.fc1.weight
[0159/0245] saving transformer.h.15.mlp.fc1.bias
[0160/0245] saving transformer.h.15.mlp.fc2.weight
[0161/0245] saving transformer.h.15.mlp.fc2.bias
[0162/0245] saving transformer.h.16.ln.weight
[0163/0245] saving transformer.h.16.ln.bias
[0164/0245] saving transformer.h.16.mixer.Wqkv.weight
[0165/0245] saving transformer.h.16.mixer.Wqkv.bias
[0166/0245] saving transformer.h.16.mixer.out_proj.weight
[0167/0245] saving transformer.h.16.mixer.out_proj.bias
[0168/0245] saving transformer.h.16.mlp.fc1.weight
[0169/0245] saving transformer.h.16.mlp.fc1.bias
[0170/0245] saving transformer.h.16.mlp.fc2.weight
[0171/0245] saving transformer.h.16.mlp.fc2.bias
[0172/0245] saving transformer.h.17.ln.weight
[0173/0245] saving transformer.h.17.ln.bias
[0174/0245] saving transformer.h.17.mixer.Wqkv.weight
[0175/0245] saving transformer.h.17.mixer.Wqkv.bias
[0176/0245] saving transformer.h.17.mixer.out_proj.weight
[0177/0245] saving transformer.h.17.mixer.out_proj.bias
[0178/0245] saving transformer.h.17.mlp.fc1.weight
[0179/0245] saving transformer.h.17.mlp.fc1.bias
[0180/0245] saving transformer.h.17.mlp.fc2.weight
[0181/0245] saving transformer.h.17.mlp.fc2.bias
[0182/0245] saving transformer.h.18.ln.weight
[0183/0245] saving transformer.h.18.ln.bias
[0184/0245] saving transformer.h.18.mixer.Wqkv.weight
[0185/0245] saving transformer.h.18.mixer.Wqkv.bias
[0186/0245] saving transformer.h.18.mixer.out_proj.weight
[0187/0245] saving transformer.h.18.mixer.out_proj.bias
[0188/0245] saving transformer.h.18.mlp.fc1.weight
[0189/0245] saving transformer.h.18.mlp.fc1.bias
[0190/0245] saving transformer.h.18.mlp.fc2.weight
[0191/0245] saving transformer.h.18.mlp.fc2.bias
[0192/0245] saving transformer.h.19.ln.weight
[0193/0245] saving transformer.h.19.ln.bias
[0194/0245] saving transformer.h.19.mixer.Wqkv.weight
[0195/0245] saving transformer.h.19.mixer.Wqkv.bias
[0196/0245] saving transformer.h.19.mixer.out_proj.weight
[0197/0245] saving transformer.h.19.mixer.out_proj.bias
[0198/0245] saving transformer.h.19.mlp.fc1.weight
[0199/0245] saving transformer.h.19.mlp.fc1.bias
[0200/0245] saving transformer.h.19.mlp.fc2.weight
[0201/0245] saving transformer.h.19.mlp.fc2.bias
[0202/0245] saving transformer.h.20.ln.weight
[0203/0245] saving transformer.h.20.ln.bias
[0204/0245] saving transformer.h.20.mixer.Wqkv.weight
[0205/0245] saving transformer.h.20.mixer.Wqkv.bias
[0206/0245] saving transformer.h.20.mixer.out_proj.weight
[0207/0245] saving transformer.h.20.mixer.out_proj.bias
[0208/0245] saving transformer.h.20.mlp.fc1.weight
[0209/0245] saving transformer.h.20.mlp.fc1.bias
[0210/0245] saving transformer.h.20.mlp.fc2.weight
[0211/0245] saving transformer.h.20.mlp.fc2.bias
[0212/0245] saving transformer.h.21.ln.weight
[0213/0245] saving transformer.h.21.ln.bias
[0214/0245] saving transformer.h.21.mixer.Wqkv.weight
[0215/0245] saving transformer.h.21.mixer.Wqkv.bias
[0216/0245] saving transformer.h.21.mixer.out_proj.weight
[0217/0245] saving transformer.h.21.mixer.out_proj.bias
[0218/0245] saving transformer.h.21.mlp.fc1.weight
[0219/0245] saving transformer.h.21.mlp.fc1.bias
[0220/0245] saving transformer.h.21.mlp.fc2.weight
[0221/0245] saving transformer.h.21.mlp.fc2.bias
[0222/0245] saving transformer.h.22.ln.weight
[0223/0245] saving transformer.h.22.ln.bias
[0224/0245] saving transformer.h.22.mixer.Wqkv.weight
[0225/0245] saving transformer.h.22.mixer.Wqkv.bias
[0226/0245] saving transformer.h.22.mixer.out_proj.weight
[0227/0245] saving transformer.h.22.mixer.out_proj.bias
[2023-12-28 23:32:45] INFO convert_weight.py:132: Saved to directory: /tmp/tmpec4rcejp
[0228/0245] saving transformer.h.22.mlp.fc1.weight
[0229/0245] saving transformer.h.22.mlp.fc1.bias
[0230/0245] saving transformer.h.22.mlp.fc2.weight
[0231/0245] saving transformer.h.22.mlp.fc2.bias
[0232/0245] saving transformer.h.23.ln.weight
[0233/0245] saving transformer.h.23.ln.bias
[0234/0245] saving transformer.h.23.mixer.Wqkv.weight
[0235/0245] saving transformer.h.23.mixer.Wqkv.bias
[0236/0245] saving transformer.h.23.mixer.out_proj.weight
[0237/0245] saving transformer.h.23.mixer.out_proj.bias
[0238/0245] saving transformer.h.23.mlp.fc1.weight
[0239/0245] saving transformer.h.23.mlp.fc1.bias
[0240/0245] saving transformer.h.23.mlp.fc2.weight
[0241/0245] saving transformer.h.23.mlp.fc2.bias
[0242/0245] saving lm_head.ln.weight
[0243/0245] saving lm_head.ln.bias
[0244/0245] saving lm_head.linear.weight
[0245/0245] saving lm_head.linear.bias

All finished, 82 total shards committed, record saved to /tmp/tmpec4rcejp/ndarray-cache.json |
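The run ends with 245 parameter records packed into 82 shard files, indexed by `ndarray-cache.json`. A hedged sketch of inspecting that manifest, assuming the layout MLC-LLM writes (a top-level `"records"` list with one entry per shard, each holding its own `"records"` list of parameters; these field names are an assumption, not verified against a spec):

```python
import json


def summarize_ndarray_cache(cache: dict) -> tuple[int, int]:
    """Count shard files and parameter records in an ndarray-cache manifest.

    Assumes a top-level "records" list (one entry per shard file), where
    each shard entry has its own "records" list of parameters. These key
    names are an assumption about the format, not taken from the log.
    """
    shards = cache.get("records", [])
    n_params = sum(len(s.get("records", [])) for s in shards)
    return len(shards), n_params


# Usage against the file this run produced (if the assumed layout holds,
# the counts should match the 82 shards / 245 saves seen in the log):
# with open("/tmp/tmpec4rcejp/ndarray-cache.json") as f:
#     print(summarize_ndarray_cache(json.load(f)))
```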