geonmin-kim committed
Commit 6c7df63 · verified · 1 Parent(s): 298def6

Upload folder using huggingface_hub

added_tokens.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "\t\t": 50294,
+   "\t\t\t": 50293,
+   "\t\t\t\t": 50292,
+   "\t\t\t\t\t": 50291,
+   "\t\t\t\t\t\t": 50290,
+   "\t\t\t\t\t\t\t": 50289,
+   "\t\t\t\t\t\t\t\t": 50288,
+   "\t\t\t\t\t\t\t\t\t": 50287,
+   " ": 50286,
+   " ": 50285,
+   " ": 50284,
+   " ": 50283,
+   " ": 50282,
+   " ": 50281,
+   " ": 50280,
+   " ": 50279,
+   " ": 50278,
+   " ": 50277,
+   " ": 50276,
+   " ": 50275,
+   " ": 50274,
+   " ": 50273,
+   " ": 50272,
+   " ": 50271,
+   " ": 50270,
+   " ": 50269,
+   " ": 50268,
+   " ": 50267,
+   " ": 50266,
+   " ": 50265,
+   " ": 50264,
+   " ": 50263,
+   " ": 50262,
+   " ": 50261,
+   " ": 50260,
+   " ": 50259,
+   " ": 50258,
+   " ": 50257
+ }
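The added-token IDs above can be sanity-checked against the vocabulary size declared in mlc-chat-config.json. The following is a minimal sketch, not part of the commit: it rebuilds the tab-run entries from the pattern visible in the diff (2 tabs map to 50294, 9 tabs map to 50287), treats the remaining 30 entries only by their IDs, and assumes that the base vocabulary ends at ID 50256 (the EOS id in the config). The helper names are hypothetical.

```python
# Tab-run entries as listed in added_tokens.json: 2 tabs -> 50294 ... 9 tabs -> 50287.
tab_tokens = {"\t" * n: 50296 - n for n in range(2, 10)}

# The remaining 30 entries occupy IDs 50257..50286 (their keys are omitted here).
all_added_ids = sorted(set(tab_tokens.values()) | set(range(50257, 50287)))

VOCAB_SIZE = 51200    # "vocab_size" in mlc-chat-config.json
LAST_BASE_ID = 50256  # assumption: the base vocabulary ends at the EOS id


def ids_are_contiguous(ids):
    """Return True when the sorted IDs form an unbroken run."""
    return ids == list(range(ids[0], ids[-1] + 1))


def unused_slots(ids, vocab_size):
    """Count the vocabulary rows above the highest assigned token ID."""
    return vocab_size - (ids[-1] + 1)


print(len(all_added_ids))                       # number of added tokens
print(ids_are_contiguous(all_added_ids))        # no gaps in the ID range
print(all_added_ids[0] == LAST_BASE_ID + 1)     # starts right after the base vocab
print(unused_slots(all_added_ids, VOCAB_SIZE))  # rows left unassigned
```

Under these assumptions the 38 added tokens fill IDs 50257 through 50294 without gaps, and the rest of the 51200-row vocabulary stays unassigned.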
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
mlc-chat-config.json ADDED
@@ -0,0 +1,78 @@
+ {
+   "version": "0.1.0",
+   "model_type": "phi",
+   "quantization": "q0f16",
+   "model_config": {
+     "vocab_size": 51200,
+     "hidden_size": 2560,
+     "intermediate_size": 10240,
+     "num_hidden_layers": 20,
+     "num_attention_heads": 32,
+     "layer_norm_eps": 1e-05,
+     "position_embedding_base": 10000.0,
+     "partial_rotary_factor": 0.4,
+     "num_key_value_heads": 32,
+     "context_window_size": 768,
+     "prefill_chunk_size": 768,
+     "head_dim": 80,
+     "tensor_parallel_shards": 1,
+     "max_batch_size": 80
+   },
+   "vocab_size": 51200,
+   "context_window_size": 768,
+   "sliding_window_size": -1,
+   "prefill_chunk_size": 768,
+   "attention_sink_size": -1,
+   "tensor_parallel_shards": 1,
+   "pipeline_parallel_stages": 1,
+   "temperature": 1.0,
+   "presence_penalty": 0.0,
+   "frequency_penalty": 0.0,
+   "repetition_penalty": 1.0,
+   "top_p": 1.0,
+   "tokenizer_files": [
+     "tokenizer.json",
+     "vocab.json",
+     "merges.txt",
+     "added_tokens.json",
+     "tokenizer_config.json"
+   ],
+   "tokenizer_info": {
+     "token_postproc_method": "byte_level",
+     "prepend_space_in_encode": false,
+     "strip_space_in_decode": false
+   },
+   "conv_template": {
+     "name": "phi-2",
+     "system_template": "{system_message}",
+     "system_message": "",
+     "system_prefix_token_ids": null,
+     "add_role_after_system_message": true,
+     "roles": {
+       "user": "Instruct",
+       "assistant": "Output"
+     },
+     "role_templates": {
+       "user": "{user_message}",
+       "assistant": "{assistant_message}",
+       "tool": "{tool_message}"
+     },
+     "messages": [],
+     "seps": [
+       "\n"
+     ],
+     "role_content_sep": ": ",
+     "role_empty_sep": ":",
+     "stop_str": [
+       "<|endoftext|>"
+     ],
+     "stop_token_ids": [
+       50256
+     ],
+     "function_string": "",
+     "use_function_calling": false
+   },
+   "pad_token_id": 0,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256
+ }
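To illustrate what the conv_template fields above describe, here is a minimal sketch of how they could be combined into a prompt string. It is a simplified illustration under stated assumptions, not MLC LLM's actual rendering code: the function render_prompt and the convention that a None text marks the turn to be generated are hypothetical, and only the template values are copied from the config.

```python
# Values copied from the "conv_template" section of mlc-chat-config.json.
TEMPLATE = {
    "system_template": "{system_message}",
    "system_message": "",
    "roles": {"user": "Instruct", "assistant": "Output"},
    "role_templates": {
        "user": "{user_message}",
        "assistant": "{assistant_message}",
    },
    "seps": ["\n"],
    "role_content_sep": ": ",
    "role_empty_sep": ":",
}


def render_prompt(template, messages):
    """Build a prompt from (role, text) pairs; text=None marks the open turn.

    Hypothetical helper: a simplified reading of the template fields.
    """
    seps = template["seps"]
    parts = []
    system = template["system_template"].replace(
        "{system_message}", template["system_message"]
    )
    if system:
        # Only emit the system block when it is non-empty.
        parts.append(system + seps[0])
    for i, (role, text) in enumerate(messages):
        name = template["roles"][role]
        if text is None:
            # Open turn: role name followed by the "empty" separator.
            parts.append(name + template["role_empty_sep"])
        else:
            placeholder = "{" + role + "_message}"
            body = template["role_templates"][role].replace(placeholder, text)
            parts.append(
                name + template["role_content_sep"] + body + seps[i % len(seps)]
            )
    return "".join(parts)


prompt = render_prompt(TEMPLATE, [("user", "Write a haiku."), ("assistant", None)])
print(repr(prompt))  # 'Instruct: Write a haiku.\nOutput:'
```

With an empty system message, a single user turn renders as an "Instruct: ..." line followed by an open "Output:" line, and generation would then stop on the configured stop string or stop token ID.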
ndarray-cache.json ADDED
@@ -0,0 +1,2717 @@
+ {
+   "metadata": {
+     "ParamSize": 205,
+     "ParamBytes": 3671255040.0,
+     "BitsPerParam": 16.0
+   },
+   "records": [
+     {
+       "dataPath": "params_shard_0.bin",
+       "format": "raw-shard",
+       "nbytes": 262144000,
+       "records": [
+         {
+           "name": "transformer.embd.weight",
+           "shape": [
+             51200,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 262144000,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "254b0b3b75b7379a4e1cbd166422d581"
+     },
+     {
+       "dataPath": "params_shard_1.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.0.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "1d86075cab7a17b12bc23b609b0e1a23"
+     },
+     {
+       "dataPath": "params_shard_2.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.0.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "0c420ffbfd711e2131f5271c66102718"
+     },
+     {
+       "dataPath": "params_shard_3.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.0.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "a4bc5b186b6a238f1f486bf8cf3ef93c"
+     },
+     {
+       "dataPath": "params_shard_4.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.1.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "04323ae881c8f62a75f30042883adada"
+     },
+     {
+       "dataPath": "params_shard_5.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.1.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "7c6565bf96483a607b80cd1b1e11d976"
+     },
+     {
+       "dataPath": "params_shard_6.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.1.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8a1c55844664e5c92083d16ded362b0c"
+     },
+     {
+       "dataPath": "params_shard_7.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.2.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "98d4e8967f14837a29db76190ab143ab"
+     },
+     {
+       "dataPath": "params_shard_8.bin",
+       "format": "raw-shard",
+       "nbytes": 26342400,
+       "records": [
+         {
+           "name": "transformer.h.0.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.0.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 15360
+         },
+         {
+           "name": "transformer.h.0.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13122560
+         },
+         {
+           "name": "transformer.h.0.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13127680
+         },
+         {
+           "name": "transformer.h.0.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13148160
+         },
+         {
+           "name": "transformer.h.0.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13153280
+         },
+         {
+           "name": "transformer.h.0.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13158400
+         },
+         {
+           "name": "transformer.h.1.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.1.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13178880
+         },
+         {
+           "name": "transformer.h.1.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26286080
+         },
+         {
+           "name": "transformer.h.1.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26291200
+         },
+         {
+           "name": "transformer.h.1.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26311680
+         },
+         {
+           "name": "transformer.h.1.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26316800
+         },
+         {
+           "name": "transformer.h.1.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26321920
+         },
+         {
+           "name": "transformer.h.2.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26327040
+         }
+       ],
+       "md5sum": "329750ff6e5b7163357f147fbe437a69"
+     },
+     {
+       "dataPath": "params_shard_9.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.2.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "92c7a3dac1b0e4c178d695b8de429e98"
+     },
+     {
+       "dataPath": "params_shard_10.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.2.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "0f31d37e0d8d3fcf837f86a7a8c684fc"
+     },
+     {
+       "dataPath": "params_shard_11.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.3.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "a6ed72fb47e10f1b5a4c2754c373b7f6"
+     },
+     {
+       "dataPath": "params_shard_12.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.3.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "ec594a40413cf2a2d646b6957e5d03d7"
+     },
+     {
+       "dataPath": "params_shard_13.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.3.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "3ec56e021be377c39b6c0b116bb35628"
+     },
+     {
+       "dataPath": "params_shard_14.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.4.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "b082066a6093ab82e5332b22ecce2ff2"
+     },
+     {
+       "dataPath": "params_shard_15.bin",
+       "format": "raw-shard",
+       "nbytes": 26327040,
+       "records": [
+         {
+           "name": "transformer.h.2.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.2.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13107200
+         },
+         {
+           "name": "transformer.h.2.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13112320
+         },
+         {
+           "name": "transformer.h.2.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13132800
+         },
+         {
+           "name": "transformer.h.2.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13137920
+         },
+         {
+           "name": "transformer.h.2.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13143040
+         },
+         {
+           "name": "transformer.h.3.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13148160
+         },
+         {
+           "name": "transformer.h.3.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.3.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26270720
+         },
+         {
+           "name": "transformer.h.3.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26275840
+         },
+         {
+           "name": "transformer.h.3.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26296320
+         },
+         {
+           "name": "transformer.h.3.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26301440
+         },
+         {
+           "name": "transformer.h.3.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26306560
+         },
+         {
+           "name": "transformer.h.4.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26311680
+         }
+       ],
+       "md5sum": "219af8ce693d196d660d6327124fd6a2"
+     },
+     {
+       "dataPath": "params_shard_16.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.4.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8738ac6db7d61471ae5b40bfc66270aa"
+     },
+     {
+       "dataPath": "params_shard_17.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.4.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "043a62b5c63b81462096ffa61d7e5b2a"
+     },
+     {
+       "dataPath": "params_shard_18.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.5.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "00b46be5cc0bcd86c30cd605a5301545"
+     },
+     {
+       "dataPath": "params_shard_19.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.5.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "84e61313f6653d2acec2f206d3dd0a7e"
+     },
+     {
+       "dataPath": "params_shard_20.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.5.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "3bf00c9c71502d112f0c5daafa7f4459"
+     },
+     {
+       "dataPath": "params_shard_21.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.6.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8d32b3f5ea1c6fc504364fac2ddafc36"
+     },
+     {
+       "dataPath": "params_shard_22.bin",
+       "format": "raw-shard",
+       "nbytes": 26327040,
+       "records": [
+         {
+           "name": "transformer.h.4.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.4.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13107200
+         },
+         {
+           "name": "transformer.h.4.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13112320
+         },
+         {
+           "name": "transformer.h.4.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13132800
+         },
+         {
+           "name": "transformer.h.4.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13137920
+         },
+         {
+           "name": "transformer.h.4.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13143040
+         },
+         {
+           "name": "transformer.h.5.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13148160
+         },
+         {
+           "name": "transformer.h.5.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.5.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26270720
+         },
+         {
+           "name": "transformer.h.5.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26275840
+         },
+         {
+           "name": "transformer.h.5.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26296320
+         },
+         {
+           "name": "transformer.h.5.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26301440
+         },
+         {
+           "name": "transformer.h.5.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26306560
+         },
+         {
+           "name": "transformer.h.6.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26311680
+         }
+       ],
+       "md5sum": "3ac6772168ff1c95f73ebc54359eff86"
+     },
+     {
+       "dataPath": "params_shard_23.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.6.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "75da9ae456af72601f8fb85adea7d67a"
+     },
+     {
+       "dataPath": "params_shard_24.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.6.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "15613cc3936c6078dea5be7abb33e697"
+     },
+     {
+       "dataPath": "params_shard_25.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.7.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "b1fd7e449fc7911d19c93fd6afda1240"
+     },
+     {
+       "dataPath": "params_shard_26.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.7.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "ee00368c240a6d66c041f4c30528c544"
+     },
+     {
+       "dataPath": "params_shard_27.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.7.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "61031af848adf766d88beec5d9f0acb0"
+     },
+     {
+       "dataPath": "params_shard_28.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.8.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "6ffb8e934fc2a9d0c34f79f70b96fe80"
+     },
+     {
+       "dataPath": "params_shard_29.bin",
+       "format": "raw-shard",
+       "nbytes": 26327040,
+       "records": [
+         {
+           "name": "transformer.h.6.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.6.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13107200
+         },
+         {
+           "name": "transformer.h.6.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13112320
+         },
+         {
+           "name": "transformer.h.6.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13132800
+         },
+         {
+           "name": "transformer.h.6.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13137920
+         },
+         {
+           "name": "transformer.h.6.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13143040
+         },
+         {
+           "name": "transformer.h.7.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13148160
+         },
+         {
+           "name": "transformer.h.7.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.7.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26270720
+         },
+         {
+           "name": "transformer.h.7.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26275840
+         },
+         {
+           "name": "transformer.h.7.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26296320
+         },
+         {
+           "name": "transformer.h.7.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26301440
+         },
+         {
+           "name": "transformer.h.7.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26306560
+         },
+         {
+           "name": "transformer.h.8.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26311680
+         }
+       ],
+       "md5sum": "10b259f7b80334935ea33f9f0abdb01d"
+     },
1112
+ {
1113
+ "dataPath": "params_shard_30.bin",
1114
+ "format": "raw-shard",
1115
+ "nbytes": 52428800,
1116
+ "records": [
1117
+ {
1118
+ "name": "transformer.h.8.mlp.fc1.weight",
1119
+ "shape": [
1120
+ 10240,
1121
+ 2560
1122
+ ],
1123
+ "dtype": "float16",
1124
+ "format": "f32-to-bf16",
1125
+ "nbytes": 52428800,
1126
+ "byteOffset": 0
1127
+ }
1128
+ ],
1129
+ "md5sum": "eb6126e1ec26e7cc1178ff32aa79ae2d"
1130
+ },
1131
+ {
1132
+ "dataPath": "params_shard_31.bin",
1133
+ "format": "raw-shard",
1134
+ "nbytes": 52428800,
1135
+ "records": [
1136
+ {
1137
+ "name": "transformer.h.8.mlp.fc2.weight",
1138
+ "shape": [
1139
+ 2560,
1140
+ 10240
1141
+ ],
1142
+ "dtype": "float16",
1143
+ "format": "f32-to-bf16",
1144
+ "nbytes": 52428800,
1145
+ "byteOffset": 0
1146
+ }
1147
+ ],
1148
+ "md5sum": "849b5b23fe85f1a319c4d777da925b8a"
1149
+ },
1150
+ {
1151
+ "dataPath": "params_shard_32.bin",
1152
+ "format": "raw-shard",
1153
+ "nbytes": 39321600,
1154
+ "records": [
1155
+ {
1156
+ "name": "transformer.h.9.mixer.Wqkv.weight",
1157
+ "shape": [
1158
+ 7680,
1159
+ 2560
1160
+ ],
1161
+ "dtype": "float16",
1162
+ "format": "f32-to-bf16",
1163
+ "nbytes": 39321600,
1164
+ "byteOffset": 0
1165
+ }
1166
+ ],
1167
+ "md5sum": "86f8b08366dee01d027a7103e71946f0"
1168
+ },
1169
+ {
1170
+ "dataPath": "params_shard_33.bin",
1171
+ "format": "raw-shard",
1172
+ "nbytes": 52428800,
1173
+ "records": [
1174
+ {
1175
+ "name": "transformer.h.9.mlp.fc1.weight",
1176
+ "shape": [
1177
+ 10240,
1178
+ 2560
1179
+ ],
1180
+ "dtype": "float16",
1181
+ "format": "f32-to-bf16",
1182
+ "nbytes": 52428800,
1183
+ "byteOffset": 0
1184
+ }
1185
+ ],
1186
+ "md5sum": "cd5d8fcd207089ee8ac08233d37ab831"
1187
+ },
1188
+ {
1189
+ "dataPath": "params_shard_34.bin",
1190
+ "format": "raw-shard",
1191
+ "nbytes": 52428800,
1192
+ "records": [
1193
+ {
1194
+ "name": "transformer.h.9.mlp.fc2.weight",
1195
+ "shape": [
1196
+ 2560,
1197
+ 10240
1198
+ ],
1199
+ "dtype": "float16",
1200
+ "format": "f32-to-bf16",
1201
+ "nbytes": 52428800,
1202
+ "byteOffset": 0
1203
+ }
1204
+ ],
1205
+ "md5sum": "d68a0eaa4eafc7ff64b28ee55dc0842c"
1206
+ },
1207
+ {
1208
+ "dataPath": "params_shard_35.bin",
1209
+ "format": "raw-shard",
1210
+ "nbytes": 39321600,
1211
+ "records": [
1212
+ {
1213
+ "name": "transformer.h.10.mixer.Wqkv.weight",
1214
+ "shape": [
1215
+ 7680,
1216
+ 2560
1217
+ ],
1218
+ "dtype": "float16",
1219
+ "format": "f32-to-bf16",
1220
+ "nbytes": 39321600,
1221
+ "byteOffset": 0
1222
+ }
1223
+ ],
1224
+ "md5sum": "fbf9982d6f0e89131bddc19cb49c1347"
1225
+ },
1226
+ {
1227
+ "dataPath": "params_shard_36.bin",
1228
+ "format": "raw-shard",
1229
+ "nbytes": 26327040,
1230
+ "records": [
1231
+ {
1232
+ "name": "transformer.h.8.mixer.out_proj.weight",
1233
+ "shape": [
1234
+ 2560,
1235
+ 2560
1236
+ ],
1237
+ "dtype": "float16",
1238
+ "format": "f32-to-bf16",
1239
+ "nbytes": 13107200,
1240
+ "byteOffset": 0
1241
+ },
1242
+ {
1243
+ "name": "transformer.h.8.mixer.out_proj.bias",
1244
+ "shape": [
1245
+ 2560
1246
+ ],
1247
+ "dtype": "float16",
1248
+ "format": "f32-to-bf16",
1249
+ "nbytes": 5120,
1250
+ "byteOffset": 13107200
1251
+ },
1252
+ {
1253
+ "name": "transformer.h.8.mlp.fc1.bias",
1254
+ "shape": [
1255
+ 10240
1256
+ ],
1257
+ "dtype": "float16",
1258
+ "format": "f32-to-bf16",
1259
+ "nbytes": 20480,
1260
+ "byteOffset": 13112320
1261
+ },
1262
+ {
1263
+ "name": "transformer.h.8.mlp.fc2.bias",
1264
+ "shape": [
1265
+ 2560
1266
+ ],
1267
+ "dtype": "float16",
1268
+ "format": "f32-to-bf16",
1269
+ "nbytes": 5120,
1270
+ "byteOffset": 13132800
1271
+ },
1272
+ {
1273
+ "name": "transformer.h.8.ln.weight",
1274
+ "shape": [
1275
+ 2560
1276
+ ],
1277
+ "dtype": "float16",
1278
+ "format": "f32-to-bf16",
1279
+ "nbytes": 5120,
1280
+ "byteOffset": 13137920
1281
+ },
1282
+ {
1283
+ "name": "transformer.h.8.ln.bias",
1284
+ "shape": [
1285
+ 2560
1286
+ ],
1287
+ "dtype": "float16",
1288
+ "format": "f32-to-bf16",
1289
+ "nbytes": 5120,
1290
+ "byteOffset": 13143040
1291
+ },
1292
+ {
1293
+ "name": "transformer.h.9.mixer.Wqkv.bias",
1294
+ "shape": [
1295
+ 7680
1296
+ ],
1297
+ "dtype": "float16",
1298
+ "format": "f32-to-bf16",
1299
+ "nbytes": 15360,
1300
+ "byteOffset": 13148160
1301
+ },
1302
+ {
1303
+ "name": "transformer.h.9.mixer.out_proj.weight",
1304
+ "shape": [
1305
+ 2560,
1306
+ 2560
1307
+ ],
1308
+ "dtype": "float16",
1309
+ "format": "f32-to-bf16",
1310
+ "nbytes": 13107200,
1311
+ "byteOffset": 13163520
1312
+ },
1313
+ {
1314
+ "name": "transformer.h.9.mixer.out_proj.bias",
1315
+ "shape": [
1316
+ 2560
1317
+ ],
1318
+ "dtype": "float16",
1319
+ "format": "f32-to-bf16",
1320
+ "nbytes": 5120,
1321
+ "byteOffset": 26270720
1322
+ },
1323
+ {
1324
+ "name": "transformer.h.9.mlp.fc1.bias",
1325
+ "shape": [
1326
+ 10240
1327
+ ],
1328
+ "dtype": "float16",
1329
+ "format": "f32-to-bf16",
1330
+ "nbytes": 20480,
1331
+ "byteOffset": 26275840
1332
+ },
1333
+ {
1334
+ "name": "transformer.h.9.mlp.fc2.bias",
1335
+ "shape": [
1336
+ 2560
1337
+ ],
1338
+ "dtype": "float16",
1339
+ "format": "f32-to-bf16",
1340
+ "nbytes": 5120,
1341
+ "byteOffset": 26296320
1342
+ },
1343
+ {
1344
+ "name": "transformer.h.9.ln.weight",
1345
+ "shape": [
1346
+ 2560
1347
+ ],
1348
+ "dtype": "float16",
1349
+ "format": "f32-to-bf16",
1350
+ "nbytes": 5120,
1351
+ "byteOffset": 26301440
1352
+ },
1353
+ {
1354
+ "name": "transformer.h.9.ln.bias",
1355
+ "shape": [
1356
+ 2560
1357
+ ],
1358
+ "dtype": "float16",
1359
+ "format": "f32-to-bf16",
1360
+ "nbytes": 5120,
1361
+ "byteOffset": 26306560
1362
+ },
1363
+ {
1364
+ "name": "transformer.h.10.mixer.Wqkv.bias",
1365
+ "shape": [
1366
+ 7680
1367
+ ],
1368
+ "dtype": "float16",
1369
+ "format": "f32-to-bf16",
1370
+ "nbytes": 15360,
1371
+ "byteOffset": 26311680
1372
+ }
1373
+ ],
1374
+ "md5sum": "266231919f5eaf3a31bfd0cea392b498"
1375
+ },
1376
+ {
1377
+ "dataPath": "params_shard_37.bin",
1378
+ "format": "raw-shard",
1379
+ "nbytes": 52428800,
1380
+ "records": [
1381
+ {
1382
+ "name": "transformer.h.10.mlp.fc1.weight",
1383
+ "shape": [
1384
+ 10240,
1385
+ 2560
1386
+ ],
1387
+ "dtype": "float16",
1388
+ "format": "f32-to-bf16",
1389
+ "nbytes": 52428800,
1390
+ "byteOffset": 0
1391
+ }
1392
+ ],
1393
+ "md5sum": "690d2340a046f40907fe625d1d751d19"
1394
+ },
1395
+ {
1396
+ "dataPath": "params_shard_38.bin",
1397
+ "format": "raw-shard",
1398
+ "nbytes": 52428800,
1399
+ "records": [
1400
+ {
1401
+ "name": "transformer.h.10.mlp.fc2.weight",
1402
+ "shape": [
1403
+ 2560,
1404
+ 10240
1405
+ ],
1406
+ "dtype": "float16",
1407
+ "format": "f32-to-bf16",
1408
+ "nbytes": 52428800,
1409
+ "byteOffset": 0
1410
+ }
1411
+ ],
1412
+ "md5sum": "c2ec913e8df76d589ca8d160dfd402db"
1413
+ },
1414
+ {
1415
+ "dataPath": "params_shard_39.bin",
1416
+ "format": "raw-shard",
1417
+ "nbytes": 39321600,
1418
+ "records": [
1419
+ {
1420
+ "name": "transformer.h.11.mixer.Wqkv.weight",
1421
+ "shape": [
1422
+ 7680,
1423
+ 2560
1424
+ ],
1425
+ "dtype": "float16",
1426
+ "format": "f32-to-bf16",
1427
+ "nbytes": 39321600,
1428
+ "byteOffset": 0
1429
+ }
1430
+ ],
1431
+ "md5sum": "68f4929215e67c28364d58d6843c001b"
1432
+ },
1433
+ {
1434
+ "dataPath": "params_shard_40.bin",
1435
+ "format": "raw-shard",
1436
+ "nbytes": 52428800,
1437
+ "records": [
1438
+ {
1439
+ "name": "transformer.h.11.mlp.fc1.weight",
1440
+ "shape": [
1441
+ 10240,
1442
+ 2560
1443
+ ],
1444
+ "dtype": "float16",
1445
+ "format": "f32-to-bf16",
1446
+ "nbytes": 52428800,
1447
+ "byteOffset": 0
1448
+ }
1449
+ ],
1450
+ "md5sum": "6c884087fbdbb1e779cd0e1093659625"
1451
+ },
1452
+ {
1453
+ "dataPath": "params_shard_41.bin",
1454
+ "format": "raw-shard",
1455
+ "nbytes": 52428800,
1456
+ "records": [
1457
+ {
1458
+ "name": "transformer.h.11.mlp.fc2.weight",
1459
+ "shape": [
1460
+ 2560,
1461
+ 10240
1462
+ ],
1463
+ "dtype": "float16",
1464
+ "format": "f32-to-bf16",
1465
+ "nbytes": 52428800,
1466
+ "byteOffset": 0
1467
+ }
1468
+ ],
1469
+ "md5sum": "98f5d251641571e4235cfd91909f6e3a"
1470
+ },
1471
+ {
1472
+ "dataPath": "params_shard_42.bin",
1473
+ "format": "raw-shard",
1474
+ "nbytes": 39321600,
1475
+ "records": [
1476
+ {
1477
+ "name": "transformer.h.12.mixer.Wqkv.weight",
1478
+ "shape": [
1479
+ 7680,
1480
+ 2560
1481
+ ],
1482
+ "dtype": "float16",
1483
+ "format": "f32-to-bf16",
1484
+ "nbytes": 39321600,
1485
+ "byteOffset": 0
1486
+ }
1487
+ ],
1488
+ "md5sum": "48d8412bcfbfd85027ed63f9dcdd6873"
1489
+ },
1490
+ {
1491
+ "dataPath": "params_shard_43.bin",
1492
+ "format": "raw-shard",
1493
+ "nbytes": 26327040,
1494
+ "records": [
1495
+ {
1496
+ "name": "transformer.h.10.mixer.out_proj.weight",
1497
+ "shape": [
1498
+ 2560,
1499
+ 2560
1500
+ ],
1501
+ "dtype": "float16",
1502
+ "format": "f32-to-bf16",
1503
+ "nbytes": 13107200,
1504
+ "byteOffset": 0
1505
+ },
1506
+ {
1507
+ "name": "transformer.h.10.mixer.out_proj.bias",
1508
+ "shape": [
1509
+ 2560
1510
+ ],
1511
+ "dtype": "float16",
1512
+ "format": "f32-to-bf16",
1513
+ "nbytes": 5120,
1514
+ "byteOffset": 13107200
1515
+ },
1516
+ {
1517
+ "name": "transformer.h.10.mlp.fc1.bias",
1518
+ "shape": [
1519
+ 10240
1520
+ ],
1521
+ "dtype": "float16",
1522
+ "format": "f32-to-bf16",
1523
+ "nbytes": 20480,
1524
+ "byteOffset": 13112320
1525
+ },
1526
+ {
1527
+ "name": "transformer.h.10.mlp.fc2.bias",
1528
+ "shape": [
1529
+ 2560
1530
+ ],
1531
+ "dtype": "float16",
1532
+ "format": "f32-to-bf16",
1533
+ "nbytes": 5120,
1534
+ "byteOffset": 13132800
1535
+ },
1536
+ {
1537
+ "name": "transformer.h.10.ln.weight",
1538
+ "shape": [
1539
+ 2560
1540
+ ],
1541
+ "dtype": "float16",
1542
+ "format": "f32-to-bf16",
1543
+ "nbytes": 5120,
1544
+ "byteOffset": 13137920
1545
+ },
1546
+ {
1547
+ "name": "transformer.h.10.ln.bias",
1548
+ "shape": [
1549
+ 2560
1550
+ ],
1551
+ "dtype": "float16",
1552
+ "format": "f32-to-bf16",
1553
+ "nbytes": 5120,
1554
+ "byteOffset": 13143040
1555
+ },
1556
+ {
1557
+ "name": "transformer.h.11.mixer.Wqkv.bias",
1558
+ "shape": [
1559
+ 7680
1560
+ ],
1561
+ "dtype": "float16",
1562
+ "format": "f32-to-bf16",
1563
+ "nbytes": 15360,
1564
+ "byteOffset": 13148160
1565
+ },
1566
+ {
1567
+ "name": "transformer.h.11.mixer.out_proj.weight",
1568
+ "shape": [
1569
+ 2560,
1570
+ 2560
1571
+ ],
1572
+ "dtype": "float16",
1573
+ "format": "f32-to-bf16",
1574
+ "nbytes": 13107200,
1575
+ "byteOffset": 13163520
1576
+ },
1577
+ {
1578
+ "name": "transformer.h.11.mixer.out_proj.bias",
1579
+ "shape": [
1580
+ 2560
1581
+ ],
1582
+ "dtype": "float16",
1583
+ "format": "f32-to-bf16",
1584
+ "nbytes": 5120,
1585
+ "byteOffset": 26270720
1586
+ },
1587
+ {
1588
+ "name": "transformer.h.11.mlp.fc1.bias",
1589
+ "shape": [
1590
+ 10240
1591
+ ],
1592
+ "dtype": "float16",
1593
+ "format": "f32-to-bf16",
1594
+ "nbytes": 20480,
1595
+ "byteOffset": 26275840
1596
+ },
1597
+ {
1598
+ "name": "transformer.h.11.mlp.fc2.bias",
1599
+ "shape": [
1600
+ 2560
1601
+ ],
1602
+ "dtype": "float16",
1603
+ "format": "f32-to-bf16",
1604
+ "nbytes": 5120,
1605
+ "byteOffset": 26296320
1606
+ },
1607
+ {
1608
+ "name": "transformer.h.11.ln.weight",
1609
+ "shape": [
1610
+ 2560
1611
+ ],
1612
+ "dtype": "float16",
1613
+ "format": "f32-to-bf16",
1614
+ "nbytes": 5120,
1615
+ "byteOffset": 26301440
1616
+ },
1617
+ {
1618
+ "name": "transformer.h.11.ln.bias",
1619
+ "shape": [
1620
+ 2560
1621
+ ],
1622
+ "dtype": "float16",
1623
+ "format": "f32-to-bf16",
1624
+ "nbytes": 5120,
1625
+ "byteOffset": 26306560
1626
+ },
1627
+ {
1628
+ "name": "transformer.h.12.mixer.Wqkv.bias",
1629
+ "shape": [
1630
+ 7680
1631
+ ],
1632
+ "dtype": "float16",
1633
+ "format": "f32-to-bf16",
1634
+ "nbytes": 15360,
1635
+ "byteOffset": 26311680
1636
+ }
1637
+ ],
1638
+ "md5sum": "0d90e5ac45d7ce00bedbb54a0c688cda"
1639
+ },
1640
+ {
1641
+ "dataPath": "params_shard_44.bin",
1642
+ "format": "raw-shard",
1643
+ "nbytes": 52428800,
1644
+ "records": [
1645
+ {
1646
+ "name": "transformer.h.12.mlp.fc1.weight",
1647
+ "shape": [
1648
+ 10240,
1649
+ 2560
1650
+ ],
1651
+ "dtype": "float16",
1652
+ "format": "f32-to-bf16",
1653
+ "nbytes": 52428800,
1654
+ "byteOffset": 0
1655
+ }
1656
+ ],
1657
+ "md5sum": "b7af2eed401eea46d396ac51d3acf274"
1658
+ },
1659
+ {
1660
+ "dataPath": "params_shard_45.bin",
1661
+ "format": "raw-shard",
1662
+ "nbytes": 52428800,
1663
+ "records": [
1664
+ {
1665
+ "name": "transformer.h.12.mlp.fc2.weight",
1666
+ "shape": [
1667
+ 2560,
1668
+ 10240
1669
+ ],
1670
+ "dtype": "float16",
1671
+ "format": "f32-to-bf16",
1672
+ "nbytes": 52428800,
1673
+ "byteOffset": 0
1674
+ }
1675
+ ],
1676
+ "md5sum": "96d1f7f7778192d9542442eee48c735e"
1677
+ },
1678
+ {
1679
+ "dataPath": "params_shard_46.bin",
1680
+ "format": "raw-shard",
1681
+ "nbytes": 39321600,
1682
+ "records": [
1683
+ {
1684
+ "name": "transformer.h.13.mixer.Wqkv.weight",
1685
+ "shape": [
1686
+ 7680,
1687
+ 2560
1688
+ ],
1689
+ "dtype": "float16",
1690
+ "format": "f32-to-bf16",
1691
+ "nbytes": 39321600,
1692
+ "byteOffset": 0
1693
+ }
1694
+ ],
1695
+ "md5sum": "d79d7c20fcdcb92a9b956a0a9a48026d"
1696
+ },
1697
+ {
1698
+ "dataPath": "params_shard_47.bin",
1699
+ "format": "raw-shard",
1700
+ "nbytes": 52428800,
1701
+ "records": [
1702
+ {
1703
+ "name": "transformer.h.13.mlp.fc1.weight",
1704
+ "shape": [
1705
+ 10240,
1706
+ 2560
1707
+ ],
1708
+ "dtype": "float16",
1709
+ "format": "f32-to-bf16",
1710
+ "nbytes": 52428800,
1711
+ "byteOffset": 0
1712
+ }
1713
+ ],
1714
+ "md5sum": "0d391b68314789a69d6c76e3c65c3538"
1715
+ },
1716
+ {
1717
+ "dataPath": "params_shard_48.bin",
1718
+ "format": "raw-shard",
1719
+ "nbytes": 52428800,
1720
+ "records": [
1721
+ {
1722
+ "name": "transformer.h.13.mlp.fc2.weight",
1723
+ "shape": [
1724
+ 2560,
1725
+ 10240
1726
+ ],
1727
+ "dtype": "float16",
1728
+ "format": "f32-to-bf16",
1729
+ "nbytes": 52428800,
1730
+ "byteOffset": 0
1731
+ }
1732
+ ],
1733
+ "md5sum": "0fbde014ac32a62e1d189f38ed4d0afe"
1734
+ },
1735
+ {
1736
+ "dataPath": "params_shard_49.bin",
1737
+ "format": "raw-shard",
1738
+ "nbytes": 39321600,
1739
+ "records": [
1740
+ {
1741
+ "name": "transformer.h.14.mixer.Wqkv.weight",
1742
+ "shape": [
1743
+ 7680,
1744
+ 2560
1745
+ ],
1746
+ "dtype": "float16",
1747
+ "format": "f32-to-bf16",
1748
+ "nbytes": 39321600,
1749
+ "byteOffset": 0
1750
+ }
1751
+ ],
1752
+ "md5sum": "98f570ce4858df7ca19475d32695dce9"
1753
+ },
1754
+ {
1755
+ "dataPath": "params_shard_50.bin",
1756
+ "format": "raw-shard",
1757
+ "nbytes": 26327040,
1758
+ "records": [
1759
+ {
1760
+ "name": "transformer.h.12.mixer.out_proj.weight",
1761
+ "shape": [
1762
+ 2560,
1763
+ 2560
1764
+ ],
1765
+ "dtype": "float16",
1766
+ "format": "f32-to-bf16",
1767
+ "nbytes": 13107200,
1768
+ "byteOffset": 0
1769
+ },
1770
+ {
1771
+ "name": "transformer.h.12.mixer.out_proj.bias",
1772
+ "shape": [
1773
+ 2560
1774
+ ],
1775
+ "dtype": "float16",
1776
+ "format": "f32-to-bf16",
1777
+ "nbytes": 5120,
1778
+ "byteOffset": 13107200
1779
+ },
1780
+ {
1781
+ "name": "transformer.h.12.mlp.fc1.bias",
1782
+ "shape": [
1783
+ 10240
1784
+ ],
1785
+ "dtype": "float16",
1786
+ "format": "f32-to-bf16",
1787
+ "nbytes": 20480,
1788
+ "byteOffset": 13112320
1789
+ },
1790
+ {
1791
+ "name": "transformer.h.12.mlp.fc2.bias",
1792
+ "shape": [
1793
+ 2560
1794
+ ],
1795
+ "dtype": "float16",
1796
+ "format": "f32-to-bf16",
1797
+ "nbytes": 5120,
1798
+ "byteOffset": 13132800
1799
+ },
1800
+ {
1801
+ "name": "transformer.h.12.ln.weight",
1802
+ "shape": [
1803
+ 2560
1804
+ ],
1805
+ "dtype": "float16",
1806
+ "format": "f32-to-bf16",
1807
+ "nbytes": 5120,
1808
+ "byteOffset": 13137920
1809
+ },
1810
+ {
1811
+ "name": "transformer.h.12.ln.bias",
1812
+ "shape": [
1813
+ 2560
1814
+ ],
1815
+ "dtype": "float16",
1816
+ "format": "f32-to-bf16",
1817
+ "nbytes": 5120,
1818
+ "byteOffset": 13143040
1819
+ },
1820
+ {
1821
+ "name": "transformer.h.13.mixer.Wqkv.bias",
1822
+ "shape": [
1823
+ 7680
1824
+ ],
1825
+ "dtype": "float16",
1826
+ "format": "f32-to-bf16",
1827
+ "nbytes": 15360,
1828
+ "byteOffset": 13148160
1829
+ },
1830
+ {
1831
+ "name": "transformer.h.13.mixer.out_proj.weight",
1832
+ "shape": [
1833
+ 2560,
1834
+ 2560
1835
+ ],
1836
+ "dtype": "float16",
1837
+ "format": "f32-to-bf16",
1838
+ "nbytes": 13107200,
1839
+ "byteOffset": 13163520
1840
+ },
1841
+ {
1842
+ "name": "transformer.h.13.mixer.out_proj.bias",
1843
+ "shape": [
1844
+ 2560
1845
+ ],
1846
+ "dtype": "float16",
1847
+ "format": "f32-to-bf16",
1848
+ "nbytes": 5120,
1849
+ "byteOffset": 26270720
1850
+ },
1851
+ {
1852
+ "name": "transformer.h.13.mlp.fc1.bias",
1853
+ "shape": [
1854
+ 10240
1855
+ ],
1856
+ "dtype": "float16",
1857
+ "format": "f32-to-bf16",
1858
+ "nbytes": 20480,
1859
+ "byteOffset": 26275840
1860
+ },
1861
+ {
1862
+ "name": "transformer.h.13.mlp.fc2.bias",
1863
+ "shape": [
1864
+ 2560
1865
+ ],
1866
+ "dtype": "float16",
1867
+ "format": "f32-to-bf16",
1868
+ "nbytes": 5120,
1869
+ "byteOffset": 26296320
1870
+ },
1871
+ {
1872
+ "name": "transformer.h.13.ln.weight",
1873
+ "shape": [
1874
+ 2560
1875
+ ],
1876
+ "dtype": "float16",
1877
+ "format": "f32-to-bf16",
1878
+ "nbytes": 5120,
1879
+ "byteOffset": 26301440
1880
+ },
1881
+ {
1882
+ "name": "transformer.h.13.ln.bias",
1883
+ "shape": [
1884
+ 2560
1885
+ ],
1886
+ "dtype": "float16",
1887
+ "format": "f32-to-bf16",
1888
+ "nbytes": 5120,
1889
+ "byteOffset": 26306560
1890
+ },
1891
+ {
1892
+ "name": "transformer.h.14.mixer.Wqkv.bias",
1893
+ "shape": [
1894
+ 7680
1895
+ ],
1896
+ "dtype": "float16",
1897
+ "format": "f32-to-bf16",
1898
+ "nbytes": 15360,
1899
+ "byteOffset": 26311680
1900
+ }
1901
+ ],
1902
+ "md5sum": "92d2c8113e788af21045a243cdb7325c"
1903
+ },
1904
+ {
1905
+ "dataPath": "params_shard_51.bin",
1906
+ "format": "raw-shard",
1907
+ "nbytes": 52428800,
1908
+ "records": [
1909
+ {
1910
+ "name": "transformer.h.14.mlp.fc1.weight",
1911
+ "shape": [
1912
+ 10240,
1913
+ 2560
1914
+ ],
1915
+ "dtype": "float16",
1916
+ "format": "f32-to-bf16",
1917
+ "nbytes": 52428800,
1918
+ "byteOffset": 0
1919
+ }
1920
+ ],
1921
+ "md5sum": "75b97abbacf685e51794e7256d610e3c"
1922
+ },
1923
+ {
1924
+ "dataPath": "params_shard_52.bin",
1925
+ "format": "raw-shard",
1926
+ "nbytes": 52428800,
1927
+ "records": [
1928
+ {
1929
+ "name": "transformer.h.14.mlp.fc2.weight",
1930
+ "shape": [
1931
+ 2560,
1932
+ 10240
1933
+ ],
1934
+ "dtype": "float16",
1935
+ "format": "f32-to-bf16",
1936
+ "nbytes": 52428800,
1937
+ "byteOffset": 0
1938
+ }
1939
+ ],
1940
+ "md5sum": "1d3e0557ab7aa60cbd6acd33c5070327"
1941
+ },
1942
+ {
1943
+ "dataPath": "params_shard_53.bin",
1944
+ "format": "raw-shard",
1945
+ "nbytes": 39321600,
1946
+ "records": [
1947
+ {
1948
+ "name": "transformer.h.15.mixer.Wqkv.weight",
1949
+ "shape": [
1950
+ 7680,
1951
+ 2560
1952
+ ],
1953
+ "dtype": "float16",
1954
+ "format": "f32-to-bf16",
1955
+ "nbytes": 39321600,
1956
+ "byteOffset": 0
1957
+ }
1958
+ ],
1959
+ "md5sum": "5e3feeee08b04e7af3a715c10b291db0"
1960
+ },
1961
+ {
1962
+ "dataPath": "params_shard_54.bin",
1963
+ "format": "raw-shard",
1964
+ "nbytes": 52428800,
1965
+ "records": [
1966
+ {
1967
+ "name": "transformer.h.15.mlp.fc1.weight",
1968
+ "shape": [
1969
+ 10240,
1970
+ 2560
1971
+ ],
1972
+ "dtype": "float16",
1973
+ "format": "f32-to-bf16",
1974
+ "nbytes": 52428800,
1975
+ "byteOffset": 0
1976
+ }
1977
+ ],
1978
+ "md5sum": "16783d3af4c068f88250e69b0588e17e"
1979
+ },
1980
+ {
1981
+ "dataPath": "params_shard_55.bin",
1982
+ "format": "raw-shard",
1983
+ "nbytes": 52428800,
1984
+ "records": [
1985
+ {
1986
+ "name": "transformer.h.15.mlp.fc2.weight",
1987
+ "shape": [
1988
+ 2560,
1989
+ 10240
1990
+ ],
1991
+ "dtype": "float16",
1992
+ "format": "f32-to-bf16",
1993
+ "nbytes": 52428800,
1994
+ "byteOffset": 0
1995
+ }
1996
+ ],
1997
+ "md5sum": "9333c61853fb4d63c389aba68ae5ad39"
1998
+ },
1999
+ {
2000
+ "dataPath": "params_shard_56.bin",
2001
+ "format": "raw-shard",
2002
+ "nbytes": 39321600,
2003
+ "records": [
2004
+ {
2005
+ "name": "transformer.h.16.mixer.Wqkv.weight",
2006
+ "shape": [
2007
+ 7680,
2008
+ 2560
2009
+ ],
2010
+ "dtype": "float16",
2011
+ "format": "f32-to-bf16",
2012
+ "nbytes": 39321600,
2013
+ "byteOffset": 0
2014
+ }
2015
+ ],
2016
+ "md5sum": "84d79f00c414566b46f1e620b31dc30b"
2017
+ },
2018
+ {
2019
+ "dataPath": "params_shard_57.bin",
2020
+ "format": "raw-shard",
2021
+ "nbytes": 26327040,
2022
+ "records": [
2023
+ {
2024
+ "name": "transformer.h.14.mixer.out_proj.weight",
2025
+ "shape": [
2026
+ 2560,
2027
+ 2560
2028
+ ],
2029
+ "dtype": "float16",
2030
+ "format": "f32-to-bf16",
2031
+ "nbytes": 13107200,
2032
+ "byteOffset": 0
2033
+ },
2034
+ {
2035
+ "name": "transformer.h.14.mixer.out_proj.bias",
2036
+ "shape": [
2037
+ 2560
2038
+ ],
2039
+ "dtype": "float16",
2040
+ "format": "f32-to-bf16",
2041
+ "nbytes": 5120,
2042
+ "byteOffset": 13107200
2043
+ },
2044
+ {
2045
+ "name": "transformer.h.14.mlp.fc1.bias",
2046
+ "shape": [
2047
+ 10240
2048
+ ],
2049
+ "dtype": "float16",
2050
+ "format": "f32-to-bf16",
2051
+ "nbytes": 20480,
2052
+ "byteOffset": 13112320
2053
+ },
2054
+ {
2055
+ "name": "transformer.h.14.mlp.fc2.bias",
2056
+ "shape": [
2057
+ 2560
2058
+ ],
2059
+ "dtype": "float16",
2060
+ "format": "f32-to-bf16",
2061
+ "nbytes": 5120,
2062
+ "byteOffset": 13132800
2063
+ },
2064
+ {
2065
+ "name": "transformer.h.14.ln.weight",
2066
+ "shape": [
2067
+ 2560
2068
+ ],
2069
+ "dtype": "float16",
2070
+ "format": "f32-to-bf16",
2071
+ "nbytes": 5120,
2072
+ "byteOffset": 13137920
2073
+ },
2074
+ {
2075
+ "name": "transformer.h.14.ln.bias",
2076
+ "shape": [
2077
+ 2560
2078
+ ],
2079
+ "dtype": "float16",
2080
+ "format": "f32-to-bf16",
2081
+ "nbytes": 5120,
2082
+ "byteOffset": 13143040
2083
+ },
2084
+ {
2085
+ "name": "transformer.h.15.mixer.Wqkv.bias",
2086
+ "shape": [
2087
+ 7680
2088
+ ],
2089
+ "dtype": "float16",
2090
+ "format": "f32-to-bf16",
2091
+ "nbytes": 15360,
2092
+ "byteOffset": 13148160
2093
+ },
2094
+ {
2095
+ "name": "transformer.h.15.mixer.out_proj.weight",
2096
+ "shape": [
2097
+ 2560,
2098
+ 2560
2099
+ ],
2100
+ "dtype": "float16",
2101
+ "format": "f32-to-bf16",
2102
+ "nbytes": 13107200,
2103
+ "byteOffset": 13163520
2104
+ },
2105
+ {
2106
+ "name": "transformer.h.15.mixer.out_proj.bias",
2107
+ "shape": [
2108
+ 2560
2109
+ ],
2110
+ "dtype": "float16",
2111
+ "format": "f32-to-bf16",
2112
+ "nbytes": 5120,
2113
+ "byteOffset": 26270720
2114
+ },
2115
+ {
2116
+ "name": "transformer.h.15.mlp.fc1.bias",
2117
+ "shape": [
2118
+ 10240
2119
+ ],
2120
+ "dtype": "float16",
2121
+ "format": "f32-to-bf16",
2122
+ "nbytes": 20480,
2123
+ "byteOffset": 26275840
2124
+ },
2125
+ {
2126
+ "name": "transformer.h.15.mlp.fc2.bias",
2127
+ "shape": [
2128
+ 2560
2129
+ ],
2130
+ "dtype": "float16",
2131
+ "format": "f32-to-bf16",
2132
+ "nbytes": 5120,
2133
+ "byteOffset": 26296320
2134
+ },
2135
+ {
2136
+ "name": "transformer.h.15.ln.weight",
2137
+ "shape": [
2138
+ 2560
2139
+ ],
2140
+ "dtype": "float16",
2141
+ "format": "f32-to-bf16",
2142
+ "nbytes": 5120,
2143
+ "byteOffset": 26301440
2144
+ },
2145
+ {
2146
+ "name": "transformer.h.15.ln.bias",
2147
+ "shape": [
2148
+ 2560
2149
+ ],
2150
+ "dtype": "float16",
2151
+ "format": "f32-to-bf16",
2152
+ "nbytes": 5120,
2153
+ "byteOffset": 26306560
2154
+ },
2155
+ {
2156
+ "name": "transformer.h.16.mixer.Wqkv.bias",
2157
+ "shape": [
2158
+ 7680
2159
+ ],
2160
+ "dtype": "float16",
2161
+ "format": "f32-to-bf16",
2162
+ "nbytes": 15360,
2163
+ "byteOffset": 26311680
2164
+ }
2165
+ ],
2166
+ "md5sum": "6b2bb3e094b2ad81f48990b9b0d8e830"
2167
+ },
2168
+ {
2169
+ "dataPath": "params_shard_58.bin",
2170
+ "format": "raw-shard",
2171
+ "nbytes": 52428800,
2172
+ "records": [
2173
+ {
2174
+ "name": "transformer.h.16.mlp.fc1.weight",
2175
+ "shape": [
2176
+ 10240,
2177
+ 2560
2178
+ ],
2179
+ "dtype": "float16",
2180
+ "format": "f32-to-bf16",
2181
+ "nbytes": 52428800,
2182
+ "byteOffset": 0
2183
+ }
2184
+ ],
2185
+ "md5sum": "e1248c1b0c1ebd29dcbd6eb7f4ec338a"
2186
+ },
2187
+ {
2188
+ "dataPath": "params_shard_59.bin",
2189
+ "format": "raw-shard",
2190
+ "nbytes": 52428800,
2191
+ "records": [
2192
+ {
2193
+ "name": "transformer.h.16.mlp.fc2.weight",
2194
+ "shape": [
2195
+ 2560,
2196
+ 10240
2197
+ ],
2198
+ "dtype": "float16",
2199
+ "format": "f32-to-bf16",
2200
+ "nbytes": 52428800,
2201
+ "byteOffset": 0
2202
+ }
2203
+ ],
2204
+ "md5sum": "040407b43bf3ff9aa561fbe29148c18b"
2205
+ },
2206
+ {
2207
+ "dataPath": "params_shard_60.bin",
2208
+ "format": "raw-shard",
2209
+ "nbytes": 39321600,
2210
+ "records": [
2211
+ {
2212
+ "name": "transformer.h.17.mixer.Wqkv.weight",
2213
+ "shape": [
2214
+ 7680,
2215
+ 2560
2216
+ ],
2217
+ "dtype": "float16",
2218
+ "format": "f32-to-bf16",
2219
+ "nbytes": 39321600,
2220
+ "byteOffset": 0
2221
+ }
2222
+ ],
2223
+ "md5sum": "720112a08108adab7ab733f50cbffeb3"
2224
+ },
2225
+ {
2226
+ "dataPath": "params_shard_61.bin",
2227
+ "format": "raw-shard",
2228
+ "nbytes": 52428800,
2229
+ "records": [
2230
+ {
2231
+ "name": "transformer.h.17.mlp.fc1.weight",
2232
+ "shape": [
2233
+ 10240,
2234
+ 2560
2235
+ ],
2236
+ "dtype": "float16",
2237
+ "format": "f32-to-bf16",
2238
+ "nbytes": 52428800,
2239
+ "byteOffset": 0
2240
+ }
2241
+ ],
2242
+ "md5sum": "3ae5a6ae85945525cb84d302a8026236"
2243
+ },
2244
+ {
2245
+ "dataPath": "params_shard_62.bin",
2246
+ "format": "raw-shard",
2247
+ "nbytes": 52428800,
2248
+ "records": [
2249
+ {
2250
+ "name": "transformer.h.17.mlp.fc2.weight",
2251
+ "shape": [
2252
+ 2560,
2253
+ 10240
2254
+ ],
2255
+ "dtype": "float16",
2256
+ "format": "f32-to-bf16",
2257
+ "nbytes": 52428800,
2258
+ "byteOffset": 0
2259
+ }
2260
+ ],
2261
+ "md5sum": "effb14500be7ce716373c4ec3239acf1"
2262
+ },
2263
+ {
2264
+ "dataPath": "params_shard_63.bin",
2265
+ "format": "raw-shard",
2266
+ "nbytes": 39321600,
2267
+ "records": [
2268
+ {
2269
+ "name": "transformer.h.18.mixer.Wqkv.weight",
2270
+ "shape": [
2271
+ 7680,
2272
+ 2560
2273
+ ],
2274
+ "dtype": "float16",
2275
+ "format": "f32-to-bf16",
2276
+ "nbytes": 39321600,
2277
+ "byteOffset": 0
2278
+ }
2279
+ ],
2280
+ "md5sum": "792b23c0059afdc26cf6ffdd82b84d00"
2281
+ },
2282
+ {
2283
+ "dataPath": "params_shard_64.bin",
2284
+ "format": "raw-shard",
2285
+ "nbytes": 26327040,
2286
+ "records": [
2287
+ {
2288
+ "name": "transformer.h.16.mixer.out_proj.weight",
2289
+ "shape": [
2290
+ 2560,
2291
+ 2560
2292
+ ],
2293
+ "dtype": "float16",
2294
+ "format": "f32-to-bf16",
2295
+ "nbytes": 13107200,
2296
+ "byteOffset": 0
2297
+ },
2298
+ {
2299
+ "name": "transformer.h.16.mixer.out_proj.bias",
2300
+ "shape": [
2301
+ 2560
2302
+ ],
2303
+ "dtype": "float16",
2304
+ "format": "f32-to-bf16",
2305
+ "nbytes": 5120,
2306
+ "byteOffset": 13107200
2307
+ },
2308
+ {
2309
+ "name": "transformer.h.16.mlp.fc1.bias",
2310
+ "shape": [
2311
+ 10240
2312
+ ],
2313
+ "dtype": "float16",
2314
+ "format": "f32-to-bf16",
2315
+ "nbytes": 20480,
2316
+ "byteOffset": 13112320
2317
+ },
2318
+ {
2319
+ "name": "transformer.h.16.mlp.fc2.bias",
2320
+ "shape": [
2321
+ 2560
2322
+ ],
2323
+ "dtype": "float16",
2324
+ "format": "f32-to-bf16",
2325
+ "nbytes": 5120,
2326
+ "byteOffset": 13132800
2327
+ },
2328
+ {
2329
+ "name": "transformer.h.16.ln.weight",
2330
+ "shape": [
2331
+ 2560
2332
+ ],
2333
+ "dtype": "float16",
2334
+ "format": "f32-to-bf16",
2335
+ "nbytes": 5120,
2336
+ "byteOffset": 13137920
2337
+ },
2338
+ {
2339
+ "name": "transformer.h.16.ln.bias",
2340
+ "shape": [
2341
+ 2560
2342
+ ],
2343
+ "dtype": "float16",
2344
+ "format": "f32-to-bf16",
2345
+ "nbytes": 5120,
2346
+ "byteOffset": 13143040
2347
+ },
2348
+ {
2349
+ "name": "transformer.h.17.mixer.Wqkv.bias",
2350
+ "shape": [
2351
+ 7680
2352
+ ],
2353
+ "dtype": "float16",
2354
+ "format": "f32-to-bf16",
2355
+ "nbytes": 15360,
2356
+ "byteOffset": 13148160
2357
+ },
2358
+ {
2359
+ "name": "transformer.h.17.mixer.out_proj.weight",
2360
+ "shape": [
2361
+ 2560,
2362
+ 2560
2363
+ ],
2364
+ "dtype": "float16",
2365
+ "format": "f32-to-bf16",
2366
+ "nbytes": 13107200,
2367
+ "byteOffset": 13163520
2368
+ },
2369
+ {
2370
+ "name": "transformer.h.17.mixer.out_proj.bias",
2371
+ "shape": [
2372
+ 2560
2373
+ ],
2374
+ "dtype": "float16",
2375
+ "format": "f32-to-bf16",
2376
+ "nbytes": 5120,
2377
+ "byteOffset": 26270720
2378
+ },
2379
+ {
2380
+ "name": "transformer.h.17.mlp.fc1.bias",
2381
+ "shape": [
2382
+ 10240
2383
+ ],
2384
+ "dtype": "float16",
2385
+ "format": "f32-to-bf16",
2386
+ "nbytes": 20480,
2387
+ "byteOffset": 26275840
2388
+ },
2389
+ {
2390
+ "name": "transformer.h.17.mlp.fc2.bias",
2391
+ "shape": [
2392
+ 2560
2393
+ ],
2394
+ "dtype": "float16",
2395
+ "format": "f32-to-bf16",
2396
+ "nbytes": 5120,
2397
+ "byteOffset": 26296320
2398
+ },
2399
+ {
2400
+ "name": "transformer.h.17.ln.weight",
2401
+ "shape": [
2402
+ 2560
2403
+ ],
2404
+ "dtype": "float16",
2405
+ "format": "f32-to-bf16",
2406
+ "nbytes": 5120,
2407
+ "byteOffset": 26301440
2408
+ },
2409
+ {
2410
+ "name": "transformer.h.17.ln.bias",
2411
+ "shape": [
2412
+ 2560
2413
+ ],
2414
+ "dtype": "float16",
2415
+ "format": "f32-to-bf16",
2416
+ "nbytes": 5120,
2417
+ "byteOffset": 26306560
2418
+ },
2419
+ {
2420
+ "name": "transformer.h.18.mixer.Wqkv.bias",
2421
+ "shape": [
2422
+ 7680
2423
+ ],
2424
+ "dtype": "float16",
2425
+ "format": "f32-to-bf16",
2426
+ "nbytes": 15360,
2427
+ "byteOffset": 26311680
2428
+ }
2429
+ ],
2430
+ "md5sum": "12e1f05efa6bb94064fb5910224468ea"
2431
+ },
2432
+ {
2433
+ "dataPath": "params_shard_65.bin",
2434
+ "format": "raw-shard",
2435
+ "nbytes": 52428800,
2436
+ "records": [
2437
+ {
2438
+ "name": "transformer.h.18.mlp.fc1.weight",
2439
+ "shape": [
2440
+ 10240,
2441
+ 2560
2442
+ ],
2443
+ "dtype": "float16",
2444
+ "format": "f32-to-bf16",
2445
+ "nbytes": 52428800,
2446
+ "byteOffset": 0
2447
+ }
2448
+ ],
2449
+ "md5sum": "8d7270666cc389cd116199ef3ad130ed"
2450
+ },
2451
+ {
2452
+ "dataPath": "params_shard_66.bin",
2453
+ "format": "raw-shard",
2454
+ "nbytes": 52428800,
2455
+ "records": [
2456
+ {
2457
+ "name": "transformer.h.18.mlp.fc2.weight",
2458
+ "shape": [
2459
+ 2560,
2460
+ 10240
2461
+ ],
2462
+ "dtype": "float16",
2463
+ "format": "f32-to-bf16",
2464
+ "nbytes": 52428800,
2465
+ "byteOffset": 0
2466
+ }
2467
+ ],
2468
+ "md5sum": "9597fde475154730ecb8429a52ee37b6"
2469
+ },
2470
+ {
2471
+ "dataPath": "params_shard_67.bin",
2472
+ "format": "raw-shard",
2473
+ "nbytes": 39321600,
2474
+ "records": [
2475
+ {
2476
+ "name": "transformer.h.19.mixer.Wqkv.weight",
2477
+ "shape": [
2478
+ 7680,
2479
+ 2560
2480
+ ],
2481
+ "dtype": "float16",
2482
+ "format": "f32-to-bf16",
2483
+ "nbytes": 39321600,
2484
+ "byteOffset": 0
2485
+ }
2486
+ ],
2487
+ "md5sum": "6d270b3ed05c6e4948e4099af7cfbd09"
2488
+ },
2489
+ {
2490
+ "dataPath": "params_shard_68.bin",
2491
+ "format": "raw-shard",
2492
+ "nbytes": 52428800,
2493
+ "records": [
2494
+ {
2495
+ "name": "transformer.h.19.mlp.fc1.weight",
2496
+ "shape": [
2497
+ 10240,
2498
+ 2560
2499
+ ],
2500
+ "dtype": "float16",
2501
+ "format": "f32-to-bf16",
2502
+ "nbytes": 52428800,
2503
+ "byteOffset": 0
2504
+ }
2505
+ ],
2506
+ "md5sum": "f2eb64215c23166b166ca5b7d9bc7223"
2507
+ },
+ {
+ "dataPath": "params_shard_69.bin",
+ "format": "raw-shard",
+ "nbytes": 52428800,
+ "records": [
+ {
+ "name": "transformer.h.19.mlp.fc2.weight",
+ "shape": [
+ 2560,
+ 10240
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 52428800,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "1f2d6952a50b443abe22e14084353243"
+ },
+ {
+ "dataPath": "params_shard_70.bin",
+ "format": "raw-shard",
+ "nbytes": 262144000,
+ "records": [
+ {
+ "name": "lm_head.linear.weight",
+ "shape": [
+ 51200,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 262144000,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "8feed96b8b5f10f70c5a66f554136517"
+ },
+ {
+ "dataPath": "params_shard_71.bin",
+ "format": "raw-shard",
+ "nbytes": 26424320,
+ "records": [
+ {
+ "name": "transformer.h.18.mixer.out_proj.weight",
+ "shape": [
+ 2560,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 13107200,
+ "byteOffset": 0
+ },
+ {
+ "name": "transformer.h.18.mixer.out_proj.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13107200
+ },
+ {
+ "name": "transformer.h.18.mlp.fc1.bias",
+ "shape": [
+ 10240
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 20480,
+ "byteOffset": 13112320
+ },
+ {
+ "name": "transformer.h.18.mlp.fc2.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13132800
+ },
+ {
+ "name": "transformer.h.18.ln.weight",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13137920
+ },
+ {
+ "name": "transformer.h.18.ln.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13143040
+ },
+ {
+ "name": "transformer.h.19.mixer.Wqkv.bias",
+ "shape": [
+ 7680
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 15360,
+ "byteOffset": 13148160
+ },
+ {
+ "name": "transformer.h.19.mixer.out_proj.weight",
+ "shape": [
+ 2560,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 13107200,
+ "byteOffset": 13163520
+ },
+ {
+ "name": "transformer.h.19.mixer.out_proj.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26270720
+ },
+ {
+ "name": "transformer.h.19.mlp.fc1.bias",
+ "shape": [
+ 10240
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 20480,
+ "byteOffset": 26275840
+ },
+ {
+ "name": "transformer.h.19.mlp.fc2.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26296320
+ },
+ {
+ "name": "transformer.h.19.ln.weight",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26301440
+ },
+ {
+ "name": "transformer.h.19.ln.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26306560
+ },
+ {
+ "name": "lm_head.ln.weight",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26311680
+ },
+ {
+ "name": "lm_head.ln.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26316800
+ },
+ {
+ "name": "lm_head.linear.bias",
+ "shape": [
+ 51200
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 102400,
+ "byteOffset": 26321920
+ }
+ ],
+ "md5sum": "22ef24edb6636c86c84357b430fdc866"
+ }
+ ]
+ }
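
The records above appear to follow a simple layout: each record's `nbytes` equals the product of its `shape` times two bytes per `float16` element, and within a shard each `byteOffset` starts where the previous record ended. A minimal Python sketch of that check follows. The helper name `check_records` and the small dtype table are illustrative assumptions, not part of any MLC tooling, and the contiguity rule is inferred from the values in this file rather than from a specification. The sample data is the first three records of `params_shard_71.bin`.

```python
from math import prod

# Assumption: bytes per element for the dtypes this sketch handles.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4}

def check_records(records):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    expected_offset = 0
    for rec in records:
        expected_nbytes = prod(rec["shape"]) * BYTES_PER_ELEMENT[rec["dtype"]]
        if rec["nbytes"] != expected_nbytes:
            problems.append(
                f'{rec["name"]}: nbytes {rec["nbytes"]} != {expected_nbytes}'
            )
        if rec["byteOffset"] != expected_offset:
            problems.append(
                f'{rec["name"]}: byteOffset {rec["byteOffset"]} != {expected_offset}'
            )
        # Assumed rule: the next record starts right after this one.
        expected_offset = rec["byteOffset"] + rec["nbytes"]
    return problems

# First three records of params_shard_71.bin, copied from the listing above.
SAMPLE_RECORDS = [
    {"name": "transformer.h.18.mixer.out_proj.weight", "shape": [2560, 2560],
     "dtype": "float16", "nbytes": 13107200, "byteOffset": 0},
    {"name": "transformer.h.18.mixer.out_proj.bias", "shape": [2560],
     "dtype": "float16", "nbytes": 5120, "byteOffset": 13107200},
    {"name": "transformer.h.18.mlp.fc1.bias", "shape": [10240],
     "dtype": "float16", "nbytes": 20480, "byteOffset": 13112320},
]

print(check_records(SAMPLE_RECORDS))
```

Running the same function over every shard's `records` list in the full JSON file would be one way to spot a truncated or hand-edited cache before loading it.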
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b6bd53aa3313123530993b40f2b3734b5d98d771302ddd90739dbcc5c532198
+ size 262144000
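
Each `params_shard_*.bin` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, the object id (hash algorithm and digest), and the size in bytes. A minimal Python sketch of reading one is shown here. The function name `parse_lfs_pointer` and the returned dictionary layout are hypothetical choices for illustration, and the leading `+` handling is only there to tolerate text pasted from a diff view.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        # Tolerate diff '+' markers and surrounding whitespace.
        line = line.lstrip("+ ").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

# Pointer for params_shard_0.bin, copied from the diff above.
POINTER = """\
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b6bd53aa3313123530993b40f2b3734b5d98d771302ddd90739dbcc5c532198
+ size 262144000
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The parsed `size` can be compared against the shard's `nbytes` in the JSON cache as a quick sanity check that the pointer and the cache describe the same file.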
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8aee79a33adb578769f7d8113b65a259f2704b5980d9708963b7a8a497554a89
+ size 39321600
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b35350dc160f3bc3d3bb91c5b49aa7960a47473bf5c9d2d506ac52dfc1344a3
+ size 52428800
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b113006cd843e5f6a460af845cf5206ccf6577a5dd5971e115f129ba3c8e44f0
+ size 39321600
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f4d747e050ca43696a0150e6cffdfb96c741762d76826a3a05fe533dd982bd3
+ size 52428800
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d04ac2e7efcd017b5f103e87391c3de561ba1e0671aa5bb610e610f3a798833
+ size 52428800
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e74b523bfe6e420bbb4b3aa9990dbdb9cbc6bf92c54e386c92b33ea9b3315cde
+ size 39321600
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:861a09dd9f62125e0407d7b38d2f170171713c743f9ca81b816f8c9a31eda2eb
+ size 26327040
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1aef805d4c1c28808219317436b54c65689cc104865af161544d5035c6bcfbb4
+ size 52428800
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:889310a30b409068462d0c4b20aa8c3019487da5d14557306366d7d5f72b24b7
+ size 52428800
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:376ed5ee1d645ab5eb580703a19168d22bd163478aff6cc454f2de9e55ca716f
+ size 39321600
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d04913de5c177e1e128fb12acb051e29a5c7742380d43cede02e8e65e86573f6
+ size 52428800
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f298527615d2e62c69c52e14ddbe130ce06518d6c42075e80a54ee978e6de94
+ size 52428800
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6da35f0f15518211d339577de13e2d0900703a8a1fe0dca3d0fba7c94ac74f0d
+ size 52428800
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a7826893a2fd641d243376318ece0406e6c4e53643df6abd6684f3392a8bce4
+ size 39321600
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4fc47db6e8e7b3a9f3ad9466a2b70bfc76b0aa139a61d398d1f82cc0318af4b
+ size 26327040
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a19c1df02d39c270a13f02861afb5c923fcd00682e53e746f5c96f9e306d412
+ size 52428800
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e535ecfa48e972396e1073dcf74568ba84aaf1a1385daa45624161b8d4fcdbf
+ size 52428800
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bc193b98b6f1085c6299a7c1cde8e774d89e14654cbbcfc0513633163732775
+ size 39321600
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88a8032246a48fd6519254c41e0d8d3eba1f9fc8e946b37e5ee723bd3358cabd
+ size 52428800
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:512a4013dc88cc2c0d873eb232a6529523e7c93213d7f0bd7c3084af26bdaca6
+ size 52428800
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7c57c6005fe22a09cf8db99628bb4fec070c2a1a5bc4e234d7cf910b3ab1790
+ size 39321600
params_shard_29.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2308f52b6541d241a7709b45c20ebdd386b2f31b61d745e184b337fcc58e36f3
+ size 26327040
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d1eb83e63bcd5ce03dd062e482612e5a0bca29d5b0f1fbce471c33bdf99ae59
+ size 52428800
params_shard_30.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b9b442af815c19e315b640bc4136599438d4676b271a0073e24d3f9ee6cca87
+ size 52428800
params_shard_31.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:986edd16937ed42939b7c70dbeb8ee4ca11725b4e6b477188e1ba3b9f3e6785f
+ size 52428800
params_shard_32.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:556c9a63e2c04ef4d41b43c0c5377c3764b4d6554bc906c09792a7575b80eeca
+ size 39321600
params_shard_33.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35b623a7d484f15d3a037b15c632b5bdf9509092c5d5b96d2eac57899ec4a8b3
+ size 52428800
params_shard_34.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dbb237c029c62d057a3dca9e85e51cb2e898dd85f9436a7e99c249ea9847c0d
+ size 52428800
params_shard_35.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:077907058b6251aa60054ad05a851a512bbe2f8054cc481558e6c6811b43f6d3
+ size 39321600
params_shard_36.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fee9cafabd5ec2e78dcbef78c199ae785a24270b5e4c7af7d656f5826f4107b8
+ size 26327040
params_shard_37.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:971083ae28ec408f215d8a1ac1c0e86d8a83a1904aba72e9e25a40438fe521dd
+ size 52428800
params_shard_38.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:298270a764011b2a0d29e27a7e06a54211353ab8f96cca3c0b7a9599f1083f1b
+ size 52428800
params_shard_39.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b7bbd4cc90e1e5d91730e3e4b54846193318afd8f95c5fe633ba7d57990905ad
+ size 39321600
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b820576d3c00b555a06ae1a9c0c0ef18d093417ac8f83b2003628c1548759f5
+ size 39321600
params_shard_40.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f567261a576bf1959baf20ea3d2e2944ba07236422409e3343c26ce40c278ea
+ size 52428800
params_shard_41.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13b6aaa04e9192ba841bf44a216f8e9d6005cdfd39b17b025597976143e24212
+ size 52428800
params_shard_42.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87eec956c9034b0f8c85709557130c6f4856ae45af8c80b8abf10ed7a1cff5f7
+ size 39321600
params_shard_43.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f944c07eb976b0bbb566b07ba8352b523fe572510cf533e988828367a83f2f4e
+ size 26327040
params_shard_44.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c0e20ff515e946e94764751566d7ffb12332b131561cef0e50fcb8a6e8c8fa2
+ size 52428800
params_shard_45.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5050808416639da8bd38ccba7b6ddb8bde30f8b2b04e834dc51fc2af5158ef8b
+ size 52428800
params_shard_46.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0005c63788aa2c983170e1ecd1725c7a0cfa24d79963ad1107c13d95582e2f65
+ size 39321600
params_shard_47.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9e654345527761d9d5c72c32848747c1ce250396d3be48f4383dadf6bb85510
+ size 52428800
params_shard_48.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:554311157434c8dbaf06cc0d61a2d1487c73d2fffea4d219313c2ce5ca8adc2c
+ size 52428800
params_shard_49.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02b37c605d8984f34c6f131acc8df436f6277c32f64c9d558cb39d042e5aafca
+ size 39321600
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed10da2b4307ec0b36386b9705a1b2f604a7000dcfab300d489c97e5043353db
+ size 52428800