renqiux0302 committed

Commit 3939db2 · verified · 1 parent: 6833f81

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,107 +1,21 @@
- # ChartX & ChartVLM
-
- <div align=center>
- <img src="https://github.com/UniModal4Reasoning/ChartVLM/blob/main/assets/ChartX-logo.png?raw=true" height="40%">
- </div>
-
- ChartX presents two primary contributions.
-
- - (1) To comprehensively and rigorously benchmark the abilities of off-the-shelf MLLMs in the chart domain, we construct ChartX, an evaluation set of 48K high-quality chart data spanning multiple modalities (image, code, CSV, text description), multiple tasks (perception, chart information extraction, chart QA, chart description, chart summarization, chart redrawing, etc.), and multiple disciplines (22 topics), and we evaluate the performance of mainstream MLLMs on it (including chart structural extraction, chart-type recognition, chart-to-CSV conversion, chart QA, chart insight, and chart redrawing).
- - (2) We develop ChartVLM, a model customized for the chart domain, offering a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns, such as reasoning over chart or geometric images. To improve the model's **interpretability** on cognition tasks such as chart QA, ChartVLM grounds complex reasoning in intermediate chart representations obtained by structural extraction (CSV data, chart title, chart type, etc.), and uses an instruction adapter to dynamically select the task to execute, and the corresponding model module, according to the user's instruction.
-
- <div align=center>
- <img src="https://github.com/UniModal4Reasoning/ChartVLM/blob/main/assets/motivation.png?raw=true" height="85%">
- </div>
-
- ------------------------
-
- ## Overview of the Evaluation Set
-
- We collected 48K high-quality multi-modal chart data covering **22 topics**, **18 chart types**, and **7 tasks**. Each chart sample within this dataset includes four modalities: image, CSV data, Python plotting code, and text description (tasks and chart QA).
-
- ## ChartX Download
-
- <details>
- <summary>Data Download</summary>
-
- Please download the official [ChartX Evaluation Set](https://drive.google.com/file/d/1d6zyH3kIwgepTqR0fc67xzyUtblrvOIX/view?usp=sharing) and organize the downloaded files as follows:
- ```
- ChartX
- ├── 3D-Bar
- │   ├── code
- │   ├── csv
- │   ├── png
- │   └── txt
- ├── area_chart
- │   ├── code
- │   ├── csv
- │   ├── png
- │   └── txt
- ....
- ├── rose
- │   ├── code
- │   ├── csv
- │   ├── png
- │   └── txt
- ```
- </details>
-
- <details>
- <summary>Visualization of Data Distribution</summary>
-
- <div align=center>
- <img src="https://raw.githubusercontent.com/UniModal4Reasoning/ChartVLM/main/assets/tsne.png" height="85%">
- </div>
-
- </details>
-
- ------------------------
-
- <div align="center">
- <h1>ChartVLM</h1>
- </div>
-
- ## ChartVLM Overview
-
- - **(1)** To enhance the interpretability of the chart model on cognition tasks (e.g., answering questions about a chart image), ChartVLM first performs the base perception task (structural extraction from the given chart image to predicted CSV data) and then finishes the downstream cognition tasks (e.g., chart redrawing, description, summarization, and QA) based on the extracted structural data.
- - **(2)** To choose the task that users expect to perform according to the prompts they provide, an instruction adapter is designed; it covers a variety of user instructions, as illustrated in the figure below, and dynamically routes the request to the corresponding model module.
-
- <div align=center>
- <img src="https://github.com/UniModal4Reasoning/ChartVLM/blob/main/assets/chartvlm.png?raw=true" height="85%">
- </div>
-
- ## Quickstart
-
- ### Dependencies
-
- ```bash
- pip install torch==2.1.0 transformers==4.31.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.6.1 triton==2.0.0
- ```
-
- ## Code Example
-
- ```python
- from tools.ChartVLM import infer_ChartVLM
-
- if __name__ == '__main__':
-     model = '${PATH_TO_PRETRAINED_MODEL}/ChartVLM/base/'  # path to the pretrained ChartVLM weights
-     image = './base_decoder/train/data/test.png'
-     text = 'who has the largest value?'
-
-     output = infer_ChartVLM(image, text, model)
-
-     print(output)
- ```
 
+ ---
+ library_name: peft
+ ---
+ ## Training procedure
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: True
+ - load_in_4bit: False
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: fp4
+ - bnb_4bit_use_double_quant: False
+ - bnb_4bit_compute_dtype: float32
+
+ ### Framework versions
+
+ - PEFT 0.4.0
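For reference, a minimal sketch of how an adapter trained under this config could be loaded for inference with peft 0.4.0. This is not part of the commit; the local paths are placeholders matching the repo layout (`base/` holds the Vicuna weights, the repo root holds `adapter_config.json` / `adapter_model.bin`):

```python
# Minimal loading sketch (assumes transformers>=4.31, peft==0.4.0, bitsandbytes installed).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE = "./base"   # Vicuna-7B-v1.5 weights shipped in this repo
ADAPTER = "."     # folder containing adapter_config.json / adapter_model.bin

# Mirror the quantization config listed above (8-bit weights, fp32 compute).
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)  # attach the LoRA weights
model.eval()
```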
adapter_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "auto_mapping": null,
+   "base_model_name_or_path": "/cpfs01/shared/ADLab/hug_ckpts/vicuna-7b-v1.5",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "lora_alpha": 32,
+   "lora_dropout": 0.05,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "v_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
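The same adapter settings expressed as a PEFT `LoraConfig` — a sketch of how such an adapter would be created, not a verbatim excerpt of the training script:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA hyperparameters as recorded in adapter_config.json:
# rank 16, alpha 32 (effective scale alpha/r = 2), dropout 0.05,
# applied only to the attention query/value projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("./base")  # local Vicuna weights
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```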
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5e1621f48d9ad8feb1d6d31050275f0aafd080c5c07153301fe2f48411f4406
+ size 443

base/README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ inference: false
+ license: llama2
+ ---
+
+ # Vicuna Model Card
+
+ ## Model Details
+
+ Vicuna is a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT.
+
+ - **Developed by:** [LMSYS](https://lmsys.org/)
+ - **Model type:** An auto-regressive language model based on the transformer architecture
+ - **License:** Llama 2 Community License Agreement
+ - **Finetuned from model:** [Llama 2](https://arxiv.org/abs/2307.09288)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/lm-sys/FastChat
+ - **Blog:** https://lmsys.org/blog/2023-03-30-vicuna/
+ - **Paper:** https://arxiv.org/abs/2306.05685
+ - **Demo:** https://chat.lmsys.org/
+
+ ## Uses
+
+ The primary use of Vicuna is research on large language models and chatbots.
+ The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.
+
+ ## How to Get Started with the Model
+
+ - Command line interface: https://github.com/lm-sys/FastChat#vicuna-weights
+ - APIs (OpenAI API, Huggingface API): https://github.com/lm-sys/FastChat/tree/main#api
+
+ ## Training Details
+
+ Vicuna v1.5 is fine-tuned from Llama 2 with supervised instruction fine-tuning.
+ The training data is around 125K conversations collected from ShareGPT.com.
+ See more details in the "Training Details of Vicuna Models" section in the appendix of this [paper](https://arxiv.org/pdf/2306.05685.pdf).
+
+ ## Evaluation
+
+ ![Evaluation Results](https://github.com/lm-sys/lm-sys.github.io/blob/main/public/images/webdata/vicuna_v1.5_eval.png?raw=true)
+
+ Vicuna is evaluated with standard benchmarks, human preference, and LLM-as-a-judge. See more details in this [paper](https://arxiv.org/pdf/2306.05685.pdf) and [leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard).
+
+ ## Difference between different versions of Vicuna
+
+ See [vicuna_weights_version.md](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md)
base/config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "vicuna-7b-v1.5",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 11008,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 32,
+   "pad_token_id": 0,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.31.0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
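For orientation, a small sketch of reading this config and deriving the per-head dimension; the `./base` path assumes the repo layout above:

```python
from transformers import AutoConfig

# Loads base/config.json; the printed values mirror the file.
cfg = AutoConfig.from_pretrained("./base")
head_dim = cfg.hidden_size // cfg.num_attention_heads   # 4096 // 32 = 128
full_mha = cfg.num_key_value_heads == cfg.num_attention_heads  # no grouped-query attention
print(cfg.model_type, cfg.num_hidden_layers, head_dim, full_mha)  # llama 32 128 True
```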
base/generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "max_length": 4096,
+   "pad_token_id": 0,
+   "temperature": 0.9,
+   "top_p": 0.6,
+   "transformers_version": "4.31.0"
+ }
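A sketch of sampling with these defaults; `from_pretrained` picks this file up automatically, and the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./base")
model = AutoModelForCausalLM.from_pretrained("./base", torch_dtype=torch.float16)

inputs = tok("Summarize the chart data:", return_tensors="pt")
# generation_config.json supplies temperature=0.9 and top_p=0.6;
# do_sample must be enabled for those sampling parameters to take effect.
out = model.generate(**inputs, do_sample=True, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```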
base/gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text

base/pytorch_model-00001-of-00002.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4133d2fcc5f31286881ea50806d95b721d016b533036a99dedce3f8fe88520e6
+ size 9976634558

base/pytorch_model-00002-of-00002.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d261d3c35e92d3070d1e61ed821ebfca812a847d2a880757d82728acf005c5ac
+ size 3500315539
base/pytorch_model.bin.index.json ADDED
@@ -0,0 +1,330 @@
+ {
+   "metadata": {
+     "total_size": 13476839424
+   },
+   "weight_map": {
+     "lm_head.weight": "pytorch_model-00002-of-00002.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.24.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.30.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.norm.weight": "pytorch_model-00002-of-00002.bin"
+   }
+ }
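This index is what `from_pretrained` consults to locate each tensor in the two shards; a sketch of resolving it by hand (paths relative to the repo root):

```python
import json
import torch

with open("base/pytorch_model.bin.index.json") as f:
    index = json.load(f)

# Load every distinct shard once and merge into a single state dict.
state_dict = {}
for shard in sorted(set(index["weight_map"].values())):
    state_dict.update(torch.load(f"base/{shard}", map_location="cpu"))

# metadata.total_size records the summed byte size of the mapped tensors.
print(len(state_dict), "tensors;", index["metadata"]["total_size"], "bytes")
```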
base/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<unk>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }

base/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723

base/tokenizer_config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "clean_up_tokenization_spaces": false,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "legacy": false,
+   "model_max_length": 4096,
+   "pad_token": null,
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
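A sketch of loading the tokenizer from these files; note that `tokenizer_config.json` leaves `pad_token` null while `special_tokens_map.json` maps it to `<unk>`, so the loaded tokenizer should end up with a pad token:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./base")
print(tok.bos_token, tok.eos_token, tok.unk_token)  # <s> </s> <unk>
print(tok.pad_token)          # <unk>, supplied by special_tokens_map.json
print(tok.model_max_length)   # 4096
```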
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbcdd53f217ad937d0347e62c442748acb981f5b3e963284f09bd1317f16546f
+ size 67216517

rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:99929bd03b48e29ef637de6ede9dd8f967124847190195d30b291479474bd0b6
+ size 21687

rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba9110abd238d877fe40302d3a99ab2a07d0a08a3046bcf8659666a97cf3c4bd
+ size 21687

rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4691041a42ddadbc0735b74ddb085e04a5e4636f3f3a625dff055f67734a886
+ size 21687

rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93211ed628a3ffb6234f03a6de1cacf8d171d9c8f5ba068610b72fcf413643e7
+ size 21687

rng_state_4.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44ee2aab0f3ad1ccb9a537bac135dd2b5a88a9c5bfb76d04070e78438ba8eb8e
+ size 21687

rng_state_5.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c3524f72d69da29ddc7e1b7d2020eaf3c17fdb7760b26b572369cd0febad4de
+ size 21687

rng_state_6.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aec123ea87d4cc11d173df7c3aee5d70636a30ffaf7da927e3c3bde634702b9d
+ size 21687

rng_state_7.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb9f4d99b6fdf30377370b405acacb904ee76b422c17d47ee47f0e09827fb2af
+ size 21687

scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c767e531f5cc92dcf2b7622466a92b93376b191b8847a14de57d9649808db98
+ size 627
trainer_state.json ADDED
@@ -0,0 +1,787 @@
+ {
+   "best_metric": 0.3883955776691437,
+   "best_model_checkpoint": "exp/vicuna-7b-lora-sft-code_qa_desc_summ_triplet_r_16_alpha_32_8GPUs-0116/checkpoint-1200",
+   "epoch": 4.375569735642662,
+   "eval_steps": 200,
+   "global_step": 1200,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {"epoch": 0.04, "learning_rate": 2.9999999999999997e-05, "loss": 1.4343, "step": 10},
+     {"epoch": 0.07, "learning_rate": 5.9999999999999995e-05, "loss": 1.4848, "step": 20},
+     {"epoch": 0.11, "learning_rate": 8.999999999999999e-05, "loss": 1.1941, "step": 30},
+     {"epoch": 0.15, "learning_rate": 0.00011999999999999999, "loss": 0.8226, "step": 40},
+     {"epoch": 0.18, "learning_rate": 0.00015, "loss": 0.6671, "step": 50},
+     {"epoch": 0.22, "learning_rate": 0.00017999999999999998, "loss": 0.5676, "step": 60},
+     {"epoch": 0.26, "learning_rate": 0.00020999999999999998, "loss": 0.5655, "step": 70},
+     {"epoch": 0.29, "learning_rate": 0.00023999999999999998, "loss": 0.5251, "step": 80},
+     {"epoch": 0.33, "learning_rate": 0.00027, "loss": 0.4845, "step": 90},
+     {"epoch": 0.36, "learning_rate": 0.0003, "loss": 0.481, "step": 100},
+     {"epoch": 0.4, "learning_rate": 0.0002976377952755905, "loss": 0.4565, "step": 110},
+     {"epoch": 0.44, "learning_rate": 0.0002952755905511811, "loss": 0.4625, "step": 120},
+     {"epoch": 0.47, "learning_rate": 0.00029291338582677163, "loss": 0.4584, "step": 130},
+     {"epoch": 0.51, "learning_rate": 0.00029055118110236217, "loss": 0.4425, "step": 140},
+     {"epoch": 0.55, "learning_rate": 0.0002881889763779527, "loss": 0.4573, "step": 150},
+     {"epoch": 0.58, "learning_rate": 0.0002858267716535433, "loss": 0.4361, "step": 160},
+     {"epoch": 0.62, "learning_rate": 0.00028346456692913383, "loss": 0.4396, "step": 170},
+     {"epoch": 0.66, "learning_rate": 0.00028110236220472436, "loss": 0.4391, "step": 180},
+     {"epoch": 0.69, "learning_rate": 0.00027874015748031495, "loss": 0.418, "step": 190},
+     {"epoch": 0.73, "learning_rate": 0.0002763779527559055, "loss": 0.4469, "step": 200},
+     {"epoch": 0.73, "eval_loss": 0.4269736409187317, "eval_runtime": 19.352, "eval_samples_per_second": 103.348, "eval_steps_per_second": 1.654, "step": 200},
+     {"epoch": 0.77, "learning_rate": 0.0002740157480314961, "loss": 0.4149, "step": 210},
+     {"epoch": 0.8, "learning_rate": 0.00027165354330708656, "loss": 0.428, "step": 220},
+     {"epoch": 0.84, "learning_rate": 0.00026929133858267715, "loss": 0.4248, "step": 230},
+     {"epoch": 0.88, "learning_rate": 0.0002669291338582677, "loss": 0.4249, "step": 240},
+     {"epoch": 0.91, "learning_rate": 0.0002645669291338582, "loss": 0.4331, "step": 250},
+     {"epoch": 0.95, "learning_rate": 0.0002622047244094488, "loss": 0.4192, "step": 260},
+     {"epoch": 0.98, "learning_rate": 0.00025984251968503934, "loss": 0.4204, "step": 270},
+     {"epoch": 1.02, "learning_rate": 0.00025748031496062993, "loss": 0.4318, "step": 280},
+     {"epoch": 1.06, "learning_rate": 0.00025511811023622047, "loss": 0.4229, "step": 290},
+     {"epoch": 1.09, "learning_rate": 0.000252755905511811, "loss": 0.4214, "step": 300},
+     {"epoch": 1.13, "learning_rate": 0.00025039370078740154, "loss": 0.416, "step": 310},
+     {"epoch": 1.17, "learning_rate": 0.00024803149606299207, "loss": 0.4199, "step": 320},
+     {"epoch": 1.2, "learning_rate": 0.00024566929133858266, "loss": 0.4218, "step": 330},
+     {"epoch": 1.24, "learning_rate": 0.0002433070866141732, "loss": 0.4113, "step": 340},
+     {"epoch": 1.28, "learning_rate": 0.00024094488188976376, "loss": 0.4185, "step": 350},
+     {"epoch": 1.31, "learning_rate": 0.00023858267716535432, "loss": 0.4168, "step": 360},
+     {"epoch": 1.35, "learning_rate": 0.00023622047244094488, "loss": 0.4162, "step": 370},
+     {"epoch": 1.39, "learning_rate": 0.0002338582677165354, "loss": 0.4175, "step": 380},
+     {"epoch": 1.42, "learning_rate": 0.00023149606299212595, "loss": 0.4045, "step": 390},
+     {"epoch": 1.46, "learning_rate": 0.00022913385826771652, "loss": 0.4152, "step": 400},
+     {"epoch": 1.46, "eval_loss": 0.4086858630180359, "eval_runtime": 19.2818, "eval_samples_per_second": 103.725, "eval_steps_per_second": 1.66, "step": 400},
+     {"epoch": 1.49, "learning_rate": 0.00022677165354330705, "loss": 0.415, "step": 410},
+     {"epoch": 1.53, "learning_rate": 0.00022440944881889761, "loss": 0.4091, "step": 420},
+     {"epoch": 1.57, "learning_rate": 0.00022204724409448818, "loss": 0.4132, "step": 430},
+     {"epoch": 1.6, "learning_rate": 0.00021968503937007874, "loss": 0.3985, "step": 440},
+     {"epoch": 1.64, "learning_rate": 0.00021732283464566927, "loss": 0.4056, "step": 450},
+     {"epoch": 1.68, "learning_rate": 0.0002149606299212598, "loss": 0.4005, "step": 460},
+     {"epoch": 1.71, "learning_rate": 0.00021259842519685037, "loss": 0.4059, "step": 470},
+     {"epoch": 1.75, "learning_rate": 0.0002102362204724409, "loss": 0.409, "step": 480},
+     {"epoch": 1.79, "learning_rate": 0.00020787401574803147, "loss": 0.4031, "step": 490},
+     {"epoch": 1.82, "learning_rate": 0.00020551181102362203, "loss": 0.4097, "step": 500},
+     {"epoch": 1.86, "learning_rate": 0.0002031496062992126, "loss": 0.4017, "step": 510},
+     {"epoch": 1.9, "learning_rate": 0.00020078740157480313, "loss": 0.4026, "step": 520},
+     {"epoch": 1.93, "learning_rate": 0.0001984251968503937, "loss": 0.4106, "step": 530},
+     {"epoch": 1.97, "learning_rate": 0.00019606299212598423, "loss": 0.395, "step": 540},
+     {"epoch": 2.01, "learning_rate": 0.0001937007874015748, "loss": 0.3988, "step": 550},
+     {"epoch": 2.04, "learning_rate": 0.00019133858267716532, "loss": 0.409, "step": 560},
+     {"epoch": 2.08, "learning_rate": 0.00018897637795275589, "loss": 0.3997, "step": 570},
+     {"epoch": 2.11, "learning_rate": 0.00018661417322834645, "loss": 0.4007, "step": 580},
+     {"epoch": 2.15, "learning_rate": 0.000184251968503937, "loss": 0.3905, "step": 590},
+     {"epoch": 2.19, "learning_rate": 0.00018188976377952755, "loss": 0.4005, "step": 600},
+     {"epoch": 2.19, "eval_loss": 0.40032637119293213, "eval_runtime": 19.2818, "eval_samples_per_second": 103.725, "eval_steps_per_second": 1.66, "step": 600},
+     {"epoch": 2.22, "learning_rate": 0.0001795275590551181, "loss": 0.3983, "step": 610},
+     {"epoch": 2.26, "learning_rate": 0.00017716535433070864, "loss": 0.3881, "step": 620},
+     {"epoch": 2.3, "learning_rate": 0.00017480314960629918, "loss": 0.4008, "step": 630},
+     {"epoch": 2.33, "learning_rate": 0.00017244094488188974, "loss": 0.3927, "step": 640},
+     {"epoch": 2.37, "learning_rate": 0.0001700787401574803, "loss": 0.4005, "step": 650},
+     {"epoch": 2.41, "learning_rate": 0.00016771653543307086, "loss": 0.3962, "step": 660},
+     {"epoch": 2.44, "learning_rate": 0.0001653543307086614, "loss": 0.3902, "step": 670},
+     {"epoch": 2.48, "learning_rate": 0.00016299212598425196, "loss": 0.3911, "step": 680},
+     {"epoch": 2.52, "learning_rate": 0.00016062992125984252, "loss": 0.3891, "step": 690},
+     {"epoch": 2.55, "learning_rate": 0.00015826771653543303, "loss": 0.3939, "step": 700},
+     {"epoch": 2.59, "learning_rate": 0.0001559055118110236, "loss": 0.4001, "step": 710},
+     {"epoch": 2.63, "learning_rate": 0.00015354330708661416, "loss": 0.3918, "step": 720},
+     {"epoch": 2.66, "learning_rate": 0.00015118110236220472, "loss": 0.3979, "step": 730},
+     {"epoch": 2.7, "learning_rate": 0.00014881889763779525, "loss": 0.3793, "step": 740},
+     {"epoch": 2.73, "learning_rate": 0.00014645669291338582, "loss": 0.3879, "step": 750},
+     {"epoch": 2.77, "learning_rate": 0.00014409448818897635, "loss": 0.3915, "step": 760},
+     {"epoch": 2.81, "learning_rate": 0.00014173228346456691, "loss": 0.3831, "step": 770},
+     {"epoch": 2.84, "learning_rate": 0.00013937007874015748, "loss": 0.3838, "step": 780},
+     {"epoch": 2.88, "learning_rate": 0.00013700787401574804, "loss": 0.3734, "step": 790},
+     {"epoch": 2.92, "learning_rate": 0.00013464566929133857, "loss": 0.3872, "step": 800},
+     {"epoch": 2.92, "eval_loss": 0.3944130539894104, "eval_runtime": 19.2596, "eval_samples_per_second": 103.844, "eval_steps_per_second": 1.662, "step": 800},
+     {"epoch": 2.95, "learning_rate": 0.0001322834645669291, "loss": 0.386, "step": 810},
+     {"epoch": 2.99, "learning_rate": 0.00012992125984251967, "loss": 0.3799, "step": 820},
+     {"epoch": 3.03, "learning_rate": 0.00012755905511811023, "loss": 0.3895, "step": 830},
+     {"epoch": 3.06, "learning_rate": 0.00012519685039370077, "loss": 0.3852, "step": 840},
+     {"epoch": 3.1, "learning_rate": 0.00012283464566929133, "loss": 0.3879, "step": 850},
+     {"epoch": 3.14, "learning_rate": 0.00012047244094488188, "loss": 0.3892, "step": 860},
+     {"epoch": 3.17, "learning_rate": 0.00011811023622047244, "loss": 0.3801, "step": 870},
+     {"epoch": 3.21, "learning_rate": 0.00011574803149606298, "loss": 0.3802, "step": 880},
+     {"epoch": 3.25, "learning_rate": 0.00011338582677165353, "loss": 0.3863, "step": 890},
+     {"epoch": 3.28, "learning_rate": 0.00011102362204724409, "loss": 0.3792, "step": 900},
+     {"epoch": 3.32, "learning_rate": 0.00010866141732283464, "loss": 0.3923, "step": 910},
+     {"epoch": 3.35, "learning_rate": 0.00010629921259842519, "loss": 0.3753, "step": 920},
+     {"epoch": 3.39, "learning_rate": 0.00010393700787401573, "loss": 0.3777, "step": 930},
+     {"epoch": 3.43, "learning_rate": 0.0001015748031496063, "loss": 0.3849, "step": 940},
+     {"epoch": 3.46, "learning_rate": 9.921259842519685e-05, "loss": 0.3775, "step": 950},
+     {"epoch": 3.5, "learning_rate": 9.68503937007874e-05, "loss": 0.3853, "step": 960},
+     {"epoch": 3.54, "learning_rate": 9.448818897637794e-05, "loss": 0.3719, "step": 970},
+     {"epoch": 3.57, "learning_rate": 9.21259842519685e-05, "loss": 0.3779, "step": 980},
+     {"epoch": 3.61, "learning_rate": 8.976377952755905e-05, "loss": 0.3921, "step": 990},
+     {"epoch": 3.65, "learning_rate": 8.740157480314959e-05, "loss": 0.3776, "step": 1000},
+     {"epoch": 3.65, "eval_loss": 0.3908761739730835, "eval_runtime": 19.2678, "eval_samples_per_second": 103.8, "eval_steps_per_second": 1.661, "step": 1000},
+     {"epoch": 3.68, "learning_rate": 8.503937007874015e-05, "loss": 0.3889, "step": 1010},
+     {"epoch": 3.72, "learning_rate": 8.26771653543307e-05, "loss": 0.3819, "step": 1020},
+     {"epoch": 3.76, "learning_rate": 8.031496062992126e-05, "loss": 0.3758, "step": 1030},
+     {"epoch": 3.79, "learning_rate": 7.79527559055118e-05, "loss": 0.3753, "step": 1040},
+     {"epoch": 3.83, "learning_rate": 7.559055118110236e-05, "loss": 0.3737, "step": 1050},
+     {"epoch": 3.87, "learning_rate": 7.322834645669291e-05, "loss": 0.3833, "step": 1060},
+     {"epoch": 3.9, "learning_rate": 7.086614173228346e-05, "loss": 0.3625, "step": 1070},
+     {"epoch": 3.94, "learning_rate": 6.850393700787402e-05, "loss": 0.3809, "step": 1080},
+     {"epoch": 3.97, "learning_rate": 6.614173228346455e-05, "loss": 0.3751, "step": 1090},
+     {"epoch": 4.01, "learning_rate": 6.377952755905512e-05, "loss": 0.3776, "step": 1100},
+     {"epoch": 4.05, "learning_rate": 6.141732283464567e-05, "loss": 0.3748, "step": 1110},
+     {"epoch": 4.08, "learning_rate": 5.905511811023622e-05, "loss": 0.3636, "step": 1120},
+     {"epoch": 4.12, "learning_rate": 5.669291338582676e-05, "loss": 0.372, "step": 1130},
+     {"epoch": 4.16, "learning_rate": 5.433070866141732e-05, "loss": 0.3795, "step": 1140},
+     {"epoch": 4.19, "learning_rate": 5.196850393700787e-05, "loss": 0.3632, "step": 1150},
+     {"epoch": 4.23, "learning_rate": 4.960629921259842e-05, "loss": 0.3806, "step": 1160},
+     {"epoch": 4.27, "learning_rate": 4.724409448818897e-05, "loss": 0.3732, "step": 1170},
+     {"epoch": 4.3, "learning_rate": 4.488188976377953e-05, "loss": 0.3818, "step": 1180},
+     {"epoch": 4.34, "learning_rate": 4.2519685039370076e-05, "loss": 0.3766, "step": 1190},
+     {"epoch": 4.38, "learning_rate": 4.015748031496063e-05, "loss": 0.3587, "step": 1200},
+     {"epoch": 4.38, "eval_loss": 0.3883955776691437, "eval_runtime": 19.3219, "eval_samples_per_second": 103.51, "eval_steps_per_second": 1.656, "step": 1200}
+   ],
+   "logging_steps": 10,
+   "max_steps": 1370,
+   "num_train_epochs": 5,
+   "save_steps": 200,
+   "total_flos": 2.2400975031249142e+18,
+   "trial_name": null,
+   "trial_params": null
+ }
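The logged learning rates follow a linear warmup/decay schedule. A small sketch that reproduces them, with the warmup length inferred from the log (the 3e-4 peak is reached at step 100) and `max_steps` taken from the state above:

```python
def lr_at(step, peak=3e-4, warmup_steps=100, max_steps=1370):
    """Linear warmup to `peak`, then linear decay to 0 at `max_steps`."""
    if step < warmup_steps:
        return peak * step / warmup_steps
    return peak * (max_steps - step) / (max_steps - warmup_steps)

print(lr_at(10))   # 3e-05, matching the first logged value
print(lr_at(200))  # ~0.00027637795275590553, matching the step-200 entry
```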
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f06e9a16a0ff4f4a3e1848a5ee5ca4d1a18dde6f70aaad64595a234fd7300b7f
+ size 4155