gate369 committed on
Commit c6fb303 · verified · 1 Parent(s): 2015c27

Model save

README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ base_model: liminerity/Bitnet-Mistral.0.2-v3
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: Bitnet-Mistral.0.2-v5
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Bitnet-Mistral.0.2-v5
+
+ This model is a fine-tuned version of [liminerity/Bitnet-Mistral.0.2-v3](https://huggingface.co/liminerity/Bitnet-Mistral.0.2-v3) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 4.4220
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 4
+ - eval_batch_size: 16
+ - seed: 42
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - training_steps: 1000
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 4.6207 | 0.0226 | 25 | 4.5769 |
+ | 4.3648 | 0.0451 | 50 | 4.5716 |
+ | 4.4694 | 0.0677 | 75 | 4.5629 |
+ | 4.1782 | 0.0903 | 100 | 4.5555 |
+ | 4.4231 | 0.1128 | 125 | 4.5486 |
+ | 4.2572 | 0.1354 | 150 | 4.5546 |
+ | 4.3837 | 0.1580 | 175 | 4.5466 |
+ | 4.1915 | 0.1805 | 200 | 4.5419 |
+ | 4.3894 | 0.2031 | 225 | 4.5257 |
+ | 4.2447 | 0.2257 | 250 | 4.5235 |
+ | 4.0355 | 0.2483 | 275 | 4.5236 |
+ | 4.1873 | 0.2708 | 300 | 4.5211 |
+ | 4.3891 | 0.2934 | 325 | 4.5078 |
+ | 4.1322 | 0.3160 | 350 | 4.5019 |
+ | 4.0357 | 0.3385 | 375 | 4.5051 |
+ | 4.3401 | 0.3611 | 400 | 4.4921 |
+ | 4.3848 | 0.3837 | 425 | 4.4903 |
+ | 4.305 | 0.4062 | 450 | 4.4789 |
+ | 4.2776 | 0.4288 | 475 | 4.4765 |
+ | 4.1802 | 0.4514 | 500 | 4.4727 |
+ | 4.0785 | 0.4739 | 525 | 4.4674 |
+ | 4.0607 | 0.4965 | 550 | 4.4623 |
+ | 3.9385 | 0.5191 | 575 | 4.4611 |
+ | 4.194 | 0.5416 | 600 | 4.4565 |
+ | 4.277 | 0.5642 | 625 | 4.4478 |
+ | 4.1751 | 0.5868 | 650 | 4.4457 |
+ | 4.0422 | 0.6093 | 675 | 4.4428 |
+ | 4.1503 | 0.6319 | 700 | 4.4406 |
+ | 4.0552 | 0.6545 | 725 | 4.4366 |
+ | 4.4017 | 0.6770 | 750 | 4.4327 |
+ | 4.2394 | 0.6996 | 775 | 4.4300 |
+ | 4.1975 | 0.7222 | 800 | 4.4277 |
+ | 4.2378 | 0.7448 | 825 | 4.4279 |
+ | 4.078 | 0.7673 | 850 | 4.4256 |
+ | 4.4727 | 0.7899 | 875 | 4.4235 |
+ | 4.1667 | 0.8125 | 900 | 4.4224 |
+ | 4.4079 | 0.8350 | 925 | 4.4223 |
+ | 4.3179 | 0.8576 | 950 | 4.4221 |
+ | 4.0479 | 0.8802 | 975 | 4.4220 |
+ | 4.0943 | 0.9027 | 1000 | 4.4220 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.2
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
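
Reviewer note: the hyperparameters in the README above can be cross-checked with a little arithmetic. The sketch below is not taken from the training run. It assumes a single device and zero warmup steps (the card states neither), and it uses the standard half-cosine decay as a stand-in for `lr_scheduler_type: cosine`. The function names are hypothetical.

```python
import math

# Values copied from the "Training hyperparameters" list in the README.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 16
TRAINING_STEPS = 1000

# Assumptions: the card does not state the device count or the warmup length.
NUM_DEVICES = 1
WARMUP_STEPS = 0


def effective_batch_size() -> int:
    """Sequences consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Half-cosine decay from LEARNING_RATE to 0 over TRAINING_STEPS."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one row of the results table."""
    return step / epoch


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("lr at step 500:", cosine_lr(500))
    # Last table row: step 1000 corresponds to epoch 0.9027.
    print("approx. steps per epoch:", steps_per_epoch(1000, 0.9027))
```

Under these assumptions the product 4 × 16 matches the reported `total_train_batch_size` of 64. The last table row (step 1000 at epoch 0.9027) implies roughly 1,100 optimizer steps per epoch, so the 1000-step run stopped just short of one full pass over the training data.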
final_model/config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "liminerity/Bitnet-Mistral.0.2-v5",
+ "_name_or_path": "liminerity/Bitnet-Mistral.0.2-v3",
  "architectures": [
  "MistralForCausalLM"
  ],
final_model/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2723b55dc532e31d62acf4ec0eeada693872696fd7e62b7c462b38a2802c41e2
+ oid sha256:fc314570a84931d72186e6dd236fbba46c68749909b5ad0826316e72549a8f2b
  size 128524840
final_model/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6fe39a7e59cfc34bf787359c8621875a8af28b822ffcc77eace41356553656a4
- size 5112
+ oid sha256:6b8375d79a12b46f0f80fef61798e6dc83ad0397be36c50baa6a2c4a13d08214
+ size 5176
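
Reviewer note: the two binary files above are stored as Git LFS pointers, so the diff shows only the pointer text (`version`, `oid`, `size`) and not the weights themselves. As a minimal sketch, the helper below parses that three-line format to compare two pointers. `parse_lfs_pointer` is a hypothetical name and not part of any Git LFS tooling. The sample values are copied from the `final_model/model.safetensors` hunk.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Old and new pointers for final_model/model.safetensors, copied from the diff.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2723b55dc532e31d62acf4ec0eeada693872696fd7e62b7c462b38a2802c41e2
size 128524840
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fc314570a84931d72186e6dd236fbba46c68749909b5ad0826316e72549a8f2b
size 128524840
"""

if __name__ == "__main__":
    old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
    print("size unchanged:", old["size"] == new["size"])
    print("content changed:", old["digest"] != new["digest"])
```

For `model.safetensors` the size stays at 128524840 bytes while the SHA-256 digest changes, which is what one would expect when weights are updated without changing tensor shapes or dtypes.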