yalhessi committed
Commit c233913 · verified · 1 parent: dee96c4

End of training

README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2065
+ - Loss: 0.1887

  ## Model description

@@ -35,7 +35,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0002
+ - learning_rate: 0.001
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 42
@@ -52,35 +52,35 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:-----:|:---------------:|
- | 0.5153 | 0.2001 | 629 | 0.3804 |
- | 0.3737 | 0.4001 | 1258 | 0.3328 |
- | 0.339 | 0.6002 | 1887 | 0.3059 |
- | 0.3011 | 0.8003 | 2516 | 0.2860 |
- | 0.2889 | 1.0003 | 3145 | 0.2774 |
- | 0.2715 | 1.2004 | 3774 | 0.2685 |
- | 0.2638 | 1.4004 | 4403 | 0.2573 |
- | 0.2513 | 1.6005 | 5032 | 0.2510 |
- | 0.2493 | 1.8006 | 5661 | 0.2448 |
- | 0.2416 | 2.0006 | 6290 | 0.2400 |
- | 0.2359 | 2.2007 | 6919 | 0.2365 |
- | 0.2247 | 2.4008 | 7548 | 0.2334 |
- | 0.2204 | 2.6008 | 8177 | 0.2292 |
- | 0.2208 | 2.8009 | 8806 | 0.2235 |
- | 0.2157 | 3.0010 | 9435 | 0.2226 |
- | 0.1976 | 3.2010 | 10064 | 0.2208 |
- | 0.1991 | 3.4011 | 10693 | 0.2209 |
- | 0.1982 | 3.6011 | 11322 | 0.2157 |
- | 0.1977 | 3.8012 | 11951 | 0.2140 |
- | 0.1949 | 4.0013 | 12580 | 0.2121 |
- | 0.1821 | 4.2013 | 13209 | 0.2135 |
- | 0.1791 | 4.4014 | 13838 | 0.2106 |
- | 0.1829 | 4.6015 | 14467 | 0.2089 |
- | 0.177 | 4.8015 | 15096 | 0.2085 |
- | 0.1789 | 5.0016 | 15725 | 0.2063 |
- | 0.1704 | 5.2017 | 16354 | 0.2083 |
- | 0.1667 | 5.4017 | 16983 | 0.2074 |
- | 0.1641 | 5.6018 | 17612 | 0.2068 |
- | 0.1642 | 5.8018 | 18241 | 0.2065 |
+ | 0.4557 | 0.2001 | 629 | 0.3633 |
+ | 0.3618 | 0.4001 | 1258 | 0.3296 |
+ | 0.3388 | 0.6002 | 1887 | 0.3132 |
+ | 0.3131 | 0.8003 | 2516 | 0.2975 |
+ | 0.3022 | 1.0003 | 3145 | 0.2878 |
+ | 0.2884 | 1.2004 | 3774 | 0.2849 |
+ | 0.2806 | 1.4004 | 4403 | 0.2791 |
+ | 0.2695 | 1.6005 | 5032 | 0.2651 |
+ | 0.2684 | 1.8006 | 5661 | 0.2560 |
+ | 0.261 | 2.0006 | 6290 | 0.2564 |
+ | 0.2544 | 2.2007 | 6919 | 0.2513 |
+ | 0.2437 | 2.4008 | 7548 | 0.2441 |
+ | 0.2393 | 2.6008 | 8177 | 0.2406 |
+ | 0.2375 | 2.8009 | 8806 | 0.2338 |
+ | 0.2326 | 3.0010 | 9435 | 0.2257 |
+ | 0.2124 | 3.2010 | 10064 | 0.2227 |
+ | 0.2137 | 3.4011 | 10693 | 0.2215 |
+ | 0.2102 | 3.6011 | 11322 | 0.2127 |
+ | 0.2079 | 3.8012 | 11951 | 0.2103 |
+ | 0.2034 | 4.0013 | 12580 | 0.2070 |
+ | 0.1862 | 4.2013 | 13209 | 0.2049 |
+ | 0.1831 | 4.4014 | 13838 | 0.2029 |
+ | 0.185 | 4.6015 | 14467 | 0.1987 |
+ | 0.1754 | 4.8015 | 15096 | 0.1975 |
+ | 0.1753 | 5.0016 | 15725 | 0.1937 |
+ | 0.1622 | 5.2017 | 16354 | 0.1959 |
+ | 0.155 | 5.4017 | 16983 | 0.1912 |
+ | 0.1501 | 5.6018 | 17612 | 0.1897 |
+ | 0.1481 | 5.8018 | 18241 | 0.1887 |


  ### Framework versions
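
The README change boils down to two numbers moving together: the learning rate goes from 0.0002 to 0.001 and the final validation loss goes from 0.2065 to 0.1887. The sketch below puts a size on that drop. It assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for a causal LM fine-tune but is not stated in the model card, so the perplexity figures are only meaningful under that assumption. The function name `compare_eval_loss` is illustrative.

```python
import math


def compare_eval_loss(old_loss: float, new_loss: float) -> dict:
    """Summarize the change between two evaluation losses.

    Assumption: both losses are mean per-token cross-entropy in nats,
    so exp(loss) can be read as a perplexity. The model card does not
    confirm this.
    """
    return {
        "abs_drop": old_loss - new_loss,
        "rel_drop": (old_loss - new_loss) / old_loss,
        "old_ppl": math.exp(old_loss),
        "new_ppl": math.exp(new_loss),
    }


# Final validation losses from the old and new versions of the README.
summary = compare_eval_loss(old_loss=0.2065, new_loss=0.1887)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

The relative drop works out to roughly 8.6%. One detail the tables also show: in the old run the lowest validation loss (0.2063 at step 15725) came before the final step, whereas in the new run the final step (18241) is also the best one.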
adapter_config.json CHANGED
@@ -23,8 +23,8 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "v_proj",
- "q_proj"
+ "q_proj",
+ "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
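
The `adapter_config.json` hunk only swaps the order of the two entries in `target_modules`. A minimal sketch of how one might confirm that the change is a reordering and nothing else is shown below. It assumes the adapter library treats `target_modules` as an unordered collection (PEFT is understood to hold it as a set internally, which would explain the order varying between saves, but that is an assumption here). The helper name `same_target_modules` is hypothetical.

```python
def same_target_modules(old: list[str], new: list[str]) -> bool:
    """Return True when two target_modules lists name the same modules.

    Order is ignored, on the assumption that the adapter treats
    target_modules as an unordered collection.
    """
    return set(old) == set(new)


old_modules = ["v_proj", "q_proj"]  # removed lines of the hunk
new_modules = ["q_proj", "v_proj"]  # added lines of the hunk

print(same_target_modules(old_modules, new_modules))
```

If the check holds, the adapter configuration is effectively unchanged by this commit and the difference between the two runs comes from the training arguments instead.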
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6d7fa2e6cfb334c7f6a36d74bfe3faffcf185d3fc18298385b13e0cd83641fc1
+ oid sha256:4ce2053ad077db869539c5c08a26aa44feec65cfde1c470aeb7db1903450f803
  size 6304096
loss_plot.png CHANGED
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ec3dd80851aa26765a820668b208523944538d3b7fc7cb8f4251e24631017bce
- size 5432
+ oid sha256:d20ce33783e39becb10645745809c683f93de955b233b042ba54b8c22ca97589
+ size 5496
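
Both `adapter_model.safetensors` and `training_args.bin` are stored as Git LFS pointers, so the diff shows only the `oid` and `size` fields rather than the binary content. The following is a rough sketch of reading those pointers to see what changed. It relies on the three-line `key value` layout visible in the hunks above; the function names are illustrative and not part of any LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a dict of its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", e.g. "size 5432".
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


def describe_change(old: dict, new: dict) -> str:
    """Report whether the content hash changed and by how many bytes."""
    changed = old["oid"] != new["oid"]
    delta = new["size"] - old["size"]
    return f"content changed: {changed}, size delta: {delta} bytes"


# Pointer contents copied from the training_args.bin hunk.
old_args = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ec3dd80851aa26765a820668b208523944538d3b7fc7cb8f4251e24631017bce\n"
    "size 5432\n"
)
new_args = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d20ce33783e39becb10645745809c683f93de955b233b042ba54b8c22ca97589\n"
    "size 5496\n"
)

print(describe_change(old_args, new_args))
```

Applied to the adapter weights, the same check would report a changed hash with a size delta of 0, which is consistent with retraining the same adapter shape with different hyperparameters.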