Update README.md
base_model: cstr/phi-3-orpo-v8_16
---

# Model details

This is a quick experiment on llamafied phi-3 with only 1000 ORPO steps on an AzureML-translated German Orca binarized dataset (johannhartmann/mistralorpo), using the original phi-3 prompt template. The immediate result is not really good, but also not bad enough to discourage further experiments.
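For reference, the phi-3 instruct template wraps each turn in a role tag terminated by `<|end|>`. A minimal formatter sketch is below; the authoritative template is the one shipped in the model's tokenizer chat template, so treat this as illustrative only:

```python
def format_phi3_prompt(messages):
    """Render a chat as a phi-3-style prompt: each turn becomes
    <|role|>\n{content}<|end|>\n, with a trailing <|assistant|> tag
    to cue the model's reply."""
    out = ""
    for m in messages:
        out += f"<|{m['role']}|>\n{m['content']}<|end|>\n"
    return out + "<|assistant|>\n"
```

In practice you would use `tokenizer.apply_chat_template(...)` instead, which applies exactly the template the model was trained with.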

# Benchmark results

This was an experiment on a German dataset snippet which, as expected, worsened results on English benchmarks:

| Metric                          |Value|
|---------------------------------|----:|
|Winogrande (5-shot)              |70.24|
|GSM8k (5-shot)                   |62.32|

On German EQ-Bench (v2_de): 51.82 (an insignificant gain over 51.41 for the original llamafied model, but significantly better than the intermediate cstr/phi-3-orpo-v8_16, which achieved 46.38 after the initial 150 test steps), though still with only 164/171 answers correctly parsed.

Note: The parsing correctness can be improved, among other things, with only a few SFT steps, as shown with cas/phi3-mini-4k-llamafied-sft-v3 (170/171 correct, but then with only a 39.46 score in v2_de; that run was also an experiment in changing the prompt template).

All of that was done quickly with bnb and q4 quants only, which might, in theory, significantly affect especially such small dense models.

But it served the intention of both proof-of-concept experiments at least. It would probably be possible to further improve the results, but that would take some time and compute.
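For context, running such a model in 4-bit via bitsandbytes typically means loading it with a quantization config along these lines (a sketch only; the quantization type, compute dtype, and device map are assumptions, not the exact settings used for these runs):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative NF4 4-bit quantization config for evaluation/training in bnb 4-bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "cstr/phi-3-orpo-v8_16",  # base model named in this card
    quantization_config=bnb_config,
    device_map="auto",
)
```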

# Training setup

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
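The ORPO objective behind those training steps adds an odds-ratio penalty to the ordinary NLL loss on the chosen response. A minimal numeric sketch of that penalty term follows (the weight `lam` and the per-response log-probabilities are illustrative inputs; in practice TRL's `ORPOTrainer` computes all of this internally):

```python
import math

def orpo_odds_ratio_loss(logp_chosen, logp_rejected, lam=0.1):
    """Sketch of the ORPO odds-ratio penalty:
    -lam * log(sigmoid(log-odds(chosen) - log-odds(rejected))),
    where odds(y) = p(y) / (1 - p(y)). The full ORPO loss adds this
    term to the usual NLL on the chosen response."""
    def log_odds(logp):
        p = math.exp(logp)          # response probability, must be < 1
        return logp - math.log(1.0 - p)

    z = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -lam * math.log(1.0 / (1.0 + math.exp(-z)))  # -lam * log(sigmoid(z))
```

The penalty shrinks as the model assigns higher odds to the chosen response relative to the rejected one, which is what nudges generation toward the preferred style during those 1000 steps.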