Kearm committed on
Commit cf7ecc8 · verified · 1 Parent(s): 86b0e68

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ base_model:
 
   <p>
   <b>Version 0.1 notes:</b><br> Dataset was deduped and cleaned from version 0.0, sequence length was also increased. Resulting model seems to be stabler, and 0.0 problems with handling short inputs and min_p sampling seem to be gone.<br>
- This version seems to be more or less optimal for the current data. It (again) started crashing on each checkpoint after some point, but it was less of a problem this time, as eval/loss already flatlined by that time. This is epoch 2.7 checkpoint.
+ This version seems to be more or less optimal for the current data and available compute.
   </p>
 
   <p>Note: using quantized KV cache with Qwen2.5 <b>is not recommended</b> and can lead to degraded output quality. On the other hand, Qwen's KV cache is already light enough, so using f16 for it shouldn't be problematic.</p>
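
To put the f16 recommendation in the KV cache note in perspective, here is a rough sketch of how KV cache size can be estimated from the attention configuration. The layer count, KV head count, head dimension, and context length below are illustrative assumptions (picked to resemble a 7B-class model with grouped-query attention), not values taken from this repository; the real ones should be read from the model's `config.json`.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_element=2):
    """Estimate KV cache size in bytes.

    One key vector and one value vector (hence the factor of 2) are stored
    per layer, per KV head, per token. bytes_per_element=2 corresponds to f16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_element


# Hypothetical configuration values, for illustration only.
NUM_LAYERS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128
CONTEXT_TOKENS = 32768

per_token = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, num_tokens=1)
full_context = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, num_tokens=CONTEXT_TOKENS)

print(f"{per_token / 1024:.0f} KiB per token at f16")
print(f"{full_context / 1024**3:.2f} GiB for {CONTEXT_TOKENS} tokens at f16")
```

Under these assumed values the f16 cache costs 56 KiB per token, or 1.75 GiB for a 32768-token context. An 8-bit cache would halve that, which is the saving being weighed against the output-quality risk the note describes.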