EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1


Kearm committed on Oct 8, 2024

Commit cf7ecc8 (verified) · 1 parent: 86b0e68

Update README.md

Files changed (1)

README.md +1 -1


@@ -24,7 +24,7 @@ base_model:
 <p>
   <b>Version 0.1 notes:</b><br> Dataset was deduped and cleaned from version 0.0, sequence length was also increased. Resulting model seems to be stabler, and 0.0 problems with handling short inputs and min_p sampling seem to be gone.<br>
-  This version seems to be more or less optimal for the current data. It (again) started crashing on each checkpoint after some point, but it was less of a problem this time, as eval/loss already flatlined by that time. This is epoch 2.7 checkpoint.
+  This version seems to be more or less optimal for the current data and available compute.
 </p>
 <p>Note: using quantized KV cache with Qwen2.5 <b>is not recommended</b> and can lead to degraded output quality. On the other hand, Qwen's KV cache is already light enough, so using f16 for it shouldn't be problematic.</p>
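
The README's note that an f16 KV cache "shouldn't be problematic" can be sanity-checked with a rough size estimate. The sketch below is not from the README: the helper name `kv_cache_bytes` is hypothetical, and the shape values (48 layers, 8 KV heads, head dimension 128) are assumptions about a Qwen2.5-14B-style configuration that should be checked against the model's `config.json`.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to f16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed shape values (verify against the model's config.json).
NUM_LAYERS = 48
NUM_KV_HEADS = 8
HEAD_DIM = 128

per_token = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, seq_len=1)
full_context = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, seq_len=32768)

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"32768 tokens: {full_context / 1024**3:.1f} GiB")
```

Under these assumed values the f16 cache costs 192 KiB per token, so the estimate scales linearly with context length and can be compared against available VRAM before deciding whether cache quantization is needed at all.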