Update README.md
README.md
FP16:

```
ollama run mistral-small:24b-instruct-2501-fp16
```
### Fine-tuning & context expansion

This model is an (untested) fine-tune of the base model, created with [unsloth](https://github.com/unslothai/unsloth)'s PEFT SFT workflow.
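As a rough illustration of what a PEFT SFT setup with unsloth looks like, the sketch below loads the base model and attaches LoRA adapters. It is an assumed reconstruction rather than the actual training script: the rank and alpha values are taken from the settings and log further down, and the target-module list is unsloth's common default, not something stated here.

```python
# Sketch only: an unsloth-style PEFT setup assumed from the settings below,
# not the exact script used for this fine-tune.
from unsloth import FastLanguageModel

# Requesting a 35,840-token context triggers unsloth's RoPE scaling,
# since the base model natively supports 32,768 tokens.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="uncensoredai/Mistral-Small-24B-Instruct-2501",
    max_seq_length=35_840,
    dtype=None,          # auto-selects bfloat16 on supported GPUs
    load_in_4bit=False,
)

# Attach LoRA adapters; r and alpha match the values reported in the log below.
# The target-module list is an assumption (unsloth's usual default).
model = FastLanguageModel.get_peft_model(
    model,
    r=304,
    lora_alpha=608,
    lora_dropout=0.0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```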
#### Datasets

SFT was done on the following datasets:

1. 40% of the [cognitivecomputations/dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1/viewer/nonreasoning) dataset, and
2. 2% of the [fireworks-ai/long-chat](https://huggingface.co/datasets/fireworks-ai/long-chat?row=0) dataset for context expansion (a mixing sketch follows the list).
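One plausible way to build such a mixture with the Hugging Face `datasets` library is sketched below. The `nonreasoning` subset name, the `train` splits, the seed, and the assumption that both subsets have already been normalized to a shared chat column are all inferred from the links above, not a record of the actual preprocessing.

```python
# Sketch: sampling 40% of dolphin-r1 (nonreasoning) and 2% of long-chat.
# Subset/split names and the seed are assumptions based on the dataset links.
from datasets import load_dataset, concatenate_datasets

def take_fraction(ds, fraction, seed=42):
    """Randomly keep `fraction` of the rows."""
    n = int(len(ds) * fraction)
    return ds.shuffle(seed=seed).select(range(n))

dolphin = take_fraction(
    load_dataset("cognitivecomputations/dolphin-r1", "nonreasoning", split="train"),
    0.40,
)
long_chat = take_fraction(
    load_dataset("fireworks-ai/long-chat", split="train"),
    0.02,
)

# Both subsets would need to be mapped to one shared chat/text column before
# concatenation; assuming that normalization has been done, the mixture is:
mixture = concatenate_datasets([dolphin, long_chat]).shuffle(seed=42)
print(len(mixture), "training examples")
```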
### Training configuration

Context was expanded to a maximum of 35k tokens using unsloth's [RoPE scaling](https://arxiv.org/abs/2310.05209) capabilities.
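The scaling factor that unsloth reports in the training log further down follows directly from these two context lengths; a quick check, assuming simple linear position scaling:

```python
# Quick check of the RoPE scaling factor: extended context / native context.
native_ctx = 32_768      # base model's maximum sequence length
extended_ctx = 35_840    # target context after expansion (~35k)
print(round(extended_ctx / native_ctx, 3))   # 1.094, matching the training log
```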
#### Chat template

Mistral chat template format was used.
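For illustration, applying a Mistral-style chat template through the model's tokenizer looks roughly like this. The exact template string shipped with the tokenizer is what actually defines the format; the message content here is made up.

```python
# Sketch: formatting a conversation with the tokenizer's built-in chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uncensoredai/Mistral-Small-24B-Instruct-2501")

messages = [
    {"role": "user", "content": "Summarize RoPE scaling in one sentence."},
]

# Renders the conversation in the Mistral chat format (e.g. [INST] ... [/INST]).
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```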
#### PEFT settings

The LoRA adapters were sized to roughly 1% of the base model's hidden parameters, resulting in the following training setup:
```bash
==((====))==  Unsloth 2025.1.8: Fast Mistral patching. Transformers: 4.48.2.
   \\   /|    GPU: NVIDIA H200. Max memory: 139.827 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 9.0. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Unsloth: uncensoredai/Mistral-Small-24B-Instruct-2501 can only handle sequence lengths of at most 32768.
But with kaiokendev's RoPE scaling of 1.094, it can be magically be extended to 35840!

Loading checkpoint shards:   0%|          | 0/10 [00:00<?, ?it/s]

Total model parameters: 13,799,674,880
Total hidden parameters: 12,457,497,600
Total LM Head parameters: 671,088,640
Total Embedding parameters: 671,088,640
Hidden Size: 5120
# Hidden Layers: 40
Training Fraction: 0.01

Number of Training Parameters: 124,574,976.0
LoRA Rank (r): 304.00
LoRA Alpha (alpha_lora): 608.00
...

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 64,992 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 8,124
 "-____-"     Number of trainable parameters = 1,755,709,440
```
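The figures in this log line up with a simple sizing heuristic: take 1% of the hidden parameters as the trainable budget, derive the rank from a cost of 2 × hidden_size per layer per unit of rank, and set alpha to 2r; the larger trainable-parameter count in the final banner then follows from attaching rank-304 adapters to every attention and MLP projection. The arithmetic below is an inferred reconstruction (the per-module dimensions are assumed from the model's published config), not code from the actual run:

```python
# Sketch: reproducing the LoRA sizing figures from the log above.
# The sizing heuristic and the per-module dimensions are assumptions,
# but the arithmetic matches the reported numbers.
hidden_params = 12_457_497_600   # "Total hidden parameters"
hidden_size = 5_120              # "Hidden Size"
num_layers = 40                  # "# Hidden Layers"
training_fraction = 0.01         # "1% of base model's hidden parameters"

# Step 1: trainable-parameter budget.
budget = hidden_params * training_fraction
print(budget)                    # 124574976.0 -> "Number of Training Parameters"

# Step 2: rank from an assumed per-layer cost of 2 * hidden_size per rank.
rank = round(budget / (2 * hidden_size * num_layers))
alpha = 2 * rank
print(rank, alpha)               # 304 608 -> "LoRA Rank (r)", "LoRA Alpha"

# Step 3: actual trainable parameters once rank-304 adapters are attached to all
# attention and MLP projections (dimensions assumed from the model config).
per_rank_per_layer = (
    (5120 + 4096)      # q_proj
    + (5120 + 1024)    # k_proj
    + (5120 + 1024)    # v_proj
    + (4096 + 5120)    # o_proj
    + (5120 + 32768)   # gate_proj
    + (5120 + 32768)   # up_proj
    + (32768 + 5120)   # down_proj
)
print(per_rank_per_layer * num_layers * rank)   # 1755709440 -> "Number of trainable parameters"
```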