Safetensors
Transformers
vllm
LHC88 committed on
Commit 33c56da · verified · 1 Parent(s): 9a6d420

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -451,3 +451,61 @@ FP16:
 ```
 ollama run mistral-small:24b-instruct-2501-fp16
 ```
+
+ ### Fine-Tuning & context expansion
+
+ This model is an (untested) fine-tune of the base model, created with [unsloth](https://github.com/unslothai/unsloth)'s PEFT SFT.
+
+ #### Datasets
+ SFT was done on the following datasets (a sketch of the sampling is shown after this list):
+
+ 1. 40% of the [cognitivecomputations/dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1/viewer/nonreasoning) dataset (non-reasoning subset) and
+ 2. 2% of the [fireworks-ai/long-chat](https://huggingface.co/datasets/fireworks-ai/long-chat?row=0) dataset for context expansion
+
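+ As a rough illustration only, here is a minimal sketch of that data mix using the `datasets` library; the config/split names, seed, and sampling order are assumptions, not the exact preprocessing used for this model.
+
+ ```python
+ from datasets import load_dataset, concatenate_datasets
+
+ # "nonreasoning" config name taken from the dataset viewer link above; "train" splits are assumed.
+ dolphin = load_dataset("cognitivecomputations/dolphin-r1", "nonreasoning", split="train")
+ long_chat = load_dataset("fireworks-ai/long-chat", split="train")
+
+ # Sample 40% of dolphin-r1 (non-reasoning) and 2% of long-chat, then mix them.
+ dolphin_part = dolphin.shuffle(seed=42).select(range(int(0.40 * len(dolphin))))
+ long_part = long_chat.shuffle(seed=42).select(range(int(0.02 * len(long_chat))))
+ train_dataset = concatenate_datasets([dolphin_part, long_part]).shuffle(seed=42)
+ ```
+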
+ ### Training configuration
+ The context window was expanded to a maximum of ~35k tokens using unsloth's [RoPE scaling](https://arxiv.org/abs/2310.05209) capabilities.
+
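+ For illustration, a minimal sketch of how that context extension is requested with unsloth: asking for a `max_seq_length` above the model's native 32,768 tokens makes unsloth apply kaiokendev-style RoPE scaling (35840 / 32768 ≈ 1.094, the factor reported in the training log further down). The loading flags here are assumptions.
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ # Request ~35k context on a 32k base model; unsloth applies the RoPE scaling automatically.
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="uncensoredai/Mistral-Small-24B-Instruct-2501",  # base checkpoint named in the log below
+     max_seq_length=35840,  # 35840 / 32768 ≈ 1.094 scaling factor
+     dtype=None,            # let unsloth pick the dtype (bfloat16 on an H200)
+     load_in_4bit=False,    # assumption: bf16 weights for LoRA training, as in the log
+ )
+ ```
+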
+ #### Chat template
+ The Mistral chat template format was used.
+
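+ For illustration, formatting one training example with the base tokenizer's built-in Mistral chat template (assuming, as with the official instruct checkpoint, that the tokenizer ships the template; the example content is made up):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # The instruct tokenizer already carries Mistral's chat template.
+ tokenizer = AutoTokenizer.from_pretrained("uncensoredai/Mistral-Small-24B-Instruct-2501")
+
+ messages = [
+     {"role": "user", "content": "What does RoPE scaling do?"},
+     {"role": "assistant", "content": "It stretches the positional encoding so the model can attend over longer contexts."},
+ ]
+ text = tokenizer.apply_chat_template(messages, tokenize=False)
+ ```
+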
+ #### PEFT settings
+
+ The LoRA adapters target roughly 1% of the base model's hidden parameters, resulting in the setup below:
+
+ ```bash
+ ==((====))== Unsloth 2025.1.8: Fast Mistral patching. Transformers: 4.48.2.
+ \\ /| GPU: NVIDIA H200. Max memory: 139.827 GB. Platform: Linux.
+ O^O/ \_/ \ Torch: 2.5.1+cu124. CUDA: 9.0. CUDA Toolkit: 12.4. Triton: 3.1.0
+ \ / Bfloat16 = TRUE. FA [Xformers = 0.0.29. FA2 = False]
+ "-____-" Free Apache license: http://github.com/unslothai/unsloth
+ Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
+
+ Unsloth: uncensoredai/Mistral-Small-24B-Instruct-2501 can only handle sequence lengths of at most 32768.
+ But with kaiokendev's RoPE scaling of 1.094, it can be magically be extended to 35840!
+
+ Loading checkpoint shards: 0%| | 0/10 [00:00<?, ?it/s]
+
+
+ Total model parameters: 13,799,674,880
+
+ Total hidden parameters: 12,457,497,600
+
+ Total LM Head parameters: 671,088,640
+
+ Total Embedding parameters: 671,088,640
+ Hidden Size: 5120
+ # Hidden Layers: 40
+ Training Fraction: 0.01
+
+ Number of Training Parameters: 124,574,976.0
+ LoRA Rank (r): 304.00
+ LoRA Alpha (alpha_lora): 608.00
+ ...
+
+ ==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
+ \\ /| Num examples = 64,992 | Num Epochs = 1
+ O^O/ \_/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
+ \ / Total batch size = 8 | Total steps = 8,124
+ "-____-" Number of trainable parameters = 1,755,709,440
+ ```
+
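+ The "Number of Training Parameters" target above (124,574,976) is exactly 1% of the 12,457,497,600 hidden parameters, and unsloth derived r = 304 and alpha = 608 from it. Below is a hedged sketch of the LoRA and trainer setup these numbers imply, reusing `model`, `tokenizer`, and `train_dataset` from the earlier sketches; the target modules, the `"text"` column (assumed to hold chat-template-rendered conversations), and the remaining hyperparameters are assumptions in the style of unsloth's example notebooks, not the exact recipe, and argument names can differ across trl versions.
+
+ ```python
+ from trl import SFTTrainer
+ from transformers import TrainingArguments
+ from unsloth import FastLanguageModel
+
+ # LoRA rank/alpha as reported in the log; target modules are the usual Mistral projections (assumption).
+ model = FastLanguageModel.get_peft_model(
+     model,                      # the model loaded in the context-expansion sketch above
+     r=304,
+     lora_alpha=608,
+     lora_dropout=0.0,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+ )
+
+ # Per-device batch 2 x gradient accumulation 4 = total batch 8 over 1 epoch, as in the log.
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=train_dataset,   # mixed dataset from the Datasets sketch
+     dataset_text_field="text",     # assumption: conversations pre-rendered with the chat template
+     max_seq_length=35840,
+     args=TrainingArguments(
+         per_device_train_batch_size=2,
+         gradient_accumulation_steps=4,
+         num_train_epochs=1,
+         bf16=True,
+         output_dir="outputs",
+     ),
+ )
+ trainer.train()
+ ```
+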