See axolotl config
axolotl version: 0.5.2
base_model: mistralai/Mistral-7B-v0.1
model_type: AutoModelForCausalLM
tokenizer_config: Open-Orca/Mistral-7B-OpenOrca
tokenizer_type: AutoTokenizer
tokenizer_use_fast: false
resize_token_embeddings_to_32x: false
flash_attention: true
xformers_attention:
load_in_8bit: false
load_in_4bit: false
strict: false
chat_template: chatml
datasets:
  - path: skymizer/Sonnet3.5-SlimOrcaDedupCleaned-train
    type: chat_template
    field_messages: messages
test_datasets:
  - path: skymizer/Sonnet3.5-SlimOrcaDedupCleaned-test
    type: chat_template
    field_messages: messages
    split: train
hf_use_auth_token: true
dataset_prepared_path: pretokenized/slim-orca
output_dir: ./exp_output_artifacts
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
# eval_causal_lm_metrics: ["perplexity"]
wandb_project: "axolotl_mistral_sft"
wandb_entity:
wandb_watch:
wandb_name: "mistral-7B-v0.1-q-spase-v6-on-slim-orca"
wandb_log_model:
gradient_accumulation_steps: 2
micro_batch_size: 16
eval_batch_size: 1
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.000005
warmup_ratio: 0.03
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.95
adam_eps: 0.000001
max_grad_norm: 1.0
train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false
hub_model_id: "skymizer/mistral-7B-v0.1-q-sparse-v6-on-slim-orca"
save_strategy: "steps"
save_steps: 50
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
eval_steps: 50
eval_table_size:
eval_max_new_tokens: 2048
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:
seed: 42
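The config above formats each record's messages field with the ChatML chat template before packing. As a rough illustration (not part of the axolotl pipeline), the sketch below renders one such record, assuming the Open-Orca/Mistral-7B-OpenOrca tokenizer referenced in tokenizer_config ships a ChatML chat template:

```python
# Illustrative sketch only: render one chat record in ChatML, mirroring what
# `chat_template: chatml` + `field_messages: messages` produce during training.
# Assumes the Open-Orca/Mistral-7B-OpenOrca tokenizer carries a ChatML template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca", use_fast=False)

record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
    ]
}

# Produces the <|im_start|>role ... <|im_end|> formatting used for fine-tuning.
print(tokenizer.apply_chat_template(record["messages"], tokenize=False, add_generation_prompt=True))
```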
mistral-7B-v0.1-q-sparse-v6-on-slim-orca
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the skymizer/Sonnet3.5-SlimOrcaDedupCleaned dataset. It achieves the following results on the evaluation set:
- Loss: 2.7583
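A minimal inference sketch (not an official usage guide), assuming the checkpoint published at skymizer/mistral-7B-v0.1-q-sparse-v6-on-slim-orca includes the ChatML tokenizer used during fine-tuning and that a bfloat16-capable GPU is available:

```python
# Illustrative only: load the fine-tuned checkpoint and generate a reply with
# the ChatML template. Assumes the Hub repo ships the tokenizer from the config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "skymizer/mistral-7B-v0.1-q-sparse-v6-on-slim-orca"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain sample packing in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```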
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
Training used skymizer/Sonnet3.5-SlimOrcaDedupCleaned-train and evaluation used skymizer/Sonnet3.5-SlimOrcaDedupCleaned-test, both formatted with the ChatML chat template (see the axolotl config above).
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 16
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: AdamW (torch) with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 11
- num_epochs: 1
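The two derived batch sizes follow directly from the per-device settings; a small sanity check (illustrative only):

```python
# Sanity check (illustrative only): how the effective batch sizes reported
# above follow from the per-device settings in the axolotl config.
micro_batch_size = 16             # per-GPU train batch size
gradient_accumulation_steps = 2
per_gpu_eval_batch_size = 1
num_devices = 4                   # multi-GPU (DeepSpeed ZeRO-3) run

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = per_gpu_eval_batch_size * num_devices

assert total_train_batch_size == 128
assert total_eval_batch_size == 4
```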
Training results
| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 11.586        | 0.0026 | 1    | 11.5238         |
| 7.2128        | 0.1277 | 50   | 7.1436          |
| 6.3388        | 0.2554 | 100  | 6.1285          |
| 5.4596        | 0.3831 | 150  | 5.3648          |
| 4.4506        | 0.5109 | 200  | 4.4040          |
| 3.5754        | 0.6386 | 250  | 3.5615          |
| 3.0413        | 0.7663 | 300  | 2.9740          |
| 2.8014        | 0.8940 | 350  | 2.7583          |
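The reported losses are mean token-level cross-entropy, so the final validation loss corresponds to a perplexity of roughly exp(2.7583) ≈ 15.8 (the perplexity metric itself is commented out in the config above); a one-line check:

```python
# Illustrative only: convert the final validation cross-entropy loss to perplexity.
import math

final_eval_loss = 2.7583
print(f"approximate eval perplexity: {math.exp(final_eval_loss):.2f}")  # ~15.77
```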
Framework versions
- Transformers 4.46.3
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3